Culling the Lares & Penates

Fixing the EHR, Part 6

Feb 22, 2026

Picture of the exploding Ariane 5: a column of smoke with descending stars. In the background, vaguely visible are the gods of Olympus. — The explosion of Ariane 5, overlaying Giulio Romano’s *The Gods of Olympus*

There are two sorts of complexity. We want minimal, necessary complexity.

In Roman times, it was considered wise to honour, or perhaps placate, the household gods by tossing some of your dinner into the fireplace. They invented a heck of a lot of gods, you see. Not content with all of the sitcom-like antics of Jupiter, Juno, Minerva and Vulcan, they came up with multiple local guardians of towns, agriculture, livestock and whatnot (the Lares); and a plethora of Penates, the domestic gods. There was even a fig-tree god.1

The other sort of complexity—the one we don’t want—comes from those small gods of specific circumstances. Gods that don’t generalise, and that need to be placated using ritualistic behaviours.

This might sound theoretical—or even theological—but it’s easy to find real examples. Take the first Ariane 5 mission (flight V88), pictured above. The minor god here was legacy software that wasn’t actually needed at the time; the wrong placatory rituals were used. You can look them up if you’re really interested. During flight, the inertial reference system, intended for wobbles on the launchpad, repeatedly coerced a 64 bit floating point number into a 16-bit signed integer. This forced hardware resets and the shit data were interpreted as flight data, causing full nozzle deflections of the boosters and the Vulcain main engine. Kaboom. At least a few hundred million Euros down the drain.2 That’s a bit more than your dinner in the fireplace.

Legacies

I’m a great believer in reusable code. I’m also a fascinated observer of the sheer amount of mythology associated with this idea.3 Reality tends to get in the way of theory. It’s not because reuse is impossible. It’s just that you need to get the design principles right. Strategy is important; otherwise you end up indulging in lots of unnecessary tactics. And these lead to an abundance of small gods. We’ll find some in healthcare.

SNOMED

Recently I sketched a relational structure^⌘ that starts to describe healthcare. The rest of this post will fill in most of the gaps. We’ll try to step back and find reusable motifs that allow us to stamp on some small gods.

We’ve already discussed^⌘ SNOMED. You’ll recall that the original SNOP had four dimensions that SNOMED extended to eleven:

anatomy, morphology, organism, drug, functional effects, occupation, diagnostic term, procedure, device, social context, and ‘general’.

This then proliferated into a monstrous multi-stemmed fig tree, SNOMED CT, with the added minor deities of Read Codes, OWLs and post-co-ordination. So let’s start with a reality check. A good anchor is an insightful document from the Royal College of Physicians (RCP) in the UK. This describes the information needed to reliably record ‘diagnoses and problems’.

Record-keeping standards

They start with a lament:

Electronic recording of diagnoses using a terminology such as SNOMED CT (or previously the Read Clinical Terms) has been standard practice in GP in the UK for 30 years, but it does not represent details such as laterality, evidence and clinical manifestations in a structured way.

Referring to Larry Weed’s problem-centred approach,^⌘ they see several issues:

Problem lists exist, but are inconsistent and tend to become cluttered with minor and inactive issues.
Severity of problems is variably recorded.
There’s tension among definitions of terms like ‘problem’, ‘condition’, and so on.
Linkage to evidence and actions is spotty.
Sharing is tricky, both with colleagues and patients.
Maintenance is a bugger (or words to this effect).

Better still, the authors make clear recommendations:

Use the term ‘problem’ to describe any type of entry in a problem list, be it a diagnosis, symptom, social factor, etc.
Classify how sure you are of the diagnosis.4
Describe the following properties of the problem: Whether it’s active or not; its severity, body site, laterality, ‘stage’, clinical behaviour, evidence, manifestation, and aetiology.
Update the problem list at each encounter.
It should be trivially easy to share problem lists, without transcription from one system to another.

Sensible. They also provide guidance on how to build problem lists, emphasising the provision of examples.

Let’s begin

Our basic architecture^⌘ already covers a lot of this.5 We are on the brink of revealing our information structure to the functional layer,^⌘ but there are still some little details. As with Ariane 5, we need to get the fiddly bits right:

How likely is the problem? (A Bayesian^⌘ take on things)
Is the problem active?
How severe is it? And where relevant, what ‘stage’6 is it in a relevant classification system?
Where is it anatomically? What side?
How is it manifesting and ‘behaving’ clinically, aka ‘functional effects’?7
Information about organisms, drugs, ‘morphology’,8 devices, and more ‘general’ things.

Consider my crude ‘entity-relationship’ sketch below. You’ll see that since the last iteration,^⌘ I’ve added a box: the AIMS (or intentions). This is useful for things like prescribing a drug.

Our previous entity relationship diagram (ERD) that models the recursive nature of Science using linked tables. Prominent are PROBLEMS, linked to results and management processes.

Common ground!

I’d like to draw your attention to five tables: PROBLEMS, AIMS, PROCESSES, INTERVENTIONS and RESULTS. Stepping back, I noticed that they contain five common themes that fit nicely into five of those SNOMED dimensions:

anatomy : anatomical location, for example ‘the left forearm’, ‘within a vein’, ‘the hippocampus’ or ‘the gastrointestinal tract’.

drug : pharmaceutical (substance, compound, formulation).

device : therapeutic device, for example a pacemaker or an intravenous line; or even a specific device identified by a serial number.

organism : for example, Mycobacterium tuberculosis.

‘thing’ : A miscellany of general bits and bobs that otherwise don’t fit.9

For example, I might have a process that involves long-term administration of a drug like, say, digoxin^⌘ (a process). This would involve specific prescribing of a formulation of that drug (an aim/intent), with an actual intervention of taking or giving the drug. We might monitor digoxin levels (in the results), and then identify a problem with the drug (e.g. toxicity or an adverse reaction).

We can construct similar scenarios for e.g. infection at a specific site, use of a device like a pacemaker, and so on. We may need to use several dimensions in a single database row. We’ll also need to add a sixth field (side) to address the RCP’s lament about laterality.

Let’s join things up properly. Okay, initially the result is a bit daunting …

A more detailed ERD that initially is startling in its complexity. But if you look carefully, you see that the original tables have been linked by those 5 'pentad' components, and suitable tables, tailored to fit.

Necessary complexity

But this is necessary, minimal complexity.10 From simple basics, we’ve built a nearly complete ‘structure of Medicine’.

We also kill a few small but complex gods. We’ve already established^⌘ that SNOMED CT needs fixing, or careful curation at the very least. It’s not just that it’s full of synonyms, nonsense terms and concepts out of place—like ‘entire psyche’ as an anatomical construct. It’s not just that there’s poor coverage of, for example, laterality, and infections with various organisms at various sites. There is also a proliferation of terms, and worse still, you often have to burrow deep within concepts to extract meaningful information. Bad strategy has resulted in the need for obscure tactics, just to get what you want. We’ve addressed these problems structurally.

We can use this approach to make actual database structures:

CREATE TABLE Canon
(  PK BIGINT -- primary key.
  ,src BIGINT
    ,constraint src_canon foreign key(src) references SOURCES(src)
  ,srcID BIGINT
  ,tag_person integer
    ,constraint bad_canon_patient foreign key(tag_person) references PEOPLE(person)
  ,t_amended BIGINT -- when created/amended
  ,reason int
  ,p_amended BIGINT -- where created/amended
    ,constraint bad_canon_amended_at foreign key(p_amended) references PLACES(place)
  ,amender integer -- of the actual table row! 
    ,constraint bad_canon_amender foreign key(amender) references PEOPLE(person)
  ,t_start BIGINT -- actual time started
  ,t_start_P int 
  ,p_start BIGINT -- where started
    ,constraint bad_canon_start_at foreign key(p_start) references PLACES(place)
  ,starter integer -- actual clinical participant
    ,constraint bad_canon_starter foreign key(starter) references PEOPLE(person)
  ,t_end BIGINT -- when actually ended
  ,t_end_P int
  ,p_end BIGINT -- where ended
    ,constraint bad_canon_end_at foreign key(p_end) references PLACES(place)
  ,ender integer
    ,constraint bad_canon_ender foreign key(ender) references PEOPLE(person)
-- the final nine fields (where the ’pentad’ is present):
  ,side integer  -- ... with the pentad:
  ,formulation integer
    ,constraint bad_canon_formulation foreign key(formulation) references Formulations(formulation)
  ,device integer
    ,constraint bad_canon_device foreign key(device) references devices(device)
  ,thing integer
    ,constraint bad_canon_thing foreign key(thing) references things(thing)
  ,anatomy integer
    ,constraint bad_canon_anatomy foreign key(anatomy) references anatomy(anatomy)
  ,organism integer
    ,constraint bad_canon_organism foreign key(organism) references organisms(organism)
-- the final three fields, always present:
  ,responsible integer
    ,constraint bad_responsible_group foreign key(responsible) references USERGROUPS(usergroup)
  ,ver integer default 0
  ,chk int
) CHARACTER SET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

Reuse at last!

That’s a prototype or ‘canonical’ table structure. It’s reusable across multiple tables. There are still lots of finer details—and things you can argue about.11 Feel free to do so! And if you absolutely hate names like ‘gadgets’ (specific instances of a device), simply change them.

It may not yet be obvious, but we still haven’t touched the bulk of our data. This is where the RESULTS table comes in, and in our next post we’ll also examine the ENSEMBLES that snuck into that last diagram. We’ll get some perspective on several minor gods along the way.^⌘

My 2c, Dr Jo.

^{⌘ This symbol is used to indicate posts where I’ve discussed the flagged topic in more detail.}

We don’t talk about the fig-tree god in front of the children.* It’s quite likely that many of the smaller deities were legacy gods, worshipped in Rome long before they stole the Greeks’ pantheon.

*If you look at the statuary and frescoes from Pompeii however, the Romans were quite frank about his little problem!

Ariadne (Fr. Ariane), daughter of King Minos, was associated with mazes and labyrinths. Vulcain is French for Vulcan, the Roman god of fire.

There is solid & reusable software: look at TeX versioning, or how old Fortran routines are still the basis of most modern use of linear algebra. But I remember how, in about 1987 when Object Oriented Programming really started taking off, one of the promises made was reuse. Two things persist: code bloat and my hollow laughter. This is not because the intentions are bad. Many of the principles are solid :)

Although they provide but 2 options, something we will explore in a later post.

Interventions should cover procedures. We have problems, their causes (aetiology) and how they relate to one another, links to both evidence and management actions that launch relevant processes, and even more detailed relationships between these processes.

Staging is tricky and prone to alterations. It may fit better as a ‘result’ than elsewhere.

Observed effects fit well into RESULTS too.

‘Morphology’ is another strong RESULTS candidate, and likewise for observations like social context. We still need to agonise over family relationships and whānau too.

For example, if the administered ‘substance’ wasn’t made according to GMP,^⌘ who knows what’s in it?

There is a lot here. If anyone is really keen to understand my choices, write a note at the bottom and I’ll explain. Pretty much every box deserves a post of its own! Within a few posts, I’ll release full source code on GitHub, by the way.

Most of those fields are also useful even more generally than in those 5 tables. If, for example, you don’t like the BIGINT timestamps, please make sure you’ve read the relevant documentation^⌘ :)

Jeremy Singer

Feb 22

Warning!

Everywhere in your create table code where you used 'integer' instead of 'bigint' is a place where your database may stop working someday.

You think that integer is a much larger number than you will ever have, but someday, when your database gets very big, and not used in exactly the way you were initially using it, you will enter the last possible kind of value in one of those integer fields, and your database will stop working.

I know. You think you were being parsimonious with resources by saving a few bytes, but it can come back to byte you.

As a database analyst, I spent a lot of time fixing legacy databases, and saving enterprises from this kind of disaster... sometimes after their systems had stopped working.

Please, avoid false economy. Its not safe.

1 reply by Dr Jo

Bernard Peek

Relational databases are the go-to solution for most 'big-data' problems and the usual approach to designing a schema is similar to yours. But the great grandpappy of big-data is the library classification system, and it's structure is not based on relational algebra. Lots of people have implemented library management systems using relational databases but there is an 'impedance mismatch' akin to trying to match the set of real numbers to binary representations.

The golden rule is that the data your user is looking for should always be in the first place they look for it. In an alphabetic list of names should they look for you under V or S. If they are Americans it would obviously be under V. If they are Dutch it should equally obviously be under S.

Mapping unstructured data on to a structured database schema is always going to be imperfect. It's inherent in the data itself. The English-language library system does it by having acknowledged authorities. That was AACR2 and MARC until 2005 now MARC and BIBFRAME.

Those have limitations if you don't understand English. You should be able to retrieve the same information whether your search term is in English or Tagalog.

So much for the easy bit.

New 'graph database' technology and AI are about to pile confusion on chaos. Graph databases have a completely different approach to data structures and arguably might be a better match to your application. AI systems can rapidly find plausible answers to natural-language questions. NB plausible does not necessarily mean correct.

7 replies by Dr Jo and others

10 more comments...

Dr Jo

Discussion about this post

Ready for more?