Discussion about this post

User's avatar
Jeremy Singer's avatar

Warning!

Everywhere in your create table code where you used 'integer' instead of 'bigint' is a place where your database may stop working someday.

You think that integer is a much larger number than you will ever have, but someday, when your database gets very big, and not used in exactly the way you were initially using it, you will enter the last possible kind of value in one of those integer fields, and your database will stop working.

I know. You think you were being parsimonious with resources by saving a few bytes, but it can come back to byte you.

As a database analyst, I spent a lot of time fixing legacy databases, and saving enterprises from this kind of disaster... sometimes after their systems had stopped working.

Please, avoid false economy. Its not safe.

Bernard Peek's avatar

Relational databases are the go-to solution for most 'big-data' problems and the usual approach to designing a schema is similar to yours. But the great grandpappy of big-data is the library classification system, and it's structure is not based on relational algebra. Lots of people have implemented library management systems using relational databases but there is an 'impedance mismatch' akin to trying to match the set of real numbers to binary representations.

The golden rule is that the data your user is looking for should always be in the first place they look for it. In an alphabetic list of names should they look for you under V or S. If they are Americans it would obviously be under V. If they are Dutch it should equally obviously be under S.

Mapping unstructured data on to a structured database schema is always going to be imperfect. It's inherent in the data itself. The English-language library system does it by having acknowledged authorities. That was AACR2 and MARC until 2005 now MARC and BIBFRAME.

Those have limitations if you don't understand English. You should be able to retrieve the same information whether your search term is in English or Tagalog.

So much for the easy bit.

New 'graph database' technology and AI are about to pile confusion on chaos. Graph databases have a completely different approach to data structures and arguably might be a better match to your application. AI systems can rapidly find plausible answers to natural-language questions. NB plausible does not necessarily mean correct.

10 more comments...

No posts

Ready for more?