Over the course of more than two decades as a software engineer, architect, system lead, and CTO, I’ve worked on dozens of data-driven software applications. I helped build some of the first iPhone apps before there was a thing known as “the App Store”, built custom Windows applications, dozens of applications for Unix and Linux back in the 80×132 terminal window days, and way too many web applications in both “standalone stack” and SaaS models.
One thing they all have in common is a persistent data store to keep track of the application data. They also all share very similar traits when it comes to that data. One of those traits is having multiple data tables with a core set of “critical information” that almost always ends up needing additional “less critical information” to support advanced application features. Many systems term this less critical information the “meta data”, as it often provides additional data about the core data.
While I had seen this architecture crop up in multiple projects over the years, I started to see a different approach to managing this meta data. In the “good ol’ days” the meta data was almost always carried along with the core data; typically this manifests as an additional data column in the main data table. However, as I continued to work on multiple inherited projects I started to see it presented as a related table, where you had a core data table (core_data) and a directly related core data meta table (core_data_meta). This presented some pretty cool advantages over the simpler, but potentially slower and harder-to-maintain, “single table holds all information” approach from earlier projects.
A project I am managing for a local client, PFP Logistics, is moving into the next phase of development now that the main project has launched. This means more features to better support the users. That also means we are seeing a need for additional information, the “meta”, for our core data tables to support that functionality.
In the examples noted below we discuss adding a feature related to a “Labor Type” table that will fill in some form defaults when a user selects the given Labor Type from a drop-down menu when filling out a form.
The Meta Table Approach
A mechanism that works extremely well that allows for a standard “fetch additional meta data for operations related to this table” (in this case, “fetch the default values”) is to have a directly related meta table. In this more generalized model you have a table, “labor_types” in our example, and a related table that holds meta values, “labor_types_meta”. This is a model that I’ve seen in several places, but the main place where it has proven highly functional is WordPress core.
Using this type of model you can add flexible and extensible information to the data tables without having to update the backend every time you want to process additional features related to a core data table (like labor_types, or employees, or sites, etc.). You end up with a single BE module with a shared architecture for ALL TABLES that need additional meta data. Each meta table uses an identical table structure and can store all kinds of additional information within a basic data struct:
All core tables that need meta end up with another “sister table”, <core_table_name>_meta, with a struct that looks similar to this:
labor_type_id (in our example, this would always be present, but for maximum flexibility this would be optional)
code (the key to lookup : i.e. “DEFAULT_NEW_LOG”)
label (a short description of what this key is: i.e. “Labor Log Defaults”)
value_jsonb : the data we return for complex sets – jsonb is super efficient, especially with pgSQL
value_varchar (or just “value”) : for simple values; can be text, numeric, etc. FE/BE can convert as needed
…maybe some stuff like timestamps, uid, etc.
The idea is that ANY table can have extra information stored in the meta table.
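A minimal sketch of that sister-table struct, using SQLite in a Python script as a stand-in for pgSQL (SQLite has no native jsonb, so value_jsonb is stored as TEXT here); the table and column names mirror the struct above:

```python
import sqlite3

# In-memory database purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE labor_types (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);

-- The "sister" meta table; any core table gets the same shape.
CREATE TABLE labor_types_meta (
    id            INTEGER PRIMARY KEY,
    labor_type_id INTEGER REFERENCES labor_types(id),  -- optional for max flexibility
    code          TEXT NOT NULL,   -- lookup key, e.g. "DEFAULT_NEW_LOG"
    label         TEXT,            -- short description of the key
    value_jsonb   TEXT,            -- jsonb in pgSQL; plain TEXT in SQLite
    value_varchar TEXT,            -- simple scalar values
    created_at    TEXT DEFAULT (datetime('now'))
);
""")
```

The same CREATE TABLE shape can be stamped out for employees_meta, sites_meta, and so on.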
Meta Table Payload Advantage
The information stored in the meta table is NOT used on a regular basis. The core table stores information we need to access regularly and is sent between backend (BE) and frontend (FE). This makes relational lookups faster AND keeps the payloads smaller. That is one advantage to the separation.
In our example it is super easy for the BE to send back the entire core table data struct (record) without including extra “junk” we don’t need. For example, when the FE pulls a BE record for a labor type, say to populate a drop-down, it does not need a huge JSON struct coming back with “what are the default values”.
If done efficiently, the BE should have one simple GET operation for 90% of labor type interactions that returns the entire core labor_type record when given an ID, or an array of those records when given a range or other filter params. That makes the “R” part of CRUD ops super efficient and easily maintained.
Since meta is separated the default payloads are smaller.
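That single GET operation might look like the sketch below; the function name and column set are hypothetical, and SQLite stands in for pgSQL:

```python
import sqlite3

def get_labor_types(conn, ids=None):
    """Return full core labor_type records with no meta joined in,
    so the default payload stays small. ids=None means all rows."""
    if ids is None:
        rows = conn.execute("SELECT id, name FROM labor_types")
    else:
        marks = ",".join("?" * len(ids))
        rows = conn.execute(
            f"SELECT id, name FROM labor_types WHERE id IN ({marks})", list(ids))
    return [dict(zip(("id", "name"), row)) for row in rows]
```

One query, one shape of response, whether the caller asked for one record or a filtered set.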
Getting Extra Meta Data When Needed
In the rare cases where you want additional data, you use a different endpoint to fetch the extra meta data. A smart design is to have a singular endpoint for the core table such as /labor_type/extended/<code>. For our “new entry defaults” the URL would look like GET /labor_type/extended/DEFAULT_NEW_LOG. This endpoint would send back the core data plus the extra meta from the related labor_types_meta table by joining labor_types.id to labor_types_meta.labor_type_id where code = DEFAULT_NEW_LOG. The FE gets all the extra data needed to provide the extended functionality of setting form defaults whenever a user picks a new labor type from the form’s “Pick A Labor Type” drop-down menu.
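The handler behind that extended endpoint could be sketched like this (again with SQLite standing in for pgSQL, and the function name being a hypothetical illustration of the join described above):

```python
import json
import sqlite3

def get_labor_type_extended(conn, labor_type_id, code):
    """Backs GET /labor_type/extended/<code>: the core record plus the
    matching meta row, joined on labor_types.id = labor_types_meta.labor_type_id."""
    row = conn.execute(
        """SELECT lt.id, lt.name, m.label, m.value_jsonb
             FROM labor_types lt
             JOIN labor_types_meta m ON m.labor_type_id = lt.id
            WHERE lt.id = ? AND m.code = ?""",
        (labor_type_id, code),
    ).fetchone()
    if row is None:
        return None
    return {
        "id": row[0],
        "name": row[1],
        "label": row[2],
        "meta": json.loads(row[3]) if row[3] else None,  # the form defaults
    }
```

The FE gets the core record and the defaults in one round trip, but only when it asks for them.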
Why Use Added Complexity?
This is a simple example that does not fully delve into why you would use meta tables; it is only the tip of the iceberg. Let’s say that at some point in the future we want to track additional information related to labor types that 90% of the application does not need. For example, there may be a new customer site added to the mix that wants to see THEIR internal tracking code for labor types, while 90% of the clients are perfectly OK with seeing the “PFP labor type codes” on invoices. Rather than add a new “customer_code” column to the main data struct, requiring updates to a lot of BE and potentially FE code, it could easily be added to the new labor_types_meta table.
Thanks to the non-specific nature of the meta table, you do not even need to change anything related to the data struct for labor_types_meta. That means no migrations to run during an upgrade. No changes to existing CRUD operations. The only thing you MIGHT need to do, and even that is a maybe, not a definite, is add a new endpoint; here, if we didn’t create a POST operation on our REST endpoint we’d likely need to add one to track customer codes whenever we add a new labor type. In our case we could simply add a new labor_types_meta record with the code “CUSTOMER_CODE” and even use the already-in-place FE form for adding labor types just by adding a new field to the form. Granted, if it is the first time we’ve implemented “POST with meta” we might need to connect to a modified endpoint to make sure the BE knows “hey, some meta data is coming” so it can write core data to the core table and meta data to the meta table.
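The “new customer code, no migration” scenario above can be sketched with two small hypothetical helpers; nothing about the labor_types_meta schema changes, only a new code value appears in the data:

```python
import sqlite3

def set_meta(conn, labor_type_id, code, label=None, value=None):
    """Attach a new piece of meta (e.g. a customer's own tracking code)
    to an existing labor type -- no schema change, no migration."""
    conn.execute(
        "INSERT INTO labor_types_meta (labor_type_id, code, label, value_varchar) "
        "VALUES (?, ?, ?, ?)",
        (labor_type_id, code, label, value),
    )

def get_meta_value(conn, labor_type_id, code):
    """Fetch a simple varchar meta value by its lookup code."""
    row = conn.execute(
        "SELECT value_varchar FROM labor_types_meta "
        "WHERE labor_type_id = ? AND code = ?",
        (labor_type_id, code),
    ).fetchone()
    return row[0] if row else None
```

Storing the customer’s code is just another set_meta call with code “CUSTOMER_CODE”; the rest of the application never notices.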
As we start to delve into these theoretical future extensions to our application, you can see where utilizing a standard meta table allows for a high level of extensibility and flexibility without rewriting data structures, performing data migrations (which are EXTREMELY costly in time and CPU resources on large data sets), or, in most cases, writing new BE endpoints.
Why Have a JSONB and Varchar Value Field?
This is my unique take on the meta table concept. I’ve worked on many projects including WordPress core and several private projects that have implemented the meta table approach. It works well.
That said, I have noticed that there are essentially TWO primary kinds of meta data that are often used – complex and simple.
In a lot of cases, like our “customer’s labor type code” noted in “Why Use Added Complexity?”, the app needs a singular value back. In nearly all coding languages, storing data in a string (char) format allows for many types of conversions that are easily handled by the language engine. Converting “3.14159” to a float is efficient and works well. Or just keep it text: “John Doe, M.D.”. Or format it. Simple, if you store the data in a varchar. Granted, this is less efficient for database-engine driven math on large data sets, but I’d argue there are plenty of cloud compute resources that handle large variable data sets like “text” and are super efficient at returning the results of “big math” on the data set.
However there are also cases where we want a data set back, essentially a key/value pairing of multiple values that are related to a SINGLE record in the core data set. Our “default values for multiple form fields” that we want to keep track of for a “single labor type” is a perfect example of this. Storing this in a native JSON/JSONB format for database engines that support it, and many do these days, is a great option. It unlocks the door to using the data engine’s native JSON processing functions when needed. For those data engines that do not have that support, the JSONB value field can be dropped as JSON data can easily be stored in a varchar format.
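The two kinds of meta value side by side, as a small Python sketch (the field contents are made-up example data):

```python
import json

# Simple meta value: stored as a varchar, converted by the language engine
# on the way out -- this is what value_varchar is for.
customer_code = "3.14159"
as_float = float(customer_code)

# Complex meta value: a key/value set serialized into one jsonb document
# holding all the form defaults for a single labor type -- value_jsonb.
form_defaults = {"rate": 42.50, "overtime_ok": True, "cost_center": "PFP-100"}
stored = json.dumps(form_defaults)   # what gets written to value_jsonb
restored = json.loads(stored)        # what the FE/BE parses back out
```

On a pgSQL backend the stored document could also be queried in place with the engine’s native JSON operators rather than round-tripped through the application.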
Are there cases where this is not the most efficient design? Certainly. That said, I have seen many large scale projects where this design works wonderfully well if implemented properly. The key to making this type of design work is to consider the “big picture” and make smart decisions about when to implement a related meta structure and when to “just add another column to the core table”. There is a time and place for both.