RDBMS, an expensive mistake for reasons you may not suspect
TL:DR; ⚠️ WARNING THIS is a long post
Applications are made up of tiers or layers, an User Interface Tier, an Application Tier, and a Service Tier. Business logic is the part of the application that encodes real-world business-rules that determine how data can be created, stored, and changed. RDBMS encourage you to scatter your business logic across your database, ORM and API. In this post, we will introduce Document Database Design and compare it to RDBMS approaches with some document database design basics using hyper Data Service.
Introduction
Document Database Design is a different mindset from relational database design, it still requires modeling and might even require more thought about your application, than you may normally do with relational databases. To design with documents, you need to think about how the application is going to use the data it needs and model your data in terms of data access, not normalized tables. Data normalization was mainly designed at a time when storage and memory were very expensive.
Problem
A lot of developers struggle with NoSQL Databases or specifically Document Databases, they do not fit the mindset of a relational database, and there are so many tools in the relational database world crafted to use relational databases as part or all of your business rules. This may be fine for any application, but I would argue, that over time relational DB usage results in heavy technical debt.
What is technical debt? It is the amount of effort required to change a software application based on the architectural decisions of that application. Many software teams see debt as a never ending set of maintenance tasks that do not move the project forward, but slow the development down and increases the cost to maintain.
Don't apply old patterns to a different approach
If you design your data using relational patterns in a document-oriented store you will have a painful experience. Take the time to learn the patterns of document data access and design your data using patterns that provide success. By opening your mindset and separating the special to your application and the general to the data service, you will arrive at a cleaner architecture and design, which will result in a more maintainable application over time.
Background
In this article, I will make a case for what I call Document Database Design, and AWS Architects call Single Table Design. Before we jump into Document Database Design, let's turn back the clock.
RDBMS
The relational database management system or RDBMS is the current mainstream structured data storage solution, it has been taught for years as the go-to tool for structured data storage. You get MySQL, Postgres, or SQL server and you design your database schema, by specifying tables and columns. Then you draw relationships from one table to another table by using primary keys and foreign keys. You may create some indexes on some columns of some tables so that you can filter on something other than the primary key or foreign key.
ORM
Now that you have designed the perfect data structure, you are ready to start building your application layer, but you don't want to hand-roll your SQL queries, you need an Object Relational Mapper, this ORM maps your application objects to your relational table/column schema. This process adds a level of complexity to your application, so part of your business logic is in the form of objects and methods and part of your business logic is in the form of tables, columns, and rows.
Change Management
Every time there is a change in your database schema both the application layer (code) and the services layer (database) must be modified at the same time or your application could fail to work properly.
NoSQL
In 2007, the NoSQL movement came out with huge fanfare, we are going to solve the problems with the RDMS/ORM bridge, buzzwords came about like schema-less, CAP Theory, Big Data, etc. But as implementations and designs iterated many development teams became frustrated and arrived at the conclusion that NoSQL databases were much harder to maintain, scale and manage reverted back to RDMS/ORM counterparts. What were the lessons learned? Was it a lack of best practices? Was it the fact that many systems did not fit the NoSQL Document approach? Was the developer mindset not able to think differently?
Serverless and DynamoDB
In 2016-2017 AWS started refining their story with AWS Lambda and AWS DynamoDB, AWS Lambda is a server-less process that is instantiated when requested by a user on demand and AWS DynamoDB is a fully managed high-performance DocumentDB database driven by async requests. To take advantage of the server-less movement the best practice became connecting these two technologies. AWS launched workshops on the single table design and created some educational material showing how just about any application type could be modeled using a single table design for scalable high-performance computing.
Present
This catches us up today where the large part of the software development world is very much using RDBMS/ORM systems and even more, tooling is being built around this tight coupling between your application/business layer and your data services layer, marrying your application's secret sauce to an implementation detail.
Assessment
To improve the way we build applications, we need to re-think the way we design applications, we need to change our mindset. Much like the great architects did in the past, they did not keep extending mud houses with grass roofs, they re-architected how they build, by adding arches and domes. It is worth taking the time to learn about document design and determine if document design fits your needs.
It is my opinion if you take the time to learn this technique you can greatly reduce the technical debt of an application and empower your team to make changes to business rules with a higher velocity than database-driven applications even years down the road.
Example
Let's look at an example, say we want to build a Mario Universe Wiki application, this application, users would be able to create character records and create game records of the Mario Universe. We also want to be able to document what characters appear in specific games and what games belong to a specific character. Let's look at our data:
Characters
id | name | appearance |
---|---|---|
character-1 | Mario | 1981 |
character-2 | Luigi | 1983 |
Games
id | name | released |
---|---|---|
game-1 | Donkey Kong | 1981 |
game-2 | Mario Bros. | 1983 |
game-3 | Super Mario | 1985 |
This is a good start, we have two data models, character
and game
. And each model has an id
and name
column, and the character
the model has a column called appearance
to represent the year that the character appeared, and the game
the model has a column called released
which represents the year the game was released.
In the future we will add more data to these models, but for now, we will keep them lean.
So what are the use cases of our application?
As a user, I want to see all the games released in the Mario Universe
As a user, I want to see all the characters in the Mario Universe
As a user, I want to be able to add a character to the Mario Wiki
As a user, I want to be able to add a game to the Mario Wiki
As a user, I want to be able to see all the games that a character appeared in when viewing the character profile
As a user, I want to be able to see all the characters that a game contained when viewing a game profile
As a user, I want to associate games with a character when adding or updating the character
As a user, I want to associate characters with games when adding or updating the game
These use cases can be broken into data actions
- Create/Update a character
- Create/Update a game
- List all characters
- List all games
- List all games that a character that appeared
- List all characters that appeared in a game
- View a character with a list of game appearances
- View a game with a list of character appearances
- Find games released for a given year
- Find characters that appeared in a given year
Relational DB Design
characters
column | type | |
---|---|---|
id | integer | unique primary key |
name | varchar(255) | |
appearance | varchar(4) |
games
column | type | |
---|---|---|
id | integer | unique primary key |
name | varchar(255) | |
released | varchar(4) |
appearances
column | type | |
---|---|---|
id | integer | unique primary key |
character_id | integer | foreign key characters.id |
game_id | integer | foreign key game.id |
There could be other ways to model the database, but this model should satisfy all of the current use cases.
Let's look at some SQL
- Create/Update a character
INSERT INTO characters (name, appearance)
VALUES ('Mario', '1981')
- Create/Update a game
INSERT INTO games (name, released)
VALUES ('Donkey Kong', '1981')
- List all characters
SELECT id, name, appearance FROM characters
- List all games
SELECT id, name, released FROM games
- List all games that a character has made an appearance
SELECT appearances.game_id,
games.name, games.released
FROM appearances
LEFT OUTER JOIN games
ON games.id = appearances.game_id
WHERE appearances.character_id = ?
- List all games that a character that appeared
SELECT appearances.game_id,
games.name, games.released
FROM appearances
LEFT OUTER JOIN games
ON games.id = appearances.game_id
WHERE appearances.character_id = ?
- List all characters that appeared in a game
SELECT appearances.character_id,
characters.name, characters.appearance
FROM appearances
LEFT OUTER JOIN characters
ON characters.id = appearances.character_id
WHERE appearances.game_id = ?
- View a character with a list of game appearances
SELECT id, name, appearance
FROM characters WHERE id = ?
SELECT appearances.game_id, games.name,
games.released
FROM appearances
LEFT OUTER JOIN games
ON games.id = appearances.game_id
WHERE appearances.character_id = ?
- View a game with a list of character appearances
SELECT id, name, release
FROM games WHERE id = ?
SELECT appearances.character_id,
characters.name, characters.appearance
FROM appearances
LEFT OUTER JOIN characters
ON characters.id = appearances.character_id
WHERE appearances.game_id = ?
- Find games released for a given year
SELECT id, name FROM games where released = ?
- Find characters that appeared in a given year
SELECT id, name FROM characters where appearance = ?
Document DB Design with Zod and ⚡️hyper
const characterSchema = z.object({
id: z.string().regex(/^character-/),
type: z.literal("character"),
name: z.string(),
appearance: z.string().min(4).max(4)
})
const gameSchema = z.object({
id: z.string().regex(/^game-/),
type: z.literal("game"),
name: z.string(),
released: z.string().min(4).max(4)
})
const appearanceSchema = z.object({
id: z.string().regex(/^appearance-/),
type: z.literal('appearance'),
character_id: z.string().regex(/^character-/),
character_name: z.string(),
game_id: z.string().regex(/^game-/),
game_name: z.string()
})
Let's look at some business logic using Zod, Ramda, and ⚡️ hyper
Compose and ComposeP are ramda functions take the result of one function and passes to the next function from right to left or bottom to top - compose(f,g)(x) === f(g(x))
- Create/Update a character
export const createCharacter = compose(
hyper.data.add,
characterSchema.parse
)
- Create/Update a game
export const createGame = compose(
hyper.data.add,
gameSchema.parse
)
- List all characters
export const listCharacters = composeP(
prop('docs'),
hyper.data.query,
always({ type: 'character' })
)
- List all games
export const listGames = composeP(
prop('docs'),
hyper.data.query,
always({ type: 'game' })
)
- List all games that a character that appeared
/**
@param {string} id
*/
export const listGamesbyCharacterId = composeP(
prop('docs'),
hyper.data.query,
id => ({type: 'appearance', character_id: id })
)
- List all characters that appeared in a game
/**
@param {string} id
*/
export const listCharactersByGameId = composeP(
prop('docs'),
hyper.data.query,
id => ({type: 'appearance', game_id: id })
)
- View a character with a list of game appearances
export const viewCharacterWithGames = composeP(
transformDocsToCharacterObject,
prop('docs'),
hyper.data.query,
id => ({ $or: [
{ type: 'character', id },
{ type: 'appearance', character_id: id }
]})
)
- View a game with a list of character appearances
export const viewGameWithCharacters = composeP(
transformDocsToGameObject,
prop('docs'),
hyper.data.query,
id => ({ $or: [
{ type: 'game', id },
{ type: 'appearance', game_id: id }
] })
)
- Find games released for a given year
export const findGamesByYear = composeP(
prop('docs'),
hyper.data.query,
year => ({ type: 'game', released: year })
)
- Find characters that appeared in a given year
export const findCharactersByYear = composeP(
prop('docs'),
hyper.data.query,
year => ({ type : 'character', appearance: year })
)
Document Database Design Basics
use conventions
Instead of creating a table, collection, or container for each document type, create one container to contain all of your document types and use the unique identifier structure of the document to infer the type.
- Every document must be a standalone document that contains a globally unique identifier
- Every document must have a document type property that identifies the type of document
For example:
{
"id": "movie-12345",
"type": "movie",
"title": "Dune"
}
The "id" is the unique identifier
Then "type" is the document type
By naming my movie documents this way, I can list all movie documents by doing a query where type === 'movie' or I could do a range query on the key, start = 'movie' and end = 'movie/u0001'
With this example, I could query containers all over the world and concat the the results or I could copy the results from one container into another container, because, I guarantee the identifier is globally unique and the document type is the same for the given resource. Without having to migrate or sync service layers, these documents could be stored in different data storage systems, the application layer does not care.
Responsibility
When you commit to this design you are committing to responsibilities for your application layer, which it always had, but many applications leaned on the RDMS/ORM solution for.
- Document Schema Integrity
- Data Field Consistency
- Referential Integrity
In the application layer, with your business rules, the developers need to validate/parse incoming document data and make sure it meets the requirements and is consistent. If a document property called year is a 4 character string that only accepts numbers, it is up to the business layer to maintain that consistency, it is also up to the business layer to be the gatekeeper of all reads and writes of the data, including analytics and reporting. Cascading deletes or bulk updates, need to be invoked by the application layer via a deployment/migration.
While it may appear that extra responsibility is being added the application layer, the application layer should never trust any incoming data without verifying the structure and integrity at the point of the API. So if you are skipping that verification and relying on the RDBMS to catch these issues, you are trusting a service and ORM that you may not manage and could be at risk for security attacks.
Change your mindset
Instead of picturing your data residing in a normalized dry structure during data modeling, change your mindset think about the user interactions and use cases of your application, what data do I need for those use cases, what is the best way to get that data, how can I get it with one request? This will help you design your documents and indexes to be effective and efficient. The document database design mindset is to think in terms of general access of any document with indexed filters that can provide batching very similar to joins in a relational database.
Summary
While this is by far from a complete picture of Document Database Design, hopefully, it gives you some basic practices to consider and experiment with some side projects. And think about some yeah buts, try to branch out, many companies are doing great things with Document Databases and the word is out, if you separate your application layer from your services layer, you will have increased velocity, improved reliability, and long term flexibility.
⚡️ hyper Data service is a general abstraction around the Document Database and gives you the boundary to properly separate business logic from your Data service tier.