Database library (src/db
)
This library is the primary way with which we interact with the HPC database,
and uses knex
in the backend. We made a conscious decision to move away from
using sequelize (which is used in the legacy codebase), and use knex
for a
number of reasons:
-
Discouraging the use of Joins:
The HPC API codebase has been known to make SQL queries with a very large number of joins (i.e. dozens) in a single query. These queries have previously locked-up the postgres database due to the contention that it causes, and drastically impact performance.
Sequelize's ORM methodology & design (along with our model definitions), in particular the ease of writing queries that "include" data from other tables, encourages developers to write these large queries that should be avoided.
It's still possible to write queries in
knex
with joins of course, but the temptation to write them is reduced.(beyond this, we go further to encourage a more no-sql approach to accessing data, which is described below)
-
Discouraging the use of hooks:
Sequelize allows the definition of "hooks" as part of the model, which are functions that run as part of a database access (e.g. modifying data before an update, or performing an action after a new row has been inserted into a table, etc...).
In the HPC codebase, hooks have had a history of obscuring exactly what's happening to data in the database at the business-logic layer, when it is assumed that the
update
,create
ordestroy
actions that are run are the only changes that will happen to the database.It has been agreed upon by the development team that a better approach would be to have logic that would typically be encapsulated in hooks be expressed as libraries built on-top of the database model instead, and used directly by the business-logic code. This allows for these behaviors to be much more clearly discovered and understood.
-
Better library for query-building:
Without the use of hooks and joins, then we'd be using the sequelize library for little more than a query building library (and to define the schema). Given that:
- Defining the model/schema has to be done in migrations anyway (so is effectively duplicated)
- Sequelize model definitions don't provide any static-type checking
- Knex is a more feature-rich query building library
- We already use
knex
in portions of our codebase (such as in some unit-tests)
It made more sense to move forward with
knex
.
In addition to the above, the library has been designed in a way to try and help us reduce existing technical debt, and prevent it's accumulation in the future by:
-
Encouraging a no-sql approach to database accesses:
For our use of database storage, eventually consistent access is sufficient.
More specifically, when reading data, it's usually okay for us to read data using multiple queries (where data may have changed between query executions) and combining the data as appropriate using business-logic.
In cases where this is insufficient, transactions can be used.
Borrowing from ideas of other no-sql systems (such as Google Datastore), the library is designed to make single-table accesses very easy and discoverable, and should be used in almost all cases.
(that being said, it's always possible to access a raw knex table for a model when the exposed functions are insufficient, though this must be used sparingly).
-
Special handling of "accidentally optional" fields:
Historically, the HPC schema used to have many columns (across various tables) defined as possibly
NULL
, whereas those columns should really beNOT NULL
. In cases where application code treated these columns asNOT NULL
, a migration fixed all these definitions and addedNOT NULL
constraints to the columns.However, there are some columns which should also have
NOT NULL
constraint, but there are some (usually very old) records where this column is actuallyNULL
, thus these columns were not covered by the aforementioned migration.In order to prevent creating new records where this column would have
NULL
, those columns can be marked as "accidentally optional".When this is done, the static type-checking requires this field to be specified when creating new rows for a table, but when reading data from the table, it is defined as possibly
null
. -
Requiring type-validation of JSON fields:
Rather than assuming that each JSON field of a table has type
any
orunknown
, when defining a model using this library, anio-ts
type definition needs to be provided. This type definition is then used to validate all data as it's read from the database (and used for static type-checking of database writes), throwing an error upon read if invalid data is encountered.This means that unexpected data is discovered as soon as possible, and doesn't propagate through the application.
-
Allowing for type-narrowing of string fields:
In addition to specifying generic string columns, string unions can be defined when the values are expected to be one of a specific set of values, but where an ENUM was not used.
This validation occurs in the same way as JSON fields, throwing an error upon a read when unexpected data is encountered.
-
Limiting returned data to well-known fields:
There may be cases when a database table has more columns that is defined in the model (for example when a new field has been added to a table via a migration, but the legacy API has not been updated with the new model).
In these cases, the new fields will be removed when reading the data from the database to avoid returning data that may be unexpected.
-
Model Abstraction layers & masking (as opposed to "hooks"):
As written above, hooks are not implemented using this library given how they hide or mask behavior. However, they do have the nice property that they are always applied, no matter who is interacting with the database.
As a result, mandatory actions are always ensured when using hooks.
So if we are moving this behavior to libraries instead, how to we prevent these libraries being circumvented, and ensure that we can require mandatory use?
To do this, our model library is written in a way that allows specific tables to:
- overwrite their implementation of certain functions
- hide certain functions that should never be directly accessed
- introduce functions specific to that model
For example:
- the util
sequelize-model
builds on-top ofraw-model
to:- add
createdAt
,updatedAt
anddeletedAt
fields do model definitions. - overwrite the behavior of
create
andupdate
to update the above fields as required.
- add
authGrant
builds on-top of a model defined usingsequelize-model
to:- define custom
create
andupdate
functions that transactionally log the change to the grant in addition to changing or creating the grant itself. - define a helper function
createOrUpdate
that is accessible in the model.
- define custom
When choosing which library functions to use when creating new tables, it ultimately depends on what data needs to be tracked for each table.
For most data storage, use Versioned Tables. If there's no specific reason why something else should be used, then this is the most sensible default, as it provides a number of different features see below.
Exceptions: However, Versioned Tables are not always suitable, and sometimes it makes more sense to use Raw Tables instead.
- Tables that cache data, or are only used for lookups / optimizing queries
- Tables that are solely data derrived from other tables (i.e. they're non- canonical)
- Tables where versioning data is deliberately handled in another manner, or
not needed, and for a good reason, such as:
- auth tables (where logging is handled via
authGrantLog
) - Logging data (as each entry is final, and will never change)
- auth tables (where logging is handled via
- Creating models for tables that already exist (i.e. legacy tables that were
created using sequelize in
hpc_service
).
These tables have a number of features that are handled automatically:
- Using the
no-sql
methodology to store the primary data (data definitions are defined usingio-ts
, and allow for migration-less updates to many changes in the model) - Being versioned (providing a full auditable and usable (revertable) log, and optionally storing the user IDs of those who made changes)
- Preventing asynchronous changes (i.e. providing functionality that allows for transactional checks that prevent one user accidentally overwriting / changing data from another user)
- Easy derivation of lookup columns by specifying a
prepare
function
To create a new Versioned table, the best bet is to look at some examples that already use them:
But in short, you need to do the following:
-
Create a new module in
src/db/models
with the same name that the table will have. Inside this module:-
Specify a branded ID type:
export type MyTableId = Brand< number, { readonly s: unique symbol }, 'myTable.id' >; export const MY_TABLE_ID = brandedType<number, MyTableId>(t.number);
The type definition
MyTableId
and accompanyingio-ts
codecMY_TABLE_ID
are then used to ensure that IDs from one table are not accidentally used in other tables (when e.g. filtering data from the database) using TypeScript static typing.Though as they are simply numeric values, there is no runtime type checking.
-
Specify an
io-ts
type definition that defines the structure of the data that needs to be stored (i.e., stored for each version of an entity of this type).export const MY_TABLE_DATA = t.type({ foo: t.number, bar: t.string, });
-
Export the final definition as a default value. Call
defineVersionedModel
like so:export default defineVersionedModel({ tableBaseName: 'myTable', idType: MY_TABLE_ID, data: MY_TABLE_DATA, lookupColumns: { columns: {}, prepare: async () => ({}), }, });
You can optionally specify lookup columns that should be added to the "root" (i.e. head) database entry for each entity of this table. This allows for creating postgres queries with where clauses that allow for filtering the results based on the data in the most recent version
It's also possible to use lookup columns to create additional constraints on the data, such as foreign keys or uniqueness constraints.
-
-
Add the model to
src/db/index.ts
-
Create a migration in
hpc_service
to create the tables as needed.
These tables are indirectly used by Versioned Tables
above,
and should only be used for a table if there is a very good reason that
Versioned Tables are unsuitable.
- Create a new module in
src/db/models
with the same name that the table will have. Inside this module:- Pick an appropriate function to base your model definition on, i.e:
defineIDModel
: to describe a table that was previously defined using sequelize, and that has a unique ID fielddefineSequelizeModel
: to describe a table that was previously defined using sequelize, and that does not have a uniqueID
fielddefineRawModel
: to describe any other table (e.g. association tables)
- Export a call to the above function as default.
- Pick an appropriate function to base your model definition on, i.e:
- Add the model to
src/db/index.ts