Skip to content

Latest commit

 

History

History
597 lines (467 loc) · 19.4 KB

lf101.md

File metadata and controls

597 lines (467 loc) · 19.4 KB

Lovefield 101: Fundamentals

Lovefield is a relational database written in JavaScript and can be executed on browsers. Relational database is a huge topic and it's not my intention to lecture what it is. For those who do not understand what relational database is and how to use it, Stanford University has generously offered their course materials online:

CS145: Data Management and Data Systems

In this document, it is assumed the readers know the fundamentals of relational databases.

Limitations of Lovefield

Technical limitations

Different database systems are designed for different usages. Lovefield is constrained by available browser heaps and therefore one can assume that Lovefield can operate on less than 2GB of memory (when the browser process will become extremely slow). A safer bet is the database size should be around or below 500MB.

During initialization, Lovefield loads everything from persistent store into memory because there is not a reliable memory swap provided by HTML5/Javascript and the only reliable lock is JavaScript's single-main-thread policy.

Knowing these two technical constraints are crucial for the usage of Lovefield library.

Artificial limitations

All naming in Lovefield must pass the following JavaScript regex check:

/^[A-Za-z_][A-Za-z0-9_]*$/.test(name)

Names that violates this check will result in exceptions.

Schema and builder pattern

Lovefield uses builder pattern to create schema. As a result, every database starts with requesting a schema builder:

const builder = lf.schema.create(
    /* database name */ 'HR',
    /* database version */ 1
);

All database instances are identified by the combination of origin and database name. The version is used to determine whether to trigger a database upgrade or not since Lovefield assumes the schema is the same if the version number of a given database does not change.

One can use the builder to define table schema:

builder.createTable('Asset')
    .addColumn('id', lf.Type.STRING)
    .addColumn('asset', lf.Type.STRING)
    .addColumn('owner', lf.Type.STRING)
    .addColumn('acquisitionDate', lf.Type.DATE_TIME)
    .addColumn('tagId', lf.Type.STRING)
    .addPrimaryKey(['id'])
    .addNullable(['owner'])
    .addUnique(['tagId']);

Column types

A table schema contains column(s) and/or index(es) and constraint(s). A table contains at least one column, which is added using addColumn() API. Each column must associate with a data type:

Type Default value Nullable by default Description
Type.ARRAY_BUFFER null Yes JavaScript ArrayBuffer object
Type.BOOLEAN false No JavaScript boolean
Type.DATE_TIME Date(0) No JavaScript Date object
Type.INTEGER 0 No JavaScript number assuming to be integer
Type.NUMBER 0 No JavaScript number assuming to be decimals
Type.STRING '' No JavaScript string
Type.OBJECT null Yes JavaScript Object stored as-is

Nullable means a column can be null. In Lovefield, JavaScript null is used to represent NULL value in RDBMS.

You are not supposed to pass undefined or null for not null columns. Doing so will cause undefined behavior and/or errors.

Index and constraints

Lovefield supports the following constraints

  • Primary key (also serves as index)
  • Unique key (also serves as index)
  • Foreign key (indirectly serves as index)
  • Nullable / Not-nullable

Primary key, unique key, and indices can be added to a table via addPrimaryKey(), addUnique(), addIndex() accordingly. They all accept an array of column names, which means these columns in combination forms the index. For example:

builder.createTable('Asset')
    ...
    .addPrimaryKey(['id']);   // column |id| is the *only* key.

builder.createTable('AssetTag')
    ...
    // column pair (|tagId|, |assetId|) forms the primary key.
    .addPrimaryKey(['tagId', 'assetId']);

addNullable() also accepts an array of column names, but it means every column in this list can be nullable. They do not form indices.

We'll talk more about use of indices in later section.

Connection and database instances

Lovefield is designed to persist data within the browser client. By default, IndexedDB is used as Lovefield's data store. Why do we need Lovefield when IndexedDB is supposed to work by itself? That's another story worth a separate rant article about it, and we'll leave the spicy contents there.

Each database instance (i.e. persisted data within the browser client) is identified by the tuple of (origin, database name, database version). The instance is accessed via the connect() function.

If there were no instances found in the browser, Lovefield will create a new instance for you. If there were matching instance found, Lovefield will load that instance into memory so that you can resume using it from the last session.

If you do not wish to persist your database within the browser storage, you can instruct Lovefield to operate in memory only mode:

const db = await builder.connect({
  storeType: lf.DataStore.MEMORY
});

If you wish all database connections in your session to be in-memory only, you can also use the options API to do so:

lf.options.set({memoryOnly: true});

Note: in node.js, only in-memory is supported and is already enforced in the node.js module offered.

When connect() is called, it will return a connection object that can be used to run queries. At this moment the schema set in builder can no longer change.

Transaction and queries

A query in relational database means a request to manipulate data or schema, it can be insert/delete/update data, or retrieval of data (which is usually named as select). All queries in Lovefield is executed in the context of a transaction. A transaction is an atomic unit of execution: all queries inside a transaction either succeed together (called commit), or fail together (called rollback). If the transaction failed, data remained in their original state untouched.

Insert and update

All queries starts with inserting data into the database. To insert data, one needs to create rows first:

const item = db.getSchema().table('Item');  // get the |item| table schema

// Use table schema to create row
const row = item.createRow({
  id: 1234,
  description: 'a todo item',
  deadline: new Date(),
  done: false
});

// These two APIs are the few Lovefield synchronous API, they return concrete
// objects, not promises.

Once the rows are created, you can use one of the insert() queries to store them into database:

// This writes the |row| into the |item| table.
// If there were an existing row with the same primary key, it will raise
// an exception.
await db.insert().into(item).values([row]).exec();

// This writes the |row| into the |item| table.
// If there were an existing row with the same primary key, it will replace
// that one with this |row|.
await db.insertOrReplace().into(item).values([row]).exec();

// Queries are all promise-based APIs. So either handle the promise, or use
// await syntax.

You can also just update some fields of an existing row without the need to provide a full row:

// Change the done field to true for todo item 1234.
await db.update(item).set(item.done, true).where(item.id.eq(1234)).exec();

// TypeScript users: use col() to avoid annoying typing errors.
const id = item.col('id');
const done = item.col('done');
await db.update(item).set(done, true).where(id.eq(1234)).exec();

// We'll just use col() for the rest of the examples.

Search conditions

The where() function accepts predicates, also known as search conditions, which are conditions for Lovefield to match the rows. To avoid free text parsing and offer better security, Lovefield uses functions to define the search conditions:

Function Name Example SQL Equivalent
eq equals column.eq(value) column = value
neq not equals column.neq(value) column <> value
lt less than column.lt(value) column < value
lte less than or equals to column.lte(value) column <= value
gt greater than column.gt(value) column > value
gte greater than or equals to column.gte(value) column >= value
match regex match column.match(/\w*/) See Note below
between between column.between(100, 200) column BETWEEN 100 AND 200
in in the array column.in([1, 2, 3]) column IN (1, 2, 3)
isNull is null column.isNull() column IS NULL
isNotNull is not null column.isNotNull() column IS NOT NULL

Note: match is similar to SQL SIMILAR, however, due to technical limitations, we can only offer JavaScript regex search instead. Please also note that match will force a full table scan, which means performance penalty.

Delete

You can of course delete the rows that you do not want, for example:

// Remove all done items from the database.
await delete().from(item).where(item.col('done').eq(true)).exec();

Once committed, the data is gone forever and there's no way to get it back.

You might wonder that previously we mentioned

All queries in Lovefield is executed in the context of a transaction.

And where is that transaction? When you call the exec() function, it creates an implicit transaction to run that single query. When the promise is resolved, the implicit transaction is committed.

Select

Select queries are the most used queries in a relational database. Most use cases will have at least 10x more select queries than all other queries combined. Select queries retrieve data from the database.

Lovefield provides the following features for select query:

  • Filtering via search conditions
  • Sorting
  • Paging
  • Grouping
  • Aggregation
  • Join

Filter just the fields needed (a.k.a. Projection)

You can filter the select results by specifying the fields needed instead of getting back the full row. For example:

// List the description of done items.
const descriptions = await db
    .select(item.col('description'))
    .from(item)
    .where(item.col('done').eq(true))
    .exec();

descriptions.forEach(desc => {
  // Returned value are object arrays, each object will have only one property
  // named 'description'.
  console.log(desc.description);
});

// Rename the returned object property to 'd'
const d2 = await db
    .select(item.col('description').as('d'))
    .from(item)
    .where(item.col('done').eq(true))
    .exec();

// Flatten returned value into a string array.
const res = d2.map(d => d['d']);

Lovefield always return an array of objects as the result of select() query, and by default the property names are the column names. The as() function offered in column schema object will allow renaming the returned property name into something else. This is especially useful for join, grouping, and aggregations in later sections.

Multiple search conditions

Search conditions is briefly mentioned previously. In select query, you can also use where() to specify the search conditions:

import * as lf from './node_modules/lovefield-ts/dist/es6/lf';

// List all to-do items.
const res = await db
    .select()
    .from(item)
    .where(item.col('done').eq(false))
    .exec();

// List all to-do items due today.
const start = new Date().setHours(0, 0, 0, 0);
const end = new Date().setHours(23, 59, 59, 999);
const res = await db
    .select()
    .from(item)
    .where(lf.op.and(
      item.col('done').eq(false),
      item.col('deadline').between(start, end)
    ))
    .exec();

There can be only one where() call in query, and thus to combine search conditions, you'll need one of the combining predicates to help:

Function Example SQL equivalent
op.and op.and(sc1, sc2) sc1 AND sc2
op.and(sc1, sc2, sc3) sc1 AND sc2 AND sc3
op.or op.or(sc1, sc2) sc1 OR sc2
op.or(sc1, sc2, sc3) sc1 OR sc2 OR sc3
op.not op.not(sc) NOT sc

Note: op.and() and op.or() accept two or more arguments. op.not() accepts only one argument.

Sorting

Lovefield provides sorting for select() query results. For example:

// Return TODO items sorted by their due date.
const res = await db
    .select()
    .from(item)
    .where(item.col('done').eq(false))
    .orderBy(item.col('deadline'))  // default to ascending, i.e. smaller first.
    .exec();

It's also possible to do advanced sorting, for example:

import * as lf from './node_modules/lovefield-ts/dist/es6/lf';

// Return TODO items sort by their due date first,
// then reverse alphabetical order of description.
const res = await db
    .select()
    .from(item)
    .where(item.col('done').eq(false))
    // specifically called out sorting orders for better code maintainability.
    .orderBy(item.col('deadline'), lf.Order.ASC)
    .orderBy(item.col('description'), lf.Order.DESC)
    .exec();

There can be multiple orderBy() calls within a select() query, and the final results will be sorted by the order of their appearance accordingly. One can specify the sorting order by using the constants provided:

Constant Meaning
Order.ASC ascending order, meaning smaller first, e.g. 1-9 A-Z
Order.DESC descending order, meaning larger first, e.g. 9-1, Z-A

Paging

Paging of select() results are achieved by combining the limit() and skip() functions together:

const res = db
    .select()
    .from(item)
    .where(item.col('done').eq(false))
    .limit(10)  // return at most 10 items
    .skip(20)  // starting from the 21th elements if possible
    .exec();

This is very similar to SQL's LIMIT and SKIP syntax. Users to be aware of the transactions in play:

async function getPage(p) {
  return db
      .select()
      .from(item)
      .orderBy(item.col('deadline'))
      .limit(10)
      .skip(p * 10)
      .exec();
}

async function addData(r) {
  return db.insertOrReplace().into(item).values([r]).exec();
}

const res1 = await getPage(0);
await addData(r);
const res2 = await getPage(0);  // res2 may not be the same as res1!

If you changed the data in between select() calls, the results are picked at the snapshot of the time that select() call is invoked. If you want to keep a consistent snapshot, you'll need to use explicit transactions that will be detailed in Lovefield 102.

Aggregation

A lot of time we are only interested in the aggregated results of target rows, for example, number of rows, or a maximum value of a column in the target rows. Lovefield provides the following aggregation functions for you to use:

Function SQL Equivalent What is it?
fn.avg(order.col('amount')) AVG(order.amount) Arithmetic mean of all values in this column (integer and number types only)
fn.count(order.col('id')) COUNT(order.id) Number of rows matching this column (usually used in join query)
fn.count() COUNT(*) Number of rows of the associated table
fn.distinct(order.col('type')) DISTINCT order.type All unique values of this column
fn.geomean(order.col('amount')) None Geomean of all values in this column (integer and number types only)
fn.max(order.col('amount')) MAX order.amount Maximum value of this column, applicable to integer, number, string, and datetime
fn.min(order.col('amount')) MIN order.amount Minimum value of this column, applicable to integer, number, string, and datetime
fn.stddev(order.col('amount')) STDDEV order.amount Standard deviation of this column (integer and number types only)
fn.sum(order.col('amount')) SUM order.amount Summation of all values in this column (integer and number types only)

Here's a quick example showing its usage:

import * as lf from './node_modules/lovefield-ts/dist/es6/lf';

const order = db.getSchema().table('order');
const amt = order.col('amount');
const res = await db
    .select(
      lf.fn.count().as('# of orders'),
      lf.fn.avg(amt).as('average amount')
    )
    .from(order)
    .exec();

// res will be returned as an array with only one element like this:
// {
//   "# of orders": 345,
//   "average amount": 12345.67
// }

Grouping

With aggregators, sometimes we need separate groups to make the number even more meaningful. For example, we usually wanted to know number of orders and average amount per day or per customer. In this case, we can use grouping:

import * as lf from './node_modules/lovefield-ts/dist/es6/lf';

const order = db.getSchema().table('order');
const amt = order.col('amount');
const res = await db
    .select(
      order.col('customerId').as('cid'),
      lf.fn.count().as('cnt'),
      lf.fn.avg(amt).as('avg')
    )
    .from(order)
    .groupBy(order.col('customerId'))
    .orderBy(order.col('customerId'))
    .exec();

// res will be an array like this
// [
//   {cid: 1, cnt: 2, avg: 300.25},
//   {cid: 2, cnt: 4, avg: 172.25}
// ]

There are two important limitations of groupBy() that you need to keep in mind:

  • You can only groupBy() columns of type integer, number, string, datetime, and boolean (though groupBy boolean is NOT suggested)
  • You can only project the groupBy() column alongside the aggregated columns. Any other columns will cause Lovefield to throw an exception. For example:
import * as lf from './node_modules/lovefield-ts/dist/es6/lf';

// This will throw an error.
const res = await db
    .select(
      order.col('customerId').as('cid'),
      order.col('clerkId').as('agent'),  // ERROR!
      lf.fn.count().as('cnt'),
      lf.fn.avg(amt).as('avg')
    )
    .from(order)
    .groupBy(order.col('customerId'))

Parameterized query

Parameterized query are very common for RDBMS programming, and Lovefield supports it. The concept is to bind some variables inside query at runtime. For example:

import * as lf from './node_modules/lovefield-ts/dist/es6/lf';

// SQL equivalent:
// SELECT * FROM order where customerId = ?0 AND date BETWEEN ?1 AND ?2
//
// There is a runtime cost to create a query object and its execution plans.
const query = db
    .select()
    .from(order)
    .where(lf.op.and(
      order.col('customerId').eq(lf.bind(0)),
      order.col('date').between(lf.bind(1), lf.bind(2))
    ))
    .limit(lf.bind(3))
    .skip(lf.bind(4));

// We can use the same query object binding to different variables to avoid
// recreating execution plan. This is especially helpful to execute queries
// inside loops.
const res1 = await query
    .bind([
      /* 0 */ customer1,
      /* 1 */ dateStart1,
      /* 2 */ dateStart2,
      /* 3 */ 10,
      /* 4 */ 0])
    .exec();

The bind() function can also bind to a whole array, for example:

import * as lf from './node_modules/lovefield-ts/dist/es6/lf';

// The values expects an array.
const query = db.insert.into(items).values(lf.bind(0));
query.bind([[row]]).exec();  // Therefore we gave it an array as element 0.

So where exactly can I use bind()? The table below summarize it for you.

Query type Parameterizable places
all search conditions
select() limit(), skip()
insert(), insertOrReplace() values()
update() set()

We have covered most of the fundamentals of using Lovefield so far. Lovefield actually offers more features than just these, which we will reveal them in the next session: Lovefield 102.