Skip to content

Latest commit

 

History

History
518 lines (425 loc) · 18 KB

lf102.md

File metadata and controls

518 lines (425 loc) · 18 KB

Lovefield 102: Advanced Usage

Join select

In SQL, there are four different joins defined: inner join, left outer join, right outer join, and full outer join. In Lovefield, only inner join and left outer join are defined.

Again, it's not my intention to lecture the relational database course. If you have questions regarding the differences and details of joins, please refer to available online class materials like this one.

Inner joins are used to find intersection of two data sets, for example, "give me the names of customer in orders table, you can find the customer name from the customer table". Lovefield supports two different syntaxes of inner join:

// The following examples wanted to find all orders from a customer.

const order = db.getSchema().table('order');
const customer = db.getSchema().table('customer');

// Explicit inner join
//
// SQL equivalent:
// SELECT customer.name, order.date, order.amount
//   FROM order
//   INNER JOIN customer
//   ON order.customerId = customer.id
//   WHERE customerId = 1
//   ORDER BY date
const res = await db
    .select(customer.col('name'), order.col('date'), order.col('amount'))
    .from(order)
    .innerJoin(customer, order.col('customerId').eq(customer.col('id')))
    .where(order.col('customerId').eq(1))
    .orderBy(order.col('date'))
    .exec();

// Implicit inner join
//
// SQL equivalent:
// SELECT c.name, o.date, o.amount
//   FROM customer c, order o
//   WHERE c.id = o.customerId AND o.customerId = 1
//   ORDER BY o.date
const o = order;
const c = customer;
const res2 = await db
    .select(c.col('name'), o.col('date'), o.col('amount'))
    .from(c, o)
    .where(lf.op.and(
      c.col('id').eq(o.col('customerId')),
      o.col('customerId').eq(1)
    ))
    .orderBy(o.col('date'))
    .exec();

// In these cases Lovefield returns rows like this:
// [
//   {
//     Customer: {name: "John Smith"},
//     Order: {date: 2020-01-22T00:00:00.000Z, amount: 89.99}
//   },
//   {
//     Customer: {name: "John Smith"},
//     Order: {date: 2020-01-22T00:00:00.000Z, amount: 9.99}
//   }
// ]

By default the join query returns data in their corresponding hierarchy. To flatten that, you will need to use the as() function:

const res2 = await db
    .select(
        c.col('name').as('name'),
        o.col('date').as('date'),
        o.col('amount').as('amount')
    )
    .from(c, o)
    .where(lf.op.and(
      c.col('id').eq(o.col('customerId')),
      o.col('customerId').eq(1)
    ))
    .orderBy(o.col('date'))
    .exec();

// Results now become something like this:
// [
//   {name: "John Smith", date: 2020-01-22T00:00:00.000Z, amount: 89.99},
//   {name: "John Smith", date: 2020-01-22T00:00:00.000Z, amount: 9.99}
// ]

Sometimes we want to join data from within the same table. For example, find all orders that is sold by and to the same person. In this case, we can do self inner join:

const o1 = db.getSchema().table('Order').as('o1');  // Must have as() function.
const o2 = db.getSchema().table('Order').as('o2');
const c = db.getSchema().table('Customer');

// Find orders sold by and to the same person.
const res = await db
    .select(
      c.col('name').as('name'),
      o1.col('date').as('date'),
      o1.col('amount').as('amount')
    )
    .from(c, o1, o2)
    .where(lf.op.and(
      o1.col('customerId').eq(o2.col('clerkId')),
      o1.col('customerId').eq(c.col('id'))
    ))
    .orderBy(o1.col('date'))
    .exec();

In some cases we will need to use left outer join. For example, when we want to list all transactions from customers ordered by customer names. There will be customers without transactions, and hence a left outer join will help show that:

// Left outer join
//
// SQL equivalent:
// SELECT c.name, o.amount, o.date
//   FROM customer c
//   LEFT OUTER JOIN order o ON c.id = o.customerId
//   ORDER BY c.name
const o = db.getSchema().table('Order');
const c = db.getSchema().table('Customer');
return await db
    .select(
      c.col('name').as('name'),
      o.col('amount').as('amount'),
      o.col('date').as('date')
    )
    .from(c)
    .leftOuterJoin(o, c.col('id').eq(o.col('customerId')))
    .orderBy(c.col('name'))
    .exec();

Observers (read-only triggers)

Lovefield supports data observation, which means that when a select() query's associated tables have changed, your handler will be called. This is designed to handle a common scenario: data model is updated in the background via AJAX, and we need a way to update foreground components when that happened.

const o = db.getSchema().table('Order');
const query = db
    .select()
    .from(o)
    .where(o.col('date').between(lf.bind(0), lf.bind(1)));

const range: [Date, Date] = [];
const handler = changes => {
  // This handler is called every time when data inside order has changed.
  query
      .bind(range)
      .exec()
      .then(res => updateList(res));
};

// Start observing the change
db.observe(query, handler);

$('datePicker1').change(ev => {
  range = ev.target.value;
  handler();
});

$('back').click(() => {
  // Leaving this page, stop the observe.
  db.unobserve(query, handler);
});

In the example above, we assume there's a date-picker web component whose value is an array with two elements, start and end date. When order table is updated by any background AJAX, the observer will be triggered to update current UI. When you no longer needs the observer, do remember to unobserve().

Transactions

A transaction is an atomic unit of execution: all queries inside a transaction either succeed together (called commit), or fail together (called rollback). If the transaction failed, data remained in their original state untouched.

Transactions can be implicit (triggered by calling exec() function in Lovefield), or explicit. You control creation of an explicit transaction, and also when to start the transaction and when to commit or rollback the transaction. This is the real benefit of using Lovefield over IndexedDB.

Let's start with a quick example:

async function setVacationsForNewEmployees(startDate) {
  const e = db.getSchema().table('Employee');
  const v = db.getSchema().table('Vacations');

  // All explicit transactions in Lovefield are read/write, which means you can
  // alter data and/or retrieve data within the same transaction.
  // createTransaction() returns a transaction object.
  const tx = db.createTransaction();

  // Secure the "scope" of queries, which means to lock these tables so that
  // there will not be surprises when our queries were run.
  await tx.begin([e, v]);

  const q1 =
      db.select(e.col('id')).from(e).where(e.col('hireDate').gt(startDate));

  // Attach will actually run the query in memory and get back the results.
  const res1 = await tx.attach(q1);
  const ids = res1.map(row => row['id']);

  // Update all employees found from q1 with 15 days of vacation.
  // Because we locked the table, we are sure that at this moment what we
  // selected from q1 will not be changed when we need to run q2.
  const q2 = db.update(v).set(v.col('days'), 15).where(v.col('empId').in(ids));
  await tx.attach(q2);

  // Remember, when you use tx.begin(), you MUST call tx.commit(). If tx went
  // out of scope, the DEFAULT IS ROLLBACK!
  return tx.commit();
}

Besides attach(), you can also call exec(). There are three main differences between attach() and exec():

  • exec() automatically calculate scope and automatically commit when all queries are executed.
  • exec() don't let you rollback it unless there's an error happening during its execution.
  • Queries passed into exec() cannot depend on each other, i.e. later query can not use results from previous queries.

The following code snippet explains the differences:

// Assume we have two queries defined: query1 and query2.
const tx1 = db.createTransaction();

// Run query1 first, then run query2. The transaction auto-commits when both
// queries are done.
tx1.exec([query1, query2]);

// Now let's do the same thing using begin-attach-commit.
const tx2 = db.createTransaction();
// You are allowed to lock the scope wider than needed. Useful in some cases.
await tx.begin([table1, table2]);
const res1 = await tx.attach(query1);
// It's possible for query2 to use query1's results.
const res2 = await tx.attach(query2);

return tx.commit();  // You can choose to do tx.rollback() when you wish.

To dig more into the concurrency model, let's see another example:

// Assume we already have queries defined: query1 to query6.
const tx1 = db.createTransaction();

//
// Execute in order: query1, query2.
tx1.exec([query1, query2]);

// Run query3 first, then run query4.
// Because they call their own exec(), they run in their individual implicit
// transaction contexts.
query3.exec().then(() => query4.exec());

// Execute in parallel (syntactically): tx1, query3, query5, query6.
query5.exec();
query6.exec();

const tx2 = db.createTransaction();
tx2.exec([query7, query8]);
// Execute in parallel (syntactically): tx1, query3, query5, query6, tx2.

So technically transactions can run in parallel, and it is mentioned that Lovefield does table-level locking. How would that affect parallel execution? Let's use some more examples to explain:

const c = db.getSchema().table('Customer');
const o = db.getSchema().table('Order');

// Select query implicit transaction, there will be a shared lock placed on
// table Customer.
const res = db.select().from(c).exec();

// Update query implicit transaction, there will be a write lock placed on
// table Customer. However, this will not affect previous query, and previous
// query will not see any of the changes made here.
db.update(c)
  .set(c.col('name'), 'Bar')
  .where(c.col('id').eq(1))
  .exec();

const tx = db.createTransaction();

// Explicit transaction attempting to lock both Customer table and Order table.
// In this case it will wait until the Customer table lock is freed by previous
// query.
tx.begin([c, o]);

// Another implicit transaction attempting to write Order. This will not
// run until previous tx committed/rolled back and release the table locks.
db.insertOrReplace().into(o).values([someOrder]).execute();

Therefore Lovefield's locking rules are very simple:

  • Implicit transactions running select() queries create a shared lock on the associated tables. This lock only ensure proper snapshot, but will not block other writing queries.
  • All other transaction types create a reserved lock on the associated tables. Any new transactions attempting to modify these tables will have to wait until the locks are lifted (i.e. transactions been committed or rolled back).

Constraints

Lovefield supports quite a few constraints features in RDBMS:

  • Primary key: uniquely identify each row in a database table.
  • Unique: ensure all values in a column (or column tuples) are different.
  • Not null: make sure values in a single column are not null.
  • Foreign key: ensure values in a single column are existing in another table.

Constraints are done via indices, which are data structures that can help to quickly find data (and therefore quickly identify if a constraint were violated or not).

Primary key and auto increment

Primary key is meant to uniquely identify each row in a database table. As a result, it provides auto-increment features. One can define an integer column named id and set it to auto-increment. The database will automatically generate a value for that column and ensure later rows have a higher number than previous rows.

schemaBuilder.createTable('OnlineTransactions')
    .addColumn('id', lf.Type.INTEGER)
    .addColumn('orderId', lf.Type.INTEGER)
    .addPrimaryKey(['id'], /* auto-increment */ true)
    .addForeignKey('fk_OrderId', {
      action: lf.ConstraintAction.RESTRICT,
      local: 'orderId',
      ref: 'Order.id'
    });

const db = await schemaBuilder.connect();
const ot = db.getSchema().table('OnlineTransactions');

const row = ot.createRow({orderId: 2});  // Do not assign |id| for a new row.
await db.insert().into(ot).values([row]).exec();  // |id| is auto-assigned.

Also, primary keys can be formed by tuples. For example,

builder.createTable('AssetTag')
    ...
    // column pair (|tagId|, |assetId|) forms the primary key.
    .addPrimaryKey(['tagId', 'assetId']);

You can also control the default sorting order of primary keys:

builder.createTable('Asset')
    ...
    .addPrimaryKey([{  // Use full index spec to construct primary key.
      name: 'id',
      order: lf.Order.DESC,  // The only way to override default lf.Order.ASC.
      autoIncrement: false
    }]);

builder.createTable('AssetTag')
    ...
    // column pair (|tagId|, |assetId|) forms the primary key.
    .addPrimaryKey([{
      name: 'tagId'   // Keep |tagId| sorting by default ASC
    }, {
      name: 'assetId',
      order: lf.Order.DESC  // Make |assetId| sort DESC
    }]);

Unique

Unique keys only guarantee values in a column, or combination of columns, are unique across the table. It does not allow specifying sorting order or auto-increment like primary key. Unique key does involve creating an index on the target column(s) so that its enforcement will be efficient.

Not null

Lovefield takes a different approach compared to other RDBMS in this front. In Lovefield, only ARRAY_BUFFER and OBJECT types are by default nullable, and everything else is not nullable by default. This means you can only add columns to be nullable (by using addNullable() in the schema builder) instead. If you were porting legacy RDBMS programs over, please note this subtle difference.

Foreign keys

Foreign keys are designed to enforce referential integrity. There are pros and cons of enforcing that in a client-side database and thus please use it with caution (i.e. know why you're using it and trade-off with performance hit).

The following statements must be true for all foreign key constraints:

  • A foreign key constraint is placed between exactly two existing columns, the parent column (a.k.a. referenced) and the child column (a.k.a. referencing).
  • The parent and child columns may belong to the same or to different tables.
  • The parent and child columns must have the same indexable lf.Type.
  • The parent column must be unique: either a primary key, or marked as unique.
  • A child column of a constraint can not be declared as the parent column of another foreign key constraint (i.e. no foreign key constraint chains).
  • Cyclic references are disallowed.

If any of the statements above is violated an Exception.SYNTAX_ERROR will be thrown.

A foreign key can be either ConstraintAction.RESTRICT or ConstraintAction.CASCADE. If no action mode is specified in the schema definition, the default value is ConstraintAction.RESTRICT. The differences are:

  • RESTRICT
    In this mode any constraint violation results in cancelling the operation that violated the constraint. Parent table is the referenced table, and child table is the referencing table.

    • insert()
      • Can't insert a row into the child table if there is no related row in the parent table.
    • update() or insertOrReplace()
      • Can't update the parent table's referenced column if the row being modified has related rows in the child table.
      • Can't update the child table's referencing column if it doesn't have a related row in the parent table.
    • delete()
      • Can't delete a row from the parent table if it has related rows in the child table.
  • CASCADE In this mode a constraint violation results in modifying related tables as necessary to maintain data integrity.

    • insert() or insertOrReplace()
      • Same as in the RESTRICT mode. Note that for insertOrReplace() queries, there is no cascading. If cascading is desired, an update() query should be used instead.
    • update():
      • When a column in the parent table is updated, all referencing columns should be automatically updated with the same value.
    • delete()
      • When a row in the parent is deleted, all related rows in the child table, should be automatically deleted.

There are also timing differences for foreign keys: ConstraintTiming.DEFERRABLE or ConstraintTiming.IMMEDIATE. All CASCADE foreign keys are IMMEDIATE. Therefore the timing differences are only meaningful for RESTRICT foreign keys.

  • DEFERRABLE A deferrable constraint is enforced right before a transaction is committed. The constraint can be violated by individual statements during the lifetime of the enclosing transaction, without any errors being thrown.

  • IMMEDIATE An immediate constraint is enforced during execution of each individual statement.

Index structure and persistent index

Currently Lovefield uses two different indices internally: a hash index for its internal indexing, and B+ tree for all indices that specified in schema builder. Due to lack of funding, this is not likely to change in the future (unless we received contributions on other data structures, such as k-d tree, grid files, or other index structure that can improve specific retrieval).

Since Lovefield loads all its data from data store to memory during initialization, it also constructs the B+ tree indices needed on-the-fly. This can make even bigger impact for database initialization, and therefore a persistentIndex() option is provided:

schemaBuilder.createTable('SomeTable')
    ...
    .addPrimaryKey(...)
    .addIndex(..,)
    .persistentIndex(true);  // Save indices to disk, do not build on-the-fly.

Give that option in the table builder will force Lovefield to save a copy of index structure onto data store when it needs to. As a result, this will slightly impact every write query's performance (because index changes also need to be flushed to disk), in exchange of better initial loading time. Users are supposed to conduct their own experiments to check if persistentIndex meet their own performance requirements. As a library author, I cannot tell if this option is good for you because I cannot predict index structures generated in the runtime.