Frequently Asked Questions
The standard FAQ page. As you submit 'em, this page keeps getting bigger...
- Are there any recommended "best practices"?
- Can I use KVO on objects?
- Why SQLite?
- Should I store images in YapDatabase?
- How do I deal with asyncRegisterExtension?
- Is registering a YapDatabaseView expensive?
Are there any recommended "best practices"?
There are a few best practices you should follow in order to prevent "shooting yourself in the foot". These best practices follow naturally from a basic understanding of how YapDatabase works. Thus, I'll highlight the basics first, and then list the associated best practices.
For more detail, see the Performance Primer article.
- Every YapDatabaseConnection only supports a single transaction at a time.
Essentially, each YapDatabaseConnection has an internal serial queue. All transactions on the connection must go through the connection's serial queue. This includes both read-only transactions and read-write transactions. It also includes async transactions.
This means that connections are thread-safe. That is, you can safely use a single connection in multiple threads. But do not mistake thread-safety for concurrency. Thread-safe != concurrency.
A read-write transaction on connectionA will block read-only transactions on connectionA until the read-write transaction completes. (Even if it's an asyncReadWrite transaction.)
You can get a similar effect if you have a really really expensive read-only transaction. For example, you loop over every object in the database and perform some expensive complex math for each one. If you do this expensive read-only transaction on connectionA in a background thread, you're still blocking connectionA for other threads.
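To make the "thread-safe but not concurrent" point concrete, here is a minimal sketch that models a connection as nothing more than a serial dispatch queue. SerialConnection and its two methods are hypothetical stand-ins invented for this illustration. They are not YapDatabase classes, and the real implementation surely differs. The only thing the sketch shows is the queueing behavior described above: a read submitted to a connection has to wait for an async read-write that was submitted to the same connection before it.

```swift
import Dispatch

// Hypothetical stand-in for a connection (not a YapDatabase class).
// Every transaction is funneled through one private serial queue.
final class SerialConnection {
    private let queue = DispatchQueue(label: "sketch.connection")

    // Synchronous "read-only transaction": the caller waits for its turn.
    func read<T>(_ block: () -> T) -> T {
        return queue.sync(execute: block)
    }

    // Asynchronous "read-write transaction": returns immediately,
    // but still occupies the same serial queue while it runs.
    func asyncReadWrite(_ block: @escaping () -> Void) {
        queue.async(execute: block)
    }
}

let connectionA = SerialConnection()
let writeMayFinish = DispatchSemaphore(value: 0)
var events: [String] = []   // only touched from inside connectionA's queue

// An "expensive" async write is submitted first and held open.
connectionA.asyncReadWrite {
    events.append("write started")
    writeMayFinish.wait()
    events.append("write finished")
}

// Some other thread lets the write finish a moment later.
DispatchQueue.global().asyncAfter(deadline: .now() + 0.1) {
    writeMayFinish.signal()
}

// This read is queued behind the write, so the calling thread is blocked
// until the write has completed.
connectionA.read {
    events.append("read ran")
}

print(events)
```

The read never overlaps the write, even though the write was "async". The calling thread simply waits its turn on the connection's queue.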
- Concurrency comes through using multiple connections.
Concurrency in YapDatabase is incredibly simple to achieve. You just create and use multiple connections. And creating a new connection is a one-liner.
This brings us to Best Practice #1 :
- Be mindful of read-write transactions & expensive read-only transactions
- Perform such transactions on dedicated connections
And speaking of read-write transactions...
- A database can only perform a single read-write transaction at a time.
This is an inherent limitation of sqlite. And it means that even if you have multiple YapDatabaseConnections, all your readWrite transactions will execute in a serial fashion.
Recall that each YapDatabaseConnection has a serial queue, and that all transactions on that connection go through the connection's serial queue. In a similar fashion, YapDatabase has a serial queue for read-write transactions, and all read-write transactions (regardless of connection) must go through this "global" serial queue.
Which brings us to Best Practice #2 :
- Never use a read-write transaction if a read-only transaction will do.
A read-only transaction is more lightweight than a read-write transaction. Plus read-only transactions increase concurrency.
The great thing about a read-only transaction on connectionA is that it can execute in parallel with a read-write transaction on connectionB.
And this means that you can easily avoid blocking your main thread.
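The two-level queueing described above can be sketched with plain dispatch queues. SketchDatabase and SketchConnection are hypothetical names made up for this illustration, and the code is only a model of the behavior, not YapDatabase's actual implementation. Each connection gets its own serial queue, and every write additionally passes through one queue shared by the whole database. The semaphores are just there to hold the write open long enough to prove that the read ran while it was still in progress.

```swift
import Dispatch

// Hypothetical model (not YapDatabase classes).
final class SketchDatabase {
    // One shared serial queue: only one write at a time, database-wide.
    let writeQueue = DispatchQueue(label: "sketch.database.write")

    func newConnection() -> SketchConnection {
        return SketchConnection(database: self)
    }
}

final class SketchConnection {
    private let database: SketchDatabase
    // One serial queue per connection: one transaction at a time.
    private let queue = DispatchQueue(label: "sketch.connection")

    init(database: SketchDatabase) {
        self.database = database
    }

    func read<T>(_ block: () -> T) -> T {
        return queue.sync(execute: block)
    }

    func asyncReadWrite(_ block: @escaping () -> Void) {
        queue.async {
            self.database.writeQueue.sync(execute: block)
        }
    }
}

let database = SketchDatabase()
let uiConnection = database.newConnection()
let backgroundConnection = database.newConnection()

let writeStarted = DispatchSemaphore(value: 0)
let writeMayFinish = DispatchSemaphore(value: 0)
let writeFinished = DispatchSemaphore(value: 0)

// A long-running write on the background connection.
backgroundConnection.asyncReadWrite {
    writeStarted.signal()
    writeMayFinish.wait()
    writeFinished.signal()
}

writeStarted.wait()
// The write is still in progress, yet this read on a different
// connection returns right away.
let valueReadDuringWrite = uiConnection.read { 42 }

writeMayFinish.signal()
writeFinished.wait()
print(valueReadDuringWrite)
```

Had the read been issued on backgroundConnection instead, it would have waited behind the write, and in this sketch it would deadlock because the write is only released after the read returns.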
Which brings us to Best Practice #3 :
- Use a dedicated connection for the main thread
- Do not use this connection anywhere but on your main thread
- Do not execute any readWrite transactions with this connection
- Only execute read-only transactions with this connection
- Create separate YapDatabaseConnection(s) for background operations
- Use these separate connections to do your readWrite transactions
The rationale behind this last "best practice" should be understandable. You don't want to block the UI thread. So you have a dedicated read-only connection for it. Which means that it only executes read-only transactions. Which means it won't ever block due to "expensive" read-write transactions.
Now having a read-only connection means you're going to need a way to notify the main thread when changes have occurred that require updating UI components (such as a UITableView).
Which brings us to Best Practice #4 :
- Configure your dedicated main-thread read-only connection to use LongLivedReadTransactions
- And then use YapDatabaseModifiedNotification to handle updating your UI
Follow these best practices and you'll enjoy just how simple and powerful YapDatabase can be.
Can I use KVO on objects?
Key-Value Observing in a database system is dependent upon several things. First, the objects that you fetch from the database must be mutable. Second, the mutable objects would need to be automatically updated by something. As in, changes to objects made on other threads/connections must get merged into the objects you already have in your hand (the objects you've already fetched from the database) on your thread. In order for this to happen:
- objects must be tied to a specific connection (so the connection knows what objects to update)
- objects must be mutable (so the connection can update them)
In order to satisfy these conditions we wind up going down the same road that has made Core Data such a pain. That is, our objects become non-thread-safe, and tied to a specific connection. Furthermore, it becomes mandatory (not optional) to use KVO as our objects might get changed underneath us at any time.
In addition to this, all objects that get stored in the database would need to support some kind of merge operation. At first this might seem feasible. But the feasibility comes into question when we realize YapDatabase can store basic objects such as Strings, Numbers, etc. And this is why Core Data requires "object wrappers" for everything. Even if you just want to store a simple string in the database, it has to be wrapped in some NSManagedObject wrapper.
The fundamental architecture and philosophy behind YapDatabase is radically different from Core Data.
YapDatabase:
- collection/key/value oriented with extensions
- Connections are thread safe
- Fetched objects are "bare" objects
- Straightforward concurrency
Core Data
- Object & relationship oriented
- NSManagedObjectContext is not thread safe
- Fetched objects are subclasses of NSManagedObject wrapper class
- Fetched objects are tied to a specific context and are thus not thread-safe
- Each context monitors and manages every fetched object
- Concurrency requires manual merges and conflict resolution
Long story short, pure KVO observing is not supported by YapDatabase. Doing so would require us to make concessions that would defeat the original purpose of the project. However, YapDatabase does support a method of observing changes to specific keys / objects.
You can register for the YapDatabaseModifiedNotification. When you receive notification(s), simply pass the notification object to the connection to see if any particular keys (which you may be "observing") have changed.
See the above linked wiki page for some code samples.
From YapDatabaseConnection.h :
// Query for any change to a collection
func hasChange(forCollection collection: String, in notifications: [Notification]) -> Bool
// Query for a change to a particular key/collection tuple
func hasChange(forKey key: String, inCollection collection: String, in notifications: [Notification]) -> Bool
// Query for a change to a particular set of keys in a collection
func hasChange(forAnyKeys keys: Set<String>, inCollection collection: String, in notifications: [Notification]) -> Bool
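Here is a rough sketch of how one of these queries might be used to decide whether a screen needs to refetch the object it displays. To keep it compilable on its own, the sketch declares a small protocol (ChangeQuerying, a name invented here) that mirrors the hasChange(forKey:inCollection:in:) signature quoted above. In an app, the database connection itself would play that role. The helper function, the "users" collection, the keys, and the stub are all hypothetical.

```swift
import Foundation

// Invented for this sketch: mirrors one of the signatures quoted above.
protocol ChangeQuerying {
    func hasChange(forKey key: String, inCollection collection: String, in notifications: [Notification]) -> Bool
}

// Hypothetical helper: should the profile screen refetch the user it shows?
func needsRefresh(userKey: String, connection: ChangeQuerying, notifications: [Notification]) -> Bool {
    // No notifications means nothing was committed, so nothing changed.
    guard !notifications.isEmpty else { return false }
    return connection.hasChange(forKey: userKey, inCollection: "users", in: notifications)
}

// Stub used only to exercise the helper. It pretends that a fixed set of
// keys in the "users" collection was modified.
struct StubConnection: ChangeQuerying {
    let changedKeys: Set<String>

    func hasChange(forKey key: String, inCollection collection: String, in notifications: [Notification]) -> Bool {
        return collection == "users" && changedKeys.contains(key)
    }
}

let note = Notification(name: Notification.Name("ExampleModified"))
let stub = StubConnection(changedKeys: ["user-42"])

let refreshShownUser = needsRefresh(userKey: "user-42", connection: stub, notifications: [note])
let refreshOtherUser = needsRefresh(userKey: "user-7", connection: stub, notifications: [note])
print(refreshShownUser, refreshOtherUser)
```

The point of the pattern is that you ask about the specific keys you care about, rather than having objects mutate underneath you.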
Why SQLite?
It's a valid question worth pondering. For a key/value store there are many other possible underlying databases. So it's worth reviewing the reasons why sqlite was chosen as the underlying datastore.
Reason 1: Purpose
Most often this "question" is posed with teeth. Something along the lines of:
I read that database X is 5% faster than sqlite according to tailored benchmarks I read on the website for database X. This proves that sqlite is a dead technology. Therefore, you suck! And YapDatabase sucks!
It's important to understand why there are so many databases in the world. It's because there are so many different scenarios for using a database. There are client applications. And server applications. And server applications that run in parallel on a cluster of thousands of servers. And we can break down these domains much further.
So what is YapDatabase great at that other databases are not-so-great at?
YapDatabase is great for making apps. Client side apps. For macOS & iOS. It's designed to help you deal with tableViews & collectionViews. It knows the main thread is for UI, and helps you avoid blocking it. It has straight-forward concurrency, and built-in caching. It can give you a long-lived read-only connection for the main-thread, and pin it to a certain commit. It allows you to move that connection to newer commits in an atomic fashion. And when you do, it will tell you exactly what changed, and how that affects your user interface.
Simply put, YapDatabase was designed with modern client side apps in mind. Concurrency was not an afterthought - it was baked into the original design. Cloud sync was not an afterthought - it requires concurrency and extensibility. The ability to drive the UI was not an afterthought - it's why YapDatabase comes with views & full text search & secondary indexes.
And that brings us to the next reason.
Reason 2: Extensibility
Perhaps we could swap out sqlite for some other key/value database. And perhaps you'd get a 4% performance improvement. But would you rather have that, or an extension to YapDatabase that provides Full Text Search? Or an extension that provides support for your favorite cloud sync service? Or an extension for persistent views? Or secondary indexes? Or an extension for R*Trees that provides efficient geospatial queries? Or the ability to write your own extension that has access to the full power of SQLite underneath it?
A minor performance improvement is a theoretical question of little importance most of the time. If you're already using a key/value database then your bottleneck isn't likely something that can be solved with another key/value database that's barely faster. You need something else entirely. You need an efficient way to sort your data for presentation in a table view. Or a secondary index on a particular property to speed up an important query. Or full text search. Etc.
And this is why I think SQLite is a great fit. YapDatabase provides simplicity up front. But the power of YapDatabase is not in the key/value store, but in the extensions it provides atop this base layer. And these extensions have access to the power and flexibility of sqlite under the hood.
For more information on extensions, see the Extensions wiki article.
Reason 3: Dependability
SQLite has been around for over a decade. It's used almost everywhere. It's even what Apple uses under the hood of Core Data. You don't get to this level of ubiquity without being dependable.
Reason 4: Availability
SQLite is on all versions of Mac OS X and iOS (at least for as far back as I can remember). So there are no third party C++ libraries to download and compile. There are no dependency errors, or makefiles to tweak. It's been part of the system for a long long time.
Reason 5: It's Free
And that's a tough price to beat.
Long story short, sqlite was chosen because it's the best tool for the job. And the job is making apps. And everything that goes along with it.
Should I store images in YapDatabase?
First, to answer an alternative question: "Can I store images in YapDatabase?"
And the answer is Yes, you certainly can. Both UIImage & NSImage support NSCoding. Thus you can store images:
- directly
transaction.setObject(myUIImage, forKey: key, inCollection: collection)
- within another object
transaction.setObject(objectWithImageProperty, forKey: key, inCollection: collection)
- or by converting to JPEG/PNG, and then storing the raw Data
transaction.setObject(jpegData, forKey: key, inCollection: collection)
But just because something can be done doesn't imply you should do it. So let's discuss performance, and alternative options.
First, there is a difference between storing an image by itself, and storing it within another object. For example, say you have a User object. And every user has an associated avatar (which is an image). Should you store the image directly within the user object?
Well, if you do so, then every time you fetch a User object you're also fetching the image. So if you're fetching a User from disk that's normally only 2K worth of bytes, you're now fetching 62K worth of bytes because of the image. Do you want/need the avatar every time you fetch a User object?
On top of this, fetching a User object will now also put the User+Image in the cache (as the image is a property of the user). Meaning that enumerating a bunch of users (for non-avatar purposes) will inflate the size (in bytes) of your cache.
Thus it is generally recommended that, if you're going to store images in the database, you store them separately from their associated object. Continuing with the User example, one could do something like this:
class MyUser: Codable {
    var avatarKey: String?
    // ...
}
And then you can always fetch the avatar if/when you want it:
// avatarKey is optional, so only hit the database when there is a key.
let avatar = user.avatarKey.flatMap { transaction.object(forKey: $0, inCollection: "avatars") as? UIImage }
Further, you can use the YapDatabaseRelationship extension to ensure that the avatar is automatically deleted from the database if you ever delete the associated user.
But is it faster to store my images in the database or directly on disk ?
Let's look at some numbers: https://www.sqlite.org/intern-v-extern-blob.html
On Apple systems, the default page_size is 4096. (And the page_size is configurable via YapDatabaseOptions, with caveats you can read about in the header file.) Which means, according to the chart, it's actually faster to store small images in a sqlite database. The break-even point is somewhere between 50K & 100K (according to the chart).
Important: The referenced test was done on a Linux workstation, using an Ext4 filesystem, with a SATA disk. Do you really think the numbers are going to be the same on an Apple system, using an APFS filesystem, with a flash disk? ... So if you're looking for a "hand-wavy rule of thumb", then saying the performance for "small" images is faster with sqlite is probably fine. But if you're serious about this particular performance optimization, then I'd strongly encourage you to run your own benchmarks on target systems.
So if your images are big, it would be preferable to do something like this:
class MyUser: Codable {
    var avatarURL: URL?
    // ...
}
And again, you can use the YapDatabaseRelationship extension to ensure that the avatar file is automatically deleted from the filesystem if you ever delete the associated user. (Yes, YapDatabaseRelationship supports creating a relationship between an object in the database and a file on disk.)
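Putting the two options together, the decision can be sketched as a small size check. Everything here is illustrative: the ImageStorage enum, the chooseStorage function, the "avatars" collection, and the directory path are hypothetical, and the 100K cutoff is only an assumption taken from the upper end of the chart's break-even range. As noted above, the right number for your app should come from benchmarks on your target devices.

```swift
import Foundation

// Where an image ends up. Hypothetical type, for illustration only.
enum ImageStorage {
    case database(collection: String, key: String)
    case file(URL)
}

// ASSUMPTION: upper end of the 50K-100K break-even range from the chart.
// Replace with a value measured on your own target systems.
let assumedBreakEvenBytes = 100 * 1024

func chooseStorage(forImageData data: Data,
                   key: String,
                   directory: URL,
                   breakEvenBytes: Int = assumedBreakEvenBytes) -> ImageStorage {
    if data.count < breakEvenBytes {
        // Small image: keep it in the database, in its own collection,
        // separate from the object that refers to it.
        return .database(collection: "avatars", key: key)
    }
    // Large image: write it to disk and store only the URL in the object.
    let fileURL = directory.appendingPathComponent(key).appendingPathExtension("jpg")
    return .file(fileURL)
}

let directory = URL(fileURLWithPath: "/tmp/avatars", isDirectory: true)
let small = chooseStorage(forImageData: Data(count: 20 * 1024), key: "u1", directory: directory)
let large = chooseStorage(forImageData: Data(count: 500 * 1024), key: "u2", directory: directory)
print(small, large)
```

The function only decides where the bytes should go. Actually writing them (via setObject or to the file URL) is left to the caller.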
PS - If you're storing URLs in your objects, don't forget to encode them properly. Apple recommends using their "bookmark data" system. If you want something even better, you can just copy the technique used in YapDatabaseRelationshipOptions.defaultFileURLSerializer.
There is one last thing that is possibly worth mentioning. From Apple's docs (for UIImage):
"In low-memory situations, image data may be purged from a UIImage object to free up memory on the system. This purging behavior affects only the image data stored internally by the UIImage object and not the object itself. When you attempt to draw an image whose data has been purged, the image object automatically reloads the data from its original file."
This auto-purging technique will only work for images loaded directly from a file on disk (not from an image in the database). This is because, if you load an image from the database, you're creating an image from data-in-memory. Which forces UIImage to retain its image data in low-memory situations, as it has no direct filesystem path to reload from.
Does this affect you? I'm not entirely sure, and it's a tough question to answer. Perhaps if you load a LOT of images from the database. And your app uses up a lot of memory by displaying many small images at the same time. And your app is deeply nested, where dozens of view controllers may be hidden in something like a navigation stack. Then perhaps, in this situation, it may be beneficial to allow the OS to automatically purge image data from all those images that are hidden in view controllers that are 6+ layers back. Maybe? This one is rather app-specific.
How do I deal with asyncRegisterExtension?
Extensions are generally registered on app launch. And most often they're registered asynchronously using code like this:
database.asyncRegisterExtension(view, withName: "sales") {(ready) in
    if !ready {
        print("Error registering view (sales) !!!")
    }
}
The problem is that sometimes these extensions are required in order to drive the user interface. Using the example above, we may need the "sales" extension to populate our tableView. But now we can get into trouble depending on the timing.
Let's say you have code that sets up your database and looks something like this:
func setupDatabase() {
    let databaseURL = self.databaseURL()
    database = YapDatabase(url: databaseURL)
    // setup extensions
    self.setupNewItemsView(database)
    self.setupOnSaleView(database)
    self.setupSalesRankView(database)
    self.setupSecondaryIndexes(database)
}
Where each setupX method is going to use asyncRegisterExtension. This is equivalent to performing four asyncReadWrite operations, and then just assuming the data has hit the disk by the time your user interface is going to want it. And there are a multitude of reasons why that might not happen. Perhaps the disk is running slow because some operating system task is stressing it. Or perhaps you modified the OnSale view, which means the database has to re-populate that view. Or maybe the UI is simply getting initialized so quickly after calling setupDatabase that it doesn't really stand a chance.
Whatever the case may be, we can now look at our UI code:
uiDatabaseConnection.read {(transaction) in
    // BAD CODE !!!
    let groups = ["fantasy", "sci-fi"]
    mappings = YapDatabaseViewMappings(groups: groups, view: "SalesRank")
    // Our YapDatabaseView hasn't been registered yet.
    // So transaction.ext("SalesRank") will return nil !
    mappings.update(with: transaction) // <= Oops !!! Bug !!!
}
So mappings is going to report zero items in both the 'fantasy' & 'sci-fi' sections. Normally this isn't a big deal. But in a moment you're going to receive & process a YapDatabaseModifiedNotification. This notification is being delivered because the "SalesRank" view is now ready. And then mappings is going to jump from zero items in each group, to X & Y items in the groups. And, importantly, it won't have row insert operations to accompany this jump. (Because the rows weren't inserted. They were already there from a previous app launch. It's just that the extension wasn't ready yet.)
To break the problem down:
- registering an extension generally involves disk IO. So it's recommended you do so via the asyncRegisterExtension method.
- populating your UI may require an extension. But you should never block your UI to wait for an asynchronous background task.
Thus, here is the recommended way to code for this:
func initializeMappings() {
    uiDatabaseConnection.read {(transaction) in
        if let _ = transaction.ext("SalesRank") as? YapDatabaseViewTransaction {
            let groups = ["fantasy", "sci-fi"]
            mappings = YapDatabaseViewMappings(groups: groups, view: "SalesRank")
            mappings?.update(with: transaction)
        } else {
            // View isn't ready yet.
            // Wait for YapDatabaseModifiedNotification.
        }
    }
}
@objc func yapDatabaseModified(notification: Notification) {
    // Jump to the most recent commit.
    let notifications = uiDatabaseConnection.beginLongLivedReadTransaction()
    if mappings == nil {
        // The view wasn't ready earlier, so try again now.
        self.initializeMappings()
        self.tableView.reloadData()
        return
    }
    // Process the notification(s) as usual.
    guard let mappings = mappings,
          let ext = uiDatabaseConnection.ext("SalesRank") as? YapDatabaseViewConnection else {
        return
    }
    // What changed in my tableView?
    let (sectionChanges, rowChanges) = ext.getChanges(forNotifications: notifications, withMappings: mappings)
    // ... the boiler-plate tableView animation code here ...
}
There are a couple other solutions I'd like to point out too. Generally these solutions apply to "other" things that rely on an extension being in place. Like maybe updating the badge count. Or resuming unfinished operations from the last app launch. Stuff like that.
database.asyncRegisterExtension(view, withName: "sales") {(ready) in
    // You already have a completion notification here.
    // So you could use this to signal within your app.
}
// There's also this function:
database.flushExtensionRequests(withCompletionQueue: DispatchQueue.main) {
    // All extensions that were pending before
    // you invoked `flushExtensionRequests()` have completed.
}
Is registering a YapDatabaseView expensive?
The thinking behind this question usually goes something like this:
Since a view has to enumerate every object in the database in order to populate itself, isn't this really expensive? What if I have a trillion rows in my database?
The first thing to keep in mind is a comparison with any other database. For example, say you were directly using sqlite, and you had a database table with 100,000 rows. Then you decided to add an index to this table. What would sqlite (or any database for that matter) have to do? It would obviously have to enumerate every single row in the table in order to create that index.
The good news is that you only have to take this hit once. That is, once the index is created, it gets automatically updated as the table is updated.
YapDatabaseView is similar, but with even more flexibility.
The very first time a YapDatabaseView is created, it will have to enumerate the objects in the database, in order to populate its internal tables. But once this has completed for the first time, the view has written its data to the database, and thus doesn't have to do this every time. That is, on subsequent app launches, the view is immediately ready because its data is already there in the database. (YapDatabaseView writes its information to a separate table within the sqlite database.) And, as you probably already know, YapDatabaseView participates in read-write transactions (automatically, behind the scenes) and automatically updates its table(s) as you insert/update/remove objects in the database.
You'll notice this correlates very closely with an index in any other database. There's an up-front cost to create the thing. And, once created, it can quietly update itself alongside the database.
But YapDatabaseView also has tools to limit the number of objects that it must enumerate. For example, say you have a database with 100,000 objects. But you're trying to create a view that only includes objects from 1 or 2 collections. These collections may represent a subset of the whole database. Say 10% of it. You can configure YapDatabaseView to filter everything but this subset of collections:
let collections = Set(["lions", "tigers", "bears"])
let options = YapDatabaseViewOptions()
options.allowedCollections = YapWhitelistBlacklist(whitelist: collections)
let view = YapDatabaseAutoView(grouping: grouping,
                               sorting: sorting,
                               versionTag: versionTag,
                               options: options)
And now, when YapDatabaseView enumerates the database to populate itself, it will only enumerate those few collections. This drastically reduces the initial overhead!
In addition to this, it will simply ignore inserts/updates/deletes of objects outside this set of allowed collections, which further reduces the overhead of having the View extension plugged into the database system.
PS: If you take a look at YapWhitelistBlacklist, you'll notice you can create these things based on a whitelist, blacklist, or even a block that can dynamically decide if a collection is allowed or not. So these things are pretty flexible.
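To illustrate the three flavors mentioned in the PS, here is a tiny model of a collection filter. CollectionFilter is a hypothetical type written for this sketch. It is not the YapWhitelistBlacklist class and makes no claim about how that class is implemented. The sample rows are made up too. It just shows how a whitelist, a blacklist, or a block can each answer the question "is this collection allowed?", and how that shrinks the set of rows a view would have to look at.

```swift
// Hypothetical model of a collection filter (not YapWhitelistBlacklist itself).
enum CollectionFilter {
    case whitelist(Set<String>)
    case blacklist(Set<String>)
    case block((String) -> Bool)

    func isAllowed(_ collection: String) -> Bool {
        switch self {
        case .whitelist(let allowed):
            return allowed.contains(collection)
        case .blacklist(let denied):
            return !denied.contains(collection)
        case .block(let decide):
            return decide(collection)
        }
    }
}

// Made-up rows, as (collection, key) pairs.
let rows = [
    ("lions", "leo"),
    ("cars", "beetle"),
    ("tigers", "tony"),
    ("logs", "2024-01-01"),
    ("bears", "baloo"),
]

// Only rows in allowed collections would be enumerated by the view.
let whitelist = CollectionFilter.whitelist(["lions", "tigers", "bears"])
let enumeratedKeys = rows.filter { whitelist.isAllowed($0.0) }.map { $0.1 }

// The same idea with a blacklist and with a dynamic block.
let blacklist = CollectionFilter.blacklist(["logs"])
let dynamic = CollectionFilter.block({ $0.hasPrefix("l") })

print(enumeratedKeys)
```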
In addition to this, keep in mind that registering a view (or registering any extension for that matter) is the equivalent of performing a read-write transaction. And you're strongly encouraged to register your extensions asynchronously. Thus the overhead of setting up your views is no different than an asyncReadWrite transaction. That is, it's relegated to a background thread that won't block your UI. In fact, there's even an FAQ question regarding dealing with asyncRegisterExtension.