Transparent Encryption/Decryption Layer between LevelDB and Filesystem "Block Manager" #5
Comments
An alternative is to work on leveldb directly and modify the C++ so that encryption/decryption can be plugged in.
Along with IndexedDB and RocksDB, another option is lmdb, based on the discussion here: 6107d9c#commitcomment-73269633. The lmdb-js project already supports native encryption at the block level, thus ensuring keys and values are encrypted.
Since we are working at the C++ layer, this should mean we can finally attempt block level encryption. I wonder if we can just bind to node's openssl (https://github.com/nodejs/node-gyp/blob/master/docs/Linking-to-OpenSSL.md), since it's already there. Node's crypto and webcrypto APIs are likely built on top of the statically linked openssl. If we do the same, we would maintain parity with the crypto implementation, and it avoids bringing our own crypto library. Finally, if we do have to bring our own, we can do so with a native library rather than needing it to be implemented in raw JS or WebAssembly.
When doing this it's worth considering the ability to do incremental key rotation. This means that if the key is changed, instead of re-encrypting EVERYTHING straight away, we can encrypt new values with the new key. However, the old key would have to be kept around to decrypt old values, and can only be discarded once all old values are gone and have been re-encrypted. We can do this in one of 2 ways:
One could build 1 off 2. A background system can just read every single block. In the case of 2, it just means a reference count has to be kept around for the key. However js-db doesn't keep the key on disk. It is expected that one key is provided to the DB in-memory. The persistence of old keys will need to be hooked into through a ref counted system. How do we identify blocks that are encrypted with a particular key? We may hash the key, and keep the hash around as a "key identifier". Then each block would have a key identifier. Blocks would need to be large enough to justify keeping these key identifiers around. I imagine we may have something like 16 bit hashes or 8 bit hashes. Perhaps a counter could also work, but one would need to again remember some aspect of the key that is being used. Perhaps the db can remember that there are X keys still being used. Imagine:
Then the user must provide those 6 keys again. If they don't, then the initial integrity/canary check will fail.
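The key identifier and reference counting idea could look roughly like the sketch below. It is only an illustration under assumptions: the identifier is the first 16 bits of a SHA-256 hash of the key, AES-256-GCM from Node's crypto module stands in for whatever cipher is eventually used, and `keyId`, `KeyRing`, `encryptBlock`, `decryptBlock`, `rewrap` and the block layout are hypothetical names, not part of js-db.

```typescript
import { createCipheriv, createDecipheriv, createHash } from 'node:crypto';
import { randomBytes } from 'node:crypto';

// Hypothetical: a 16-bit key identifier taken from a hash of the key.
// With only 16 bits, two different keys can collide; a real design
// would have to detect and handle that.
function keyId(key: Buffer): number {
  return createHash('sha256').update(key).digest().readUInt16BE(0);
}

// Hypothetical in-memory key ring: the current key encrypts new blocks,
// old keys are kept only while some block still references them.
class KeyRing {
  private keys = new Map<number, Buffer>();
  private refs = new Map<number, number>();
  private current: number;

  constructor(key: Buffer) {
    this.current = keyId(key);
    this.keys.set(this.current, key);
  }

  // Switch to a new key; the old key stays if blocks still use it.
  rotate(newKey: Buffer): void {
    const old = this.current;
    this.current = keyId(newKey);
    this.keys.set(this.current, newKey);
    if (old !== this.current && (this.refs.get(old) ?? 0) === 0) {
      this.keys.delete(old);
    }
  }

  // Assumed layout: [2-byte key id][12-byte nonce][16-byte tag][ciphertext]
  encryptBlock(plain: Buffer): Buffer {
    const id = this.current;
    const nonce = randomBytes(12);
    const cipher = createCipheriv('aes-256-gcm', this.keys.get(id)!, nonce);
    const body = Buffer.concat([cipher.update(plain), cipher.final()]);
    const header = Buffer.alloc(2);
    header.writeUInt16BE(id, 0);
    this.refs.set(id, (this.refs.get(id) ?? 0) + 1);
    return Buffer.concat([header, nonce, cipher.getAuthTag(), body]);
  }

  decryptBlock(block: Buffer): Buffer {
    const id = block.readUInt16BE(0);
    const key = this.keys.get(id);
    if (key === undefined) throw new Error(`missing key ${id}`);
    const decipher = createDecipheriv('aes-256-gcm', key, block.subarray(2, 14));
    decipher.setAuthTag(block.subarray(14, 30));
    return Buffer.concat([decipher.update(block.subarray(30)), decipher.final()]);
  }

  // Re-encrypt one block under the current key and drop the old key
  // once nothing references it any more.
  rewrap(block: Buffer): Buffer {
    const oldId = block.readUInt16BE(0);
    const fresh = this.encryptBlock(this.decryptBlock(block));
    const left = (this.refs.get(oldId) ?? 1) - 1;
    this.refs.set(oldId, left);
    if (left === 0 && oldId !== this.current) {
      this.keys.delete(oldId);
      this.refs.delete(oldId);
    }
    return fresh;
  }

  keysInUse(): number {
    return this.keys.size;
  }
}
```

A background pass could call `rewrap` on every block it reads, and `keysInUse()` would then drop back to 1 once the last block written under an old key has been rewritten. In this sketch the reference counts live only in memory; persisting them (or the count of keys still in use) is the part that would need to be hooked into the DB.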
When integrating our new symmetric crypto routines from sodium native into js-db, we need to consider how to integrate 2 native shared objects (native plugins) into NodeJS together. I asked ChatGPT about this: https://chat.openai.com/share/d09826e1-ebb0-4584-9e89-d379ac7363b8. This will also be relevant to MatrixAI/Polykey#526. The key point is to avoid code duplication. We don't want to use the OpenSSL library inside NodeJS, because the OpenSSL there is not likely to exist on other platforms, so we must supply our own crypto library, which currently is the libsodium provided by
Also see the discussion in MatrixAI/Polykey#526 (comment) for further elaboration on interactions between different shared object native libraries in the same NodeJS process.
It seems then, that the right thing to do is to require peer dependencies, rather than direct dependencies. That is, the DB could depend on the peer dependency on sodium-native. This sort of implies that sodium-native is the host, and Thus requiring that the downstream project have It's a bit strange. Alternatively if Given that On top of this, one could argue that it's an optional dependency, because the DB doesn't actually need to have crypto switched on. Right now it's a dependency injection. However we still need to work out how exactly one would dependency inject into the RocksDB environment...? Especially since we would want to avoid having C++ code call JS then call C++, instead C++ should just call C++ directly. So I imagine this would have to be just a runtime boolean switch to turn it on/off. And thus it would be hardcoded to the sodium-native crypto facilities. No dependency injection possible here. I think though, there is this concept of calling a common interface/header, and being able to substitute for a different library as long as it exposed the same symbols. I see some native projects saying that you can swap out their SSL for different openssl variants. So this must be possible too. Therefore this would be a libsodium based interface. |
So upon further research, I see that it's possible to "dynamically" inject the function pointer into the C/C++ code. This is a different technique from just using the same headers, and then using
So imagine that in the C++ code, we wanted to have functions passed in that we would call to do the crypto operation. These would be considered C function pointers. How would we "pass" these in from JS? Well, you could do something like this:
```c
#include <dlfcn.h>
#include <stdio.h>

int main(void) {
  void *handle = dlopen("mylib.so", RTLD_LAZY);
  if (handle == NULL) { fprintf(stderr, "%s\n", dlerror()); return 1; }
  // dlsym returns void*, so cast it to the expected function pointer type
  void (*function_in_library)(void) =
      (void (*)(void))dlsym(handle, "function_in_library");
  if (function_in_library != NULL) {
    function_in_library(); // Indirect call through a function pointer
  }
  dlclose(handle);
  return 0;
}
```
Suppose this was called by NodeJS:
I'm not sure if it is possible to access the Then subsequently pass that into the C++ side of Then the
The handle still is managed by the caller though. If it wants to use I'm not sure if this is a better method. This sort of allows |
There is a slight performance penalty on using the |
Note that the usage of https://nodejs.org/api/process.html#processdlopenmodule-filename-flags is primarily about loading exported NAPI/node API functions. But if the shared object just has exposed symbols in general... that should be available to other shared objects, right? This requires some experimentation and comparison to ESM async imports/static imports.
Test with different symbols: https://nodejs.org/api/os.html#dlopen-constants
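A starting point for that experiment might look like the sketch below. It uses `process.dlopen` with the flags from `os.constants.dlopen`. `RTLD_GLOBAL` asks the dynamic loader to make a library's symbols available to libraries loaded afterwards. The helper name `loadAddon` and the addon paths are hypothetical. Whether a second addon can actually resolve symbols from the first this way also depends on how it was linked, which is untested here.

```typescript
import * as os from 'node:os';

// Hypothetical helper (not part of Node.js or js-db): load a native addon
// with explicit dlopen flags and return whatever it exported.
function loadAddon(filename: string, flags: number): unknown {
  const mod: { exports: unknown } = { exports: {} };
  process.dlopen(mod, filename, flags);
  return mod.exports;
}

// RTLD_GLOBAL: symbols of this library become available for resolving
// symbols of libraries loaded later. These constants are only defined
// on POSIX platforms.
const { RTLD_NOW, RTLD_GLOBAL } = os.constants.dlopen;
const shareSymbols = RTLD_NOW | RTLD_GLOBAL;

// Assumed usage, with placeholder paths: load the crypto provider first
// with shared symbols, then the DB addon that expects to find them.
// const sodium = loadAddon('/path/to/sodium.node', shareSymbols);
// const db = loadAddon('/path/to/db.node', RTLD_NOW);
```

The same two loads could be repeated with `RTLD_LOCAL` in place of `RTLD_GLOBAL` to see whether the second addon then fails to load.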
Specification
Our current encryption/decryption layer sits on top of LevelDB. This causes problems for indexing #1, because when you want to index something you need to expose keys, and keys currently have to be unencrypted.
It may also increase the performance of the DB if encryption/decryption were operating at the block level rather than at the individual key-value level. It's the equivalent of using full-disk encryption and using leveldb on top.
We can't rely on OS provided full-disk encryption. So something that is in-between the current key-value DB like leveldb and the actual filesystem that is executed in JS or C++ would be needed.
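To make the "in-between" layer more concrete, here is a minimal sketch of a block store that encrypts whole blocks before they reach the underlying storage. It rests on several assumptions: AES-256-GCM from Node's crypto module is a stand-in cipher, a `Map` stands in for the filesystem, and `RawBlocks`, `EncryptedBlocks` and `indexBytes` are hypothetical names. Binding each block to its index via the additional authenticated data is a design choice of this sketch, not something the issue specifies.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from 'node:crypto';

// Hypothetical storage interface standing in for the filesystem.
interface RawBlocks {
  get(index: number): Buffer | undefined;
  set(index: number, data: Buffer): void;
}

// Encode the block index so it can be authenticated with the block.
function indexBytes(index: number): Buffer {
  const b = Buffer.alloc(4);
  b.writeUInt32BE(index, 0);
  return b;
}

// The DB above this layer sees plaintext blocks, so it can index keys
// freely; the storage below only ever sees ciphertext.
class EncryptedBlocks {
  private key: Buffer;
  private raw: RawBlocks;

  constructor(key: Buffer, raw: RawBlocks) {
    this.key = key;
    this.raw = raw;
  }

  // Stored layout (an assumption): [12-byte nonce][16-byte tag][ciphertext]
  write(index: number, plain: Buffer): void {
    const nonce = randomBytes(12);
    const cipher = createCipheriv('aes-256-gcm', this.key, nonce);
    cipher.setAAD(indexBytes(index));
    const body = Buffer.concat([cipher.update(plain), cipher.final()]);
    this.raw.set(index, Buffer.concat([nonce, cipher.getAuthTag(), body]));
  }

  // Throws if the block was modified or moved to a different index.
  read(index: number): Buffer | undefined {
    const stored = this.raw.get(index);
    if (stored === undefined) return undefined;
    const decipher = createDecipheriv('aes-256-gcm', this.key, stored.subarray(0, 12));
    decipher.setAAD(indexBytes(index));
    decipher.setAuthTag(stored.subarray(12, 28));
    return Buffer.concat([decipher.update(stored.subarray(28)), decipher.final()]);
  }
}
```

For example, `new EncryptedBlocks(key, new Map<number, Buffer>())` gives a store where one nonce and one authentication tag are paid per block rather than per key-value pair, which is where the block-level approach could save work.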
There is a `level-js` which is an abstract-leveldown compliant store that can be wrapped in levelup. It is leveldb implemented in pure-JS which relies on IndexedDB. Currently `IndexedDB` doesn't exist natively on Node.js, but there are some implementations of it. This seems to give an opportunity to add a transparent encryption/decryption layer in between leveldb and IndexedDB.
Additional context
- `level` is a library that bundles `leveldown`, `level-js`, `levelup` and `encoding-down` to create a single batteries-included package
- The difference between `leveldown` and `level-js` is that `leveldown` uses the C++ leveldb library which only works in Node.js while `level-js` works on IndexedDB which exists in browsers
- If `IndexedDB` were to exist in Node.js, one could use `level-js` as an isomorphic library that works on browsers and Node.js, not sure about NativeScript though
- `IndexedDB` could allow us to put a transparent encryption/decryption layer in between `IndexedDB` and `level-js`, thus enabling flexible indexing, and probably better security
- `IndexedDB` in Node.js:
- If `fakeIndexedDB` becomes `realIndexedDB` via leveldb, then this should be possible in NS as well, but you are doing something a bit funny: `levelup API -> level-js -> transparent encryption/decryption -> IndexedDB -> leveldb or whatever`, but this is sort of what happens in Chrome which implements `IndexedDB` using leveldb
Tasks
- `IndexedDB`
- `IndexedDB`, perhaps by being implemented by leveldb or sqlite, it seems like any performant implementation would have to use C++ at some point, also there are a bunch of wrapper libraries, but not sure which ones actually perform real persistence