Use data structures as database #73
Replies: 35 comments
-
If we look at the dependencies, we find
-
Well, if you are referring to the speed: this was done in 2013. Might it be different today?
-
And I mean in general. It doesn't have to be a 1:1 copy. I just want to use code as a database.
-
I was told that this is something similar in concept:
-
I was referring to the fact that it is perhaps possible.
-
The STM implementation originally referenced might also need some love. It looks like some of the code in the above
-
The performance measurements should probably be taken with a grain of salt. Perhaps @rofr knows more about how these things perform, since he has done a lot of in-memory DB work on .NET.
-
The idea of including an STM in F#+ has always been floating around in my head, although I'm not sure it would be used in real-world code.
-
Yeah, sounds nice 😁
-
We can look at the differences between the ancient FSharpx code and the different STM implementations in Haskell. They couldn't find a maintainer for the F# one.
-
Yes, you can easily do the equivalent of acid-state in F#! A proof-of-concept implementation would probably be under 50 lines of code :) You don't actually need the STM bits to use data structures as a database. STM provides a mechanism to roll back, but rollbacks are only strictly necessary to undo changes caused by concurrency conflicts. In OrigoDB and memstate, we persist each command object to the journal (write-ahead logging) and then apply the command to the in-memory state (usually data structures). OrigoDB rolls back by discarding all the in-memory data and rebuilding it from the entire log if a command throws an unexpected exception. Memstate assumes that a failing command will not corrupt the in-memory data model.

This persistence pattern goes by many names: Memory Image (Martin Fowler), system prevalence (Klaus Wuestefeld of prevayler.org), event sourcing (kind of), command sourcing, op-logging (MongoDB), and the Redis append-only file (AOF). Write performance is IO-bound, constrained by how fast you can log commands to durable storage. OrigoDB can write 3K commands per second using the local file system; memstate does about 100K commands per second using Event Store.
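To make the write-ahead pattern above concrete, here is a minimal sketch in Python (the thread is about F#, but the shape carries over). It assumes a JSON-lines journal file and commands registered by name; `MemoryImage`, `execute`, `add_user`, and `add_post` are hypothetical names, not the OrigoDB or memstate API. The rollback is a simplified take on the OrigoDB behaviour described above: the failed entry is cut from the journal and the model is rebuilt from the log.

```python
import json
import os
import tempfile


class MemoryImage:
    """Keep the model in RAM; journal every command before applying it (hypothetical sketch)."""

    def __init__(self, journal_path, new_model, commands):
        self.journal_path = journal_path
        self.new_model = new_model  # factory returning an empty model, e.g. dict
        self.commands = commands    # command name -> function(model, **args)
        self.model = self._replay()

    def _replay(self):
        # Rebuild the model from scratch by re-applying every journaled command in order.
        model = self.new_model()
        if os.path.exists(self.journal_path):
            with open(self.journal_path, encoding="utf-8") as f:
                for line in f:
                    entry = json.loads(line)
                    self.commands[entry["cmd"]](model, **entry["args"])
        return model

    def execute(self, cmd, **args):
        # Remember the journal length so a failed command can be cut off again.
        size = os.path.getsize(self.journal_path) if os.path.exists(self.journal_path) else 0
        # 1. Write-ahead: append the command to the journal first.
        with open(self.journal_path, "a", encoding="utf-8") as f:
            f.write(json.dumps({"cmd": cmd, "args": args}) + "\n")
        # 2. Then apply it to the in-memory data structures.
        try:
            return self.commands[cmd](self.model, **args)
        except Exception:
            # Simplified OrigoDB-style rollback: drop the failed entry, throw away
            # the in-memory model and rebuild it from the journal.
            os.truncate(self.journal_path, size)
            self.model = self._replay()
            raise


# Usage: hypothetical commands operating on a dict of user -> list of posts.
def add_user(model, name):
    if name in model:
        raise ValueError(f"duplicate user {name!r}")
    model[name] = []


def add_post(model, name, text):
    model[name].append(text)


COMMANDS = {"add_user": add_user, "add_post": add_post}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "journal.jsonl")
    db = MemoryImage(path, dict, COMMANDS)
    db.execute("add_user", name="ada")
    db.execute("add_post", name="ada", text="hello")
    try:
        db.execute("add_user", name="ada")  # fails and is rolled back
    except ValueError:
        pass
    restarted = MemoryImage(path, dict, COMMANDS)  # simulate a process restart
    print(restarted.model)  # {'ada': ['hello']}
```

Replaying the journal on startup is what restores the state after a restart. Note that this sketch never calls `fsync`, so on its own it does not guarantee durability.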
-
Wonderful. ^-^
Can you link us to some useful tutorials? Thanks a lot for mentioning OrigoDB. @gusty Are you interested yet? ;)
-
I'm quoting Gerard here, who responded in the Slack channel to the snippet I already posted above:
-
Here is a POC implementation in Java, with a short discussion in the comments: https://gist.github.com/klauswuestefeld/1103582

Regarding the quote from Gerard: if you don't wait for I/O to complete, durability is not guaranteed. It doesn't matter whether it's a snapshot or a log entry.
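Waiting for I/O to complete means flushing the application's buffer and asking the OS to write the data through to the device before the command is acknowledged. A minimal Python sketch, assuming a JSON-lines journal; `append_durably` is a hypothetical helper, while `os.fsync` is the standard library call that blocks until the OS reports the write as done.

```python
import json
import os
import tempfile


def append_durably(journal_path, entry):
    """Append one journal entry and return only after the OS reports it is on disk."""
    line = json.dumps(entry) + "\n"
    with open(journal_path, "a", encoding="utf-8") as f:
        f.write(line)
        f.flush()             # move Python's buffer into the OS page cache
        os.fsync(f.fileno())  # block until the OS has written it to the device
    # Only after this returns is it safe to apply the command and acknowledge it.


# Usage: two commands, each durable before the next one starts.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "journal.jsonl")
    append_durably(path, {"cmd": "add_user", "args": {"name": "ada"}})
    append_durably(path, {"cmd": "add_post", "args": {"name": "ada", "text": "hi"}})
    with open(path, encoding="utf-8") as f:
        entries = [json.loads(line) for line in f]
```

On POSIX systems, a newly created journal file also needs its containing directory fsynced for the file's directory entry to survive a crash, and some disks acknowledge writes from a volatile cache, so the real guarantee also depends on the OS and hardware.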
-
Although that is somewhat true, the unfortunate fact is that IO is extremely slow, so waiting for IO completion before every new state commit is not practical if you want decent throughput. To overcome this, there are ways/tricks to fall back and recover from a late IO failure, given that it is a rare occurrence that only happens on a disk fault or when running out of memory. For example, an in-memory circular buffer can hold the last 50 mutation messages, so that if an IO failure comes back, you can freeze incoming commits and re-run from the failed commit, either from the circular buffer or from the logs. You can wait for the logs/state to be confirmed, but the speed will be terrible and it creates a massive bottleneck. It's the compromise that needs to be made to get the best of both worlds: have a smart fallback mechanism to make up for the possibility of IO errors coming back after a few more state mutations have occurred. Such errors are rare, given that threads are writing to new memory all the time rather than modifying existing files. If throughput is not a requirement, then by all means await every IO write.

The ACID/atomic nature of the transactions is usually more important than the guarantee that every single transaction is written, i.e., missing the last few is acceptable, provided everything up to the point of the initial failure is written in the correct order. A few keys to getting good write performance are using specialised OS memory-dump APIs, batching save operations where possible (maybe 5 messages at a time?), and tricks you can look up on blogs about SQL Server, Lucene, Event Store, and other persistence systems. If throughput is not an issue, then, once again, all of this would be overbaking it; you can just wait on all IO.
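The batching trade-off described above (often called group commit) can be sketched as follows: commands are accepted immediately, written to disk five at a time with one `fsync` per batch, and the last 50 commands are kept in an in-memory ring buffer. This is a hypothetical Python sketch; the class name is illustrative, and the batch size and history length are simply the numbers from the comment, not from any particular library.

```python
import collections
import json
import os
import tempfile


class BatchingJournal:
    """Group commit: accept commands at once, persist them in batches (hypothetical sketch)."""

    def __init__(self, path, batch_size=5, history=50):
        self.path = path
        self.batch_size = batch_size
        self.pending = []                                # accepted, not yet confirmed on disk
        self.recent = collections.deque(maxlen=history)  # ring buffer of the last N commands
        self.durable_count = 0                           # commands confirmed on disk
        self.frozen = False                              # set after an IO failure

    def submit(self, entry):
        if self.frozen:
            raise RuntimeError("journal frozen after an IO failure; call flush() to retry")
        self.pending.append(entry)
        self.recent.append(entry)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        if not self.pending:
            return
        data = "".join(json.dumps(e) + "\n" for e in self.pending)
        try:
            # One write and one fsync for the whole batch instead of one per command.
            with open(self.path, "a", encoding="utf-8") as f:
                f.write(data)
                f.flush()
                os.fsync(f.fileno())
        except OSError:
            # Freeze new commits; self.pending still holds the batch in order, and
            # self.recent holds recent history for re-running commands if needed.
            self.frozen = True
            raise
        self.durable_count += len(self.pending)
        self.pending.clear()
        self.frozen = False


# Usage: 12 commands become two full batches on disk plus two pending ones.
with tempfile.TemporaryDirectory() as tmp:
    journal = BatchingJournal(os.path.join(tmp, "journal.jsonl"), batch_size=5)
    for i in range(12):
        journal.submit({"cmd": "increment", "args": {"by": i}})
    before_flush = journal.durable_count  # two full batches of 5
    journal.flush()                       # e.g. on shutdown or from a timer
    after_flush = journal.durable_count
    with open(journal.path, encoding="utf-8") as f:
        lines = f.readlines()
```

Anything still in `pending` when the process dies is lost, which is exactly the "missing the last few" compromise; flushing on a timer bounds how long a command can stay unconfirmed. A production version would also need to truncate a partially written batch before retrying it.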
-
What about a graph-based approach?
-
Not impossible in theory, as far as I am aware; are you perhaps confusing it with CAP? But in practice, every traditional (B-tree, disk-based) RDBMS implementation that I've worked with sacrifices isolation (the I in ACID) for performance. The default isolation level for SQL Server is READ COMMITTED; if you crank it up to SERIALIZABLE, you get perfect isolation, but throughput drops significantly. @gerardtoconnor, your default mode makes total sense in the kind of high-throughput architecture that you mentioned. OrigoDB and memstate both target complex domains where read/write contention in the RDBMS, and the complexity of moving data back and forth between disk and memory, hurt performance, correctness, and developer productivity. @ShalokShalom, can you elaborate on "graph based"?
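One simple way an in-process store can get serializable isolation while avoiding contention between concurrent commands is to run every command one at a time against the in-memory model. The Python sketch below assumes a single global lock; `SerialStore` and `transfer` are hypothetical names. A reader-writer lock that lets read-only queries run concurrently would be a common refinement, but Python's standard library does not provide one.

```python
import threading


class SerialStore:
    """Execute every command under one lock, so commands never interleave (hypothetical sketch)."""

    def __init__(self, model):
        self.model = model
        self._lock = threading.Lock()

    def execute(self, command, *args):
        # Only one command touches the model at a time: no command can observe
        # another command's half-finished changes, i.e. serializable isolation.
        with self._lock:
            return command(self.model, *args)


def transfer(accounts, src, dst, amount):
    # A read-check-write sequence that could race without isolation.
    if accounts[src] >= amount:
        accounts[src] -= amount
        accounts[dst] += amount
        return True
    return False


# Usage: 8 threads hammering the same two accounts.
accounts = {"a": 100, "b": 100}
store = SerialStore(accounts)


def worker(n):
    for i in range(1000):
        src, dst = ("a", "b") if (i + n) % 2 else ("b", "a")
        store.execute(transfer, src, dst, 7)


threads = [threading.Thread(target=worker, args=(n,)) for n in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because each command only manipulates RAM, holding the lock for its duration is usually short, which is why serial execution is less of a bottleneck here than SERIALIZABLE is in a disk-based RDBMS.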
-
I thought it was interesting to read. In a way, this is like an event store architecture, but storing the actual functions instead of the domain events.
-
@rofr Yeah, the impossibility is more relevant to CAP, but that's why I said "instantaneous", down to the CPU cycle: just to highlight that there is usually a tiny bit of flexibility on timing, as long as it's controlled, for a system to still be considered ACID. Consistency is another factor that can be traded for performance, as with Cassandra vs SQL Server. Luckily, given this is a local store rather than a distributed one, it's a far simpler problem than full-blown distributed systems.

@voronoipotato This was in line with my comments. I think it's worth pointing out that persisting a function instance to IO is not really possible/practical; that's why, in similar systems, there are messages that map to functions, via a discriminated union (DU) or some other mapping technique. In F#, functions are abstract classes with Invoke methods that can easily be represented in memory, but persisting them requires some kind of mapping. It's generally the same thing, though: record which function and arguments to apply to rebuild the state.
-
@ShalokShalom I'd be happy to help out a bit and learn some more F# and functional patterns.
-
An external database such as Neo4j or gundb doesn't really make sense in the context of this thread (Use data structures as database), because your data lives in RAM, in the same process as the code that operates on it. But how you model the data is entirely up to you; if you want a graph representation, you could use a library such as https://github.com/Rickasaurus/Edgy or https://github.com/CSBiology/FSharp.FGL. Does that make sense?
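Since the data lives in RAM, a graph model can simply be an ordinary data structure, with no external database involved. A minimal sketch in Python, assuming an undirected graph stored as adjacency sets; the `Graph` class is hypothetical and is not the API of Edgy or FSharp.FGL. In a prevalence-style store, mutations such as `add_edge` are what would be journaled as commands and replayed on startup.

```python
from collections import deque


class Graph:
    """Plain in-process graph model: adjacency sets keyed by node id (hypothetical sketch)."""

    def __init__(self):
        self.edges = {}  # node -> set of neighbouring nodes

    def add_node(self, node):
        self.edges.setdefault(node, set())

    def add_edge(self, a, b):
        # Undirected edge; both endpoints are created if missing.
        self.add_node(a)
        self.add_node(b)
        self.edges[a].add(b)
        self.edges[b].add(a)

    def shortest_path(self, start, goal):
        """Breadth-first search; returns the list of nodes, or None if unreachable."""
        if start not in self.edges or goal not in self.edges:
            return None
        previous = {start: None}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            if node == goal:
                path = []
                while node is not None:
                    path.append(node)
                    node = previous[node]
                return path[::-1]
            for nxt in self.edges[node]:
                if nxt not in previous:
                    previous[nxt] = node
                    queue.append(nxt)
        return None


# Usage: a tiny social graph queried directly in memory.
g = Graph()
g.add_edge("alice", "bob")
g.add_edge("bob", "carol")
g.add_edge("carol", "dave")
g.add_node("eve")
print(g.shortest_path("alice", "dave"))  # ['alice', 'bob', 'carol', 'dave']
print(g.shortest_path("alice", "eve"))   # None
```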
-
Yes, I mean the model of the data. Thanks a lot for the links, although Edgy seems unmaintained and FSharp.FGL hasn't had a commit in at least half a year. Anyway, thanks for linking these. :) Graphs just make sense, since that is how our brains work, and I think nature has been working with such concepts for a long time. ;) I thought it might be possible to use such a graph-based approach for our project idea here?
-
Well, in F# it's easier than in C# to get something that is actually finished, so the above libraries might well be mature enough.
-
@wallymathieu Well, the question is how long they will keep working. Of course, they may be mature enough. What happens if they become incompatible?
-
How do you mean?
-
Do you think they will still work in a few years? The thing is: if I invest some time in studying this software, I hope to still be able to use it in a few years. People I trust, who are quite experienced with software, would give me a questioning look if I told them that I use software that hasn't been touched in a couple of years.
-
That's always something you have to deal with as a developer, I guess. Many of the Unix tools on Mac OS X and the base tools on Windows may not have been touched for decades. Some of the GNU versions of the Unix tools are maintained, but see very few commits per year.
-
Well, as long as they are maintained, it's fine.
-
Hi 🤗 Would this be able to replace a database in a PWA? I'd really like to avoid JavaScript here.
-
Can we do this in F-Sharp?
http://hackage.haskell.org/package/acid-state