FastBalance approaches and solutions #38
Ok. I gave it a very deep thought. :) Well, as deep as I can at midnight. I think your idea is good. A few random thoughts: …
I'm thinking of starting the implementation of your idea tomorrow. |
I thought about it. The approach seems problematic. There is no guarantee that transactions are always stored sequentially, because e.g. the timestamp is generated in JavaScript with the local time of the node. So if one service's clock is for some reason off by a few minutes, then we have an issue. So we still risk wrong balances with this approach. Even if we use the number of notes/transactions we could run into issues, as it could still result in a wrong balance. So the partial balances approach is still racy. Maybe there is no other way than to always track account changes to avoid wrong data. On the other hand, this can result in heavy database degradation if one account is heavily used. |
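For illustration, the clock-skew race described above can be sketched like this (hypothetical values; not code from this thread):

```ts
// Two services writing to the same book; node A's clock runs 2 minutes slow.
const skewMs = -120_000;
const txFromNodeA = { credit: 5, timestamp: new Date(Date.now() + skewMs) }; // written SECOND
const txFromNodeB = { credit: 7, timestamp: new Date() };                    // written FIRST
// Sorting by `timestamp` now orders A's later write BEFORE B's earlier one,
// so a partial balance cut between the two timestamps is computed against
// the wrong set of entries.
```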
AFAIK, in Medici the transaction documents carry two Date fields: `datetime` and `timestamp`.
We should not rely on any of the above Date fields as they are not granular enough. This ordering issue is solvable in at least two ways.

1. Generate `_id` server side. Mongoose does not support the `forceServerObjectId` driver option, so we would have to replace

```js
await Promise.all(this.transactions.map((tx) => new transactionModel(tx).save(options)));
```

with something like this:

```js
const txModels = this.transactions.map((tx) => new transactionModel(tx));
for (const txModel of txModels) {
  const err = txModel.validateSync();
  if (err) throw err;
}
const { insertedIds } = await mongoCollection.insertMany(this.transactions);
```

2. Utilise a counter-like field. Create a new field (e.g. a monotonically increasing number) and sort by it. Quoting this page: …
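As a hedged sketch of option 2, a strictly increasing sequence field can be produced with an atomic `$inc` on a hypothetical `counters` side collection (the collection name, the `Counter` shape, and the driver v4.x `{ value }` return shape are all assumptions, not medici code):

```ts
import { Db } from "mongodb";

interface Counter {
  _id: string; // counter name
  seq: number; // last issued sequence number
}

// findOneAndUpdate with $inc is atomic on a single document, so concurrent
// writers always receive distinct, strictly increasing numbers — no wall
// clock (and hence no clock skew) is involved.
async function nextSeq(db: Db, name: string): Promise<number> {
  const result = await db.collection<Counter>("counters").findOneAndUpdate(
    { _id: name },
    { $inc: { seq: 1 } },
    { upsert: true, returnDocument: "after" }
  );
  return result.value!.seq; // driver v4.x wraps the updated doc in { value }
}
```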
Again, mongoose does not support that.

Mongoose is the problem

At this point I see mongoose more as a problem than a help. I am considering removing mongoose entirely. Here is why: …
What do you think, @Uzlopak? |
Another feature we can try dropping is the … |
1. Generate `_id` server side only. Looks like we don't need to maintain a second connection. We can reuse mongoose's driver connection with one option:

```js
transactionModel.collection.insertMany(this.transactions, {
  forceServerObjectId: true,
});
```
|
Yes. If you don't provide an `_id`, then with `forceServerObjectId` the ObjectIds will be generated on the server side. But it works only as long as you don't use sharding. When you start using sharding, the mongo instances can have issues with different times. Also, if you want to use timestamps like you mention, it probably doesn't matter if you just use `Mixed` as the schema type for it. As long as you don't want to read it and only want to sort by it, it should be no issue. But the timestamp has to be set by the server. Do the unit tests still pass if you use mongoose v5? |
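As a sketch of the "timestamp set by the server" idea (not medici code; the function and collection usage are illustrative), MongoDB's `$currentDate` update operator stamps a field from the mongod clock rather than the Node.js process clock:

```ts
import { Collection, ObjectId } from "mongodb";

// Stamp `timestamp` with the SERVER's clock via $currentDate, so node-local
// clock skew cannot influence the stored value.
async function stampServerTime(txCollection: Collection, txId: ObjectId): Promise<void> {
  await txCollection.updateOne({ _id: txId }, { $currentDate: { timestamp: true } });
}
```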
Also be careful. I wrote the stress tests because mongo transactions need at least one awaited operation before you can send the rest in a `Promise.all`. E.g. that's why I store the journal first and then `Promise.all` the transactions. The other way round would result in transaction errors. If you want to use server-side ObjectIds you would need to first store all the transactions, then extract the ObjectIds, assign them to the journal, and then store the journal. |
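The write order described above looks roughly like this (a simplified sketch of the constraint, using the variable names from this thread, not medici's actual code):

```ts
// Inside a MongoDB transaction the FIRST operation must be awaited on its
// own; only then may the remaining writes be issued concurrently.
await journal.save({ session }); // 1) awaited alone
await Promise.all(
  // 2) the rest can now run in parallel
  transactions.map((tx) => new transactionModel(tx).save({ session }))
);
// Reversing the two steps (Promise.all first) fails intermittently with
// transaction errors under load — hence the stress tests.
```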
Maybe it makes sense to set mongoose to ">= 5" or "*" and remove the package-lock.json from the repo/npm package, thus making the mongoose version irrelevant. I don't think we use any mongoose-specific API. We should still check our code with mongoose 5, but there is no special code used. Only mongoose 6 has some more typings, like the PipelineStage interfaces, which were only added by me to figure out why it broke from version 6.0 to 6.1.2. With mongoose 6.1.3 that specific type error got fixed, so we can remove the PipelineStage interfaces again, as they are not critical and the typings will be checked implicitly in the aggregation method anyway. |
I removed the PipelineStage interfaces. So I think mongoose 5 should work with the unit tests too. |
No. Btw, FYI we are not using TypeScript ourselves.

Regarding … .

Regarding replicaset issues: typically only one node is a write-node. Well, at least that's what we have. So, the … . I replaced multiple … .

=====

Here is what I did to the journal write code.

Step 1 - validate all the data using the mongoose model.

```ts
const txModels = this.transactions.map((tx) => new transactionModel(tx));
for (const txModel of txModels) {
  const err = txModel.validateSync();
  if (err) throw err;
}
```

Step 2 - bulk insert (a single operation!) all the ledger entries.

```ts
const result = await transactionModel.collection.insertMany(this.transactions, {
  forceServerObjectId: true, // This improves ordering of the entries on high load.
  ordered: true, // Ensure items are inserted in the order provided.
  session: options.session, // We must provide either session or writeConcern, but not both.
  writeConcern: options.session ? undefined : { w: 1, j: true }, // Ensure at least ONE node wrote to the JOURNAL (disk).
});
let insertedIds = Object.values(result.insertedIds);
```

Step 3 - read the generated `_id`s back if the driver did not return them.

```ts
if (insertedIds.length === this.transactions.length && !insertedIds[0]) {
  // Mongo returns `undefined` as the insertedIds when forceServerObjectId=true. Let's re-read them.
  const txs = await transactionModel.collection
    .find({ _journal: this.transactions[0]._journal }, { projection: { _id: 1 }, session: options.session })
    .toArray();
  insertedIds = txs.map((tx) => tx._id as Types.ObjectId);
}
```

Step 4 - save the journal.

```ts
// @ts-ignore
(this.journal._transactions as Types.ObjectId[]).push(...insertedIds);
await this.journal.save(options);
```
|
Sounds fine. We should then document that we don't use mongoose operations, so mongoose hooks in schemas (like `save` middleware) won't be hit anymore. Also, the replicaset issue is different from sharding, I think. But on the other hand, if it is shard-specific it should be no issue. Why does insertMany not contain the insertedIds? According to the docs it should contain them: https://docs.mongodb.com/manual/reference/method/db.collection.insertMany/#examples Also, it sounds critical that you check whether the length of transactions is the same as insertedIds. What if they are not the same, for any reason? It should throw if the lengths differ, just to avoid inconsistencies. |
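The suggested guard could look like this (a sketch against the step 2/3 code above; the error message text is illustrative):

```ts
// Throw if the number of inserted ids does not match the number of
// transactions we attempted to insert — better to abort than to write
// an inconsistent journal.
if (insertedIds.length !== this.transactions.length) {
  throw new Error(
    `medici: insertMany returned ${insertedIds.length} ids ` +
      `for ${this.transactions.length} transactions`
  );
}
```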
Why do you do Object.values on insertedIds? insertedIds should already be an Array of ObjectIds? Maybe this is the reason that the ObjectIds are undefined? |
Ok, I get it. PS: This server-side ObjectId bug in mongo is actually pretty bad. |
Fast balance was implemented. Gonna try it on large codebases. |
Ok. I've published the … . The medici v5 is 100% compatible with our codebases (well, except … ). Going to publish the … . |
Houston, we have a problem.
The solution I see so far: … Also, it should apply the other default indexes:

```js
transactionSchema.index({ _journal: 1 });
transactionSchema.index({ datetime: -1, timestamp: -1 });
transactionSchema.index({ accounts: 1, book: 1, datetime: -1, timestamp: -1 });
transactionSchema.index({ "account_path.0": 1, book: 1 });
transactionSchema.index({ "account_path.0": 1, "account_path.1": 1, book: 1 });
transactionSchema.index({ "account_path.0": 1, "account_path.1": 1, "account_path.2": 1, book: 1 });
```

Here is the code to retrieve info about all indexes:

```js
db.medici_transactions.getIndexes();
```
|
I actually recommend to drop the indexes. If you want a transition, then use setTransactionSchema and add the old indexes, so they don't get dropped. Then mongo will just add the new indexes. Then in a week you drop the old indexes. |
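A sketch of this transition (setTransactionSchema is named in this thread, but its exact signature and the index keys below are assumptions):

```ts
import { Schema } from "mongoose";
import { setTransactionSchema } from "medici";

// Keep the legacy index declared on the custom schema so a syncIndexes()
// pass will not drop it while the new default indexes are being built.
const transactionSchema = new Schema({
  /* ...the stock medici transaction fields... */
});
transactionSchema.index({ accounts: 1, book: 1, datetime: -1 }); // legacy index, kept for the transition
setTransactionSchema(transactionSchema);
// A week later: delete the index() line above; the next syncIndexes()
// pass will then drop the legacy index.
```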
Our system will stop functioning if we drop any single custom index at any point in time. Sorry, your recommendation is not possible. The … |
You are not forced to call syncIndexes. And also, syncIndexes will not drop any of the old indexes if you define the old indexes in your custom Schema when using setTransactionSchema. |
This is incorrect. I am forced to sync indexes because we dropped … .
We are not using any custom schema. (Moreover, I believe "custom schema" should be removed from medici later.) So, in our case the … . Moreover, … .
Let's quickly choose which way to go: a) a vivid warning in the readme and JSDoc that …; b) … |
The new MongoDB 5.0 "Time series" feature could be what we need: https://docs.mongodb.com/manual/core/timeseries-collections/ Thoughts?
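If that route were explored, creating such a collection might look like this (a sketch only; the mapping of medici fields onto `timeField`/`metaField` is a guess):

```ts
import { Db } from "mongodb";

// MongoDB 5.0+ time-series collection; `timeField` is required and
// `metaField` groups measurements that belong together.
async function createLedgerTimeseries(db: Db): Promise<void> {
  await db.createCollection("medici_transactions_ts", {
    timeseries: {
      timeField: "datetime", // each ledger entry's timestamp
      metaField: "accounts", // one series per account
      granularity: "seconds",
    },
  });
}
```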
|
I went with option a) and updated the README. There is another idea for the balance snapshot: read-only views. The balance query can be stored as a read-only view, so that medici wouldn't need to maintain the snapshot collection itself. |
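As a sketch, such a view could be defined with an aggregation pipeline over the transactions collection (the pipeline and names below are illustrative, not medici's implementation):

```ts
import { Db } from "mongodb";

// A read-only view: every query against it re-runs the pipeline, so there
// is no snapshot collection to maintain or invalidate.
async function createBalanceView(db: Db): Promise<void> {
  await db.createCollection("medici_balances_view", {
    viewOn: "medici_transactions",
    pipeline: [
      {
        $group: {
          _id: { book: "$book", account: "$accounts" },
          balance: { $sum: { $subtract: ["$credit", "$debit"] } },
          notes: { $sum: 1 },
        },
      },
    ],
  });
}
```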
Released as v5.0.0
At last! :) |
I thought about this issue multiple times, and these were my "simple" solutions.
Account Table
All the transactions will be summed up in an account document. A potential issue is the write locks on heavily used accounts.
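A minimal sketch of this idea, assuming a hypothetical `medici_accounts` collection with a running `balance` field:

```ts
import { ClientSession, Db } from "mongodb";

// Every journal write also atomically adds the entry's delta to the
// account document. Reads become O(1), but every write to a hot account
// contends on this one document — the write-lock issue mentioned above.
async function bumpAccountBalance(
  db: Db,
  book: string,
  account: string,
  delta: number, // credit minus debit of the new entry
  session?: ClientSession
): Promise<void> {
  await db
    .collection("medici_accounts")
    .updateOne({ book, account }, { $inc: { balance: delta } }, { upsert: true, session });
}
```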
Create a balanceSumUp collection (UPD: we have implemented this solution)
The actual issue with balance is that the operation is O(N), where N is the number of transactions assigned to an account. So probably the easiest solution would be to make a balance call and store the value into the balanceSumUp table. The balanceSumUp document would contain the last transactionId seen by the balance method, the sum of the balance, and an expiration value, e.g. daily.

So what happens is that we first search for the balance in the daily balanceSumUp collection. If we find it, we take the balance and the transaction id we stored. We then compute the balance, but the _id has to be bigger than the last _id from balanceSumUp. This probably needs an appropriate index. The effect is that for an account with 1000000 transactions we would not read all 1000000 transactions but e.g. only the last 1000 of them, reducing the runtime to O(n+1), where n is the number of additional transactions since the persisted balance.

If you set the expires (when the document is automatically deleted by mongo) to 1 day, then you would have the slowdown of running the whole balance only once a day. Or you set the mongo expires to 2 days and add a second expires of 1 day which does not result in automatic deletion but indicates the freshness of the balance. So you take the last persisted balance and check if it is fresh enough; if so, you calculate the balance since then. If it is not fresh, you persist the new balance to the collection, where you update the freshness and expires. So you have a potential write conflict when writing the persisted balance, resulting in a retry to persist it?! But only once a day.
Or you don't expire at all and do just the freshness check.
This would not make it necessary to store additional information on the transactions. And it should still be faster than the traditional balance.
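A condensed sketch of the flow described above (collection, field, and function names are illustrative, not the implemented medici API):

```ts
import { Db, Document, ObjectId } from "mongodb";

// Hypothetical snapshot doc: { book, account, balance, lastTransactionId, createdAt }.
async function fastBalance(db: Db, book: string, account: string): Promise<number> {
  // 1) Newest persisted snapshot for this account, if any.
  const snap = await db
    .collection("balance_snapshots")
    .findOne({ book, account }, { sort: { createdAt: -1 } });

  // 2) Sum only the entries written AFTER the snapshot: O(n) in new
  //    transactions instead of O(N) in all of them. ObjectIds grow
  //    roughly monotonically, so `_id > lastTransactionId` selects the tail.
  const match: Document = { book, accounts: account };
  if (snap) match._id = { $gt: snap.lastTransactionId as ObjectId };
  const [delta] = await db
    .collection("medici_transactions")
    .aggregate([
      { $match: match },
      { $group: { _id: null, total: { $sum: { $subtract: ["$credit", "$debit"] } } } },
    ])
    .toArray();

  return (snap?.balance ?? 0) + (delta?.total ?? 0);
}
```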