Reduce initial request size #4233

Closed

Conversation

@freimair (Contributor) commented May 4, 2020

Fixes #2547, fixes #2549, fixes #3763, part of bisq-network/projects#25

This PR reduces the initial request size to O(1). In numbers: the number of historical keys sent to the seed nodes is now 1. Compared with 1.3.4, that is 1 vs. 100k+ object keys and 2x1 kB vs. 2x2500 kB at startup.

Details

This is a big step towards a more pleasant user experience, but because it changes parts of the data store implementation it is also high risk.
I see the split data store implementation as a temporary solution until I can proceed to bisq-network/projects#29. Hence, even though there could be more tests, there aren't.

Testing efforts

I created a number of test classes handling the basics of the migration and update scenarios, everyday use cases plus a few special cases. Additionally, there is a shell script running a kind of manual integration test, which helped in finding more issues. The "manual" test takes care of setting up a local testnet and upgrades pieces of software along the way. Take a look at it if you so please. The results of the test have to be observed manually, although there is a result.log once it finishes which lists all the request and response sizes. (Please note that some entries appear twice in the log; that is simply how it is.)

Please give this a thorough test. If there are any questions, please feel free to contact me on keybase.

EDIT:

As always when trying to test disk IO, Travis complains and fails the tests. Please try the PR locally anyway. I will deactivate those tests for Travis before the PR is merged.

@sqrrm (Member) left a comment

I've read through the core parts of the PR. For now just adding comments to gain a better understanding. This PR makes me concerned that we might lose data during upgrades or otherwise, but I don't have a full understanding yet.

@freimair force-pushed the reduce_initial_request_size branch from 7169457 to 599110a on May 20, 2020 10:33
@freimair (Contributor, Author) commented May 20, 2020

Just rebased on master and folded in the first batch of feedback. (Thanks @sqrrm for your efforts.)

The build is failing because JDK 12 does not support removing final modifiers through reflection. If I removed the tests that rely on that, everything would be green. Not sure how to handle that. And no, there is no other way to test this except loosening something that clearly is final to being not final in the first place.

@freimair requested a review from sqrrm on May 21, 2020 08:49
@sqrrm (Member) left a comment

Thinking a bit about it, I guess this is conceptually the same model we have now with one store, just split into many.

I am having a hard time getting a good overview of what's going on, but I think there is some muddling of concerns. As I see it, a new client would not have to know about the old format. Only the nodes delivering data (typically seed nodes) need to distinguish the kind of request they are getting and deliver data accordingly.

@freimair requested a review from sqrrm on May 22, 2020 15:05
@dmos62 (Contributor) left a comment

Requested changes. Mostly related to clean-up, but still significant. I found the code hard to read; I would appreciate more helpful names and more intermediate variables.

I wrote up a description of the PR and the problem it's solving. I'll include it below. Some of the things there may not be correct; it represents my ongoing learning about Bisq's P2P (and this PR). @freimair, after corrections, if they're necessary, would you consider including some of this information in an appropriate place in the code? I think future devs will appreciate an explanation of why this datastore design was chosen. I would appreciate feedback if I got something wrong.


A datastore is a store of Bisq p2p objects. It is synced peer-to-peer and peer-to-seednode, and in this way makes up a distributed database.

I presume that a peer's datastore is synced in this way (a rough code sketch follows the list):

  • a peer/client starts up;
  • it starts a "bulk sync" procedure:
    • gathers up UIDs (or hashes) of the objects it already has in its datastore;
    • uploads those UIDs to a seed node;
    • seed node takes a set difference of the passed UIDs and the UIDs of the objects it has;
    • seed node uploads to the peer/client the objects corresponding to that difference;
  • peer then enters p2p mode, listening for and publishing updates.
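
A minimal sketch (a reviewer's illustration, not the actual Bisq code) of the set-difference step on the seed node side, with hypothetical names and plain string keys standing in for the real object hashes:

    import java.util.HashMap;
    import java.util.Map;
    import java.util.Set;

    // Hypothetical sketch of the seed node's side of a bulk sync: the peer
    // uploads the keys it already knows; the seed node answers with every
    // object whose key is not in that set.
    class BulkSyncSketch {
        static Map<String, byte[]> respondToBulkSync(Set<String> keysKnownByPeer,
                                                     Map<String, byte[]> seedNodeStore) {
            Map<String, byte[]> response = new HashMap<>();
            for (Map.Entry<String, byte[]> entry : seedNodeStore.entrySet()) {
                // set difference: only objects the peer does not have yet
                if (!keysKnownByPeer.contains(entry.getKey())) {
                    response.put(entry.getKey(), entry.getValue());
                }
            }
            return response;
        }
    }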

The problem is that this "bulk sync" procedure involves the peer sending a large number of UIDs
to the seed node, which can time out if network throughput is insufficient, causing the
sync to terminate unrecoverably.

That shows two problems:

  • Bisq does not recover from throughput issues;
  • the bulk sync is bandwidth-intensive.

This spec addresses the bandwidth-intensity.

Chronological information in the p2p objects would allow querying it by comparing objects to a
timestamp-like value; however, the p2p objects do not contain chronological information. It is
unclear whether introducing a timestamp would have any negative effects, but it would involve
making sensitive changes.

Freimair's proposal (sketched in code after this list) is about using chronologically-indexed datastore "checkpoints" that are distributed with releases

  • (and thus identical across clients and seednodes)
  • each checkpoint contains objects generated since the last checkpoint
    • the first checkpoint contains all prior objects
    • this effectively assigns chronological information to objects
      • precision of which, for the purposes of bulk sync, corresponds to frequency of releases;
  • this way a bulk sync can be initiated by identifying
    • the latest checkpoint that the client has
    • and the objects it has stored since then (objects not in any checkpoint known to him)
    • which reduces the bandwidth requirement
      • to the equivalent of uploading UIDs that are newer than the release used
        • because identifying the checkpoint is negligible bandwidth-wise.
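
For illustration only (these names are not the actual Bisq API), a checkpoint-aware bulk-sync request could then carry little more than a version string plus the keys collected since that release:

    import java.util.Set;

    // Illustrative sketch: the request identifies the newest release
    // checkpoint the client ships with, plus the keys it has collected since
    // then. The seed node only returns objects the client cannot already
    // have, i.e. objects from newer checkpoints and from its live store.
    class CheckpointRequestSketch {
        final String newestCheckpointVersion;   // e.g. "1.3.4" - a few bytes
        final Set<String> keysSinceCheckpoint;  // typically a small set

        CheckpointRequestSketch(String newestCheckpointVersion, Set<String> keysSinceCheckpoint) {
            this.newestCheckpointVersion = newestCheckpointVersion;
            this.keysSinceCheckpoint = keysSinceCheckpoint;
        }
    }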

Requirements for the checkpoint datastore (a hypothetical interface capturing them is sketched below):

  • initializable from the previous monolithic datastore or from scratch;
  • a "running" checkpoint has to be maintained;
    • which holds objects generated since the last "release" checkpoint;
  • at no point can the datastore be mutated, except via appends;
  • "release" checkpoints must be identical across peers and seednodes.
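
A hypothetical interface (my wording, not the PR's actual classes) capturing these requirements might look like this:

    import java.util.Map;

    // Release checkpoints are read-only and identical across nodes, while
    // the "running" checkpoint only ever grows by appends.
    interface CheckpointedDataStore {
        // read-only view of a shipped release checkpoint, e.g. "1.3.4"
        Map<String, byte[]> getReleaseCheckpoint(String version);

        // objects collected since the last release checkpoint
        Map<String, byte[]> getRunningCheckpoint();

        // the only permitted mutation: append to the running checkpoint
        void append(String key, byte[] payload);
    }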

storage.queueUpForSave();

return historicalStore;
}
Review comment from a Contributor on the diff hunk above:

Why are we copying historical datastore checkpoints/splits out of resources? Because that's what the live-store-only solution we currently use does? For troubleshooting purposes? Seems unnecessary, though I may be missing something. I presume we keep the live store in the working directory, because we can't mutate resource files (is that even true?).

@freimair (Author) replied:

> Why are we copying historical datastore checkpoints/splits out of resources? Because that's what the live-store-only solution we currently use does? For troubleshooting purposes? Seems unnecessary, though I may be missing something.
The existing file storage mechanisms are all tailored to working in the working directory. Changing that would be equivalent to replacing the file storage mechanisms with something else entirely. Hence, I am leaving it as is (until bisq-network/projects#29).

> I presume we keep the live store in the working directory, because we can't mutate resource files (is that even true?).

Well, obviously, the "live store" is something that is created and maintained by the Bisq app per installation and even per (configurable) app name and path. Placing that in the resources would only allow for a single instance of the data store AND would be overwritten and therefore invalidated on every app upgrade.

@freimair (Contributor, Author) commented Jun 16, 2020

@dmos62 thanks for your review. Appreciate it.

First of all, your PR description is correct.

Furthermore, I believe I can address all of your comments in summary:
I see this PR as an as-small-as-possible fix for the 3/4 problem. Yes, files are copied around where they probably do not need to be, APIs could be more readable, return values are ignored or misused, and a lot of other stuff. Thing is, most of the issues you raised have been in the code already and I do not want to touch them. They have proven themselves. Keep the changes small, keep them isolated. It's a minefield.

That being said, bisq-network/projects#29 is in the queue. That project will get rid of copying files around because there will be no more individual files, the API will get a major overhaul, the filter might get enhanced, and the isSeednode code will be gone: it only exists to prevent our network from falling apart while upgrading from single data stores (now) to multiple (then), and after that transition is done it is no longer needed.

Now to your obvious question: why not do it properly? I would rather not spend time reworking the same thing twice 😄 because a) I am a lazy bastard, and b) we need to use the resources we have efficiently. And of course, coding styles and our brains differ, so there is no perfect PR anyway.

@freimair force-pushed the reduce_initial_request_size branch from f5d02d1 to 1d70856 on June 16, 2020 12:54
@freimair requested a review from cbeams as a code owner on June 16, 2020 12:54
@dmos62 (Contributor) commented Jun 17, 2020

@freimair followed up the above post with a Keybase message and we had a brief exchange there. I'll summarize how I see this situation and what my concerns are:

  • P2P datastore code has a fair amount of technical debt, which makes making changes to it without compromises slow;
  • we want to introduce changes quickly, which encourages us to make compromises;
  • compromising changes tend to degrade reliability and are more likely to introduce mistakes;
  • compromising changes require more resources to maintain: freimair points out that there are plans to replace large portions of this code entirely, which means that if all goes well those portions will not require maintenance;
  • compromising changes require more resources to review.

My concerns:

  • I've spent a long time on the PR (~2.5 days, I realise this says as much about my reviewing ability as it does about the PR), but I'm still not confident that all edge cases are covered; I don't feel comfortable seeing this merged at this time;
  • I think that reliability, maintainability and reviewability are preferable to speed, especially in sensitive code, though speed has its importance too; it's a balancing act;
  • introducing changes that are difficult to maintain, because if all goes well they won't have to be maintained, is planning for the best outcome; I don't think that's prudent;
  • I think it's good to address the human element: PRs that make hasty changes, or side-step technical debt, often require the reviewer to address that complexity anyway, so that he can make sure the changes are correct, which can be frustrating for the reviewer (who usually has his plate full as it is).

Idea:

Stack a backup system on top of this in such a way that the mutations we perform cannot lose data. Of course we have to be aware that this is addressing complexity with more complexity. Maybe this is an example of how not to solve this situation.

stale bot commented Aug 1, 2020

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale bot added the was:dropped label Aug 1, 2020
@chimp1984 (Contributor) commented:
A few "high level" comments/thoughts:

By splitting up the data into groups we are creating "blocks" (I use that terminology with intention). How to uniquely identify a block in a distributed system is a challenge.

The approach used here takes a shortcut by making the release maintainer the single entity that defines what the right "block" is. This might not be an issue (or at least it does not make things worse than they are), but I am not 100% sure. The checkpoint terminology reveals the centralisation issue.

Is there another way to do it?
We have date fields in most (all?) of the relevant data structures and could use them for sorting the items, but there could be multiple items with the same date. To get unique IDs and sorting we could add the hash fields and create a tuple {date, hash} as ID, sorting primarily by date and secondarily by hash.
But there can be late insertions as well, which could mutate our data blocks. For certain data we don't allow insertion outside a defined time-period tolerance window (DateTolerantPayload). So we could at least consider blocks of a certain age as immutable. Newer blocks might get "reorged". Handling "reorgs" sounds like looking for trouble though...
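
A small sketch of that ordering (illustrative names, not existing Bisq classes): sort primarily by the payload's date and break ties by hash, so every item gets a stable, unique position.

    import java.util.Comparator;

    // Stand-in for the real payload types, carrying only the two fields
    // relevant to the proposed {date, hash} ID.
    class DatedItem {
        final long date;    // e.g. epoch millis from the payload's date field
        final String hash;  // hex-encoded payload hash

        DatedItem(long date, String hash) {
            this.date = date;
            this.hash = hash;
        }

        static final Comparator<DatedItem> BY_DATE_THEN_HASH =
                Comparator.comparingLong((DatedItem i) -> i.date)
                          .thenComparing(i -> i.hash);
    }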

Another approach could be to use a fixed time-period window (e.g. 10 days) and fill it up with all data whose date falls into that period. The GetData request would send the index of the time window and the hash of its content. If the seed node has a different hash it would send the whole package. For older data that should be very rare, and the additional network overhead might be a good trade-off for the simpler approach and low processing costs. The hash gets calculated once and is stored with the package. A problem is how to handle the case where a user has more data items than the seed node: then it would request the whole package again every time. If a seed is missing data it would get DoSed by the network...
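
A rough sketch of the fixed-window idea (my illustration; the window size and hashing scheme are assumptions): the window index is derived from the item date, and each window's content hash is computed once over the sorted item hashes and cached, so a request only needs to carry (windowIndex, windowHash) pairs.

    import java.nio.charset.StandardCharsets;
    import java.security.MessageDigest;
    import java.security.NoSuchAlgorithmException;
    import java.util.List;

    class TimeWindowSketch {
        // assumed window length of 10 days, as in the example above
        static final long WINDOW_MILLIS = 10L * 24 * 60 * 60 * 1000;

        static long windowIndexOf(long dateMillis) {
            return dateMillis / WINDOW_MILLIS;
        }

        // hash over the pre-sorted item hashes belonging to one window
        static byte[] windowHash(List<String> sortedItemHashes) {
            try {
                MessageDigest digest = MessageDigest.getInstance("SHA-256");
                for (String itemHash : sortedItemHashes) {
                    digest.update(itemHash.getBytes(StandardCharsets.UTF_8));
                }
                return digest.digest();
            } catch (NoSuchAlgorithmException e) {
                throw new IllegalStateException(e);
            }
        }
    }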

Another approach might be to use Bloom filters for the data request, but that is likely expensive for the seed node to process. A probably bigger problem than the 2.5 MB request is the load on the seed node to process the response.

stale bot removed the was:dropped label Aug 6, 2020
@dmos62 (Contributor) commented Aug 7, 2020

The main problem that we're trying to get around is that p2p objects are meant to be indexed by hash and don't have properties that would allow them to be indexable by something lightweight (like time). Bloom filters are normally the solution for these situations, but their processing is memory intensive, which opens up Bisq to denial-of-service attacks. It might be worthwhile to look at academic literature on the topic.
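
For illustration, a Bloom-filter-based request could look roughly like the following (using Guava's BloomFilter; this is a sketch of the idea, not what the PR implements, and the 1% false-positive rate is an arbitrary assumption):

    import com.google.common.hash.BloomFilter;
    import com.google.common.hash.Funnels;

    import java.nio.charset.StandardCharsets;
    import java.util.Set;

    // The peer would send a Bloom filter of its known keys instead of the
    // keys themselves; the seed node replies with every object whose key the
    // filter does *not* claim to contain. A false positive means an object
    // gets wrongly skipped, never sent twice.
    class BloomRequestSketch {
        static BloomFilter<String> buildFilter(Set<String> knownKeys) {
            BloomFilter<String> filter = BloomFilter.create(
                    Funnels.stringFunnel(StandardCharsets.UTF_8),
                    Math.max(knownKeys.size(), 1),
                    0.01); // 1% false-positive rate
            knownKeys.forEach(filter::put);
            return filter;
        }
    }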

I can't dig deeper now, but I found some interesting articles following these links. Probabilistic set-membership data structures are actively researched:

@chimp1984 (Contributor) commented:

@dmos62 Thanks for your input!

After a discussion with @sqrrm we agreed the best way forward is to limit a first mitigation to trade statistics objects only. This will reduce the bandwidth requirement by 50% as well as seed node processing load by 50% (as it is about 50% of the hashes). It will also reduce the required RAM for the Bisq app if that data is only loaded on demand.

Rough idea:

Prune existing trade statistics by removing all data older than a cut-off date - e.g. 2 months (TBD). Beside that, handling of trade statistics data stays exactly as it is now, but instead of 4 years of data we only deal with 2 months. I assume that will be about 1-5% of the current data.
At each release we add a snapshot named with the release version which contains all historical data up to a defined cut-off date (it can overlap with the in-memory data to be on the safe side - details have to be defined once we work on it).
This historical data is available to new users as it is in the binary. Users who have not updated will request it with a new request call from the seed nodes: they pass the versions they have and receive the missing ones. This will only be done once the data is actually needed (the user opens the historical trade statistics UI). We could even split that UI into recent trades, which covers the in-memory data, and historical data, which triggers the request and/or loads the shipped resource files. After leaving this UI we unload that data from RAM. Loaded data and in-memory data might overlap, but as these are hashmaps the duplicates cancel out.
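
A rough sketch of the "pass the versions they have and receive the missing ones" step (hypothetical names; the real implementation landed later in a different form):

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Map;
    import java.util.Set;

    // Hypothetical: the client names the release snapshots it already ships
    // with; the seed node returns only the snapshot versions the client is
    // missing, to be fetched on demand when the user opens the historical
    // trade statistics view.
    class TradeStatisticsSnapshotSketch {
        static List<String> missingSnapshotVersions(Set<String> clientVersions,
                                                    Map<String, byte[]> seedSnapshotsByVersion) {
            List<String> missing = new ArrayList<>();
            for (String version : seedSnapshotsByVersion.keySet()) {
                if (!clientVersions.contains(version)) {
                    missing.add(version);
                }
            }
            return missing;
        }
    }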

This approach greatly reduces the risks: if anything goes wrong we only screw up trade statistics, which are non-critical data.
If the concept of using version-based buckets works out well, it can be extended to the next data type, like account witness objects (there we don't have the concept of on-demand data, as all the data is required).

Backward compatibility:
We need to take care that nodes which have not updated still work properly. I guess there are no big issues beside potential extra load when an updated seed node requests data from a not-yet-updated seed. Here we get a bit of extra load, as the not-yet-updated seed node would deliver all the missing data (up to the hard cap limit). The updated node would ignore that historical data as it will not fit the date-check predicate. By customizing the update of seeds - assigning a seed to connect to at startup (one which has already updated) - we can minimize that load. Beside that we could use other tools like capabilities, version numbers, new data fields as flags, or an activation date to make the transition smooth.
Users who have not updated for longer than 2 months would see gaps in the trade statistics, but that is not a major issue as that data is only used for informational purposes. We can even require an update via the filter after a certain time.

I might work on that myself in a couple of weeks if I find time. Otherwise if anyone wants to pick it up, please leave a message.

@sqrrm (Member) commented Aug 7, 2020

Having discussed it and considering the risks, it seems much safer to only focus on the trade stats to begin with. The risks are very high if any of the DaoState data or account witness data is lost. Losing some trade stats data is not critical in the same way. The limited approach could still yield results, and it would also help to prove the concept.

@chimp1984 (Contributor) commented:
I started working on it: #4405. Let's move the discussion over there...

@freimair (Contributor, Author) commented Sep 1, 2020

A bit late, I know, but I can be online again (local infrastructure is fixed), and I am taking the time to comment on your discussions. Although I would suggest a call sometime, just to get somewhere.

ad rough idea

pruning data does not solve the problem.

  • it lowers usability
  • we fix a piece of code that is at its limits by throwing even more code at it, thus increasing complexity instead of decreasing it
  • it adds a lot of complexity down the road as we have to handle missing data throughout Bisq
  • once Bisq gets more successful, we need to start pruning data a lot more rigorously (one month, 2 weeks?)

ad another way to do it

  • we cannot use time (as you guys already realized)
  • the only thing we control is releases. Someone throws a number of messages into a bucket and puts a stamp on it. Then we ship this bucket. No issues with later additions; not even incomplete buckets are an issue, because the network keeps the databases synced anyway.
  • The checkpoint terminology reveals the centralisation issue. That is correct in a couple of ways:
    • reveals, because the issue is there already, cannot be discussed away
    • we have to deal with it eventually
    • by leaving it untouched, we do not move and cannot deal with "it"
    • we cannot solve all and everything now

ad backwards compatibility

  • seed nodes communicate using the old mechanics. Nothing changed there, because of all the reasons you mentioned and more.

ad if any of the DaoState data is lost

  • DaoState data isn't touched by this PR at all, and although the massive data store causes real issues, it does not have anything to do with the 3/4 problem

ad losing any data in general

  • no data is deleted, no pruning, no shuffling around.
  • the only thing that happens is moving data out of the live database once a new bucket arrives (through an app update), thus reducing the disk IO load

ad Having discussed it and considering the risks it seems much safer to only focus on the tradestats to begin with.

  • focusing on tradestats is possible if you guys so desire
  • however, I do not understand why there is this fear of losing data because of this PR. Yet pruning data - i.e. intentionally losing data - seems to be a way forward?
  • I do, however, fully understand, and have seen it happen, that we lose data once in a while because the current implementation cannot cope with the sheer amount of data anymore (because e.g. it takes too long to write to disk and files get clipped [which, by the way, is addressed by this PR as well, at least for non-DAO stuff])
  • I fully understand that we deny access to Bisq to users who do not own a powerful dev machine
  • I fully understand that we deny access to Bisq to users who do not have decent broadband internet access

and finally

I am fully aware that there is a lot of technical debt in these parts of the code. And the debt needs to go. But we cannot eliminate this debt in one step (well, actually, we might be able to pull it off, but that would be a complete rewrite of the p2p module, with all the risks that come with touching this part of the code). And it seems like we are discussing stuff over and over again, not coming to a consensus for months. Meanwhile, the actual request size has grown beyond 6 MB by now and we are denying more and more users access to the Bisq network (and thereby losing them).

The actually bad thing here is that the code stayed as is: no technical debt added, none removed, nothing changed. Just a massive waste of time.

@freimair force-pushed the reduce_initial_request_size branch from 3542e27 to 5012aad on September 12, 2020 11:14
boring-cyborg bot added the in:altcoins and is:no-priority labels Sep 12, 2020
@freimair force-pushed the reduce_initial_request_size branch from 5012aad to 76cc37e on September 12, 2020 11:26
I thought I could get by using a fixed version string so we can
have tests involving future data stores...
@chimp1984 (Contributor) left a comment

NACK

The basic concept is OK, but many implementation elements are not acceptable IMO. Especially the hack of encoding the version into the payload of the hashes instead of adding a new protobuf field to the request message.

I suggest that we use #4519, which is based on this PR but has a lot of changes. It is still WIP and not much tested; it should be ready for review in a few days...

@chimp1984 (Contributor) commented:

@sqrrm @ripcurlx Can you close this, as it was replaced by a merged PR?
