[discussion] A theoretical estimation of Oxen chain's storage size vs Session message storage requirement #480
Comments
Good points. My view is that message storage grows almost linearly with Monthly Active Users (MAU), but that there's an exponential relationship between MAU and storage size, though it's probably not quadratic. So my primary case is closer to your lower bound estimation, in which case increasing storage does allow us to scale significantly. Our storage servers currently allocate only 3GB for messages. I'd like to see this increase with any space freed by removing the blockchain. I suspect clients aren't optimizing storage use. We can likely reduce both the size and frequency of messages to save space. @venezuela01, this could be worth investigating. Additionally, with Lokinet integration, we could move many messages to the P2P layer if both clients are online, reducing storage needs. If we reduce message size or the number of stored messages, this could lower the exponent in our growth. Also, it might help to reduce our swarm sizes, which seem too big and cause data over-replication. The Service Node network is more stable than we thought, so reducing swarm size to 3-4 nodes could increase storage. But this requires more thought and analysis of historical deregistration events.
Thanks for the review!
I don't understand what you mean here, isn't |
Sorry, this is a strange way of phrasing things. My essential claim is that I do think Service Node storage size grows exponentially with MAU, but that the relationship is not quadratic. The relationship is something between linear and quadratic, and it can be further controlled via the measures I suggested in my previous comment.
Maybe you didn't use the right term? If you agree that We can say "y grows as a power of x" or "y grows polynomially with x." if An exponential relationship is Could you clarify? @KeeJef |
Sorry, you are correct, it is a 'power' relationship I'm trying to describe, where the growth relationship of storage size to user count can be described by a power function where the exponent r is greater than 1 but less than 2, i.e. it's more than linear but less than quadratic.
That makes sense to me, and it's important for my core argument. Maybe I didn't explain clearly enough, so let me try again harder:
I share the same observation that the relationship is close to the lower bound estimation, where the exponent My original argument is, and I still insist, if 'r' is small, we don't have to worry about storage too early, and we don't need to delete the legacy Oxen chain in a rush. Assume a service node operator follows the official guideline with 40 GB of disk storage in total; then they will have 10GB of storage available at the moment, according to my practical observation of my own real nodes. Now let's have a look at the same table in comment 1:
Let's denote the current year as the 1st year. Assume that after 4 years, in the 5th year, MAU has grown 16x, reaching 11.2M; then we will need 8GB of disk storage for Session messages. In other words, we still have 4 years until we need to worry about message storage, even if we don't remove the legacy Oxen chain. And if we really have 11.2M users, I believe the majority of Oxen service node operators will be happy and proud to upgrade their hardware to support more potential users in the future. Now let's assume we decide to remove the Oxen chain to free space for Session messages at the end of the 5th year. But if MAU keeps growing 2x a year, in the 6.5th year (between 16GB and 32GB required), we will exhaust all 20GB of space freed by removing the legacy Oxen chain, and we have to upgrade hardware anyway. In other words, removing the legacy Oxen chain can only save us around one year and postpone the hardware upgrade by one year. My core argument is, if we believe The exponent And if we believe that just one year after deleting the legacy Oxen chain we will have to upgrade hardware anyway, then freeing up space should not be a primary consideration in the decision to clean up the legacy Oxen chain; otherwise we might regret it merely one year after deleting the chain.
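The timeline arithmetic above can be sketched roughly as follows. This is only an illustration under the assumptions stated in this thread (about 500 MB of message storage per node today, MAU doubling every year, storage following a power function with exponent `r`); the function names are hypothetical and not part of any Oxen tooling.

```python
import math

BASE_STORAGE_GB = 0.5  # ~500 MB per node today, from the stats quoted in this thread
YEARLY_GROWTH = 2.0    # assumption: MAU doubles every year


def storage_gb(year: float, r: float = 1.0) -> float:
    """Projected message storage in GB for a given year (year 1 = today)."""
    mau_multiple = YEARLY_GROWTH ** (year - 1)
    return BASE_STORAGE_GB * mau_multiple ** r


def year_reaching(budget_gb: float, r: float = 1.0) -> float:
    """Fractional year in which projected storage first reaches budget_gb."""
    doublings = math.log2(budget_gb / BASE_STORAGE_GB) / math.log2(YEARLY_GROWTH)
    return 1 + doublings / r


if __name__ == "__main__":
    # Linear assumption (r = 1): storage needed in year 5, and when 20 GB runs out.
    print(f"year 5 storage: {storage_gb(5):.1f} GB")
    print(f"20 GB exhausted around year {year_reaching(20.0):.1f}")
```

With `r = 1` the 20 GB budget is reached between the 6th and 7th year, which matches the "16GB to 32GB" bracket described above; a larger `r` only pulls that date earlier.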
I don't understand, is it a hardcoded value in the source code? Can we allocate more storage for messages immediately, considering there are still 10GB available on a server with 40GB of storage?
As a conclusion, I still believe that before that critical time window, the message storage requirement is too small to worry about yet. After that critical time window, the message storage requirement suddenly becomes so large that removing the Oxen chain contributes very little to it. This critical time window could be as short as roughly one year if MAU grows 2x in a year, regardless of whether the exponent
I like the research direction you suggest, and I'm really keen to contribute. I think this is a great idea, and I'll put it on my TODO list and share any observations I find.
Agree, that's a great point as well.
I agree that reducing message size or the number of stored messages could save us more space, but mathematically, it will only reduce a constant factor multiplying in our formula. It won't change the exponent 'r', according to my understanding of the 'Power Law'. I had real-world experience working with the 'Power Law' in other products with over 10M visitors per month, and I think my understanding of the 'Power Law' is practical. The exponent 'r' is not determined by the actual size of the messages; it's defined by the "density" of the network, in other words, it's determined by how likely a user is going to chat with someone they are less familiar with.
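A small sketch of the point about constant factors, assuming storage follows `storage = c * users ** r`. The values of `c` and `r` below are hypothetical, chosen only for illustration: halving `c` (for example by halving message size) halves storage at every user count, but the exponent recovered from a log-log slope stays the same.

```python
import math


def storage(users: float, c: float, r: float) -> float:
    """Power-function model: storage = c * users ** r."""
    return c * users ** r


def fitted_exponent(u1: float, s1: float, u2: float, s2: float) -> float:
    """Exponent implied by two (users, storage) points: the slope on a log-log plot."""
    return math.log(s2 / s1) / math.log(u2 / u1)


if __name__ == "__main__":
    r = 1.5  # hypothetical exponent between 1 and 2
    for c in (1.0, 0.5):  # 0.5 models messages that are half the size
        s1 = storage(700_000, c, r)
        s2 = storage(1_400_000, c, r)
        print(f"c={c}: fitted exponent = {fitted_exponent(700_000, s1, 1_400_000, s2):.3f}")
```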
Agree, I'll also investigate and share my findings. |
It's currently a limit (3.5GB, not 3GB) on the sqlite max page count set at startup.
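For readers unfamiliar with this mechanism, here is a minimal sketch of how a size cap can be expressed through SQLite's `max_page_count` pragma. It is not the storage server's actual code: the helper name is hypothetical, and treating 3.5GB as 3.5 GiB is an assumption.

```python
import sqlite3

LIMIT_BYTES = int(3.5 * 1024**3)  # assumption: "3.5GB" interpreted as 3.5 GiB


def apply_size_cap(conn: sqlite3.Connection, limit_bytes: int) -> int:
    """Cap the database size via max_page_count; return the effective cap in bytes."""
    page_size = conn.execute("PRAGMA page_size").fetchone()[0]
    max_pages = limit_bytes // page_size
    # The pragma returns the page limit that is actually in effect afterwards.
    applied = conn.execute(f"PRAGMA max_page_count = {max_pages}").fetchone()[0]
    return applied * page_size


conn = sqlite3.connect(":memory:")
cap = apply_size_cap(conn, LIMIT_BYTES)
print(f"effective cap: {cap / 1024**3:.2f} GiB")
```

Once the cap is reached, further inserts fail with a "database or disk is full" error, so raising the limit amounts to setting a larger page count at startup.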
This is intuitive to me, though I'd offer a slightly different explanation: when the network is small, new users add both a source of new messages and a positive feedback effect on existing users, who have a new person to message. The latter effect gets smaller as the network gets bigger, though, because my time spent sending messages is closer to saturated (and so new contacts just squeeze out how much I message other contacts). I'd expect the effects to look more quadratic at early stages and trend towards linear as the network grows.
Agree, this could happen in a payment network as well.
ORC-8 (The Session Network Token) suggests the potential to save storage by removing the legacy Oxen chain, thereby freeing up more space for Session messages. This discussion aims to provide theoretical analysis to aid the development team in making a balanced decision. It is posted as a separate issue to prevent hijacking the original thread with off-topic discussions.
TL;DR: Saving storage by removing the legacy Oxen chain isn't as efficient a strategy as it first appears.
According to the official guidelines [1], the minimum storage size requirement for a service node is 40 GB. The main uses of storage are the Oxen chain and the Session message database. Removing the Oxen chain could free up about 20 GB, which accounts for about 50% of total space. Although this seems significant at first glance, upon closer inspection, it may not be as critical as it appears.
There are approximately 700k Monthly Active Users (MAU) at present, or about 2k biweekly users per 'fat' swarm. The storage server stats log shows approximately 500 MB of user message storage on a service node with a 14-day TTL. Compared to the total disk space, this represents only a tiny fraction (1.25%). This suggests that storage size won't become a bottleneck for Session message storage in the short term.
But what about the mid to long term? Over the past few years, Session's user base has grown rapidly. We might want to assume a potential exponential growth rate for the next few years, an assumption that is both simple and practical, based on the growth history of other successful messenger apps.
The tricky part is that the growth rate of message storage size is unknown, but we can make some assumptions.
One naive assumption is that message storage size is proportional to the number of users, serving as a lower bound.
A more aggressive assumption is that message storage size grows much faster, proportionally to the square of the number of users. In real-world scenarios, users are often divided into many small 'villages,' each with a high internal connection density. Meanwhile, connections between users from different villages are fewer. Within 'villages' of `village_size` users, there are `village_size*(village_size-1)/2` pairs of connections at most, resulting in an O(village_size^2) total connection count. When we aggregate all the villages, we get a network that is locally dense and globally sparse, where the number of connections is approximately O(user_count^2) with an extremely small constant factor like 0.0025. Additionally, if we assume message_storage_size ~ O(message_count) ~ O(connection_count), then we conclude that message storage size grows proportionally to the square of the number of users.
Although the square assumption is somewhat arbitrary, it isn't baseless. Many networks exhibit a "Power Law" pattern [2], where in the case of social connections, the exponent might be a number between 1 and 2. In other words, message storage size might grow with O(user_count^r), where r is a number between 1 and 2. Thus, we use O(user_count^2) as an educated guess for the upper bound growth rate.
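The pair-count formula for a single village can be checked with a few lines. This is just a sketch of the `village_size*(village_size-1)/2` expression; the function name is hypothetical.

```python
def max_pairs(village_size: int) -> int:
    """Maximum number of distinct user pairs inside one village."""
    return village_size * (village_size - 1) // 2


if __name__ == "__main__":
    for size in (10, 100, 1000, 2000):
        print(f"village_size={size:>5}: at most {max_pairs(size):>9} pairs")
```

Doubling the village size roughly quadruples the pair count, which is the O(village_size^2) behaviour behind the square assumption.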
Lower Bound Estimation of Message Storage Size: Estimated Biweekly Message Storage Size Per ‘Fat’ Swarm Under Linear Assumption
Upper Bound Estimation of Message Storage Size: Estimated Biweekly Message Storage Size Per ‘Fat’ Swarm Under Square Assumption
If we look at the above table, when the network is small, message storage size is also small compared to the Oxen chain size. When the network is large, message storage size greatly exceeds the Oxen chain size. For example, under the square assumption, if the Session user base grows 4x to 2.8M, we will need 8GB of message storage. It might make sense to remove the legacy Oxen chain to free up ~20GB. However, after another 2x growth to 5.6M MAU, we suddenly need 32GB of storage for messages, at which point the space saved from the Oxen chain is no longer sufficient, and operators have to upgrade hardware eventually. In other words, there is a very short critical time window where freeing up storage space by deleting the Oxen chain makes sense. Before that critical time window, the message storage requirement is too small to worry about yet. After that critical time window, the message storage requirement suddenly becomes so large that removing the Oxen chain contributes very little to it. This critical time window could be as short as one year if the network grows 2x in a year.
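The figures quoted in this paragraph can be reproduced with a short sketch, assuming the baseline of roughly 700k MAU and 500 MB of message storage given earlier, and a pure power function `storage = base * (MAU / base_MAU) ** r`. These are theoretical numbers for illustration, not measurements, and the function name is hypothetical.

```python
BASE_MAU = 700_000     # approximate current MAU, from the post
BASE_STORAGE_GB = 0.5  # ~500 MB of message storage per node, from the post


def projected_storage_gb(mau: float, r: float) -> float:
    """Projected per-node message storage in GB for a given MAU and exponent r."""
    return BASE_STORAGE_GB * (mau / BASE_MAU) ** r


if __name__ == "__main__":
    print(f"{'MAU':>12} {'linear (r=1)':>14} {'square (r=2)':>14}")
    for multiple in (1, 2, 4, 8, 16):
        mau = BASE_MAU * multiple
        linear = projected_storage_gb(mau, 1)
        square = projected_storage_gb(mau, 2)
        print(f"{mau:>12,} {linear:>11.1f} GB {square:>11.1f} GB")
```

Under the square assumption this gives 8 GB at 2.8M MAU and 32 GB at 5.6M MAU, so the ~20 GB freed by removing the chain is overtaken within a single doubling.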
Regardless of whether we assume linear growth of message storage requirement (lower bound), square growth of message storage requirement (upper bound), or any other growth rate in between, the conclusion doesn't change much. As a legacy chain, the Oxen chain storage size will likely stabilize, but as a rapidly growing social network, Session's message storage requirement will increase rapidly.
On the other hand, storage is relatively cheap compared to other computational resources. In the history of the IT industry, storage costs have consistently decreased, so it shouldn't be a bottleneck in our use case from a budget management perspective.
Note: I have spent time analyzing the historical status logs of the Oxen storage server, and it turns out that the relationship between user count and message storage size is quite complicated. The above numbers are theoretical and are used for ease of explanation rather than accurate prediction.
[1] https://docs.oxen.io/oxen-docs/using-the-oxen-blockchain/oxen-service-node-guides/full-service-node-setup-guide
[2] https://en.wikipedia.org/wiki/Power_law