short names in collection: smaller state.json #167
base: release/8.8
…e in the end so that other nodes get a single refresh message
…cene-solr into noble/prsCollCreationBug
…rom overseer. There is a follow up PR to avoid updates to state.json for all updates
Some thoughts for compact versions
Then see if we can reduce the overall size of state.json to ~1 MB for 4096 shards with NRT+Pull replicas. With indent size set to 0, 4096 shards with NRT takes … It would be good to make a one-page writeup.
My two cents is that we probably don't need to go this far in terms of changes. With compression we can pretty easily reduce the size of state.json to something reasonable, and it is a simpler change in my view. What is the case for making these more invasive changes if compression solves the size-over-the-network issue?
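To get a rough feel for the compression argument, here is a minimal sketch that builds a synthetic state.json-like document and compares its raw and gzip sizes. The collection name `demo`, the host names, and the exact set of replica properties are assumptions for illustration; real output from Solr will differ, so treat the printed ratio as indicative only.

```python
import gzip
import json


def build_state(num_shards):
    """Build a synthetic state.json-like dict with one NRT replica per shard."""
    shards = {}
    for i in range(1, num_shards + 1):
        shards[f"shard{i}"] = {
            "range": "80000000-7fffffff",  # placeholder value, not a real per-shard hash range
            "state": "active",
            "replicas": {
                f"core_node{i}": {
                    "core": f"demo_shard{i}_replica_n{i}",
                    "node_name": "host1:8983_solr",       # hypothetical node
                    "base_url": "http://host1:8983/solr",  # hypothetical URL
                    "state": "active",
                    "type": "NRT",
                }
            },
        }
    return {"demo": {"shards": shards}}


def sizes(num_shards):
    """Return (raw_bytes, gzip_bytes) for the synthetic state with indent 0."""
    raw = json.dumps(build_state(num_shards), separators=(",", ":")).encode("utf-8")
    return len(raw), len(gzip.compress(raw))


if __name__ == "__main__":
    raw, packed = sizes(4096)
    print(f"raw={raw} bytes, gzip={packed} bytes, ratio={packed / raw:.2%}")
```

The repeated property names are exactly what gzip handles well, which is why shortening keys and compressing overlap in what they save.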
Agreed, compression should be enough! The only thing is that we will lose the text format, which is very helpful for debugging any issue. Sometimes we may need to edit state.json, which is very convenient with zk-shell.
The changes in this PR are 100% backward compatible. Solr does not really care if the name of shard is
useful. But this requires changes in reading
That's mostly achieved in this. We need to prefix the collection name to avoid name collisions within a node.
Yes. A combination of compression and this can take us pretty far. The next level of optimization has to be done on the memory footprint of parsing and storing this object in memory. We should avoid using the
We need a compact format to store less data in ZK. When reading the data, it should remain as it is today; verbose names are very useful for logging and debugging. That means we can update the replica and slice classes to translate while serializing/deserializing state.json. Having said that, if we can't shrink it by 50% or so, then there is not much value. Compression works very well with state.json; my only concern is that it is not a text format, and we look at the state.json file every day: we go to zk-shell and look at various data.
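The "compact in ZK, verbose in memory" idea can be sketched as a key translation applied only at the serialization boundary. This is a rough illustration, not the PR's implementation: the short-key table below is hypothetical, and the sketch assumes no existing property already uses one of the short keys. Shard and replica names are left untouched so a shard named `st` is not confused with the short key for `state`.

```python
import json

# Hypothetical long -> short property keys; the PR does not define this table.
SHORT_KEYS = {
    "shards": "s", "replicas": "r", "range": "rg", "state": "st",
    "core": "c", "node_name": "n", "base_url": "b", "type": "t",
}
LONG_KEYS = {short: long for long, short in SHORT_KEYS.items()}
# Properties whose children are keyed by user-chosen names (never renamed).
NAME_KEYED = {"shards", "replicas"}


def _rename(obj, compacting, keys_are_names):
    """Rename property keys recursively, skipping levels keyed by names."""
    if not isinstance(obj, dict):
        return obj
    out = {}
    for key, value in obj.items():
        if keys_are_names:
            # Collection, shard, or replica name: keep it, rename its properties.
            out[key] = _rename(value, compacting, False)
            continue
        long_key = key if compacting else LONG_KEYS.get(key, key)
        new_key = SHORT_KEYS.get(key, key) if compacting else long_key
        out[new_key] = _rename(value, compacting, long_key in NAME_KEYED)
    return out


def to_compact_json(state):
    """Serialize the verbose in-memory state with short keys and no indent."""
    return json.dumps(_rename(state, True, True), separators=(",", ":"))


def from_compact_json(text):
    """Parse compact JSON back into the verbose in-memory form."""
    return _rename(json.loads(text), False, True)
```

With this shape, logging and debugging code keeps seeing the verbose names, and only the bytes written to ZK change. The trade-off raised above still applies: anyone reading the znode directly in zk-shell sees the short keys.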
Yes. But do we ever log the replica name anywhere? Even if we do, is
Just by shortening the replica name, we are saving >55% in PRS states.
We can reduce the PRS state size only if we reduce the size of the replica name (core node name).
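To see why the replica name dominates PRS size, here is a small sketch that measures entry lengths. It assumes each per-replica state entry is laid out as `name:version:state[:L]`; that layout, the `core_nodeN` names, and the short `rN` names are assumptions for illustration and are not taken from this PR.

```python
def prs_entry(replica_name, version, state, is_leader):
    """Format one per-replica state entry as name:version:state[:L] (assumed layout)."""
    entry = f"{replica_name}:{version}:{state}"
    return entry + ":L" if is_leader else entry


def total_bytes(names):
    """Total UTF-8 size of one active leader entry per replica name."""
    return sum(len(prs_entry(name, 1, "A", True).encode("utf-8")) for name in names)


def savings(long_names, short_names):
    """Fraction of bytes saved by switching from long_names to short_names."""
    return 1 - total_bytes(short_names) / total_bytes(long_names)


if __name__ == "__main__":
    long_names = [f"core_node{i}" for i in range(1, 11)]
    short_names = [f"r{i}" for i in range(1, 11)]  # hypothetical short names
    print(f"savings: {savings(long_names, short_names):.1%}")
```

Everything after the name is only a few characters in this layout, so the name is the only part with room to shrink. The actual percentage depends on the real naming scheme.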
I would leave it as it is unless we see some major gain! It is more readable.
PoC: do not merge
compact=true
while creating a collection. This is an opt-in feature.
Sample collection with 10 shards and replication factor = 1:
There is a 10% saving in state.json,
and PRS data has a saving of >55%.
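For anyone trying the PoC, this is roughly how the opt-in flag might be passed on a Collections API CREATE call. The `compact` parameter exists only in this PoC branch, and the base URL and collection name `demo` are hypothetical; the sketch only builds the URL and does not send a request.

```python
from urllib.parse import urlencode


def create_collection_url(base_url, name, num_shards, replication_factor, compact):
    """Build a Collections API CREATE URL, optionally with the PoC compact flag."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
    }
    if compact:
        params["compact"] = "true"  # PoC-only flag from this PR, not a released parameter
    return f"{base_url}/admin/collections?{urlencode(params)}"


if __name__ == "__main__":
    # Matches the sample above: 10 shards, replication factor 1.
    print(create_collection_url("http://localhost:8983/solr", "demo", 10, 1, True))
```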
sample
state.json
PRS data