-
Notifications
You must be signed in to change notification settings - Fork 61
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Multi tree column option #232
Conversation
…e tree insertion work with commit queue.
…sed node sharing.
…built using that tree by deferring the removal.
I started looking at the PR, but it is big a big one (I did not look at the bench part yet), so I am summing up here my initial questions.
|
Thanks for the comment. Will try and answer.
The TreeReader stuff is there because we’re using Value table addresses directly as node addresses. Hence we need some way for a client to signal that it’s using nodes from an existing tree and the Db shouldn’t touch them (move or remove). This is what the TreeReader does.
The reference count table and cache are purely optimizations so yes could be in a separate PR. They only store reference counts > 1 so are expected to be sparse (most nodes have ref count of 1).
Yeah, this is just an optimization so we don’t have to store (and access) ref counts for most of the nodes at all. The first implementation just used the already existing ref counting that Value tables have.
Only the roots have entries in the index table.
When commiting a tree it ‘claims’ value table entries immediately (in commit) which can be done using the table header data and the free list without reading or writing any table data or log overlays. You’re right that the ordered list isn’t actually used yet! That would be used in the future for optimizing trees by laying out nodes contiguously.
Thanks, I’ll have a look at that! |
thanks for the replies,
I don't know if we need this, an error sounds fine to me. But ok, I guess I got it: if user try to access at a given address A and in between it got deleted and used again (very likelly due to the way we use the address back), then it will get an incorrect result but would not know it. I was just thinking it would get an error but I was wrong. I guess the tree viewer make sense (I could think of checking root presence in some cache every time we access node, but it is certainly better to just update an atomic over reader when we drop the tree, and then just return an error to get_node if tree_viewer is deprecated. So for a start I would find it easier to go with a treeviewer that do not defer commit, commit will just make all treeviewer session invalid.
I did not think about it, can be sparse indeed.
I understand that putting the reference count in their own table can be good, but I would have expected the table to get indexed indentically to the values tables, is it?
Then what happens for the reference count?? I am quite sure I don't get the reference count model, and the need to reindex it.
👍 |
Right. The client would have to cope with potentially failed commits? I think the Db would still need to track which TreeReaders were active when a dereference happened.
It uses a hash of the Value table address. This is because only a small proportion of nodes need a reference count. So didn’t want a ref count entry for every node.
Roots act like a standard Hash column entry. They have an index table entry and a Value table entry with potentially ref counting or not depending on if the client code wanted ref counted roots.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I finally get how the rc table works .
I tried to slimdown the comment I did at the time, especially the incorrect ones, but some may have slip through.
admin/src/lib.rs
Outdated
db_options.columns.push(info_column); | ||
} | ||
|
||
let info_column = &mut db_options.columns[1]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could use multitree_bench::INFO_COLUMN rather than 1 and 2 .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
src/db.rs
Outdated
#[derive(Debug, PartialEq, Eq)] | ||
pub enum NodeChange { | ||
/// (address, value, compressed value, compressed) | ||
NewValue(u64, RcValue, RcValue, bool), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't compressed defined per column ? (if so would be better to not have it here)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is whether the specific value was actually compressed or not (eg, if compressed length isn't smaller than the initial value then it is not compressed).
Though values are never compressed right now as discussed elsewhere so I could just remove it for now and store a single value.
src/table.rs
Outdated
@@ -698,6 +749,11 @@ impl ValueTable { | |||
last_removed, | |||
); | |||
self.last_removed.store(next_removed, Ordering::Relaxed); | |||
if let Some(mut free_entries) = free_entries_guard { | |||
let last = free_entries.stack.pop().unwrap(); | |||
assert_eq!(last, last_removed); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't it be possible to skip reading last_removed here? (and remove assert_eq).
Otherwise would replace assert_eq by a debug_assert here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this actually used (or is claim_next_free
mainly use)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made it debug_assert. next_free does need to maintain the free list correctly as root nodes can share value tables with child nodes and root nodes still use standard committing (with next_free).
src/table.rs
Outdated
let last_removed = self.last_removed.load(Ordering::Relaxed); | ||
let index = if last_removed != 0 { | ||
let last = free_entries.stack.pop().unwrap(); | ||
assert_eq!(last, last_removed); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
assert_eq!(last, last_removed); | |
debug_assert!(last == last_removed); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
} | ||
} | ||
} | ||
// TODO: Remove TreeReader from Db. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not too sure about the meaning of this TODO?
Thanks. Will have a look at them and fix/reply. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally this should've been a --tree
option to the strees command, rather than a separate command. I'm fine with this version for now, but consider merging them in the future.
src/column.rs
Outdated
1 + data.len() + num_children as usize * 8 | ||
} | ||
|
||
pub fn pack_node_data(data: Vec<u8>, child_data: Vec<u8>, num_children: u8) -> Vec<u8> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, the optimal layout should be [data][children][num_children]. This way unpacking function can just shrink the input vec and avoid a memmove. Packing would also be appending to data
and returning it.
src/table.rs
Outdated
@@ -1000,6 +1124,11 @@ impl ValueTable { | |||
} | |||
|
|||
pub fn complete_plan(&self, log: &mut LogWriter) -> Result<()> { | |||
let _free_entries_guard = if let Some(free_entries) = &self.free_entries { | |||
Some(free_entries.write()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't read lock suffice here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, read should work here. Done.
src/column.rs
Outdated
let mut data_buf = [0u8; 8]; | ||
data_buf.copy_from_slice(&address.to_le_bytes()); | ||
data.append(&mut data_buf.to_vec()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let mut data_buf = [0u8; 8]; | |
data_buf.copy_from_slice(&address.to_le_bytes()); | |
data.append(&mut data_buf.to_vec()); | |
data.extend_from_slice(&address.to_le_bytes()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
src/db.rs
Outdated
} | ||
if !self.options.columns[col as usize].append_only && external_call { | ||
return Err(Error::InvalidConfiguration( | ||
"get_node can only be called on a column with append_only option.".to_string(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be allowed for now. We won't be using the TreeReader
in substrate initially
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There won’t be any guarantee that the NodeAddress is still valid though, as it might have been removed. The Db won’t even be able to warn if it has happened.
Is there some external guarantee that this won’t happen?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, in substrate there's a higher level state pinning mechanism. It prevents the tree from being deleted while there are active readers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. Should I add an option so the client has to choose to forego any checks/guarantees?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added the option.
old_bytes: usize, | ||
old_id: u64, | ||
new_id: Option<u64>, | ||
) -> Result<()> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternative solution is to keep nodes referenced by TreeReaders in a shared memory storage. Once the node that's referenced by any of the live nodes in the cache is deleted from the disk, it is evicted to the memory storage. Something to cosider doing later.
For now I'd agree with @cheme. It looks like we can get away with simply detecting reading a dereferenced tree and producing an error.
@@ -1499,6 +2135,78 @@ impl IndexedChangeSet { | |||
} | |||
*ops += 1; | |||
} | |||
for change in self.node_changes.iter() { | |||
match change { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it possible to move all that code and write_dereference_children_plan
to the column module?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried this but it's a bit messy as the code potentially needs to create TreeReaders and they need DbInner.
} | ||
|
||
#[cfg(test)] | ||
mod test { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be good to get some tests working
I think at the end of the review I got accustomed to this usage, so ok for keeping key then. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like most concerns are addressed.
Maybe the one with passing db two time in function:
I drafted this branch here for it:
https://github.com/MattHalpinParity/parity-db/compare/multi_tree...cheme:parity-db:cheme/multi-tree?expand=1
specifically MattHalpinParity@f2c8dce
but it is a bit verbose (lot of .0 added), maybe just changing proto of
fn fn_name(&self, db: &Arc, .....
to
fn fn_name(db: &Arc, .....
would be better (even if then we have some Self::fn_name calls instead of self.fn_name calls).
(I will be updating to this branch on one of my polkadot sdk, and try to open a draft pr as soon as)
return Err(Error::InvalidValueData) | ||
} | ||
let data_len = data.len() - (child_buf_len + 1); | ||
let mut children = Children::new(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let mut children = Children::new(); | |
let mut children = Children::with_capacity(num_children); |
tier_index: &mut HashMap<usize, usize>, | ||
node_values: &mut Vec<NodeChange>, | ||
data: &mut Vec<u8>, | ||
) -> Result<()> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would almost propose to return impl Iterator<Item = Address> by using Iterator::from_fn (not passing data). This way we can keep the pack_data function (next to unpack it makes it easy to check format is consistent).
But I am not sure if it make things easier to follow here.
I'd rather just use |
Issue #199
This introduces a multitree option for columns with related tree commit operations as well as querying functions.