Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update Shamap doc etc. #3300

Closed
wants to merge 2 commits into from
Closed

Conversation

HowardHinnant
Copy link
Contributor

@HowardHinnant HowardHinnant commented Mar 13, 2020

This contains two parts:

  1. Update the README in the SHAMap module.
  2. Rename canonicalize into two separate functions.

In the event that duplicate data is discovered, the latter renaming clarifies which duplicate gets replaced.

Addresses https://ripplelabs.atlassian.net/browse/RIPD-187

@carlhua carlhua requested a review from nbougalis March 13, 2020 18:59

The SHAMap is a Merkle tree (http://en.wikipedia.org/wiki/Merkle_tree).
The SHAMap is also a radix tree of radix 16
(http://en.wikipedia.org/wiki/Radix_tree).

*We need some kind of sensible summary of the SHAMap here.*
A SHAMap is a tree with two node types:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
A SHAMap is a tree with two node types:
A SHAMap is a trie with two node types:

I'm not sure which nomenclature is better, but as far as I know a SHAmap has more characteristics of a trie than a tree.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should be trie 👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll make this substitution throughout the doc.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure which nomenclature is better, but as far as I know a SHAmap has more characteristics of a trie than a tree.

@carlhua @MarkusTeufelberger @HowardHinnant
A trie is a type of tree. I think tree is actually more appropriate and correct here, because a SHAMap is a tree that has properties of a trie as well as a merkle tree. If we say a SHAMap is a trie, we are ignoring the merkle tree aspect of the SHAMap. I vote for tree here, and throughout the document, unless we are specifically talking about the aspects of the SHAMap that are a trie.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which property of a merkle tree are you referring to?

Copy link
Contributor

@cjcobb23 cjcobb23 Mar 16, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The hashing. Each node has a hash that is derived from the hashes of its children.

transactions to the last closed ledger. To do so we'd make a mutable snapshot
of the state tree and then start applying transactions to it. Because the
snapshot is mutable, changes to nodes in the snapshot will not affect nodes in
other SHAMAps.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
other SHAMAps.
other SHAMaps.

Copy link
Contributor

@carlhua carlhua left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update @HowardHinnant . I think the new information is extremely valuable in understanding how SHAMap works. I have left some comments asking for more details. Aside from those comments, I have one summary comment, and maybe this does not belong in shamap but regardless I think its important to consider: It was trivial to me why shamap is designed this way because I had previously stuided Etherem's implementation. This can be rather confusing for first time readers since I remember that it took me a while to understand the rationale on the design. I think it would be tremendously helpful if we can add two things to the documentation:

  1. The design rationale - why do we need shamap and how does it save us space? I think the "why" is important here.
  2. A diagram that depicts the ledger transformation from N to N+1 with details break down similiar to something like this https://ethereum.stackexchange.com/questions/51095/trie-radix-trie-patricia-and-merkle-patricia-trie

I understand 2 might not belong in here, but I think this would really help us understand how shamap works @cjcobb23

src/ripple/shamap/README.md Show resolved Hide resolved
src/ripple/shamap/README.md Outdated Show resolved Hide resolved
src/ripple/shamap/README.md Show resolved Hide resolved
src/ripple/shamap/README.md Show resolved Hide resolved
src/ripple/shamap/README.md Outdated Show resolved Hide resolved
@HowardHinnant
Copy link
Contributor Author

  1. The design rationale - why do we need shamap and how does it save us space? I think the "why" is important here.

I've added a little bit more to the introduction about this.

  1. A diagram that depicts the ledger transformation from N to N+1 with details break down similiar to something like this https://ethereum.stackexchange.com/questions/51095/trie-radix-trie-patricia-and-merkle-patricia-trie

I understand 2 might not belong in here, but I think this would really help us understand how shamap works @cjcobb23

This would be the first image file we place into the rippled repository. To-date, we don't have supporting .png or .jpg to the best of my knowledge.

The SHAMap is less complex than that depicted at https://ethereum.stackexchange.com/questions/51095/trie-radix-trie-patricia-and-merkle-patricia-trie . It has only two kinds of nodes:

  1. The inner node which has the 16 branches.
  2. The leaf node which has the data.

This makes traversal simple, but we sacrifice size because common prefixes aren't compacted into a single node as was done for the Ethereum trie. We did have such an optimization coded up at one time. But we decided not to deploy it. It was known as SHAMapV2.

@codecov-io
Copy link

codecov-io commented Mar 16, 2020

Codecov Report

Merging #3300 into develop will decrease coverage by 0.00%.
The diff coverage is 70.83%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop    #3300      +/-   ##
===========================================
- Coverage    70.62%   70.62%   -0.01%     
===========================================
  Files          676      676              
  Lines        54340    54341       +1     
===========================================
- Hits         38377    38376       -1     
- Misses       15963    15965       +2     
Impacted Files Coverage Δ
src/ripple/app/ledger/impl/LedgerMaster.cpp 42.07% <0.00%> (ø)
src/ripple/app/misc/NetworkOPs.cpp 64.00% <0.00%> (ø)
src/ripple/nodestore/impl/DatabaseShardImp.cpp 0.63% <0.00%> (ø)
src/ripple/app/ledger/impl/TransactionMaster.cpp 63.63% <33.33%> (ø)
src/ripple/nodestore/impl/Database.cpp 59.45% <50.00%> (ø)
src/ripple/app/ledger/LedgerHistory.cpp 50.93% <83.33%> (ø)
src/ripple/app/ledger/Ledger.cpp 79.26% <100.00%> (ø)
src/ripple/basics/TaggedCache.h 77.90% <100.00%> (+0.24%) ⬆️
src/ripple/nodestore/impl/DatabaseNodeImp.cpp 82.60% <100.00%> (ø)
src/ripple/nodestore/impl/DatabaseRotatingImp.cpp 73.46% <100.00%> (ø)
... and 4 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4e3dc0e...b4bf1f7. Read the comment docs.

Copy link
Contributor

@cjcobb23 cjcobb23 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. The design rationale - why do we need shamap and how does it save us space? I think the "why" is important here.

I've added a little bit more to the introduction about this.

  1. A diagram that depicts the ledger transformation from N to N+1 with details break down similiar to something like this https://ethereum.stackexchange.com/questions/51095/trie-radix-trie-patricia-and-merkle-patricia-trie

I understand 2 might not belong in here, but I think this would really help us understand how shamap works @cjcobb23

This would be the first image file we place into the rippled repository. To-date, we don't have supporting .png or .jpg to the best of my knowledge.

The SHAMap is less complex than that depicted at https://ethereum.stackexchange.com/questions/51095/trie-radix-trie-patricia-and-merkle-patricia-trie . It has only two kinds of nodes:

1. The inner node which has the 16 branches.

2. The leaf node which has the data.

This makes traversal simple, but we sacrifice size because common prefixes aren't compacted into a single node as was done for the Ethereum trie. We did have such an optimization coded up at one time. But we decided not to deploy it. It was known as SHAMapV2.

@HowardHinnant @carlhua A diagram I would like to see is one that shows how nodes are shareable between SHAMaps and now the same node is reachable from different roots. Ledger N and ledger N+1 would have many of the same nodes (ledger objects that were not modified), and I assume these nodes are only written once in the database, and are reachable from more than one root. Whereas nodes that were modified are only reachable from the account state root of ledger N+1, and result in a new node (as well as its ancestors, whose hashes have changed) being written to the database.

Let me know if this request makes sense. You could probably just do this with ASCII art if you wanted, so we don't need to insert an image file.

Overall though, great job on this; it will be very useful.


## SHAMap Thread Safety ##
When a SHAMap decides that it is safe to share a node of its own, it sets the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should add a description of what it means to "share" a node. When I read this sentence, I was confused about what it meant and subsequently skimmed through the entire document looking for what it meant to "share" a node. We use "share" in various places throughout the document, so I think it would be worth adding a section about what that means.


When a SHAMap gets a copy of a node from elsewhere, and must have a private
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does it mean for a SHAMap to get a copy of a node from elsewhere? When would this happen and where is the elsewhere? Why would the SHAMap need a private copy? I think it would be useful to add an example here.

Both `unshare()` and `flushDirty` walk the SHAMap by calling
`SHAMap::walkSubTree`. As `unshare()` walks the trie, nodes are not written to
the database, and as `flushDirty` walks the trie nodes are written to the
database. `walkSubTree` visits every node in the trie. This process must ensure
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am a bit confused why walkSubTree would visit every node in the trie. My understanding is nodes that have not been modified do not need to be written to the database again. I was under the impression that walkSubTree just walked the dirty branches (the branches leading to modified/deleted/inserted nodes).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One ledger -> one SHAMap (actually one for transactions and one for account states). When the ledger closes, the entire SHAMap is written to the database. I had a sentence in there that said only some nodes are written at this time. I've removed that sentence. It now reads:

When consensus is reached, the ledger is closed. As part of this process, the
SHAMap is stored to the database by calling SHAMap::flushDirty.

the database, and as `flushDirty` walks the trie nodes are written to the
database. `walkSubTree` visits every node in the trie. This process must ensure
that each node is only owned by this trie, and so performs a COW operation
(`unshare()`) as it walks each node (from the root down). This is done by the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so performs a COW operation (unshare()) as it walks each node (from the root down).

This is a bit confusing to me, because it seems to be saying that walkSubTree calls unshare, but when I read the code, I don't see walkSubTree calling unshare() anywhere. Am I missing something in the code, or am I misunderstanding this sentence?

this time, and the child is reassigned back into the parent inner node just in
case the COW operation created a new pointer to this leaf node.

After processing each node, the node is then marked as sharable again by setting
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, I think we need a section talking about the meaning of "shareable", as the concept is a bit unclear to me while reading this doc.

The calls to `canonicalize()` make sure that if the resulting node is already in
the SHAMap, node `TreeNodeCache` or database, then we don't create duplicates.

On retrieving existing nodes, the copy in the database has top priority. If
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit confused here, as this seems to be in contradiction with the description of fetchExternalNodeNT. In that description, it says we look for the node in the cache first, and then in the database if the node is not in the cache. But here it says the copy in the database has priority over the copy in the cache. Why would we look in the database if the node is already in the cache?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The text here was in error, both with what I added, and part of the existing text is obsolete, not describing what canonicalize actually does. Greatly simplified.


As a depth-first walk of a SHAMap is performed, if an inner node answers true to
`isFullBelow()` then it is known that none of this node's children are missing
nodes, and thus that subtree does not need to be walked. These nodes are stored
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why don't we need to walk a subtree where all children are present? Maybe an example would be useful

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fullbelow cache is only used for the missing node list. It exists only to optimize the "get missing nodes" operations. When all children are present, none of them are missing.

SHAMap, the copy in the `TreeNodeCache` prevails and replaces that in the
SHAMap.

On inserting new nodes (such as new transactions or account states), the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A bit confused here. When we insert a new node, how could there be a copy of that node, since the node we are inserting is new?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rewritten. This part was incorrect.

So, if all expected data is always present, the MissingNodeHandler should
never be executed.
belongs to a historic ledger with the given (non-zero, unsharable) sequence
number. So, if all expected data is always present, the MissingNodeHandler
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be worth adding a description on what the MissingNodeHandler is and does.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this section is obsolete too, rewritting...

@HowardHinnant
Copy link
Contributor Author

I've pushed a new commit to address almost all of the comments.

I don't know how to draw the requested diagram, even on paper. I'd like to see one too.


## Sequence numbers ##

Both SHAMap's and their nodes carry a sequence number. This is simply an
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sometimes SHAMaps is pluralized with an apostrophe (which I don't favor) and sometimes not. I think SHAMap (and other type names) should appear in code font too. I would pluralize it as SHAMaps.


The nodes of a SHAMap have their own copy of a sequence number. If the SHAMap
is not immutable, meaning it can change, then all of its nodes must have the
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about "is mutable" instead of "is not immutable"?


The nodes of a SHAMap have their own copy of a sequence number. If the SHAMap
is not immutable, meaning it can change, then all of its nodes must have the
same sequence number as the SHAMap itself. This enforces an invariant that none
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this because a SHAMap only ever sees nodes from its "parent", and never a sibling (which might have the same sequence number)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SHAMaps don't have parents and siblings. The node sequence number is more or less just a sharable flag. 0 means sharable. Equal to the SHAMap's sequence number means not sharable.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no explicit parent, which is why I put it in quotes, but the majority of sites constructing a SHAMap use snapShot, where I consider this to be an implicit "parent", instead of a constructor. Is snapShot the only way that two SHAMaps can share a node? There doesn't seem to be any method to add a SHAMapAbstractNode constructed from outside the SHAMap, so there would be no way to pass a node from one SHAMap to another; the only way to share a node is for the second SHAMap to inherit it from the first because it was constructed via second = first.snapShot(isMutable).

snapshot. If the SHAMap snapshot is mutable then any of the nodes that might be
modified must be copied before they are placed in the mutable map.

A new SHAMap is created with each new ledger round. Transactions not executed
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm guessing this paragraph is specifically talking about the account state SHAMap. We're starting to slip from talking about "a SHAMap" into "the SHAMap".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe a ledger contains two SHAMaps: one for transactions, one for account states.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, but there are other SHAMaps outside of ledgers, created independently of consensus rounds.

snapshot. If the SHAMap snapshot is mutable then any of the nodes that might
be modified must be copied before they are placed in the mutable map.
snapshot. If the SHAMap snapshot is mutable then any of the nodes that might be
modified must be copied before they are placed in the mutable map.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see later that there is mention of copy-on-write, but I'm not sure why it doesn't happen here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've clarified the wording to say that it does happen here.

through the inner nodes, looking for a leaf node along a path in the trie
designated by a `uint256`.

As one walks the stack, one can *optionally* keep a stack of nodes that one has
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"walks the stack" -> "walks the trie"?

with the high 4 bits of the first byte indicating which child of the root is
chosen, the lower 4 bits of the first byte indicating the next child chosen, and
so on. See `SHAMapNodeID::selectBranch` for details of how a `SHAMapNodeID`
selects a "branch" (child) by combing a path with a depth.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My suggestion:

To assist in walking the trie, SHAMap::walkTowardsKey uses a SHAMapNodeID that identifies a node by its path from the root and its depth in the trie. The path is just a "list" of numbers, each in the range [0 .. 15], depicting which child was chosen at each node starting from the root. Each choice is represented by 4 bits, and then packed in sequence into a uint256 (such that the longest path possible has 256 / 4 = 64 steps). The high 4 bits of the first byte identify which child of the root is chosen, the lower 4 bits of the first byte identify the child of that node, and so on. The SHAMapNodeID identifying the root node has an ID of 0 and a depth of 0. See SHAMapNodeID::selectBranch for details of how a SHAMapNodeID selects a "branch" (child) by indexing into its path with its depth.

2. In the node cache.
3. In the database.

If the node is found in one of the second-two places, then it is installed into
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about "If the node is not found in the trie, then"

`snapShot`.

Sequence numbers are used to further customize the node ownership strategy. See
the section on sequence numbers for details on sequence numbers.

## SHAMap Types ##
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about renaming this section "Mutability"?

entirely evaporates when it is destroyed). Nodes, once introduced to the
immutable SHAMap, also never change their location in memory. So nodes in
an immutable SHAMap can be handled using raw pointers (if you're careful).
So, somewhat counter-intuitively, an immutable SHAMap may grow as new nodes are
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a new node is introduced, then its parent's digest must change, and so on up all the way to the root. Does that mean that when a node is introduced, the SHAMap points to a new root?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For immutable SHAMaps, the hashes never change. When a new node is added, it is because the node exists, it is just missing from the trie. The node is not a new account state or transaction. It will be found in a cache, local database or on the p2p network.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah! This might be one of the big a-ha moments I needed for the rest of this to click. A mutable SHAMap can add new nodes, while an immutable SHAMap can only fill in blanks. It wasn't clear to me before that an immutable SHAMap could have blanks (i.e. missing nodes). It might be explained further down, but I stopped reading once I was too confused. Maybe we can introduce this idea in the section where we explain mutability?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See the section titled Mutability just below the node diagram (all committed just an hour ago). If that doesn't do it for you, I'm open to suggestions. And thanks for the other wording suggestions you made. They should all be included in this update.

@cjcobb23
Copy link
Contributor

I don't know how to draw the requested diagram, even on paper. I'd like to see one too.

@HowardHinnant this is roughly what I was thinking; I threw this together quickly so it might need more detail. Let me know if this makes sense. I made it on draw.io.

editable diagram: https://drive.google.com/file/d/1ieBKGxV7le7rj0o3m5PChNjH-GeqdJhp/view?usp=sharing

Let me know if there is any issues with these links; I've never shared an actual diagram from draw.io before.

SHAMapTree

seelabs
seelabs previously approved these changes Mar 23, 2020
Copy link
Collaborator

@seelabs seelabs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Love it. Left a couple nits, but nothing that prevents me from approving now.

(http://en.wikipedia.org/wiki/Radix_tree).

*We need some kind of sensible summary of the SHAMap here.*
The Merkle trie data structure is important because subtrees and even the entire
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume you mean "Merkle tree" here. I understand that a SHAMap is a cross between a Merkle tree and a radix tree, but I've never heard a SHAMap referred to as a "Merkle trie", and I'd rather not introduce the new term. I don't object to referring to the SHAMap as a "trie" when emphasizing the radix trie attributes of the SHAMap, but here we're referring to the "Merkel tree" part, and "tree" is a less confusing term here.

I also don't want to bikeshed this too much. If you like "trie" better here I'm OK with that.

(http://en.wikipedia.org/wiki/Radix_tree).

*We need some kind of sensible summary of the SHAMap here.*
The Merkle trie data structure is important because subtrees and even the entire
trie can be compared with other trees in O(1) time by simply comparing the hashes.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mixed trie and tree. I'd say the "entire tree can be compared with other trees...".


For each inner node encountered (starting with the root node), each of the
children are inspected (from 1 to 16). For each child, if it has a non-zero
sequence number (unshareable), the child is first COW'd. Then if the child is
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @thejohnfreeman here. To me "COW" implies lazy copying - i.e. the copy happens when we actually try to modify the data. I think the copy here happens non-lazily - we make a copy even if the child is never modified. I'd just say "copied" or "cloned" instead of "COW'd".

deleting a node from the trie, the stack is handy for repairing invariants in
the trie after the deletion.

To assist in walking the trie a `SHAMapNodeID` is used to keep a record of the
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe for another PR we could go through this readme and add some comments to the code. This paragraph is a good example of something I think would be valuable to add to the code (maybe in condensed form referring back to this readme, but right now there's nothing at all).

@thejohnfreeman
Copy link
Collaborator

For the diagram, I'd like to suggest PlantUML, which Doxygen can render into a nice image, but which is committed as human-editable text.

Brushing up on my computer science terminology, it seems that SHAMapV2 would have been a radix trie, but SHAMap is just a trie or prefix tree. We can avoid the whole "trie" confusion if we just call it a "prefix tree" and a "Merkle tree". Trees all around.

With regards to sharing nodes, we might want to mention the term persistent data structure. That will help readers look up other references for guidance.

@cjcobb23
Copy link
Contributor

For the diagram, I'd like to suggest PlantUML, which Doxygen can render into a nice image, but which is committed as human-editable text.

@thejohnfreeman I generally like plantuml. However, sometimes it can be difficult to control the layout of the diagrams, to place components in very specific places and align things correctly. So if the diagram can be ported to plantuml, great! But I don't want to use plantuml and sacrifice clarity. Just my two cents.

Brushing up on my computer science terminology, it seems that SHAMapV2 would have been a radix trie, but SHAMap is just a trie or prefix tree. We can avoid the whole "trie" confusion if we just call it a "prefix tree" and a "Merkle tree". Trees all around.

Totally agree on this point here. The current SHAMap is not a radix trie, but is really just a regular trie, aka prefix tree. I feel we are running into confusion trying to use these specific computer science theory terms when most of us are engineers, not theroists. I vote for keeping it simple and using "tree" as much as possible, and using "prefix tree" or "Merkle tree" when we are trying to be specific. I think we can all agree that this data structure is definitely some type of tree

@thejohnfreeman
Copy link
Collaborator

However, sometimes it can be difficult to control the layout of the diagrams, to place components in very specific places and align things correctly.

@cjcobb23 Excellent point. Doxygen supports DOT graphs as well, which I think gives you what you want.

@HowardHinnant
Copy link
Contributor Author

Rebased to 1.5.0.

@HowardHinnant HowardHinnant requested a review from carlhua March 31, 2020 17:55
cjcobb23
cjcobb23 previously approved these changes Apr 1, 2020
Copy link
Contributor

@cjcobb23 cjcobb23 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Thanks for putting in the work to update this documentation. It will be super useful over time.

carlhua
carlhua previously approved these changes Apr 1, 2020
Copy link
Contributor

@carlhua carlhua left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@HowardHinnant HowardHinnant added the Ready to merge *PR author* thinks it's ready to merge. Has passed code review. Perf sign-off may still be required. label Apr 1, 2020
1.  canonicalize_replace_cache
2.  canonicalize_replace_client

Now it is clear at the call site that if there are
duplicate copies of the data between the cache and
the caller, which copy gets replaced.

Additionally data parameter is now const-correct.
If it is not going to be replaced (canonicalize_replace_cache),
then the shared_ptr to the client data is const.
@HowardHinnant HowardHinnant dismissed stale reviews from carlhua and cjcobb23 via aea7b87 April 1, 2020 16:38
@HowardHinnant
Copy link
Contributor Author

Squashed.

This was referenced Apr 7, 2020
@manojsdoshi manojsdoshi mentioned this pull request Apr 8, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Ready to merge *PR author* thinks it's ready to merge. Has passed code review. Perf sign-off may still be required.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants