Skip to content

Conversation

@SolidWallOfCode
Copy link
Member

This is a replacement for TSHashTable and eventually TCL hash tables.

@SolidWallOfCode SolidWallOfCode added this to the 9.0.0 milestone Jul 12, 2018
@SolidWallOfCode SolidWallOfCode self-assigned this Jul 12, 2018
CMakeLists.txt Outdated
lib/ts/Tokenizer.h
lib/ts/Trie.h
lib/ts/TsBuffer.h
lib/ts/ts_meta.h
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this for?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There just a few comments in that header file to explain what it does...

It was part of some work that ended up in the bit bucket. This enabled optional types / functions in the template parameter H. The point was to allow overrides of certain deduced types. For example, H::value_type would override the type deduced from the link accessor methods but would not be required. Unfortunately the additional implementation effort to make that work was, IMHO, too much work for not enough return - the accessors in H can handle that, if needed, more easily. Mostly I forgot this was still here, but I do have some other use cases planned and I might well still use it for this work, so I'd rather leave it in. I suppose I could split that out in to a separate commit.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I spent my precious time to find places that require this :-)

shinrich
shinrich previously approved these changes Jul 16, 2018
Copy link
Member

@shinrich shinrich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. But agree with @maskit that we may want to remove the ts_meta.h and add it back in a future PR where it is actually used.

@SolidWallOfCode
Copy link
Member Author

If you're going to gang up on me, I'll pull it out, but @shinrich will need to re-approve then.

@bryancall
Copy link
Contributor

bryancall commented Jul 24, 2018

Why are we introducing another hash table? This is not what we talked about doing in Cork. I thought we were going to look at doing a C++ wrapper around Concurrency Kit?

@SolidWallOfCode
Copy link
Member Author

As noted in the initial comment, this is a refresh of TSHashTable, it is not a new hash table. Once this PR is committed I will be replace all instance of TSHashTable with this better implementation. I need features in this update that can't wait on the concurrent hash table work, and further this will be used in places where a concurrent hash table is more than is required. I hope at some point to begin replacing uses of the TCL based hash table with this as well (see #3989 for an example).

@SolidWallOfCode
Copy link
Member Author

Note that #4029 is part of the work of this upgrade, replacing TSHashTable with IntrusiveHashMap.

@bryancall
Copy link
Contributor

bryancall commented Aug 1, 2018

The comment says it is a replacement for TSHashTable, but looking at the commit is an additional hash table. I see that there is another PR that has plans to remove IntrusiveHashMap, but was not referenced here.

What features does this add that you need? I worry that we are going down the path of relying on our own data structures that are immature and not well tested and will result in maintenance and incompatibility between versions.

I looked at the other two PRs that are going to use IntrusiveHashMap and it looks like they will be using them in a concurrent access pattern which mean that this hash table is solving the wrong problem.

@SolidWallOfCode
Copy link
Member Author

Some of the features are:

  • Use of C++ 11 for cleaner and more robust code.
  • Shifting the descriptor structure to require fewer pieces and more in line with other uses in TS.
  • Integration with InstrusiveDList instead of the DLL list, again for cleaner and more consistent code.
  • Remove the obscure and specialized Location class and use the STL iterator range style API.

What I really need this for are other uses, e.g. the work to eliminate tsconfig and use of TCL hashes. However, since it was agreed that we needed to consolidate hash tables, I did this as a TSHashTable replacement that could be use both for that other work and to replace current uses.

For many of the other uses, the intrusive nature of the hash map is critical, as I want to use it in conjunction with MemArena. There is therefore a requirement that all memory for the hash map is allocated from the MemArena. This class makes that possible.

This isn't needed for most (all?) of the replacement uses, but I have suggested use of classes such as std::unordered_map and been strongly rebuffed every single time. If I can't use that, and I have to have this for some specialized uses, I might as well use this class everywhere. I need to also use it to replace TSHashTable since I can't add more hash containers.

I looked at the other two PRs that are going to use IntrusiveHashMap and it looks like they will be using them in a concurrent access pattern which mean that this hash table is solving the wrong problem.

What then should I do? Not replace TSHashTable nor the TCL based ink_hash_table in those uses? Your original complaint, as far as I can tell, was that I hadn't. Therefore I worked on those PRs and linked them here. Or is it that I shouldn't do any work in this area until someone does the full concurrent hash table work?

@bryancall
Copy link
Contributor

bryancall commented Aug 1, 2018

We should use std::unordered_map instead of using TSHashTable or creating another non-concurrent hash table for hostdb and session manager. I don't believe the memory allocation would be any worse then the current TCL hash implementation and would be far easier for people to understand and maintain.

I don't see how this has a relation to getting rid of tsconfig.

@SolidWallOfCode
Copy link
Member Author

For the specific case of the session manager, std::unordered_set doesn't work well because it needs two hash keys to the same objects (by IP address and by FQDN). The intrusive hash maps make that easy because the same record can be in two hash tables at the same time.

For other cases, I have already made the argument that the memory allocation characteristics of std::unordered_set can not be any worse than the TCL ones and been rejected. IIRC this happened at the Cork summit on exactly this topic. When you get a community consensus that your suggestion here is reasonable, then I'll do it. Until then you are suggesting I act against clear community strictures and I'm too tired of all this conflicting nitpicking from various people to deal with that. The approach I am taking here does not make things any worse, as it replaces an existing custom hash table with a very similar one in both API and memory characteristics.

tsconfig is related because it depends on other pieces of infrastructure that in turn depend on intrusive containers. To do a decent job of converting that to YAML, I need to upgrade that infrastructure. This is part of that work (although this has other uses, I like to double up on projects to get more mileage out of the work). In particular it will work much better if I can put all of the memory in a single memory arena, such as MemArena. I can't do that with any STL container.

Copy link
Member

@shinrich shinrich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-approving after the conflict resolution.

@SolidWallOfCode SolidWallOfCode merged commit d6099b3 into apache:master Aug 20, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants