Conversation

@SolidWallOfCode
Member

@SolidWallOfCode SolidWallOfCode commented Sep 9, 2018

This is a class I have intended to build for years. It is based on a class of the same name and similar function that I wrote for Network Geographics. That class was very handy, but it required Boost.MultiIndex, which clearly wasn't going to work in the TS codebase. Now I have the infrastructure pieces needed to make the implementation simple, along with some nice C++ eleventy features. I ended up writing this now because it will be very useful in the YAML conversion of WCCP.

Dependent on #4223.

Waiting on other infrastructure PRs.

@SolidWallOfCode SolidWallOfCode added this to the 9.0.0 milestone Sep 9, 2018
@SolidWallOfCode SolidWallOfCode self-assigned this Sep 9, 2018
@SolidWallOfCode SolidWallOfCode force-pushed the lexicon branch 2 times, most recently from 57360fb to 48281f6 Compare September 10, 2018 02:12
@SolidWallOfCode
Member Author

Current issue is a bug in clang interacting badly with the GCC headers, making std::variant essentially unusable in that situation. I'll need to rethink - I would rather not write my own version of std::variant.

Description
===========

A :class:`Lexicon` is templated by the enumeration class. It contains a set of **definitions**, each
Contributor

I think it's more clear to say "a class template Lexicon takes an enumeration type as its only parameter".

Member Author

I changed it around to specify the type should be numeric, rather than an enumeration specifically.

name, the primary name or any secondary name will yield the same enumeration.

Defaults can be set so that any name or enumeration that does not match a definition yields the
default value. The default can be explicit or it can be a handler function. The handler functions as
Contributor

as -> acts as

Member Author

Fixed.

default value. The default can be explicit or it can be a handler function. The handler functions as
an internal catch for undefined conversions and is generally used to log the failure while returning
a default. It could be used to compute a default but for names, this is problematic due to memory
ownership and thread safety issues.
Contributor

Highly non-obvious.

Member Author

Reworded a bit and added a footnote.
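For readers following this thread, here is a minimal sketch of the two kinds of default described in the documentation excerpt: an explicit value, or a handler that is called on a miss. `MiniLexicon`, `Color`, and every member name below are hypothetical and are not the PR's actual API. The sketch only covers the name-to-value direction, which sidesteps the ownership problem of computing a default name.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <string>
#include <string_view>
#include <unordered_map>
#include <utility>
#include <variant>

// Hypothetical stand-in for the PR's Lexicon; names and API are illustrative only.
enum class Color { Red, Green, Invalid };

template <typename E> class MiniLexicon
{
public:
  using Handler = std::function<E(std::string_view)>;

  // Associate a name with a value.
  void define(E value, std::string_view name) { _by_name.emplace(std::string(name), value); }

  // Explicit default: returned for any undefined name.
  void set_default(E value) { _default = value; }

  // Handler default: called for any undefined name, e.g. to log the miss and return a fallback.
  void set_default(Handler handler) { _default = std::move(handler); }

  E operator[](std::string_view name) const
  {
    if (auto spot = _by_name.find(std::string(name)); spot != _by_name.end()) {
      return spot->second;
    }
    if (auto value = std::get_if<E>(&_default)) {
      return *value;
    }
    if (auto handler = std::get_if<Handler>(&_default)) {
      return (*handler)(name);
    }
    throw std::domain_error("undefined name and no default set");
  }

private:
  std::unordered_map<std::string, E> _by_name;
  // Empty, an explicit value, or a handler function.
  std::variant<std::monostate, E, Handler> _default;
};
```

With this shape, a handler such as `[](std::string_view name) { /* log name */ return Color::Invalid; }` can record the failed lookup while still returning a fixed fallback value.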

Assume the enumeration is

.. literalinclude:: ../../../lib/ts/unit-tests/test_Lexicon.cc
   :lines: 30
Contributor

If you're going to do this, I think you need a "WARNING: Do not even BREATHE on this file!" comment in test_Lexicon.cc.

Member Author

The code this depends on is marked off as being part of the documentation. I really want to be able to know the example code compiles and runs.

Contributor

@ywkaras ywkaras Sep 12, 2018

The problem is for maintainers of test_Lexicon.cc to know that they need to change the documentation as well if they add or remove lines.

Maybe you should put code like this in test_Lexicon.cc: https://godbolt.org/z/b1H8tA

-Insert the element :arg:`v` into the container.
+Insert the element :arg:`v` into the container. If there are already elements with the same
+key, :arg:`v` is inserted after these elements.

Contributor

Should we call it IntrusiveHashMultimap for consistency with STL naming?

Member Author

Different PR, but no. Functionality is important, naming less so.

template <typename Transform>
void
-ATSHash64FNV1a::update(const void *data, size_t len, Transform xfrm)
+ATSHash64FNV1a::update(const void *data, size_t len, Transform xf)
Contributor

Why in this case are we using 'Transform xf' instead of 'const Transform &xf'?

Member Author

Oversight, I think the const& is a better choice in case Transform has internal state.
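To make the `const Transform &` point concrete, here is a minimal sketch of an FNV-1a 64-bit hasher whose `update` applies a per-byte transform taken by const reference. `Fnv1a64`, `Identity`, and `ToLower` are hypothetical names used only for illustration; this is not the `ATSHash64FNV1a` source. Only the FNV offset basis and prime are standard constants.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for ATSHash64FNV1a; only the update/get shape is sketched.
class Fnv1a64
{
public:
  // The transform is taken by const reference, so a functor that carries state
  // is not copied on each call. Its operator() must therefore be const.
  template <typename Transform>
  void
  update(const void *data, std::size_t len, const Transform &xf)
  {
    auto bytes = static_cast<const std::uint8_t *>(data);
    for (std::size_t i = 0; i < len; ++i) {
      _hval ^= static_cast<std::uint64_t>(xf(bytes[i])); // FNV-1a: xor first ...
      _hval *= 0x100000001b3ULL;                         // ... then multiply by the 64-bit FNV prime
    }
  }

  std::uint64_t get() const { return _hval; }

private:
  std::uint64_t _hval = 0xcbf29ce484222325ULL; // 64-bit FNV offset basis
};

// Pass each byte through unchanged.
struct Identity {
  std::uint8_t operator()(std::uint8_t c) const { return c; }
};

// Fold ASCII upper case to lower case, giving a case-insensitive hash.
struct ToLower {
  std::uint8_t operator()(std::uint8_t c) const { return static_cast<std::uint8_t>(std::tolower(c)); }
};
```

One consequence of the const reference is that a transform which mutates its own state while hashing would need `mutable` members or a different signature, so the choice depends on what transforms are expected.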

*/
template <typename E> class Lexicon
{
using self_type = Lexicon; ///< Self reference type.
Contributor

Why use alias that's longer than what it is aliasing?

Member Author

Consistency, clarity. The point of using self_type is to be clear in declaration that a method is returning an instance of itself.

/// Storage for names.
MemArena _arena{1024};
/// Access by name.
IntrusiveHashMap<typename Item::NameLinkage> _by_name;
Contributor

I think it's been shown that, for small lookup tables, an array/vector with linear search is fastest. In any case, it's simpler and any difference in speed will be small for a small lookup table.

Member Author

Yes, but I don't agree that in this case it would be simpler, in terms of the implementation, in Lexicon. The mapping also gives us nice locality of equivalent keys.

std::string_view localize(std::string_view name);

/// Storage for names.
MemArena _arena{1024};
Contributor

Won't the string names probably be string literals? If so, it seems a shame to copy them rather than just use the passed string_view.

Member Author

Usually, but that can't be guaranteed. Because it's a MemArena the amortized cost of the copy is small and I think that's worth paying to avoid any (likely complex) rules about input string lifetimes, or restricting Lexicon to only handling constant strings.
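The lifetime argument can be illustrated with a small sketch. `NameStore` is a hypothetical stand-in that uses a `std::deque<std::string>` where the PR uses `MemArena`; only the `localize` signature is taken from the diff above. The point is that the returned view refers to storage owned by the container, so the caller's string may be temporary.

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <string_view>

// Hypothetical stand-in for the arena-backed name storage; the PR uses MemArena instead.
class NameStore
{
public:
  // Copy the name into storage owned by this object and return a view of the copy.
  std::string_view
  localize(std::string_view name)
  {
    // std::deque does not relocate existing elements on emplace_back,
    // so views handed out earlier remain valid as more names are added.
    return _names.emplace_back(name);
  }

private:
  std::deque<std::string> _names;
};
```

A caller can then pass a view of a temporary or mutable string and keep using the returned view after the original is gone, which is the rule-free behavior the reply argues for.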

for (decltype(len) j = 0; j < len; ++j) {
s[j] = char_gen(randu);
}
s[len] = 0;
Contributor

Why the nul termination?

Member Author

Not sure, I suspect it's a leftover from a prior iteration when it used a C-string style interface instead of string_view.

@maskit
Member

maskit commented Sep 11, 2018

Just out of curiosity, is there any mechanism that ensures all enum items are in the lexicon? Let's say Example::Value_4 was defined but there's no entry for it in the lexicon. What will happen if I access exnames[Example::Value_4]?

@maskit
Member

maskit commented Sep 11, 2018

Ah, I found set_default.

@maskit
Member

maskit commented Sep 11, 2018

But I'm still curious how we can avoid the mismatch. With a switch-case, some compilers warn about a missing case if the switch doesn't have a default.

const char *
Http2DebugNames::get_settings_param_name(uint16_t id)
{
  switch (id) {
  case HTTP2_SETTINGS_HEADER_TABLE_SIZE:
    return "HEADER_TABLE_SIZE";
  case HTTP2_SETTINGS_ENABLE_PUSH:
    return "ENABLE_PUSH";
  case HTTP2_SETTINGS_MAX_CONCURRENT_STREAMS:
    return "MAX_CONCURRENT_STREAMS";
  case HTTP2_SETTINGS_INITIAL_WINDOW_SIZE:
    return "INITIAL_WINDOW_SIZE";
  case HTTP2_SETTINGS_MAX_FRAME_SIZE:
    return "MAX_FRAME_SIZE";
  case HTTP2_SETTINGS_MAX_HEADER_LIST_SIZE:
    return "MAX_HEADER_LIST_SIZE";
  }
  // No case matched: fall back to a fixed name.
  return "UNKNOWN";
}

@SolidWallOfCode
Member Author

There isn't any way, at compile time, to verify all of the enumeration values are in the Lexicon. If that is really important then one could use set_default with a function to log when a missing value is hit. There is also the problem that in many cases we have an extra value that shouldn't be there (e.g. LAST_VALUE). There are also use cases where the type is an integral type and not actually an enumeration. Nothing in the Lexicon API requires an enumeration.

I could look at creating a constructor like

template < size_t N > Lexicon(const std::array<Definition, N> &defines);

and then the initialization could be

Lexicon<Ex> ExNames(std::array<Lexicon<Ex>::Definition, Ex::LAST_VALUE> { ...});

In this case, if there are not exactly Ex::LAST_VALUE initializers for the array, it is a compile error.
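A minimal sketch of a compile-time count check along these lines follows. One caveat: brace-initializing a `std::array` with fewer initializers than its size still compiles (the remaining elements are value-initialized), so this sketch deduces the count from the initializer list and compares it with `static_assert`. `Definition`, `require_all`, and `ExDefs` are hypothetical names, and this is not necessarily the mechanism the PR ended up using.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string_view>

enum class Ex { Value_0, Value_1, Value_2, LAST_VALUE };

// Hypothetical definition record: one value and its primary name.
struct Definition {
  Ex value;
  std::string_view name;
};

// N is deduced from the length of the braced list, so a missing or extra
// entry makes the static_assert fail at compile time.
template <std::size_t N>
std::array<Definition, N>
require_all(const Definition (&defs)[N])
{
  static_assert(N == static_cast<std::size_t>(Ex::LAST_VALUE), "one definition per enumerator is required");
  std::array<Definition, N> result{};
  for (std::size_t i = 0; i < N; ++i) {
    result[i] = defs[i];
  }
  return result;
}

// Removing any one of these entries turns into a compile error.
const auto ExDefs = require_all({{Ex::Value_0, "zero"}, {Ex::Value_1, "one"}, {Ex::Value_2, "two"}});
```

This only checks the number of definitions, not that each enumerator appears exactly once, and it relies on a sentinel such as `LAST_VALUE` being the last enumerator.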

@ywkaras
Contributor

ywkaras commented Sep 11, 2018

A possible alternative is a macro-based approach: https://wkaras.webs.com/itemlist/itemlist.html

@maskit
Member

maskit commented Sep 12, 2018

The mismatch would not be a big issue. I just wanted to know if it's possible.

Member

@maskit maskit left a comment

Although I don't understand the implementation details, I'm OK with this. My only concern is maintainability. The code is much longer and more complicated than the switch-case example I posted above. I understand this is more general and more powerful, but if I ran into trouble with it I would switch back to the switch-case approach rather than fix it.

@SolidWallOfCode
Member Author

@maskit Yes, but one of the key things this does that the switch does not is provide conversion the other way. This is really quite useful for my work on configuration parsing and YAML, especially when the enumeration set is dynamic. It's just easier to update the list, and I have found in the past that the secondary names are a nice touch in that area. As noted, this is a reconstruction of a class I used in my pre-TS work and I have missed it all these years. It's a joy to finally be at a point where I can bring it back.
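As a rough illustration of the two-way conversion and secondary names mentioned here, the sketch below keeps one table keyed by name and one keyed by value. `TwoWayNames`, `Proto`, `value_of`, and `name_of` are hypothetical and use standard containers; the PR's Lexicon is built on `IntrusiveHashMap` and `MemArena` and has a different interface.

```cpp
#include <cassert>
#include <initializer_list>
#include <map>
#include <optional>
#include <string>
#include <string_view>

enum class Proto { HTTP, HTTPS };

// Hypothetical two-way table; not the PR's Lexicon API.
template <typename E> class TwoWayNames
{
public:
  // The first name is the primary name; any others are secondary names (aliases).
  void
  define(E value, std::initializer_list<std::string_view> names)
  {
    for (std::string_view name : names) {
      _by_name.emplace(std::string(name), value);
    }
    if (names.size() > 0) {
      _by_value.emplace(value, std::string(*names.begin()));
    }
  }

  // Any primary or secondary name yields the same value.
  std::optional<E>
  value_of(std::string_view name) const
  {
    auto spot = _by_name.find(name); // heterogeneous lookup via std::less<>
    if (spot == _by_name.end()) {
      return std::nullopt;
    }
    return spot->second;
  }

  // A value always converts back to its primary name.
  std::optional<std::string_view>
  name_of(E value) const
  {
    auto spot = _by_value.find(value);
    if (spot == _by_value.end()) {
      return std::nullopt;
    }
    return std::string_view(spot->second);
  }

private:
  std::map<std::string, E, std::less<>> _by_name;
  std::map<E, std::string> _by_value;
};
```

A switch statement covers only the `name_of` direction; the reverse lookup and the aliases are what make a table like this convenient for parsing configuration values.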

@SolidWallOfCode
Member Author

I added and documented a mechanism to do the size verification at compile time. Not quite as clunky as directly using a std::array.

or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
-regarding copyright ownership. The ASF licenses this file
+regarding copyright ownership. The ASF licenses this fileN
Contributor

Is fileN really right?

Member Author

That's actually in a different PR this depends on. #4223.

@SolidWallOfCode
Member Author

Leaving this until more of the infrastructure is in place.

@zwoop zwoop removed this from the 9.0.0 milestone Jan 7, 2019
@SolidWallOfCode SolidWallOfCode deleted the lexicon branch July 27, 2021 15:10