-
Notifications
You must be signed in to change notification settings - Fork 849
cachekey: added --key-type to allow parent selection URL to be modified #5872
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
alficles
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should update either the code or the documentation, at least. My 2¢, anyway.
| const char *end = _key.c_str() + _key.length(); | ||
| TSMLoc new_url_loc; | ||
| if (TS_SUCCESS == TSUrlCreate(_buf, &new_url_loc)) { | ||
| if (TS_PARSE_DONE == TSUrlParse(_buf, new_url_loc, &start, end)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will return an error if the key is not a url. It's rather significantly less picky than it probably ought to be about what constitutes a URL, but if the separator is set to zero and no prefix is used, user input can cause this to fail. Consider these parameters:
@plugin=cachekey.so
@pparam=--key-type=parent_selection_url
@pparam=--separator=
@pparam=--remove-prefix=1
With this request:
http://localhost/://-@@
That creates a URL that doesn't parse. Likewise, using an invalid URL char in the separator will create URLs that don't parse.
@plugin=cachekey.so
@pparam=--key-type=parent_selection_url
@pparam=--separator=|
If this cache key genuinely needs to be a valid URL, we should document that and think carefully about what sorts of request URLs could violate that. It might even be worth encoding the components in something like base64 to prevent user input from affecting the parent selection algorithm here.
If it doesn't need to be a valid URL (and I'm not entirely certain why it would need to be a URL, if you're just hashing it later), then we should avoid parsing it as one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alficles Currently proxy/ParentConsistentHash expects that, cache_info_parent_selection_url, is a URL. The function ParentConsistentHash::getPathHash() uses the path in the url to create a hash. So in this case If it fails to parse the key, an error is logged and the parent selection url is not set and the original request url would be used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What then, should the strategy be around ensuring that a given key will form a URL? The example provided in the docs produces lines that look like:
/hostname/port/path/to/content/querystring
And that's definitely not a URL, although the exceptionally generous parser ATS uses will happily take it. How does a user know that that's a valid URL and what portion of it will be used for parent selection?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alficles The override to the parents selection url needs to be type URL. parent selection then makes a call to ps_url->string_get_ref(&len); which returns a normalized path string with escape sequences which is used to create the PS hash. See ParentConsistentHash::getPathHash(). So it looks like cachekey creates a URL with a path string /hostname/port/path/to/content/querystring which is then normalized with escape sequences as necessary before creating a hash
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alficles hopefully the following plugin design info brings some clarification.
The cachekey plugin is designed to provide an easy and somewhat structured way to set cache key (and now parent selection URL) by performing the following.
- Takes the value of the each request element (URL or header)
- Runs a transformation (defined by the config) on it
- Calls the corresponding core API to get the final result set
It is a simple and generic design which would allow to keep enhancing with:
- new input types (i.e.
remappedandpristinerequest URI) - new target types (i.e. cache key and later parent selection URLs)
- new transformations (see the plugin documentation for examples)
The plugin does not get into the validation business which makes it generic and powerful.
The syntax validation is left to the lower level using the core API call. If the call fails it means that the transformation result is invalid, so the target should be (ideally) unchanged and an error message should be issued to the log. Pre-validation would be inefficient, error prone and would require unnecessary maintenance (keeping the two validations in-sync).
The semantic validation is left to the traffic server operator who ultimately knows the final goal (i.e. how the content should be cached or how the multi-tier caching should work).
It is true that one can easily shoot oneself in the foot but it is not feasible and also very limiting to code all use-cases into the plugin and even then we would still need to make sure it works in production.
HTH
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough. It feels like a bit of a footgun if you're not careful, but it's definitely useful. (And indeed I envision using it in the not-to-distant future.) Thanks!
Added ability to apply all transformations, available for modifying the cache key, to parent selection URL: * --key-type=cache_key - apply transformations to cache key * --key-type=parent_selection_url - apply transformations to parent selection URL TODO/TBD: After this change all transformations can be applied not only to the cache key but to parent selection URL as well. It would make sense to give the cachekey plugin a new, more suitable name.
| @pparam=--canonical-prefix=true \ | ||
| @plugin=cachekey.so \ | ||
| @pparam=--key-type=cache_key \ | ||
| @pparam=--static-prefix=this://goes.to/cache/key \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm about 90% sure that urls with unrecognized schemes don't get parsed the way you might expect. This should really be http:// for the prefix. This otherwise does suggest reasonable behaviour.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(And I clicked the wrong line. But both should probably be updated. Technically only the parent URI line needs to be http://
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alficles, it seems we are going too far with this, losing sight of what is important here. This is a made up example, just to demo that cachekey plugin instances can be used in the same remap rule if they set two different "urls" - the cache key and the parent selection url.
Initially seemed reasonable to change from /this/goes/to... to this this://goes.to/... just to emphasize that it should be a valid URL but I think it is pretty obvious that this is not a copy-and-paste example. Also it seems that it is parsed ok:
[Aug 28 22:02:50.258] [ET_NET 10] DEBUG: <ParentConsistentHash.cc:79 (getPathHash)> (parent_select) Using Over-Ride String='this://goes.to/parent/selection/url/path/to/file2'.
|
@jrushford, @alficles appreciate your comments. Please let me know if there is more info needed. |
jrushford
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gtenev looks good to me.
|
Cherry-picked to v9.0.x branch. |
Added ability in the cachekey plugin to apply all transformations, available for modifying
the cache key, to parent selection URL:
--key-type=cache_key- apply transformations to cache key--key-type=parent_selection_url- apply transformations toparent selection URL
TODO/TBD: After this change all transformations can be applied not
only to the cache key but to parent selection URL as well. It would
make sense to give the cachekey plugin a new, more suitable name.