Skip to content
This repository has been archived by the owner on Nov 14, 2024. It is now read-only.

Range scanning on VAR_STRING should be caught at schema generation, rather than producing a runtime log error #799

Closed
gsheasby opened this issue Aug 18, 2016 · 7 comments
Assignees

Comments

@gsheasby
Copy link
Contributor

From internal email thread: @markelliot "Hey, I see there's an ERROR level log message printed at runtime around this now. Per below, there are some valid uses here, and I think it's too late to print this message at runtime (where users can do nothing about it and have no recourse seeing the message) vs. alerting developers at schema generation time.

Perhaps a safer and more enforceable approach controlling schema generation behavior would be requiring passing the schema generation method an explicit override flag that has something to allow first row components to contain variable length types, or like many of the other table definition modifiers, just having a forExpertsOnlyAllowVariableLengthFirstRowComponents() method on the table def."

@schlosna "We talked about this a bit during AtlasCon, but I think there are some ways that if we really want to support variable length strings as part of range scans, we could do so using null terminated C-style strings and/or row component metadata appended to the row key which would hold the lengths for variable length components. This would require a new compose/decompose implementation (that could be opted into as part of the schema spec)."

@markelliot "Dave, if I have a schema which consists of two row components:

  1.  VAR_STRING
    
  2.  STRING
    

…and I promise that I will only ever range using a complete value for (1) so that I can fetch a range of values for (2), would you agree this is a valid use?

Separately, my objection below is that we're pushing the error into runtime, and the error currently states the functionality will likely be removed – I think a better place to error is going to be during schema generation, and a better way to enforce would be making the behavior overridable (because of scenario above) as a schema generation config argument."

@schlosna "I agree that it is reasonable to want to range scan over your VAR_STRING using a complete or partial value – the current Atlas just doesn’t support it, though it could should. I agree that we should make as many schema specification issues compile time/schema generation errors as possible.

The current limitations around row and column value types are not ideal, and we should work to remove the limitations. We could adjust our transform from row components to underlying byte[] to better support all relevant operations by storing the components in a better structure. We could store the current atlas schema definition in the DB itself and generate user defined functions for Cassandra and Postgres to allow introspection and (de)composition of row and column keys so that one could interact with the underlying store more easily through CQL/SQL."

@gsheasby gsheasby changed the title Range scanning on VAR_STRING should be either work, or be caught at schema generation, rather than producing a runtime log error Range scanning on VAR_STRING should either work, or be caught at schema generation, rather than producing a runtime log error Aug 18, 2016
@gbrova
Copy link
Contributor

gbrova commented Aug 23, 2016

We will first update the docs to explain the current behavior, and we will throw an error on schema generation.

@gbrova gbrova changed the title Range scanning on VAR_STRING should either work, or be caught at schema generation, rather than producing a runtime log error Range scanning on VAR_STRING should be caught at schema generation, rather than producing a runtime log error Aug 23, 2016
@rhero
Copy link
Contributor

rhero commented Aug 26, 2016

Moving to backlog for now since docs are updated, and we'll address throwing a warning/error or having a new method like forExpertsOnlyAllowVariableLengthFirstRowComponents() later.

@tpetracca
Copy link
Contributor

We should probably handle cassandra hotspot warning in a similar fashion. Runtime is too late: #933

@markelliot
Copy link
Contributor

Does AtlasDb still emit an ERROR level log statement on startup? If so I don't think it's fair to backlog this.

@rhero
Copy link
Contributor

rhero commented Sep 13, 2016

I don't believe we made changes to this log level, and I'm open to bumping the priority of this to support efforts in reducing noise of runtime logs. We should have a real fix here (and keep it open until we do), but would immediately changing this to a warning be okay, @markelliot ?

@markelliot
Copy link
Contributor

markelliot commented Sep 13, 2016 via email

@gsheasby
Copy link
Contributor Author

gsheasby commented Nov 1, 2016

Fixed by #947

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
6 participants