Implement spath partial document traversal #4185
Swiddis wants to merge 22 commits into opensearch-project:main
Signed-off-by: Simeon Widdis <sawiddis@amazon.com>
ykmr1224
left a comment
I did not understand this part of the description: `spath field=data inner.key` should be able to parse `{"data": {"inner": "{\"key\": 0}"}}`.
 * We want input=outer, path=inner.data to match records like `{ "outer": { "inner": "{\"data\":
 * 0}" }}`. To rewrite this as eval, that means we need to detect the longest prefix match in the
 * fields (`outer.inner`) and parse `data` out of it. We need to match on segments, so
 * `outer.inner` shouldn't match `outer.inner_other`.
 *
 * @return The field from the RelBuilder with the most overlap, or inField if none exists.
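For readers following the thread, here is a minimal sketch of the segment-wise longest-prefix match that the comment above describes. It is not the code from this PR: the class name `PrefixMatchSketch`, both method names, and the plain `List<String>` standing in for the RelBuilder's field names are assumptions made for illustration.

```java
import java.util.List;

// Hypothetical illustration only; names and signatures are not taken from the PR.
class PrefixMatchSketch {
  /** Returns the longest field that is a segment-wise prefix of inField.path, else inField. */
  static String longestPrefixField(List<String> fields, String inField, String path) {
    String full = path.isEmpty() ? inField : inField + "." + path;
    String best = null;
    for (String field : fields) {
      // Match whole segments: the field must equal the full path or be followed by a dot.
      boolean matches = full.equals(field) || full.startsWith(field + ".");
      if (matches && (best == null || field.length() > best.length())) {
        best = field;
      }
    }
    return best == null ? inField : best;
  }

  /** Returns the part of the path still to be parsed out of the matched field ("" if none). */
  static String remainingPath(String matched, String inField, String path) {
    String full = path.isEmpty() ? inField : inField + "." + path;
    return full.length() > matched.length() ? full.substring(matched.length() + 1) : "";
  }

  public static void main(String[] args) {
    List<String> fields = List.of("outer", "outer.inner", "outer.inner_other");
    String matched = longestPrefixField(fields, "outer", "inner.data");
    // Prints: outer.inner -> data
    System.out.println(matched + " -> " + remainingPath(matched, "outer", "inner.data"));
  }
}
```

Checking for `field + "."` rather than a bare `startsWith(field)` is what keeps `outer.inner` from matching `outer.inner_other`.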
I am confused. Is the input parameter meant to specify where we read the JSON from? This description reads as though we are reading JSON that itself contains the outer attribute.
Input is the outermost field we want to start extracting the inner values from. On something like `{ "outer": { "inner": "{\"data\": 0}" }}`, `input=outer` means that spath will process the document `{ "inner": "{\"data\": 0}" }`. Then `path=inner.data` would access the value `0`, and `path=inner` would access the value `"{\"data\": 0}"`.
Does that mean outer is a column? In that case I think we want to separate that from the JSON string.
Yeah, I think in the future it makes sense to remove the input field and just navigate directly to the specified path. For this change, the intended behavior is to allow this type of mixing so you don't need to worry about where exactly the boundary is. I might cut a future PR to make input be an empty string by default?
I feel this makes things more complicated.
In my opinion, it is simpler and easier to understand if input simply points to the column containing the JSON, and path specifies the path within that JSON.
But I am open to it if others feel the current approach is better.
Signed-off-by: Simeon Widdis <sawiddis@amazon.com>
Signed-off-by: Simeon Widdis <sawiddis@gmail.com>
Signed-off-by: Simeon Widdis <sawiddis@amazon.com>
…into feature/spath-grace
Signed-off-by: Simeon Widdis <sawiddis@gmail.com>
Signed-off-by: Simeon Widdis <sawiddis@gmail.com>
Description
Permits `spath` to partially traverse document fields, which helps with parsing some types of nested structures. The core idea is that `spath field=data inner.key` should be able to parse `{"data": {"inner": "{\"key\": 0}"}}`.
Related Issues
N/A
Check List
`--signoff` or `-s`.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.