-
Notifications
You must be signed in to change notification settings - Fork 2.9k
[CORE][REST]: Add context aware response parsing #13191
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CORE][REST]: Add context aware response parsing #13191
Conversation
d006bb7 to
2c140a2
Compare
2d05e2f to
76e69fd
Compare
amogh-jahagirdar
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @singhpk234 from my side, this is largely great! just some minor comments that I feel like are worth addressing
| private final ConcurrentMap<Class<?>, ObjectReader> objectReaderCache = Maps.newConcurrentMap(); | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: Until initializing the object reader(s) is known to be a bottleneck, I think I'd prefer to not introduce this cache. I haven't gone so deep into the jackson implementation but I'd be kind of surprised if it wasn't already doing this under the hood
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, i was on the same boat as i expected Jackson to cache the reader by Type, turns out it doesn't, what jackson does cache is :
- Deserializers, serializers, type resolvers, etc. — these are cached inside ObjectMapper for performance.
This means the actual deserialization logic is reused and shared across threads and calls, making repeated deserialization of the same type efficient.
So i proactively went and added a cache for the overall reader so that i can just work with its copy with new injectable, i expect the Cache to be not put memory pressure as we have very limited parsers.
Happy to remove if you feel strongly about it !
76e69fd to
f665708
Compare
| \ java.lang.Class<T>, java.util.function.Consumer<org.apache.iceberg.rest.responses.ErrorResponse>,\ | ||
| \ java.util.function.Consumer<java.util.Map<java.lang.String, java.lang.String>>,\ | ||
| \ org.apache.iceberg.rest.ParserContext)" | ||
| justification: "Add context aware parsing" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: The Justification here is that no one can depend on this api correct? It's solely owned by the client and no one else can extend or use it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
precisely, I am not aware if there are folks who use this API, i just added here to follow existing pattern and make rev-api happy !
| import java.util.Map; | ||
| import org.apache.hadoop.util.Preconditions; | ||
|
|
||
| class ParserContext { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we wrapping InjectableValues? Is there something else we want to add to the interface?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm guessing this is to keep Jackson out of the public interface?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah trying to keep Jackson out of the interface #13191 (comment)
|
@amogh-jahagirdar any other followups from you? |
amogh-jahagirdar
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From my side, things look good to go! Even though Jackson doesn't cache the object reader by type under the hood, I'm skeptical the caching on our side is really required but I don't see any harm since it's not adding too much additional complexity.
|
Thanks @singhpk234 + @amogh-jahagirdar for Review |
About the change
Presently the REST parsers are stateless aka static, this makes hard to inject a state which can help the parser to construct the response object if it needs some prior info. Addding injectable values to the mapper and using a reader with injectable value passed during making the request will make sure that during deserialization we will get back the value injected from the DeserializationContext
iceberg/core/src/main/java/org/apache/iceberg/rest/RESTSerializers.java
Line 450 in 3469cf3
essentially something like this is possible now :
so now a client can essentially during making the post can pass the injectableValues and use it for stateful parsing i,e making these value available to the parser while doing deserilization.
please ref this : #13004 (comment) for more details.
Why this is required
This is required because of scan planning API response doesn't contains specs the caseSensitive field which is required for properly creating the FileScanTask / DataFile / DeleteFile objects essentially needing the state where the table has been loaded and we have already the specByID map and the caseSensitive field. Now these can be injected and percolated from the POST call to all the way to the mapper and hence to the parsers to make the cycle.
My understanding is Since we do it this way rather than injecting the whole at mapper end, we can guarantee thread safety
ref scan planning PR : #13004 for E2E machinery