Skip to content

Conversation

@singhpk234
Copy link
Contributor

@singhpk234 singhpk234 commented May 30, 2025

About the change

Presently the REST parsers are stateless aka static, this makes hard to inject a state which can help the parser to construct the response object if it needs some prior info. Addding injectable values to the mapper and using a reader with injectable value passed during making the request will make sure that during deserialization we will get back the value injected from the DeserializationContext

public T deserialize(JsonParser p, DeserializationContext context) throws IOException {

essentially something like this is possible now :

      // Retrieve injectable values
      @SuppressWarnings("unchecked")
      Map<Integer, PartitionSpec> specsById =
          (Map<Integer, PartitionSpec>) context.findInjectableValue("specsById", null, null);

      boolean caseSensitive = (boolean) context.findInjectableValue("caseSensitive", null, null);

so now a client can essentially during making the post can pass the injectableValues and use it for stateful parsing i,e making these value available to the parser while doing deserilization.

please ref this : #13004 (comment) for more details.

Why this is required

This is required because of scan planning API response doesn't contains specs the caseSensitive field which is required for properly creating the FileScanTask / DataFile / DeleteFile objects essentially needing the state where the table has been loaded and we have already the specByID map and the caseSensitive field. Now these can be injected and percolated from the POST call to all the way to the mapper and hence to the parsers to make the cycle.
My understanding is Since we do it this way rather than injecting the whole at mapper end, we can guarantee thread safety

ref scan planning PR : #13004 for E2E machinery

@github-actions github-actions bot added the core label May 30, 2025
@singhpk234
Copy link
Contributor Author

@singhpk234 singhpk234 force-pushed the feature/context-aware-parser branch 5 times, most recently from d006bb7 to 2c140a2 Compare June 2, 2025 20:42
@singhpk234 singhpk234 force-pushed the feature/context-aware-parser branch 2 times, most recently from 2d05e2f to 76e69fd Compare June 9, 2025 08:05
Copy link
Contributor

@amogh-jahagirdar amogh-jahagirdar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @singhpk234 from my side, this is largely great! just some minor comments that I feel like are worth addressing

Comment on lines +99 to 107
private final ConcurrentMap<Class<?>, ObjectReader> objectReaderCache = Maps.newConcurrentMap();

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: Until initializing the object reader(s) is known to be a bottleneck, I think I'd prefer to not introduce this cache. I haven't gone so deep into the jackson implementation but I'd be kind of surprised if it wasn't already doing this under the hood

Copy link
Contributor Author

@singhpk234 singhpk234 Jun 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, i was on the same boat as i expected Jackson to cache the reader by Type, turns out it doesn't, what jackson does cache is :

  • Deserializers, serializers, type resolvers, etc. — these are cached inside ObjectMapper for performance.
    This means the actual deserialization logic is reused and shared across threads and calls, making repeated deserialization of the same type efficient.

So i proactively went and added a cache for the overall reader so that i can just work with its copy with new injectable, i expect the Cache to be not put memory pressure as we have very limited parsers.
Happy to remove if you feel strongly about it !

@singhpk234 singhpk234 force-pushed the feature/context-aware-parser branch from 76e69fd to f665708 Compare June 17, 2025 00:16
\ java.lang.Class<T>, java.util.function.Consumer<org.apache.iceberg.rest.responses.ErrorResponse>,\
\ java.util.function.Consumer<java.util.Map<java.lang.String, java.lang.String>>,\
\ org.apache.iceberg.rest.ParserContext)"
justification: "Add context aware parsing"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: The Justification here is that no one can depend on this api correct? It's solely owned by the client and no one else can extend or use it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

precisely, I am not aware if there are folks who use this API, i just added here to follow existing pattern and make rev-api happy !

import java.util.Map;
import org.apache.hadoop.util.Preconditions;

class ParserContext {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we wrapping InjectableValues? Is there something else we want to add to the interface?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm guessing this is to keep Jackson out of the public interface?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah trying to keep Jackson out of the interface #13191 (comment)

@RussellSpitzer
Copy link
Member

@amogh-jahagirdar any other followups from you?

Copy link
Contributor

@amogh-jahagirdar amogh-jahagirdar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From my side, things look good to go! Even though Jackson doesn't cache the object reader by type under the hood, I'm skeptical the caching on our side is really required but I don't see any harm since it's not adding too much additional complexity.

@RussellSpitzer RussellSpitzer merged commit 5b50afe into apache:main Jun 17, 2025
43 checks passed
@RussellSpitzer
Copy link
Member

Thanks @singhpk234 + @amogh-jahagirdar for Review

eric-maynard pushed a commit to eric-maynard/iceberg that referenced this pull request Jun 18, 2025
nastra added a commit to nastra/iceberg that referenced this pull request Jun 25, 2025
nastra added a commit to nastra/iceberg that referenced this pull request Jun 25, 2025
nastra added a commit to nastra/iceberg that referenced this pull request Jun 25, 2025
devendra-nr pushed a commit to devendra-nr/iceberg that referenced this pull request Dec 8, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants