Adds a StreamInterceptor interface to allow users to plug in custom interceptors for formats like Zstd. #930
base: master
I think a bit of working backwards would help inform your design. Thinking out loud, for a customer migrating to new formats, the desired experience would be, in order of preference:
#1 is out due to the desire to limit this library's dependencies and scope.

An example of #2 would be someone else building a new library that wraps ion-java and injects new functionality in a completely transparent way. This could be done with intercepting proxies using an AOP library or java.lang.reflect.Proxy. The customer would add the new library as a dependency without having to make any code changes. While that's cool in theory, ion-java has multiple places where IonReader is constructed, as you mention, so this might be challenging to implement.

#3 is potentially a good tradeoff: if customers are willing to make a small configuration change, they might as well agree to a trivial code change. I think your proposal works well to enable this approach: they add the new library that provides custom ion-java interceptors and they make a low-risk code change where their reader is constructed.

The other aspect of migration is being able to handle both the old gzip format and the new zstd/whatever format. Your proposed design addresses that by supporting multiple interceptors and using the first one that claims a header match. One can think of use cases where chaining more than one interceptor would be beneficial, but I don't know if that's stepping into overengineering territory. Just make sure to avoid a one-way door and leave the opening for future extension.

Another possibly-overengineering point is that not all format detection logic fits into the "fixed header" mold. Sometimes headers are not fixed length, and sometimes they are not headers at all. A fixed header works 99% of the time though, so the same comment about avoiding one-way doors applies.
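To make the "first interceptor that claims a header match wins" behavior concrete, here is a minimal, self-contained sketch. It is not the PR's implementation: `SketchInterceptor`, `PrefixSketchInterceptor`, and `FirstMatchSketch` are hypothetical stand-ins that only mirror the method names discussed in this thread, and the toy "decoding" simply strips the magic prefix. It assumes fixed-length headers, which is exactly the limitation raised above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the interface proposed in the PR.
interface SketchInterceptor {
    int headerLength();
    boolean matchesHeader(byte[] candidate, int offset, int length);
    InputStream newInputStream(InputStream interceptedStream) throws IOException;
}

// Toy format: a fixed ASCII magic prefix followed by the payload.
final class PrefixSketchInterceptor implements SketchInterceptor {
    private final byte[] magic;

    PrefixSketchInterceptor(String magic) {
        this.magic = magic.getBytes(StandardCharsets.US_ASCII);
    }

    public int headerLength() { return magic.length; }

    public boolean matchesHeader(byte[] candidate, int offset, int length) {
        if (length < magic.length) return false; // too short to be a match
        for (int i = 0; i < magic.length; i++) {
            if (candidate[offset + i] != magic[i]) return false;
        }
        return true;
    }

    public InputStream newInputStream(InputStream interceptedStream) throws IOException {
        // "Decode" by consuming the magic bytes and exposing the rest.
        for (int i = 0; i < magic.length; i++) interceptedStream.read();
        return interceptedStream;
    }
}

final class FirstMatchSketch {
    // Peeks at the longest header any interceptor needs, then lets the first match wrap the stream.
    static InputStream intercept(InputStream in, List<SketchInterceptor> interceptors) throws IOException {
        int max = 0;
        for (SketchInterceptor i : interceptors) max = Math.max(max, i.headerLength());
        if (max == 0) return in;
        PushbackInputStream pushback = new PushbackInputStream(in, max);
        byte[] header = new byte[max];
        int read = 0;
        while (read < max) {
            int n = pushback.read(header, read, max - read);
            if (n < 0) break; // stream ended before a full header was available
            read += n;
        }
        if (read > 0) pushback.unread(header, 0, read); // put the peeked bytes back
        for (SketchInterceptor i : interceptors) {
            if (i.matchesHeader(header, 0, read)) return i.newInputStream(pushback);
        }
        return pushback; // no interceptor claimed the stream
    }

    // Convenience wrapper for demos: runs intercept() over an ASCII string.
    static String decodeAscii(String input, List<SketchInterceptor> interceptors) {
        try {
            InputStream in = intercept(
                new ByteArrayInputStream(input.getBytes(StandardCharsets.US_ASCII)), interceptors);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            int b;
            while ((b = in.read()) != -1) out.write(b);
            return new String(out.toByteArray(), StandardCharsets.US_ASCII);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        List<SketchInterceptor> interceptors = Arrays.asList(
            new PrefixSketchInterceptor("AB"), new PrefixSketchInterceptor("XYZ"));
        System.out.println(decodeAscii("XYZhello", interceptors));
    }
}
```

Chaining would be a different loop (re-running detection on the wrapped stream), so keeping the selection logic in one place like this leaves room to add it later.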
Please consider automatically discovering interceptors via the services API, to make integration as easy as dropping them on the classpath. It's going to be annoying if one needs to configure these all over, when I expect most of the time a customer will want to enable something for the entire application or application suite. [Update] Per @artemkach's comment, this is effectively a 1.5 classpath-only change, and much simpler for everyone than AOP or proxy injection.
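As a rough illustration of the services-API idea, discovery could go through `java.util.ServiceLoader`. The sketch below is an assumption about how that might look, not code from this PR: `DiscoverableInterceptor` and `InterceptorDiscoverySketch` are hypothetical names. Providers would be registered the standard way, in a `META-INF/services/` file named after the fully qualified interface name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

// Hypothetical service interface; a real one would carry the interceptor methods.
interface DiscoverableInterceptor {
    String formatName();
}

final class InterceptorDiscoverySketch {
    // Collects every provider visible to the given class loader.
    static List<DiscoverableInterceptor> discover(ClassLoader classLoader) {
        List<DiscoverableInterceptor> found = new ArrayList<>();
        for (DiscoverableInterceptor interceptor
                : ServiceLoader.load(DiscoverableInterceptor.class, classLoader)) {
            found.add(interceptor);
        }
        return found;
    }

    public static void main(String[] args) {
        List<DiscoverableInterceptor> found =
            discover(Thread.currentThread().getContextClassLoader());
        // With no provider-configuration file on the classpath, nothing is discovered.
        System.out.println("discovered interceptors: " + found.size());
    }
}
```

One open design question with this approach is ordering: the classpath determines iteration order, so a "first match wins" rule would need an explicit tie-breaker if two discovered interceptors could claim the same header.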
The new classes should be in com.amazon.ion.util since they aren't coupled to Ion really.
* @see com.amazon.ion.system.IonReaderBuilder#addStreamInterceptor(StreamInterceptor)
* @see com.amazon.ion.system.IonSystemBuilder#withReaderBuilder(IonReaderBuilder)
*/
public interface StreamInterceptor {
"Stream" has several meanings in this package, I suggest renaming to InputStreamInterceptor to be more specific.
/**
* The length of the byte header that identifies streams in this format.
* @return the length in bytes.
*/
int headerLength();
Since some formats (e.g. Ion itself) may have variable-length headers, I'd make this a bit more general. Maybe headerMatchLength.
* Determines whether the given candidate byte sequence matches this format.
* @param candidate the candidate byte sequence.
* @param offset the offset into the candidate bytes to begin matching.
* @param length the number of bytes (beginning at 'offset') in the candidate byte sequence.
Is there some connection between this length and the headerLength?
The class could use more docs on exactly how the matching process works. After seeing headerLength I expected this method to receive that number of bytes in candidate, and nothing else.
I'll try to clarify the documentation. length is intended to be the actual length of bytes in candidate, starting at offset. It need not be the same as headerLength, but if it is less than headerLength then the candidate sequence cannot be a match.
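A small sketch may help pin down the offset/length contract described here. It is an assumption-laden illustration rather than the `GZIPStreamInterceptor` from this PR: `HeaderInterceptorSketch`, `GzipInterceptorSketch`, and `GzipInterceptorDemo` are hypothetical names, and the only fact relied on is the two-byte GZIP magic number (0x1F 0x8B).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical stand-in mirroring the method names under review.
interface HeaderInterceptorSketch {
    int headerLength();
    boolean matchesHeader(byte[] candidate, int offset, int length);
    InputStream newInputStream(InputStream interceptedStream) throws IOException;
}

final class GzipInterceptorSketch implements HeaderInterceptorSketch {
    private static final byte[] GZIP_MAGIC = {0x1F, (byte) 0x8B};

    public int headerLength() { return GZIP_MAGIC.length; }

    public boolean matchesHeader(byte[] candidate, int offset, int length) {
        // 'length' is how many bytes are available from 'offset'; it may exceed
        // headerLength(), but fewer bytes than the header can never match.
        if (candidate == null || offset < 0 || length < GZIP_MAGIC.length) return false;
        if (offset + GZIP_MAGIC.length > candidate.length) return false;
        for (int i = 0; i < GZIP_MAGIC.length; i++) {
            if (candidate[offset + i] != GZIP_MAGIC[i]) return false;
        }
        return true;
    }

    public InputStream newInputStream(InputStream interceptedStream) throws IOException {
        return new GZIPInputStream(interceptedStream);
    }
}

final class GzipInterceptorDemo {
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return bytes.toByteArray();
    }

    // Compresses the text, detects the header, and decodes through the interceptor.
    static String roundTrip(String text) {
        try {
            byte[] compressed = gzip(text);
            GzipInterceptorSketch interceptor = new GzipInterceptorSketch();
            InputStream in = new ByteArrayInputStream(compressed);
            if (interceptor.matchesHeader(compressed, 0, compressed.length)) {
                in = interceptor.newInputStream(in);
            }
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            int b;
            while ((b = in.read()) != -1) out.write(b);
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("hello ion"));
    }
}
```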
* @return a new InputStream.
* @throws IOException if thrown when constructing the new InputStream.
*/
InputStream newInputStream(InputStream interceptedStream) throws IOException;
I'm wondering whether this should have a sibling method for character streams, so one can accomplish transformations of text inputs.
For example, suppose I want to teach the Ion reader to ignore shebang lines atop my Fusion scripts...
I can understand the use case, though I don't think it's as simple as adding one sibling method; headerLength and matchesHeader would also need different flavors to operate on text instead of bytes. At that point I think a different interface would be best. If we do this, then we could use the same pattern for registration/discovery as we establish in this PR, but I'll leave that out of scope for now.
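For the record, a separate text-oriented interface along these lines might look roughly like the sketch below, using the shebang use case from the comment above. Everything here is hypothetical and out of scope for this PR: `TextInterceptorSketch`, `ShebangSketch`, `prefixLength`, `matchesPrefix`, and `newReader` are invented names chosen to parallel the byte-oriented methods.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical character-stream counterpart to the byte-oriented interface.
interface TextInterceptorSketch {
    int prefixLength();
    boolean matchesPrefix(CharSequence candidate);
    Reader newReader(Reader intercepted) throws IOException;
}

// Drops a leading "#!..." line so the remaining text can be parsed normally.
final class ShebangSketch implements TextInterceptorSketch {
    public int prefixLength() { return 2; }

    public boolean matchesPrefix(CharSequence candidate) {
        return candidate.length() >= 2
            && candidate.charAt(0) == '#'
            && candidate.charAt(1) == '!';
    }

    public Reader newReader(Reader intercepted) throws IOException {
        BufferedReader buffered = new BufferedReader(intercepted);
        buffered.readLine(); // discard the shebang line, including its terminator
        return buffered;
    }
}

final class TextInterceptDemo {
    // Peeks at the prefix with mark/reset, then applies the interceptor if it matches.
    static String apply(String text, TextInterceptorSketch interceptor) {
        try {
            BufferedReader reader = new BufferedReader(new StringReader(text));
            char[] prefix = new char[interceptor.prefixLength()];
            reader.mark(prefix.length);
            int read = 0;
            while (read < prefix.length) {
                int n = reader.read(prefix, read, prefix.length - read);
                if (n < 0) break; // input shorter than the prefix
                read += n;
            }
            reader.reset();
            Reader result = interceptor.matchesPrefix(new String(prefix, 0, read))
                ? interceptor.newReader(reader)
                : reader;
            StringBuilder out = new StringBuilder();
            int c;
            while ((c = result.read()) != -1) out.append((char) c);
            return out.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(apply("#!/usr/bin/env fusion\n{a:1}", new ShebangSketch()));
    }
}
```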
That sounds like a good experience. I will look into it.
…nterceptors for formats like Zstd.
…StreamInterceptor to InputStreamInterceptor.
8d3aede to 4c24d36
Revision 2:
Description of changes:
This library has always had auto-detection of GZIP streams built in, meaning that when users attempt to construct an IonReader from an InputStream or byte[], the given input is checked for the GZIP format header and wrapped in a GZIPInputStream if the header is present.

Now that many users are replacing GZIP with other compression formats like Zstd, we have to decide how to make this library as user-friendly as possible for users making that transition while limiting the amount of special-case code and dependencies that we add to the library.
This PR sketches out one possibility: to define an interface (provisionally named StreamInterceptor) that can be implemented either by users directly, or by external libraries that we vend, to plug in support for any desired format. The PR demonstrates how this mechanism is used by replacing the existing GZIP detection support with support that is delivered via a new GZIPStreamInterceptor implementation.

The ZstdStreamInterceptorTest demonstrates how a StreamInterceptor that recognizes Zstd streams can be plugged into the IonReaderBuilder and IonSystem. In summary, users that wish to support Zstd would change existing code that looks like
to
and code that looks like
to

Critically, this does not require code changes in every location that an IonReader is constructed, and works with all methods of constructing readers (e.g. IonSystem.newReader variants, IonSystem.singleValue, IonSystem.iterate, IonLoader.load, IonReaderBuilder.build variants, etc.).

Comments on the approach are welcomed.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.