JSLT without Jackson #123
Hi Lars. We (Schibsted Pulse) are one of the users that would value a JSLT with no dependencies. Without boring everyone with the details, I'll just mention that we run into conflicts between Spark, which has a dependency on Jackson, and JSLT, which has a dependency on a different version of Jackson. Below are some thoughts around the problem, but I don't claim they are well thought out. As the JsonNode is exposed in the API, using shadowjar to pull Jackson into the jar and rename the namespace doesn't seem feasible. An idea I have been toying with is that JSLT provides its own writable JSON DOM (perhaps functionally identical to Jackson in all ways that matter to JSLT?). As a convenience to the users there could be a That way the core JSLT could have no dependencies and users like us that have specific runtime demands have the possibility to implement Important is that the API still has to be perceived as convenient to use, so our maybe specialized needs don't reduce usability for the normal user. As we might be the odd guys here, I would be ok with a slightly less convenient API for us as long as we can exclude Jackson and provide our own parsing. Another option is to provide several versions of JSLT with different Jackson versions, but I am not a fan of that, having to provide several versions of an internal library for different versions of Scala. Regards, Mårten |
I've made an experiment replacing Jackson with a simple JSON library: The main JSLT code is mostly unchanged, except for replacing all Jackson classes with their equivalents. I've replaced almost all uses of asXxxx() methods with the equivalent xxxxValue(), leaving only a couple where the behaviour differs. The JSON library is a pretty straightforward replacement for the corresponding Jackson classes, because I just wanted to see how many changes to JSLT are needed. I don't think it's very well designed at the moment. There's no parser. It's currently using Jackson for parsing. I don't see the lack of a parser as a big problem, since a JSON parser is not hard to implement. I like the idea of a VM for JSLT, but I think the representation of input & output JSON to JSLT is orthogonal to the implementation, so I would focus on it from an API design point of view. Here are some thoughts about how I was thinking of improving the JSON lib:
Jackson allows NumberNodes that are BigDecimal. This is potentially useful with sums of money, but JSLT doesn't currently really support that. Not sure if that needs to be considered. Jackson also supports other formats, such as YAML. Should it be an objective for JSLT too? |
There are two other options to having a JSON library within JSLT:
I don't really like the first option, because it would mean passing Objects around and doing a lot of instanceof checks. Also not sure how null would work. The second option would be nice in that it wouldn't require any copying between Jackson and another format. One potential issue is that it's pretty opaque to JSLT. |
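To make the `instanceof` concern concrete, here is a minimal sketch. It assumes that the first option means representing JSON values as plain Java objects (`Map`, `List`, `String`, `Number`, `Boolean` and Java `null`), which is a reading of the comment above rather than something it states. `PlainObjectJson` and its methods are hypothetical names, not JSLT code.

```java
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of JSLT: JSON values held as plain Java objects.
final class PlainObjectJson {
    // Classifies a value; every operation on untyped values needs a chain like this.
    static String typeOf(Object value) {
        if (value == null) return "null";              // Java null stands in for JSON null
        if (value instanceof Boolean) return "boolean";
        if (value instanceof Number) return "number";
        if (value instanceof String) return "string";
        if (value instanceof List) return "array";
        if (value instanceof Map) return "object";
        throw new IllegalArgumentException("Not a JSON value: " + value.getClass());
    }

    // Looks up a key. A missing key and an explicit JSON null both come back
    // as Java null, so the caller cannot tell them apart.
    static Object get(Object value, String key) {
        if (value instanceof Map) {
            return ((Map<?, ?>) value).get(key);
        }
        return null;
    }
}
```

The `get` method shows the null question in practice: with this representation, "key absent" and "key present with JSON null" collapse into the same result unless the caller also checks `containsKey`.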
What branch is that?
Uhmm, I think it's ok to use Jackson internally, as long as you do a shadowJar/FatJar/UberJar where the Jackson classes are relocated. The real "problem" is caused by exposing Jackson classes in the JSLT API that is If you think Jackson is too heavyweight to embed, then I think it's possible to do a separate JavaCC grammar to parse pure JSON and use the resulting AST (or a layer on top of that) as the internal JSON library, so that functions etc. are implemented in terms of that internal API and not Jackson's.
|
When I work with JSON I always use JSON-P. This is only a set of interfaces. It works with a Service Provider Interface. The implementation I use is Glassfish. However, you can write your own by implementing javax.json.spi.JsonProvider. |
JSLT without Jackson
A JSLT implementation that does not depend on Jackson may be useful
However, there is also a potential downside, in that we may be forced
There are a number of different approaches that could be taken, with
Define JSLT JsonValue interfaces
We could define a set of Java interfaces to represent JSON values,
The downside is that performance for Jackson users would probably
(I see @jarno-r has tried implementing this. It would be very interesting
Drop-in Jackson replacement
Another possibility would be to implement the Jackson
The only downside to this would be that we probably could not get the
Also, it probably would not help those who have dependencies that
Packed JSON representation
We could make a completely different type of JSON representation:
The downside here would be that we end up with two JSLT
Generated Java code with adapters
We could make a JSLT implementation that generates either Java source
If we let the core logic be performed on values that are
Different representations of objects and arrays could then be catered
Note that this might also be used to support protobuf/avro input.
There are a few complications that might make this more difficult than
Tentative conclusion
It's too early to pick an approach, but it seems clear that both |
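As a rough illustration of the "Define JSLT JsonValue interfaces" approach, the sketch below shows one possible shape for such interfaces plus a small built-in implementation. Every name and method here is an assumption made for illustration. It is not the API of any JSLT branch, and it covers only objects and strings to stay short.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical interface: the only type the JSLT core would need to know about.
interface JsonValue {
    boolean isObject();
    boolean isString();
    String stringValue();        // only meaningful when isString() is true
    JsonValue get(String key);   // null when the key is absent or this is not an object
}

// Hypothetical built-in string implementation.
final class JsonString implements JsonValue {
    private final String value;
    JsonString(String value) { this.value = value; }
    public boolean isObject() { return false; }
    public boolean isString() { return true; }
    public String stringValue() { return value; }
    public JsonValue get(String key) { return null; }
}

// Hypothetical built-in object implementation that keeps insertion order.
final class JsonObject implements JsonValue {
    private final Map<String, JsonValue> fields = new LinkedHashMap<>();
    JsonObject put(String key, JsonValue value) { fields.put(key, value); return this; }
    public boolean isObject() { return true; }
    public boolean isString() { return false; }
    public String stringValue() { throw new IllegalStateException("not a string"); }
    public JsonValue get(String key) { return fields.get(key); }
}
```

A value built as `new JsonObject().put("name", new JsonString("jslt"))` can then be read through the interface alone, which is the property that would let other representations be swapped in behind it.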
@jarno-r It would be interesting if you could do a benchmark comparison of your code with the existing JSLT code. Especially if you could push your code to a forked version of the repo for investigation. Are you interested in doing that? |
I wrote my own JSON interfaces and ported JSLT to run on top of that. I did a benchmark where I ran Jackson JSON objects wrapped for this new interface through filters and transforms, and compared it with processing Jackson objects directly. That approach seemed to result in a 7-8% slowdown. Which is not bad at all. I should note that in this case the output is in the internal JSON representation, and may need to be translated. Or we may need to produce the output as Jackson objects wrapped in this new representation, so that retrieving the Jackson data is trivial. Users who want to transform serialized JSON input into serialized JSON output probably don't care about Jackson at all. So I should probably do another benchmark for that, because it would involve parsing, transform, plus serialization. It's entirely possible that we could get better performance for this use case. It would be interesting to hear from users who want to use JSLT with Jackson. What are your use cases? Is it important to get Jackson objects as the output? This is important for me to understand, so that I don't make a new JSLT version that is unusable for you. |
The work done so far is available on branch own-json-interfaces |
93% of the tests now pass. A good part of what remains is parsing, which is easily fixed. Transforms with our own JSON representation are now marginally slower than with Jackson objects. (Not sure why. Will work on improving this.) Parsing JSON now seems to have the same performance as Jackson. |
I have some concerns about the approach taken in https://github.com/schibsted/jslt/tree/own-json-interfaces to drop the Jackson dependency. Rolling a custom JSON parser adds significant complexity to the project and also adds risk to the effort to detach JSLT from Jackson. Do we want to maintain a JSON parser in addition to the JSLT parser and language implementation? I see JSLT as a language to define transformations on a JSON-like object model. On that level, the language doesn't rely on Jackson or JSON. It's the current runtime implementation that picked Jackson and JSON. Could we move, instead, in a direction where we take the current runtime implementation and split the language from the runtime such that there is a Jackson/JSON-free core language, and a jslt-jackson as its first implementation? Ideally, the split could allow language and implementation to evolve somewhat separately. |
One way we could define the success criteria for a JSLT without Jackson could be that we're able to usefully maintain both the existing |
The motivation for dropping Jackson is that some users have dependencies (such as Spark) that require versions of Jackson that are incompatible with the version we have. That means we can't have Jackson among the dependencies at all, so using the Jackson JSON parser isn't going to work. The good part is that maintaining a JSON parser is not much effort. JSON is a very small language, so parsing JSON is hugely easier than parsing JSLT. In fact, the JSLT parser contains a JSON parser. The most difficult part is (believe it or not) decimal numbers.
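To give a sense of why numbers take the most care, here is a minimal sketch of a validator and converter for the JSON number grammar from RFC 8259. It is not the parser from the JSLT branch. `JsonNumberSketch` is a hypothetical name, and the choice to return a `Long` when the text is an integer that fits and a `Double` otherwise is an assumption made for the example.

```java
// Hypothetical sketch, not the parser from the JSLT branch.
final class JsonNumberSketch {
    // Consumes one or more ASCII digits starting at i and returns the index after them.
    private static int digits(String s, int i) {
        int start = i;
        while (i < s.length() && s.charAt(i) >= '0' && s.charAt(i) <= '9') i++;
        if (i == start) {
            throw new NumberFormatException("digit expected at " + start + " in '" + s + "'");
        }
        return i;
    }

    // Parses a complete JSON number:
    //   -? (0 | [1-9][0-9]*) ('.' [0-9]+)? ([eE] [+-]? [0-9]+)?
    static Number parse(String s) {
        int n = s.length();
        int i = 0;
        boolean integral = true;
        if (i < n && s.charAt(i) == '-') i++;
        if (i < n && s.charAt(i) == '0') i++;          // a leading zero must stand alone
        else i = digits(s, i);
        if (i < n && s.charAt(i) == '.') {             // fraction needs at least one digit
            integral = false;
            i = digits(s, i + 1);
        }
        if (i < n && (s.charAt(i) == 'e' || s.charAt(i) == 'E')) {
            integral = false;
            i++;
            if (i < n && (s.charAt(i) == '+' || s.charAt(i) == '-')) i++;
            i = digits(s, i);
        }
        if (i != n) {
            throw new NumberFormatException("unexpected character at " + i + " in '" + s + "'");
        }
        if (integral) {
            try {
                return Long.parseLong(s);
            } catch (NumberFormatException outOfRange) {
                // Integer outside the long range: fall through to double.
            }
        }
        // The grammar is already validated, so the JDK does the decimal-to-binary work.
        return Double.parseDouble(s);
    }
}
```

The sketch hands the actual decimal conversion to `Double.parseDouble`. A parser tuned for speed would likely want its own conversion, and that is where correctness gets subtle.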
Absolutely true.
Initially I tried thinking of ways to do that, but failed to come up with anything. The problem is that when the entire implementation is based on I wrote a longer analysis that you may want to read. I agree there is some cost to maintaining a separate JSON parser, but now that I've actually written the parser I find the cost is lower than I feared it might be. Performance tuning is the main cost, but the plus side is we can now optimize specifically for the use cases we have without worrying about lots of use cases Jackson must meet that we don't need to. |
I understand and can sympathise with some of the motivations for moving away from Jackson. We regularly have to deal with dependency hell across a few of our projects with conflicting requirements for Jackson (looking at you, Spark, Finatra, Kafka, ... you know the bunch 😉). In terms of dependencies, the good part, I think, is that JSLT builds on top of a relatively stable core of Jackson, so we've been able to enforce different Jackson versions so long as we keep all the Jackson libraries at a compatible version. I also recognise that JSON is a relatively small language that is also somewhat embedded in JSLT itself. The main concern I have is that the JSLT code we run is essentially code we maintain and control, while JSON data is essentially untrusted external input that we run through it. There is some value in using a JSON parser that has been hardened by time and is maintained on its own. I also have some concerns over the upgrade path of a one-time switch away from Jackson, as it will currently imply changes to some of our core libraries that are shared across a few projects. This will require that we dedicate time to undergo the proposed update. |
Besides concerns that may be more operational than development related, I think these two stated goals require that we ponder where we're going with this effort:
Performance
With the concern for performance of JSON parsing/representation, rolling our own parser means the project is committing to maintain the most efficient parser/representation (for the use cases of the language). It's one thing to have encouraging numbers from initial experiments, but it's a different one to commit to developing and maintaining the edge. If we are to take this approach, it would be good to state what constraints make it possible to develop higher-performing JSON handling within JSLT than it is to maintain one externally. This is not clear to me, also because I have not yet taken the time to take a closer look at the approach you took in your branch.
Supporting other formats
If there is the goal to support alternate binary formats, I'm not sure that moving away from established libraries gets us closer to the goal. Jackson, for instance, has support for different binary formats via https://github.com/FasterXML/jackson-dataformats-binary, and some other third-party libraries. From our side, we have some experience with running JSLT code on Avro input data in Kafka. I'll admit that our current approach is not the most efficient one. After experimenting with different Avro libraries we ended up not using Jackson, which means we pay a penalty from an Avro-to-JSON serialization followed by JSON parsing. (The main holdup to using Jackson directly was the lack of integration with Confluent's Schema Registry, which may be possible to address as an issue). |
Would an approach like that taken in https://github.com/jimblackler/jsonschemafriend#format be feasible? It seems that, in that project, they define interfaces in terms of Java interfaces, and the user is then responsible for plugging in the JSON parser and bridging the two.
|
Letting the user supply the JSON parser and representation is a real no-no, because it's going to make adoption so much harder for users. We have to give them a complete package they can use right out of the box. But I have to say I find it difficult to understand what you're concerned about. JSLT is an entire language with functions, value types, operators, expressions, etc etc. JSON, by contrast, is a very small language. So small, in fact, that json.org has room for the entire grammar in two different representations on the front page. In my experience the hardest part of parsing JSON is parsing the numbers. Seriously. And the So ... why worry about this? The approach taken in the new branch will let anyone who wants to plug in their own JSON parser and representation, anyway, so that option will still be there. It's just that you won't have to. Yeah, sure, there's a cost in effort to maintain a JSON parser, but it's my effort. |
Apologies. In my last comment I meant to suggest (and didn't) that the JSLT runtime could be defined in terms of Java interfaces. Of course, JSLT should still be usable out of the box, as it is today, and offer at least one JSON parser integration, be it Jackson or JSLT's own JSON parser. The main point I wanted to express is that it might be easier to hook up different JSON parsers (and potentially Avro, protobuf libraries) to the standard Java interfaces required by the runtime. About rolling your own JSON parser, my earlier comment was meant to question how well the different goals are being addressed by the approach:
Clearly, this goal is addressed by not using Jackson, JSLT gets out of the dependency game.
On this one, I'd venture a maybe. Yes, a focussed implementation can offer better performance than a general purpose parser. That said performance is not a static game nor one that has a single answer for all use cases. My concern with using this as a reason to roll your own parser is that the faster implementation today for a set of use cases may not be the fastest tomorrow or for a different set of use cases. So, while you may come up with a faster parser, I'm not convinced this approach properly addresses the goal.
This goal is not addressed by switching from Jackson to a custom parser and interface. Jackson today has support for more data formats, and this support is lost. |
I think everything you write here is totally fair. My plan is to make a version of JSLT which defines its own interfaces for the JSON representation. I also plan to make a full JSON parser and implementation of the JSON representation to bundle with JSLT. However, I very much want it to be possible to plug in other JSON parsers and representations, for those who prefer that. I think it would make a lot of sense to offer a separate artifact that has a Jackson binding, so that anyone who wants to use Jackson can keep doing that. This also means that if someone wants to try supporting Avro via Jackson that should also be possible. In other words, it looks to me like this should satisfy everyone? |
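The plan above (JSLT-owned interfaces, with bindings to other representations shipped separately) could look roughly like the sketch below. All names are hypothetical. To keep the example runnable without extra dependencies, `java.util.Map` stands in for a third-party tree such as Jackson's. A real Jackson binding would wrap Jackson's node type instead and would live in its own artifact.

```java
import java.util.Map;

// Hypothetical interface owned by the JSLT core.
interface TreeValue {
    boolean isObject();
    TreeValue get(String key);   // null when the key is absent or this is not an object
    Object raw();                // the wrapped foreign value, returned without conversion
}

// Hypothetical binding: wraps a foreign tree lazily instead of copying it.
final class MapBinding implements TreeValue {
    private final Object wrapped;
    MapBinding(Object wrapped) { this.wrapped = wrapped; }
    public boolean isObject() { return wrapped instanceof Map; }
    public TreeValue get(String key) {
        if (!(wrapped instanceof Map)) return null;
        Map<?, ?> map = (Map<?, ?>) wrapped;
        return map.containsKey(key) ? new MapBinding(map.get(key)) : null;
    }
    public Object raw() { return wrapped; }
}

final class CoreSketch {
    // Core logic sees only TreeValue, never the foreign library's classes.
    static TreeValue path(TreeValue root, String... keys) {
        TreeValue current = root;
        for (String key : keys) {
            if (current == null) return null;
            current = current.get(key);
        }
        return current;
    }
}
```

Because the binding wraps rather than copies, handing the original foreign object back to the caller is a single `raw()` call. That relates to the earlier question about whether users need Jackson objects as output.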
In my project we use the kotlinx serialization library, which brings its own JSON representation, different from Jackson's. A Jackson-less JSLT would be really valuable. |
Is this idea still being pursued? It seems like the branch |
I originally started this branch because it seemed Schibsted needed a non-Jackson JSLT, and it seemed like a good idea anyway. Schibsted then expressed skepticism about this approach (see @biochimia above), and nobody else has appeared to be very interested, so I set it aside. I still think this could be a valuable alternative to the Jackson-based implementation, but if users are not interested then there's little point. |
The current JSLT implementation is so tightly bound to Jackson that every value is a Jackson JsonNode, but there are users who want to use JSLT without Jackson. There is an experimental branch with a custom JSLT VM which could be developed further to provide a Jackson-less JSLT.
There are three main challenges here:
All of these problems can be solved, but it would be good to get some input on to what extent there are people out there who need a JSLT with zero runtime dependencies. And also what requirements these people have for the input/output representation.
Input wanted!