GSoC 2025: Better JSON Schema Errors #870

Open
jdesrosiers opened this issue Jan 27, 2025 · 25 comments
Labels
gsoc Google Summer of Code Project Idea

Comments

@jdesrosiers
Member

Create a JavaScript library to convert standard JSON Schema output into clear, human-friendly error messages. The library should follow the examples set by existing tools like Atlassian's better-ajv-errors and Apideck's @apideck/better-ajv-errors, but use the standard JSON Schema output format introduced in draft-2019-09 instead of ajv's proprietary format.
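
As a rough illustration of the transformation described above, here is a minimal sketch. The field names (`valid`, `errors`, `keywordLocation`, `instanceLocation`) come from the Basic output format; the function name `describeErrors`, the message wording, and the assumption that locations are plain JSON Pointers without a leading `#` are all hypothetical and not taken from any existing library.

```javascript
// Hypothetical sketch: turn "Basic" output units into short messages.
// Only the last segment of keywordLocation (the keyword name) is used here.
const templates = {
  type: (location) => `The value at ${location} has the wrong type.`,
  required: (location) => `The object at ${location} is missing required properties.`,
  minimum: (location) => `The number at ${location} is too small.`
};

function describeErrors(output) {
  if (output.valid) return [];
  return (output.errors ?? [])
    .map((unit) => {
      const keyword = unit.keywordLocation.split("/").pop();
      const location = unit.instanceLocation === "" ? "the document root" : unit.instanceLocation;
      const template = templates[keyword];
      // Keywords without a template are skipped in this sketch.
      return template ? template(location) : null;
    })
    .filter((message) => message !== null);
}

// Hand-written sample output, not produced by a real validator.
const basicOutput = {
  valid: false,
  errors: [
    { keywordLocation: "/properties/age/type", instanceLocation: "/age" },
    { keywordLocation: "/required", instanceLocation: "" }
  ]
};

console.log(describeErrors(basicOutput));
```

A real library would also need the schema and the instance to say what was expected and what was found; this sketch only shows the overall shape of the conversion.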

Expected Outcomes

  • A library that transforms standard JSON Schema validation outputs into concise, easy-to-understand error messages.
  • A mechanism for loading and managing additional language packs to support presenting error messages in multiple languages.
  • Customization options for users to override default error messages or add custom ones.
  • Tested for compatibility with multiple implementations using the standard output format, including @hyperjump/json-schema.
  • Published on npm with proper versioning, a clear README, and example use cases.
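
One possible shape for the language pack and customization outcomes listed above, as a sketch only. The names `createCatalog`, `addLanguagePack`, `override`, and `message`, and the `{name}` placeholder syntax, are hypothetical choices made for illustration.

```javascript
// Hypothetical sketch of a message catalog with language packs and user overrides.
function createCatalog(defaultLocale, defaultMessages) {
  const packs = new Map([[defaultLocale, { ...defaultMessages }]]);
  const overrides = new Map();

  return {
    // Register (or extend) the messages for a locale.
    addLanguagePack(locale, messages) {
      packs.set(locale, { ...(packs.get(locale) ?? {}), ...messages });
    },
    // Let users replace a single message without editing a pack.
    override(locale, key, template) {
      overrides.set(`${locale}:${key}`, template);
    },
    // Lookup order: user override, requested locale, default locale.
    message(locale, key, params = {}) {
      const template = overrides.get(`${locale}:${key}`)
        ?? packs.get(locale)?.[key]
        ?? packs.get(defaultLocale)?.[key];
      if (template === undefined) return key;
      return template.replace(/\{(\w+)\}/g, (match, name) =>
        name in params ? String(params[name]) : match);
    }
  };
}

const catalog = createCatalog("en", {
  type: "Expected {expected} but found {actual}.",
  minLength: "Must be at least {limit} characters long."
});
catalog.addLanguagePack("es", { type: "Se esperaba {expected} pero se encontró {actual}." });
catalog.override("en", "minLength", "Too short (minimum {limit}).");

console.log(catalog.message("es", "type", { expected: "number", actual: "string" }));
console.log(catalog.message("es", "minLength", { limit: 3 })); // falls back to the default pack
console.log(catalog.message("en", "minLength", { limit: 3 })); // uses the override
```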

Skills Required

  • JavaScript: Strong understanding of JavaScript and best practices for building libraries.
  • JSON Schema Specification: Familiarity with the JSON Schema standard, particularly the error output format introduced in draft-2019-09.
  • Error Message Design: Ability to translate structured error output into concise and meaningful human-friendly messages.
  • Library Development: Experience creating, testing, and documenting JavaScript libraries.
  • Open Source Practices: Understanding of Git, GitHub, and how to maintain a project for open source contributions.
  • Testing: Familiarity with Test-Driven Development (TDD) or Behavior-Driven Development (BDD) methodologies.
  • Collaboration: Comfortable using pair programming tools like VSCode LiveShare and participating in pair programming sessions for real-time collaboration.

Mentor(s)

Expected Difficulty

Medium

Expected Time Commitment

175 hours

@jviotti
Member

jviotti commented Jan 27, 2025

I'm very interested in the outcome of this to improve my own validator. Let me know if I can help in any way.

@heysujal

Thanks @jdesrosiers for this idea.
I am interested in working on this idea in the upcoming GSoC event, if it gets accepted.
I have never built a JavaScript library before, so I am not yet familiar with the best practices, but I have a strong understanding of JavaScript. I also have the Open Source Practices and Collaboration parts covered. I will need to work on Testing and Library Development, and to familiarise myself more with JSON Schema.

To gain a better understanding of the JSON Schema standard, I am going through the docs.
I am excited to get a chance to work on this one. Would you suggest any guidance or resources to prepare for this project idea?

@jdesrosiers
Member Author

Thanks for your interest @heysujal. IMO, there's no better way to get familiar with JSON Schema than to write a bunch of schemas. I suggest picking some domain and writing some schemas to model it. It just needs to be complex enough to explore past the basics.

@benjagm added the "gsoc Google Summer of Code Project Idea" label on Feb 4, 2025
@Julian
Member

Julian commented Feb 5, 2025

(This is obviously a really good idea -- I want/wanted at some point soon to have Bowtie collect and compare error messages from implementations, so definitely keen to see where this goes).

@gregsdennis
Member

I'd like to see the rules generalized somehow so that non-JS implementations can also be made.

Also worth mentioning the challenges described in my blog post around the ambiguity of determining a "right" error.
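
To make that ambiguity concrete, here is a small sketch. The error units are hand-written in the style of the Basic output format, and the ranking heuristic (`pickLikelyBranch`, preferring a branch whose `type` keyword did not fail) is purely hypothetical; it is one guess among many, not a recommended answer.

```javascript
// Schema:   { "anyOf": [ { "type": "string" }, { "type": "number", "minimum": 0 } ] }
// Instance: -5
// Both branches fail, so there is one error per branch (hand-written here).
const errors = [
  { keywordLocation: "/anyOf/0/type", instanceLocation: "" },
  { keywordLocation: "/anyOf/1/minimum", instanceLocation: "" }
];

// Hypothetical heuristic: a branch whose "type" keyword did not fail is a closer
// match, so its errors are probably the ones the user cares about.
function pickLikelyBranch(errorUnits, applicatorLocation) {
  const byBranch = new Map();
  for (const unit of errorUnits) {
    if (!unit.keywordLocation.startsWith(`${applicatorLocation}/`)) continue;
    const rest = unit.keywordLocation.slice(applicatorLocation.length + 1).split("/");
    const branch = rest[0];
    if (!byBranch.has(branch)) byBranch.set(branch, []);
    byBranch.get(branch).push({ keyword: rest[rest.length - 1], unit });
  }
  let best = null;
  for (const [branch, failures] of byBranch) {
    const typeFailed = failures.some((failure) => failure.keyword === "type");
    const score = (typeFailed ? 1000 : 0) + failures.length; // lower is better
    if (best === null || score < best.score) best = { branch, score, failures };
  }
  return best;
}

const choice = pickLikelyBranch(errors, "/anyOf");
console.log(choice.branch, choice.failures.map((failure) => failure.keyword));
```

For the instance `-5` this picks the second branch and reports the `minimum` failure. For an instance such as `true`, both branches fail on `type` and the heuristic has nothing to go on, which is the kind of case where no single error is obviously the right one.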

@jdesrosiers
Member Author

I'd like to see the rules generalized somehow so that non-JS implementations can also be made.

That would be nice. However, I don't think the "rules" used by this project are necessarily the rules everyone would want to use. How you present an error doesn't have any one correct answer. For example, of the two libraries I linked, one is optimized for CLIs and the other is optimized for APIs. We can certainly make a test suite in JSON like our validation test suite, but that's probably not what's needed. In any case, the test suite would be a comprehensive set of examples that could be used as a reference for others making similar tools. They can use the test cases to make sure their implementation covers the same situations nicely even if they choose to handle them differently.

@karenetheridge
Member

I feel like this is under-specified, or that determining the specification itself is part of the task (which might be beyond the scope and expectations of a GSoC participant).

Are we talking about transforming json schema error objects into a flat list of strings? If so, that's a very easy transformation if the error messages already exist (and some implementations, e.g. mine, already have that capability). Or are we wanting a standardized set of errors that can come from each keyword? (To do that, I would start by inventorying some popular implementations to see what they do, and attempt to come up with something similar or choose the best option of each of these - where "best" is not defined.) Or, perhaps propose an extension to the json schema error specification where standardized error codes could be used, together with sprintf-style arguments, so that an implementation could use a locale library to produce error strings in any language?
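
For reference, the last option (error codes plus sprintf-style arguments) could look roughly like the sketch below. The codes `TYPE_MISMATCH` and `MIN_ITEMS`, the `args` field, and the tiny `%s`/`%d` formatter are all hypothetical; nothing like this exists in the current output specification.

```javascript
// Hypothetical sketch: error codes plus positional arguments, rendered per locale.
const locales = {
  en: {
    TYPE_MISMATCH: "Expected %s but got %s",
    MIN_ITEMS: "Expected at least %d items but got %d"
  },
  de: {
    TYPE_MISMATCH: "Erwartet wurde %s, gefunden wurde %s",
    MIN_ITEMS: "Mindestens %d Elemente erwartet, aber %d gefunden"
  }
};

// Replace each %s or %d with the next argument, in order.
function format(template, args) {
  let index = 0;
  return template.replace(/%[sd]/g, (placeholder) => {
    const value = args[index++];
    return placeholder === "%d" ? String(Number(value)) : String(value);
  });
}

// Fall back to English, then to the bare code, when a translation is missing.
function render(error, locale) {
  const template = locales[locale]?.[error.code] ?? locales.en[error.code];
  return template === undefined ? error.code : format(template, error.args);
}

const error = { code: "MIN_ITEMS", instanceLocation: "/tags", args: [2, 1] };
console.log(render(error, "en"));
console.log(render(error, "de"));
```

Positional arguments assume every language uses them in the same order, so named placeholders may be a better fit for real translations.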

@jdesrosiers
Member Author

I feel like this is under-specified, or that determining the specification itself is part of the task (which might be beyond the scope and expectations of a GSoC participant).

I think you're making this into a bigger thing than I had in mind. There's no specification and I don't expect a specification to be a result of this project. It's just a library. Hopefully it will be an example for others to make similar kinds of things, but I don't think there's anything to be standardized.

That said, yes, there is quite a bit that's left open that I expect candidates to provide details for in their proposals. Such as,

  • What audience is being served? (The examples I linked show two: CLI and API users)
  • What special features might be included? (For example, the two libraries I linked provide suggestions for misspelled enums)
  • How are they going to handle the infamously difficult to message oneOf/anyOf case and other tricky situations?
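
As an illustration of the misspelled-enum feature mentioned in the list above, here is a minimal sketch using Levenshtein edit distance. The function names and the distance threshold of 2 are hypothetical; the linked libraries may use a different technique.

```javascript
// Classic Levenshtein edit distance using a single rolling row.
function editDistance(a, b) {
  const row = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    let diagonal = row[0];
    row[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const previous = row[j];
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      row[j] = Math.min(row[j] + 1, row[j - 1] + 1, diagonal + cost);
      diagonal = previous;
    }
  }
  return row[b.length];
}

// Hypothetical helper: return the closest allowed string, or null if nothing is close.
function suggestEnumValue(actual, allowed, maxDistance = 2) {
  let best = null;
  for (const candidate of allowed) {
    if (typeof candidate !== "string") continue; // enum values can be any JSON type
    const distance = editDistance(actual, candidate);
    if (distance <= maxDistance && (best === null || distance < best.distance)) {
      best = { value: candidate, distance };
    }
  }
  return best === null ? null : best.value;
}

console.log(suggestEnumValue("actve", ["active", "inactive", "pending"]));
console.log(suggestEnumValue("xyz", ["active", "inactive", "pending"]));
```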

Are we talking about transforming json schema error objects into a flat list of strings?

No, I'd expect more than that. See the examples I linked in the description. better-ajv-errors, which is aimed at CLI output, presents the JSON where the error occurred with messaging inline. @apideck/better-ajv-errors, which is aimed at APIs, includes additional data that might be useful to applications such as an array of the required properties that are missing in addition to messaging.
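
A rough sketch of the API-oriented style, where the result carries machine-readable data next to the message. The result shape (`missingProperties` and so on) is a hypothetical example and is not copied from @apideck/better-ajv-errors.

```javascript
// Hand-written sample data. The error unit carries only locations;
// the details are derived from the schema and the instance.
const schema = {
  type: "object",
  properties: { name: { type: "string" }, email: { type: "string" } },
  required: ["name", "email", "age"]
};
const instance = { name: "Ada" };
const errorUnit = { keywordLocation: "/required", instanceLocation: "" };

// Hypothetical helper. It only handles a "required" keyword at the schema root;
// nested locations would need JSON Pointer resolution.
function describeRequired(unit, schema, instance) {
  const required = schema.required ?? [];
  const missing = required.filter(
    (name) => !Object.prototype.hasOwnProperty.call(instance, name)
  );
  const noun = missing.length === 1 ? "property" : "properties";
  return {
    instanceLocation: unit.instanceLocation,
    keyword: "required",
    message: `Missing required ${noun}: ${missing.join(", ")}`,
    missingProperties: missing
  };
}

console.log(describeRequired(errorUnit, schema, instance));
```

An application can then use `missingProperties` directly, for example to highlight form fields, without parsing the message text.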

If so, that's a very easy transformation if the error messages already exist

This tool couldn't use the messages from the validator. It would have to use its own messaging. One of the stated goals of this tool is to be able to provide messaging in multiple languages. Obviously we can't translate messages from arbitrary validators, so we would need to make our own messages and provide translations for those messages.

I didn't talk about this in the description, but one of the benefits of this approach to error messaging is that it decouples the messaging from the implementation. It makes it easier to change the implementation your application uses while knowing the messaging and how your application uses the messaging won't change. This only works if the library provides its own messaging.

Also, a huge motivator for me is to free implementers from the burden of having to worry about messaging and providing the right kind of messaging for every possible audience or not serving every audience. Ideally, implementers can just provide the standard output that provides instance location and schema location. Ideally, there would be multiple libraries that present messaging appropriately for different audiences (like the two examples I linked: CLI and API) and users can choose which one fits their domain independently of what implementation they choose.

Or are we wanting a standardized set of errors that can come from each keyword?

Definitely not.

Or, perhaps propose an extension to the json schema error specification

It wouldn't surprise me if this project inspires some proposals to improve the output specification, but I expect this project to work against the existing spec and nothing more. The schema location and instance location of an error should be enough as long as you have access to the schema and the instance to extract the necessary data to construct the message.
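
A minimal sketch of that idea: resolve the two locations as JSON Pointers (RFC 6901) against the schema and the instance, then build the message from what is found there. The helper names and the message wording are hypothetical, and the sketch assumes the schema location does not pass through a `$ref`; when it does, `keywordLocation` cannot be resolved directly against the root schema and `absoluteKeywordLocation` would be needed instead.

```javascript
// Resolve an RFC 6901 JSON Pointer (e.g. "/properties/age/maximum") against a document.
function resolvePointer(document, pointer) {
  if (pointer === "") return document;
  return pointer
    .slice(1)
    .split("/")
    // Unescape "~1" to "/" first, then "~0" to "~", as RFC 6901 requires.
    .map((segment) => segment.replace(/~1/g, "/").replace(/~0/g, "~"))
    .reduce(
      (value, segment) => (value === undefined || value === null ? undefined : value[segment]),
      document
    );
}

// Hand-written sample data, not produced by a real validator.
const schema = { type: "object", properties: { age: { type: "integer", maximum: 150 } } };
const instance = { age: 200 };
const errorUnit = { keywordLocation: "/properties/age/maximum", instanceLocation: "/age" };

// Hypothetical message builder for the "maximum" keyword.
function describeMaximum(unit, schema, instance) {
  const limit = resolvePointer(schema, unit.keywordLocation);
  const actual = resolvePointer(instance, unit.instanceLocation);
  return `${unit.instanceLocation} must be at most ${limit}, but it is ${actual}`;
}

console.log(describeMaximum(errorUnit, schema, instance));
```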

I hope that helps clarify my vision for this project. Thanks for bringing this up.

@GANESHSHARMA1

Hi @jdesrosiers,

I’m thrilled about the idea of building a JavaScript library to convert standard JSON Schema (draft-2019-09) validation outputs into human-friendly error messages for GSoC 2025! I have solid JavaScript experience (e.g., building reusable libraries with Node.js) and a growing understanding of JSON Schema from experimenting with @hyperjump/json-schema. I love the challenge of turning technical data into clear, user-friendly messages—something I’ve done in Modern Vibe Homes.

I’d like to propose a library that not only delivers concise error messages but also supports language packs and customization, inspired by tools like better-ajv-errors. I’ve started digging into the draft-2019-09 output format and plan to submit a detailed GSoC proposal soon. Would you be open to reviewing a draft or suggesting specific features you’d like to see? I’d also be happy to prototype a small example if that’s helpful.

Excited to collaborate with you and the Hyperjump community! Looking forward to your thoughts.

@idanidan29

idanidan29 commented Mar 1, 2025

Hey @jdesrosiers and the team! 👋

I’m Idan Levi, a software engineering undergrad with a strong interest in JavaScript and open-source contributions. I’ve worked with both front-end and back-end technologies like Next.js and express.js, and I have experience using JSON Schema in various projects.

A bit about me:

  • I’m deeply interested in building user-friendly tools and libraries, and this project aligns perfectly with my goals.
  • I’ve worked on full-stack development and API integration projects and enjoy taking on challenges.
  • I’ve also gained experience with Test-Driven Development (TDD), which will be helpful for ensuring this library is properly tested and works across different implementations.

I would love to contribute to the JSON Schema error message library as part of GSoC.

A quick question,
regarding the flexibility of error message formatting: Given that this library will provide its own set of messages and aim for multiple languages, how will the library handle complex error scenarios like oneOf and anyOf? Will there be a built-in mechanism for handling these cases, or will it be up to the users to customize the messaging for such edge cases?

Thanks in advance for the opportunity and looking forward to collaborating! 😊

@Vishv0407

Hi @jdesrosiers,

I'm interested in contributing to this project for GSoC 2025. I have experience with JavaScript and npm libraries and have been exploring JSON Schema validation errors.

Are there any qualification tasks or prerequisites to complete before applying? Also, which repo should I contribute to for this project?

Looking forward to your response.

@jdesrosiers
Member Author

Thanks everyone for showing your interest in this project.

Would you be open to reviewing a draft or suggesting specific features you’d like to see?

I will provide one and only one review of your proposal. Aside from that, all discussion must be in public spaces like here or in the Slack #gsoc channel.

I’d also be happy to prototype a small example if that’s helpful.

I won't be looking at any code or demos aside from the qualification task, but if prototyping helps you think through the issues and ask questions, I think that's a great idea.

A quick question,
regarding the flexibility of error message formatting: Given that this library will provide its own set of messages and aim for multiple languages, how will the library handle complex error scenarios like oneOf and anyOf? Will there be a built-in mechanism for handling these cases, or will it be up to the users to customize the messaging for such edge cases?

You tell me 😃. This is the kind of thing I want to see from your proposal. The fact that you've already identified the biggest challenge with error messaging is great. Now, analyze that problem and tell me in your proposal how you think is best to handle those kinds of errors and why.

Are there any qualification tasks or prerequisites to complete before applying?

The qualification task will be announced sometime in the next week.

Also, which repo should I contribute to for this project?

There's no repo yet. This will be a new project built from the ground up. I'll set up a repo when the project start date is approaching.

@GANESHSHARMA1

Hi @jdesrosiers,

I’m super excited about the opportunity to work on a JavaScript library that transforms standard JSON Schema (draft-2019-09) validation outputs into clear, human-friendly error messages for GSoC 2025! I’ve got a strong grasp of JavaScript (e.g., crafting modular libraries with Node.js) and have been diving into JSON Schema through tools like @hyperjump/json-schema. I love the idea of making technical outputs more accessible.

Inspired by libraries like better-ajv-errors, I’d like to propose a solution that delivers concise messages, supports language packs for multilingual use, and offers customization options. To get started, here’s how I’d approach it:

  1. Parse the Output: Study the draft-2019-09 error format and write a utility to extract key details (e.g., instancePath, schemaPath, message).
  2. Message Templates: Create a default set of human-friendly templates (e.g., “Value at /age must be a number, got string”) with fallback handling.
  3. Language Packs: Design a simple system to load JSON-based language files (e.g., en.json, fr.json) for easy i18n support.
  4. Customization: Add an API for users to override messages or define custom ones via a config object.
  5. Testing & Publishing: Test against @hyperjump/json-schema and other implementations, then package it for npm with a solid README.

I’m planning to draft a full GSoC proposal soon—would you be willing to share feedback on it? I’d also love to hear your thoughts on these steps or any specific priorities you’d like to emphasize. If it helps, I can whip up a quick prototype to showcase the concept.

Can’t wait to collaborate with you and the Hyperjump team—this feels like a perfect fit for my skills and passion! Thanks for considering me.

@Hello-Ship-Code

This sounds like a really cool and useful project! Making JSON Schema validation errors easier to understand will definitely help a lot of developers. Looking forward to seeing how this comes together! 🚀

@variable6

I’m excited to dive in and collaborate on refining the details further.

@idanidan29

Thanks for clarifying all that! I can't wait for the qualification task.

@jdesrosiers
Member Author

@GANESHSHARMA1 -- I’d also love to hear your thoughts on these steps or any specific priorities you’d like to emphasize.

That's all fine, but it's all very generic and doesn't really say anything of substance. It's mostly just a summary of the project description. When you write your proposal, I want to see you go a lot deeper. Identify the main challenges you'll face and how you intend to handle them.

@GANESHSHARMA1

@jdesrosiers -- That's all fine, but it's all very generic and doesn't really say anything of substance. It's mostly just a summary of the project description. When you write your proposal, I want to see you go a lot deeper. Identify the main challenges you'll face and how you intend to handle them.

Sure, I'm working on it. I'm studying the code and its functionality in depth. I will submit my proposal soon, and it will describe the challenges and how I intend to solve them.

@Kashika23

Hey @jdesrosiers
I am eager to contribute to the project of building a JavaScript library for transforming JSON Schema validation outputs into human-friendly error messages. With a solid foundation in JavaScript and a growing interest in JSON Schema, I aim to create a library that simplifies error understanding for developers. I plan to implement multi-language support, customization options, and thorough testing for compatibility. This project will help me deepen my skills in library development, error message design, and open-source collaboration. I am excited to learn, contribute, and deliver a tool that enhances the developer experience with JSON Schema.

I am eager to learn, collaborate, and deliver a tool that simplifies schema exploration for developers worldwide.

@jdesrosiers
Member Author

Qualification Task

There's probably no better way to prepare for a project like this than to implement the JSON Schema output format for yourself. So, that's what the qualification task is going to be.

I've provided a simple JSON Schema implementation that implements the Flag output format. Your task is to update it to support either the Basic or Detailed output formats.

The implementation and more details about what I expect can be found at https://github.com/hyperjump-io/json-schema-lite. Good luck!
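
For orientation, the difference between the Flag and Basic formats can be sketched as below. The field names follow the output formats described in the specification; the sample data, the `error` texts, and the helper `toFlag` are hypothetical and are not taken from json-schema-lite.

```javascript
// Flag: a single boolean result and nothing else.
const flagOutput = { valid: false };

// Basic: a flat list of output units, each locating one failure.
// Hand-written sample, not produced by a real validator.
const basicOutput = {
  valid: false,
  errors: [
    { keywordLocation: "/properties/name/type", instanceLocation: "/name", error: "Expected a string" },
    { keywordLocation: "/required", instanceLocation: "", error: "Missing property: id" }
  ]
};

// Going from Basic to Flag only discards information. The reverse is not possible,
// so the validator itself has to record locations while it evaluates the schema.
function toFlag(output) {
  return { valid: output.valid };
}

console.log(toFlag(basicOutput));
```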

@Vishv0407

Thanks @jdesrosiers for sharing the qualification task!
I'll start working on implementing the Basic or Detailed output format in json-schema-lite.
I’ll also begin drafting my proposal based on my approach and findings.

@gregsdennis
Member

@jdesrosiers may I suggest that the Verbose format be considered over Detailed? In my experience, trying to figure out which nodes/branches should be retained vs. pruned, especially in an automated way, proved difficult and ultimately unsuccessful. The Verbose output is quite straightforward.

@idanidan29

Hey @jdesrosiers what is the deadline for submitting the qualification task?

@jdesrosiers
Member Author

may I suggest that the Verbose format be considered over Detailed? In my experience, trying to figure out which nodes/branches should be retained vs. pruned, especially in an automated way, proved difficult and ultimately unsuccessful. The Verbose output is quite straightforward.

There are two reasons I chose not to allow the Verbose output. The goal of this project is to be something that works with a variety of implementations. I've never heard of anyone other than you and me who has supported Verbose. The other reason is that I thought it was too trivial a task, especially given the starting point I'm giving them. All that's left to do, and what I want them to do, is to go through each keyword and figure out what needs to be retained vs pruned.
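
To give a feel for the retained-vs-pruned question, here is a simplified sketch that reduces a full evaluation tree to a Detailed-style result. The rules applied (drop passing nodes, replace a node that has exactly one child with that child) are a loose reading of the Detailed format, the sample tree is hand-written, and keeping a failing node that has no failing children is a judgment call made for this sketch. It is not how json-schema-lite or any particular validator works.

```javascript
// Hypothetical pruning of a verbose-style tree where every node has "valid"
// and branch nodes list their children under "errors".
function prune(node) {
  if (node.valid) return null; // passing nodes carry no error information
  const children = (node.errors ?? []).map(prune).filter((child) => child !== null);
  const { errors, ...rest } = node;
  if (children.length === 0) return rest; // the failure is this node's own: keep it
  if (children.length === 1) return children[0]; // single child: collapse into it
  return { ...rest, errors: children };
}

const verbose = {
  valid: false,
  keywordLocation: "",
  instanceLocation: "",
  errors: [
    { valid: true, keywordLocation: "/type", instanceLocation: "" },
    {
      valid: false,
      keywordLocation: "/properties",
      instanceLocation: "",
      errors: [
        { valid: true, keywordLocation: "/properties/name/type", instanceLocation: "/name" },
        { valid: false, keywordLocation: "/properties/age/minimum", instanceLocation: "/age" }
      ]
    }
  ]
};

console.log(prune(verbose));
```

In this example everything collapses into the single failing `minimum` unit. Applicators such as `oneOf` with more than one matching branch are less clear cut, since the failure belongs to the applicator itself and none of its children failed.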

I expect them to run into a few ambiguities and I expect we'll discuss them and decide on the expected behavior together. I did enough of the assignment myself to be confident that it's not too hard, but what I gave them is a different approach from how I implemented it in my validator so there might be hard parts I'm not aware of. I kinda hope there are. This project is highly experimental. It's likely that we'll find that some things we want to do just aren't possible. If we see some of that in the qualification task, I'll get to see how they deal with those kinds of road blocks.

@jdesrosiers
Member Author

what is the deadline for submitting the qualification task?

I don't think we've set a deadline for qualification tasks as an organization. I'm willing to accept it as long as the application period is open (April 6), but you might want to get it in early enough to make use of the feedback in your application. I'd suggest trying to get it in by the time the application period begins (March 24).

Also, remember that I'm doing these reviews in my spare time, so turnaround time will likely not be fast. The sooner you get it in, the more likely I'll have a review for you in time to inform your application. I can't guarantee that you'll get a review if it's close to the submission deadline.
