Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

StrictXmir #3652

Merged
merged 14 commits into from
Dec 13, 2024
Merged

StrictXmir #3652

merged 14 commits into from
Dec 13, 2024

Conversation

yegor256
Copy link
Member

@yegor256 yegor256 commented Dec 12, 2024

related to #3636

@yegor256
Copy link
Member Author

@volodya-lombrozo please, check this one

Copy link
Member

@volodya-lombrozo volodya-lombrozo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yegor256 This PR doesn't solve the #3636 issue. It only speeds up the inner eo tests. If we don't have an internet connection or there is an outage, this solution won't help.

/**
* XMIR that validates itself right after construction.
*
* <p>This class is supposed to be used ONLY for testing, because
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yegor256 I'm a bit confused here. You mentioned in the PR description that this PR closes #3636. And here, in the javadoc you write

This class is supposed to be used ONLY for testing

It means, I won't be able to use StrictXmir in jeo-maven-plugin production code. However, I'm using StrictXML during the xmir generation. In other words, I use it in the production code.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@volodya-lombrozo you can try to do this, on your side: 1) create XML object, 2) save it on disc, 3) make StrictXmir(xml) and 4) either fail, or continue. If you fail, you fail. If you continue, you have a clean XML on disc.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yegor256 [NIT] I mean, this class might be used not only for testing. Actually it's an important part of validation in a production code. Maybe we need to remove this line from javadoc:

This class is supposed to be used ONLY for testing

and replace it with something like

ATTENTION!

what do you think?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@volodya-lombrozo explained better in 04af5af

).asBytes()
);
} catch (final IOException | URISyntaxException ex) {
throw new IllegalArgumentException(ex);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yegor256 [NIT] It's a bad idea by a several reasons:

  1. Exception without context: https://www.yegor256.com/2015/12/01/rethrow-exceptions.html
  2. Grouped exceptions: https://www.yegor256.com/2022/08/30/dont-group-exception-catchers.html

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.xpath("/program")
.attr(
"noNamespaceSchemaLocation xsi http://www.w3.org/2001/XMLSchema-instance",
"https://www.eolang.org/XMIR.xsd"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yegor256 You tested only one positive scenario. What will happen if we have a corrupted address? What will happen if we don't have the internet?

https://www.eCORRUPTEDolang.org/XMIR.xsd

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@volodya-lombrozo added test in 6f07f8f

).toString().replace("\\", "/")
);
}
new Xembler(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yegor256 Do we need to change the xmir in case !uri.startWith("http")?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@volodya-lombrozo nope, when it's file://.... we keep it as is

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yegor256 I see. I've overlooked indentations. Thank you.

@yegor256
Copy link
Member Author

@volodya-lombrozo we can't stay entirely offline. The schema (XSD) must come from somewhere. Either you download it and then provide to StrictXmir (I will make it possible now), or it will download it. Or I miss something?

@github-actions github-actions bot added the core label Dec 12, 2024
@volodya-lombrozo
Copy link
Member

@volodya-lombrozo we can't stay entirely offline. The schema (XSD) must come from somewhere. Either you download it and then provide to StrictXmir (I will make it possible now), or it will download it. Or I miss something?

@yegor256 Is it possible to place xsd to the classpath where StrictXmir is located?

@yegor256
Copy link
Member Author

@volodya-lombrozo we don't know which version to place there: there are many of them published at www.eolang.org. We can possibly try to download them all during build and put into JAR, but this sounds like a hack.

@yegor256
Copy link
Member Author

@volodya-lombrozo please, check the code one more time, I made some changes

Copy link
Member

@volodya-lombrozo volodya-lombrozo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yegor256 Looks good to me. I suggested several minor changes, but they aren't so important. Could you also replace the line in the comment from

closes #3636

to

related to #3636

please?

/**
* XMIR that validates itself right after construction.
*
* <p>This class is supposed to be used ONLY for testing, because
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yegor256 [NIT] I mean, this class might be used not only for testing. Actually it's an important part of validation in a production code. Maybe we need to remove this line from javadoc:

This class is supposed to be used ONLY for testing

and replace it with something like

ATTENTION!

what do you think?

).toString().replace("\\", "/")
);
}
new Xembler(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yegor256 I see. I've overlooked indentations. Thank you.

@volodya-lombrozo
Copy link
Member

@volodya-lombrozo we don't know which version to place there: there are many of them published at www.eolang.org. We can possibly try to download them all during build and put into JAR, but this sounds like a hack.

@yegor256 As I understand it, you publish them from XMIR.xsd. So, when you publish version 0.47.0 of eo, you also publish version 0.47.0 of XMIR.xsd (at least that’s my assumption). In other words, they are consistent and included together in eo-parser:0.47.0.jar. I don't clearly understand why we can't use the XMIR from this JAR and instead are required to download it from the internet. Most probably, I will implement this in jeo-maven-plugin.

@yegor256
Copy link
Member Author

@volodya-lombrozo when you do new StrictXmir(xml), we don't know what is the location of XSD that you mention inside the xml. It could be the version of the JAR where StrictXmir.class is located, but it may me something totally different (not even XMIR.xsd, but any other URI). That's why we have to download the file first and then use it. Make sense?

@yegor256
Copy link
Member Author

yegor256 commented Dec 13, 2024

@volodya-lombrozo what we can do is this: StrictXmir will compare the value of the URI inside XML with the expected location of XSD for the current version of eo-parser. If the URI is exactly where the XSD is supposed to be located, it will use the XMIR.xsd file from classpath. In your particular case, this will solve the problem (because you build the XMIR and validate it using the same version of eo-parser).

@volodya-lombrozo
Copy link
Member

@volodya-lombrozo when you do new StrictXmir(xml), we don't know what is the location of XSD that you mention inside the xml. It could be the version of the JAR where StrictXmir.class is located, but it may me something totally different (not even XMIR.xsd, but any other URI). That's why we have to download the file first and then use it. Make sense?

@yegor256 To be honest, I don't see any reason why we might need such flexibility. We have a single XSD and a single StrictXmir. They are placed together in one location, so StrictXmir can directly use it without any downloading. This sounds like overengineering to me in this particular case. However, I may not be seeing the whole picture.

I will try to use xsi:noNamespaceSchemaLocation="classpath:XMIR.xsd" in the generated XML files or something similar. I don't know if it will help, but that's a separate matter. I will investigate it.

@yegor256
Copy link
Member Author

@volodya-lombrozo

We have a single XSD and a single StrictXmir. They are placed together in one location, so StrictXmir can directly use it without any downloading.

StrictXmir doesn't know about this: whether the XSD your XML is using is the same XSD that StrictXmir has in the classpath. There is no such guarantee.

@yegor256
Copy link
Member Author

yegor256 commented Dec 13, 2024

@volodya-lombrozo you can try to use this:

xsi:noNamespaceSchemaLocation="classpath:XMIR.xsd"

However, 1) this won't work with StrictXML or StrictXmir and you will have to implement your own schema validation, and 2) the XML file won't be accepted by next steps of the pipeline, since they have their own validators that don't know anything about your classpath (some of them may not be even written in Java).

@volodya-lombrozo
Copy link
Member

@volodya-lombrozo you can try to use this:

xsi:noNamespaceSchemaLocation="classpath:XMIR.xsd"

However, 1) this won't work with StrictXML or StrictXmir and you will have to implement your own schema validation, and 2) the XML file won't be accepted by next steps of the pipeline, since they have their own validators that don't know anything about your classpath (some of them may not be even written in Java).

@yegor256
There are many questions:

  1. Why do the "next steps of the pipeline" require an xsd, and why do they specifically need the http version of this xsd?
  2. Moreover, I don't understand why the classpath won't work with StrictXML or StrictXmir.

Let me investigate these questions. Maybe it will become clear. And thank you very much for your help.

@yegor256
Copy link
Member Author

@volodya-lombrozo check this one: d0bf3b8 With this feature you will be fine. Just use StrictXmir from eo-parser and there will be no network usage.

@yegor256
Copy link
Member Author

@volodya-lombrozo XML documents that JEO (or any other tool) produces are supposed to be "valid" in terms of compliance with XSD schema. Technically this means that noNamespaceSchemaLocation is present and it points to the document publicly available (online or on disc). Classpath is not publicly available, it's a local storage of a particular instance of a running VM.

Copy link
Member

@volodya-lombrozo volodya-lombrozo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yegor256 There is one common part in all tests:

           new StrictXmir(
                 new Xmir(
                     new XMLDocument(
                         new Xembler(
                             new Directives()
                                 .append(new DrProgram("foo3"))
                                 .xpath("/program")
                                 .attr(
                                     "noNamespaceSchemaLocation xsi http://www.w3.org/2001/XMLSchema-instance",
                                     String.format(
                                         "https://www.eolang.org/xsd/XMIR-%s.xsd",
                                         Manifests.read("EO-Version")
                                     )
                                 )
                                 .add("objects")
                         ).xml()
                     )
                 ),
                 tmp
             ).validate(),

What do you think if we will move into a separate method?

@yegor256
Copy link
Member Author

@volodya-lombrozo you are right, duplication reduced: 809adde

@volodya-lombrozo
Copy link
Member

Technically this means that noNamespaceSchemaLocation is present and it points to the document publicly available (online or on disc)

@yegor256 As far as I know, we have several ways to validate an XML document against an XSD schema, and we are not obligated to use only noNamespaceSchemaLocation.

@yegor256
Copy link
Member Author

@volodya-lombrozo indeed, it's possible to validate it against any XSD schema, but relying on noNamespaceSchemaLocation or schemaLocation (which are provided by the author of XML document) is the more robust way. BTW, we can also use DTD: https://stackoverflow.com/questions/1544200/what-is-difference-between-xml-schema-and-dtd

@yegor256 yegor256 merged commit 714cdfc into master Dec 13, 2024
19 checks passed
@yegor256 yegor256 deleted the 3636 branch December 13, 2024 10:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants