Document SAX usage #263

Open
oliviercailloux opened this issue Sep 17, 2021 · 1 comment

oliviercailloux commented Sep 17, 2021

May I suggest including in the README (or elsewhere) an example of using Jing through the standard SAX API? (Issue #21 provides a partial example, but unfortunately the link provided there is dead.)

Here is an attempt of mine, so far unsuccessful.

public static void validate(String documentId, InputStream relaxSchema) throws SAXException, ParserConfigurationException, IOException {
  ErrorHandler errorHandler = new DraconianErrorHandler();

  System.setProperty(SchemaFactory.class.getName() + ":" + XMLConstants.RELAXNG_NS_URI, "com.thaiopensource.relaxng.jaxp.XMLSyntaxSchemaFactory");
  SchemaFactory schemaFactory = SchemaFactory.newInstance(XMLConstants.RELAXNG_NS_URI);
  schemaFactory.setErrorHandler(errorHandler);
  Schema schema = schemaFactory.newSchema(new StreamSource(relaxSchema));

  SAXParserFactory factory = SAXParserFactory.newInstance();
  factory.setNamespaceAware(true);
  factory.setSchema(schema);

  SAXParser parser = factory.newSAXParser();
  XMLReader reader = parser.getXMLReader();
  reader.setErrorHandler(errorHandler);
  reader.parse(documentId);
}

Then test as follows, using docbook howto.xml and docbook.rng.

@Test
void testValidSax() throws Exception {
  try (InputStream rng = DocBookUtils.class.getResource("docbook.rng").openStream()) {
    assertDoesNotThrow(() -> DocBookUtils.validate(DocBookUtilsTests.class.getResource("docbook howto.xml").toString(), rng));
  }
}

The above test yields:

org.xml.sax.SAXParseException; systemId: file:/home/…/docbook%20howto.xml; lineNumber: 13; columnNumber: 31; attribute "xmlns" not allowed here; expected attribute "annotations", "arch", "audience", "class", "condition", "conformance", "dir", "label", "linkend", "os", "outputformat", "prefix", "property", "remap", "resource", "revision", "revisionflag", "role", "security", "status", "typeof", "userlevel", "vendor", "version", "vocab", "wordsize", "xl:actuate", "xl:arcrole", "xl:from", "xl:href", "xl:label", "xl:role", "xl:show", "xl:title", "xl:to", "xl:type", "xml:base", "xml:id", "xml:lang" or "xreflabel"
	at com.thaiopensource.relaxng.jaxp.ValidatorHandlerImpl.check(ValidatorHandlerImpl.java:148)
	at com.thaiopensource.relaxng.jaxp.ValidatorHandlerImpl.startElement(ValidatorHandlerImpl.java:68)
	at java.xml/com.sun.org.apache.xerces.internal.jaxp.JAXPValidatorComponent$XNI2SAX.startElement(JAXPValidatorComponent.java:419)
	at java.xml/com.sun.org.apache.xerces.internal.jaxp.JAXPValidatorComponent.startElement(JAXPValidatorComponent.java:182)
	at java.xml/com.sun.org.apache.xerces.internal.impl.dtd.XMLDTDValidator.startElement(XMLDTDValidator.java:731)
	at java.xml/com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.scanStartElement(XMLNSDocumentScannerImpl.java:374)
	at java.xml/com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl$NSContentDriver.scanRootElementHook(XMLNSDocumentScannerImpl.java:613)
	at java.xml/com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:3063)
	at java.xml/com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$PrologDriver.next(XMLDocumentScannerImpl.java:836)
	at java.xml/com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:605)
	at java.xml/com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:112)
	at java.xml/com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:534)
	at java.xml/com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:888)
	at java.xml/com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:824)
	at java.xml/com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:141)
	at java.xml/com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1141)
	at java.xml/com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:647)
	at io.github.oliviercailloux.xml_utils.DocBookUtils.validate(DocBookUtils.java:128)
	at io.github.oliviercailloux.xml_utils.DocBookUtilsTests.lambda$0(DocBookUtilsTests.java:30)
	at org.junit.jupiter.api.AssertDoesNotThrow.assertDoesNotThrow(AssertDoesNotThrow.java:50)
	... 70 more

But that file does validate against that schema when not going through SAX (using com.thaiopensource.validate.Schema schema = new AutoSchemaReader().createSchema(relaxSchema, countingErrorProperties); Validator validator = schema.createValidator(countingErrorProperties); contentHandler = validator.getContentHandler(); xmlReader = ResolverFactory.createResolver(PropertyMap.EMPTY).createXMLReader(); and so on…)

Could you perhaps indicate the recommended way to use Jing through the SAX interface?

Thank you for this useful library.
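On the question of recommended usage: one alternative wiring that the standard `javax.xml.validation` API allows is to skip `SAXParserFactory.setSchema` and instead plug a `ValidatorHandler` obtained from the `Schema` directly into the `XMLReader` as its content handler. The sketch below is only an illustration of that wiring, not a confirmed fix: it runs on a plain JDK by using the built-in W3C XML Schema factory, and the class name, the inline schema, and the sample documents are hypothetical. Passing `XMLConstants.RELAXNG_NS_URI` with Jing on the classpath and the system property from the post above is an assumption that has not been tested here.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.ValidatorHandler;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;

class ValidatorHandlerSketch {

  // Hypothetical stand-in schema: a single <note> element holding a string.
  static final String NOTE_XSD =
      "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
          + "<xs:element name='note' type='xs:string'/>"
          + "</xs:schema>";

  /** Parses xml with a plain SAX reader and feeds its events to a ValidatorHandler. */
  static List<String> validate(String schemaLanguage, String schemaText, String xml) {
    List<String> problems = new ArrayList<>();
    // Collects validation errors instead of throwing on the first one.
    ErrorHandler collector = new ErrorHandler() {
      @Override public void warning(SAXParseException e) { /* ignored in this sketch */ }
      @Override public void error(SAXParseException e) { problems.add(e.getMessage()); }
      @Override public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
    };
    try {
      SchemaFactory schemaFactory = SchemaFactory.newInstance(schemaLanguage);
      Schema schema = schemaFactory.newSchema(new StreamSource(new StringReader(schemaText)));

      // The validator is an ordinary SAX ContentHandler.
      ValidatorHandler handler = schema.newValidatorHandler();
      handler.setErrorHandler(collector);

      SAXParserFactory parserFactory = SAXParserFactory.newInstance();
      parserFactory.setNamespaceAware(true);
      // Note: no parserFactory.setSchema(...) here; the parser itself does not validate.
      XMLReader reader = parserFactory.newSAXParser().getXMLReader();
      reader.setContentHandler(handler);
      reader.setErrorHandler(collector);
      reader.parse(new InputSource(new StringReader(xml)));
    } catch (SAXException | ParserConfigurationException | IOException e) {
      throw new IllegalStateException(e);
    }
    return problems;
  }

  public static void main(String[] args) {
    String language = XMLConstants.W3C_XML_SCHEMA_NS_URI;
    System.out.println("valid document:   " + validate(language, NOTE_XSD, "<note>hi</note>"));
    System.out.println("invalid document: " + validate(language, NOTE_XSD, "<memo>hi</memo>"));
  }
}
```

With this wiring the parser only produces SAX events, and the schema implementation sees them exactly as any other content handler would.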


opeongo commented Jun 11, 2022

I am also interested in this issue. My approach is basically the same as in your code.

I have found that factory.setSchema(schema); doesn't seem to do anything. What does seem to apply the schema is this line:

         reader.setProperty("http://apache.org/xml/properties/schema/external-noNamespaceSchemaLocation",
                            schemaFilename);

When I add this line the error message that I get after the parse has begun is:

s4s-elt-schema-ns: The namespace of element 'grammar' must be from the schema namespace, 'http://www.w3.org/2001/XMLSchema'.

If I comment out the line

 reader.setFeature("http://apache.org/xml/features/validation/schema", true);

then the error becomes

Document is invalid: no grammar found.

My guess is that the SAXParserFactory.newInstance() method is returning an XML Schema-based parser, but what is needed is a RelaxNG-based parser. I have looked through the jing-trang source code but I cannot find an implementation of SAXParserFactory, which is what appears to be required to work with SAX, so maybe this plumbing was never developed? I'm just guessing here.
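For context on the guess above: in the JAXP API the schema language is selected through `SchemaFactory.newInstance(schemaLanguage)`, whereas `SAXParserFactory.newInstance()` takes no schema-language argument at all. The small sketch below (class and method names are hypothetical) prints the system property key that the snippets in this thread build, and probes which schema languages the running JVM can resolve. On a plain JDK without Jing on the classpath, RELAX NG is expected to be reported as unsupported, but that depends on the environment.

```java
import javax.xml.XMLConstants;
import javax.xml.validation.SchemaFactory;

class SchemaFactoryLookupSketch {

  /** The system property key used to register a SchemaFactory for one schema language. */
  static String lookupKey(String schemaLanguage) {
    return SchemaFactory.class.getName() + ":" + schemaLanguage;
  }

  /** Returns true if the JVM can locate a SchemaFactory for the given schema language. */
  static boolean isSupported(String schemaLanguage) {
    try {
      SchemaFactory.newInstance(schemaLanguage);
      return true;
    } catch (IllegalArgumentException e) {
      // Thrown when no implementation of the schema language is available.
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println("key:      " + lookupKey(XMLConstants.RELAXNG_NS_URI));
    System.out.println("XSD:      " + isSupported(XMLConstants.W3C_XML_SCHEMA_NS_URI));
    System.out.println("RELAX NG: " + isSupported(XMLConstants.RELAXNG_NS_URI));
  }
}
```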

Here is a more complete example:

      // scf, schemaFilename and input are not shown in this snippet.
      System.setProperty(SchemaFactory.class.getName() + ":" + XMLConstants.RELAXNG_NS_URI, "com.thaiopensource.relaxng.jaxp.XMLSyntaxSchemaFactory");
      SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.RELAXNG_NS_URI);
      Schema schema = sf.newSchema(scf);
      SAXParserFactory factory = SAXParserFactory.newInstance();
      factory.setSchema(schema);

      try {
         SAXParser parser = factory.newSAXParser();
         System.err.println("parser schema=" + parser.getSchema());
         XMLReader reader = parser.getXMLReader();
         reader.setFeature("http://xml.org/sax/features/validation", true);
         // The next two settings are Xerces-specific switches for W3C XML Schema validation.
         reader.setFeature("http://apache.org/xml/features/validation/schema", true);
         reader.setProperty("http://apache.org/xml/properties/schema/external-noNamespaceSchemaLocation",
                            schemaFilename);
         reader.setFeature("http://apache.org/xml/features/xinclude", true);
         reader.setFeature("http://xml.org/sax/features/namespaces", true);
         reader.setFeature("http://apache.org/xml/features/xinclude/fixup-base-uris", false);
         reader.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);

         // The enclosing class implements all three handler interfaces.
         reader.setErrorHandler(this);
         reader.setContentHandler(this);
         reader.setEntityResolver(this);

         reader.parse(new InputSource(input));

As a workaround I am using Trang to convert the RNG schema to XSD, and then validation with the built-in XML Schema validator works just fine.
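A minimal sketch of the Java side of that workaround, assuming the XSD has already been produced: here `factory.setSchema(schema)` is used with the JDK's built-in W3C XML Schema support, and validation errors arrive at the reader's error handler during the parse. The class name and the inline XSD (a stand-in for Trang's output) are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;

class XsdWorkaroundSketch {

  // Hypothetical stand-in for an XSD generated from the RNG schema.
  static final String ITEM_XSD =
      "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
          + "<xs:element name='item' type='xs:int'/>"
          + "</xs:schema>";

  /** Validates xml against xsd while parsing; returns the collected error messages. */
  static List<String> validate(String xsd, String xml) {
    List<String> problems = new ArrayList<>();
    ErrorHandler collector = new ErrorHandler() {
      @Override public void warning(SAXParseException e) { /* ignored in this sketch */ }
      @Override public void error(SAXParseException e) { problems.add(e.getMessage()); }
      @Override public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
    };
    try {
      SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
      Schema schema = sf.newSchema(new StreamSource(new StringReader(xsd)));

      SAXParserFactory factory = SAXParserFactory.newInstance();
      factory.setNamespaceAware(true);
      factory.setSchema(schema); // parsers created from now on validate against the schema

      SAXParser parser = factory.newSAXParser();
      XMLReader reader = parser.getXMLReader();
      reader.setErrorHandler(collector);
      reader.parse(new InputSource(new StringReader(xml)));
    } catch (SAXException | ParserConfigurationException | IOException e) {
      throw new IllegalStateException(e);
    }
    return problems;
  }

  public static void main(String[] args) {
    System.out.println("valid:   " + validate(ITEM_XSD, "<item>42</item>"));
    System.out.println("invalid: " + validate(ITEM_XSD, "<item>forty-two</item>"));
  }
}
```

No Xerces-specific features or properties are set in this sketch; the schema is attached only through the standard `setSchema` call.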

It would be nice to work only with RelaxNG and not have to convert to XSD, but it's not a show stopper.
