Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SLING-11865 - Conversion fails when initial content document does not include namespace declaration #168

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

rombert
Copy link
Contributor

@rombert rombert commented May 8, 2023

Added failing test

@rombert rombert requested a review from kwin May 8, 2023 13:33
@kwin
Copy link
Member

kwin commented May 8, 2023

IMHO throwing an exception for undeclared namespace prefixes is reasonable here (although in the past this might have been silently ignored). @rombert Do you think that this should just be logged and conversion should continue?

@rombert
Copy link
Contributor Author

rombert commented May 8, 2023

@kwin - I think it's important to preserve the current behaviour and give consumers a chance to gradually adapt. I think warn + proceed with the conversion is fine, as long as the docview file is still generated. For the record, the content loader will happily process this file, given that the JCR repository has the correct mappings.

I think we can also introduce a switch to the cpconverter that allows opt-in to the stricter behaviours that we introduce, so that once a content-package passes with strict checks they can be enabled.

… include namespace declaration

Added failing test
@kwin
Copy link
Member

kwin commented May 8, 2023

as long as the docview file is still generated.

That is impossible as the docview will be invalid (it has to have a proper namespace URL for the prefix). How did the old version of cp2fm behave for that IT? Which docview file has been generated, what was stripped? I don't think this ever was a behaviour which is worth to keep to be honest.

@rombert
Copy link
Contributor Author

rombert commented May 8, 2023

I pushed this additional test + a revert of SLING-11421 to https://github.com/apache/sling-org-apache-sling-feature-cpconverter/tree/issue/SLING-11865-with-revert . You can see exactly what is being generated by running those tests. The particular file that triggers the problem gets converted to

<?xml version='1.0' encoding='UTF-8'?>
<jcr:root xmlns="" xmlns:nt="http://www.jcp.org/jcr/nt/1.0" xmlns:xml="http://www.w3.org/XML/1998/namespace" xmlns:oak="http://jackrabbit.apache.org/oak/ns/1.0" xmlns:jcr="http://www.jcp.org/jcr/1.0" xmlns:rep="internal" xmlns:my="http://namespace.com/my" xmlns:mix="http://www.jcp.org/jcr/mix/1.0" xmlns:sling="http://sling.apache.org/jcr/sling/1.0" xmlns:vlt="http://www.day.com/jcr/vault/1.0" jcr:primaryType="nt:unstructured" description="{String}This node has been created by a sling bundle." title="{String}My first node">
  <cq:EditConfig jcr:primaryType="cq:EditConfig" description="{String}Some dummy data from sling initial content.">
    <cq:listeners jcr:primaryType="cq:EditListenersConfig" description="{String}Some dummy data from sling initial content.">
    </cq:listeners>
  </cq:EditConfig>
  <graniteComponent jcr:primaryType="sling:Folder" jcr:mixinTypes="[granite:Component]">
    <granite:data jcr:primaryType="nt:unstructured" description="{String}Some dummy data from sling initial content.">
    </granite:data>
    <my:subnode jcr:primaryType="nt:unstructured" description="{String}Some dummy data from sling initial content.">
    </my:subnode>
  </graniteComponent>
</jcr:root>

To your point on whether this is a behaviour to keep or not, in Sling we try hard to preserve backwards compatiblity, and that breaking existing projects is a very big change which we should only do for very good reasons. Being too restrictive with what we accept as input will only hurt adoption of new versions of CP Converter, leaving deployments on old versions because upgrading was too inconvenient.

Again, this is a conversion that worked on 1.3.0 and not on 1.3.2 and that was installable using the JCR ContentLoader.

If you do believe that this is something that must be eventually enforced, let's please start with a warning and make the behaviour opt-in via a command line flag, so consumers can control the evolution of their project.

@kwin
Copy link
Member

kwin commented May 8, 2023

The XML you shared is simply invalid. XML mandates that every prefix is declared (https://www.w3.org/TR/xml-names/#ns-using). Same as DocView XML in JCR spec.

@rombert
Copy link
Contributor Author

rombert commented May 8, 2023

... which is something that was working before. I am not making a technical argument, I am making an argument for not needlessly breaking customer projects.

@kwin
Copy link
Member

kwin commented May 8, 2023

So your proposal is to keep generating invalid XMLs unless user opted in to fail in that case?

For me

not needlessly breaking customer projects.

is incorrect here. I see a need to break here as generating invalid XML (with undeclared prefixes) may easily trigger follow-up issues which are close to impossible to detect then.

@reschke
Copy link
Contributor

reschke commented May 8, 2023

FWIW (not taking a position on the other points): undeclared prefixes are fatal in XML with namespace processing active. That said, XML without namespace processing would be able to parse these files, but leave the namespace processing to the application invoking the XML processor. For JCR DocView import, namespace-aware parsing is required by the JCR spec.

@rombert
Copy link
Contributor Author

rombert commented May 8, 2023

So your proposal is to keep generating invalid XMLs unless user opted in to fail in that case?

You can call it that way, yes. You can also formulate it as "we are releasing a minor version of the cp converter that might break existing projects, because of a refactoring". Given that our toolchain worked before, I think this is a drastic move and should be reconsidered.

I see a need to break here as generating invalid XML (with undeclared prefixes) may easily trigger follow-up issues which are close to impossible to detect then.

I'm not sure I follow. I think we are in one of the two scenarios:

Namespace present in Initial-Content file Namespace declared in Repository Outcome
No Yes Installation proceeds, based on the repository namespace mapping
No No Installation is halted

What we are doing is changing the outcome of the first scenario. Since no one actually complained about the lenient namespace handling and this only came up after a refactoring I am confident this was not a real problem to solve. That is not to say the refactoring was not valuable, I am very glad that it was done, but we should not introduce such behaviour changes.

@kwin
Copy link
Member

kwin commented May 8, 2023

There is no simple solution here except for reverting the https://issues.apache.org/jira/browse/SLING-11421 as the FileVault serializer cannot generate invalid DocView XML. Also I don't have capacity to work on a workaround here for the moment (and also my motivation is limited given

I see a need to break here as generating invalid XML (with undeclared prefixes) may easily trigger follow-up issues which are close to impossible to detect then.

So I would actually appreciate taking this matter to the mailing list to gather more opinions on it but I am also fine with reverting (but this will obviously reintroduce the old bugs which have been fixed by SLING-11421).

@rombert
Copy link
Contributor Author

rombert commented May 8, 2023

There is no simple solution here except for reverting the https://issues.apache.org/jira/browse/SLING-11421 as the FileVault serializer cannot generate invalid DocView XML. Also I don't have capacity to work on a workaround here for the moment (and also my motivation is limited given

I see a need to break here as generating invalid XML (with undeclared prefixes) may easily trigger follow-up issues which are close to impossible to detect then.

So I would actually appreciate taking this matter to the mailing list to gather more opinions on it but I am also fine with reverting (but this will obviously reintroduce the old bugs which have been fixed by SLING-11421).

I think reverting would be a shame. I'll take this to the mailing list, thanks for the clarification.

Update: dev@sling.apache.org conversation: SLING-11865 - Conversion fails when initial content document does not include namespace declaration

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants