Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Preserve the casing of tags and attributes names. #182

Closed
wookie41 opened this issue Aug 26, 2019 · 17 comments · Fixed by #206
Closed

Preserve the casing of tags and attributes names. #182

wookie41 opened this issue Aug 26, 2019 · 17 comments · Fixed by #206
Assignees

Comments

@wookie41
Copy link

It would be nice if the the policy builder had an option that disabled the "sanitization" of tags and attributes names.
Currently all names are converted to lowercase which is ok when you're using it for HTML only, but if there is an SVG image nested inside the HTML it breaks.
For example, when viewBox attribute on is converted to viewbox and the image is not displayed correctly.

@zeeneir
Copy link

zeeneir commented Mar 27, 2020

Please add an option to preserve casing of tags and attributes. I've the same issue with sanitizing svg. This is one of the best tools out there and this particular shortcoming is a show stopper for my project. Any possible workarounds is also appreciated until this is fixed. Thanks!

@wookie41
Copy link
Author

@zeeneir a while back I did something like this and actually forgot to open a pull request for it. I'll review it on monday and open a PR.

wookie41@affcccd

@zeeneir
Copy link

zeeneir commented Mar 27, 2020

@wookie41 excellent and thanks 👌

@jmanico
Copy link
Member

jmanico commented Mar 27, 2020 via email

@wookie41
Copy link
Author

wookie41 commented Mar 27, 2020

@jmanico SVG renderers actually care about the casing (at least the one in Firefox) and the image simply doesn't show, when viewBox is sanitized to viewbox. I know that this isn't an svg sanitizer, but it does the job I need it to do after creating an svg policy.

@jmanico
Copy link
Member

jmanico commented Mar 27, 2020 via email

@zeeneir
Copy link

zeeneir commented Mar 27, 2020

Same comment as @wookie41, Agree that this is HTML sanitizer, but I think this small change would allow proper sanitizing of SVG documents. Some renderers don't care about case, but others do. Mozilla states:

SVG elements and attributes should all be entered in the case shown here since XML is case-sensitive (unlike HTML).
Here: https://developer.mozilla.org/en-US/docs/Web/SVG/Tutorial/Introduction

@zeeneir
Copy link

zeeneir commented Jun 26, 2020

any updates for this issue?

@mikesamuel
Copy link
Contributor

What's the list of names that need to be preserved?

HTML uppercase qualified name provides a notion that we could use for a canonical name that prefers lower-case ASCII.

If an element/attribute name is not defined in the HTML namespace and it is defined in the corresponding SVG or MathML namespace, then we use the preferred casing from that namespace, otherwise we convert to lower-case in a locale-insensitive manner.

I don't think the xlink namespace matters.

Would that be sufficient?

@zeeneir
Copy link

zeeneir commented Jun 26, 2020

If an element/attribute name is not defined in the HTML namespace and it is defined in the corresponding SVG or MathML namespace, then we use the preferred casing from that namespace, otherwise we convert to lower-case in a locale-insensitive manner.

Ideal would be a setting on the policy builder that indicates whether case should be preserved for tags/attributes, however, this proposal is good as well. If you go this route, you don't need us to provide a list, do you?

@mikesamuel
Copy link
Contributor

@zeeneir If the bug results from a mismatch in how converted names are interpreted by HTML parsers then I don't see how more policy settings would help.

@zeeneir
Copy link

zeeneir commented Jun 26, 2020

@mikesamuel, I see. Would you proposal cover overlapping tags/attributes?
One example I can think of: HTML has <textarea> tag and SVG (Tiny 1.2) has same tag with different case notation: <textArea>

@mikesamuel
Copy link
Contributor

Most of the overlaps are innocuous: <a>, <script>, src=.

I was planning on coming up with whitelists of foreign element and attribute names, and just not lower-casing if an identifier is in that set.

Unless there's a need for Tiny 1.2 I'd rather go with what spec.html references:

[MATHML] Mathematical Markup Language (MathML)
[SVG] Scalable Vector Graphics (SVG) 2

IIUC, for SVG, the lists are Appendix F and Appendix G

But I'm not super familiar with which drafts of SVG are supported by which browser vendors.

@zeeneir
Copy link

zeeneir commented Jun 26, 2020

Sounds good. Could you please include support for SVG Tiny. SVG Tiny is lightweight version of SVG, is suitable for mobile devices and is necessary for my use case.

Here’s the latest spec:
https://www.w3.org/TR/SVGMobile/single-page.html

Element List:
https://www.w3.org/TR/SVGMobile/single-page.html#chapter-elementTable

Attribute/Property list:
https://www.w3.org/TR/SVGMobile/single-page.html#chapter-attributeTable

@mikesamuel mikesamuel self-assigned this Jul 1, 2020
@mikesamuel
Copy link
Contributor

For the record, JS used to extract mixed-case attribute and element names from the specs in lieu of data files.

// https://www.w3.org/TR/SVGMobile/single-page.html#chapter-elementTable
let elementTable = document.getElementById('elementTable-elements');
let elementCells = Array.from(elementTable.querySelectorAll('tr > :first-child')).map(x => x.innerText);
function isMixedCase(x) { return !(/:/.test(x)) && /[a-z]/.test(x) && /[A-Z]/.test(x) }
elementCells.map(x => x.replace(/^\'|\'$/g, '')).filter(x => x != 'Elements' && isMixedCase(x))
// (8) ["animateColor", "animateMotion", "animateTransform", "foreignObject", "linearGradient", "radialGradient", "solidColor", "textArea"]

let attributeCells = Array.from(elementTable.querySelectorAll('tr > :nth-child(2)')).map(x => x.innerText);
Array.from(new Set(attributeCells.flatMap(x => x.split(/, /g)).filter(x => x != 'Attributes' && isMixedCase(x))));
// (37) ["externalResourcesRequired", "focusHighlight", "requiredExtensions", "requiredFeatures", "requiredFonts", "requiredFormats", "systemLanguage", "attributeName", "attributeType", "calcMode", "keySplines", "keyTimes", "repeatCount", "repeatDur", "keyPoints", "initialVisibility", "preserveAspectRatio", "syncBehavior", "syncMaster", "syncTolerance", "gradientUnits", "defaultAction", "pathLength", "mediaCharacterEncoding", "mediaContentEncodings", "mediaSize", "mediaTime", "baseProfile", "contentScriptType", "playbackOrder", "snapshotTime", "syncBehaviorDefault", "syncToleranceDefault", "timelineBegin", "viewBox", "zoomAndPan", "transformBehavior"] 

//https://svgwg.org/svg2-draft/eltindex.html
function isMixedCase(x) { return !(/:/.test(x)) && /[a-z]/.test(x) && /[A-Z]/.test(x) }
Array.from(document.querySelectorAll('.element-name')).map(x => x.innerText.replace(/^\u2018|\u2019$/g, '')).filter(isMixedCase)
// (32) ["animateMotion", "animateTransform", "clipPath", "feBlend", "feColorMatrix", "feComponentTransfer", "feComposite", "feConvolveMatrix", "feDiffuseLighting", "feDisplacementMap", "feDistantLight", "feDropShadow", "feFlood", "feFuncA", "feFuncB", "feFuncG", "feFuncR", "feGaussianBlur", "feImage", "feMerge", "feMergeNode", "feMorphology", "feOffset", "fePointLight", "feSpecularLighting", "feSpotLight", "feTile", "feTurbulence", "foreignObject", "linearGradient", "radialGradient", "textPath"]

// https://svgwg.org/svg2-draft/attindex.html
function isMixedCase(x) { return !(/:/.test(x)) && /[a-z]/.test(x) && /[A-Z]/.test(x) }
Array.from(new Set(Array.from(document.querySelectorAll('.attr-name')).map(x => x.innerText.replace(/^\u2018|\u2019$/g, '')).filter(isMixedCase)))
// (52) ["attributeName", "baseFrequency", "calcMode", "clipPathUnits", "diffuseConstant", "edgeMode", "filterUnits", "gradientTransform", "gradientUnits", "kernelMatrix", "kernelUnitLength", "keyPoints", "keySplines", "keyTimes", "lengthAdjust", "limitingConeAngle", "markerHeight", "markerUnits", "markerWidth", "maskContentUnits", "maskUnits", "numOctaves", "pathLength", "patternContentUnits", "patternTransform", "patternUnits", "pointsAtX", "pointsAtY", "pointsAtZ", "preserveAlpha", "preserveAspectRatio", "primitiveUnits", "refX", "refY", "repeatCount", "repeatDur", "requiredExtensions", "specularConstant", "specularExponent", "spreadMethod", "startOffset", "stdDeviation", "stitchTiles", "surfaceScale", "systemLanguage", "tableValues", "targetX", "targetY", "textLength", "viewBox", "xChannelSelector", "yChannelSelector"]

// https://www.w3.org/Math/draft-spec/appendixi.html
let [elementList, attributeList] = document.querySelectorAll('dl');
function isMixedCase(x) { return !(/:/.test(x)) && /[a-z]/.test(x) && /[A-Z]/.test(x) }
Array.from(elementList.querySelectorAll('dt')).map(x => x.innerText).filter(isMixedCase)
// []
Array.from(attributeList.querySelectorAll('dt')).map(x => x.innerText).filter(isMixedCase)
// (2) ["definitionURL", "schemaLocation"]

@mikesamuel
Copy link
Contributor

Unioning these lists gives us

function union(lists) {
  let a = Array.from(new Set(lists.flatMap(x => x)));
  a.sort();
  return a;
}

Mixed case element names in SVG Tiny, SVG 2, and MathML

union([
["animateColor", "animateMotion", "animateTransform", "foreignObject", "linearGradient", "radialGradient", "solidColor", "textArea"],
["animateMotion", "animateTransform", "clipPath", "feBlend", "feColorMatrix", "feComponentTransfer", "feComposite", "feConvolveMatrix", "feDiffuseLighting", "feDisplacementMap", "feDistantLight", "feDropShadow", "feFlood", "feFuncA", "feFuncB", "feFuncG", "feFuncR", "feGaussianBlur", "feImage", "feMerge", "feMergeNode", "feMorphology", "feOffset", "fePointLight", "feSpecularLighting", "feSpotLight", "feTile", "feTurbulence", "foreignObject", "linearGradient", "radialGradient", "textPath"],
[]
])

[
      "animateColor",
      "animateMotion",
      "animateTransform",
      "clipPath",
      "feBlend",
      "feColorMatrix",
      "feComponentTransfer",
      "feComposite",
      "feConvolveMatrix",
      "feDiffuseLighting",
      "feDisplacementMap",
      "feDistantLight",
      "feDropShadow",
      "feFlood",
      "feFuncA",
      "feFuncB",
      "feFuncG",
      "feFuncR",
      "feGaussianBlur",
      "feImage",
      "feMerge",
      "feMergeNode",
      "feMorphology",
      "feOffset",
      "fePointLight",
      "feSpecularLighting",
      "feSpotLight",
      "feTile",
      "feTurbulence",
      "foreignObject",
      "linearGradient",
      "radialGradient",
      "solidColor",
      "textArea",
      "textPath"
]

Mixed case attribute names in SVG Tiny, SVG 2, and MathML

union([
["externalResourcesRequired", "focusHighlight", "requiredExtensions", "requiredFeatures", "requiredFonts", "requiredFormats", "systemLanguage", "attributeName", "attributeType", "calcMode", "keySplines", "keyTimes", "repeatCount", "repeatDur", "keyPoints", "initialVisibility", "preserveAspectRatio", "syncBehavior", "syncMaster", "syncTolerance", "gradientUnits", "defaultAction", "pathLength", "mediaCharacterEncoding", "mediaContentEncodings", "mediaSize", "mediaTime", "baseProfile", "contentScriptType", "playbackOrder", "snapshotTime", "syncBehaviorDefault", "syncToleranceDefault", "timelineBegin", "viewBox", "zoomAndPan", "transformBehavior"] ,
["attributeName", "baseFrequency", "calcMode", "clipPathUnits", "diffuseConstant", "edgeMode", "filterUnits", "gradientTransform", "gradientUnits", "kernelMatrix", "kernelUnitLength", "keyPoints", "keySplines", "keyTimes", "lengthAdjust", "limitingConeAngle", "markerHeight", "markerUnits", "markerWidth", "maskContentUnits", "maskUnits", "numOctaves", "pathLength", "patternContentUnits", "patternTransform", "patternUnits", "pointsAtX", "pointsAtY", "pointsAtZ", "preserveAlpha", "preserveAspectRatio", "primitiveUnits", "refX", "refY", "repeatCount", "repeatDur", "requiredExtensions", "specularConstant", "specularExponent", "spreadMethod", "startOffset", "stdDeviation", "stitchTiles", "surfaceScale", "systemLanguage", "tableValues", "targetX", "targetY", "textLength", "viewBox", "xChannelSelector", "yChannelSelector"],

["definitionURL", "schemaLocation"]
])

[
      "attributeName",
      "attributeType",
      "baseFrequency",
      "baseProfile",
      "calcMode",
      "clipPathUnits",
      "contentScriptType",
      "defaultAction",
      "definitionURL",
      "diffuseConstant",
      "edgeMode",
      "externalResourcesRequired",
      "filterUnits",
      "focusHighlight",
      "gradientTransform",
      "gradientUnits",
      "initialVisibility",
      "kernelMatrix",
      "kernelUnitLength",
      "keyPoints",
      "keySplines",
      "keyTimes",
      "lengthAdjust",
      "limitingConeAngle",
      "markerHeight",
      "markerUnits",
      "markerWidth",
      "maskContentUnits",
      "maskUnits",
      "mediaCharacterEncoding",
      "mediaContentEncodings",
      "mediaSize",
      "mediaTime",
      "numOctaves",
      "pathLength",
      "patternContentUnits",
      "patternTransform",
      "patternUnits",
      "playbackOrder",
      "pointsAtX",
      "pointsAtY",
      "pointsAtZ",
      "preserveAlpha",
      "preserveAspectRatio",
      "primitiveUnits",
      "refX",
      "refY",
      "repeatCount",
      "repeatDur",
      "requiredExtensions",
      "requiredFeatures",
      "requiredFonts",
      "requiredFormats",
      "schemaLocation",
      "snapshotTime",
      "specularConstant",
      "specularExponent",
      "spreadMethod",
      "startOffset",
      "stdDeviation",
      "stitchTiles",
      "surfaceScale",
      "syncBehavior",
      "syncBehaviorDefault",
      "syncMaster",
      "syncTolerance",
      "syncToleranceDefault",
      "systemLanguage",
      "tableValues",
      "targetX",
      "targetY",
      "textLength",
      "timelineBegin",
      "transformBehavior",
      "viewBox",
      "xChannelSelector",
      "yChannelSelector",
      "zoomAndPan"
]

mikesamuel added a commit that referenced this issue Jul 3, 2020
…s exactly

> Currently all names are converted to lowercase which is ok when
> you're using it for HTML only, but if there is an SVG image nested
> inside the HTML it breaks.  For example, when `viewBox` attribute is
> converted to `viewbox` the image is not displayed correctly.

This commit splits *HtmlLexer*.*canonicalName* into variants which preserve
items on whitelists derived from the SVG and MathML specifications, and
adjusts callers of *canonicalName* to use the appropriate variant.

Fixes #182
mikesamuel added a commit that referenced this issue Jul 13, 2020
#206)

* Do not lcase element or attribute names that match SVG or MathML names exactly

> Currently all names are converted to lowercase which is ok when
> you're using it for HTML only, but if there is an SVG image nested
> inside the HTML it breaks.  For example, when `viewBox` attribute is
> converted to `viewbox` the image is not displayed correctly.

This commit splits *HtmlLexer*.*canonicalName* into variants which preserve
items on whitelists derived from the SVG and MathML specifications, and
adjusts callers of *canonicalName* to use the appropriate variant.

Fixes #182

* add unittests for mixed-case SVG names
@mikesamuel
Copy link
Contributor

Included in the latest releas

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants