Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Decide JavaScript and Nunjucks API for passing translation strings #2780

Closed
2 tasks
Tracked by #1708
vanitabarrett opened this issue Aug 16, 2022 · 20 comments
Closed
2 tasks
Tracked by #1708

Comments

@vanitabarrett
Copy link
Contributor

What

We need to decide:

  • what the JavaScript object and keys look like for passing translation strings to component JS
  • what the Nunjucks API keys look like for passing translation strings through to component data attributes
  • what the data attributes look like for translation strings for JS

Example: are keys formatted like show_all_sections, showAllSections? Do they need prefixing with i18n to separate them from other Nunjucks params? What do the data attributes look like, e.g: data-show-all-sections="hello", data-i18n-show-all-sections="hello".

Explore the different options and come to a decision on our preferred option.

Why

We want to implement two ways of passing translation strings to component JS. By designing and agreeing the API for this now, we will make sure that the JS and data-attributes options are consistent with each other, and we’ll be setting conventions for enabling translation of strings in new component JS in future.

Who needs to work on this

Developers

Who needs to review this

Developers

Done when

  • We've made a decision on the above
  • We've written up our decision on this card (which we can then use when implementing the logic)
@querkmachine
Copy link
Member

Using camelCase seems like it might be the path of least resistance. It's the format we already use in Nunjucks, data-* attributes are automatically converted to it in the dataset property (data-show-all-sectionsdataset.showAllSections), and changing it in JavaScript shouldn't pose much of an issue either.

The main issue I see with this approach is around dataset, which isn't supported by IE10 and below. We would need to use getAttribute to support those browsers, in which case we would have to transform to and use the full, hyphenated data-show-all-sections or compromise on something like data-showAllSections. Unfortunately, this then makes using dataset as an enhancement infeasible, as data-showAllSectionsdataset.showallsections (not camelCased).

Adding an i18n namespace would be nice to have, but would further complicate the data-* approach and how we handle any transformations we do.

@domoscargin
Copy link
Contributor

Do we think it would be a significant impact to polyfill dataset temporarily? This would then end up disappearing when/if we drop polyfills entirely.

@querkmachine
Copy link
Member

The polyfill for dataset isn't particularly bulky, so that seems like a sensible route to go to avoid some headaches.

@domoscargin
Copy link
Contributor

domoscargin commented Aug 18, 2022

So my understanding is that we're currently at:

JavaScript config object

{
  accordion: {
    i18n: {
      locale: "cy",
      translations: {
        showAllSections: "Show all sections!",
        hideAllSections: "Hide all sections!",
        showThisSection: "Show section",
        hideThisSection: "Hide section"
      }
    }
  },
  characterCount: {
    ...
  }
}

Nunjucks

I think we're leaning towards something like:

translations: [
  showAllSections: "string",
  hideAllSections: "string",
  ...
]

Data attributes

I think we landed on something like:

<div class="govuk-accordion" data-translations='{"showAllSections": "string", "hideAllSections": "string", ...}'

@querkmachine
Copy link
Member

querkmachine commented Aug 18, 2022

I didn't think we're 100% set on handling translations through one data-* attribute vs. multiple, but from a Nunjucks users perspective there isn't any difference if we're doing the JSON-ifying and escaping for them.

For someone using HTML directly, I imagine the preference would be separate data-* attributes over inline JSON.

@36degrees
Copy link
Contributor

All of the strings used by a component will need to be translated, including strings that have no default value provided (like section headers for the accordion, or the label text for a character count). We also have some strings that are used as fallbacks when JavaScript is not available (like fallbackHintText in #2742).

I don't think the user should care about the difference between these things (thinking in particular about e.g. Prototype Kit users) so it might therefore make sense to try and abstract away the difference between 'Nunjucks strings' and 'JavaScript strings' when it comes to the Nunjucks API, which means avoiding things like nesting the JS translations within a separate object ('translations'), using camel case, and possibly matching the existing convention of using Text as a suffix for most strings (we're not 100% consistent on this though).

We might also want to group 'related' messages together under the part of the component that they appear in.

Putting this all together, an example the Nunjucks API for the Character Count might be something like:

{{ govukCharacterCount({
  name: "with-hint",
  id: "with-hint",
  maxlength: 200,
  label: {
    text: "Can you provide more detail?",
    classes: "govuk-label--l",
    isPageHeading: true
  },
  hint: {
    text: "Do not include personal or financial information like your National Insurance number or credit card details."
  },
  countMessage: {
    fallbackText: "You can enter up to %{count} characters",
    charactersUnderLimitOneText: 'You have %{count} character remaining',
    charactersUnderLimitOtherText: 'You have %{count} characters remaining',
    charactersOverLimitOneText: 'You have %{count} character too many',
    charactersOverLimitOtherText: 'You have %{count} characters too many',
    classes: 'my-custom-count-message-class'
  }
}) }}

If we wanted to go down this route, this might inform how we end up setting up data attributes, as I don't know if we can build a JSON object from within the Nunjucks template to pass it in a single data attribute. We might have to do something quite manual like:

<div class="govuk-character-count" data-module="govuk-character-count"
{%- if params.maxlength %} data-maxlength="{{ params.maxlength }}"{% endif %}
{%- if params.threshold %} data-threshold="{{ params.threshold }}"{% endif %}
{%- if params.maxwords %} data-maxwords="{{ params.maxwords }}"{% endif %}
{%- if params.countMessage.charactersUnderLimitOneText %} data-characters-under-limit-one="{{ params.countMessage.charactersUnderLimitOneText }}"{% endif %}
{# etc #}>

Which isn't pretty, but IMO is 'doing the hard work to make [the Nunjucks API] simple".

Thoughts?

@36degrees
Copy link
Contributor

36degrees commented Aug 18, 2022

Adding an i18n namespace would be nice to have, but would further complicate the data-* approach and how we handle any transformations we do.

I've read a few different sources now that state that data attribute names can include periods, which I found surprising.

The attribute name begins with data-. It can contain only letters, numbers, dashes (-), periods (.), colons (:), and underscores (_). Any ASCII capital letters (A to Z) are converted to lowercase.
https://developer.mozilla.org/en-US/docs/Web/API/HTMLElement/dataset

This does seem to work, but you can only access the properties in the dataset using the bracket notation.

For example with

<div id="test" data-i18n.my-translation="foo"></div>
  • $element.dataset['i18n.myTranslation'] works
  • $element.dataset.i18n.myTranslation does not work and throws 'Uncaught TypeError: Cannot read properties of undefined'
  • unsurprisingly, $element.dataset.i18n is undefined

(CodePen)

No idea if there any cross-browser gotchas we'd need to look out for with this sort of approach.

@querkmachine
Copy link
Member

I'm doing some poking right now with some of the suggestions here: camelCase keys, individual data-* attributes for each translation string, namespacing data-* attributes, polyfilling dataset.

An as-yet-unmentioned issue I've noticed is that dataset is provided as a flat key-value list, whereas JS configuration is in the form of nested objects.

Code examples Format of dataset (includes `module` as that is also part of the dataset returned):
{
  "i18n.showAllSections": "Show everything",
  "i18n.hideAllSections": "Hide everything",
  module: "accordion"
}

Format of default or passed settings object:

{
  i18n: {
    showAllSections: "Show everything",
    hideAllSections: "Hide everything"
  }
}

Does it seem like a good idea to bring back the 'flattening' functionality of the initial i18n spike work to normalise these? Flattening would save us from having to deeply/recursively merge objects, as a bonus.

@36degrees
Copy link
Contributor

I also spent a little bit of time exploring going the other way – pulling all of the keys with a certain prefix out into their own object.

https://codepen.io/36degrees/pen/jOzdWmo?editors=1111

(It could definitely be more readable – it currently uses reduce and is probably trying to be too clever)

@querkmachine querkmachine self-assigned this Aug 22, 2022
@querkmachine
Copy link
Member

I'm fairly happy with the camelCase and namespacing approach discussed above.

My working here is based on the assumption that we're going to combine all configuration parameters together, not just those relating to localisation. This is because we'd likely need to do this for future programatic API work.

If we go through with that, translations would be provided like this in HTML:

<div data-i18n.show-all-sections="Show everything" data-module="accordion">

Which, with a dataset polyfill for IE ≤10, will let us do this in JavaScript:

$element.dataset["i18n.showAllSections"]
// => "Show everything"

This would be abstracted into Nunjucks as a non-namespaced parameter, in order to be consistent with how we already accept localised strings that aren't being passed through to JavaScript (i.e. so the user doesn't need to know the distinction).

These will have Text appended to the end, consistent with parameters which only accept plain text.

{{ govukAccordion({
  showAllSectionsText: "Show everything"
}) }}

Translations provided through JavaScript initialisation would look like this.

Notably, JavaScript initialisation seems to be the only method wherein we will allow passing HTML strings as part of translations.

new Accordion($element, {
  "i18n.showAllSections": "Show everything"
}).init()

If we use the 'flattening' functionality previously described, this would also work:

new Accordion($element, {
  i18n: {
    showAllSections: "Show everything"
  }
}).init()

When combined, you'd end up with a single JS object that looks something like this:

{ 
  "i18n.showAllSections": "Show everything",
  "i18n.hideAllSections": "Hide everything",
  module: "accordion",
  rememberState: false
}

I'm hesitant to try and do too much processing of configuration objects in JavaScript (such as extracting only namespaced attributes, transforming data-* attribute formats, etc.) as this risks creating a disconnect between what is provided and what actually gets used.

I'm also worried about needing to polyfill a lot of things. Part of my reasoning for flattening the objects is to avoid having to use newer Array/Object methods to manipulate the config objects, instead being able to merge them manually without too much issue.

Does this seem like a sensible direction to go in?

@36degrees
Copy link
Contributor

Does this seem like a sensible direction to go in?

This all sounds good to me, thanks for writing it up! 👍🏻

These will have Text appended to the end, consistent with parameters which only accept plain text.

Notably, JavaScript initialisation seems to be the only method wherein we will allow passing HTML strings as part of translations.

Just a thought – do we think we might end up in the future needing to be able to pass HTML through the Nunjucks macro / data attributes? Using the Text suffix works in terms of leaving the Nunjucks API 'open' to introduce an HTML variant later, but do we think we need to work out how passing them through data attributes might work?

I think we've talked in the past about having a 'syntactic sugar' for visually hidden text, which is the most obvious use case I can think of for needing to pass HTML, but if we did that then there would once again be differences between Nunjucks and JavaScript-based strings, (unless we found a way to make the syntactic sugar work in Nunjucks too).

Maybe this is a problem we can worry about later, just wary of painting ourselves into a corner…

If we use the 'flattening' functionality previously described, this would also work:

If we flatten the config before merging with defaults, I think this will also give us most of the benefits of a 'deep merge' of the config? As there's no objects to replace each other, only keys with periods in? 🤔

I'm hesitant to try and do too much processing of configuration objects in JavaScript (such as extracting only namespaced attributes, transforming data-* attribute formats, etc.) as this risks creating a disconnect between what is provided and what actually gets used.

Would we still have the i18n object? If so, would we want a way of getting all the i18n config out in one go, in order to pass it to the i18n object when we create it?

@querkmachine
Copy link
Member

Just a thought – do we think we might end up in the future needing to be able to pass HTML through the Nunjucks macro / data attributes? Using the Text suffix works in terms of leaving the Nunjucks API 'open' to introduce an HTML variant later, but do we think we need to work out how passing them through data attributes might work?

I personally think that it's a problem that already exists; as our English language default for showing/hiding individual accordion sections is Show<span class="govuk-visually-hidden"> this section</span>. At the moment, we won't be giving a means of providing an equivalent string through data-* attributes, only through JavaScript.

I think we've talked in the past about having a 'syntactic sugar' for visually hidden text, which is the most obvious use case I can think of for needing to pass HTML, but if we did that then there would once again be differences between Nunjucks and JavaScript-based strings, (unless we found a way to make the syntactic sugar work in Nunjucks too).

Yeah, this is my main concern about introducing specific syntax features just for translation strings. Doing the syntactic sugar for data-* attributes and JS in the i18n layer, like we do with placeholders, would be simple enough.

If we later want to use the same translation strings in straight HTML, or a user expects the same sugar to work on Nunjucks parameters that go straight into HTML, or we get proposals for more syntactic additions later on, then we're likely creating inconsistency and pain for ourselves. Keep it to raw HTML as much as possible is my preference, personally.

If we flatten the config before merging with defaults, I think this will also give us most of the benefits of a 'deep merge' of the config? As there's no objects to replace each other, only keys with periods in? 🤔

That's the idea!

Would we still have the i18n object? If so, would we want a way of getting all the i18n config out in one go, in order to pass it to the i18n object when we create it?

Ah, yes. I meant that we should avoid picking out i18n namespaced data-* attributes when it came to creating the configuration object and instead be happy to munge everything together. When it comes to passing things through to i18n however, we probably do want to filter it to just the i18n namespaced keys.

@36degrees
Copy link
Contributor

I personally think that it's a problem that already exists; as our English language default for showing/hiding individual accordion sections is Show<span class="govuk-visually-hidden"> this section</span>. At the moment, we won't be giving a means of providing an equivalent string through data-* attributes, only through JavaScript.

Just to check I understand this correctly, this is because the strings will be escaped when we pass them as data attributes?

So e.g.

{{ govukAccordion({
  showThisSectionText: 'Show<span class="govuk-visually-hidden"> this section</span>'
}) }}

will result in:

<div class="govuk-accordion" data-module="accordion" data-i18n.show-this-section="Show&lt;span class=&quot;govuk-visually-hidden&quot;&gt; this section&lt;/span&gt;"></div>

And then the JS will use the escaped string when creating the i18n object?

Yeah, this is my main concern about introducing specific syntax features just for translation strings. Doing the syntactic sugar for data-* attributes and JS in the i18n layer, like we do with placeholders, would be simple enough.

If we later want to use the same translation strings in straight HTML, or a user expects the same sugar to work on Nunjucks parameters that go straight into HTML, or we get proposals for more syntactic additions later on, then we're likely creating inconsistency and pain for ourselves. Keep it to raw HTML as much as possible is my preference, personally.

Sounds sensible 👍🏻

@querkmachine
Copy link
Member

Fiddling with what I have at the moment, it seems that passing HTML into Nunjucks results in only the quotation marks being escaped. Not the chevrons, surprisingly.

<div data-i18n.show-section="Show<span class=&quot;govuk-visually-hidden&quot;> something</span>" data-i18n.hide-section="Hide<span class=&quot;govuk-visually-hidden&quot;> something</span>">

Back in JavaScript land, modern browsers (Chrome, Firefox, Safari, at least) seem quite happy to work with data in this format within JS objects. IE down to 8 doesn't seem to complain either, so it may be that (as ugly as it is) we can just dump HTML into data-* attributes if we want to. TIL!

@36degrees
Copy link
Contributor

36degrees commented Aug 23, 2022

Is that using the escape filter or just relying on autoescape?

Autoescape is enabled by default, but I don't think we tell anyone anywhere that they should leave it on… If they have disabled it, they may already be opening themselves up to security issues, but this might be the first time we'd be relying on escaping to not break the HTML of the page?

We might want to be explicit about it and use the escape filter? (Although I'm assuming escape doesn't do anything when autoescape is enabled.)

(I also am a little confused as to why < and > aren't being escaped, given that they're escaped in all of the examples in the Nunjucks doc)

We should also switch to using the Html suffix to indicate that the translated strings are treated as HTML and will not be escaped.

Do we think we need to provide a Text option that is escaped (somehow?) as well? We generally encourage people to use Text over Html wherever possible to reduce the likelihood of e.g. XSS attacks, but I feel like it'd introduce quite a lot of complexity in this instance and translation strings usually shouldn't contain user-generated content?


Back in JavaScript land, modern browsers (Chrome, Firefox, Safari, at least) seem quite happy to work with data in this format within JS objects. IE down to 8 doesn't seem to complain either, so it may be that (as ugly as it is) we can just dump HTML into data-* attributes if we want to. TIL!

Trying to understand why this works, I think this is because attributes can contain 'a mixture of text and character references' and when parsing an HTML attribute escaped characters (starting with ampersands) are converted to their corresponding code points. So by the point the dataset is created the quotes have already been converted back to their corresponding characters?

If I'm reading it right, this behaviour is defined in the current WHATWG HTML spec and the HTML5 spec (here). We should check what happens in older browsers like IE8 / IE9 though which do not use the HTML5 tokenizer. (I can't read – you've said you've tested IE8 and IE9)

@36degrees
Copy link
Contributor

I also ran the example from above through https://validator.w3.org/nu/#textarea and it raised no violations.

@querkmachine
Copy link
Member

Is that using the escape filter or just relying on autoescape?
[…]
We might want to be explicit about it and use the escape filter? (Although I'm assuming escape doesn't do anything when autoescape is enabled.)

I was testing in the review app, which has autoescape enabled. Turning it off makes a mess unless the escape filter is used. The filter and config option seem to work identically in terms of what characters they escape.

Do we think we need to provide a Text option that is escaped (somehow?) as well?

It does seem redundant to have both Html and Text versions—one that might be escaped by autoescape, one that will be escaped with the escape filter, both technically able to take HTML—we should probably stick with Html and explicitly escape it, though I don't feel too strongly on that point.

We should check what happens in older browsers like IE8 / IE9 though which do not use the HTML5 tokenizer.

For what it's worth, as none of the spikes carried out so far support IE, my means of testing in IE was relatively hacky (read: using console commands to try and extract the data-* attribute value and output it to an innerHTML somewhere). It worked in those instances, but I can't say I have 100% confidence that it's without caveats.

@querkmachine
Copy link
Member

querkmachine commented Aug 24, 2022

The polyfill for dataset isn't particularly bulky, so that seems like a sensible route to go to avoid some headaches.

After further experimentation, it seems like this polyfill does not support data-* attributes in the format we intend to use them (aka, with a dot in the middle), it only covers dashes as separators.

Given this might just be a regex thing, I'm gonna try and tweak it to work; otherwise we may have to find an alternative.

Edit: It was just a regex thing. All good now!

@querkmachine
Copy link
Member

The team seems to be happy with the decisions made here, so I'm going to consider this complete.

To summarise:

  • Configuration objects are not limited to internationalisation support. This is intended to work with future JS configuration work too.
  • Configuration objects will be 'flattened' to a single-depth set of key-value pairs to simplify merging.
  • Keys in JS and Nunjucks parameters will use camelCase. HTML data-* attributes will use kebab-case (these are automatically transformed into camelCase when accessed using dataset).
  • Keys in JS and data-* attributes will be namespaced by prefixing them with i18n.. Nunjucks parameters will not, to be consistent with how existing strings are already handled.
  • Nunjucks parameter names will be suffixed with Html. The values passed to these should be manually escaped with the escape/e filter, to account for users who have the global autoescape option turned off.

Examples

Passing a translation through a Nunjucks macro.

{{ govukAccordion({
  showAllSectionsHtml: 'Show everything',
  showSectionHtml: 'Show<span class="govuk-visually-hidden"> this section</span>'
}) }}

How this is interpolated into the template:

<div
  {%- if params.showAllSectionsHtml %} data-i18n.show-all-sections="{{ params.showAllSectionsHtml | escape }}"{% endif %}
  {%- if params.showSectionHtml %} data-i18n.show-section="{{ params.showSectionHtml | escape }}"{% endif %}
>

The equivalent HTML generated by the macro.

<div data-i18n.show-all-sections="Show everything" data-i18n.show-section="Show<span class=&quot;govuk-visually-hidden&quot;> this section</span>">

Passing a translation through the component's JavaScript initialisation. This supports both nested and flat objects.

new Accordion($element, {
  i18n: {
    showAllSections: 'Show everything',
    showSection: 'Show<span class="govuk-visually-hidden"> this section</span>'
  },
  // OR
  'i18n.showAllSections': 'Show everything',
  'i18n.showSection': 'Show<span class="govuk-visually-hidden"> this section</span>'
}).init()

Passing a translation through Frontend's initAll function. This will also support nested and flat objects, within each component's namespace.

window.GOVUKFrontend.initAll({
  accordion: {
    i18n: {
      showAllSections: 'Show everything',
      showSection: 'Show<span class="govuk-visually-hidden"> this section</span>'
    },
    // OR 
    'i18n.showAllSections': 'Show everything',
    'i18n.showSection': 'Show<span class="govuk-visually-hidden"> this section</span>'
  },
})

After the configurations have been merged and flattened, the resulting object should look something like this:

{ 
  'i18n.showAllSections': 'Show everything',
  'i18n.showSection': 'Show<span class="govuk-visually-hidden"> this section</span>'
}

@penx
Copy link

penx commented Aug 31, 2022

Thanks for your reply here @domoscargin - #2740 (reply in thread)

Ultimately, we want to encourage port maintainers to use our JavaScript, so we’re not planning to include this JSON file in the initial release. We’ll keep the conversation going with port maintainers to see what we can do to support them, and gather feedback from the release so we can make further calls.

With React (and perhaps other frameworks), the framework takes control of the HTML rendering for regions of the page that are rendered by the framework.

There are a few details here:

https://reactjs.org/docs/integrating-with-other-libraries.html#integrating-with-dom-manipulation-plugins

As govuk-frontend's JavaScript modifies the DOM, I would expect any React-based port would have to reimplement the govuk-frontend JavaScript, which is why access to the underlying strings would still be useful.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Development

No branches or pull requests

5 participants