Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Research a data-attribute approach to passing translations to component JavaScript #2736

Closed
6 tasks done
Tracked by #1708
vanitabarrett opened this issue Jul 28, 2022 · 11 comments
Closed
6 tasks done
Tracked by #1708

Comments

@vanitabarrett
Copy link
Contributor

vanitabarrett commented Jul 28, 2022

What

Research into how a data-attribute based approach to passing translations for strings in our component JavaScript would work. This might be in isolation and/or alongside another approach, like passing a translations object through initAll() at the point of initialising.

We want to know if this approach feels feasible (even if it may be untidy, in terms of a lot of data attributes), or if it's a non-starter.

Why

We're unsure if our current planned approach for passing translations to our component JavaScript will work well for all our users.
We want to know if there are any technical limitations or strong downsides to a data-attributes approach that would mean it's not appropriate for our use case (even if working alongside our current planned approach)

Assumptions

  • That service teams may prefer to pass in translations via Nunjucks macros / data attributes
  • That we want to abstract away what strings are in JS and which are in the HTML/template, so users don't need to know the difference

Timebox

1 day
We should review progress after this period of time has elapsed, even if the spike has not been 'completed'

Who is working on this?

Spike lead: @domoscargin

Spike buddy: @vanitabarrett

Questions to answer

  • Are there any technical limitations to having many data attributes on a HTML element, e.g: a cap on the number of attributes, the length of attributes, content within attributes?
  • Would it play nicely alongside our current approach (passing a translations object at point of initialising), if we wanted to give people two ways of passing translations? How would we set priorities if both were set?
  • Are there any downsides to this approach, other than it makes the code pretty 'ugly', that might make us want to rule this out?

Done when

You may find it helpful to refer to our expected outcomes of spikes.

  • Questions have been answered or we have a clearer idea of whether this is a potential approach - I don't think we need to know for sure whether we'll use this, as that will also come from community feedback on our current approach, but we should know whether it's at all feasible
  • Findings have been reviewed and agreed with at least one other person
  • Findings have been shared, e.g: via a write-up on the ticket, at a show & tell or team meeting
@querkmachine
Copy link
Member

Immediate issues that come to mind that warrant investigation:

  • Strings that contain quotation marks—these would likely need to be escaped or encoded in the data-* attribute.
  • Strings that contain embedded HTML—most frequently for visually-hidden text.

It might also be worth investigating the feasability of using encoded JSON, to allow for non-string data structures within the data-* attribute. This might let us cut down on the extreme cases of components having dozens of i18n data-* attributes.

@domoscargin
Copy link
Contributor

It might also be worth investigating the feasability of using encoded JSON, to allow for non-string data structures within the data-* attribute.

Bootstrap are experimenting with this: https://getbootstrap.com/docs/5.2/getting-started/javascript/#data-attributes

It looks like all their data attributes can now house JSON values.

@domoscargin domoscargin self-assigned this Aug 2, 2022
@domoscargin
Copy link
Contributor

domoscargin commented Aug 4, 2022

Technical limitations

Element attributes length and count:

HTML5

An attribute value is a string. Except where otherwise specified, attribute values on HTML elements may be any string value, including the empty string, and there is no restriction on what text can be specified in such attribute values

Every HTML element may have any number of custom data attributes specified, with any value.

https://jsbin.com/wikulamuni/3/edit?html,js,output This script works fine in Chrome with an attribute value length of 10 million+ characters.

HTML4

HTML4 allows 60 attributes with a total of 65536 characters.

Attribute values restricted characters

Attribute values are a mixture of text and character references, except with the additional restriction that the text cannot contain an ambiguous ampersand.

Other limitations

  • IE 10 and below do not support dataset (we currently use getAttribute() in various places).
  • Performance of reading data-* attributes compared to storing the data in regular JS objects is poor (though extremely unlikely to ever be an issue).
  • In Firefox 49.0.2 (and perhaps earlier/later versions) data attributes that exceed 1022 characters will not be ready by Javascript (EcmaScript 4).

Implementation

I have a spike branch here: https://github.com/alphagov/govuk-frontend/compare/bk-spike-i18n-data-attributes

I think this is non-breaking after the i18n stuff has been added, so wouldn't necessarily have to be targetted for the next major release change.

getOptions

I've added a getOptions function tocommon.mjs. This function gathers and prioritises options in the following order:

  1. params passed in programmatically during initialisation
  2. params derived from data attributes
  3. default params

It's in a very rough state (see comments in the file), but works for the minimal example...

example

I've added an example using the Accordion. I've pasted the HTML and added data attributes (Nunjucks macros would simply take params and use them as data attributes internally).

Further work

On this spike

  • I'd like to add a bunch of test cases, testing both the behaviour of attribute values mentioned on this issue, and the order of precedence for config options.
  • Would be nice to get this working with the Character count as well
  • Consider other approaches

If implemented

  • Investigate the namespacing of data attributes (eg: data-govuk-foo) to avoid clashes with anything else using data attributes.
  • Decide on naming conventions for data attributes - what do we do in cases where a simple attribute key->value to get transformed into a more complex object (the i18n attributes, for example, need to get put into { i18n.translations }). Do we investigate defining specific separators etc? Or handle all of that magically on our side? Or simplify our config?
  • Create a slick getOptions function that handles transformation and precedence
  • Implement on each component that needs it
  • [stretch] Investigate creating a component base class which other components extend from, which would contain the logic for creating component config options

@querkmachine
Copy link
Member

Thanks @domoscargin,

I think the first two in the order of precedence should be reversed. data-* attributes are scoped to a single instance of a component; whereas the others are wider in scope. The more 'targeted' the configuration method, the higher precedence it feels like it should be.

It doesn't seem like there are likely we'll hit any technical restrictions that affect the use of data-*, except for potentially IE10, unless we opt not to release this feature after the initial internationalisation release.

I like the idea of having an inherited base class that includes this and other programatic API work. It feels like a more scalable way to handle it going forwards.

I'm still curious as to how this may handle features such as embedded HTML or nested JSON.

Good work so far! Interested to see where it goes.

@domoscargin
Copy link
Contributor

data-* attributes are scoped to a single instance of a component; whereas the others are wider in scope.

I agree with this, but was wavering because the proposal includes programmatic initialisation of an individual component. I considered somehow making the priority individual programmatic component -> data attributes -> initAll -> default, but splitting out the programmatic options like that seems like a recipe for confusion.

@vanitabarrett
Copy link
Contributor Author

vanitabarrett commented Aug 5, 2022

Thanks for the really clear write-up @domoscargin , great work 👍 The general gist I'm getting is that there's no real technical limitations that would stop us from doing this, so it may come down to other things, primarily if it's what users need/find useful, how we'd document it (especially if it sits alongside the existing approach for passing translations).

I agree with @querkmachine on the specificity order - I think my assumption was the reversed order, with data attributes taking priority (thinking of them a bit like inline styles or our override classes), but that's an implementation detail we can get into if we're thinking of going with this approach.

I don't think it's worth getting too in depth until we know it's something we're going to do (and we'll need to wait for our user feedback before making any decisions, I think), but probably worth having a quick play around with the things Kim has mentioned to make sure none of them are going to cause us issues:

  • strings with quotation marks
  • strings with HTML
  • does using JSON objects make this easier?

@vanitabarrett
Copy link
Contributor Author

vanitabarrett commented Aug 8, 2022

I thought I'd try out the JSON approach in a separate branch here

➡️ View spike branch here
Note: The first 2 commits have slightly different approaches: one a bit more prescriptive on what params you can pass in, one just accepting a generic translations object. The actual logic doesn't differ much between the two.
Note part 2: I've used the dump filter to stringify the JSON in the data attribute. If we were going to implement this data-attribute approach, we'd probably want to do a bit of digging into whether that's the best approach and/or if anything else is needed. It looks like dump just runs JSON.stringify(), but somehow the string also has encoded quotations and I'm not sure what's doing that right now.

Seems technically feasible to implement, but I wonder if it fully resolves the issue of a user needing to understand where a string is used. It still partly separates out the JavaScript strings from the Nunjucks strings, so while it feels easier because it's all in one place, you'd still probably need to consult the macro documentation to understand what goes where.

It also might feel weird to anyone using the HTML versions, because adding JSON into a data attribute isn't something you do very often.

Having said that, I do quite like it as an alternative to our existing approach, because it feels nicer to have it all in one place.

@domoscargin give me a shout if you'd like to talk about this spike when you're back! Happy to chat and figure out where we go from here

@domoscargin
Copy link
Contributor

domoscargin commented Aug 10, 2022

Thanks for this, @vanitabarrett!

So it seems like, in terms of the questions we're asking on this spike:

  • there are no big technical limitations that can't be overcome. ✅
    • We'd probably need some guidance on getting JSON into the right shape to be used in an attribute (ie: wrap it in single quotes, or escape/encode it, or use Nunjucks templating).
  • It works in conjunction with our current approach. ✅
    • I still think we'd need to consider priority in the case of inidividually initialised components vs data attributes, but that's an implementation detail.
  • There are some downsides ✅
    • Passing in a JSON object separates Nunjucks and JavaScript strings so user has to figure out where things go
    • JSON in data attributes isn't very common
    • Need to make sure the JSON plays nicely as an attribute string

I think these comments are pretty easy to follow, so we could ask folks to glance at it and then call this done, or I could write up a summary?

@vanitabarrett
Copy link
Contributor Author

@domoscargin I think that's probably a good summary in itself!

Worth noting that "Separates Nunjucks and JavaScript strings so user has to figure out where things go" only really applies if going with the approach where a user passes in a JSON object for translations. It wouldn't be the case if we separated out each individual translation string into it's own macro param, although that has it's own downside where you may have a lot of data-attributes on the component.

@36degrees
Copy link
Contributor

Had a little play with an alternative approach to passing the JSON, by sticking it in an inline <script type="application/json"> within the module. The main advantage I can think of is that we can output it unescaped, which makes it slightly easier to debug? I don't think there's much in it TBH, but thought it was worth sharing:

https://github.com/alphagov/govuk-frontend/compare/vbs-spike-i18n-data-json...ob-spike-i18n-data-json-inline-script?expand=1

(Spikes on spikes on spikes at this point?)

@vanitabarrett
Copy link
Contributor Author

We have decided we're going to go with a data-attribute approach for passing translations, alongside the currently proposed JS approach. I'm going to close this spike and write up some new cards for doing the work.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Development

No branches or pull requests

4 participants