Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Refactor internationalization system #1148

Closed
wants to merge 16 commits into from

Conversation

BertalanD
Copy link

@BertalanD BertalanD commented Aug 28, 2020

This pull request implements the redesigned localization system as described in my forum post.

In summary, the goal is to be transparent towards templates, but customizable and explicit towards users. A Project Fluent-based string translation system will eventually be introduced that's similar to #1040 by @XAMPPRocky.

See the commit messages for an in-depth changelog.

To-do

  • Refactor existing code
  • Merge extra from page/section header?? + language in front matter
  • Add proper linking in doc comments
  • Add more tests
  • Add fluent function
  • Load locales from both site and theme
  • Generate Fluent stuff on init
  • Reduce the amount of clone()s (kind of done)
  • User testing
  • Add user documentation
  • Add template API documentation
  • Create a demo theme or internationalize an existing theme

@Keats
Copy link
Collaborator

Keats commented Aug 31, 2020

@BertalanD I will have a look at that soon, I just want to get a 0.12 release first so this PR can be the main focus of 0.13

@Keats Keats closed this Sep 4, 2020
sburris0 and others added 5 commits September 7, 2020 17:26
* Add line highlighting to code blocks

* Fix highlighting of lines

Apparently every line to be highlighted is provided in one chunk.

* Add more documentation to codeblock.rs

* Turn FenceIter into an Iterator

* Move Range to fence.rs

* Add tests
* Allow site path to contain underscores

Fixes site.css is not being generated if any part of the path contains
underscores

* Add tests for path with underscores
@alerque
Copy link

alerque commented Sep 9, 2020

I'll definitely have my eyes pealed for 0.13 or when this PR hits master.

@BertalanD
Copy link
Author

I've just been put in quarantine, so I'll start working on it tomorrow.

@BertalanD
Copy link
Author

BertalanD commented Sep 13, 2020

I'm finally done with the core functionality and have written up some user docs.

Updating the docs for the template API is a big task, so before that, I would like to ask for help in a couple of places:

  • LocalizedConfig and the copied standalone Tera contexts for each language seem awkward and slow for me. Does anybody have an idea how they might be replaced?
  • Are there any config values you would like to be overridable for each language, but aren't yet?
  • Doing {{ config.extra.whatever }} in templates seems awkward to me (especially since extra values come from translations, themes, and hopefully in the future, from the front matter, too). Any better ideas?

Lastly, I'd like @Keats' help about what tests to implement and finding beta testers.

Requesting a review.

@BertalanD BertalanD marked this pull request as ready for review September 14, 2020 18:52
Copy link
Collaborator

@Keats Keats left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I started reviewing it but the PR is way too big to be properly reviewed on its own.
Can we split it in smaller PRs?

///
/// If unset in a translation, it will fall back to how it's set for the default language.
/// Its default value for the primary language is `false`.
pub generate_feed: Option<bool>,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rather deprecate the top generate_feed in favour of feed than having generate_feed everywhere.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would feel a bit ridiculous alongside build_search_index. How would you feel about also renaming build_search_index to searchable?

///
/// If unset in a translation, it will fall back to how it's set for the default language.
/// Its default value for the default language is `false`.
pub build_search_index: Option<bool>,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: we should show a message if the language cannot be built into a search index, only a few languages are supported right now.

pub search: search::Search,

/// All user params set in [extra] in the config
pub extra: HashMap<String, Toml>,
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know about that..
Things like compile_sass, minify_html etc are not language specific. Maybe pass something like language_config to the template instead?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep. I mainly want to have language-specific search because CJK search indices can get huge; and tight bandwidth might exclude things like feeds

components/config/src/config/mod.rs Outdated Show resolved Hide resolved
lang: lang.clone(),
default_language_options,
languages,
..self.clone()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this means we clone the Config every time we render a template?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, yes. I'm thinking about having a struct with just references like SerializingTaxonomy and co. I'm way too inexperienced to not resort to cloning something with that many borrows.

// TODO: add test
bail!("Page `{}` has an empty language. Did you mistakenly put 2 dots before the extension?", file_path.display());
} else if l == &config.default_language_options.language_alias {
bail!("Pages in the default language should not set it in their title (caused by `{}`).", file_path.display());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does that cause any issues?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't think so. The code currently does fail though if the default language is set in a file name.

if !config.languages_codes().contains(&parts[1].as_ref()) {
            bail!("File {:?} has a language code of {} which isn't present in the config.toml `languages`", self.path, parts[1]);
        }

if page.meta.date.is_none() {
page.meta.date = Some(caps.name("datetime").unwrap().as_str().to_string());
// TODO: do not fail silently on invalid formats
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we have a test for that?

// Catch programming errors
// `lang` and `language_alias` must be non-empty
assert_ne!(section.lang, LanguageIdentifier::default());
assert!(!section.language_alias.is_empty());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like an almost copy/paste than the page.rs file

let mut fields = vec![];
if config.search.include_title {
let search = config.0.default_language_options.search.clone().expect("search set");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we ever have None?

};
pub fn build_index(library: &Library, config: &LocalizedConfig) -> Result<String> {
let language = Language::from_code(config.0.lang.language.as_str()).ok_or_else(|| {
format!("Tried to build search index for language {} which is not supported", config.0.lang)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can move that error at start time

@BertalanD
Copy link
Author

I'm not sure how to split this up. The new LocaleOptions + LanguageIdentifier stuff takes up most of the change anyway, and dividing that into parts if not really an option, because almost all lines referencing Config would still need to be modified.
I'm open to leaving the Fluent/Docs part for later.

Sorry for the code spam.

"#,
&context,
)
.unwrap();
Copy link
Author

@BertalanD BertalanD Sep 15, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rustfmt really messed up this part. Sorry about that. I probably should revert changes to this file.

@Keats
Copy link
Collaborator

Keats commented Sep 16, 2020

I put a comment on https://zola.discourse.group/t/rfc-internationalization-system-rework/546/7, let's think about the "what are we trying to achieve" first so you can do more incremental changes.

timvisee and others added 8 commits September 16, 2020 12:55
…1166)

* Update compile_sass docs, this is just for sass files in site root

* Improve file watch error, question whether the target exists
This commit starts the rework of the localization system, as described
in my [forum post] a while ago. Its intent is to make it easier to
eventually transition to a [Fluent]-based localization system and allow
the official [Rust website] to move to Zola.

Monolingual sites should continue working without change. If you use the
current localization system, take a look at the summary at the end of
the commit message.

User documentation has not yet been created, so the key changes are
described here as follows:

- Require `config.default_language` and any additional
  `config.languages` to be valid Unicode language identifiers. These
   are compatible with HTML's [BCP 47] language codes and allow for
   correct language-dependent behavior (e.g. date formatting...)

- Added `language_alias` field for `config` and all languages in
  `config.languages`, which sets how the language should be searched in
  file names and displayed in URLs. This lets us avoid breaking links to
  existing multilingual sites.

- Added struct `LocaleOptions`, which contains language-specific
  settings. The title, description, taxonomies, feed limit, search
  options and `extra` keys are now also set here.

- `LocaleOptions` belonging to the default language are in the field
  `default_language_options`. Its fields are transparently flattened,
  so `config.toml`'s syntax stays the same for monolingual sites.

- Changed type of `config.languages` from `Vec<Language>` to
  `HashMap<LanguageIdentifier, LocaleOptions>.

- Removed `config.translations`. Its functionality is taken over by
  `config.languages.{lang}.extra`.

- Missing fields in the `extra` field of translations fall back to the
  values set for the default language.

- Themes can also define translations similarly to `config.toml`. All
  fields except `extra` and `languages.{lang}.extra` are ignored.

- Removed `lang` fields of taxonomies, as they are now to be defined
  under a language.

- Added `LocalizedConfig` that wraps `Config` with
  `default_language_options` replaced with a translation's settings.
  TODO: investigate if a `SerializingConfig` should be used instead.
  `config.extra` in templates now contains the values for the specific
  language. Other langugages' `extra` is best accessed via the `trans`
  function.

- Language and date parsing from filenames is done in
  `library::content::file_info`. The new `maybe_lang` and `date` fields
  are not guaranteed to be semantically correct. There is no need any
  more to externally mutate FileInfo when setting these parameters.

- Reduced the distinction of "single-language mode". Page filenames
  are parsed as "{date}{_,-}{slug}.{lang}.md" regardless of the presence
  of translations. Section indices behave the same except dates aren't
  parsed.

- Each language has its own Tera context. This requires cloning the
  config way too many times, so a more suitable and idiomatic solution
  should be sought.

- Functions taking a `lang` argument use identifiers (not aliases!;
  like `lang` in context), and they fall back to the active language
  if not specified, not the site's default language.

- Additional dependency: [unic_langid]. This is also used by Project
  Fluent, and thus by Mozilla, so it won't likely be abandoned in the
  future.

For template authors: links constructed in templates and containing
`lang` WILL break. URLs will always use the value that's in
`language_alias`. You should replace `trans()` function calls with
directly getting the value from `config.extra`, as it allows for data
other than strings and will be deprecated in the future.

For users: unless you have a multilingual site, everything will keep
working as expected. Multilingual sites WILL stop working. The new
`config.toml` layout is demonstrated in `test_site_i18n/config.toml`

[forum post]: https://zola.discourse.group/t/rfc-internationalization-system-rework/546
[Rust website]: https://www.rust-lang.org/
[BCP 47]: https://tools.ietf.org/html/bcp47
[unic_langid]: https://crates.io/crates/unic-langid
- renamed `clear_filename` to `strip_filename_info`
- removed unnecessary String allocations from `get_language` and
  `get_date`
@Keats
Copy link
Collaborator

Keats commented Jan 9, 2021

sorry >_>

@Keats Keats reopened this Jan 9, 2021
@Keats
Copy link
Collaborator

Keats commented Mar 14, 2021

Closing in favour of #1391 as it addresses some existing issues and this PR would be too much work to rebase cleanly.

Thanks a lot @BertalanD !

@Keats Keats closed this Mar 14, 2021
@alerque
Copy link

alerque commented Apr 8, 2021

This PR (and #1040) should be reconsidered. I saw in notifications that #1391 got merged and came back to check out the project and see if it would work yet for any of my sites. The answer seems to be no, the i18n work actually done there is very rudimentary. I don't mean this to be rude @Keats, but please consider that while you may not have a use case for a true multilingual site this is a pretty big deal — for many a make or break issue. Please consider trusting the people that are trying to contribute something more here. Having a Fluent based static site generator with multi language templating capabilities would be a big deal. Seeing multiple people try to contribute towards this and having their issues shut down is immensely discouraging. This is still up for grabs because no other static site generator does this right yet either, but as long as you keep shutting down PRs that attempt to address this I don't see how we're going to get anywhere.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants