-
Notifications
You must be signed in to change notification settings - Fork 990
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP: Refactor internationalization system #1148
Conversation
@BertalanD I will have a look at that soon, I just want to get a 0.12 release first so this PR can be the main focus of 0.13 |
* Add line highlighting to code blocks * Fix highlighting of lines Apparently every line to be highlighted is provided in one chunk. * Add more documentation to codeblock.rs * Turn FenceIter into an Iterator * Move Range to fence.rs * Add tests
Better minify errors
* Allow site path to contain underscores Fixes site.css is not being generated if any part of the path contains underscores * Add tests for path with underscores
I'll definitely have my eyes pealed for 0.13 or when this PR hits master. |
I've just been put in quarantine, so I'll start working on it tomorrow. |
23a5cdf
to
ba5cecf
Compare
I'm finally done with the core functionality and have written up some user docs. Updating the docs for the template API is a big task, so before that, I would like to ask for help in a couple of places:
Lastly, I'd like @Keats' help about what tests to implement and finding beta testers. Requesting a review. |
39e5a0c
to
632f587
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok I started reviewing it but the PR is way too big to be properly reviewed on its own.
Can we split it in smaller PRs?
/// | ||
/// If unset in a translation, it will fall back to how it's set for the default language. | ||
/// Its default value for the primary language is `false`. | ||
pub generate_feed: Option<bool>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would rather deprecate the top generate_feed
in favour of feed
than having generate_feed
everywhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would feel a bit ridiculous alongside build_search_index
. How would you feel about also renaming build_search_index
to searchable
?
/// | ||
/// If unset in a translation, it will fall back to how it's set for the default language. | ||
/// Its default value for the default language is `false`. | ||
pub build_search_index: Option<bool>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: we should show a message if the language cannot be built into a search index, only a few languages are supported right now.
pub search: search::Search, | ||
|
||
/// All user params set in [extra] in the config | ||
pub extra: HashMap<String, Toml>, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know about that..
Things like compile_sass
, minify_html
etc are not language specific. Maybe pass something like language_config
to the template instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep. I mainly want to have language-specific search because CJK search indices can get huge; and tight bandwidth might exclude things like feeds
lang: lang.clone(), | ||
default_language_options, | ||
languages, | ||
..self.clone() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this means we clone the Config every time we render a template?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, yes. I'm thinking about having a struct with just references like SerializingTaxonomy
and co. I'm way too inexperienced to not resort to cloning something with that many borrows.
// TODO: add test | ||
bail!("Page `{}` has an empty language. Did you mistakenly put 2 dots before the extension?", file_path.display()); | ||
} else if l == &config.default_language_options.language_alias { | ||
bail!("Pages in the default language should not set it in their title (caused by `{}`).", file_path.display()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does that cause any issues?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't think so. The code currently does fail though if the default language is set in a file name.
if !config.languages_codes().contains(&parts[1].as_ref()) {
bail!("File {:?} has a language code of {} which isn't present in the config.toml `languages`", self.path, parts[1]);
}
if page.meta.date.is_none() { | ||
page.meta.date = Some(caps.name("datetime").unwrap().as_str().to_string()); | ||
// TODO: do not fail silently on invalid formats |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we have a test for that?
// Catch programming errors | ||
// `lang` and `language_alias` must be non-empty | ||
assert_ne!(section.lang, LanguageIdentifier::default()); | ||
assert!(!section.language_alias.is_empty()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like an almost copy/paste than the page.rs file
let mut fields = vec![]; | ||
if config.search.include_title { | ||
let search = config.0.default_language_options.search.clone().expect("search set"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we ever have None?
}; | ||
pub fn build_index(library: &Library, config: &LocalizedConfig) -> Result<String> { | ||
let language = Language::from_code(config.0.lang.language.as_str()).ok_or_else(|| { | ||
format!("Tried to build search index for language {} which is not supported", config.0.lang) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can move that error at start time
I'm not sure how to split this up. The new Sorry for the code spam. |
"#, | ||
&context, | ||
) | ||
.unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rustfmt really messed up this part. Sorry about that. I probably should revert changes to this file.
I put a comment on https://zola.discourse.group/t/rfc-internationalization-system-rework/546/7, let's think about the "what are we trying to achieve" first so you can do more incremental changes. |
…1166) * Update compile_sass docs, this is just for sass files in site root * Improve file watch error, question whether the target exists
This commit starts the rework of the localization system, as described in my [forum post] a while ago. Its intent is to make it easier to eventually transition to a [Fluent]-based localization system and allow the official [Rust website] to move to Zola. Monolingual sites should continue working without change. If you use the current localization system, take a look at the summary at the end of the commit message. User documentation has not yet been created, so the key changes are described here as follows: - Require `config.default_language` and any additional `config.languages` to be valid Unicode language identifiers. These are compatible with HTML's [BCP 47] language codes and allow for correct language-dependent behavior (e.g. date formatting...) - Added `language_alias` field for `config` and all languages in `config.languages`, which sets how the language should be searched in file names and displayed in URLs. This lets us avoid breaking links to existing multilingual sites. - Added struct `LocaleOptions`, which contains language-specific settings. The title, description, taxonomies, feed limit, search options and `extra` keys are now also set here. - `LocaleOptions` belonging to the default language are in the field `default_language_options`. Its fields are transparently flattened, so `config.toml`'s syntax stays the same for monolingual sites. - Changed type of `config.languages` from `Vec<Language>` to `HashMap<LanguageIdentifier, LocaleOptions>. - Removed `config.translations`. Its functionality is taken over by `config.languages.{lang}.extra`. - Missing fields in the `extra` field of translations fall back to the values set for the default language. - Themes can also define translations similarly to `config.toml`. All fields except `extra` and `languages.{lang}.extra` are ignored. - Removed `lang` fields of taxonomies, as they are now to be defined under a language. - Added `LocalizedConfig` that wraps `Config` with `default_language_options` replaced with a translation's settings. TODO: investigate if a `SerializingConfig` should be used instead. `config.extra` in templates now contains the values for the specific language. Other langugages' `extra` is best accessed via the `trans` function. - Language and date parsing from filenames is done in `library::content::file_info`. The new `maybe_lang` and `date` fields are not guaranteed to be semantically correct. There is no need any more to externally mutate FileInfo when setting these parameters. - Reduced the distinction of "single-language mode". Page filenames are parsed as "{date}{_,-}{slug}.{lang}.md" regardless of the presence of translations. Section indices behave the same except dates aren't parsed. - Each language has its own Tera context. This requires cloning the config way too many times, so a more suitable and idiomatic solution should be sought. - Functions taking a `lang` argument use identifiers (not aliases!; like `lang` in context), and they fall back to the active language if not specified, not the site's default language. - Additional dependency: [unic_langid]. This is also used by Project Fluent, and thus by Mozilla, so it won't likely be abandoned in the future. For template authors: links constructed in templates and containing `lang` WILL break. URLs will always use the value that's in `language_alias`. You should replace `trans()` function calls with directly getting the value from `config.extra`, as it allows for data other than strings and will be deprecated in the future. For users: unless you have a multilingual site, everything will keep working as expected. Multilingual sites WILL stop working. The new `config.toml` layout is demonstrated in `test_site_i18n/config.toml` [forum post]: https://zola.discourse.group/t/rfc-internationalization-system-rework/546 [Rust website]: https://www.rust-lang.org/ [BCP 47]: https://tools.ietf.org/html/bcp47 [unic_langid]: https://crates.io/crates/unic-langid
- renamed `clear_filename` to `strip_filename_info` - removed unnecessary String allocations from `get_language` and `get_date`
9a60078
to
7a90859
Compare
sorry >_> |
Closing in favour of #1391 as it addresses some existing issues and this PR would be too much work to rebase cleanly. Thanks a lot @BertalanD ! |
This PR (and #1040) should be reconsidered. I saw in notifications that #1391 got merged and came back to check out the project and see if it would work yet for any of my sites. The answer seems to be no, the i18n work actually done there is very rudimentary. I don't mean this to be rude @Keats, but please consider that while you may not have a use case for a true multilingual site this is a pretty big deal — for many a make or break issue. Please consider trusting the people that are trying to contribute something more here. Having a Fluent based static site generator with multi language templating capabilities would be a big deal. Seeing multiple people try to contribute towards this and having their issues shut down is immensely discouraging. This is still up for grabs because no other static site generator does this right yet either, but as long as you keep shutting down PRs that attempt to address this I don't see how we're going to get anywhere. |
This pull request implements the redesigned localization system as described in my forum post.
In summary, the goal is to be transparent towards templates, but customizable and explicit towards users. A Project Fluent-based string translation system will eventually be introduced that's similar to #1040 by @XAMPPRocky.
See the commit messages for an in-depth changelog.
To-do
extra
from page/section header?? + language in front matterclone()
s (kind of done)