Swedish language support #35

Closed
Timmyfox opened this issue Nov 22, 2024 · 27 comments

Comments

@Timmyfox

Hello! I discovered this package just now, after a recent Hyperref update partially broke Cleveref. Unfortunately, one of the biggest hurdles to replacing Cleveref with Zref-clever is that it lacks support for Swedish.

Therefore, I have gone ahead and created a language file for this. Most of it is based on Cleveref's translations, with some tweaks and modifications. I haven't tested it too thoroughly, so there might be some errors, but it should be correct enough for the most part.

zref-clever-swedish.txt

@gusbrs
Owner

gusbrs commented Nov 22, 2024

Hi @Timmyfox , thank you. This is most welcome, and I'll add it, of course. But, before I do so, some comments and questions.

I see you haven't included gender in the language declaration and for each of the types. Is it because in Swedish the articles do not inflect (as in English)? How do articles work in Swedish? (I have no idea...). Also, it seems there's some relation with German, so does Swedish also have (noun) declension?

Btw, did you have a chance to take a look at the "Localization guidelines" at texdoc zref-clever-code?

I haven't tested it too thoroughly, so there might be some errors, but it should be correct enough for the most part.

Well... I do expect you to be more sure than that before we release it. ;-)

@Timmyfox
Author

Timmyfox commented Nov 22, 2024

Hey!

  • Gendering: Swedish originally had a gendering system similar to German, however a lot of this has been largely simplified and it's mostly just some traces here and there. There are technically two genders (with around 3/4 of words being of the primary gender), but unlike German this is a lot more basic (probably best compared as a slightly more elaborate version of the English a vs an). Nevertheless, looking over the German language file, that seems easy enough to add so I suppose I might as well add it. I can't think of a use-case where it would make any actual difference however.
  • Declension: Similar story here, there are some traces from its Germanic roots with nominative and genitive cases. In practice, this does nothing more than to add the equivalent of a possessive 's at the end of the word. The only case I can imagine this mattering for would be something like "The table's 4th row" or "Chapter 4's fifth section", both of which can be easily rewritten to avoid it altogether (e.g. "The fifth section of chapter 4").
  • Localization guidelines: Yes. I did! I also checked the user manual for further clarification where uncertain.
  • Testing: Yes, of course! What I mean is I've done basic testing by just writing a simple document to go through each translation and generate a basic test case. What I haven't done is apply it to any elaborate real documents or go beyond the simple examples that initially came to mind. If you have any more elaborate pre-existing "testing" documents available, these could be useful for more thorough testing. Even if those test documents are in another language (such as English or German), the surrounding text should be easy enough to translate.

Also, as a final note, the biggest thing I'm not 100% sure about is the plural usage. To my understanding as a native speaker and a graduate student, Swedish actually uses singular forms for most reference types even for plural references. For example, Swedish would write "Figure 5 and 6" instead of "Figures 5 and 6", although there were a few cases where I was uncertain and couldn't find any useful sources, so I will check with my linguistics professor for further clarification. Primarily the case where this would differ is with definite cases (e.g. "The figures 5 and 6"). Cleveref, for comparison, simply uses the singular word consistently for both singular and plural cases. I will provide an update once I have gotten clarification on this.

@gusbrs
Owner

gusbrs commented Nov 22, 2024

  • Gendering: Swedish originally had a gendering system similar to German, however a lot of this has been largely simplified and it's mostly just some traces here and there. There are technically two genders (with around 3/4 of words being of the primary gender), but unlike German this is a lot more basic (probably best compared as a slightly more elaborate version of the English a vs an). Nevertheless, looking over the German language file, that seems easy enough to add so I suppose I might as well add it. I can't think of a use-case where it would make any actual difference however.

For zref-clever the gender does not affect output, but only the nudge feature, which can generate a warning in case of gender mismatch when enabled. Yes it is easy, but to include it depends on whether it makes sense for the language. If the articles preceding the type names may change with the gender, we should include them. So, if it's like French: "le chapitre", "la section", we should include them. If it's like English: "the chapter", "the section", there's no need. Either way, as you said, if needed, it is easy.
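A minimal sketch of what the nudge feature does in a language that does declare genders. It assumes French is declared with genders `f` and `m`, and that the `nudge`, `nudgeif` and `g` options behave as described in the user manual; the label name is made up, and the exact warning text is not reproduced because it is not quoted anywhere in this thread.

```latex
\documentclass{article}
\usepackage[french]{babel}
\usepackage{zref-clever}
% Enable the compile-time checks, including the gender one (assumed option values).
\zcsetup{nudge=true, nudgeif=all}
\begin{document}
\section{Introduction}\zlabel{sec:intro} % hypothetical label name

% "section" is feminine in French. g=m tells zref-clever that the
% surrounding text was written for a masculine noun, so a mismatch
% warning is expected in the log.
Voir le \zcref[g=m]{sec:intro}.

% Here the declared gender matches, so no warning is expected.
Voir la \zcref[g=f]{sec:intro}.
\end{document}
```

The typeset reference should be the same in both cases; only the log is expected to differ.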

  • Declension: Similar story here, there are some traces from its Germanic roots with nominative and genitive cases. In practice, this does nothing more than to add the equivalent of a possessive 's at the end of the word. The only case I can imagine this mattering for would be something like "The table's 4th row" or "Chapter 4's fifth section", both of which can be easily rewritten to avoid it altogether (e.g. "The fifth section of chapter 4").

Ok, if it's just the possessive, the case is then similar to English, which the package doesn't actually include. (Nobody demanded it yet, and I agree with your judgement there).

  • Localization guidelines: Yes. I did! I also checked the user manual for further clarification where uncertain.

Great!

  • Testing: Yes, of course! What I mean is I've done basic testing by just writing a simple document to go through each translation and generate a basic test case. What I haven't done is apply it to any elaborate real documents or go beyond the simple examples that initially came to mind. If you have any more elaborate pre-existing "testing" documents available, these could be useful for more thorough testing. Even if those test documents are in another language (such as English or German), the surrounding text should be easy enough to translate.

Ok, we are on the same page here. What I meant is that I cannot check this for you, since I haven't the faintest knowledge of Swedish. And since this is meant to be published, I expect it to be carefully done. But, as long as you are confident, that's enough for me, and I have to rely on you.

Regarding test files, the package has plenty of them (see the testfiles directory in the repo), but they are probably overkill for this purpose. The language files basically set type names and, occasionally some refbounds, abbreviated forms, or things of the sort. So it is fine to test on a simple file. You can actually artificially generate the "document element" simply using \newcounter (if needed), then \refstepcounter + \label a number of times. You can force a range with the range option, change cap/abbrev etc. That should be more than enough for this.
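The approach described above could look roughly like the following sketch. It assumes the Swedish language file is already installed where zref-clever can find it, uses `\zlabel` rather than `\label` to stay independent of the package version, and maps a made-up counter (`fakebook`) to the `book` type through the `countertype` option. All label and counter names are hypothetical.

```latex
\documentclass{article}
\usepackage[swedish]{babel}
\usepackage{zref-clever}

% A made-up counter standing in for a document element that is hard to
% produce naturally; map it to the "book" reference type.
\newcounter{fakebook}
\zcsetup{countertype={fakebook=book}}

\begin{document}

% Fake three figures and two books: step the counter, then set a label.
\refstepcounter{figure}\zlabel{fig:a}
\refstepcounter{figure}\zlabel{fig:b}
\refstepcounter{figure}\zlabel{fig:c}
\refstepcounter{fakebook}\zlabel{book:a}
\refstepcounter{fakebook}\zlabel{book:b}

% Singular, and capitalized singular.
\zcref{fig:a}\par
\zcref[cap]{fig:a}\par

% Plural: a list of consecutive labels, then a forced range.
\zcref{fig:a,fig:b,fig:c}\par
\zcref[range]{fig:a,fig:c}\par

% Abbreviated form; only differs if the language file defines one.
\zcref[abbrev]{fig:a}\par

% The artificially generated type.
\zcref{book:a}\par
\zcref{book:a,book:b}\par

\end{document}
```

As with any cross-reference test, the file needs to be compiled twice before the references resolve.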

Also, as a final note, the biggest thing I'm not 100% sure about is the plural usage. To my understanding as a native speaker and a graduate student, Swedish actually uses singular forms for most reference types even for plural references. For example, Swedish would write "Figure 5 and 6" instead of "Figures 5 and 6", although there were a few cases where I was uncertain and couldn't find any useful sources, so I will check with my linguistics professor for further clarification. Primarily the case where this would differ is with definite cases (e.g. "The figures 5 and 6"). Cleveref, for comparison, simply uses the singular word consistently for both singular and plural cases.

Ok, please check this, if you can. Within reason, of course. It is fine if things are "to the best of your knowledge so far". And, if even after trying to check you are really divided about it, following cleveref as benchmark is not a bad idea. Either way, if need be, we can also leave a note saying we are in doubt about the best usage of this, so that future users feel more at ease to come forward and (re)discuss.

I will provide an update once I have gotten clarification on this.

I'm looking forward to it. Thank you.

@gusbrs
Owner

gusbrs commented Nov 22, 2024

Ah, another practical question. Do you know if babel swedish has any variants or aliases?

@Timmyfox
Author

For zref-clever the gender does not affect output, but only the nudge feature, which can generate a warning in case of gender mismatch when enabled. Yes it is easy, but to include it depends on whether it makes sense for the language. If the articles preceding the type names may change with the gender, we should include them. So, if it's like French: "le chapitre", "la section", we should include them. If it's like English: "the chapter", "the section", there's no need. Either way, as you said, if needed, it is easy.

Right, in this case the definite article in Swedish is a suffix that is part of the word itself, so it shouldn't be needed, as there is no preceding article. The one case where it could make a difference is the indefinite article, which works almost identically to English (a figure, an appendix), but that goes beyond what is relevant here: if it's not needed for English, it wouldn't be needed for Swedish.

Ah, another practical question. Do you know if babel swedish has any variants or aliases?

No, and I can't think of any such that would be relevant here either. The only plausible variant would be Finland Swedish, although this is (officially anyway) considered more of a dialect than a variant and I'm not familiar enough with the differences to make a conclusive judgement. Babel doesn't include it however, so therefore I don't think it needs to be here either.

@gusbrs
Owner

gusbrs commented Nov 23, 2024

Right, in this case the definite article in Swedish is a suffix as part of the word itself, thus then it shouldn't be needed as there is no preceding article.

Oh, my... The more I learn about this kind of thing, the more I have to recognize how naïve some of my assumptions were for the package, and on how much flexibility would be needed to properly support language specific needs like these. If the article is a suffix as part of the word itself the package can't handle it...

I felt like I had a clear case in writing stuff like this in the manual:

Regarding the text surrounding the reference – the inflected article, the passing preposition, etc. –, the issue is more delicate. zref-clever cannot and intends not to typeset those for you. But, depending on the language, it is true that the kind of automation provided by zref-clever may betray your best efforts to get a proper surrounding text.

But if the article is a suffix to the type name itself... But, well, cleveref can't do this either, and I take it it is still useful to you, since you are here. :-) What does it actually do? I'd presume it provides the type names without the article, as that's probably the most common use case, and if someone needs to write the article, one has to do it manually. Is this it?

That's more or less what zref-clever can deliver too... Technically, we could abuse the declension infrastructure for this, but I think this is a bad idea because the semantics for the option would be completely wrong/deceiving.

In practical terms, I take it swedish should then be declared without gender, as you initially did, since it wouldn't be useful for the language anyway. Is that your understanding too?

The one case it could make a difference is for the indefinite article, where it works almost identically to English (a figure, an appendix), but that goes beyond what is relevant here as if that's not needed for English it wouldn't be needed for Swedish.

Agreed.

No, and I can't think of any such that would be relevant here either. The only plausible variant would be Finland Swedish, although this is (officially anyway) considered more of a dialect than a variant and I'm not familiar enough with the differences to make a conclusive judgement. Babel doesn't include it however, so therefore I don't think it needs to be here either.

My only concern is with what babel (and possibly polyglossia) offers, to see if we'd need to set some aliases ourselves. If there are none, we are good as is.

@Timmyfox
Author

Timmyfox commented Nov 23, 2024

Oh, my... The more I learn about this kind of thing, the more I have to recognize how naïve some of my assumptions were for the package, and on how much flexibility would be needed to properly support language specific needs like these. If the article is a suffix as part of the word itself the package can't handle it...

I don't believe this is a problem here actually. To clarify: the word for figure in Swedish is "figur". The full indefinite form with article ("a figure") is "en figur", the "en" here indicating common gender (the other gender, neuter, uses "ett" instead, similar to how English uses a/an). In the definite form, this gets appended to the word as a suffix, so "the figure" is "figuren", similar to how Spanish uses -a or -o to indicate gender (la niña, el niño). The word for "chapter" ("kapitel") is neuter gender, so the indefinite form with article would be "ett kapitel" (a chapter), and the definite form is "kapitlet" (the chapter). The only difference here is whether an -en or an -et suffix is used for the word in its definite form.

Why I don't think this is a problem is because this is completely self-contained and has no bearing on the surrounding words, nor does it change in any way. Thus, there is only one form per type name without any need for extra logic to handle the suffixes. It's merely a grammatical feature dictating how the word is formed.

@gusbrs
Owner

gusbrs commented Nov 23, 2024

I don't believe this is a problem here actually. To clarify: the word for figure in Swedish is "figur". The full indefinite form with article ("a figure") is "en figur", the "en" here indicating common gender (the other gender, neuter, uses "ett" instead, similar to how English uses a/an). In the definite form, this gets appended to the word as a suffix, so "the figure" is "figuren", similar to how Spanish uses -a or -o to indicate gender (la niña, el niño).

Why I don't think this is a problem is because this is completely self-contained and has no bearing on the surrounding words, nor does it change in any way. Thus, there is only one form per type name without any need for extra logic to handle the suffixes. It's merely a grammatical feature dictating how the word is formed.

Let me see if I understand this correctly. The package does not have a way to "choose" between the definite or indefinite forms (or "no article" for that matter). What you are saying is that, since this is a reference to a specific numbered object, we'll always be using the definite form. Is that your point? Why would anyone say "a chapter 5" instead of "(the) chapter 5". Indeed, one wouldn't.

Ok, this sounds a little less bad for my naïve assumptions. ;-)

And, reiterating the question: the language should be declared without gender then?

@Timmyfox
Author

Timmyfox commented Nov 23, 2024

Let me see if I understand this correctly. The package does not have a way to "choose" between the definite or indefinite forms (or "no article" for that matter). What you are saying is that, since this is a reference to a specific numbered object, we'll always be using the definite form. Is that your point? Why would anyone say "a chapter 5" instead of "(the) chapter 5". Indeed, one wouldn't.
Ok, this sounds a little less bad for my naïve assumptions. ;-)
And, reiterating the question: the language should be declared without gender then?

Yes, pretty much! And because of this the gender declaration would be entirely unnecessary as the package wouldn't need to consider the article at all as it'd already be "baked in", fully self-contained in the type name wherever applicable.

Although what I was going to double check with my language teacher is that the convention appears to be to use the singular indefinite form both for singular and plural cases (so "figure 5 and 7" or "chapter 4 and 7"), meaning that the entire thing above regarding definite article suffixes would be moot anyway. However, in thinking about it further I also came up with a second potentially acceptable convention specifically for plural cases of using the plural definite form, the equivalent of writing "the figures 4 and 6", which would then be the only time the definite form would even be relevant. I'm not certain this is considered proper/correct though, but could in theory be a plausible style choice. If so, my initial thought was that it could be added as an optional declension case, unless the package might have support in any other way to choose between style variants.

For figures, this could look something like this:

type = figure ,
	case = O ,
		Name-sg = Figur ,
		name-sg = figur ,
		Name-pl = Figur ,
		name-pl = figur ,
	case = B ,
		Name-pl = Figurerna ,
		name-pl = figurerna ,

Where case = O denotes the default indefinite case and case = B is the optional definite variant case.

Similarly, the word for "page" ("sida") has two possible ways it can be written, which I also believe to boil down to style choice, which could perhaps also be most easily implemented as an optional declension case.

@gusbrs
Owner

gusbrs commented Nov 23, 2024

Yes, pretty much! And because of this the gender declaration would be entirely unnecessary as the package wouldn't need to consider the article at all as it'd already be "baked in", fully self-contained in the type name wherever applicable.

Ok, thanks for clarifying. Sounds good then.
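In terms of the language declaration, the conclusion so far amounts to something like the sketch below. This is an illustration of the idea, not the actual line from the package sources, and the alias name is purely hypothetical, since no babel variant or alias for Swedish was identified in this thread.

```latex
% Declared with no optional argument: no gender list is passed,
% because nothing in the surrounding text inflects with the type name.
\zcDeclareLanguage{swedish}

% An alias would only be needed if babel or polyglossia exposed another
% name for the same language. "svenska" is a made-up name for illustration.
% \zcDeclareLanguageAlias{svenska}{swedish}
```

Users normally never write these lines themselves; they belong with the package's built-in language support.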

Although what I was going to double check with my language teacher is that the convention appears to be to use the singular indefinite form both for singular and plural cases (so "figure 5 and 7" or "chapter 4 and 7"), meaning that the entire thing above regarding definite article suffixes would be moot anyway.

I leave this to your care and discretion.

However, in thinking about it further I also came up with a second potentially acceptable convention specifically for plural cases of using the plural definite form, the equivalent of writing "the figures 4 and 6", which would then be the only time the definite form would even be relevant. I'm not certain this is considered proper/correct though, but could in theory be a plausible style choice. If so, my initial thought was that it could be added as an optional declension case, unless the package might have support in any other way to choose between style variants.

For figures, this could look something like this:

type = figure ,
	case = O ,
		Name-sg = Figur ,
		name-sg = figur ,
		Name-pl = Figur ,
		name-pl = figur ,
	case = B ,
		Name-pl = Figurerna ,
		name-pl = figurerna ,

Where case = O denotes the default indefinite case and case = B is the optional definite variant case.

Mhm... This is what I meant by:

Technically, we could abuse the declension infrastructure for this, but I think this is a bad idea because the semantics for the option would be completely wrong/deceiving.

In other words, this is not "declension" is it? To be clear, I'm not against it, if it's important. But, if it is, we could possibly consider the case of using a more general name for the option, something like "variant". This is actually not a bad idea...

@Timmyfox
Author

In other words, this is not "declension" is it? To be clear, I'm not against it, if it's important. But, if it is, we could possibly consider the case of using a more general name for the option, something like "variant". This is actually not a bad idea...

Not strictly, no. I consider it more of a variant that, if applied, should be used consistently throughout the document much in the same manner as abbreviated forms work.

@gusbrs
Owner

gusbrs commented Nov 23, 2024

In other words, this is not "declension" is it? To be clear, I'm not against it, if it's important. But, if it is, we could possibly consider the case of using a more general name for the option, something like "variant". This is actually not a bad idea...

Not strictly, no. I consider it more of a variant that, if applied, should be used consistently throughout the document much in the same manner as abbreviated forms work.

Sometimes, just changing perspective a bit can really topple things. I just opened a weird but very interesting "box" with that, and I'm starting to think that just renaming "declension" to "variant" and allowing a wider semantic meaning for the already existing infrastructure means the package can handle a lot more things... :-)

The case at hand is the distinction between the definite/indefinite alternatives. But it could be anything else. See, why can't we have a variant "with preposition X"? Or "Y"?

Say, in German, if we add a "variant" nach, that already presumes the Dativ, and one does not need to know the declension case, and things just become much more natural. True this might mean the language files may become much larger. But this is really interesting.

Let me think about this a bit, since the potential repercussions are relevant and I might be missing some blocking issue at first glance.

(This only affects whether we can use "declension" meaning "variant", anything else remains the same, and you can go ahead with it.)

gusbrs added a commit that referenced this issue Nov 24, 2024
@gusbrs
Owner

gusbrs commented Nov 24, 2024

@Timmyfox I took the plunge and decided to interpret type name variants more broadly, thus I have renamed "declension" to "variants" (and the other associated options). See f01b261

Feel free to use it. Or not. That is, there's no need to feel obligated just because I made the change in the context of our discussion. I'm aware you were still just considering the possibility. And I did rename it because I thought it was a good idea in general. In other words, use it only if it makes sense for Swedish, otherwise, you shouldn't, so don't worry. ;-)

Now, given I've made these changes, depending on how you go about this, you may no longer be able to test things from the released version of the package. So I made a preliminary commit adding Swedish localization (c3c2bd9), so that you can build and test from there to polish things. I have no idea if you have some acquaintance with l3build or how to extract the package files from the repo, so let me know if you do need some guidance with this.

@Timmyfox
Author

Alright! I received an email reply from my language professor, with answers to most of my questions. For the most part it was just as I had suspected: singular indefinite case for everything. However, this also means that most of the variants I had considered shouldn't be used, which leaves only a single type name (page) with a variant (sida / sidan). I'm thinking of calling these std and alt for the default and the variant case. However, for consistency's sake I'm now also wondering if it would be possible to set the package to use the variant by default globally (same as how abbreviations work)? Also, as a way to avoid a bunch of duplication, perhaps there could be a default variant that is always used if the type name defines none?

Additionally, on the subject of abbreviations, perhaps there could also be a way to enable/disable specific abbreviations? For example, the current implementation of abbreviations (to my understanding) only allows all on or all off, but consider someone who wants the abbreviation "fig." for figures, but not "p." for pages. With variants, one could selectively use the latter (or globally set it to always be used), without affecting the usage of the former.

@gusbrs
Owner

gusbrs commented Nov 24, 2024

Alright! I received an email reply from my language professor, with answers to most of my questions. For the most part it was just as I had suspected: singular indefinite case for everything.

Great.

However, this also means that most of the variants I had considered shouldn't be used, which leaves only a single type name (page) with a variant (sida / sidan). I'm thinking of calling these std and alt for the default and the variant case. However, for consistency's sake I'm now also wondering if it would be possible to set the package to use the variant by default globally (same as how abbreviations work)? Also, as a way to avoid a bunch of duplication, perhaps there could be a default variant that is always used if the type name defines none?

The first variant of the list in the language declaration is already (and always) the default. But, in the language file you should specify it. And what exactly do you mean by "avoid a bunch of duplication"? Indeed, if you define variants, I'd expect you to set all (four, or more) name forms for each variant and each type. Which makes me wonder if it's just for page, if it's worth it and/or how to deal with it. Please elaborate.

Additionally, on the subject of abbreviations, perhaps there could also be a way to enable/disable specific abbreviations? For example, the current implementation of abbreviations (to my understanding) only allows all on or all off, but consider someone who wants the abbreviation "fig." for figures, but not "p." for pages. With variants, one could selectively use the latter (or globally set it to always be used), without affecting the usage of the former.

This is already possible, abbrev can be type-specific and language-specific, but it shouldn't be done in the language files. It is up to the user. And this is definitely not what variants are for. If you are considering including abbreviations for a type and then disabling them for the type, you shouldn't include them at all. Please re-read the corresponding part of the "Localization guidelines" for abbreviations, which boil down to: be very conservative in including abbreviations.

@Timmyfox
Author

Timmyfox commented Nov 24, 2024

The first variant of the list in the language declaration is already (and always) the default. But, in the language file you should specify it. And what exactly do you mean by "avoid a bunch of duplication"? Indeed, if you define variants, I'd expect you to set all (four, or more) name forms for each variant and each type. Which makes me wonder if it's just for page, if it's worth it and/or how to deal with it. Please elaborate.

This is already possible, abbrev can be type-specific and language-specific, but it shouldn't be done in the language files. It is up to the user. And this is definitely not what variants are for. If you are considering including abbreviations for a type and then disabling them for the type, you shouldn't include them at all. Please re-read the corresponding part of the "Localization guidelines" for abbreviations, which boil down to: be very conservative in including abbreviations.

When referencing a page, there are two valid ways of doing this in Swedish: you can write either "sida" ("page") or "sidan" ("the page"). Both are equally acceptable, and which you choose is a matter of style and preference. Thus, here it makes sense to include both of them as variants, and this is the only type name that would have a variant. However, whichever is chosen should be used consistently throughout the document, hence it would be useful for the user to be able to select on a global level which variant to use.

A third option as well is to abbreviate it: "Kapitel 5, s. 4" ("Chapter 5, p. 4"). However one may only wish to abbreviate it selectively in specific cases, so if referencing the page by itself you may want the full non-abbreviated form ("page 4") but when it's used like in the example there as part of a structural reference (together with the chapter type name) then the abbreviation could be preferred.

And to clarify, when I say globally I mean user-side in \zcsetup, rather than within the language file, for example something like:
\zcsetup{v=alt} to set the alt variant to be used throughout the entire document, overriding the default behavior of using the first variant in the list, eliminating the need to consistently insert v = alt in every \zcref call that uses a certain type name in situations where you may prefer to consistently use the variant over the default.

@Timmyfox
Author

Timmyfox commented Nov 24, 2024

Hmm, alright I've performed some RTFM and I think I've realized what you're saying regarding abbreviations. If I just omit the abbrev line entirely in \zcsetup in my test file and instead add something like \zcRefTypeSetup{figure}{abbrev=true}, then I can selectively put abbrev=true only in the \zcref[page] commands where I want the page type to be abbreviated.

However, it doesn't seem possible to do this the other way around, as \zcsetup will take precedence over \zcRefTypeSetup. In other words, this works:

\zcsetup{...}
\zcRefTypeSetup{figure}{abbrev=true}

But not this:

\zcsetup{abbrev=true, ...}
\zcRefTypeSetup{page}{abbrev=false}

@gusbrs
Owner

gusbrs commented Nov 24, 2024

When referencing a page, there are two valid ways of doing this in Swedish: you can write either "sida" ("page") or "sidan" ("the page"). Both are equally acceptable, and which you choose is a matter of style and preference. Thus, here it makes sense to include both of them as variants, and this is the only type name that would have a variant. However, whichever is chosen should be used consistently throughout the document, hence it would be useful for the user to be able to select on a global level which variant to use.

For this I don't think we should use variants at all. Choose the most common/traditional one for the language file. And the user who prefers the alternative can just use \zcLanguageSetup to change settings for the type. As simple as that.

I did broaden the scope of the option to "variants", the user can in principle use it quite flexibly, but as far as the language files are concerned, they should use variants only for justifiable grammatical situations which cannot be handled otherwise. If you told me "the article is part of the word, and we must be able to choose between the situation with the article and without the article", then ok, that would have been a good reason. Since this choice does not seem to be needed, I think we should refrain from setting variants for Swedish. For "style and preference alternatives" the right tool is just \zcLanguageSetup.
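For the "sida" / "sidan" case, the user-side override could look like the sketch below. It assumes the shipped language file uses "sida" as the default and that the user prefers "sidan"; only the singular forms are touched, since which plural form should accompany it is not settled in this thread.

```latex
% In the preamble, after loading zref-clever.
% No blank lines inside the argument, to avoid stray paragraph breaks.
\zcLanguageSetup{swedish}{
  type = page ,
    Name-sg = Sidan ,
    name-sg = sidan ,
}
```

Because the setting is made once in the preamble, it applies to every page reference in the document, which covers the consistency concern without defining a variant in the language file.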

A third option as well is to abbreviate it: "Kapitel 5, s. 4" ("Chapter 5, p. 4"). However one may only wish to abbreviate it selectively in specific cases, so if referencing the page by itself you may want the full non-abbreviated form ("page 4") but when it's used like in the example there as part of a structural reference (together with the chapter type name) then the abbreviation could be preferred.

Again, this is not really within the intended scope of the language files. Keep things general and clean. As far as zref-clever is concerned, there is not even a way to tell if two references follow each other this way.

zref-vario does that, and you may wish to contribute support for Swedish for it as well later on, but even then we should not try to micro-manage options this way. This sort of detail should be left to the users to decide and set.

And to clarify, when I say globally I mean user-side in \zcsetup, rather than within the language file, for example something like:
\zcsetup{v=alt} to set the alt variant to be used throughout the entire document, overriding the default behavior of using the first variant in the list, eliminating the need to consistently insert v = alt in every \zcref call that uses a certain type name in situations where you may prefer to consistently use the variant over the default.

Indeed, this is not possible. But I think this would be neither desirable, nor meaningful. This is not really what variants are for, and you don't need them for this kind of thing.

@gusbrs
Owner

gusbrs commented Nov 24, 2024

Hmm, alright I've performed some RTFM and I think I've realized what you're saying regarding abbreviations. If I just omit the abbrev line entirely in \zcsetup in my test file and instead add something like \zcRefTypeSetup{figure}{abbrev=true}, then I can selectively put abbrev=true only in the \zcref[page] commands where I want the page type to be abbreviated.

However, it doesn't seem possible to do this the other way around as \zcsetup will take precedence over \zcRefTypeSetup. In other words, this works:

\zcsetup{...}
\zcRefTypeSetup{figure}{abbrev=true}

Indeed, this is documented behavior. See #23 and #4.

In fact, this asymmetry is a big reason why I recommend being conservative with abbreviations in the language files in the localization guidelines.

@Timmyfox
Author

Timmyfox commented Nov 24, 2024

Understood, thanks!

Then here should be the final/corrected version of the language file: zref-clever-swedish.txt

The only types I was unable to (naturally) test were book, endnote, and note. Though I have instead done my best to assume relevant context to manually test these.

  • book was expected (as the localization guidelines also remark), as it doesn't exist in any standard classes by default. I've used the provided definition in the guidelines for this translation.
  • endnote I attempted to generate using the endnotes package, but was unable to get this working. I'm unaware if there are any other environments or packages which may be used here. Nevertheless, the translation is clear when contrasted to footnote.
  • note, similar to endnote, I was unable to find a suitable package or environment that would generate this. Though the guidelines suggest this is provided as a convenience, so I've assumed it's a similar situation to book and thus translated it as a generic term for a note that could be either a footnote, an endnote, or any other type of note.

Another point of note, for the page type, I arbitrarily chose to use sidan for the translation. This is different from sida which cleveref uses, but overall feels more natural (which is also what some online discussion forums I came across remarked when trying to research which of these variations is more common). In general, most style guides seem very evenly split which to use so there doesn't seem to be any conclusively "more correct" default.

I'll look into zref-vario as well! :)

gusbrs added a commit that referenced this issue Nov 24, 2024
@gusbrs
Owner

gusbrs commented Nov 24, 2024

Then here should be the final/corrected version of the language file: zref-clever-swedish.txt

Great. Thank you!

I've updated things with your changes (see ecf448a).

It looks mostly good to me. There was one obvious repeated name for footnote, which I dropped. Besides that, the setting rangesep = {\textendash} as the default for all types also caught my attention. Is it really common practice in Swedish to use the dash for ranges of all cross-reference types like this? It may be that I just don't know the use in the language as an outsider, but it does feel a little weird/opinionated. It is your call, but I'm just double-checking if you're sure about this.

Other than that, the only other comment is about page, below.

The only types I was unable to (naturally) test were book, endnote, and note. Though I have instead done my best to assume relevant context to manually test these.

  • book was expected (as the localization guidelines also remark), as it doesn't exist in any standard classes by default. I've used the provided definition in the guidelines for this translation.

  • endnote I attempted to generate using the endnotes package, but was unable to get this working. I'm unaware if there are any other environments or packages which may be used here. Nevertheless, the translation is clear when contrasted to footnote.

  • note, similar to endnote, I was unable to find a suitable package or environment that would generate this. Though the guidelines suggest this is provided as a convenience, so I've assumed it's a similar situation to book and thus translated it as a generic term for a note that could be either a footnote, an endnote, or any other type of note.

Oh, I guess I put you through more trouble than I had hoped for. I did tell you you could always generate the object with just \newcounter/\refstepcounter, but I guess I should have been more explicit. I think only memoir provides book, and only my own postnotes sets \@currentcounter for endnotes (I don't remember if enotez does). Anyway, doing so "artificially" should be much easier than actually generating the elements. Something like:

\documentclass{article}

\usepackage[swedish]{babel}
\usepackage{zref-clever}
\usepackage{hyperref}

\newcounter{book}
\newcounter{endnote}
\newcounter{note}

\begin{document}

\refstepcounter{book}
\label{book1}
\refstepcounter{book}
\label{book2}
\refstepcounter{book}
\label{book3}

\refstepcounter{endnote}
\label{endnote1}
\refstepcounter{endnote}
\label{endnote2}
\refstepcounter{endnote}
\label{endnote3}

\refstepcounter{note}
\label{note1}
\refstepcounter{note}
\label{note2}
\refstepcounter{note}
\label{note3}

\zcref{book1, book2, book3, endnote1, endnote3, note1, note3}

\end{document}

Well, it may be a little late to spare you from the trouble... Sorry for not being more explicit.

Another point of note, for the page type, I arbitrarily chose to use sidan for the translation. This is different from sida which cleveref uses, but overall feels more natural (which is also what some online discussion forums I came across remarked when trying to research which of these variations is more common). In general, most style guides seem very evenly split which to use so there doesn't seem to be any conclusively "more correct" default.

Mhm, now. Indeed, while cleveref can be used as a reference, there's no need to follow it. zref-clever does not propose or try to follow legacy in that regard. However, consistency with babel captions is a guideline for the package:

babel names: As is known, babel defines a set of captions for different document objects for each supported language. In some cases, they intersect with the objects referred to with cross-references, in which case consistency with babel should be maintained as much as possible. This is specially the case for prominent and traditional objects, such as \chaptername, \figurename, \tablename, \pagename, \partname, and \appendixname. This is not set in stone, but there should be good reason to diverge from it. In particular, if a certain term is contentious in a given language, babel’s default should be preferred.

And, I just checked, swedish.ldf sets \def\pagename{Sida}. As stated in the guidelines, this is not set in stone, but it is something which should require good reason to diverge from. I'm not saying you must, and this is your call. But, if you do not feel strongly about it, and if you were yourself a little in doubt about this, as it sounds you were, please take into consideration that consistency with babel is something zref-clever does value.

I'll look into zref-vario as well! :)

Nice! Good news is that it is much, much simpler than for zref-clever. ;-)

@Timmyfox
Author

Timmyfox commented Nov 24, 2024

It looks mostly good to me. There was one obvious repeated name for footnote, which I dropped.
Ah! Seems I must've accidentally added that doing a last-minute fix for a typo I noticed when re-checking the type names one last time.

Besides that, the setting rangesep = {\textendash} as the default for all types also caught my attention. Is it really common practice in Swedish to use the dash for ranges of all cross-reference types like this? It may be that I just don't know the use in the language as an outsider, but it does feel a little weird/opinionated. It is your call, but I'm just double-checking if you're sure about this.

Apparently yes. With the questions I asked my language professor I included one question about the correct way to reference multiple consecutive objects, whether it matters if someone writes figur 4 till 7 ("till" meaning "to") or figur 4–7 with an en dash. I had assumed that it would be a stylistic choice but apparently according to her the en dash method is the only correct one in Swedish, so I'll trust her judgement and assume this is the way it should be (at least at my university).

Oh, I guess I put you through more trouble than I had hoped for. I did tell you you could always generate the object with just \newcounter/\refstepcounter, but I guess I should have been more explicit. I think only memoir provides book, and only my own postnotes sets \@currentcounter for endnotes (I don't remember if enotez does). Anyway, doing so "artificially" should be much easier than actually generating the elements.

No problem! I understood your instruction, this was more just me wanting to be extra thorough. What I meant when I said "generate naturally" was that I intentionally tried to generate the labels not just artificially but also in a more "real" scenario using appropriate packages and environments. If anything, that also helped me flag a possible bug/incompatibility (endnotes package not behaving correctly). I also discovered that whilst the algorithm package works correctly, the newer algorithm2e one (still not updated for several years) that Overleaf has a tutorial for did not.

And, I just checked, swedish.ldf sets \def\pagename{Sida}. As stated in the guidelines, this is not set in stone, but it is something which should require good reason to diverge from. I'm not saying you must, and this is your call. But, if you do not feel strongly about it, and if you were yourself a little in doubt about this, as it sounds you were, please take into consideration that consistency with babel is something zref-clever does value.

Right! To cite my sources here, since my language teacher gave me an inconclusive "normally, page references should be avoided in academic papers" instead of any real answer of which should be preferred. The main things I found which led me to choosing "sidan" were these:

  1. This post (in English, too!) on wordreference.com has users Tjahzi and AutumnOwl both agreeing that the definite case (sidan) "sounds more natural", a claim I personally agree with as well.
  2. This document on the Swedish government website, the "Government agency language handbook, 7th edition" from 2009, has two uses that support this:
    a) the first in section 8.3 on page 53 titled (in Swedish) "Abbreviations next to numbers", listing the various officially recognized abbreviations for things like figures, chapters and tables—and for this purpose, "s." for pages, which is then specified as being the abbreviation for "sidan", written in definite form.
    b) Section 11.4.1 on page 88 regarding pagination and page numbering also makes a reference that, with title pages, tables of contents, preface pages etc. the earliest (absolute) page where page numbering normally appears is page 5, again using the definite form "sidan 5" to denote this.

...However, in checking these again, I discovered that there's a newer version of the manual I referenced in the second point, the 8th edition from 2014, although provided not by the government website but by ISOF (The Swedish Institute for Language and Folklore). This newer version does change the usage in the same section about abbreviations (section 11.4 on page 92) to use "sida" instead of "sidan" for the non-abbreviated form, but does not make any further examples or references like the 7th edition did.

Judging by this new finding, I'm starting to feel like perhaps "sidan" is the more old-fashioned usage and "sida" the more modern one. Cross-referencing would also be the only context where "sidan" is valid, because when including the page number in the footer you would indeed use the indefinite "sida" instead (presumably why babel chose that, it's more universal). In this case, I'm inclined to agree that it's probably better to follow babel and change it back to "sida":

type = page ,
	Name-sg = Sida ,
	name-sg = sida ,
	Name-pl = Sida ,
	name-pl = sida ,

@gusbrs
Owner

gusbrs commented Nov 24, 2024

Apparently yes. With the questions I asked my language professor I included one question about the correct way to reference multiple consecutive objects, whether it matters if someone writes figur 4 till 7 ("till" meaning "to") or figur 4–7 with an en dash. I had assumed that it would be a stylistic choice but apparently according to her the en dash method is the only correct one in Swedish, so I'll trust her judgement and assume this is the way it should be (at least at my university).

I must admit this hurts my typographical sensibilities a bit, but if that's how things are, no issue there. It seems this is the best information you were able to get, and a categorical statement at that, so I see no reason not to follow the recommendation. All good here then.

No problem! I understood your instruction, this was more just me wanting to be extra thorough. What I meant when I said "generate naturally" was that I intentionally tried to generate the labels not just artificially but also in a more "real" scenario using appropriate packages and environments. If anything, that also helped me flag a possible bug/incompatibility (endnotes package not behaving correctly). I also discovered that whilst the algorithm package works correctly, the newer algorithm2e one (still not updated for several years) that Overleaf has a tutorial for did not.

Ah, if that's what you had in mind, no problem. It is not a bad idea to "stress test" a package you are considering adopting. ;-)

Btw, if you are curious about the technical side, check the "Limitations" section in the manual. The issue at hand is most likely about the value of \@currentcounter, and endnotes indeed neither uses \refstepcounter nor sets the \@currentcounter value. Even in the kernel, being more thorough in setting \@currentcounter is not that old, but for some time now they have stated this as "policy" and more people are taking care of that in the ecosystem.
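To illustrate the \@currentcounter point, here is a minimal sketch, assuming a hypothetical \mynote macro that steps its counter by hand rather than with \refstepcounter. The macro name, its layout, and the label are made up for illustration; this is not how endnotes or postnotes are actually implemented.

```latex
\documentclass{article}
\usepackage{zref-clever}

\newcounter{note}

\makeatletter
% Hypothetical macro that does NOT use \refstepcounter, so it has to
% set by hand what \refstepcounter would otherwise provide.
\newcommand{\mynote}[1]{%
  \stepcounter{note}%
  \protected@edef\@currentlabel{\thenote}% the value a reference prints
  \def\@currentcounter{note}% the counter zref-clever inspects for the type
  \par\textbf{Note \thenote.} #1\par
}
\makeatother

\begin{document}
\mynote{Some text.\label{note:a}}% hypothetical label

See \zcref{note:a}.
\end{document}
```

Without the \@currentcounter line, a label set inside \mynote would most likely be attributed to whatever counter was current before it, which matches the kind of mismatch described in the "Limitations" section.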

Apropos, if you are interested in endnotes, I do recommend postnotes. I'm biased there, but check it out and judge for yourself, I'd bet you won't regret it.

Right! To cite my sources here, since my language teacher gave me an inconclusive "normally, page references should be avoided in academic papers" instead of any real answer of which should be preferred. The main things I found which led me to choosing "sidan" were these:

  1. This post (in English, too!) on wordreference.com has users Tjahzi and AutumnOwl both agreeing that the definite case (sidan) "sounds more natural", a claim I personally agree with as well.

  2. This document on the Swedish government website, the "Government agency language handbook, 7th edition" from 2009, has two uses that support this:
    a) the first in section 8.3 on page 53 titled (in Swedish) "Abbreviations next to numbers", listing the various officially recognized abbreviations for things like figures, chapters and tables—and for this purpose, "s." for pages, which is then specified as being the abbreviation for "sidan", written in definite form.
    b) Section 11.4.1 on page 88 regarding pagination and page numbering also makes a reference that, with title pages, tables of contents, preface pages etc. the earliest (absolute) page where page numbering normally appears is page 5, again using the definite form "sidan 5" to denote this.

...However, in checking these again, I discovered that there's a newer version of the manual I referenced in the second point, the 8th edition from 2014, although provided not by the government website but by ISOF (The Swedish Institute for Language and Folklore). This newer version does change the usage in the same section about abbreviations (section 11.4 on page 92) to use "sida" instead of "sidan" for the non-abbreviated form, but does not make any further examples or references like the 7th edition did.

Judging by this new finding, I'm starting to feel like perhaps "sidan" is the more old-fashioned usage and "sida" the more modern one. Cross-referencing would also be the only context where "sidan" is valid, because when including the page number in the footer you would indeed use the indefinite "sida" instead (presumably why babel chose that, it's more universal). In this case, I'm inclined to agree that it's probably better to follow babel and change it back to "sida":

type = page ,
	Name-sg = Sida ,
	name-sg = sida ,
	Name-pl = Sida ,
	name-pl = sida ,

Ah, nice. Some sources for a contentious case are good to have here on record. Thanks. And, yes, considering those, and babel, "sida" seems indeed the right call. I've made the change at 03b8ade.

With that, everything looks good to me. So we are clear to go from my side. Please confirm if everything is in order from your side as well. Once you do, I'll prepare a release so that you can enjoy it. ;-)

@Timmyfox
Author

Timmyfox commented Nov 25, 2024

Yep! All looks good. I realized that in doing all my testing I left out the rangetopair=false in the page typename, although I'm not sure it would make much (if any) difference anyway. The only situation this at all seems to affect the output is when I add the range option to some of the page references, although in hindsight I think it's probably best to add it back, also since all other language files seem to consistently use this.

@gusbrs
Owner

gusbrs commented Nov 25, 2024

I realized that in doing all my testing I left out the rangetopair=false in the page typename, although I'm not sure it would make much (if any) difference anyway. The only situation this at all seems to affect the output is when I add the range option to some of the page references, although in hindsight I think it's probably best to add it back, also since all other language files seem to consistently use this.

Well, yes, this option is there to decide what to do when range is used but the elements are actually in immediate sequence. So, if you say \zcref[range]{chap2,chap3} you wouldn't actually want "chapters 2 to 3" but "chapters 2 and 3". Now rangetopair=false is typically set for page not because it is "page", but because when rangesep = {\textendash} it makes sense to simply say "pages 2--3". That said, in this case, since we are setting rangesep = {\textendash} as the default for all types, setting rangetopair=false just for page would be off the mark, I think. But it would make sense to set it as the default as well, alongside the rangesep setting. So, which shall it be: when explicitly asking for a range, "3--4" or "3 och 4"? I'd say the former, given the default rangesep = {\textendash}. WDYT?
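A small sketch of the two behaviours being weighed here, assuming rangetopair is also accepted as a per-call option of \zcref and that the installed zref-clever ships Swedish. The section titles and labels are hypothetical.

```latex
\documentclass{article}
\usepackage[swedish]{babel}
\usepackage{zref-clever}

\begin{document}
\section{Bakgrund}\label{sec:a}
\section{Metod}\label{sec:b}

% Two consecutive items with an explicit range request.
% rangetopair=true: the range should collapse to a pair ("... och ...").
\zcref[range, rangetopair=true]{sec:a,sec:b}

% rangetopair=false: the range should be kept, joined by rangesep.
\zcref[range, rangetopair=false]{sec:a,sec:b}
\end{document}
```

Comparing the two lines in the typeset output shows which default reads better alongside rangesep = {\textendash}.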

@Timmyfox
Author

Timmyfox commented Nov 25, 2024

Makes sense! Setting it as the default for all types sounds like a good idea then, let's go with that, so "3–4".

@gusbrs
Owner

gusbrs commented Nov 25, 2024

Makes sense! Setting it as the default for all types sounds like a good idea then, let's go with that, so "3–4".

Done!

I went ahead and cut a release as well. I've sent it to CTAN so, if you use TeXLive, you should get it in a couple of days.

Once again, thank you very much for this. And I do hope you enjoy it. :-)

@gusbrs gusbrs closed this as completed Nov 25, 2024