Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add an option to choose the preferred apostrophe #202

Closed
wants to merge 2 commits into from

Conversation

DeinAlptraum
Copy link

@DeinAlptraum DeinAlptraum commented Jan 4, 2024

Add an option to choose between typographers apostrophe (default) and upright apostrophe, to be used when sanitizing file paths and song artist, title and lyrics, as I requested on #20

Changing the value has no immediate affect, only when a file is re-downloaded, it changes the folder/file names and the content of the txt. This also works when the file hasn't been changed on USDB side.

@@ -6,6 +6,13 @@
from usdb_syncer import settings


@dataclass(frozen=True)
class MiscOptions:
Copy link
Author

@DeinAlptraum DeinAlptraum Jan 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately this cannot be added to TxtOptions since they might not be present when "Download Song file" is unchecked. This option always has to be present since it's also used to sanitize the paths made up of artist & title.
I didn't have a good idea for a class name.

@@ -123,7 +123,7 @@ def new(
) -> Context:
txt_str = usdb_scraper.get_notes(details.song_id, logger)
txt = SongTxt.parse(txt_str, logger)
txt.sanitize()
txt.sanitize(options.misc_options.apostrophe.value)
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This starts an ugly long chain of passing this value from sanitize to fix to fix_apostrophes_and_quotation_marks to replace_false_apostrophes_and_quotation_marks, but I didn't find a nicer place to put this without moving a lot of unrelated things around.

Comment on lines +27 to +28
for apostrophe in settings.Apostrophe:
self.comboBox_apostrophes.addItem(str(apostrophe), apostrophe)
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has apparently been the final straw for pylint to rate the _populate_comboboxes function as "too complex".
I personally don't think this is a useful measure and should be deactivated. Alternatively, one could put the combobox names and setting enums into lists and have an outer for loop that iterates over these, but this isn't much prettier imo

Choose between typographers apostrophe (default) and upright apostrophe. Used when sanitizing file paths and song artist, title and lyrics
Also addd a test for fixing typographer's to upright apostrophe
@bohning
Copy link
Owner

bohning commented Jan 10, 2024

Sorry for not responding earlier. I truly appreciate your efforts, but I have to be honest here, as my heart not only beats for singing/karaoke, but also for typography.

Referring to the article about apostrophes on the German Wikipedia, the upright apostrophe is a typographically wrong character and stems from the old 7-bit ASCII character sets where symbol count was very limited (or, for that matter, from even older analog typing devices, see below). The upright apostrophe is solely a substitute character when the correct apostrophe is not available. Since we are moving away from ASCII/CP1252 or any other codepage encoding towards UTF8, we should embrace the versatility and the typographic possibilities this enables: supporting songs of any language/script (without needing to resort to romanization), we can support artists like "A★Teens" or titles like Prince’s "Sign “☮” The Times".

I suspect you dislike the typographer’s apostrophe because you have been used to the substitute character for so long. That reminds me of my father, who grew up using old typewriters that only had one key for O (vowel) and 0 (zero), so still today he sometimes types "hell0" instead of "hello", just because he has gotten so used to it.

I agree that in the past, the typographer’s apostrophe sometimes lead to headaches, because it is not part of ASCII (but of CP1252, Pos. 92) and if the wrong encoding was used, it screwed up the text display ("’" instead of the simple typographer’s apostrophe - similarly German Umlauts were displayed as "ä" (ä), "ö" (ö), "ü" (ü) and so forth). But with the support of UTF8, this is or should be a thing of the past. The same goes for using the typographer’s apostrophe for file and folder names - all somewhat modern operating systems support UTF8 filenames, so unless you’re using Windows 95, the typographer’s apostrophe in file- and folder names should not lead to trouble, neither from the OS side nor from the karaoke software side.

So to sum up, I truly believe we should aim for the correct way of doing it, and not continue using a crutch that dates back to 7-bit character encodings (1960s). Search queries, both in-game and on usdb, should handle the correct and the substitute apostrophe transparently, and at least for USDX, this was something you were already working on.

Hopefully, I was able to make it a little relatable why these decisions were made and why I am not in favor of accepting this PR, even though I truly appreciate the efforts.

Are there any pain points with the typographer’s apostrophe besides the way the apostrophe looks? Is anything not working that we should be aware of and that we haven’t taken into consideration?

P.S. I’ve seen pretty much all the wrong apostrophe symbols mentioned in the above Wikipedia article being used as apostrophe in songs uploaded to usdb. This is what initially bothered me ("don`t" or "don´t" – using accent characters – are truly ugly) and what made me find the (typographically) correct solution.

@irgendwr
Copy link

irgendwr commented Jan 10, 2024

@bohning one important difference to your analogy (O vs 0) is that there is no a simple/universal key for the typographer's apostrophe. It might sound like a negligible pain point, but it's the reason why I still continue to use the upright apostrophe (as you can see here). The method for typing varies depending on OS/software, and while one can look it up, it still takes more effort. When I write more formal/important texts, I do tend to put in the tiny extra effort and use the correct typographic symbols, but for "everyday activities" I choose convenience over being 100% correct.

Would you also want to force the use of typographically correct symbols in programming languages? (genuine question)

In my case you could probably just call it laziness or resistance to change, but in other cases it's a matter of accessibility.

Also, I just tried typing the typographer's apostrophe in UltraStar Deluxe's song search on Linux and it doesn't work for me. How am I supposed to search for songs with an apostrophe then? (edit: it was pointed out to me that this was addressed by your PR here, but we still need normalization) But the point still stands: even today some software doesn't properly support it or doesn't do normalization which allows you to search using both versions of the apostrophe. Yes, you can blame that software, but that doesn't help the user.

I definitely understand where you're coming from, but at the end of the day I think it should be up the the user to decide. So setting the typographically correct symbols as the default is reasonable, but I'd argue that it should still be a choice.

@DeinAlptraum
Copy link
Author

Thank you for your detailed answer!

@irgendwr has covered most of what I wanted to say already. While I understand your argument about correctness, and just being more used to the upright apostrophe etc., I believe this is not something that will ever change. The typographer's apostrophe is simply impractical to type, which is the main reason why comparatively few people are using it at the moment, and are still (and probably always will be) used to the upright apostrophe.

To me, this is mostly about customizability. Even though the upright apostrophe might be wrong, it is the most natural option for, I believe, the majority of people. As such I think it is reasonable to support this as an option in the syncer. This is a choice anyone can make for themselves, or is there any way this might affect other users that I am overlooking right now?

Regarding your question about pain points: I had issues with songs not being recognized by USDX in the past if they had typographer's apostrophes in the title/artist name, but the same was true for most non-ASCII characters... I solved this by fixing my broken locale settings (iirc) so might not have any broader relevance. Other than that and the WIP search normalization in USDX, no problems from my side.

@bohning
Copy link
Owner

bohning commented Jan 12, 2024

Thanks for your responses and the discussion, I really like it.

I understand the raised issue of accessibility (typing the typographer’s apostrophe is a pain in the butt for most people, no question about it), BUT: I am not asking anyone to type it.

Regarding file and folder names - unless I hear from someone otherwise, I expect all major operating systems to support UTF8 filenames. Thus this should not be an issue for anyone. If it is, then this only implies that no UTF8 characters should be used in file and folder names, but that they can and should be used inside the text files, just not for referencing the files in the header tags.

The only time a user really has to type apostrophes is when typing search queries:
The search in USDB Syncer already handles either apostrophe in a transparent way, you can search for both "Don’t Cry" (typographer’s apostrophe) or "Don't cry" (upright apostrophe) and it will find songs with either apostrophe.

The search in USDB should be adopted to behave the same way for those not using the Syncer. But tbh, there’s always a way to search for a song by omitting the word that contains an apostrophe, so at least, there is a workaround.

The search in USDX (and all the other karaoke software) needs be adapted to behave the same way. This will automatically be solved when the transliteration of the search is in place, and this is something that is needed anyhow when UTF8 is supported (the vast majority of UTF8 characters cannot be typed on a regular keyboard!). Until this is in place, the same goes here for the workaround mentioned above.

Ideally (IMHO), all song creation tools and lyrics websites would already correct upright to typographer’s apostrophes so that you never have to type it manually (spoiler: they don’t). But even if you upload a song to usdb with upright apostrophes, that’s fine, since the Syncer will fix the apostrophes (amongst other things) automatically when downloading.

To summarize, the whole idea is to have the correct typographer’s apostrophes without having to worry a single bit about them. They just work, both on file/folder level and in all softwares. And when you need to type apostrophes, use whichever you can type easier (which is of course the upright apostrophe, at least for everyone else but me - I changed my keyboard layout so that SHIFT+# produces the ’ and not ', that’s why I needed PR791).

@bohning bohning closed this Jan 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants