Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Port Qt API to use QAnyStringView #479

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

BernardoGomesNegri
Copy link

This was discussed a bit at #445 , so I thought to actually implement it to see how it actually would work.
I use QAnyStringView instead of the more specific QUtf8StringView because the former has the size_bytes() method that is the siplest way to copy it to a UTF-8 char*.
However, a QString can be implicitly converted to a QAnyStringView, so any software that used to work with the old version of the library will compile, but since the library expects all QAnyStringView s to be in UTF-8, it won't work.
We could try to check if the string is UTF-16 or UTF-8 and then do the conversion if needed.
We could also use QUtf8StringView, if we manipulate how the view is created so it includes the null character:

const char* utf8_bytes = "áé科";
auto view = QUtf8StringView(utf8_bytes, strlen(utf8_bytes) + 1);
const char *utf8_bytes2 = view.data()

As an upside, it would save a copy. But it would require anyone passing QUtf8StringView s to the library to do this as well.
We could also use QUtf8StringView's iterator to copy and reassemble the Unicode codepoints in UTF-8

QAnyStringView lets us save UTF-8 -> UTF-16 -> UTF-8 conversions.
All QAnyStringView s passed to the library must be encoded in UTF-8, without null terminator.
Tests don't pass
qPrintable returns a valid pointer that points to a null character.
Change stringViewToChar to match
May lead to lower performance since it might require resizing the array.
@BernardoGomesNegri BernardoGomesNegri marked this pull request as ready for review April 11, 2023 23:49
@BernardoGomesNegri BernardoGomesNegri changed the title Port API to use QAnyStringView Port Qt API to use QAnyStringView Apr 12, 2023
@ximion
Copy link
Owner

ximion commented Apr 12, 2023

Hmm, I'd need to learn more about this first to properly judge it, but first of all thank you for the patch!

Using QUtf8StringView sounds attractive because it makes it very clear what kind of string is expected here - all of AppStream uses UTF-8 everywhere after all, and AnyString would be hiding that. Requiring the Utf8StringViews to be constructed in a special way though sounds pretty bad, that would be setting a trap that someone would walk into for sure, sooner or later.

Not sure if someone would be constructing a lot of components from the Qt interface, but there is a chance I guess... And given how string-heavy AppStream is, saving a copy operation can actually be quite huge (internally, AppStream does a lot to avoid too much reallocation and string duplication).

@BernardoGomesNegri
Copy link
Author

We have multiple options, it seems:

  1. Use QAnyStringView (what this PR currently uses, but open to changes)
    • This means we have access to the size_bytes() method which allows for easy copying
    • It also means passing a QString to the functions will compile, but will not work since the QAnyStringView will be UTF-16
      • We could try checking if it is UTF-16, and then convert it to UTF-8 if it is, or throwing an error if it is UTF-16. The overhead for checking should be quite small
  2. Use QUtf8StringView, to turn it to a const char*, we have two options:
    1. Include a null character at the end of the string. This means 0-copy, but that passing a normal (not null-terminated) QUtf8StringView would not work unless we check for it and then copy it to a null-terminated const char*. It also means displaying the string might require some special manipulation on the consumer's part (but putting the string in a Qt GUI would require converting it to a QString anyways). It may also cause surprise when the non null-terminated QUtf8StringView you put in returns to you as a null-terminated QUtf8StringView.
    2. Use QUtf8StringView's iterator to copy the codepoints to a null-terminated UTF-8 string (the copy will be needed to add the null terminator)
  3. Create our own AppstreamString class, that just encapsulates a const char* with conversion functions from QString and Q*StringView.
    • Pros: Is 0-copy as long as the consumer of the library stores the AppstreamString itself and does not keep converting from and to other types, and even if the consumer does that, the only problem is worse performance.
    • Cons: may be confusing, would need to be documented, may need special accommodation to be compatible with e.g. hash map types.

Now that I look at these options, I prefer 2i with automatically converting string views that do not end in a null terminator or 3.
We could use QAnyStringView in a 0-copy fashion, but then we might as well use QUtf8StringView, since the only advantage of the former is the size_bytes() method

@aleixpol
Copy link
Collaborator

Maybe it's selfish of me but it's a matter of whether Discover can leverage this safely in its queries. If we need to keep converting to qstring (utf16) then it's probably not worth it. If we can do it in ways where we can save a good number of copies and share memory, I'd do it by all means.

Since we are still a few days away from being able to have Discover master using AppStream 1.0, I suggest postponing this decision until we can just test and see what the final code would look like.

I'm also worried the resulting code is considerably worse than the current one, we'll see.

@ximion
Copy link
Owner

ximion commented Apr 14, 2023

Using QUtf8StringView and copy in case there is no NULL-terminator indeed seems like the best idea, since QString is UTF-16, so when passing that we would need to copy anyway. With the Utf8 variant, we could at least in theory avoid that if needed.

I really need to read up more on this, but from a first glance I am puzzled by the design decisions here. First of all, getting QString off of UTF-16 would have been nice a decade ago, but even when wanting to avoid that breakage, why no spent just one more byte to add a null-terminator for interfacing with C code - that would make this whole thing infinitely more usable...
Forcing the API user to do some extra work to put in NULL terminators and then not having the resulting string operate nicely with QML and the rest of Qt feels a bit silly :-/

@BernardoGomesNegri
Copy link
Author

Maybe it's selfish of me but it's a matter of whether Discover can leverage this safely in its queries. If we need to keep converting to qstring (utf16) then it's probably not worth it. If we can do it in ways where we can save a good number of copies and share memory, I'd do it by all means.

Since we are still a few days away from being able to have Discover master using AppStream 1.0, I suggest postponing this decision until we can just test and see what the final code would look like.

I'm also worried the resulting code is considerably worse than the current one, we'll see.

It is likely implementing an appropriate Qt hash function for QUtf8StringView would be necessary, since I see QString s are used inside QHash s.

@BernardoGomesNegri
Copy link
Author

I just remembered: QUtf8StringView does not own the data it points to, and appstream expects whoever called the library to own the data.
That means calling any function that returns a QUtf8StringView (or a QAnyStringView) will work, but leak memory since no one is responsible for freeing it.
The alternative is possibly to use QByteArray, or creating our own UTF-8 string class that owns the data it points to.

@ximion
Copy link
Owner

ximion commented Aug 22, 2023

Do you think we can get this resolved for the AppStream 1.0 release? (Intended to be released in ~2 months)

@ximion
Copy link
Owner

ximion commented Sep 10, 2023

Also, how would this be implemented if we had a Qt5 version compiled as well? (Most likely that will happen, to make the transition to AppStream 1.0 easier).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants