Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AnnotatedString has no public way to extract the underlying string without annotations #55247

Open
aplavin opened this issue Jul 25, 2024 · 16 comments · May be fixed by #55458
Open

AnnotatedString has no public way to extract the underlying string without annotations #55247

aplavin opened this issue Jul 25, 2024 · 16 comments · May be fixed by #55458

Comments

@aplavin
Copy link
Contributor

aplavin commented Jul 25, 2024

The AnnotatedString docstring explicitly states

While the constructors are part of the Base public API, the fields of AnnotatedString are not

As far as I see, the only way to get the underlying (abstract)string is by accessing the string field which is not public. I guess this is an undesirable situation?..

@KristofferC
Copy link
Member

Isn't String the way to do that?

@aplavin
Copy link
Contributor Author

aplavin commented Jul 25, 2024

Not really, it always converts to String:

julia> ans = Base.AnnotatedString(InlineString("abc"))

julia> ans.string |> typeof
String3

julia> String(ans) |> typeof
String

I guess parent() could be a reasonable function to extract .string directly.

@adienes
Copy link
Contributor

adienes commented Jul 25, 2024

parent

Return the underlying parent object of the view

an AnnotatedString is not an array view

why not a simple accessor parentstring(a::AnnotatedString) = a.string

@aplavin
Copy link
Contributor Author

aplavin commented Jul 25, 2024

Any public function would work, parent was just a suggestion.

Although, parent is definitely used in a wider context in practice, see https://juliahub.com/ui/Search?type=symbols&q=parent&u=define.

@KristofferC
Copy link
Member

Why are you putting exotic strings in and out of AnnotatedString? What's the use case?

@aplavin
Copy link
Contributor Author

aplavin commented Jul 25, 2024

AnnotatedStrings appear to be a nice fit for all kinds of natural language processing applications. The task of annotating certain sections of a string is very common there, and it would be convenient to rely on the common implementation in Base!

When strings are large, or there are many of them, the builtin String type shows its limitations. Luckily, AnnotatedString can take arbitrary strings.
Not sure if InlineStrings specifically would be a common combination here – they are just the easiest to demonstrate. StringViews.jl forms a more natural combination: whenever you have a long string in a file/data array, create a StringView out of it, and annotate parts of interest with AnnotatedString. After doing that, one should really be able to access the underlying StringView object, right?
The issue is the same no matter what (non-String) string type you put inside.

@tecosaur
Copy link
Contributor

How about this?

(T)(s::Base.AnnotatedString{T}) where {T <: AbstractString} = s.string

@aplavin
Copy link
Contributor Author

aplavin commented Aug 7, 2024

Any public "get me the underlying string without annotations" function would be fine!

It should also support this case where .string doesn't cut it:

julia> s1 = Base.AnnotatedString(InlineString("abc"), [(2:3, :x => 123)])
"abc"

julia> s2 = view(s1, 1:2)
"ab"

# s2 is a string with anotations – here are annotations:
julia> Base.annotations(s2)
1-element Vector{Tuple{UnitRange{Int64}, Pair{Symbol, Any}}}:
 (2:2, :x => 123)

# but how to get the underlying string?..
# this is clearly wrong:
julia> s2.string
"abc"

@tecosaur
Copy link
Contributor

tecosaur commented Aug 7, 2024

You're calling .string on a SubString there 😛. You can produce the same style without using AnnotatedString.

@aplavin
Copy link
Contributor Author

aplavin commented Aug 7, 2024

That's one of the reasons a function to "strip annotations" would be useful. It should be applicable whenever annotations() work.

@tecosaur
Copy link
Contributor

tecosaur commented Aug 7, 2024

I don't think that could work with arbitrary wrapper types. The SubString support with annotations is explicit.

That said, we can add the same level of support via a (T)(s::SubString{Annotated string{T}}) method I'd think.

@aplavin
Copy link
Contributor Author

aplavin commented Aug 7, 2024

I don't think that could work with arbitrary wrapper types. The SubString support with annotations is explicit.

Surely a function remove_annotations or whatever it's called can work in all cases when annotations work?
As I suggested above,

It should be applicable whenever annotations() work.

@aplavin
Copy link
Contributor Author

aplavin commented Aug 7, 2024

Basically, there are some string types that are considered "annotated" – support annotations(). Currently, they are AnnotatedString and Substring{AnnotatedString}. Anything else or this is the complete set?
All these types should also support stripping/removing annotations, naturally.

@tecosaur
Copy link
Contributor

tecosaur commented Aug 7, 2024

I think my previous comment should mostly answer these questions. To clarify though:

  • annotations does not work generally for all wrapper types
  • A (T)(...) unwrapper can be implemented for substrings, then it will match the current support of annotations

@vtjnash
Copy link
Member

vtjnash commented Aug 7, 2024

I think such an operation would be spelled String(...) normally, and is already defined as such

@vtjnash
Copy link
Member

vtjnash commented Aug 7, 2024

How about this?

(::Type{T})(s::Base.AnnotatedString{T}) where {T <: AbstractString} = s.string

This generalization of the String constructor seems reasonable (aside from lowering issues, where it needed a slightly different syntax so I changed your comment accordingly). It could go a step further as well, assuming this doesn't lead to ambiguities:

(::Type{T})(s::Base.AnnotatedString) where {T <: AbstractString} = T(s.string)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants