Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce AnnotatedIOBuffer #51807

Merged
merged 8 commits into from
Feb 1, 2024
Merged

Introduce AnnotatedIOBuffer #51807

merged 8 commits into from
Feb 1, 2024

Conversation

tecosaur
Copy link
Contributor

@tecosaur tecosaur commented Oct 21, 2023

This allows for styled content to be constructed incrementally, without resorting to repeated concatenation. It operates very similarly to IOContext, just with a special write method and specifically wrapping an IOBuffer.


While I consider this functionality generally useful, I also have a specific use in Markdown in mind for this. For no it's not exported or public API, but I think we'd probably want to talk about making it public API in the future.

For Base itself though, I think it's important to note that this lets us pass an AnnotatedIOBuffer to show calls, and thus truncate printed AnnotatedStrings without the issues of accidentally removing terminating escape codes etc. that we currently have (see #45521 for an example).

With the way printstyled works, this benefit won't be seen immediately, but I see it as part of building the path to a nicer, less headache-inducing future.

@tecosaur tecosaur added io Involving the I/O subsystem: libuv, read, write, etc. strings "Strings!" feature Indicates new feature / enhancement requests labels Oct 21, 2023
@tecosaur tecosaur force-pushed the annotated-io branch 3 times, most recently from 43f88eb to b67d93b Compare October 21, 2023 18:48
@LilithHafner
Copy link
Member

This CI failure:

ambiguous                                        (3) \|         failed at 2023-10-21T12:07:05.619
  | Test Failed at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-honeycrisp-HL2F7YQ3XH.0/build/default-honeycrisp-HL2F7YQ3XH-0/julialang/julia-master/julia-b67d93b47d/share/julia/test/ambiguous.jl:174
  | Expression: sig1 === sig2
  | Evaluated: write(io::Base.AnnotatedIOBuffer, x::AbstractString) @ Base strings/annotated.jl:433 === write(io::IO, s::Union{SubString{String}, String}) @ Base strings/io.jl:248
  |  
  | Test Failed at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-honeycrisp-HL2F7YQ3XH.0/build/default-honeycrisp-HL2F7YQ3XH-0/julialang/julia-master/julia-b67d93b47d/share/julia/test/ambiguous.jl:177
  | Expression: good

Means that write(io::Base.AnnotatedIOBuffer, x::AbstractString) and write(io::IO, s::Union{SubString{String}, String}) are ambiguous.

Specifically, calls to write(io::Base.AnnotatedIOBuffer, s::Union{SubString{String}, String}) will now error.

@tecosaur tecosaur force-pushed the annotated-io branch 2 times, most recently from 3c2f0e2 to a3d0329 Compare October 26, 2023 01:33
@tecosaur
Copy link
Contributor Author

Thanks for having a look, in response I've just added these two methods:

write(io::AnnotatedIOBuffer, x::AbstractString) = write(io.io, x)
write(io::AnnotatedIOBuffer, s::Union{SubString{String}, String}) = write(io.io, x)

Copy link
Member

@LilithHafner LilithHafner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a good idea. However, I think it makes sense to include at least one motivating usage of this type in this PR (or a linked PR that depends on it) because I don't like introducing additional complexity that does not have any impact on publicly accessible functionality.

In addition to avoiding premature addition of complexity (which I doubt this is) the request to add a motivating use case is a nice principle to follow because having a use of AnnotatedIOBuffer gives context for the API design decisions of the type.

base/strings/annotated.jl Outdated Show resolved Hide resolved
base/strings/annotated.jl Outdated Show resolved Hide resolved
base/strings/annotated.jl Outdated Show resolved Hide resolved
base/strings/annotated.jl Outdated Show resolved Hide resolved
@LilithHafner LilithHafner added the needs tests Unit tests are required for this change label Oct 26, 2023
@tecosaur
Copy link
Contributor Author

Thanks for the comments Lilith. Regarding adding a usage example, there are three PRs I have in mind currently that need this: improving the join method for annotated strings, using AnnotatedIOBuffer with show/showerror, and a PR to the Markdown stdlib.

The latter two will be more work I suspect, so I'll see if I can whip up the join PR in the near future.

base/show.jl Outdated Show resolved Hide resolved
@tecosaur tecosaur force-pushed the annotated-io branch 2 times, most recently from 75f4654 to 1b43f6b Compare October 26, 2023 15:41
@tecosaur
Copy link
Contributor Author

For a basic example of this being useful, perhaps JuliaLang/StyledStrings.jl#12 is a decent starting point? The join PR might not come till the weekend.

@LilithHafner LilithHafner added the triage This should be discussed on a triage call label Oct 26, 2023
@LilithHafner
Copy link
Member

From triage: Jeff is afraid that everyone will have to use annotated things for all IO in Base (and the whole ecosystem?) if we go this route.

This is not something that we can talk about as a single issue. We need to discuss the overarching vision of all IO in Julia. This has far reaching consequences and shouldn't be merged as a small thing when it's really a huge thing.

This is not to say that we don't want to do this, but that if we do decide to do this, it should be a broad sweeping decision. Let's talk about this at next triage 1:30 UTC on November 9.

base/strings/annotated.jl Outdated Show resolved Hide resolved
@tecosaur
Copy link
Contributor Author

tecosaur commented Nov 9, 2023

Triage didn't get to this, try again in a month's time 😞

@tecosaur
Copy link
Contributor Author

tecosaur commented Dec 7, 2023

Triage had a chat about this, and while not unanimous was broadly in support of this. We concluded that it would be best to put some of the major point in the discussion here, and give an opportunity for objections/problems we've overlooked to be mentioned. Should nothing (new) come up, this can be merged in a week or so.

Triage conversation points:

  • AnnotatedIOBuffer acts as a annotated string builder
  • AnnotatedIOBuffer acts as a way to get structured styling information out of "legacy" code written for ::IO and using print / printstyled. This lets you e.g. safely take a substring of a show(...) invocation's result.
  • AnnotatedIOBuffer isn't really conceptually purely IO
  • Worst case seems to be get(::IO, :color, false)-guarded manually printed escape codes will be lost
  • You can't really get the "swap in and collect structured styling information" behaviour without subtyping IO
  • Since this is an opt-in where you want it type approach, this doesn't affect or require rewriting of existing code, other than funky shouldn't-really-be-done-in-the-first-place type stuff needing rewriting when combined with this.
  • As a compromise, it seems worth it

@tecosaur tecosaur force-pushed the annotated-io branch 2 times, most recently from 1bc3cb9 to 16a3361 Compare January 31, 2024 08:59
@tecosaur
Copy link
Contributor Author

Alrighty, time for round 2 of

I think that's everything addressed?

😁

@vtjnash
Copy link
Member

vtjnash commented Jan 31, 2024

It is pretty straightforward indeed! That's why I've had JuliaLang/StyledStrings.jl#12 sitting around since October, which lets you benefit from this new functionality with code that still uses printstyled. 😄

Ah gotcha. That seems somewhat like type piracy though, no, since you have to load StyledStrings for this AnnotatedIOBuffer to work correctly?

@vtjnash
Copy link
Member

vtjnash commented Jan 31, 2024

Seems to have an ambiguity error:

Error in testset ambiguous:
Test Failed at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-honeycrisp-HL2F7YQ3XH.0/build/default-honeycrisp-HL2F7YQ3XH-0/julialang/julia-master/julia-16a3361820/share/julia/test/ambiguous.jl:174
  Expression: sig1 === sig2
   Evaluated: read(s::IO, T::Type) @ Base io.jl:1179 === read(io::Base.AnnotatedIOBuffer, x) @ Base strings/annotated.jl:506

Also, I see you added the "don't squash" label, but it doesn't appear that each commit here is functionally independent. If not, they would normally be squashed to preserve the one commit == one feature / change expectation.

@tecosaur
Copy link
Contributor Author

It is pretty straightforward indeed! That's why I've had JuliaLang/StyledStrings.jl#12 sitting around since October, which lets you benefit from this new functionality with code that still uses printstyled. 😄

Ah gotcha. That seems somewhat like type piracy though, no, since you have to load StyledStrings for this AnnotatedIOBuffer to work correctly?

It is indeed, type pirate-y. However, I don't really see much of a way around it.

One of the reasons (by a few people, IIRC) given in support of StyledStrings being a stdlib and not part of Base is not adding display/printing-logic-type code to Base when there are active efforts to slim it down (I belive somebody even made the aspirational comment that Base itself should have no printing code at all eventually). However, to work without causing a pile of method invalidations, some bits need to be in Base (e.g. once this PR is merged, there's one that uses this to make join work better with a mix of Strings and AnnotatedStrings).

Now, printstyled is display/printing-logic, and so some of the functions required to convert it to a nice composable styled AnnotatedString are in the StyledStrings stdlib. So we have this split where some styling code is in Base, and some is in StyledStrings, and making the two work nicely together involves StyledStrings committing type piracy with Base … but I don't see any easy way around it.

Do please let me know if you have any better ideas around this.

@vtjnash
Copy link
Member

vtjnash commented Jan 31, 2024

I guess that makes sense, maybe StyledStrings is just one of those that needs to be loaded by default, like Logging, FileWatching, and Libdl. They are conceptually separated out of Base, and could be removed during a build if absolutely required, but are not actually separable or "replaceable batteries" in that sense.

@vtjnash
Copy link
Member

vtjnash commented Jan 31, 2024

Once the tests are passing, this looks good to merge

@tecosaur
Copy link
Contributor Author

I'm not actually quite sure what's going on with that ambiguity, might you be able to elaborate on it?

@tecosaur
Copy link
Contributor Author

Also, I see you added the "don't squash" label, but it doesn't appear that each commit here is functionally independent. If not, they would normally be squashed to preserve the one commit == one feature / change expectation.

I guess it depends on what you look for in a PR. I get the sense that you're seeing this as a PR that "provides AnnotatedIOBuffer" and so there should be one commit that "introduces AnnotatedIOBuffer".

I tend to think of PRs as a collection of atomic commits that collectively provide a feature/change, and a useful history of how that happened. These commits are that.

* Introduce AnnotatedIOBuffer
  (An outline of the feature, with a commit message explaining the intent.)
* Make AnnotatedIOBuffer behave more like IOBuffer
  (Changing the behaviour of the feature, with justification in the commit message)
* Make AnnotatedIOBuffer reading more generic
  (Another specific change with justification/reasoning in the commit message)
* Also read AnnotatedChars from an AnnotatedIOBuffer
  (A new (sub)feature)
* Writing from one AnnotatedIOBuffer to another
  (A new (sub)feature)
* Tests for AnnotatedIOBuffer
* Update the mention of annotated features in NEWS
* Use insert! not splice! for new annotations

I'd think the history of the development (so long as the commits are atomic, which they are) would be more valuable than dumping the collection of changes fully formed into the repo, no? I guess it depends on how much you see git as a tool to track the development history of a project vs. just a means of distributed work.

Depending on how you look at it, forget being independent of each other, some of the commits here aren't even independent of other PRs 😛 (e.g. the NEWS and insert! commits, affect content from previous PRs unrelated to this PR, but I wouldn't think are worth separating out into separate PRs).

@tecosaur
Copy link
Contributor Author

I guess that makes sense, maybe StyledStrings is just one of those that needs to be loaded by default, like Logging, FileWatching, and Libdl

On this note, once #51869 is sorted out (imminently hopefully 🤞), StyledStrings will be loaded with REPL, which I imagine is pretty similar in effect.

tecosaur and others added 6 commits February 1, 2024 01:53
While `String` is the only concrete type for which `read(::IOBuffer,
::Type{<:AbstractString})` is defined, is is entirely conceivable that
some other custom string type could define a similar `read` method.

Since making reading an `AnnotatedString` from an `AnnotatedIOBuffer`
more generic is as easy as replacing the hardcoded `String` with a type
parameter, we may as well do so.
The `read(::AnnotatedIOBuffer, AnnotatedString)` method is intended as
an approximate analogue to `read(::IOBuffer, String)`. In the same
sense, it makes sense to define `read(::AnnotatedIOBuffer,
AnnotatedChar)` as an analogue to `read(::IOBuffer, Char)`.
Co-authored-by: Jameson Nash <vtjnash@gmail.com>
@vtjnash
Copy link
Member

vtjnash commented Jan 31, 2024

I'd think the history of the development

Commits should not reflect the history of the development, but rather are curated set of changes that each can be viewed independently as being a complete change

@tecosaur
Copy link
Contributor Author

Commits should not reflect the history of the development, but rather are curated set of changes that each can be viewed independently as being a complete change

Right, but I don't see the two as being mutually exclusive — or do you not view commits like Use insert! not splice! for new annotations as being able to be viewed independently as a complete change?

@vtjnash
Copy link
Member

vtjnash commented Jan 31, 2024

Seen another way, each separate commit is a statement that individual commits could be reverted and the feature (up to that commit in the sequence) are still a complete implementation. Thus if someone encounters a problem later, we can revert and reland just the problematic commits. That breaks down if some arbitrary, unknown subset of commits are half-broken aspirational in progress work which is only valid at the merge commit point

@vtjnash
Copy link
Member

vtjnash commented Jan 31, 2024

If it was in a separate PR, it would be a standalone commit, but as a complete sequence, each change to the API or implementation in the development process should not result in a separate commit to master

@tecosaur
Copy link
Contributor Author

tecosaur commented Jan 31, 2024

You seem to be describing atomic commits, no?

The commits here are atomic, and can be handled individually. None of them are half-broken "aspirational" commits. 37 changes have been rebased into 8 logical commits, and the codebase is "valid" at each point, CI should pass at each step.

If the whole thing needs to be reverted for some reason, you can just revert the merge commit instead of needing to squash everything into one commit 🤷, and I'd think it would be much easier to pinpoint where/what's gone wrong when less changes are bundled together.

I'm not trying to be argumentative here, I'm just somewhat confused by this as it runs counter to the emphasis on the value of a detailed and useful history I've seen elsewhere.

@tecosaur
Copy link
Contributor Author

Feel free to just go ahead and squash merge this though, I care more about finally getting this PR over with than discussing whether it's worth preserving the commit history.

@vtjnash
Copy link
Member

vtjnash commented Jan 31, 2024

A semi-automated way to look at it is that each separate commit should likely have tests if it represents a different change. Not always the case, but not all PRs need to have tests either. And if the later commits are changing the proposed API, then it is even less obvious that you would want someone to roll back this to a version with the older API as that would be a breaking change that wouldn't be allowed.

@tecosaur
Copy link
Contributor Author

I can't say I entirely see it, but this is good to merge from my POV. The CI hasn't completed yet, but the buildkite/julia-master only has a UndefVarError: `A` not defined in `Main.Test100Main_REPL.REPLCompletionsTest` failure so far. Hopefully the ambiguity check will clear this time.

@tecosaur
Copy link
Contributor Author

tecosaur commented Feb 1, 2024

No ambiguity errors it looks like! 😀

@vtjnash vtjnash merged commit f117a50 into JuliaLang:master Feb 1, 2024
5 of 7 checks passed
@tecosaur tecosaur deleted the annotated-io branch February 1, 2024 15:51
@tecosaur
Copy link
Contributor Author

tecosaur commented Feb 1, 2024

Thanks for helping this over the line Jameson!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
don't squash Don't squash merge feature Indicates new feature / enhancement requests io Involving the I/O subsystem: libuv, read, write, etc. strings "Strings!"
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants