Ownership and string representation #2065

Open
OlaFosheimGrostad opened this issue Aug 17, 2022 · 5 comments
Labels
leads question: A question for the leads team
long term issue: Issues expected to take over 90 days to resolve. Does not apply to PRs.

Comments

@OlaFosheimGrostad

OlaFosheimGrostad commented Aug 17, 2022

The design doc currently states "The right model of a string view versus an owning string is still very much unsettled."

Other issues, such as string interpolation, will need clarity on what kinds of string representation and ownership Carbon will support.

Maybe it would be a good idea to map out this landscape and see if there is some kind of unifying scheme or shared protocol that can be used to bring it all together in a flexible and efficient manner?

There seem to be many facets to this design issue:

  • Would it be desirable to have a common generic ownership type that can be used for strings, pointers and file descriptors?
  • Should a string owner also support a fixed-size short-string optimization?
  • Should Carbon support a rope representation for large mutable strings, or provide a protocol that can work with ropes?
  • Should there be a difference between read-only, mutable and appendable representations?
  • What should the relationship between Carbon and C++ string, string_view, u8string_view and span<char8_t> be?
  • Do we need to consider third-party C++ library string types? Some frameworks provide their own string type.
@L4stR1t3s

Would it be desirable to have a common generic ownership type that can be used for strings, pointers and file-descriptors?

If possible, without negatively affecting speed and/or memory usage, yes.
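To make the speed and memory condition concrete, here is a minimal C++ sketch of what a generic owner could look like. `Owner`, `FakeFdTraits` and `CharBufferTraits` are hypothetical names made up for illustration, not anything proposed for Carbon. The fake file-descriptor traits only count close calls so that the sketch stays portable.

```cpp
#include <cassert>
#include <utility>

// Hypothetical generic owner: Traits supplies the "empty" handle value and
// the action that releases a live handle.
template <typename Handle, typename Traits>
class Owner {
public:
    Owner() : handle_(Traits::empty()) {}
    explicit Owner(Handle h) : handle_(h) {}

    // Move-only: exactly one owner per handle.
    Owner(const Owner&) = delete;
    Owner& operator=(const Owner&) = delete;
    Owner(Owner&& other) noexcept : handle_(other.release()) {}
    Owner& operator=(Owner&& other) noexcept {
        if (this != &other) reset(other.release());
        return *this;
    }
    ~Owner() { reset(Traits::empty()); }

    Handle get() const { return handle_; }

    // Give up ownership without releasing the resource.
    Handle release() { return std::exchange(handle_, Traits::empty()); }

    // Release the current resource (if any) and take ownership of h.
    void reset(Handle h) {
        if (handle_ != Traits::empty()) Traits::close(handle_);
        handle_ = h;
    }

private:
    Handle handle_;  // the only data member, so no per-object overhead
};

// Assumption: stands in for a file descriptor. A real version would call the
// platform's close function instead of counting.
struct FakeFdTraits {
    static inline int closed = 0;
    static int empty() { return -1; }
    static void close(int) { ++closed; }
};

// Assumption: a heap-allocated character buffer, as a string owner might hold.
struct CharBufferTraits {
    static char* empty() { return nullptr; }
    static void close(char* p) { delete[] p; }
};

using FdOwner = Owner<int, FakeFdTraits>;
using BufferOwner = Owner<char*, CharBufferTraits>;
```

Since the wrapper stores nothing but the handle, mainstream compilers give `FdOwner` the same size as a plain `int`, and all calls can be inlined. Whether one type like this can also cover strings with a short-string optimization is a separate question.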

Should a string-owner also support a fixed size short-string optimization?

Sounds like an implementation detail to me, not something that the spec should define.

Should Carbon support a rope-representation for large mutable strings https://en.wikipedia.org/wiki/Rope_(data_structure)

This is a tree of strings. IMO a standard library should offer implementations of core concepts, not abstractions of those concepts. So that means a tree and a string in this case. Developers can implement a rope from those, and design it to fit their specific needs.
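As a rough illustration of building a rope from those two pieces, here is a minimal C++ sketch: a binary tree whose leaves own `std::string` chunks. All names (`RopeNode`, `make_leaf`, `concat`, `char_at`, `flatten`) are hypothetical, and the sketch leaves out rebalancing and editing, which a real rope would need.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Hypothetical minimal rope node. Interior nodes always have two children;
// leaves have none and own a chunk of text.
struct RopeNode {
    std::string leaf;                       // used only by leaf nodes
    std::shared_ptr<const RopeNode> left;
    std::shared_ptr<const RopeNode> right;
    std::size_t size = 0;                   // total characters in this subtree
};

using Rope = std::shared_ptr<const RopeNode>;

Rope make_leaf(std::string text) {
    auto node = std::make_shared<RopeNode>();
    node->size = text.size();
    node->leaf = std::move(text);
    return node;
}

// Concatenation allocates one node and copies no character data.
Rope concat(Rope a, Rope b) {
    auto node = std::make_shared<RopeNode>();
    node->size = a->size + b->size;
    node->left = std::move(a);
    node->right = std::move(b);
    return node;
}

// Index lookup walks down the tree using the cached subtree sizes.
char char_at(const Rope& rope, std::size_t index) {
    assert(rope && index < rope->size);
    const RopeNode* node = rope.get();
    while (node->left) {
        if (index < node->left->size) {
            node = node->left.get();
        } else {
            index -= node->left->size;
            node = node->right.get();
        }
    }
    return node->leaf[index];
}

// Helper: append the subtree's text to out, left to right.
void append_to(const Rope& rope, std::string& out) {
    if (!rope->left) {
        out += rope->leaf;
        return;
    }
    append_to(rope->left, out);
    append_to(rope->right, out);
}

// Flatten into one contiguous std::string, e.g. before calling a C++ API.
std::string flatten(const Rope& rope) {
    std::string out;
    out.reserve(rope->size);
    append_to(rope, out);
    return out;
}
```

The `flatten` step is where the interop cost shows up: any API that expects contiguous storage forces a full copy of the rope.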

Should there be a difference between read only representations, mutable representations, appendable representations?

Only if a separate implementation offers a noticeable difference in speed and/or memory usage IMO. Otherwise they are just abstractions of a core concept.

What should the relationship between Carbon and C++ string, string_view and span be?

I would expect that a Carbon string/string_view/span/... can be created with a simple one-to-one copy of the data of a std::string/std::string_view/std::span, and vice versa. If any data conversion is necessary, I would consider that overhead unacceptable.
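For the view types, that expectation can be sketched in C++ as follows. `CarbonStringView` is a hypothetical stand-in (pointer plus length); its name and layout are assumptions, not Carbon's actual design.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical stand-in for a Carbon-side string view: non-owning pointer
// plus length. Assumed layout, not Carbon's design.
struct CarbonStringView {
    const char* data;
    std::size_t size;
};

// C++ -> "Carbon": copies two words, never the characters.
CarbonStringView to_carbon(std::string_view sv) {
    return CarbonStringView{sv.data(), sv.size()};
}

// "Carbon" -> C++: the same cost in the other direction.
std::string_view to_cpp(CarbonStringView v) {
    return std::string_view(v.data, v.size);
}
```

This only works so cheaply for non-owning views. The internal layout of an owning `std::string` is implementation-defined and usually includes a short-string buffer, so handing over ownership without copying is a harder problem.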

Do we need to consider C++ third party library string types?

No. If they don't already offer conversion to STL strings, it's usually not hard to add. If there is a need for it, I am sure people will write libraries for it. Carbon should focus on interoperability with the C++ language and the STL. Anything else can and should be derived from that.

@OlaFosheimGrostad
Author

OlaFosheimGrostad commented Aug 17, 2022

Thank you for the feedback. I think I should rewrite the issue so that the protocol question comes first, at the top. I guess the question could be rephrased to something like "can we devise a protocol/scheme that can provide a performant API to many different string representations?"
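One possible shape for such a protocol, sketched in C++ purely for illustration: every representation hands out its contents as a sequence of contiguous chunks, and generic algorithms are written against that. `for_each_chunk`, `SegmentedString`, `length_of` and `to_std_string` are hypothetical names, and this is only one of several ways a protocol could be cut.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical protocol: a string representation is anything for which
// for_each_chunk(s, fn) calls fn once per contiguous chunk, in order.

// Contiguous owning string: a single chunk.
template <typename Fn>
void for_each_chunk(const std::string& s, Fn&& fn) {
    fn(std::string_view(s));
}

// Non-owning view: a single chunk.
template <typename Fn>
void for_each_chunk(std::string_view s, Fn&& fn) {
    fn(s);
}

// Assumption: a segmented representation standing in for a rope or a
// third-party string type that stores its text in pieces.
struct SegmentedString {
    std::vector<std::string> pieces;
};

template <typename Fn>
void for_each_chunk(const SegmentedString& s, Fn&& fn) {
    for (const std::string& piece : s.pieces) fn(std::string_view(piece));
}

// Generic algorithms written once against the protocol.
template <typename S>
std::size_t length_of(const S& s) {
    std::size_t n = 0;
    for_each_chunk(s, [&](std::string_view chunk) { n += chunk.size(); });
    return n;
}

template <typename S>
std::string to_std_string(const S& s) {
    std::string out;
    for_each_chunk(s, [&](std::string_view chunk) {
        out.append(chunk.data(), chunk.size());
    });
    return out;
}
```

For contiguous strings the chunk callback runs exactly once and should inline away, so the abstraction costs little there. The open question is how much of a string API (indexing, searching, mutation) can be expressed this way without penalizing the common contiguous case.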

@L4stR1t3s

I would also like to see the possibility of a string (and other data structures) sharing the same memory in C++ and Carbon. I don't think that will be possible with the regular Carbon and C++ STL data structures, because their internals might change and can be platform- or architecture-dependent. But maybe a separate, combined STL specifically designed for this could be possible? It would be useful for avoiding duplication of large chunks of data passed from Carbon to C++ and vice versa.

@OlaFosheimGrostad
Author

My personal opinion is that Carbon could maintain a patch set for Clang and the Clang standard library. That could allow Carbon to do interesting optimizations with data structures originating in C++ land (ARC of shared_ptr, string optimizations, etc.). I don't expect that to happen, but getting good performance with less work could be a good reason to switch from C++ to Carbon IMO.

@github-actions

We triage inactive PRs and issues in order to make it easier to find active work. If this issue should remain active or becomes active again, please comment or remove the inactive label. The long term label can also be added for issues which are expected to take time.
This issue is labeled inactive because the last activity was over 90 days ago.

@github-actions bot added the inactive label (Issues and PRs which have been inactive for at least 90 days) on Nov 19, 2022
@jonmeow added the long term issue and leads question labels and removed the inactive label on Jul 17, 2024