Proposal: Add String to the type system #7734

Closed
mlarouche opened this issue Jan 9, 2021 · 11 comments
Labels
proposal This issue suggests modifications. If it also has the "accepted" label then it is planned.

@mlarouche
Contributor

mlarouche commented Jan 9, 2021

Problem

Since the change in PR #6870, every time we format a string we need to specify {s} as the format specifier.

In my project https://github.com/mlarouche/stringtime/, my print method looked like this:

fn print(value: anytype, result_buffer: *StringBuffer) !void {
    switch (@typeInfo(@TypeOf(value))) {
        .Enum => {
            try result_buffer.appendSlice(@tagName(value));
        },
        else => {
            try std.fmt.formatType(value, "", .{}, result_buffer.writer(), std.fmt.default_max_depth);
        },
    }
}

Now I need to do this:

fn print(value: anytype, result_buffer: *StringBuffer) !void {
    switch (@typeInfo(@TypeOf(value))) {
        .Enum => {
            try result_buffer.appendSlice(@tagName(value));
        },
        .Array => |array_info| {
            if (array_info.child == u8) {
                try std.fmt.formatType(value, "s", .{}, result_buffer.writer(), std.fmt.default_max_depth);
            } else {
                try std.fmt.formatType(value, "", .{}, result_buffer.writer(), std.fmt.default_max_depth);
            }
        },
        .Pointer => |ptr_info| {
            switch (ptr_info.size) {
                .One => switch (@typeInfo(ptr_info.child)) {
                    .Array => |info| {
                        if (info.child == u8) {
                            try std.fmt.formatType(value, "s", .{}, result_buffer.writer(), std.fmt.default_max_depth);
                        } else {
                            try std.fmt.formatType(value, "", .{}, result_buffer.writer(), std.fmt.default_max_depth);
                        }
                    },
                    else => {
                        try std.fmt.formatType(value, "", .{}, result_buffer.writer(), std.fmt.default_max_depth);
                    },
                },
                .Many, .C, .Slice => {
                    if (ptr_info.child == u8) {
                        try std.fmt.formatType(value, "s", .{}, result_buffer.writer(), std.fmt.default_max_depth);
                    } else {
                        try std.fmt.formatType(value, "", .{}, result_buffer.writer(), std.fmt.default_max_depth);
                    }
                },
            }
        },
        else => {
            try std.fmt.formatType(value, "", .{}, result_buffer.writer(), std.fmt.default_max_depth);
        },
    }
}

This is far too many checks to push onto the end user just to know whether the current type is a string.

Even std.fmt.formatType has to go through this charade to know whether a type is a string: it checks for Array, Pointer-To-One-Array, Pointer-To-Many, Pointer-To-C and Slice, and then whether the child type is u8.
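For illustration, the checks above can be folded into a single comptime predicate. This is a minimal sketch, not something std provides under this name: `isString` is a hypothetical helper, and it assumes the capitalized `@typeInfo` tag names used in this thread (Zig 0.8 to 0.13; later releases renamed them to lowercase, e.g. `.pointer`, `.slice`).

```zig
const std = @import("std");

/// Hypothetical helper: true for the byte-string shapes the `print`
/// function above checks by hand: [N]u8, *[N]u8, [*]u8, [*c]u8 and []u8
/// (const or mutable, with or without a sentinel).
fn isString(comptime T: type) bool {
    return switch (@typeInfo(T)) {
        .Array => |array_info| array_info.child == u8,
        .Pointer => |ptr_info| switch (ptr_info.size) {
            // Pointer to one array, e.g. the type of a string literal.
            .One => switch (@typeInfo(ptr_info.child)) {
                .Array => |info| info.child == u8,
                else => false,
            },
            // Many-item, C and slice pointers: only the element type matters.
            .Many, .C, .Slice => ptr_info.child == u8,
        },
        else => false,
    };
}

test "isString usage" {
    // A string literal has type *const [N:0]u8.
    try std.testing.expect(isString(@TypeOf("World!")));
    try std.testing.expect(!isString(u8));
}
```

With such a helper, `print` could choose `"s"` or `""` with one `if (comptime isString(@TypeOf(value)))` instead of the nested switches, though each reflection-heavy library would still have to carry its own copy of it.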

The problem that #6870 fixed would not have occurred if the type system had a proper string type in the first place.

Advantages of having a proper string type in the type system

  • Clarity of intent in the function signature.
pub fn formatType(
    value: anytype,
    comptime fmt: const string,
    options: FormatOptions,
    writer: anytype,
    max_depth: usize,
) @TypeOf(writer).Error!void {
}
  • Simpler reflection code for handling strings: Array and Pointer/Slice would then really only mean arrays and pointers, with no more special case for the u8 child type in every user of reflection. A lot of serialization code needs to know whether a type is a string so it can handle it specially.
switch (@typeInfo(T)) {
    .ComptimeInt, .Int, .ComptimeFloat, .Float => {
        return formatValue(value, fmt, options, writer);
    },
    .Bool => {
        return formatBuf(if (value) "true" else "false", options, writer);
    },
    .String => |string_info| {
        // proposed: strings get their own case instead of u8 child-type checks
    },
    // ... remaining cases unchanged
}
  • Expectation of new users

Consider a new user who tries formatting for the first time:

const std = @import("std");

pub fn main() !void {
    const msg = "World!";
    std.log.info("Hello, {}\n", .{msg});
}

and they see each value of the array printed instead of their string. That would be a really bad first impression of the language.

mlarouche added a commit to mlarouche/stringtime that referenced this issue Jan 9, 2021
…ting (see ziglang/zig#7734 for more info)

* Add parsing of fully qualified field, dunno if I will make it work
@rohlem
Contributor

rohlem commented Jan 9, 2021

Just to clarify, are you proposing that string literals (both "" and multiline \\ literals) would now have this string type instead of being [N]u8? How would conversions to and from it work?
Is string guaranteed to be valid UTF-8? If so, I guess there could be one way to make the compiler insert a check (UB if invalid), and one way via the standard library that returns an error union?

Consider a new user ...

See #7675 / #7676 for an alternative solution.
((Personally, I assume they are following a tutorial (otherwise where did they get the {} from?), and this would be a good point to introduce formatting specifiers, like {s}. But since I consider this the least interesting point of discussion, no need to dwell on this I guess.))

@mikdusan
Member

mikdusan commented Jan 9, 2021

Is string guaranteed to be valid UTF8?

If UTF-8 were used, maybe string operations could be UTF-8-validated in the Debug and ReleaseSafe build modes.

@ikskuh
Contributor

ikskuh commented Jan 11, 2021

I don't think Zig should have a builtin "string" type. It's way too complex to be included in the language spec (supporting Unicode isn't as trivial as it sounds; it would make the language itself way more complex and require constant updates).

Having some dedicated std.String type in the stdlib could be a solution, but even then I don't see a good general-purpose string implementation:

  • Is std.String "owning"?
  • Does it store UTF-8 data? Code points? Code units? Graphemes?
  • Does it allow operations/functions like concat, substr, …? If so, do they always return a "new" String and require allocation?
  • How do we convert a string to byte data? Using an encoding such as std.Encoding, like other stdlibs (C#, Java, JS, …)?

I think all these answers should be left to a 3rd-party library and should not be bound to the language or stdlib spec.

@tecanec
Contributor

tecanec commented Jan 11, 2021

I also disagree with having a type dedicated to strings.

To keep the language simple and comprehensible, I think the privilege of being a primitive type should be reserved for things that either absolutely need it (like ints, vectors, async stack frames) or benefit from it extremely well and frequently during the vast majority of projects (like slices, optionals, errors). Strings can be easily made in userland, so they don't require a primitive type. Their benefits from becoming primitive types also aren't that vast, and they are only really useful for communication with humans, so the benefits of having primitive string types are limited.

You also make the point that having primitive string types would seem familiar to newcomers from other languages. I don't like the argument that something should be done just because it's familiar, because I don't think that that's enough to warrant sacrificing progress. When I started using zig, I was actually glad that they didn't have a type for strings exactly because it showed me that this new language had its own thoughts on this sort of thing, and I liked that. If you're worried that newcomers will get confused by the lack of built-in string support, what we need is better tutorials.

Your other two points can both be summarized as "being able to tell strings and byte arrays apart". For other uses, we already do have primitive types that exist simply to keep us from accidentally passing one type as another (such as aligned pointers). However, for strings, this issue can be solved in userland by making a struct. Even disregarding that, strings aren't both universally and frequently used, so solving this issue for strings alone wouldn't be worth the increase in complexity.
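A minimal sketch of the userland struct approach described here, assuming Zig 0.8 to 0.14, where `std.fmt` calls a type's `format(self, comptime fmt, options, writer)` method for `{}` (Zig 0.15 reworked this interface). `Text` is a hypothetical name, not a std type.

```zig
const std = @import("std");

/// Hypothetical userland string wrapper: a distinct type, so reflection
/// code can tell text apart from arbitrary byte data with `T == Text`.
const Text = struct {
    bytes: []const u8,

    /// std.fmt calls this for "{}", so the wrapper prints as text
    /// rather than as a struct of bytes.
    pub fn format(
        self: Text,
        comptime fmt: []const u8,
        options: std.fmt.FormatOptions,
        writer: anytype,
    ) !void {
        _ = fmt;
        _ = options;
        try writer.writeAll(self.bytes);
    }
};

test "Text prints with a plain {}" {
    var buf: [32]u8 = undefined;
    const out = try std.fmt.bufPrint(&buf, "Hello, {}", .{Text{ .bytes = "World!" }});
    try std.testing.expectEqualStrings("Hello, World!", out);
}
```

The trade-off is that the distinction lives in one struct rather than in the language, and callers opt in explicitly with `Text{ .bytes = ... }`.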

@ikskuh
Contributor

ikskuh commented Jan 11, 2021

I, for one, don't want string literals to enforce valid UTF-8 encoding the way Rust does, for example. As soon as you start interacting with non-UTF-8 encodings, everything becomes way more complex than it needs to be, and people will just start using []const u8 again anyway.

A string literal should not have to be valid UTF-8 either. "\xFF\x00" should be a perfectly fine string and should not stop the compiler from compiling it. Rust, for example, does not allow that to compile.

As soon as we enforce UTF-8 on people's in-language string encodings, we force embedded developers to do way more work to get text in a non-standard encoding processed by string functions.
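To make the distinction concrete, a small sketch assuming Zig 0.8 or later: a literal such as `"\xFF\x00"` compiles as ordinary bytes, and UTF-8 validation is an explicit library call (`std.unicode.utf8ValidateSlice`). `asUtf8` is a hypothetical helper illustrating the error-union style of conversion raised earlier in the thread.

```zig
const std = @import("std");

/// Hypothetical checked conversion: validation is opt-in and reports
/// failure through an error union instead of a compile error or UB.
fn asUtf8(bytes: []const u8) error{InvalidUtf8}![]const u8 {
    if (!std.unicode.utf8ValidateSlice(bytes)) return error.InvalidUtf8;
    return bytes;
}

test "literals are bytes; UTF-8 checks are explicit" {
    // Compiles fine: the literal is just the two bytes 0xFF and 0x00.
    const raw = "\xFF\x00";
    try std.testing.expectEqual(@as(usize, 2), raw.len);
    try std.testing.expectError(error.InvalidUtf8, asUtf8(raw));
}
```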

daurnimator added the proposal label Jan 11, 2021
@ifreund
Member

ifreund commented Jan 11, 2021

Something similar to this has been rejected before in #234.

@kyle-github

Strings in libraries are always second-class citizens. After 30 years it is still painful to use them seamlessly in C++.

Strings, at least the way most people think of them, require hidden allocation for implementation. That seems to violate one of the core precepts of Zig.

Here are the options I see:

  1. strings similar to C strings: collections of bytes. This is what Zig has today. But it would be useful to be able to somehow annotate either the raw data elements as printable (what I suggested in another comment) or the array/slice as printable to aid in comptime code generation (i.e. for formatted printing) or in debugging.
  2. strings closer to Python or C++ in the core language. As noted above, this moves allocation right into the core.
  3. strings closer to Python or C++ in a library. This results in strings being a second class citizen. You would still not solve the problem of wanting to print formatted low level strings or debugging low level strings.

I prefer Zig's existing method of handling strings with one proviso: as noted in 1 above, it would be really nice to be able to use the type system in order to tell comptime code (and possibly debuggers) that something is printable. I was proposing a special utf8_byte type, but the arguments above are good ones. Sometimes you want data that is not utf8-compliant. I think a general printable_byte type or something similar would still work to replace the existing u8. Same size. Different type. That would allow for string arrays with and without sentinels and for string slices. Both would work fine because it is the underlying element type that is printable or at least tells comptime that it is printable.
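One way to approximate the proposed printable byte in userland is a non-exhaustive `enum(u8) { _ }`: it has the same size as `u8` but is a different type, so comptime code can branch on the element type. This is only a sketch, assuming Zig 0.8 to 0.13 (capitalized `@typeInfo` tag names); `PrintableByte` and `isPrintableSlice` are hypothetical names.

```zig
const std = @import("std");

/// Hypothetical "printable byte": same size as u8 but a distinct type,
/// so comptime code can tell text apart from raw bytes by element type.
const PrintableByte = enum(u8) { _ };

/// Hypothetical helper: true when T is a slice of PrintableByte.
fn isPrintableSlice(comptime T: type) bool {
    return switch (@typeInfo(T)) {
        .Pointer => |info| info.size == .Slice and info.child == PrintableByte,
        else => false,
    };
}

test "element type carries the intent" {
    try std.testing.expect(isPrintableSlice([]const PrintableByte));
    try std.testing.expect(!isPrintableSlice([]const u8));
}
```

The catch is that existing `[]const u8` APIs would not accept such slices without an explicit conversion.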

andrewrk added this to the 0.8.0 milestone Jan 11, 2021
@andrewrk
Member

We won't have a string type in the language.

OP problem can be solved with a better std.fmt.format API which better communicates intent. This is a deficiency of the std lib, not the language.

@RafaelLSa

I'm learning the language now. I've spent 2 days and still don't know how to print a literal value from a struct with log info. Too many type errors for something that should be simple. Even GitHub Copilot doesn't know how to do this. This really needs some changes...
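For anyone landing here with the same question, a minimal sketch assuming Zig 0.8 or later: string fields (`[]const u8`) need the `{s}` specifier, and integers can use `{d}`. `User` is a hypothetical struct.

```zig
const std = @import("std");

// Hypothetical struct for illustration.
const User = struct {
    name: []const u8,
    age: u32,
};

pub fn main() void {
    const user = User{ .name = "Ada", .age = 36 };
    // {s} prints a byte slice as text; {d} prints an integer in decimal.
    std.log.info("name={s} age={d}", .{ user.name, user.age });
}
```

The same format string works with `std.fmt.bufPrint` or any writer, not just `std.log`.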

@nektro
Contributor

nektro commented Dec 18, 2023

please ask for help in a Community and we'll be happy to assist you :)

ziglang deleted a comment from rinzwind5 Feb 6, 2024
@Abso1ut3Zer0

Abso1ut3Zer0 commented Aug 29, 2024

Hi, I was curious about string type proposals in Zig and came across this thread. I'm quite interested in the language for use in trading systems, which have predominantly used C++ for latency-critical code.

I do agree that a strict String type is not the way to go, but I think offering something in std to help with strings could be useful. In my field, dev teams are often hybrids of systems developers and quants, who still need to be productive but aren't expected to have the same level of development skill. Given that, some of the people using Zig could find it tough to be productive with strings.

Have there been any discussions to offer types within the std like a string_view that would just reference the slice? I think that aligns with Zig's stance on allocations, but provides some nice functionality without the need for a dependency.

Edit: I do see some string_view like stuff in the std, but wondering if this is intended to be expanded.
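For reference, the string_view-like pieces that already exist are the slice functions in `std.mem`, which work on `[]const u8` without allocating. A minimal sketch assuming Zig 0.8 or later; the order-line input is made up for illustration.

```zig
const std = @import("std");

test "slice-based string helpers in std.mem" {
    const line: []const u8 = "  BUY AAPL 100  ";
    // trim returns a sub-slice of `line`; nothing is copied.
    const trimmed = std.mem.trim(u8, line, " ");
    try std.testing.expectEqualStrings("BUY AAPL 100", trimmed);
    try std.testing.expect(std.mem.startsWith(u8, trimmed, "BUY"));

    // Find the first space and slice up to it, like string_view::substr.
    const space = std.mem.indexOfScalar(u8, trimmed, ' ').?;
    const side = trimmed[0..space];
    try std.testing.expect(std.mem.eql(u8, side, "BUY"));
}
```

Sub-slicing such as `trimmed[0..space]` plays the role of `string_view::substr`: it points into the original bytes rather than copying them.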
