Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use C# 12 #996

Closed
wants to merge 5 commits into from
Closed

Use C# 12 #996

wants to merge 5 commits into from

Conversation

iamcarbon
Copy link
Contributor

This PR updates C# to version 12, and utilizes a few features:

  • Utilizes utf-8 literals
  • Utilizes collection expressions
  • Replaces a couple byte[] arrays with statically initialized data
  • Removes an unnecessary cast

@iamcarbon
Copy link
Contributor Author

@jstedfast Ready for review / feedback. Let me know what you think of these new syntax features.

Copy link
Owner

@jstedfast jstedfast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See individual code comments for details.

0x67, 0x68, 0x69, 0x6A, 0x6B, 0x6C, 0x6D, 0x6E, 0x6F, 0x70, 0x71, 0x72, 0x73, 0x74, 0x75, 0x76,
0x77, 0x78, 0x79, 0x7A, 0x30, 0x31, 0x32, 0x33, 0x34, 0x35, 0x36, 0x37, 0x38, 0x39, 0x2B, 0x2F
};
static ReadOnlySpan<byte> base64_alphabet => "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"u8;

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need the u8 here? All of the characters fit in ASCII, so I would imagine it wouldn't.

As much as I am an old dog that is stuck in his old ways (long-time C programmer who is generally a stickler for more C-like syntax), I definitely prefer "ABCDEF..." over 0x41, 0x42, 0x43, 0x44, 0x45, 0x46, ...

👍

Copy link
Contributor Author

@iamcarbon iamcarbon Jan 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The u8 postfix converts 16-bit wide chars to 8 bit wide ASCII bytes. If we omit the "u8", we'd be changing this to ReadOnlySpan<char>.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ugh, gotcha. The u8 at first glance looks like a typo at the end of the string which bugs me 😆

0x30, 0x31, 0x32, 0x33, 0x34, 0x35, 0x36, 0x37, // '0' -> '7'
0x38, 0x39, 0x41, 0x42, 0x43, 0x44, 0x45, 0x46, // '8' -> 'F'
};
static ReadOnlySpan<byte> hex_alphabet => "0123456789ABCDEF"u8;

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as base64_alphabet

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above. The u8 postfix turns the sequence into a ASCII bytes (instead of chars).

@@ -53,7 +53,7 @@ public class ExperimentalMimeParser : MimeReader, IMimeParser
int mboxMarkerLength;

// Current MimeMessage/MimeEntity Header state
readonly List<Header> headers = new List<Header> ();
readonly List<Header> headers = [];
byte[] preHeaderBuffer;
int preHeaderLength;

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This syntax change I'm not sure I like. It might grow on me like some of the past syntax changes you've made have grown on me, though. Same goes for all of the [] changes made.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll try proposing again in year. :)

@@ -964,7 +964,7 @@ static byte[] EncodeReferencesHeader (ParserOptions options, FormatOptions forma

static bool IsWhiteSpace (char c)
{
return c == ' ' || c == '\t' || c == '\r' || c == '\n';
return c is ' ' or '\t' or '\r' or '\n';
}

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you suggested this change in the past and I think I rejected it, but the syntax has grown on me...

So 👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yay. I'll get a clean PR with just the changes you like.

@@ -1744,7 +1744,7 @@ internal static unsafe bool TryParse (ParserOptions options, byte* input, int le
}
} else {
// Note: The only way to get here is if we have an invalid header, in which case the entire 'header' is stored as the 'field'.
value = Array.Empty<byte> ();
value = [];
}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

of all of the [] syntax changes, I think I like this one the least because it's not obvious if this is allocating or not and the point of using Array.Empty<byte> () is to avoid allocations.

This one I am going to specifically reject.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noted.

@@ -41,7 +41,7 @@ namespace MimeKit.IO.Filters {
/// </remarks>
public class ArmoredFromFilter : MimeFilterBase
{
static readonly byte[] From = { (byte) 'F', (byte) 'r', (byte) 'o', (byte) 'm', (byte) ' ' };
static ReadOnlySpan<byte> From => [(byte) 'F', (byte) 'r', (byte) 'o', (byte) 'm', (byte) ' '];

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's 2 changes here, one that updates the code to use ReadOnlySpan<byte> and one that updates the code to use []-style syntax.

What if we used:

static ReadOnlySpan<byte> From => "From ";

I think that'd be clearer and more concise.

And actually, would it make sense to do this?

static readonly ReadOnlySpan<byte> From => "From ";

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'd still need the u8 suffix to covert to ASCII bytes.

@@ -38,7 +38,7 @@ namespace MimeKit.IO.Filters {
/// </remarks>
public class MboxFromFilter : MimeFilterBase
{
static readonly byte[] From = { (byte) 'F', (byte) 'r', (byte) 'o', (byte) 'm', (byte) ' ' };
static ReadOnlySpan<byte> From => [(byte) 'F', (byte) 'r', (byte) 'o', (byte) 'm', (byte) ' '];
bool midline;

/// <summary>
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets do the same as ArmoredFromFilter

@@ -373,7 +373,7 @@ internal static bool TryParseLocalPart (byte[] text, ref int index, int endIndex
return true;
}

static ReadOnlySpan<byte> CommaGreaterThanOrSemiColon => new[] { (byte) ',', (byte) '>', (byte) ';' };
static ReadOnlySpan<byte> CommaGreaterThanOrSemiColon => [(byte)',', (byte) '>', (byte) ';'];

internal static bool TryParseAddrspec (byte[] text, ref int index, int endIndex, ReadOnlySpan<byte> sentinels, RfcComplianceMode compliance, bool throwOnError, out string addrspec, out int at)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure the [] gets us any more clarity over {}, but I think we have another opportunity here for the string syntax that would at least save us the ugliness of the (byte) casts (which are probably technically not needed?)

static readonly ReadOnlySpan<byte> CommaGreaterThanOrSemiColon => ",>;";

My only hesitation with my own proposed change is that it would look like its usage might be as a string rather than as individual bytes (which are meant to be treated as alternatives rather than a sequence). That said, it's a minor point of uncertainty that the CommaGreaterThanOrSemiColon constant name probably remedies.

}
} else {
// When no charset is specified, it should be safe to assume US-ASCII...
// but we all know what assume means, right??
encoding = Encoding.GetEncoding (28591); // iso-8859-1
decoder = (Decoder) encoding.GetDecoder ();
decoder = encoding.GetDecoder ();
}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should definitely get rid of these casts. I think they are a relic of when I had to depend on my Portable.Text.Encoding library for charset support on Windows Phone back when I was trying to support that platform.

And therefore I had a using Decoder = Portable.Text.Encoding.Decoder;

@@ -390,7 +390,7 @@ public static bool TryParseDomain (byte[] text, ref int index, int endIndex, Rea
return TryParseDotAtom (text, ref index, endIndex, sentinels, throwOnError, "domain", out domain);
}

static ReadOnlySpan<byte> GreaterThanOrAt => new[] { (byte) '>', (byte) '@' };
static ReadOnlySpan<byte> GreaterThanOrAt => [(byte) '>', (byte) '@'];

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could also potentially use the static readonly ReadOnlySpan<byte> GreaterThanOrAt => ">@"; syntax.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was on the fence on whether converting to a string also lost some meaning, but agree this can be overlooked when the variable is well named. If we make this change, we still need the u8 suffix.

@iamcarbon
Copy link
Contributor Author

@jstedfast Thanks for the feedback. Let me know your thoughts on the u8 suffix, and I'll put out new PR with just the approved changes.

@iamcarbon iamcarbon mentioned this pull request Jan 30, 2024
@iamcarbon iamcarbon closed this Jan 30, 2024
@jstedfast
Copy link
Owner

A few things:

  1. Based on some quick playing around, it looks like we can't add the readonly modifier to these static ReadOnlySpan<byte> constants. No big deal there, I was just wondering but now I have an answer.
  2. Lets do the u8 changes. As much as I cringe at that u8 suffix to the strings, I think "01234567689ABCDEF"u8 (for example) is a lot more clear than the array of hexadecimal byte values.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants