Regenerate tables for Unicode 13.0.0 #69929

Merged Mar 19, 2020 (1 commit)

Conversation

cuviper
Member

@cuviper cuviper commented Mar 11, 2020

No description provided.

@rust-highfive
Collaborator

r? @kennytm

(rust_highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Mar 11, 2020
@Mark-Simulacrum
Member

Mark-Simulacrum commented Mar 11, 2020

Would you mind providing diffs of sizes against 12.1 (might require code modifications, I forget if the various scripts support it)?

Mostly just want to make sure none of the algorithms we have are breaking horribly by accident -- though it doesn't look like it based on the diffs.

@cuviper
Member Author

cuviper commented Mar 11, 2020

OK, I hacked the URL_PREFIX to run with 12.1.0 for comparison.

12.1.0:

Alphabetic     : 2982 bytes, 127257 codepoints
Case_Ignorable : 2112 bytes, 2397 codepoints
Cased          : 934 bytes, 4280 codepoints
Cc             : 43 bytes, 66 codepoints
Grapheme_Extend: 1734 bytes, 1966 codepoints
Lowercase      : 985 bytes, 2341 codepoints
N              : 1239 bytes, 1755 codepoints
Uppercase      : 934 bytes, 1909 codepoints
White_Space    : 140 bytes, 25 codepoints
Total table sizes: 11103 bytes

13.0.0:

Alphabetic     : 3055 bytes, 132876 codepoints
Case_Ignorable : 2136 bytes, 2414 codepoints
Cased          : 934 bytes, 4287 codepoints
Cc             : 43 bytes, 66 codepoints
Grapheme_Extend: 1774 bytes, 1980 codepoints
Lowercase      : 985 bytes, 2345 codepoints
N              : 1266 bytes, 1782 codepoints
Uppercase      : 934 bytes, 1912 codepoints
White_Space    : 140 bytes, 25 codepoints
Total table sizes: 11267 bytes

All but Cc and White_Space increased in codepoints, but Cased, Lowercase, and Uppercase still kept the same number of bytes.
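
For readers who want to check the arithmetic, here is a minimal sketch that recomputes the totals and per-set deltas from the two listings above. The byte counts are copied from this comment; the names `SIZES`, `totals`, and `unchanged` are made up for illustration and are not part of the table generator.

```rust
/// Table sizes in bytes, copied from the listings above: (set, 12.1.0, 13.0.0).
const SIZES: [(&str, i64, i64); 9] = [
    ("Alphabetic", 2982, 3055),
    ("Case_Ignorable", 2112, 2136),
    ("Cased", 934, 934),
    ("Cc", 43, 43),
    ("Grapheme_Extend", 1734, 1774),
    ("Lowercase", 985, 985),
    ("N", 1239, 1266),
    ("Uppercase", 934, 934),
    ("White_Space", 140, 140),
];

/// Sum of the per-set byte sizes for each Unicode version.
fn totals() -> (i64, i64) {
    SIZES
        .iter()
        .fold((0, 0), |(old, new), &(_, o, n)| (old + o, new + n))
}

/// Names of the sets whose byte size is identical in both versions.
fn unchanged() -> Vec<&'static str> {
    SIZES
        .iter()
        .filter(|&&(_, o, n)| o == n)
        .map(|&(name, _, _)| name)
        .collect()
}

fn main() {
    for &(name, old, new) in SIZES.iter() {
        println!("{:<15}: {:+} bytes", name, new - old);
    }
    let (old, new) = totals();
    println!("Total: {} -> {} ({:+} bytes)", old, new, new - old);
    println!("Unchanged in size: {:?}", unchanged());
}
```

The totals agree with the "Total table sizes" lines in both listings.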

@Mark-Simulacrum
Member

Looks great, thanks! Let's r? @SimonSapin for approval on the version bump (code changes look correct though I didn't run the script locally).

@SimonSapin
Contributor

r? @Manishearth

@Manishearth
Member

Version bump seems fine to me. I haven't run it locally either though

@Mark-Simulacrum
Member

I think we're running into "too many approvals" at this rate so I'm just going to go ahead and approve. I don't think we need libs FCP.

@bors r+

@rustbot label add relnotes

(let's see if I, the author, managed to remember syntax...)

@bors
Contributor

bors commented Mar 13, 2020

📌 Commit 543832b has been approved by Mark-Simulacrum

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Mar 13, 2020
@Mark-Simulacrum Mark-Simulacrum added the relnotes Marks issues that should be documented in the release notes of the next release. label Mar 13, 2020
Dylan-DPC-zz pushed a commit to Dylan-DPC-zz/rust that referenced this pull request Mar 14, 2020
…lacrum

Regenerate tables for Unicode 13.0.0
JohnTitor added a commit to JohnTitor/rust that referenced this pull request Mar 14, 2020
…lacrum

Regenerate tables for Unicode 13.0.0
@RalfJung
Member

RalfJung commented Mar 14, 2020

#69996 (comment) failed in the "xsv" cargotest, complaining that strings involving Unicode differ. I am going to guess this PR is at fault. @bors r-

2020-03-14T13:27:01.2477617Z thread 'test_cat::prop_cat_cols' panicked at 'assertion failed: `(left == right)`
2020-03-14T13:27:01.2490914Z   left: `[["\u{ffffd}2誢", "\u{e}", "-9", "\u{6dd}\u{90}隙", "O?", "\u{202b}5K⁑"], ["-", "", "", " -鈛", "", "\u{16}Cⶽ"], ["\u{fff6}", "\u{e0001}\u{a96d0}", "\u{14}叐;", "\u{14}¤3\u{10fffe}", "\n\u{206f}", "\u{83}"], ["ꑚo", "<䈰=\u{a33e7}", "\u{8dafc}\u{192d}+", "\u{fff8}\u{8a}/s", "", ""], ["@ \u{dcf}", "(\u{f55a7}", "Iv\"", "0 e", "㦬\u{e0001}@", "‟\u{f133}"], ["_", "", "M\u{86}", "", "n\u{18}4", "\u{15},"], ["", "L1\u{603}{", "ꓝ\u{8a}\r\u{49b91}", "\'[M<", "쳌;", "\u{a4cb}#\u{fff3}"], ["\u{85}  ", "\u{ecac}\u{9d}\u{6}웟", "", "\u{95}-:", "\u{88},", "\u{202f}"], ["$)\'", "B #3", "*\u{8}", "", "\'§", "{M"], ["\u{f}\u{e}\u{16}!", "\u{df0}", "`t}", "-]Z", "\u{1b}\u{1a}", "\u{df99d}4X"], ["", "\u{e4bf}", "9", "ㄸH:", "떟}\u{2028}!", "¨\u{6b5b6}\u{1b}"], ["\u{11} ", "®x", "B\u{e1b0}\u{6dc78}\u{88}", "", " ‑`", "`"], ["!㞾", "", "", "fI&\u{4}", "", "\u{e1c0f}}\u{9b}7"], ["", "\"", "\u{202a}—`^", "搄\u{2f08e}", "\u{15}Y~\u{13}", ""], ["\u{87}\u{85}!", "븞\u{90}\u{61c}", "y 6", "\u{fff0}UZU", "c", "\u{84}"], ["\u{16}\u{d3564}\u{1d}~", "\u{82}&\u{100a24}k", "[", "6", "", "C얣\u{fff2}"], ["\u{fff4}", "\u{3}", "1\u{ae8b9}c&", "u8", "jD", ">"], ["§\t", "\u{110bd}!s`", "^\u{8c}\u{14e2b}⁞", "\u{3}<P\u{f0000}", "", "¯¯쏍!"], ["釂¥0,", "-‧", "뎮", "‚湛w¤", "腽\u{f22d5}C\u{b}", "\u{9c}\u{87}"], ["3\u{18}𐨓9", "~-n!", "@¥_", "6", "«", "鐛<"], ["\u{faf5}\r)\u{17}", "¡,I\u{5}", "?:\u{10}b", "pS\u{c7fe4}‹", "!-?y", "\u{e0001}){{"], ["_\u{b9191}\u{2063}ꌨ", ";붌:-", "g", "", "휀_\u{ffffd}", "ꙕ"], ["\\\u{6b40b}", "\u{8}|_", "J", "\u{0}2矖", "\u{9f}\"", "$ꈂ"], ["£+", "t‾,\u{2063}", "`\u{ad}`", "\u{970aa}", "", ""], ["[", "", "@<\u{eb35}", "勶d\u{92}\u{76f51}", "\u{fff4}N깭\\", "\'"], ["\u{8}&2鄧", "", "\u{99}Y⁁", "", "", ""], ["\u{12}_ᢪ ", ";\n\u{96}꺅", "<", "\u{42609}\u{202d}", "\u{92}\u{12}", "\"h\u{d7701}"], ["", "H\u{1680}\u{5}(", "\u{2002}k$", "\u{d3c63}:b", "", "\u{ffffd}\t"], ["鵀53\u{14}", "<", "", "#¦£\\", "", ""], ["", "\u{fff8}", "&1�", "6\"섲\u{e0020}", "", ""], ["| ", "%", 
"7\u{7}", "", "\u{10fffd}", "f8;"], ["\u{e6406}<⁈", " \u{5dca1}}", "|", "", ")", ""], ["礔豿7", "*#", "&⒕[1", "4E!\u{85}", "22", ",\u{fff6}"], ["", "\u{87}[\'@", "&|―\u{ed51}", "o", "-", "\\"], ["C.", "", "£‥ª0", "];\u{16}", "\u{f0000}", "\\\u{12}U嗨"], ["}1!", "¯\u{10}\"\u{16}", "⁍", "+i", "\u{9f}", "⁈¦\n"], ["C\u{d55d1}W", "\u{97}$\u{18}", "", "\u{97}㜪", "䍧‖4", ""], ["\u{e0020}0[{", "H¬", "E/", "\u{14}x¨|", "W", "\u{7}\u{f}"], ["", "", "", "숆&㵲\u{ffffe}", "", "X=§⁂"], ["", "孄\u{5}\u{87}", "", "瘒\u{95}&\u{dade3}", "T", "\u{fb10}›9"], [",\u{1d}", "m", "", "\u{953}\u{0}$", "0", ""], ["⇨,", "\u{9c}\u{2028}\u{d5a1e} ", "‷", "", "6*l嗃", "\u{9b}\u{17}\u{4}ꋯ"], ["]-", "5\u{603}", "\u{602}\u{3}", "", "u䯝4`", ""], ["", "\u{5586a}(\u{c}\u{12}", "\u{cf3ca}", "\u{1f}{", ")_9C", "4\u{16}\u{9b}\u{9e}"], ["", "`.", "", "~", "*\u{14}\u{83}]", ""], ["\u{b3262}\u{88}⬳", "/鿵\\⁍", "", "4‴䦋", "\u{87}&�\u{81}", ""], ["[㐉⁅;", "皟\u{fffa}", "", "懍}h", "膳\u{100000}0", ""], ["\u{e000}£~", "‐`% ", "캯\u{6f85d}s", "ꍜ⁅^", "@\u{e846}\u{5}", "\u{1d173}𢕄‿\""], ["`", "\u{f14d1}`", "\u{9c}0\u{206f}\u{81}", ":\u{16}\u{fffb}", "", "朅\u{0}"], ["u", "3p\u{12}", "I(>P", "\u{e056}", "3ᚣ/", ",ªj"], ["\"", "", "a\u{80}", "諐`6¡", "⁘~", "J‼!\u{9c}"], ["\u{9d}1", "5«", "", "+\u{6}J", "9\u{2}", "=㴼3"], ["4\u{67e86}3", "@\u{f088}- ", "", "\'`-", "\u{1a}", "V"], ["䧂⁔", "]9\'㝴", "\'£\u{84b19}", "", "d\u{f}?", "\u{8b}\u{3ab24}"]]`,
2020-03-14T13:27:01.2512712Z  right: `[["\u{ffffd}2誢", "\u{e}", "\u{feff}-9", "\u{6dd}\u{90}隙", "O?", "\u{202b}5K⁑"], ["-", "", "", " -鈛", "", "\u{16}Cⶽ"], ["\u{fff6}", "\u{e0001}\u{a96d0}", "\u{14}叐;", "\u{14}¤3\u{10fffe}", "\n\u{206f}", "\u{83}"], ["ꑚo", "<䈰=\u{a33e7}", "\u{8dafc}\u{192d}+", "\u{fff8}\u{8a}/s", "", ""], ["@ \u{dcf}", "(\u{f55a7}", "Iv\"", "0 e", "㦬\u{e0001}@", "‟\u{f133}"], ["_", "", "M\u{86}", "", "n\u{18}4", "\u{15},"], ["", "L1\u{603}{", "ꓝ\u{8a}\r\u{49b91}", "\'[M<", "쳌;", "\u{a4cb}#\u{fff3}"], ["\u{85}  ", "\u{ecac}\u{9d}\u{6}웟", "", "\u{95}-:", "\u{88},", "\u{202f}"], ["$)\'", "B #3", "*\u{8}", "", "\'§", "{M"], ["\u{f}\u{e}\u{16}!", "\u{df0}", "`t}", "-]Z", "\u{1b}\u{1a}", "\u{df99d}4X"], ["", "\u{e4bf}", "9", "ㄸH:", "떟}\u{2028}!", "¨\u{6b5b6}\u{1b}"], ["\u{11} ", "®x", "B\u{e1b0}\u{6dc78}\u{88}", "", " ‑`", "`"], ["!㞾", "", "", "fI&\u{4}", "", "\u{e1c0f}}\u{9b}7"], ["", "\"", "\u{202a}—`^", "搄\u{2f08e}", "\u{15}Y~\u{13}", ""], ["\u{87}\u{85}!", "븞\u{90}\u{61c}", "y 6", "\u{fff0}UZU", "c", "\u{84}"], ["\u{16}\u{d3564}\u{1d}~", "\u{82}&\u{100a24}k", "[", "6", "", "C얣\u{fff2}"], ["\u{fff4}", "\u{3}", "1\u{ae8b9}c&", "u8", "jD", ">"], ["§\t", "\u{110bd}!s`", "^\u{8c}\u{14e2b}⁞", "\u{3}<P\u{f0000}", "", "¯¯쏍!"], ["釂¥0,", "-‧", "뎮", "‚湛w¤", "腽\u{f22d5}C\u{b}", "\u{9c}\u{87}"], ["3\u{18}𐨓9", "~-n!", "@¥_", "6", "«", "鐛<"], ["\u{faf5}\r)\u{17}", "¡,I\u{5}", "?:\u{10}b", "pS\u{c7fe4}‹", "!-?y", "\u{e0001}){{"], ["_\u{b9191}\u{2063}ꌨ", ";붌:-", "g", "", "휀_\u{ffffd}", "ꙕ"], ["\\\u{6b40b}", "\u{8}|_", "J", "\u{0}2矖", "\u{9f}\"", "$ꈂ"], ["£+", "t‾,\u{2063}", "`\u{ad}`", "\u{970aa}", "", ""], ["[", "", "@<\u{eb35}", "勶d\u{92}\u{76f51}", "\u{fff4}N깭\\", "\'"], ["\u{8}&2鄧", "", "\u{99}Y⁁", "", "", ""], ["\u{12}_ᢪ ", ";\n\u{96}꺅", "<", "\u{42609}\u{202d}", "\u{92}\u{12}", "\"h\u{d7701}"], ["", "H\u{1680}\u{5}(", "\u{2002}k$", "\u{d3c63}:b", "", "\u{ffffd}\t"], ["鵀53\u{14}", "<", "", "#¦£\\", "", ""], ["", "\u{fff8}", "&1�", "6\"섲\u{e0020}", "", ""], ["| 
", "%", "7\u{7}", "", "\u{10fffd}", "f8;"], ["\u{e6406}<⁈", " \u{5dca1}}", "|", "", ")", ""], ["礔豿7", "*#", "&⒕[1", "4E!\u{85}", "22", ",\u{fff6}"], ["", "\u{87}[\'@", "&|―\u{ed51}", "o", "-", "\\"], ["C.", "", "£‥ª0", "];\u{16}", "\u{f0000}", "\\\u{12}U嗨"], ["}1!", "¯\u{10}\"\u{16}", "⁍", "+i", "\u{9f}", "⁈¦\n"], ["C\u{d55d1}W", "\u{97}$\u{18}", "", "\u{97}㜪", "䍧‖4", ""], ["\u{e0020}0[{", "H¬", "E/", "\u{14}x¨|", "W", "\u{7}\u{f}"], ["", "", "", "숆&㵲\u{ffffe}", "", "X=§⁂"], ["", "孄\u{5}\u{87}", "", "瘒\u{95}&\u{dade3}", "T", "\u{fb10}›9"], [",\u{1d}", "m", "", "\u{953}\u{0}$", "0", ""], ["⇨,", "\u{9c}\u{2028}\u{d5a1e} ", "‷", "", "6*l嗃", "\u{9b}\u{17}\u{4}ꋯ"], ["]-", "5\u{603}", "\u{602}\u{3}", "", "u䯝4`", ""], ["", "\u{5586a}(\u{c}\u{12}", "\u{cf3ca}", "\u{1f}{", ")_9C", "4\u{16}\u{9b}\u{9e}"], ["", "`.", "", "~", "*\u{14}\u{83}]", ""], ["\u{b3262}\u{88}⬳", "/鿵\\⁍", "", "4‴䦋", "\u{87}&�\u{81}", ""], ["[㐉⁅;", "皟\u{fffa}", "", "懍}h", "膳\u{100000}0", ""], ["\u{e000}£~", "‐`% ", "캯\u{6f85d}s", "ꍜ⁅^", "@\u{e846}\u{5}", "\u{1d173}𢕄‿\""], ["`", "\u{f14d1}`", "\u{9c}0\u{206f}\u{81}", ":\u{16}\u{fffb}", "", "朅\u{0}"], ["u", "3p\u{12}", "I(>P", "\u{e056}", "3ᚣ/", ",ªj"], ["\"", "", "a\u{80}", "諐`6¡", "⁘~", "J‼!\u{9c}"], ["\u{9d}1", "5«", "", "+\u{6}J", "9\u{2}", "=㴼3"], ["4\u{67e86}3", "@\u{f088}- ", "", "\'`-", "\u{1a}", "V"], ["䧂⁔", "]9\'㝴", "\'£\u{84b19}", "", "d\u{f}?", "\u{8b}\u{3ab24}"]]`', tests/test_cat.rs:86:9

(and more)

@bors bors added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Mar 14, 2020
@Mark-Simulacrum
Member

cc @BurntSushi -- I suspect that's just the test needing to be updated to a new unicode version? Not sure though.

@BurntSushi
Member

xsv shouldn't depend on any specific Unicode version for its tests to pass. I believe that test failure is spurious. Either way, I won't be able to really dig into it for quite some time.

@jonas-schievink jonas-schievink added the T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. label Mar 14, 2020
@jonas-schievink jonas-schievink added this to the 1.44 milestone Mar 14, 2020
@Mark-Simulacrum
Member

No worries. @cuviper -- could you try to run the xsv tests locally perhaps? If it is spurious we can likely just retry. I personally suspect it might be a bug in the code generation scripts (e.g., some limit is hit and we overwrite existing data with new data).

@ollie27
Member

ollie27 commented Mar 15, 2020

xsv is using random numbers to generate test cases: https://github.com/BurntSushi/xsv/blob/66956b6bfd62d6ac767a6b6499c982eae20a2c9f/tests/tests.rs#L56. This means that xsv needs to be removed from cargotest because otherwise we're just asking for spurious test failures.

My best guess as to what's happened is that, by random chance, a test case with a BOM ('\u{feff}') was generated, and that breaks the tests. It looks like this has happened many times before: #45348 (comment) #45380 (comment) #47195 (comment) #49394 (comment) #58224 (comment) #65342 (comment) #67015 (comment) etc...

In other words I don't think this PR had anything to do with the test failure.

@BurntSushi
Member

Right, as I said, I'm pretty sure it's spurious. And "random numbers" aren't the problem. The test itself is likely broken.

@cuviper
Member Author

cuviper commented Mar 17, 2020

I ran xsv's testsuite in a loop with stable-x86_64-unknown-linux-gnu, rustc 1.42.0 (b8cedc004 2020-03-09), and on the 13th try got this failure:

thread 'test_frequency::prop_frequency_indexed' panicked at '[quickcheck] TEST FAILED (runtime error). Arguments: (CsvData { data: [[[]], [[]], [[239, 187, 191]], [[]], [[]]] })
Error: "assertion failed: `(left == right)`\n  left: `[(\"(NULL)\", 4)]`,\n right: `[(\"(NULL)\", 3), (\"\\u{feff}\", 1)]`"', <::std::macros::panic macros>:2:4

That's not the same test that #69996 hit, but it does appear there are lingering BOM issues.
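
To make the BOM connection concrete: the bytes `[239, 187, 191]` in the quickcheck arguments are the UTF-8 encoding of U+FEFF, the same character that separates `"-9"` from `"\u{feff}-9"` in the #69996 failure. The sketch below only demonstrates that encoding and what stripping a leading BOM does to such a field. It makes no claim about where xsv or its CSV reader actually drops the BOM, and the helper names are hypothetical.

```rust
/// The byte order mark as a string slice (U+FEFF).
const BOM: &str = "\u{feff}";

/// True if a raw field consists of nothing but a UTF-8 encoded BOM.
fn is_bom_only(field: &[u8]) -> bool {
    field == BOM.as_bytes()
}

/// Hypothetical reader behaviour (an assumption for illustration):
/// remove one leading BOM from a field if it is present.
fn strip_leading_bom(s: &str) -> &str {
    s.strip_prefix(BOM).unwrap_or(s)
}

fn main() {
    // The field from the quickcheck arguments above.
    let field = [239u8, 187, 191];
    println!("BOM-only field: {}", is_bom_only(&field));

    // After stripping, a BOM-only field is indistinguishable from an empty one.
    println!("stripped is empty: {}", strip_leading_bom(BOM).is_empty());

    // The differing cell from the #69996 log.
    println!("{:?}", strip_leading_bom("\u{feff}-9"));
}
```

A field that becomes empty after stripping would be counted together with the other empty fields, which is at least consistent with the `(NULL)` counts of 4 versus 3 in the assertion above.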

@Mark-Simulacrum
Member

Sounds good, thanks for investigating. I think that means that hopefully a @bors retry will pass.

It sounds like we'll plausibly want to either ignore that test inside the cargotest suite or perhaps drop xsv for now (but I'm not filing an issue since we have some further discussion scheduled on that topic in the infra team, IIRC).

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Mar 17, 2020
@cuviper
Member Author

cuviper commented Mar 18, 2020

The queue doesn't show approval, despite the label change, so let's try again...

@bors r=Mark-Simulacrum

@bors
Contributor

bors commented Mar 18, 2020

📌 Commit 543832b has been approved by Mark-Simulacrum

Centril added a commit to Centril/rust that referenced this pull request Mar 19, 2020
…lacrum

Regenerate tables for Unicode 13.0.0
bors added a commit that referenced this pull request Mar 19, 2020
Rollup of 9 pull requests

Successful merges:

 - #69036 (rustc: don't resolve Instances which would produce malformed shims.)
 - #69443 (tidy: Better license checks.)
 - #69814 (Smaller and more correct generator codegen)
 - #69929 (Regenerate tables for Unicode 13.0.0)
 - #69959 (std: Don't abort process when printing panics in tests)
 - #69969 (unix: Set a guard page at the end of signal stacks)
 - #70005 ([rustdoc] Improve visibility for code blocks warnings)
 - #70088 (Use copy bound in atomic operations to generate simpler MIR)
 - #70095 (Implement -Zlink-native-libraries)

Failed merges:

r? @ghost
bors added a commit that referenced this pull request Mar 19, 2020
Rollup of 9 pull requests

Successful merges:

 - #68941 (Properly handle Spans that reference imported SourceFiles)
 - #69036 (rustc: don't resolve Instances which would produce malformed shims.)
 - #69443 (tidy: Better license checks.)
 - #69814 (Smaller and more correct generator codegen)
 - #69929 (Regenerate tables for Unicode 13.0.0)
 - #69959 (std: Don't abort process when printing panics in tests)
 - #69969 (unix: Set a guard page at the end of signal stacks)
 - #70005 ([rustdoc] Improve visibility for code blocks warnings)
 - #70088 (Use copy bound in atomic operations to generate simpler MIR)

Failed merges:

r? @ghost
@bors bors merged commit 904909f into rust-lang:master Mar 19, 2020
Dylan-DPC-zz pushed a commit to Dylan-DPC-zz/rust that referenced this pull request Mar 28, 2020
…tolnay

Shrink Unicode tables (even more)

This shrinks the Unicode tables further, building upon the wins in rust-lang#68232 (the previous counts differ due to an interim Unicode version update, see rust-lang#69929).

The new data structure is around 3x slower on a benchmark that looks up every Unicode scalar value sequentially in each included data set. Note that for ASCII, the exposed functions on `char` optimize with direct branches, so ASCII will retain the same performance regardless of internal optimizations (or the reverse). Also note that the size reduction due to the skip list (where the performance loss comes from) is around 40%. As a result, I believe the performance loss is acceptable, as the routines are still quite fast. Any code where this is hot should probably be using a custom data structure anyway (e.g., a raw bitset) or something optimized for frequently seen values.

This PR updates both the bitset data structure, and introduces a new data structure similar to a skip list. For more details, see the [main.rs] of the table generator, which describes both. The commits mostly work individually and document size wins.

As before, this is tested on all valid chars to have the same results as nightly (and the canonical Unicode data sets). Happily, no bugs were found.

[main.rs]: https://github.com/rust-lang/rust/blob/fb4a715e18b/src/tools/unicode-table-generator/src/main.rs

Set             | Previous |  New  |  % of old  | Codepoints | Ranges |
----------------|---------:|------:|-----------:|-----------:|-------:|
Alphabetic      |     3055 |  1599 |        52% |     132875 |    695 |
Case Ignorable  |     2136 |   949 |        44% |       2413 |    410 |
Cased           |      934 |   359 |        38% |       4286 |    141 |
Cc              |       43 |     9 |        20% |         65 |      2 |
Grapheme Extend |     1774 |   813 |        46% |       1979 |    344 |
Lowercase       |      985 |   867 |        88% |       2344 |    652 |
N               |     1266 |   419 |        33% |       1781 |    133 |
Uppercase       |      934 |   777 |        83% |       1911 |    643 |
White_Space     |      140 |    37 |        26% |         25 |     10 |
----------------|----------|-------|------------|------------|--------|
Total           |    11267 |  5829 |        51% |     -      |   -    |
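
The "ASCII takes direct branches, everything else goes through a table" split described in the commit message can be sketched roughly as follows. This is not the bitset or skip-list structure from the table generator: it swaps in a plain binary search over a tiny, hand-picked range table, and every name in it (`UPPER_RANGES`, `in_ranges`, `is_uppercase_sketch`) is hypothetical.

```rust
use std::cmp::Ordering;

/// Illustrative subset only: sorted, non-overlapping, inclusive ranges of
/// some non-ASCII uppercase letters (Latin-1, Greek, Cyrillic).
const UPPER_RANGES: &[(u32, u32)] = &[
    (0x00C0, 0x00D6),
    (0x00D8, 0x00DE),
    (0x0391, 0x03A1),
    (0x03A3, 0x03AB),
    (0x0410, 0x042F),
];

/// Binary search for the range containing `c`, if any.
fn in_ranges(c: char, ranges: &[(u32, u32)]) -> bool {
    let cp = c as u32;
    ranges
        .binary_search_by(|&(lo, hi)| {
            if hi < cp {
                Ordering::Less
            } else if lo > cp {
                Ordering::Greater
            } else {
                Ordering::Equal
            }
        })
        .is_ok()
}

/// ASCII is answered with direct branches; only non-ASCII input
/// reaches the table lookup.
fn is_uppercase_sketch(c: char) -> bool {
    match c {
        'A'..='Z' => true,
        _ if c.is_ascii() => false,
        _ => in_ranges(c, UPPER_RANGES),
    }
}

fn main() {
    for c in ['A', 'a', 'Ω', 'ω', '×', 'Я'].iter() {
        println!("{:?}: {}", c, is_uppercase_sketch(*c));
    }
}
```

Because the ASCII arm never touches the table, changing the table's internal representation leaves ASCII lookups unaffected, which is the point the commit message makes.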
@cuviper cuviper deleted the unicode-13.0.0 branch April 3, 2020 18:37
netbsd-srcmastr pushed a commit to NetBSD/pkgsrc that referenced this pull request Jun 8, 2020
Pkgsrc changes:
 * Remove a couple diffs which are now integrated upstream.
 * Adjust cargo checksums after upstream upgrades.
 * Belatedly bump the curl dependency
 * Unset DESTDIR during the build phase, to work around a mysterious
   build bug deep in the bowels of llvm.
 * Bump nearly all bootstraps to 1.43.1.

Upstream changes:

Version 1.44.0 (2020-06-04)
==========================

Language
--------
- [You can now use `async/.await` with `#[no_std]` enabled.][69033]
- [Added the `unused_braces` lint.][70081]

**Syntax-only changes**

- [Expansion-driven outline module parsing][69838]
```rust
#[cfg(FALSE)]
mod foo {
    mod bar {
        mod baz; // `foo/bar/baz.rs` doesn't exist, but no error!
    }
}
```

These are still rejected semantically, so you will likely receive an error but
these changes can be seen and parsed by macros and conditional compilation.

Compiler
--------
- [Rustc now respects the `-C codegen-units` flag in incremental mode.][70156]
  Additionally when in incremental mode rustc defaults to 256 codegen units.
- [Refactored `catch_unwind` to be zero-cost unless unwinding is enabled and
  a panic is thrown.][67502]
- [Added tier 3\* support for the `aarch64-unknown-none` and
  `aarch64-unknown-none-softfloat` targets.][68334]
- [Added tier 3 support for `arm64-apple-tvos` and
  `x86_64-apple-tvos` targets.][68191]

Libraries
---------
- [Special cased `vec![]` to map directly to `Vec::new()`.][70632] This allows
  `vec![]` to be able to be used in `const` contexts.
- [`convert::Infallible` now implements `Hash`.][70281]
- [`OsString` now implements `DerefMut` and `IndexMut` returning
  a `&mut OsStr`.][70048]
- [Unicode 13 is now supported.][69929]
- [`String` now implements `From<&mut str>`.][69661]
- [`IoSlice` now implements `Copy`.][69403]
- [`Vec<T>` now implements `From<[T; N]>`.][68692] Where `N` is at most 32.
- [`proc_macro::LexError` now implements `fmt::Display` and `Error`.][68899]
- [`from_le_bytes`, `to_le_bytes`, `from_be_bytes`, `to_be_bytes`,
  `from_ne_bytes`, and `to_ne_bytes` methods are now `const` for all
  integer types.][69373]
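
A few of the library items above, exercised in one small sketch. It assumes a toolchain of 1.44 or later. The helper names are invented for illustration, and U+10E80 is used on the assumption that it is a Yezidi letter, a script first assigned in Unicode 13.0.

```rust
// `vec![]` maps to `Vec::new()`, so it can initialise a `const`.
const EMPTY: Vec<u8> = vec![];

// `from_le_bytes` is usable in `const` contexts for integer types.
const ANSWER: u32 = u32::from_le_bytes([42, 0, 0, 0]);

/// `String: From<&mut str>` copies the contents of a mutable string slice.
fn copy_out(s: &mut str) -> String {
    String::from(s)
}

/// `Vec<T>: From<[T; N]>` moves a fixed-size array into a vector.
fn small_vec() -> Vec<i32> {
    Vec::from([1, 2, 3])
}

/// With Unicode 13 tables, a letter from a script added in 13.0 is
/// reported as alphabetic (U+10E80 is assumed to be such a letter).
fn unicode_13_letter_is_alphabetic() -> bool {
    '\u{10E80}'.is_alphabetic()
}

fn main() {
    let mut buf = String::from("abc");
    let owned = copy_out(buf.as_mut_str());
    println!("{} {:?} {}", owned, small_vec(), ANSWER);
    println!("empty: {}", EMPTY.is_empty());
    println!("U+10E80 alphabetic: {}", unicode_13_letter_is_alphabetic());
}
```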

Stabilized APIs
---------------
- [`PathBuf::with_capacity`]
- [`PathBuf::capacity`]
- [`PathBuf::clear`]
- [`PathBuf::reserve`]
- [`PathBuf::reserve_exact`]
- [`PathBuf::shrink_to_fit`]
- [`f32::to_int_unchecked`]
- [`f64::to_int_unchecked`]
- [`Layout::align_to`]
- [`Layout::pad_to_align`]
- [`Layout::array`]
- [`Layout::extend`]
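
As a rough illustration of some of the newly stabilized methods, the sketch below preallocates a `PathBuf` and computes an array layout. The function names and the capacity of 64 are arbitrary choices for the example.

```rust
use std::alloc::Layout;
use std::path::PathBuf;

/// Build a path in a buffer that was sized up front.
fn build_path() -> PathBuf {
    let mut p = PathBuf::with_capacity(64);
    p.push("src");
    p.push("tools");
    p
}

/// Layout of `[u32; n]`, or `None` if the size computation overflows.
fn u32_array_layout(n: usize) -> Option<Layout> {
    Layout::array::<u32>(n).ok()
}

fn main() {
    let mut p = build_path();
    println!("{:?} (capacity {})", p, p.capacity());

    // `clear` empties the path but keeps the allocation around.
    p.clear();
    println!("empty after clear: {}", p.as_os_str().is_empty());

    // `shrink_to_fit` releases unused capacity.
    p.shrink_to_fit();

    if let Some(layout) = u32_array_layout(4) {
        println!("size {} align {}", layout.size(), layout.align());
    }
}
```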

Cargo
-----
- [Added the `cargo tree` command which will print a tree graph of
  your dependencies.][cargo/8062] E.g.
  ```
    mdbook v0.3.2 (/Users/src/rust/mdbook)
  +-- ammonia v3.0.0
  |   +-- html5ever v0.24.0
  |   |   +-- log v0.4.8
  |   |   |   +-- cfg-if v0.1.9
  |   |   +-- mac v0.1.1
  |   |   +-- markup5ever v0.9.0
  |   |       +-- log v0.4.8 (*)
  |   |       +-- phf v0.7.24
  |   |       |   +-- phf_shared v0.7.24
  |   |       |       +-- siphasher v0.2.3
  |   |       |       +-- unicase v1.4.2
  |   |       |           [build-dependencies]
  |   |       |           +-- version_check v0.1.5
  ...
  ```
  You can also display dependencies on multiple versions of the same crate with
  `cargo tree -d` (short for `cargo tree --duplicates`).

Misc
----
- [Rustdoc now allows you to specify `--crate-version` to have rustdoc include
  the version in the sidebar.][69494]

Compatibility Notes
-------------------
- [Rustc now correctly generates static libraries on Windows GNU targets with
  the `.a` extension, rather than the previous `.lib`.][70937]
- [Removed the `-C no_integrated_as` flag from rustc.][70345]
- [The `file_name` property in JSON output of macro errors now points to the actual
  source file rather than the previous format of `<NAME macros>`.][70969]
  **Note:** this may not point to a file that actually exists on the user's system.
- [The minimum required external LLVM version has been bumped to LLVM 8.][71147]
- [`mem::{zeroed, uninitialized}` will now panic when used with types that do
  not allow zero initialization such as `NonZeroU8`.][66059] This was
  previously a warning.
- [In 1.45.0 (the next release) converting a `f64` to `u32` using the `as`
  operator has been defined as a saturating operation.][71269] This was
  previously undefined behaviour, but you can use the `{f64, f32}::to_int_unchecked`
  methods to continue using the current behaviour, which may be desirable in rare
  performance sensitive situations.
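
The difference between the saturating `as` cast (1.45.0 and later) and `to_int_unchecked` can be sketched like this. The wrapper names are hypothetical, and the explicit range check in the second function is one conservative way, chosen for this example, to uphold the method's safety contract.

```rust
/// From 1.45.0 on, `as` saturates: out-of-range values clamp to the
/// integer's bounds and NaN becomes 0.
fn to_u32_saturating(x: f64) -> u32 {
    x as u32
}

/// `to_int_unchecked` skips those checks, so the caller must guarantee
/// that the value is finite and fits after truncation. Here that is
/// checked explicitly before the unsafe call.
fn to_u32_checked_then_unchecked(x: f64) -> Option<u32> {
    if x.is_finite() && x > -1.0 && x < 4294967296.0 {
        // SAFETY: `x` is finite and truncates to a value in 0..=u32::MAX.
        Some(unsafe { x.to_int_unchecked::<u32>() })
    } else {
        None
    }
}

fn main() {
    for x in [3.9, -1.0, 1e20, f64::NAN].iter() {
        println!(
            "{}: saturating = {}, unchecked = {:?}",
            x,
            to_u32_saturating(*x),
            to_u32_checked_then_unchecked(*x)
        );
    }
}
```

In real performance-sensitive code the range check would be replaced by an invariant established elsewhere. Keeping it here only makes the sketch safe to run.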

Internal Only
-------------
These changes provide no direct user facing benefits, but represent significant
improvements to the internals and overall performance of rustc and
related tools.

- [dep_graph: avoid allocating a set when the number of reads is small.][69778]
- [Replace big JS dict with JSON parsing.][71250]

[69373]: rust-lang/rust#69373
[66059]: rust-lang/rust#66059
[68191]: rust-lang/rust#68191
[68899]: rust-lang/rust#68899
[71147]: rust-lang/rust#71147
[71250]: rust-lang/rust#71250
[70937]: rust-lang/rust#70937
[70969]: rust-lang/rust#70969
[70632]: rust-lang/rust#70632
[70281]: rust-lang/rust#70281
[70345]: rust-lang/rust#70345
[70048]: rust-lang/rust#70048
[70081]: rust-lang/rust#70081
[70156]: rust-lang/rust#70156
[71269]: rust-lang/rust#71269
[69838]: rust-lang/rust#69838
[69929]: rust-lang/rust#69929
[69661]: rust-lang/rust#69661
[69778]: rust-lang/rust#69778
[69494]: rust-lang/rust#69494
[69403]: rust-lang/rust#69403
[69033]: rust-lang/rust#69033
[68692]: rust-lang/rust#68692
[68334]: rust-lang/rust#68334
[67502]: rust-lang/rust#67502
[cargo/8062]: rust-lang/cargo#8062
[`PathBuf::with_capacity`]: https://doc.rust-lang.org/std/path/struct.PathBuf.html#method.with_capacity
[`PathBuf::capacity`]: https://doc.rust-lang.org/std/path/struct.PathBuf.html#method.capacity
[`PathBuf::clear`]: https://doc.rust-lang.org/std/path/struct.PathBuf.html#method.clear
[`PathBuf::reserve`]: https://doc.rust-lang.org/std/path/struct.PathBuf.html#method.reserve
[`PathBuf::reserve_exact`]: https://doc.rust-lang.org/std/path/struct.PathBuf.html#method.reserve_exact
[`PathBuf::shrink_to_fit`]: https://doc.rust-lang.org/std/path/struct.PathBuf.html#method.shrink_to_fit
[`f32::to_int_unchecked`]: https://doc.rust-lang.org/std/primitive.f32.html#method.to_int_unchecked
[`f64::to_int_unchecked`]: https://doc.rust-lang.org/std/primitive.f64.html#method.to_int_unchecked
[`Layout::align_to`]: https://doc.rust-lang.org/std/alloc/struct.Layout.html#method.align_to
[`Layout::pad_to_align`]: https://doc.rust-lang.org/std/alloc/struct.Layout.html#method.pad_to_align
[`Layout::array`]: https://doc.rust-lang.org/std/alloc/struct.Layout.html#method.array
[`Layout::extend`]: https://doc.rust-lang.org/std/alloc/struct.Layout.html#method.extend
netbsd-srcmastr pushed a commit to NetBSD/pkgsrc that referenced this pull request Jun 9, 2020
Version 1.44.0 (2020-06-04)
==========================

Language
--------
- [You can now use `async/.await` with `#[no_std]` enabled.][69033]
- [Added the `unused_braces` lint.][70081]

**Syntax-only changes**

- [Expansion-driven outline module parsing][69838]
```rust
#[cfg(FALSE)]
mod foo {
    mod bar {
        mod baz; // `foo/bar/baz.rs` doesn't exist, but no error!
    }
}
```

These are still rejected semantically, so you will likely receive an error but
these changes can be seen and parsed by macros and conditional compilation.

Compiler
--------
- [Rustc now respects the `-C codegen-units` flag in incremental mode.][70156]
  Additionally when in incremental mode rustc defaults to 256 codegen units.
- [Refactored `catch_unwind` to be zero-cost unless unwinding is enabled and
  a panic is thrown.][67502]
- [Added tier 3\* support for the `aarch64-unknown-none` and
  `aarch64-unknown-none-softfloat` targets.][68334]
- [Added tier 3 support for `arm64-apple-tvos` and
  `x86_64-apple-tvos` targets.][68191]


Libraries
---------
- [Special cased `vec![]` to map directly to `Vec::new()`.][70632] This allows
  `vec![]` to be able to be used in `const` contexts.
- [`convert::Infallible` now implements `Hash`.][70281]
- [`OsString` now implements `DerefMut` and `IndexMut` returning
  a `&mut OsStr`.][70048]
- [Unicode 13 is now supported.][69929]
- [`String` now implements `From<&mut str>`.][69661]
- [`IoSlice` now implements `Copy`.][69403]
- [`Vec<T>` now implements `From<[T; N]>`.][68692] Where `N` is at most 32.
- [`proc_macro::LexError` now implements `fmt::Display` and `Error`.][68899]
- [`from_le_bytes`, `to_le_bytes`, `from_be_bytes`, `to_be_bytes`,
  `from_ne_bytes`, and `to_ne_bytes` methods are now `const` for all
  integer types.][69373]

Stabilized APIs
---------------
- [`PathBuf::with_capacity`]
- [`PathBuf::capacity`]
- [`PathBuf::clear`]
- [`PathBuf::reserve`]
- [`PathBuf::reserve_exact`]
- [`PathBuf::shrink_to_fit`]
- [`f32::to_int_unchecked`]
- [`f64::to_int_unchecked`]
- [`Layout::align_to`]
- [`Layout::pad_to_align`]
- [`Layout::array`]
- [`Layout::extend`]

Cargo
-----
- [Added the `cargo tree` command which will print a tree graph of
  your dependencies.][cargo/8062] E.g.
  ```
    mdbook v0.3.2 (/Users/src/rust/mdbook)
  ├── ammonia v3.0.0
  │   ├── html5ever v0.24.0
  │   │   ├── log v0.4.8
  │   │   │   └── cfg-if v0.1.9
  │   │   ├── mac v0.1.1
  │   │   └── markup5ever v0.9.0
  │   │       ├── log v0.4.8 (*)
  │   │       ├── phf v0.7.24
  │   │       │   └── phf_shared v0.7.24
  │   │       │       ├── siphasher v0.2.3
  │   │       │       └── unicase v1.4.2
  │   │       │           [build-dependencies]
  │   │       │           └── version_check v0.1.5
  ...
  ```
  You can also display dependencies on multiple versions of the same crate with
  `cargo tree -d` (short for `cargo tree --duplicates`).

Misc
----
- [Rustdoc now allows you to specify `--crate-version` to have rustdoc include
  the version in the sidebar.][69494]

Compatibility Notes
-------------------
- [Rustc now correctly generates static libraries on Windows GNU targets with
  the `.a` extension, rather than the previous `.lib`.][70937]
- [Removed the `-C no_integrated_as` flag from rustc.][70345]
- [The `file_name` property in JSON output of macro errors now points to the actual
  source file rather than the previous format of `<NAME macros>`.][70969]
  **Note:** this may not point to a file that actually exists on the user's system.
- [The minimum required external LLVM version has been bumped to LLVM 8.][71147]
- [`mem::{zeroed, uninitialized}` will now panic when used with types that do
  not allow zero initialization such as `NonZeroU8`.][66059] This was
  previously a warning.
- [In 1.45.0 (the next release) converting a `f64` to `u32` using the `as`
  operator has been defined as a saturating operation.][71269] This was previously
  undefined behaviour, but you can use the `{f64, f32}::to_int_unchecked` methods to
  continue using the current behaviour, which may be desirable in rare performance
  sensitive situations.

Internal Only
-------------
These changes provide no direct user facing benefits, but represent significant
improvements to the internals and overall performance of rustc and
related tools.

- [dep_graph: avoid allocating a set when the number of reads is small.][69778]
- [Replace big JS dict with JSON parsing.][71250]

[69373]: rust-lang/rust#69373
[66059]: rust-lang/rust#66059
[68191]: rust-lang/rust#68191
[68899]: rust-lang/rust#68899
[71147]: rust-lang/rust#71147
[71250]: rust-lang/rust#71250
[70937]: rust-lang/rust#70937
[70969]: rust-lang/rust#70969
[70632]: rust-lang/rust#70632
[70281]: rust-lang/rust#70281
[70345]: rust-lang/rust#70345
[70048]: rust-lang/rust#70048
[70081]: rust-lang/rust#70081
[70156]: rust-lang/rust#70156
[71269]: rust-lang/rust#71269
[69838]: rust-lang/rust#69838
[69929]: rust-lang/rust#69929
[69661]: rust-lang/rust#69661
[69778]: rust-lang/rust#69778
[69494]: rust-lang/rust#69494
[69403]: rust-lang/rust#69403
[69033]: rust-lang/rust#69033
[68692]: rust-lang/rust#68692
[68334]: rust-lang/rust#68334
[67502]: rust-lang/rust#67502
[cargo/8062]: rust-lang/cargo#8062
[`PathBuf::with_capacity`]: https://doc.rust-lang.org/std/path/struct.PathBuf.html#method.with_capacity
[`PathBuf::capacity`]: https://doc.rust-lang.org/std/path/struct.PathBuf.html#method.capacity
[`PathBuf::clear`]: https://doc.rust-lang.org/std/path/struct.PathBuf.html#method.clear
[`PathBuf::reserve`]: https://doc.rust-lang.org/std/path/struct.PathBuf.html#method.reserve
[`PathBuf::reserve_exact`]: https://doc.rust-lang.org/std/path/struct.PathBuf.html#method.reserve_exact
[`PathBuf::shrink_to_fit`]: https://doc.rust-lang.org/std/path/struct.PathBuf.html#method.shrink_to_fit
[`f32::to_int_unchecked`]: https://doc.rust-lang.org/std/primitive.f32.html#method.to_int_unchecked
[`f64::to_int_unchecked`]: https://doc.rust-lang.org/std/primitive.f64.html#method.to_int_unchecked
[`Layout::align_to`]: https://doc.rust-lang.org/std/alloc/struct.Layout.html#method.align_to
[`Layout::pad_to_align`]: https://doc.rust-lang.org/std/alloc/struct.Layout.html#method.pad_to_align
[`Layout::array`]: https://doc.rust-lang.org/std/alloc/struct.Layout.html#method.array
[`Layout::extend`]: https://doc.rust-lang.org/std/alloc/struct.Layout.html#method.extend
netbsd-srcmastr pushed a commit to NetBSD/pkgsrc that referenced this pull request Jul 6, 2020
Pkgsrc changes:
 * Remove the clutter caused by the cross-compile setup from Makefile
   (Now consigned to my own private cross.mk file.)
 * Remove a couple of patches which are now integrated upstream.
 * Minor adjustments to a couple of other patches.
 * Adjust cargo checksums after upstream upgrades.
 * Belatedly bump the curl dependency.
 * If doing a "dist" build, unset DESTDIR during the build phase,
   to work around a mysterious build bug deep in the bowels of llvm,
   causing llvm tools to be installed to a directory unexpected by
   the rest of the rust build, ref.
   rust-lang/rust#73132
   A "dist" build is not expected to be followed by an "install".
 * Bump nearly all bootstraps to 1.43.1; NetBSD earmv7hf bootstrap
   bumped to 1.44.0, as that one now finally builds and works.

Upstream changes:

Version 1.44.0 (2020-06-04)
==========================

Language
--------
- [You can now use `async/.await` with `#[no_std]` enabled.][69033]
- [Added the `unused_braces` lint.][70081]

**Syntax-only changes**

- [Expansion-driven outline module parsing][69838]
```rust
#[cfg(FALSE)]
mod foo {
    mod bar {
        mod baz; // `foo/bar/baz.rs` doesn't exist, but no error!
    }
}
```

Such modules are still rejected semantically, so you will likely receive an
error, but they can now be seen and parsed by macros and conditional compilation.

Compiler
--------
- [Rustc now respects the `-C codegen-units` flag in incremental mode.][70156]
  Additionally, when in incremental mode, rustc defaults to 256 codegen units.
- [Refactored `catch_unwind` to be zero-cost unless unwinding is enabled and
  a panic is thrown.][67502]
- [Added tier 3\* support for the `aarch64-unknown-none` and
  `aarch64-unknown-none-softfloat` targets.][68334]
- [Added tier 3 support for `arm64-apple-tvos` and
  `x86_64-apple-tvos` targets.][68191]

Libraries
---------
- [Special-cased `vec![]` to map directly to `Vec::new()`.][70632] This allows
  `vec![]` to be used in `const` contexts.
- [`convert::Infallible` now implements `Hash`.][70281]
- [`OsString` now implements `DerefMut` and `IndexMut` returning
  a `&mut OsStr`.][70048]
- [Unicode 13 is now supported.][69929]
- [`String` now implements `From<&mut str>`.][69661]
- [`IoSlice` now implements `Copy`.][69403]
- [`Vec<T>` now implements `From<[T; N]>`.][68692] This applies where `N` is
  at most 32.
- [`proc_macro::LexError` now implements `fmt::Display` and `Error`.][68899]
- [`from_le_bytes`, `to_le_bytes`, `from_be_bytes`, `to_be_bytes`,
  `from_ne_bytes`, and `to_ne_bytes` methods are now `const` for all
  integer types.][69373]
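
Several of the library changes above can be sketched in a few lines. This is only an illustrative sketch, not code from the release notes: the constant and function names (`EMPTY`, `MAGIC`, `MAGIC_LE`, `owned_upper`, `array_to_vec`, `copied_len`) are hypothetical and chosen for the example.

```rust
use std::io::IoSlice;

// An empty `vec![]` now expands to `Vec::new()`, so it can initialise a `const`.
pub const EMPTY: Vec<u8> = vec![];

// The byte-conversion methods are `const fn`, so they can be used in constants.
pub const MAGIC: u32 = u32::from_be_bytes([0xCA, 0xFE, 0xBA, 0xBE]);
pub const MAGIC_LE: [u8; 4] = MAGIC.to_le_bytes();

/// Upper-cases the slice in place, then copies it into an owned `String`
/// through the `From<&mut str>` implementation.
pub fn owned_upper(text: &mut str) -> String {
    text.make_ascii_uppercase();
    String::from(text)
}

/// Converts a fixed-size array into a `Vec` through `From<[T; N]>`.
pub fn array_to_vec() -> Vec<i32> {
    Vec::from([1, 2, 3])
}

/// `IoSlice` is `Copy`, so `first` stays usable after being assigned to `second`.
pub fn copied_len(data: &[u8]) -> usize {
    let first = IoSlice::new(data);
    let second = first; // a copy, not a move
    first.len() + second.len()
}
```

Before this release, `owned_upper` would have needed an explicit reborrow such as `String::from(&*text)`, and `copied_len` would have failed to compile because `first` was moved.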

Stabilized APIs
---------------
- [`PathBuf::with_capacity`]
- [`PathBuf::capacity`]
- [`PathBuf::clear`]
- [`PathBuf::reserve`]
- [`PathBuf::reserve_exact`]
- [`PathBuf::shrink_to_fit`]
- [`f32::to_int_unchecked`]
- [`f64::to_int_unchecked`]
- [`Layout::align_to`]
- [`Layout::pad_to_align`]
- [`Layout::array`]
- [`Layout::extend`]
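
As a rough illustration of how the newly stabilized `PathBuf` and `Layout` methods fit together, here is a minimal sketch. The helper names (`join_preallocated`, `tagged_layout`, `u16_array_layout`) are hypothetical, and the struct described in the comments is an assumed example rather than anything from the release notes.

```rust
use std::alloc::Layout;
use std::path::PathBuf;

/// Joins path components into a `PathBuf` whose buffer is reserved up front.
pub fn join_preallocated(parts: &[&str]) -> PathBuf {
    // Rough estimate: the bytes of each component plus one separator.
    let needed: usize = parts.iter().map(|p| p.len() + 1).sum();
    let mut path = PathBuf::with_capacity(needed);
    for part in parts {
        path.push(part);
    }
    path
}

/// Computes the layout of an assumed `#[repr(C)] struct { tag: u8, value: u32 }`
/// and returns it together with the byte offset of `value`.
pub fn tagged_layout() -> Option<(Layout, usize)> {
    let tag = Layout::new::<u8>();
    let value = Layout::new::<u32>();
    let (combined, value_offset) = tag.extend(value).ok()?;
    // `extend` does not add trailing padding; `pad_to_align` rounds the size up.
    Some((combined.pad_to_align(), value_offset))
}

/// Layout of `[u16; len]`, or `None` if the total size would overflow.
pub fn u16_array_layout(len: usize) -> Option<Layout> {
    Layout::array::<u16>(len).ok()
}
```

The errors are converted to `Option` with `.ok()` so the sketch does not depend on the name of the layout error type.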

Cargo
-----
- [Added the `cargo tree` command which will print a tree graph of
  your dependencies.][cargo/8062] E.g.
  ```
  mdbook v0.3.2 (/Users/src/rust/mdbook)
  +-- ammonia v3.0.0
  |   +-- html5ever v0.24.0
  |   |   +-- log v0.4.8
  |   |   |   +-- cfg-if v0.1.9
  |   |   +-- mac v0.1.1
  |   |   +-- markup5ever v0.9.0
  |   |       +-- log v0.4.8 (*)
  |   |       +-- phf v0.7.24
  |   |       |   +-- phf_shared v0.7.24
  |   |       |       +-- siphasher v0.2.3
  |   |       |       +-- unicase v1.4.2
  |   |       |           [build-dependencies]
  |   |       |           +-- version_check v0.1.5
  ...
  ```
  You can also display dependencies on multiple versions of the same crate with
  `cargo tree -d` (short for `cargo tree --duplicates`).

Misc
----
- [Rustdoc now allows you to specify `--crate-version` to have rustdoc include
  the version in the sidebar.][69494]

Compatibility Notes
-------------------
- [Rustc now correctly generates static libraries on Windows GNU targets with
  the `.a` extension, rather than the previous `.lib`.][70937]
- [Removed the `-C no_integrated_as` flag from rustc.][70345]
- [The `file_name` property in JSON output of macro errors now points to the
  actual source file rather than the previous format of `<NAME macros>`.][70969]
  **Note:** this may not point to a file that actually exists on the user's system.
- [The minimum required external LLVM version has been bumped to LLVM 8.][71147]
- [`mem::{zeroed, uninitialized}` will now panic when used with types that do
  not allow zero initialization such as `NonZeroU8`.][66059] This was
  previously a warning.
- [In 1.45.0 (the next release), converting an `f64` to a `u32` using the `as`
  operator has been defined as a saturating operation.][71269] This was
  previously undefined behaviour. You can use the `{f64, f32}::to_int_unchecked`
  methods to keep the current behaviour, which may be desirable in rare
  performance-sensitive situations.
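
The difference between the saturating `as` cast and `to_int_unchecked` can be sketched as follows. This assumes a toolchain of 1.45.0 or later, where the saturating behaviour is defined; the function names are hypothetical, and `u8` is used instead of `u32` only to keep the numbers small.

```rust
/// Saturating conversion: out-of-range values clamp to the nearest bound
/// and NaN becomes 0 (the behaviour defined for `as` from 1.45.0 onwards).
pub fn to_u8_saturating(x: f64) -> u8 {
    x as u8
}

/// Range-checks first, then uses the unchecked conversion.
/// Returns `None` when the input is NaN, infinite, or out of range.
pub fn to_u8_checked(x: f64) -> Option<u8> {
    if x.is_finite() && x > -1.0 && x < 256.0 {
        // SAFETY: `x` is finite and truncating it toward zero yields a value
        // in 0..=255, which is what `to_int_unchecked` requires.
        Some(unsafe { x.to_int_unchecked::<u8>() })
    } else {
        None
    }
}
```

Calling `to_int_unchecked` without such a guarantee is undefined behaviour, so the explicit check here trades away the speed advantage; the unchecked form only pays off when the range is already known from context.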

Internal Only
-------------
These changes provide no direct user facing benefits, but represent significant
improvements to the internals and overall performance of rustc and
related tools.

- [dep_graph: Avoid allocating a set when the number of reads is small.][69778]
- [Replace big JS dict with JSON parsing.][71250]

[69373]: rust-lang/rust#69373
[66059]: rust-lang/rust#66059
[68191]: rust-lang/rust#68191
[68899]: rust-lang/rust#68899
[71147]: rust-lang/rust#71147
[71250]: rust-lang/rust#71250
[70937]: rust-lang/rust#70937
[70969]: rust-lang/rust#70969
[70632]: rust-lang/rust#70632
[70281]: rust-lang/rust#70281
[70345]: rust-lang/rust#70345
[70048]: rust-lang/rust#70048
[70081]: rust-lang/rust#70081
[70156]: rust-lang/rust#70156
[71269]: rust-lang/rust#71269
[69838]: rust-lang/rust#69838
[69929]: rust-lang/rust#69929
[69661]: rust-lang/rust#69661
[69778]: rust-lang/rust#69778
[69494]: rust-lang/rust#69494
[69403]: rust-lang/rust#69403
[69033]: rust-lang/rust#69033
[68692]: rust-lang/rust#68692
[68334]: rust-lang/rust#68334
[67502]: rust-lang/rust#67502
[cargo/8062]: rust-lang/cargo#8062
[`PathBuf::with_capacity`]: https://doc.rust-lang.org/std/path/struct.PathBuf.html#method.with_capacity
[`PathBuf::capacity`]: https://doc.rust-lang.org/std/path/struct.PathBuf.html#method.capacity
[`PathBuf::clear`]: https://doc.rust-lang.org/std/path/struct.PathBuf.html#method.clear
[`PathBuf::reserve`]: https://doc.rust-lang.org/std/path/struct.PathBuf.html#method.reserve
[`PathBuf::reserve_exact`]: https://doc.rust-lang.org/std/path/struct.PathBuf.html#method.reserve_exact
[`PathBuf::shrink_to_fit`]: https://doc.rust-lang.org/std/path/struct.PathBuf.html#method.shrink_to_fit
[`f32::to_int_unchecked`]: https://doc.rust-lang.org/std/primitive.f32.html#method.to_int_unchecked
[`f64::to_int_unchecked`]: https://doc.rust-lang.org/std/primitive.f64.html#method.to_int_unchecked
[`Layout::align_to`]: https://doc.rust-lang.org/std/alloc/struct.Layout.html#method.align_to
[`Layout::pad_to_align`]: https://doc.rust-lang.org/std/alloc/struct.Layout.html#method.pad_to_align
[`Layout::array`]: https://doc.rust-lang.org/std/alloc/struct.Layout.html#method.array
[`Layout::extend`]: https://doc.rust-lang.org/std/alloc/struct.Layout.html#method.extend
Labels
relnotes Marks issues that should be documented in the release notes of the next release. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.