-
Notifications
You must be signed in to change notification settings - Fork 12.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
improve folder name for persistent doc tests #69458
Conversation
One solution is to pass around a If there is a duplicate folder it would be overwritten (if the folder does not already exist in the This would prevent cluttering of the folder (old tests would be overwritten) and it does not involve difficult parsing to know wether or not the test is from a proc-macro. It would also prevent future name clashes. |
Just a thought about this: instead of basing the id on the file line, wouldn't it be better to base on it the test number? Like it's the third test of this file so Also, cc @rust-lang/rustdoc |
@GuillaumeGomez this would work, but from where would you get the test number? or should this iterate through all folders? (would require one more syscall for each test, which could decrease runtime performance) |
The job Click to expand the log.
I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact |
Simply reading the folder names and putting the test in the next free folder would not be a viable solution, because this would mean that old test will never be overwritten, therefore one has to keep somewhere a list of all folders that were already created. I decided to use a |
☔ The latest upstream changes (presumably #69592) made this pull request unmergeable. Please resolve the merge conflicts. |
src/librustdoc/test.rs
Outdated
@@ -205,6 +207,7 @@ fn run_test( | |||
mut error_codes: Vec<String>, | |||
opts: &TestOptions, | |||
edition: Edition, | |||
visited_tests: &Mutex<HashMap<String, usize>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be changed to visited_tests: &Mutex<HashMap<u64, usize>>
?
It is not required to keep the entire folder_name
in memory a hash would suffice.
src/librustdoc/test.rs
Outdated
visited_tests | ||
.lock() | ||
.unwrap() | ||
.entry(folder_name.clone()) | ||
.and_modify(|v| *v += 1) | ||
.or_insert(0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This part would then be changed to something like this:
let mut hasher = std::collections::hash_map::DefaultHasher::new();
folder_name.hash(&mut hasher);
visited_tests
.lock()
.unwrap()
.entry(hasher.finish())
.and_modify(|v| *v += 1)
.or_insert(0)
which would most likely increase performance, because only 8 bytes have to be saved for each test in the HashMap
, instead of an arbitrary number of bytes, which would be most likely larger than 8 bytes. I think cloning an entire String
should be slower than hashing it.
I am not sure if this change is desirable, because it might make the code harder to read?
The naming after a location in the source has always bugged me -- I wonder if there's any hope we could actually name the test after the path to the item being tested, and then perhaps a monotonic test number for that item. |
Sorry for not responding earlier, completely missed the notification... That's why I suggested to use the position (not in term of line) of the test in the file. But it still seems to be not enough, but I'm not sure what I'm missing here... An idea maybe @ollie27 ? |
What is wrong with the current implementation? |
I can't put the finger on it. But maybe your implementation is perfectly correct and I'm just imagining things. That's why I asked for other opinions. If they don't find anything, then we can just merge it. Don't worry, we'll move forward and sorry it takes so much time! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall I think what this PR is proposing is a nice improvement but there will still be chances for filename collisions.
I don't like removing the line number from the path as that will make it difficult to figure out which test corresponds to which path. Maybe the line number can be included as well as the incremented number like "{name}_{line}_{number}"
. Possibly only appending the number if it's greater than 0 so in most cases it won't be noticed.
I did not implement this, because I think it is better to have a unified naming scheme. It would also increase the code complexity unnecessarily. |
Ping from triage: |
I already addressed the changes requested by @ollie27. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry about the delay.
With the line number included in the hash this should be good to merge. We could do with some regression tests for --persist-doctests
though but I can't see an easy way to write them so we can leave that as a follow up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me now. Just waiting for @ollie27's confirmation and then it's good to go! Thanks a lot!
Yeah, looks good to me too. @bors r=GuillaumeGomez,ollie27 |
📌 Commit bc00b16 has been approved by |
…ie27 improve folder name for persistent doc tests This partially fixes rust-lang#69411 by using the entire path as folder name, but I do not know how to deal with the proc-macro problem, where a doc test is forwarded to multiple generated functions, which have the same line for the doc test (origin). For example ```rust #[derive(ShortHand)] pub struct ExtXMedia { /// The [`MediaType`] associated with this tag. /// /// # Example /// -> /// ``` <- this line is given to `run_test` /// # use hls_m3u8::tags::ExtXMedia; /// use hls_m3u8::types::MediaType; /// /// let mut media = ExtXMedia::new(MediaType::Audio, "ag1", "english audio channel"); /// /// media.set_media_type(MediaType::Video); /// /// assert_eq!(media.media_type(), MediaType::Video); /// ``` /// /// # Note /// /// This attribute is required. #[shorthand(enable(copy))] media_type: MediaType, // the rest of the fields are omitted } ``` and my proc macro generates ```rust #[allow(dead_code)] impl ExtXMedia { /// The [`MediaType`] associated with this tag. /// /// # Example /// /// ``` /// # use hls_m3u8::tags::ExtXMedia; /// use hls_m3u8::types::MediaType; /// /// let mut media = ExtXMedia::new(MediaType::Audio, "ag1", "english audio channel"); /// /// media.set_media_type(MediaType::Video); /// /// assert_eq!(media.media_type(), MediaType::Video); /// ``` /// /// # Note /// /// This attribute is required. #[inline(always)] #[must_use] pub fn media_type(&self) -> MediaType { struct _AssertCopy where MediaType: ::std::marker::Copy; self.media_type } /// The [`MediaType`] associated with this tag. /// /// # Example /// /// ``` /// # use hls_m3u8::tags::ExtXMedia; /// use hls_m3u8::types::MediaType; /// /// let mut media = ExtXMedia::new(MediaType::Audio, "ag1", "english audio channel"); /// /// media.set_media_type(MediaType::Video); /// /// assert_eq!(media.media_type(), MediaType::Video); /// ``` /// /// # Note /// /// This attribute is required. #[inline(always)] pub fn set_media_type<VALUE: ::std::convert::Into<MediaType>>( &mut self, value: VALUE, ) -> &mut Self { self.media_type = value.into(); self } } ``` rustdoc then executes both tests with the same line (the line from the example above the field -> 2 different tests have the same name). We need a way to differentiate between the two tests generated by the proc-macro, so that they do not cause threading issues.
When this PR lands it will close that issue (because "fixes" is a magic keyword for GitHub). Is that deliberate, given that it is just a partial fix? If no, please edit the PR message to no longer say "fixes" (or "closes"). |
@RalfJung I updated the post. So it should be fine if the issue is closed with this PR. |
⌛ Testing commit bc00b16 with merge a46c7116049fcbc2e8f7c72db429d36e14310020... |
💔 Test failed - checks-azure |
I updated the fork ( |
Once the CI is ok, let's approve again then. |
@bors r=GuillaumeGomez,ollie27 |
📌 Commit 2e40ac7 has been approved by |
⌛ Testing commit 2e40ac7 with merge 02bf2b4659cb49caa3f0281f354ab048d3652e88... |
@bors retry yielding |
Rollup of 7 pull requests Successful merges: - rust-lang#69425 (add fn make_contiguous to VecDeque) - rust-lang#69458 (improve folder name for persistent doc tests) - rust-lang#70268 (Document ThreadSanitizer in unstable-book) - rust-lang#70600 (Ensure there are versions of test code for aarch64 windows) - rust-lang#70606 (Clean up E0466 explanation) - rust-lang#70614 (remove unnecessary relocation check in const_prop) - rust-lang#70623 (Fix broken link in README) Failed merges: r? @ghost
This fixes #69411, by using the entire path as folder name and storing already visited paths in a HashMap + appending a number to the file name for duplicates.