[WIP] Use a sharded dep node to dep node index map #61845
Conversation
[WIP] Make dep node indices persistent between sessions This makes marking dep nodes green faster (and lock free in the case with no diagnostics). This change is split out from #60035. Unlike #60035 this makes loading the dep graph slower because it loads 2 copies of the dep graph, one immutable and one mutable. Based on #61845, #61779 and #61923.
@bors try
⌛ Trying commit ca487efd28eca8b8f196d6ed4ac9622a47d75703 with merge 0375265c8567d1c4486ee7705d209471ab694578...
☀️ Try build successful - checks-travis
@rust-timer build 0375265c8567d1c4486ee7705d209471ab694578
Success: Queued 0375265c8567d1c4486ee7705d209471ab694578 with parent 4a365a2, comparison URL.
Finished benchmarking try commit 0375265c8567d1c4486ee7705d209471ab694578, comparison URL.
☔ The latest upstream changes (presumably #62580) made this pull request unmergeable. Please resolve the merge conflicts.
Oh, @aturon might want to take this (#61873 (comment)). |
Also not interested in reviewing any more of these. |
r? @aturon then |
The job failed. Click to expand the log.
@Zoxc why is this marked as
Also, I feel like I already reviewed this =) I guess I'm thinking of #65472 -- does that PR subsume this one?
OK. It took me a while to figure out what was going on in this PR but I get it now. Maybe we can edit the title to something more suggestive? It seems like the big thing that's happening here is that:
- we are sharing the data vector (not the "dep-node to dep-node-index map", as the PR title suggests)
- we are tweaking how we do locks so that we release the lock on the dep-node-to-dep-node-index map faster, and the locks don't overlap
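The two points above can be sketched in miniature. This is a hypothetical toy version, not rustc's actual code: all names (`MiniDepGraph`, `intern`) are illustrative, and a plain `HashMap` plus `std::sync::Mutex` stand in for the real interner and `Lock`. The key property it demonstrates is that an index is allocated under the `node_to_node_index` lock, that lock is released, and only then is the per-shard `data` lock taken, so the two locks never overlap:

```rust
use std::collections::HashMap;
use std::sync::Mutex;

const SHARDS: usize = 4;

struct MiniDepGraph {
    // (node_count, dep-node -> index map), guarded by one lock.
    node_to_node_index: Mutex<(usize, HashMap<&'static str, usize>)>,
    // The "data vector", sharded across SHARDS separately-locked vectors.
    data: [Mutex<Vec<Option<&'static str>>>; SHARDS],
}

impl MiniDepGraph {
    fn new() -> Self {
        MiniDepGraph {
            node_to_node_index: Mutex::new((0, HashMap::new())),
            data: std::array::from_fn(|_| Mutex::new(Vec::new())),
        }
    }

    fn intern(&self, node: &'static str) -> usize {
        // Phase 1: allocate an index while holding only the map lock.
        let index = {
            let mut guard = self.node_to_node_index.lock().unwrap();
            if let Some(&i) = guard.1.get(node) {
                return i;
            }
            let i = guard.0;
            guard.0 += 1;
            guard.1.insert(node, i);
            i
        }; // map lock is released here, before any shard lock is taken

        // Phase 2: store the payload under the shard's own lock. Other
        // threads may be filling other shards concurrently.
        let shard = index % SHARDS;
        let inner = index / SHARDS;
        let mut vec = self.data[shard].lock().unwrap();
        if vec.len() <= inner {
            vec.resize(inner + 1, None);
        }
        vec[inner] = Some(node);
        index
    }
}
```

Splitting the two critical sections this way shortens the time the map lock is held, at the cost of the gap-filling complexity discussed further down.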
I left some suggestions, mostly to add comments, but they include one API change. What do you think, @Zoxc?
```
@@ -936,7 +954,14 @@ struct DepNodeData {
/// we first acquire the `node_to_node_index` lock and then, once a new node is to be inserted,
/// acquire the lock on `data`.
pub(super) struct CurrentDepGraph {
    data: Lock<IndexVec<DepNodeIndex, DepNodeData>>,
    /// The current node count. Used to allocate an index before storing it in the
    /// `data` and `node_to_node_index` field below.
```
nice comments :)
I think it might be worth adding something like: "This is semantically the length of the data vector at any given point" if that's accurate? That's how I understood it.
(The reason we can't query the length directly is we have N shards of the vector in reality)
Seems good, yes (including the parenthetical note at the end)
```
/// `data` and `node_to_node_index` field below.
///
/// NB. This is the "logical" length of the complete data vector at any point.
/// In practice, though, the "logical" data vector is sharded across many vectors,
/// which will of course be of smaller lengths.
```
```
    }
};

if inserted {
```
This deserves a comment. It took me a bit to read and understand what was happening. Let me take a stab at it.

In the code above, we secured an index (`index`) for this node. Now we have to adjust the `data` array to include this index. However, note that we've released the lock on the `node_to_node_index` map -- so there could be multiple threads contending at this point, each with their own index.

Most of the time, the same thread will acquire both locks, one after the other. In that case, `index` will equal `self.data.len()` and we can just push onto the end of `self.data`.

Other times, however, we may find there is a gap -- e.g., we have been assigned index `N+1`, but the length of `self.data` is only `N`. In that case, we will push a dummy node to represent the index `N` and then push our own node for index `N+1`. At this point, we return; when the thread with index `N` acquires the lock, it will overwrite the dummy node.
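The gap-filling scheme described above can be sketched as follows. This is an illustrative toy, not the PR's code: it uses a single unsharded vector for simplicity (the real code applies the same logic per shard), a made-up `Slot` enum in place of `DepNodeData`, and `std::sync::Mutex` in place of `Lock`:

```rust
use std::sync::Mutex;

#[derive(Clone, PartialEq, Debug)]
enum Slot {
    // Placeholder for an index whose owning thread hasn't taken the lock yet.
    Dummy,
    Real(&'static str),
}

// Indices are assigned elsewhere (under a different lock); by the time a
// thread locks `data`, threads holding smaller indices may not have pushed yet.
fn store(data: &Mutex<Vec<Slot>>, index: usize, payload: &'static str) {
    let mut vec = data.lock().unwrap();
    if vec.len() == index {
        // Happy path: we are next in line, just push.
        vec.push(Slot::Real(payload));
    } else if vec.len() < index {
        // Gap: push dummies for the indices owned by other threads, then ours.
        vec.resize(index, Slot::Dummy);
        vec.push(Slot::Real(payload));
    } else {
        // A thread with a larger index already filled our slot with a dummy;
        // overwrite it with the real data.
        vec[index] = Slot::Real(payload);
    }
}
```

Each branch corresponds to one case in the comment above: same-thread ordering, a gap below us, or arriving after a dummy was inserted on our behalf.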
(Update: I wrote this comment before I understood that the vector itself was sharded, so the text is a bit off...)
```rust
if likely!(len == inner_index) {
    data.push(dep_node_data)
} else {
    let dummy_data = DepNodeData {
```
Comment here:
We can't just push, there is some kind of gap. Add dummy nodes as needed (they may already have been added), and then overwrite our index with the correct value.
```rust
let inner_index = index.as_usize() / sharded::SHARDS;
let mut data = self.data.get_shard_by_index(index.as_usize()).lock();
let len = data.len();
if likely!(len == inner_index) {
```
Comment:
The happy path, where we can just push onto the end of the array.
```
@@ -60,6 +60,11 @@ impl<T> Sharded<T> {
    }
}

#[inline]
```
Can you add a doc-comment here? Further, I feel like I would like this function to encapsulate both the modulo and the division operations (I think I left this feedback on some other PR before).
I'd like to see the API be:
```rust
/// Index into a sharded vector. The lower-order bits of `index`
/// are used to determine the shard, and the higher-order bits
/// are used to select the index within the shard.
///
/// Returns:
/// * a reference to the shard, `&Lock<T>`
/// * the index within the shard, a usize
pub fn get_shard_by_index(&self, index: usize) -> (&Lock<T>, usize)
```
Obviously this would affect the callers, which would no longer hard-code the `/` operation.
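A minimal standalone sketch of this suggested API, with `std::sync::Mutex` standing in for rustc's `Lock` and a simplified `Sharded` type invented here for illustration. Both the modulo (shard selection) and the division (index within the shard) live inside the accessor, so callers cannot get them out of sync:

```rust
use std::sync::Mutex;

const SHARDS: usize = 4;

struct Sharded<T> {
    shards: [Mutex<T>; SHARDS],
}

impl<T: Default> Sharded<T> {
    fn new() -> Self {
        Sharded {
            shards: std::array::from_fn(|_| Mutex::new(T::default())),
        }
    }

    /// Index into a sharded vector: the low-order bits of `index` pick the
    /// shard, the high-order bits pick the slot within that shard.
    fn get_shard_by_index(&self, index: usize) -> (&Mutex<T>, usize) {
        (&self.shards[index % SHARDS], index / SHARDS)
    }
}
```

With this shape, a caller destructures the returned pair, locks the shard, and indexes with the inner index, never repeating the `/ SHARDS` arithmetic itself.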
```rust
    fingerprint
};
let inner_index = index.as_usize() / sharded::SHARDS;
let mut data = self.data.get_shard_by_index(index.as_usize()).lock();
```
with my suggested API below, this would be

```rust
let (shard, inner_index) = self.data.get_shard_by_index(index.as_usize());
let mut data = shard.lock();
```
```rust
    index
}

fn data(
```
I think a doc-comment would be good here -- something like:

Returns a guard that gives access to the data with index `index`. This requires first finding the correct shard, acquiring the lock, and then returning the correct index.
```rust
    &self,
    index: DepNodeIndex,
) -> MappedLockGuard<'_, DepNodeData> {
    LockGuard::map(self.data.get_shard_by_index(index.as_usize()).lock(), |vec| {
```
With my suggested API, this would become `|(vec, index)| { ... }` and there would be no need to hard-code the division.
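As a rough standalone illustration of this shard-then-index access pattern: the names here (`ShardedVec`, `with_data`) are hypothetical, and since `MappedLockGuard` from rustc's data-structures crate isn't available with plain `std`, this sketch hands the element to a closure while the shard lock is held instead of returning a mapped guard:

```rust
use std::sync::Mutex;

const SHARDS: usize = 4;

struct ShardedVec<T> {
    shards: [Mutex<Vec<T>>; SHARDS],
}

impl<T> ShardedVec<T> {
    /// Find the shard for `index`, lock it, and run `f` on the element at the
    /// inner index -- the same steps the `data()` accessor performs.
    fn with_data<R>(&self, index: usize, f: impl FnOnce(&mut T) -> R) -> R {
        let (shard, inner) = (&self.shards[index % SHARDS], index / SHARDS);
        let mut vec = shard.lock().unwrap();
        f(&mut vec[inner])
    }
}
```

The closure form trades the ergonomics of a returned guard for working with `std` alone; the locking and index arithmetic are the same either way.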
@Zoxc OK -- well, I'm r+ on the general concept, though I'd like to take one more look after the nits / API suggestions above are made. I'm definitely 👍 on breaking out these changes into smaller PRs than #62038! =) Much more approachable. One thing I did wonder: The scheme where we release the index-map lock before acquiring the lock on the particular shard seems potentially more complex than necessary. Is the main reason to do that performance? Obviously, it would allow multiple threads to insert nodes into their respective shards in parallel, which seems good, but it's also kind of complex. I'm wondering what the overall evidence is for the impact of this patch in terms of performance. |
Ping from triage: this has sat idle for the last few days |
(github requires me to post a comment for some reason before I can respond to people's threads)
Ping from triage, any updates? @Zoxc
Ping from Triage: @Zoxc closing due to inactivity. Please re-open with any updates.
Ping from triage -
☔ The latest upstream changes (presumably #67362) made this pull request unmergeable. Please resolve the merge conflicts.
☔ The latest upstream changes (presumably #67540) made this pull request unmergeable. Please resolve the merge conflicts.
This reduces lock contention a bit for incremental compilation. #60035 gets rid of the locks, but is much larger and probably won't land until @michaelwoerister returns. This reduces clean incremental times for `winapi` from 18.37s to 16.43s on 16 threads / 8 cores. Based on #61779.
r? @eddyb