-
Notifications
You must be signed in to change notification settings - Fork 102
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add dirty bitmap tracking abstractions #125
Conversation
@alexandruag We have live migration working in Cloud Hypervisor now taking advantage of the KVM region tracking. I did wonder if we could be in a situation where updates from the VMM side i.e. via GuestMemory might be an issue. The fact you've created this PR makes me think you're seeing this in practice. |
@alexandruag It looks good. But don't need a way to atomically clear the bitmap? Like how the |
Hi Rob, thanks for the quick reply! Yeah, during previous experiments we noticed the KVM dirty log did not catch some of the write accesses performed by the VMM (which makes sense if stuff like PML is used). Regarding your second question, the |
Seems it's a chance to get rid of some unused interfaces, we could add them back later when they are really used. |
Thanks for the comments Gerry, I'll start addressing them soon. |
src/volatile_memory.rs
Outdated
self.bitmap.mark_dirty(addr, count); | ||
src.read_exact(dst).map_err(Error::IOError) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In case the read_exact
function fails here, we still mark the pages as dirty. Wouldn't it be best to update the bitmap only in case the write succeeds?
self.bitmap.mark_dirty(addr, count); | |
src.read_exact(dst).map_err(Error::IOError) | |
src.read_exact(dst).map_err(Error::IOError)?; | |
self.bitmap.mark_dirty(addr, count) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added the mark_dirty
first since even if the memory operation fails, I think we can still have partial writes to guest memory, and they are implicitly covered if the entire range was marked already. It is possible to generate false positives this way, but I think they are not significant errors in the context of tracking dirty memory areas.
src/bitmap.rs
Outdated
} | ||
|
||
/// Represents a slice into a `Bitmap` object, starting at `base_offset`. | ||
pub struct BitmapSlice<'a, B> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why doesn't a BitmapSlice
have a size?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I tried to keep the interface to a minimum, and I'd leave it as is for now until we identify a use case that explicitly needs this (sorry if I missed any).
src/bitmap.rs
Outdated
fn new(size: usize) -> Self; | ||
|
||
/// Mark the memory range specified by the given `offset` and `len` as dirtied. | ||
fn mark_dirty(&self, offset: usize, len: usize); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would a impl Bitmap for Foo
look like - more precisely, what would Foo
contain to represent the bitmap? Mult be something with interior mutability so you can get away with &self
here - are you thinking direct pointer manipulation? Splitting the bitmap representation into 64-bit chunks and using Atomic
s? Something else?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's up to the implementer; anything goes as long as it can implement the interface. Using a sequence of AtomicU64
s does seem like one of the options that first come to mind, and IIRC the implementation used for tracking by Firecracker does this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi everyone, thanks for the comments! I've updated the PR, and the biggest change is that I used HRTBs (drawing inspiration from @Daniel-Aaron-Bloom's PR that was merged a while ago) to turn the tracking into a zero-cost abstraction (no additional operations are performed or space required when a trivial Bitmap
implementation is used). This does make some signatures a bit more complex, but I've also added some type aliases to help with that.
Instead of the Bitmap
trait having constructors itself, I've defined a NewBitmap
trait meant to be used by GuestRegionMmap
to instantiate bitmap objects without having to change its current interface (i.e. such that bitmaps are moved over at construction time). Another thing I'm considering is to rename GuestMemoryMmap
and GuestRegionMmap
to something else, and use the original names for type aliases that imply using a trivial bitmap implementation by default. This should minimize the amount of changes required by vm-memory
consumers that do not need the new functionality.
The next steps are to fix the unit tests, add some generic tests that can be used with any Bitmap
implementation, and provide a backend implementation as well. I'll update this PR with the tests as soon as they're done, and I'm looking forward to any feedback regarding the changes so far.
src/bitmap.rs
Outdated
fn new(size: usize) -> Self; | ||
|
||
/// Mark the memory range specified by the given `offset` and `len` as dirtied. | ||
fn mark_dirty(&self, offset: usize, len: usize); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's up to the implementer; anything goes as long as it can implement the interface. Using a sequence of AtomicU64
s does seem like one of the options that first come to mind, and IIRC the implementation used for tracking by Firecracker does this.
src/bitmap.rs
Outdated
} | ||
|
||
/// Represents a slice into a `Bitmap` object, starting at `base_offset`. | ||
pub struct BitmapSlice<'a, B> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I tried to keep the interface to a minimum, and I'd leave it as is for now until we identify a use case that explicitly needs this (sorry if I missed any).
src/volatile_memory.rs
Outdated
self.bitmap.mark_dirty(addr, count); | ||
src.read_exact(dst).map_err(Error::IOError) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added the mark_dirty
first since even if the memory operation fails, I think we can still have partial writes to guest memory, and they are implicitly covered if the entire range was marked already. It is possible to generate false positives this way, but I think they are not significant errors in the context of tracking dirty memory areas.
@alexandruag Would you also be willing to include a simple "off the shelf" Bitmap tracking implementation so every user does not need to construct their own? |
Sorry for not noticing this PR until now. There is one important issue with attaching the bitmap to the GuestMemory, which is that the memory addresses can change but the dirty bitmap should not. Think for example of a large RAM region for an emulated graphics card; because it is a PCI BAR, the base address can move freely. So I'm thinking of a different design where the bitmap is attached to the GuestRegion instead, for example as a decorator where you'd create a Regarding removing the |
@rbradford Yes, the plan is to include a backend implementation as well (guarded by a feature, like @bonzini Thanks for having a look! The bitmap(s) are already attached to individual regions, and the tracking is done relative to the start of each particular region. Is this different from what you were describing? |
You're right, I misread the code. I have now looked at Firecracker too, and the main difference with my proposal in #140 is that Firecracker handles external bitmaps (such as KVM's) by hand in |
Hi! I’ve updated the PR with backend implementations ( I’ve reverted the code removals with the exception of the original |
@bonzini @jiangliu @aghecenco @rbradford @luminitavoicu can you please take another look? I'll go over the orderings in the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did a first pass and overall LGTM! I find the naming scheme a bit confusing with all the new traits and implementors, as it's hard to figure out which is which, but I have no better suggestions (others than those in comments). When this gets "promoted" to non-RFC, I think dedicated benchmarks could be useful in demonstrating the zero part of the zero cost abstractions.
src/bitmap/mod.rs
Outdated
{ | ||
} | ||
|
||
/// Common bitmap operations. Using HRTBs to effectively define an associated type that |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I had to search what HRTB stands for; maybe add the full term in parentheses when you first introduce the abbreviation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
|
||
/// Helper type alias for referring to the `BitmapSlice` concrete type associated with | ||
/// an object `B: WithBitmapSlice<'a>`. | ||
pub type BS<'a, B> = <B as WithBitmapSlice<'a>>::S; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I'd be more verbose with these exported type names, rather than the very abbreviated form (particularly BS
). Maybe BitmapSliceT
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I don't know what the best solution is here. Having the very abbreviated forms helps with keeping type declarations or function return types shorter whenever we need to explicitly mention them. I prefer the shorthand notation, but we can def think about different possibilities here some more until the PR is ready to be merged otherwise.
src/bitmap/backend/atomic_bitmap.rs
Outdated
#[cfg(feature = "backend-mmap")] | ||
use crate::mmap::NewBitmap; | ||
|
||
/// `Bitmap` implements a simple bit map on the page level with test and set operations. It is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/Bitmap/AtomicBitmap
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
I'm not sure why I'll take a look at making a decorator-like implementation where you have |
The |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi again! I've addressed the recently added comments, and I also pushed a new version where the previous removals around MmapRegion
are reverted, and the struct now has an inner object used for tracking (i.e. it's now MmapRegion<B>
). I have to fix some of the doc tests and Windows-specific logic as well, and then I'll bump the status to an actual PR.
I think that adding benchmarks to verify that the newly added abstractions really are zero-cost is a bit heavy-weight; so far I've validated the expectations in this regard mainly by looking at snippets of compiler output (with the regular release optimizations enabled). We can def revisit the idea if folks think it's valuable going forward.
Looking forward to any additional feedback!
src/bitmap/mod.rs
Outdated
{ | ||
} | ||
|
||
/// Common bitmap operations. Using HRTBs to effectively define an associated type that |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
|
||
/// Helper type alias for referring to the `BitmapSlice` concrete type associated with | ||
/// an object `B: WithBitmapSlice<'a>`. | ||
pub type BS<'a, B> = <B as WithBitmapSlice<'a>>::S; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I don't know what the best solution is here. Having the very abbreviated forms helps with keeping type declarations or function return types shorter whenever we need to explicitly mention them. I prefer the shorthand notation, but we can def think about different possibilities here some more until the PR is ready to be merged otherwise.
src/bitmap/backend/atomic_bitmap.rs
Outdated
#[cfg(feature = "backend-mmap")] | ||
use crate::mmap::NewBitmap; | ||
|
||
/// `Bitmap` implements a simple bit map on the page level with test and set operations. It is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
As the first step to complete live-migration with tracking dirty-pages written by the VMM, this patch patches the dependent vm-memory crate to the development branch with dirty-page-tracking capability [1]. For now, we are using our own fork of the vm-memory crate [2] before the feature is landed in the upstream. Most changes are due to the updated `GuestMemoryMmap`, `GuestRegionMmap`, and `MmapRegion` structs which are taking an additional generic type parameter (to specify what 'bitmap backend' is used). For the same reason, similar changes are required in the dependent 'vfio-ioctls' crate (that uses the vm-memory crate). We are also using our own fork of the vfio-ioctls crate before the upstream is updated [3]. The above changes should be transparent to the rest of the code base, e.g. all unit/integration tests should pass without additional changes. [1] rust-vmm/vm-memory#125 [2] https://github.com/cloud-hypervisor/vm-memory/tree/ch [3] https://github.com/cloud-hypervisor/vfio-ioctls/tree/ch Signed-off-by: Bo Chen <chen.bo@intel.com>
As the first step to complete live-migration with tracking dirty-pages written by the VMM, this patch patches the dependent vm-memory crate to the development branch with dirty-page-tracking capability [1]. For now, we are using our own fork of the vm-memory crate [2] before the feature is landed in the upstream. Most changes are due to the updated `GuestMemoryMmap`, `GuestRegionMmap`, and `MmapRegion` structs which are taking an additional generic type parameter (to specify what 'bitmap backend' is used). For the same reason, similar changes are required in the dependent 'vfio-ioctls' crate (that uses the vm-memory crate). We are also using our own fork of the vfio-ioctls crate before the upstream is updated [3]. The above changes should be transparent to the rest of the code base, e.g. all unit/integration tests should pass without additional changes. [1] rust-vmm/vm-memory#125 [2] https://github.com/cloud-hypervisor/vm-memory/tree/ch [3] https://github.com/cloud-hypervisor/vfio-ioctls/tree/ch Signed-off-by: Bo Chen <chen.bo@intel.com>
As the first step to complete live-migration with tracking dirty-pages written by the VMM, this patch patches the dependent vm-memory crate to the development branch with dirty-page-tracking capability [1]. For now, we are using our own fork of the vm-memory crate [2] before the feature is landed in the upstream. Most changes are due to the updated `GuestMemoryMmap`, `GuestRegionMmap`, and `MmapRegion` structs which are taking an additional generic type parameter (to specify what 'bitmap backend' is used). For the same reason, similar changes are required in the dependent 'vfio-ioctls' crate (that uses the vm-memory crate). We are also using our own fork of the vfio-ioctls crate before the upstream is updated [3]. The above changes should be transparent to the rest of the code base, e.g. all unit/integration tests should pass without additional changes. [1] rust-vmm/vm-memory#125 [2] https://github.com/cloud-hypervisor/vm-memory/tree/ch [3] https://github.com/cloud-hypervisor/vfio-ioctls/tree/ch Signed-off-by: Bo Chen <chen.bo@intel.com>
As the first step to complete live-migration with tracking dirty-pages written by the VMM, this patch patches the dependent vm-memory crate to the development branch with dirty-page-tracking capability [1]. For now, we are using our own fork of the vm-memory crate [2] before the feature is landed in the upstream. Most changes are due to the updated `GuestMemoryMmap`, `GuestRegionMmap`, and `MmapRegion` structs which are taking an additional generic type parameter (to specify what 'bitmap backend' is used). For the same reason, similar changes are required in the dependent 'vfio-ioctls' crate (that uses the vm-memory crate). We are also using our own fork of the vfio-ioctls crate before the upstream is updated [3]. The above changes should be transparent to the rest of the code base, e.g. all unit/integration tests should pass without additional changes. [1] rust-vmm/vm-memory#125 [2] https://github.com/cloud-hypervisor/vm-memory/tree/ch [3] https://github.com/cloud-hypervisor/vfio-ioctls/tree/ch Signed-off-by: Bo Chen <chen.bo@intel.com>
Things like "GuestMemoryMmapBase" may be a burden for maintenance. |
As the first step to complete live-migration with tracking dirty-pages written by the VMM, this patch patches the dependent vm-memory crate to the development branch with dirty-page-tracking capability [1]. For now, we are using our own fork of the vm-memory crate [2] before the feature is landed in the upstream. Most changes are due to the updated `GuestMemoryMmap`, `GuestRegionMmap`, and `MmapRegion` structs which are taking an additional generic type parameter (to specify what 'bitmap backend' is used). For the same reason, similar changes are required in the dependent 'vfio-ioctls' crate (that uses the vm-memory crate). We are also using our own fork of the vfio-ioctls crate before the upstream is updated [3]. The above changes should be transparent to the rest of the code base, e.g. all unit/integration tests should pass without additional changes. [1] rust-vmm/vm-memory#125 [2] https://github.com/cloud-hypervisor/vm-memory/tree/ch [3] https://github.com/cloud-hypervisor/vfio-ioctls/tree/ch Signed-off-by: Bo Chen <chen.bo@intel.com>
As the first step to complete live-migration with tracking dirty-pages written by the VMM, this patch patches the dependent vm-memory crate to the development branch with dirty-page-tracking capability [1]. For now, we are using our own fork of the vm-memory crate [2] before the feature is landed in the upstream. Most changes are due to the updated `GuestMemoryMmap`, `GuestRegionMmap`, and `MmapRegion` structs which are taking an additional generic type parameter (to specify what 'bitmap backend' is used). For the same reason, similar changes are required in the dependent 'vfio-ioctls' crate (that uses the vm-memory crate). We are also using our own fork of the vfio-ioctls crate before the upstream is updated [3]. The above changes should be transparent to the rest of the code base, e.g. all unit/integration tests should pass without additional changes. [1] rust-vmm/vm-memory#125 [2] https://github.com/cloud-hypervisor/vm-memory/tree/ch [3] https://github.com/cloud-hypervisor/vfio-ioctls/tree/ch Signed-off-by: Bo Chen <chen.bo@intel.com>
Signed-off-by: Alexandru Agache <aagch@amazon.com>
Signed-off-by: Alexandru Agache <aagch@amazon.com>
Signed-off-by: Alexandru Agache <aagch@amazon.com>
Signed-off-by: Alexandru Agache <aagch@amazon.com>
Signed-off-by: Alexandru Agache <aagch@amazon.com>
Signed-off-by: Alexandru Agache <aagch@amazon.com>
Signed-off-by: Alexandru Agache <aagch@amazon.com>
Signed-off-by: Alexandru Agache <aagch@amazon.com>
You're right; I experimented a bit more and looks like the default generic parameter works automatically in most cases. The only places where something needs to be explicitly specified are free standing statements (i.e. like the examples in the doc comments). Whenever there's context that already specifies (for example) the Right now, the two suggestions about improving the |
I'm OK with that because it's just an optimization:) |
As the first step to complete live-migration with tracking dirty-pages written by the VMM, this patch patches the dependent vm-memory crate to the development branch with dirty-page-tracking capability [1]. For now, we are using our own fork of the vm-memory crate [2] before the feature is landed in the upstream. Most changes are due to the updated `GuestMemoryMmap`, `GuestRegionMmap`, and `MmapRegion` structs which are taking an additional generic type parameter (to specify what 'bitmap backend' is used). For the same reason, similar changes are required in the dependent 'vfio-ioctls' crate (that uses the vm-memory crate). We are also using our own fork of the vfio-ioctls crate before the upstream is updated [3]. The above changes should be transparent to the rest of the code base, e.g. all unit/integration tests should pass without additional changes. [1] rust-vmm/vm-memory#125 [2] https://github.com/cloud-hypervisor/vm-memory/tree/ch [3] https://github.com/cloud-hypervisor/vfio-ioctls/tree/ch Signed-off-by: Bo Chen <chen.bo@intel.com>
As the first step to complete live-migration with tracking dirty-pages written by the VMM, this patch patches the dependent vm-memory crate to the development branch with dirty-page-tracking capability [1]. For now, we are using our own fork of the vm-memory crate [2] before the feature is landed in the upstream. Most changes are due to the updated `GuestMemoryMmap`, `GuestRegionMmap`, and `MmapRegion` structs which are taking an additional generic type parameter (to specify what 'bitmap backend' is used). For the same reason, similar changes are required in the dependent 'vfio-ioctls' crate (that uses the vm-memory crate). We are also using our own fork of the vfio-ioctls crate before the upstream is updated [3]. The above changes should be transparent to the rest of the code base, e.g. all unit/integration tests should pass without additional changes. [1] rust-vmm/vm-memory#125 [2] https://github.com/cloud-hypervisor/vm-memory/tree/ch [3] https://github.com/cloud-hypervisor/vfio-ioctls/tree/ch Signed-off-by: Bo Chen <chen.bo@intel.com>
} | ||
}; | ||
|
||
region.bitmap().mark_dirty(start, len); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use result
instead of len
?
This PR contains an initial proposal to tackle the problem of dirty page/region tracking within a
GuestMemory
object, which happens at the level of each individual region here. We add theBitmap
interface, theBitmapSlice
abstraction (which is passed to objects such asVolatileSlice
s to enable tracking local accesses in the context of the outerGuestMemoryRegion
), and begin to add the necessary invocations of tracking functionality around write accesses. Here are some details and open questions:The PR also removes several things (for example, the
VolatileMemory
impls forMmapRegion
and&mut [u8]
). I did a quick investigation and it looked like these implementations where not specifically used outside the crate, and I think this also simplifies things a bit. They can be added back and made compatible with the current changes if necessary.An open question is what to do (w.r.t. tracking) with methods that return mutable slices or references (the bigger question is whether we should actually have them at all, but that's outside of the current scope :D). My preference is to add the lack of tracking to the safety contract of methods like
as_mut_slice
(i.e. the caller must manually mark the areas that get written to), and to remove (either completely or just from the public interface) methods such asget_atomic_ref
andaligned_as_mut
fromVolatileMemory
. One alternative is to treat them likeas_mut_slice
, but this kinda means we're increasing the number of moving parts that have to be taken into account, and most of their use cases are already covered by theBytes
interface (there's justload
andstore
for atomic ops right now, but we can add the rest as well if required). WDYT?A potentially significant (and breaking) change is the additional generic type parameter for
Volatile{Slice,Ref,ArrayRef}
. Its purpose is turning this into a zero-cost abstraction (as far as accesses are concerned), and to allow different tracking implementations to be used. The change can become unwieldy if there are a lot of fields/variable definitions that explicitly mention the type, and one way around this is to create a type alias instead of replacing everything.GuestMemoryMmap
andGuestRegionMmap
also get an additional type parameter which represents the bitmap implementation in use.Looking forward to any comments! If this makes sense, I'll start adding the scaffolding required to test things for generic
Bytes
implementations as well as specific mutable accesses that are enabled by the variousvm-memory
primitives.