perf: optimize reading transactions in commit loop #3117
Codecov Report
Attention: Patch coverage is
Additional details and impacted files

@@            Coverage Diff             @@
##             main    #3117      +/-   ##
==========================================
- Coverage   77.92%   77.87%   -0.06%
==========================================
  Files         242      242
  Lines       81738    82015     +277
  Branches    81738    82015     +277
==========================================
+ Hits        63695    63869     +174
- Misses      14861    14940      +79
- Partials     3182     3206      +24
Just minor thoughts (though it looks like CI is failing)
@@ -114,6 +114,7 @@ impl FileMetadataCache {

    pub fn size(&self) -> usize {
        if let Some(cache) = self.cache.as_ref() {
            cache.sync();
Is this mainly for testing purposes? Or are there non-testing reasons we need this to be accurate? I don't think it's a problem, since we shouldn't be calling `size` in a loop, so it seems fine. Just curious.
    // If read_version is zero, then it might not have originally been
    // passed. We can assume the latest version.
    if transaction.read_version > 0 {
        builder = builder.with_version(transaction.read_version)
Do we not need to pass in the read_version anymore?
I think we only need to load the `read_version` if it's a detached commit. I've fixed this in the latest commits.
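To make the rule in this reply concrete, here is a minimal sketch of the decision, not the code from the PR. The `Transaction` struct, the `is_detached` flag, and the `version_to_load` helper are hypothetical names introduced for illustration. The sketch also assumes the earlier "zero means not passed" convention still applies.

```rust
// Hypothetical stand-in for the transaction record; only the field
// that appears in the diff above is modeled.
struct Transaction {
    read_version: u64,
}

/// Returns the version to pin when loading the dataset, or `None` to
/// mean "use the latest version".
///
/// Assumption: a version is pinned only for detached commits, and a
/// `read_version` of zero still means "not originally passed".
fn version_to_load(transaction: &Transaction, is_detached: bool) -> Option<u64> {
    if is_detached && transaction.read_version > 0 {
        Some(transaction.read_version)
    } else {
        None
    }
}

fn main() {
    let txn = Transaction { read_version: 5 };
    // Detached commit: pin the version the transaction was read at.
    println!("{:?}", version_to_load(&txn, true));
    // Regular commit: fall through to the latest version.
    println!("{:?}", version_to_load(&txn, false));
}
```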
rust/lance/src/io/commit.rs
Outdated
    .and_then(|(version, other_transaction)| {
        let res = check_transaction(
            transaction,
            version,
            Some(other_transaction.as_ref()),
        );
        futures::future::ready(res)
    })
    .try_all(|_| futures::future::ready(true))
Can you make the `and_then` a `try_for_each`? Then you don't need the `try_all`.
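The shape of this suggestion can be sketched with the standard library's `Iterator::try_for_each`, which is a synchronous analogue of the stream combinator the reviewer presumably means (`TryStreamExt::try_for_each` in the `futures` crate). The `Transaction` and `Conflict` types and the body of `check_transaction` below are hypothetical stand-ins, not the lance implementation. The point is only that one fallible pass replaces the "map to a result, then drain" pair.

```rust
// Hypothetical stand-ins for illustration; not the real lance types.
#[derive(Debug, PartialEq)]
struct Conflict(u64);

struct Transaction {
    touches: u32,
}

// Hypothetical conflict check: fails when both transactions touch the
// same id, and reports the version at which the conflict was found.
fn check_transaction(
    mine: &Transaction,
    version: u64,
    other: Option<&Transaction>,
) -> Result<(), Conflict> {
    match other {
        Some(o) if o.touches == mine.touches => Err(Conflict(version)),
        _ => Ok(()),
    }
}

// A single pass: `try_for_each` stops at the first `Err` and returns
// `Ok(())` if every check passes, so no separate draining step is needed.
fn check_all(mine: &Transaction, others: &[(u64, Transaction)]) -> Result<(), Conflict> {
    others
        .iter()
        .try_for_each(|(version, other)| check_transaction(mine, *version, Some(other)))
}

fn main() {
    let mine = Transaction { touches: 1 };
    let compatible = vec![(2, Transaction { touches: 7 }), (3, Transaction { touches: 9 })];
    let conflicting = vec![(2, Transaction { touches: 7 }), (3, Transaction { touches: 1 })];
    println!("{:?}", check_all(&mine, &compatible));
    println!("{:?}", check_all(&mine, &conflicting));
}
```

In the stream version the closure would return a future (for example `futures::future::ready(res)`, as in the diff above) instead of a bare `Result`.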
    #[derive(Debug)]
    struct IoTrackingMultipartUpload {
        target: Box<dyn MultipartUpload>,
        stats: Arc<Mutex<IoStats>>,
    }

    #[async_trait::async_trait]
    impl MultipartUpload for IoTrackingMultipartUpload {
        async fn abort(&mut self) -> OSResult<()> {
            self.target.abort().await
        }

        async fn complete(&mut self) -> OSResult<PutResult> {
            self.target.complete().await
        }

        fn put_part(&mut self, payload: PutPayload) -> UploadPart {
            {
                let mut stats = self.stats.lock().unwrap();
                stats.write_iops += 1;
                stats.write_bytes += payload.content_length() as u64;
            }
            self.target.put_part(payload)
        }
    }
Yay
Closes #3057