Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf: optimize reading transactions in commit loop #3117

Merged
merged 10 commits into from
Nov 21, 2024

Conversation

wjones127
Copy link
Contributor

  • Cache transaction files
  • Read transaction files with concurrency within commit loop

Closes #3057

@codecov-commenter
Copy link

codecov-commenter commented Nov 11, 2024

Codecov Report

Attention: Patch coverage is 78.35294% with 92 lines in your changes missing coverage. Please review.

Project coverage is 77.87%. Comparing base (566082f) to head (4203b71).
Report is 3 commits behind head on main.

Files with missing lines Patch % Lines
rust/lance/src/io/commit.rs 80.46% 12 Missing and 13 partials ⚠️
rust/lance/src/utils/test.rs 25.00% 21 Missing ⚠️
rust/lance/src/dataset.rs 63.26% 14 Missing and 4 partials ⚠️
rust/lance/src/dataset/transaction.rs 20.00% 6 Missing and 2 partials ⚠️
rust/lance-table/src/format/fragment.rs 0.00% 2 Missing and 4 partials ⚠️
rust/lance/src/dataset/write/commit.rs 96.55% 0 Missing and 5 partials ⚠️
rust/lance/src/index.rs 57.14% 2 Missing and 1 partial ⚠️
rust/lance-table/src/io/commit.rs 71.42% 1 Missing and 1 partial ⚠️
rust/lance-io/src/object_store.rs 94.11% 0 Missing and 1 partial ⚠️
...ust/lance-table/src/io/commit/external_manifest.rs 91.66% 0 Missing and 1 partial ⚠️
... and 2 more
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3117      +/-   ##
==========================================
- Coverage   77.92%   77.87%   -0.06%     
==========================================
  Files         242      242              
  Lines       81738    82015     +277     
  Branches    81738    82015     +277     
==========================================
+ Hits        63695    63869     +174     
- Misses      14861    14940      +79     
- Partials     3182     3206      +24     
Flag Coverage Δ
unittests 77.87% <78.35%> (-0.06%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.


🚨 Try these New Features:

@wjones127 wjones127 force-pushed the perf/reading-transactions branch from 063cb88 to 5937da9 Compare November 19, 2024 19:48
@wjones127 wjones127 marked this pull request as ready for review November 20, 2024 21:37
Copy link
Contributor

@westonpace westonpace left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just minor thoughts (though it looks like CI is failing)

@@ -114,6 +114,7 @@ impl FileMetadataCache {

pub fn size(&self) -> usize {
if let Some(cache) = self.cache.as_ref() {
cache.sync();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this mainly for testing purposes? Or are there non-testing reasons we need this to be accurate? I don't think it's a problem as we shouldn't be calling size in a loop so seems fine, just curious.

// If read_version is zero, then it might not have originally been
// passed. We can assume the latest version.
if transaction.read_version > 0 {
builder = builder.with_version(transaction.read_version)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we not need to pass in the read_version anymore?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think, we only need to load the read_version if it's a detached commit. I've fixed this in the latest commits.

rust/lance/src/io/commit.rs Outdated Show resolved Hide resolved
rust/lance/src/io/commit.rs Outdated Show resolved Hide resolved
Comment on lines 810 to 818
.and_then(|(version, other_transaction)| {
let res = check_transaction(
transaction,
version,
Some(other_transaction.as_ref()),
);
futures::future::ready(res)
})
.try_all(|_| futures::future::ready(true))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you make the and_then a try_for_each? Then you don't need the try_all?

Comment on lines +417 to +441
#[derive(Debug)]
struct IoTrackingMultipartUpload {
target: Box<dyn MultipartUpload>,
stats: Arc<Mutex<IoStats>>,
}

#[async_trait::async_trait]
impl MultipartUpload for IoTrackingMultipartUpload {
async fn abort(&mut self) -> OSResult<()> {
self.target.abort().await
}

async fn complete(&mut self) -> OSResult<PutResult> {
self.target.complete().await
}

fn put_part(&mut self, payload: PutPayload) -> UploadPart {
{
let mut stats = self.stats.lock().unwrap();
stats.write_iops += 1;
stats.write_bytes += payload.content_length() as u64;
}
self.target.put_part(payload)
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yay

@wjones127 wjones127 merged commit 8069936 into lancedb:main Nov 21, 2024
26 checks passed
@wjones127 wjones127 deleted the perf/reading-transactions branch November 21, 2024 16:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Optimize IO scheduling for conflict resolution
3 participants