Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make InMemory object store track last modified time for each entry #3796

Merged
merged 2 commits into from
Mar 4, 2023

Conversation

Weijun-H
Copy link
Member

@Weijun-H Weijun-H commented Mar 2, 2023

Which issue does this PR close?

Closes #3782

Rationale for this change

Explain in #3782

What changes are included in this PR?

In struct InMemoryUpload, it can store btye with timestamp

Are there any user-facing changes?

@github-actions github-actions bot added the object-store Object Store Interface label Mar 2, 2023
@Weijun-H Weijun-H changed the title refactor: allow InMemoryUpload to store timestamp Make InMemory object store track last modified time for each entry Mar 2, 2023
@Weijun-H Weijun-H force-pushed the refactor-InMemory branch from 4ebc9bb to 7fd6fb1 Compare March 2, 2023 21:36
Copy link
Contributor

@gruuya gruuya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd argue that the copy/copy_if_not_exists methods should also alter the last modified timestamps if we're going to emulate local FS behavior:

$ touch test
$ date -r test
Fri Mar  3 09:32:25 CET 2023
$ cp test test_cp
$ date -r test_cp
Fri Mar  3 09:33:50 CET 2023

That said, rename also uses copy (and likewise for rename_if_not_exists), but it probably shouldn't affect the timestamp:

$ mv test test_mv
$ date -r test_mv
Fri Mar  3 09:32:25 CET 2023

Copy link
Contributor

@tustvold tustvold left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should be returning this timestamp as the last_modified when we return ObjectMeta, and possibly verify that we are setting it consistently in all the mutating methods (such as copy)

@@ -273,22 +278,23 @@ impl InMemory {
}
}

async fn get_bytes(&self, location: &Path) -> Result<Bytes> {
async fn get_bytes(&self, location: &Path) -> Result<(Bytes, DateTime<Utc>)> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
async fn get_bytes(&self, location: &Path) -> Result<(Bytes, DateTime<Utc>)> {
async fn entry(&self, location: &Path) -> Result<(Bytes, DateTime<Utc>)> {

@@ -33,6 +33,8 @@ use std::sync::Arc;
use std::task::Poll;
use tokio::io::AsyncWrite;

type StorageType = Arc<RwLock<BTreeMap<Path, (Bytes, DateTime<Utc>)>>>;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
type StorageType = Arc<RwLock<BTreeMap<Path, (Bytes, DateTime<Utc>)>>>;
type Entry = (Bytes, DateTime<Utc>);
type StorageType = Arc<RwLock<BTreeMap<Path, Entry>>>;

@@ -150,7 +155,7 @@ impl ObjectStore for InMemory {
Ok(ObjectMeta {
location: location.clone(),
last_modified,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should return the timestamp from the entry?

@@ -181,7 +186,7 @@ impl ObjectStore for InMemory {
Ok(ObjectMeta {
location: key.clone(),
last_modified,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be the value from the entry?

@@ -225,7 +230,7 @@ impl ObjectStore for InMemory {
let object = ObjectMeta {
location: k.clone(),
last_modified,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto here

@Weijun-H Weijun-H force-pushed the refactor-InMemory branch from 7fd6fb1 to c11ec79 Compare March 3, 2023 11:06
@@ -273,22 +275,23 @@ impl InMemory {
}
}

async fn get_bytes(&self, location: &Path) -> Result<Bytes> {
async fn enty(&self, location: &Path) -> Result<(Bytes, DateTime<Utc>)> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
async fn enty(&self, location: &Path) -> Result<(Bytes, DateTime<Utc>)> {
async fn entry(&self, location: &Path) -> Result<(Bytes, DateTime<Utc>)> {

@Weijun-H Weijun-H force-pushed the refactor-InMemory branch from c11ec79 to 697a504 Compare March 3, 2023 12:45
@@ -238,13 +240,13 @@ impl ObjectStore for InMemory {
}

async fn copy(&self, from: &Path, to: &Path) -> Result<()> {
let data = self.get_bytes(from).await?;
let data = self.entry(from).await?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should probably generate a new last modified timestamp

Copy link
Member Author

@Weijun-H Weijun-H Mar 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

like this self.storage.write().insert(to.clone(), (data.0, Utc::now()));?

self.storage.write().insert(to.clone(), data);
Ok(())
}

async fn copy_if_not_exists(&self, from: &Path, to: &Path) -> Result<()> {
let data = self.get_bytes(from).await?;
let data = self.entry(from).await?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto here

@Weijun-H Weijun-H force-pushed the refactor-InMemory branch from aaaf49a to 3f6a920 Compare March 4, 2023 16:09
@tustvold tustvold merged commit 79518cf into apache:master Mar 4, 2023
@tustvold
Copy link
Contributor

tustvold commented Mar 4, 2023

Thanks again

@ursabot
Copy link

ursabot commented Mar 4, 2023

Benchmark runs are scheduled for baseline = 6cd0917 and contender = 79518cf. 79518cf is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
object-store Object Store Interface
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Make InMemory object store track last modified time for each entry
4 participants