Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LocalFileSystem list operation returns objects in wrong order #6375

Closed
paraseba opened this issue Sep 9, 2024 · 5 comments
Closed

LocalFileSystem list operation returns objects in wrong order #6375

paraseba opened this issue Sep 9, 2024 · 5 comments
Labels
question Further information is requested

Comments

@paraseba
Copy link
Contributor

paraseba commented Sep 9, 2024

Describe the bug
The LocalFileSystem list implementation returns objects in what looks like arbitrary order.

S3's ListObjectsV2 returns objects in lexicographical order of their keys:

General purpose bucket - For general purpose buckets, ListObjectsV2 returns objects in lexicographical order based on their key names.

To Reproduce
The following code fails the assertion:

use futures::TryStreamExt;
use object_store::{local::LocalFileSystem, ObjectStore, PutPayload};
use std::assert_eq;
use std::fs::create_dir_all;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let dir = "/tmp/reproducer";
    create_dir_all(dir)?;

    let store = LocalFileSystem::new_with_prefix(dir)?;
    for i in 0..=9 {
        store.put(&(i.to_string()).into(), PutPayload::from_static(b"42")).await?;
    }

    let actual = store
        .list(None)
        .map_ok(|om| om.location.to_string())
        .try_collect::<Vec<_>>()
        .await?;

    let expected = (0..=9).map(|s| s.to_string()).collect::<Vec<_>>();
    assert_eq!(actual, expected);
    Ok(())
}

Expected behavior
LocalFileSystem should mimic what ListObjectsV2 does, returning objects in lexicographical order of their keys.

Additional context
InMemory doesn't fail this test.

@paraseba paraseba added the bug label Sep 9, 2024
@Xuanwo
Copy link
Member

Xuanwo commented Sep 9, 2024

LocalFileSystem should mimic what ListObjectsV2 does, returning objects in lexicographical order of their keys.

Hi, this behavior is expected. We can't perform sorting over the LocalFileSystem since we don't have such an API, and we can't collect keys and sort them in-memory. The best we can do is return objects in the order that the filesystem provides.

@paraseba
Copy link
Contributor Author

paraseba commented Sep 9, 2024

@Xuanwo I see. Would it make sense to have sorting behavior as optional? Intended only for small stores that fin in memory? walkdir library has the option to do that. Otherwise, is the listing behavior documented somewhere? I'm not sure what level of "compatibility" the different stores implementations try to achieve.

@alamb
Copy link
Contributor

alamb commented Sep 9, 2024

Another potential option is to implement your own wrapper over LocalFileSystem that sorted the results by file name

like

struct SortedLocalFileSystem {
  inner: LocalFileSystem 
}

impl ObjectStore for SortedLocalFileSystem {
  // in the implementation you could intercept the various calls and implement 
  // whatever behavior you wanted
}

@Xuanwo
Copy link
Member

Xuanwo commented Sep 9, 2024

Would it make sense to have sorting behavior as optional?

Sorry, but it doesn't make sense to me now. If you care about the order, you can implement something like what @alamb mentioned, which should be simple.

Intended only for small stores that fin in memory?

The most interesting part is this: we don't know how many items there are until we list them all. This inconsistent behavior based on item size might introduce more issues. For example, users might find the object_store behaves one way when there are only 10 items, but differently in production services.

I'm not sure what level of "compatibility" the different stores implementations try to achieve.

/// List all the objects with the given prefix.
///
/// Prefixes are evaluated on a path segment basis, i.e. `foo/bar/` is a prefix of `foo/bar/x` but not of
/// `foo/bar_baz/x`. List is recursive, i.e. `foo/bar/more/x` will be included.
///
/// Note: the order of returned [`ObjectMeta`] is not guaranteed

No guarantee has been provided so far.

@paraseba
Copy link
Contributor Author

paraseba commented Sep 9, 2024

Thanks for the quick responses. Closing this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Further information is requested
Projects
None yet
Development

No branches or pull requests

4 participants