Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

make FileContainer iteration more dict-like #108

Closed
mathause opened this issue Oct 15, 2024 · 6 comments · Fixed by #128
Closed

make FileContainer iteration more dict-like #108

mathause opened this issue Oct 15, 2024 · 6 comments · Fixed by #128

Comments

@mathause
Copy link
Collaborator

Currently we iterate FileContainer as

for filename, meta in fc:
    pass

https://github.com/mathause/filefinder/blob/aef8d48356b814e70bfbcd6ffec8025949d82d3b/filefinder/_filefinder.py#L552

Often the filename is not needed but .create_filename(**meta) is used, so we should make this more dict-like and add .keys(), .values(), and .items() to FileContainer to allow iteration over whatever is needed. Now the difficult question: which should be which? My initial feeling: meta -> keys, filename -> values. But I am not 100% sure.

NOTE: there is no good way to deprecate __iter__.

@veni-vidi-vici-dormivi

@veni-vidi-vici-dormivi
Copy link
Collaborator

Really? Intuitively I would have said the other way around? meta -> values and filename -> keys just because filename is one string and meta is several... values? also the order would fit right now for .items(), no?

for filename, meta in fc.items():
   pass

@mathause
Copy link
Collaborator Author

Yes you are probably right. I think that comes from datalist where it's the other way round 😕 (where data is certainly the values)

@veni-vidi-vici-dormivi
Copy link
Collaborator

Which datalist?

@mathause
Copy link
Collaborator Author

DataList - my DataTree alternative

@mathause
Copy link
Collaborator Author

I started implementing this and I am no longer sure we should use keys and values. We should probably add @property for paths and meta (or a better name?) and have the users loop over that. On the other hand I think items is good. See examples below.

We should probably deprecate __iter__ and remove it sometimes later (there is no good deprecation path from it's current form to only return path).


  1. keys
    for path in fc.keys():
        print(path)
    
    for path in fc.iter_paths():
        print(path)
    
    # or we could make it a @property
    for path in fc.paths:
        print(path)
  2. values
    for meta in fc.values():
        print(meta)
    
    for meta in fc.iter_meta():
        print(meta)
    
    # or we could make it a @property
    for meta in fc.meta:
        print(meta)
  3. items
    for path, meta in fc.items():
        print(path, meta)

    @property
    def meta(self) -> list[dict]:
        return self.df.to_dict("records")

    @property
    def paths(self) -> list[str]:
        return fc.df.index.to_list()

    def items(self) -> tuple[str, dict]:
        for path, element in self.df.iterrows():
            yield path, element.to_dict()

    # ===============================

    def __iter__(self):
        warnings.warn(
            "iterating over a FileContainer now only yields the path. To restore"
            " the old behaviour and/ or supress this warning loop over `.items()`"
            " `.keys()` or `.values()`."
            )

        yield from self.index

    def keys(self):
        """iterate over the paths"""
        yield from self.index

    def values(self):
        for __, element in self.df.iterrows():
            yield element.to_dict()

@veni-vidi-vici-dormivi
Copy link
Collaborator

Hm I like the properties and the .items() yielding path and meta.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants