This repository has been archived by the owner on Jul 31, 2023. It is now read-only.

Add guard for non-image files in image directory input #32

Open
cfezequiel opened this issue Oct 1, 2020 · 4 comments
Labels
enhancement New feature or request good first issue Good for newcomers

Comments

@cfezequiel
Contributor

It would be good to add some check in case there are non-image files in an image directory.

Describe the solution you'd like
A simple filter would suffice, e.g.

If not image file: 
    skip
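One way to read the pseudocode above is as an extension-based filter. The sketch below is only an illustration under that assumption, not the code in client._read_image_directory; the names `IMAGE_EXTENSIONS`, `is_image_file`, and `filter_image_files` are hypothetical, and the extension list is a guess at what the tool might accept.

```python
import os

# Assumed set of accepted extensions (hypothetical); adjust to the formats the tool supports.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".bmp"}


def is_image_file(path):
    """Return True if the path has a recognized image extension (case-insensitive)."""
    _, ext = os.path.splitext(path)
    return ext.lower() in IMAGE_EXTENSIONS


def filter_image_files(paths):
    """Keep only paths that look like images and skip everything else."""
    return [p for p in paths if is_image_file(p)]
```

Checking the extension is cheap but does not prove the file decodes as an image; a stricter guard would need to open each file, at a higher cost per file.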

Describe alternatives you've considered
A: Do nothing. The tool could then fail while processing data, which would waste the user's time.
B: Filter at the DataFrame level. This is less attractive, since it is best not to propagate errors downstream.

Additional context
See client._read_image_directory.

@cfezequiel cfezequiel added enhancement New feature or request good first issue Good for newcomers labels Oct 1, 2020
@lc0
Contributor

lc0 commented Oct 22, 2020

@cfezequiel I wonder if it makes sense to switch to tf.io.gfile.glob. This way we could provide a pattern for images, like

'dataset-folder/*/**/*.jpg'

This change would also simplify the code quite a bit.
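A rough sketch of the pattern-based approach follows. It uses the standard library's `glob` as a stand-in so that it runs without TensorFlow; tf.io.gfile.glob likewise takes a pattern and returns the matching paths, but whether it treats `**` the same way is not verified here. The function name `list_images` is hypothetical.

```python
import glob
import os


def list_images(pattern):
    """Return sorted file paths matching a glob pattern such as 'dataset-folder/*/**/*.jpg'."""
    # recursive=True lets '**' match any number of nested directories (including none).
    matches = glob.glob(pattern, recursive=True)
    # A pattern can also match directories, so keep regular files only.
    return sorted(p for p in matches if os.path.isfile(p))
```

With this shape, the pattern itself acts as the non-image guard: files such as notes or metadata simply never match `*.jpg`.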

@lc0
Contributor

lc0 commented Oct 22, 2020

Additionally, it would be beneficial to process filenames more robustly. I had a dataset collected from the internet whose filenames contained a comma (,) and could potentially have contained a double quote (").

For now I have fixed it locally, but it could be addressed in the library itself.

@lc0
Contributor

lc0 commented Oct 29, 2020

another ping @cfezequiel @mbernico

@cfezequiel cfezequiel pinned this issue Nov 10, 2020
@cfezequiel
Contributor Author

Hi @lc0, thanks for the feedback, and apologies for the delayed response. It seems I wasn't getting any notifications for non-PR comments in this repo. That's an interesting idea. I can see how it could simplify parsing, since the image files would all be in one list. I think it would put a bit more burden on the user to specify a glob instead of just the directory path, but that shouldn't be a big deal. We were also thinking of possibly supporting other directory structures (e.g. label/image only) for flexibility.

Regarding filename processing, could you elaborate on the problem a bit more and the solution that you came up with? Feel free to send a PR btw and I'll be happy to review it.

@cfezequiel cfezequiel unpinned this issue Nov 12, 2020