Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Limit image referencing possibilities in imported TSVs #806

Open
grololo06 opened this issue May 6, 2022 · 2 comments
Open

Limit image referencing possibilities in imported TSVs #806

grololo06 opened this issue May 6, 2022 · 2 comments
Assignees
Labels
feature New functionality page-import Everything related to import functionality to clarify

Comments

@grololo06
Copy link
Member

As of today and since a few years, probably due to the addition of UVP6 file format, column img_file_name can contain nearly any valid path and allow to escape 'current ' directory:

https://github.com/ecotaxa/ecotaxa/blob/c84dfdffaa84aa759b0f6de9c14047aa5fca0452/appli/tasks/taskimport.py#L224

Demo of the behavior of "/" operator used above:

>>> from pathlib import Path
>>> p = Path("/home/laurent")
>>> p/"Devs"
PosixPath('/home/laurent/Devs')
>>> p/"/tmp"
PosixPath('/tmp')
>>> p/".."
PosixPath('/home/laurent/..')

The behavior has been ported identically to the new back-end development, but with a bit more comments:

https://github.com/ecotaxa/ecotaxa_back/blob/073997062f01cb5650558e9d4f6f1e7a32e313c1/py/BO/TSVFile.py#L232

It is (now) obvious that the original code was a mistake in that:

  • it did not check that img_file_name was relative.
  • it did not restrict the possibility of accessing sub-directories to the UVP6 format.
@grololo06 grololo06 added feature New functionality page-import Everything related to import functionality labels May 6, 2022
@grololo06
Copy link
Member Author

I'm putting the issue in 'to clarify' as I'm unsure on how much strictness we should move to.

@jiho
Copy link
Contributor

jiho commented May 6, 2022

It should be possible to place images in subdirectories for any kind of instrument but those subdirectories should always be relative to the .tsv file and below it.

So this is valid

my_data/
  toto.tsv
  img1.jpg
  img2.jpg
my_data/
  toto.tsv
  imgs/img1.jpg
  imgs/img2.jpg
my_data/
  toto.tsv
  imgs/taxonA/img1.jpg
  imgs/taxonB/img2.jpg

But this not valid

my_data/
  toto.tsv
img1.jpg
img2.jpg
/home/me/my_data/
              toto.tsv
/archive/img1.jpg
/archive/img2.jpg

The rationale for authorising images in sub-directories is that, for large datasets, one usually likes to have all tsv files at the root and images in subdirectories to be able to quickly access all .tsv without fishing for them inside folders of thousands of images. Also, having images sorted in per taxon folders is quite common for archived datasets and allowing to create a .tsv file to reference them and import them is convenient.

We don't want to authorise images outside of the root folder because some checks become impossible (#286) and this opens the way to importing unwanted things. It also makes things messy when unzipping zip archives.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature New functionality page-import Everything related to import functionality to clarify
Projects
None yet
Development

No branches or pull requests

5 participants