dvc add a file from data directory #3218

Closed
dmpetrov opened this issue Jan 22, 2020 · 4 comments · Fixed by #3296
Labels: feature request, research

Comments

@dmpetrov
Member

From a discussion with users: https://opendatascience.slack.com/archives/CGGLZJ119/p1579704007005600 (you need access to this community)

$ mkdir datadir
$ cp ~/whatever/* datadir/
$ dvc add datadir/
WARNING: Output 'datadir' of 'datadir.dvc' changed because it is 'modified'
To track the changes with git, run:

          git add datadir.dvc
$ cp ~/Downloads/newfile.csv datadir/jan2020.csv
$ dvc add datadir/jan2020.csv
ERROR: Paths for outs:                  # <-- error is terrible btw (not related to this issue).
'datadir'('datadir.dvc')
'datadir/file4'('datadir/file4.dvc')
overlap. To avoid unpredictable behaviour, rerun command with non overlapping outs paths.

The last command fails because the file is inside a data directory, and you are supposed to update the entire directory by running dvc add on it again. However, some users' intuition is to add the single file.

Ideally, this should work:

$ dvc add datadir/jan2020.csv
'jan2020' was added to dir 'datadir' and 'datadir.dvc' changed because it is 'modified'
100% Add|██████████████████████████████████|1.00/1.00 [00:00<00:00,  2.76file/s]

To track the changes with git, run:

	git add datadir.dvc
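
Until something like this exists in dvc itself, the intended workflow (re-adding the whole directory) can be approximated with a small wrapper. The following is only a sketch: `find_tracked_parent` and `dvc_add_smart` are hypothetical helper names, not DVC commands, and the sketch assumes that a tracked directory `datadir` has its `datadir.dvc` file sitting next to it, as in the transcript above.

```bash
#!/bin/bash

# Hypothetical helper (not part of DVC): print the nearest ancestor directory
# of "$1" that has a sibling "<dir>.dvc" file; return 1 if there is none.
# Assumption: the .dvc file lives next to the tracked directory.
find_tracked_parent() {
    local dir
    dir=$(dirname -- "$1")
    while [ "$dir" != "." ] && [ "$dir" != "/" ]; do
        if [ -f "$dir.dvc" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
        dir=$(dirname -- "$dir")
    done
    return 1
}

# Hypothetical wrapper: if the target is inside a tracked directory,
# re-add that directory; otherwise add the target itself.
dvc_add_smart() {
    local parent
    if parent=$(find_tracked_parent "$1"); then
        echo "'$1' is inside tracked directory '$parent'; running: dvc add $parent"
        dvc add "$parent"
    else
        dvc add "$1"
    fi
}
```

After sourcing these functions, `dvc_add_smart datadir/jan2020.csv` would run `dvc add datadir` and tell the user so, which keeps the behavior explicit rather than silent.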
triage-new-issues bot added the triage label Jan 22, 2020
dmpetrov added the feature request label Jan 22, 2020
triage-new-issues bot removed the triage label Jan 22, 2020
@efiop
Contributor

efiop commented Jan 23, 2020

I'm not sure about this implicit behavior. A user might not even notice that we did that trick and then be surprised that their file doesn't have its own dvc file. This is the only case where we've seen a report like that, so I would rather wait for someone else to report it, to understand whether this is a one-off confusion of a particular user or something recurring.

@efiop
Contributor

efiop commented Jan 23, 2020

From a conversation with @dmpetrov, it is clear that he has seen other occurrences of this issue, and it would be great to at least improve the error message.

@pared
Contributor

pared commented Feb 10, 2020

Reproduction script:

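#!/bin/bash

# Start from a clean repository directory.
rm -rf repo
mkdir repo

# Print each command and stop at the first failure.
set -ex

pushd repo

git init --quiet && dvc init -q

# Track a directory that contains one file.
mkdir data
echo 1 >> data/1
dvc add data

# Add a second file inside the tracked directory and try to add it on its own.
# This last command is expected to fail with the overlapping outs error.
echo 2 >> data/2
dvc add data/2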

@pared
Contributor

pared commented Feb 10, 2020

@efiop @dmpetrov I created a pull request that suggests to the user how this situation should be handled.
Do you want to continue the work and make dvc autocommit the new file?
