Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ci skip] flag shows up in commit message when it wasn't called #30

Closed
elleobrien opened this issue Mar 25, 2020 · 14 comments
Closed

[ci skip] flag shows up in commit message when it wasn't called #30

elleobrien opened this issue Mar 25, 2020 · 14 comments

Comments

@elleobrien
Copy link
Contributor

I replicated the experiment in the Wiki (GitHub version). When I made my first commit, the commit messages are showing up as dvc repro [ci skip]. I wouldn't expect to see [ci skip] if I didn't include a flag for that, and in fact, I'm sure the CI ran! It might help me avoid confusion if we avoid printing that flag in commit messages except when it is explicitly called by the user.

image

@DavidGOrtega
Copy link
Contributor

Hi Elle, this is because DVC-cml pushes back the update of running your dvc pipeline generating another push event in the CI, hence the use of [ci skip] which is a well known keyword to avoid your CI from running again. Dvc actually won't need it since the second time will not run the pipeline as everything will be up to date, however it still runs the previous steps.

This is how the workflow is:

  • You change your experiment and push your changes.
  • dvc determines if the pipeline has changed executing repro if that happens and leaving if nothing changes.
  • dvc-cml pushes back your new dvc pipeline changes after executing the repro including the commit message [ci-skip] to avoid another run if possible.
  • generates a report as a check or even a release if you allow tag_prefix parameter.

@shcheklein
Copy link
Member

Can it detect and skip it by the author name + commit message?

@DavidGOrtega
Copy link
Contributor

@shcheklein not sure if I totally get it, but that might be not CI/CD standard. Could you please put an example?

Actually Github is not fully supporting [ci skip]
https://github.saobby.my.eu.orgmunity/t5/GitHub-Actions/GitHub-Actions-does-not-respect-skip-ci/td-p/42834
but Gitlab does it as expected and it's very convenient to not start the ci again.

@shcheklein
Copy link
Member

The way I understand the problem (please @andronovhopf correct me if I'm wrong) is that the message does not look nice on those automated commits, it's confusing since not that many people are aware about this convention (I believe it's close to 0), it looks like some noise in the history (similar to the __test__ issue), etc.

The point is that if we can in our action somehow distinguish those w/o relaying on the message we can avoid this special mark and just use regular message.

@DavidGOrtega
Copy link
Contributor

I believe it's close to 0

[ci skip] is very well known in ci/cd 🙂 Its the regular way to do a commit and not run the pipeline.

If not used ci like gitlab will run twice until dvc repro does not have anything to do.

P.D. Interesting paper on ML applied to ci skip that I found in the search

@shcheklein
Copy link
Member

is very well known in ci/cd

I still think you are way over-estimating how well it is known. In 99% of cases you just don't need it. I don't remember when I saw it last time in one of the projects I've dealt with.

If not used ci like gitlab will run twice until dvc repro does not have anything to do.

that's the whole point - we can detect our own commits by other means. And [ci skip] - it'll be up to user to decide for his/her commits to use it or not.

@DavidGOrtega
Copy link
Contributor

DavidGOrtega commented Mar 25, 2020

Indeed we can, but that won't stop the whole pipeline while gitlab or any other vendor wont run at all if the tag if set. Just for your consideration dvc-cml could be just only a step inside a big workflow that could be triggered because of the push.

@shcheklein
Copy link
Member

That's true. And usually I would prefer to CI checks for any commit in my master, for example - automated or not. But you are right, I can imagine cases when it's not necessary. Probably there should an ability to specify the prefix/suffix (and we have it now, right)? But in case I specify an empty one our system should rely on other signal rather than the message alone to avoid the loop.

(not directly related, but might be relevant) Here we come to the bigger point as well - the whole idea of committing something into my branch is far from ideal - like we discussed - a lot of users might be potentially confused by some automated commits (e.g. I try to push but instead have to pull, merge, rebase or whatever - this is advanced stuff). I would brainstorm other possible workflows - similar to restyled? dependa-bot and other tools?

@DavidGOrtega
Copy link
Contributor

DavidGOrtega commented Mar 26, 2020

Probably there should an ability to specify the prefix/suffix (and we have it now, right)?

Maybe the solution would be reuse the last commit message adding skip ci? That way skip ci would be residual and more "digestible"?

I would brainstorm other possible workflows - similar to restyled? dependa-bot and other tools?

I have been experimenting with that, the idea would be a way to automatically amend the commit pulling and rebasing... like if you would have been training on your side before committing.

@shcheklein
Copy link
Member

Maybe the solution would be reuse the last commit message adding skip ci? That way skip ci would be residual and more "digestible"?

not sure I understand this. elaborate please?

I have been experimenting with that, the idea would be a way to automatically amend the commit pulling and rebasing... like if you would have been training on your side before committing.

rebase/amend - all destructive operations that break workflow. Take a look at how restyled operates, it can be just one of the possible ideas.

@DavidGOrtega
Copy link
Contributor

DavidGOrtega commented Mar 26, 2020

Maybe the solution would be reuse the last commit message adding skip ci? That way skip ci would be residual and more "digestible"?
not sure I understand this. elaborate please?

Right now it comments dvc repro [ci skip] maybe a better solution would be your_last_comment \n[ci skip]

That way ci skip should not appear in the UI and still shutdowns the CI run.

Take a look at how restyled operates

I did, but all those tools works in pull requests and not push? Also, the result of dvc repro (changed dvc files) has to be stored somewhere, right? Anyway Im trying them

@shcheklein
Copy link
Member

Right now it comments dvc repro [ci skip] maybe a better solution would be your_last_comment \n[ci skip]

it can create even more confusion? not sure to be honest

has to be stored somewhere, right? Anyway Im trying them

yep, they create a separate PR with their logo, nice message, explanation etc to your branch and it's up to you to merge it or not

@dmpetrov
Copy link
Member

dmpetrov commented Mar 31, 2020

has to be stored somewhere, right? Anyway Im trying them

yep, they create a separate PR with their logo, nice message, explanation etc to your branch and it's up to you to merge it or not

restyled-like tool can definitely improve the workflow. However, there are still a couple of issues:

  1. We need temporary storage (as David said) to keep results when the action is already completed but a user has not pushed the approval/merge button.
  2. It actually does not solve the issue it just mitigates it - you still have to pull (after approving/merging) and the user will see the additional commit. Yes, only once instead of dozens of times but the user still has to be familiar with the complicated workflow.
  3. It will be more complicated to implement a full table of experiments (Metrics of experiments with different tech implementation #34) since Git won't have all the required information. The temporary storage (1) has to be used to build the experiments table (for a given branch for example). The temporary storage adds complexity to the system.

A more holistic approach might be implemented through dvc build-cache/run-cache (iterative/dvc#1234) and pushing all the results to DVC cache without committing it to Git. So, dvc-cache will play the role of temporary storage from the above. Yes, it adds complexity as well, but it solves the problem in a better way without any workflow complications and it also solves an additional optimization problem (the initial motivation of the build cache).

Prioritization

It seems like, run-cache approach solves the problem in the best possible way. The only concern - it requires a significant change in DVC that will take time. Also, it has a single dvc-file (iterative/dvc#1871) as a prerequisite that is under development right now. It raises the question - should we wait for this feature before CI/CD release? I'll need to discuss it with core-dvc team.

@DavidGOrtega
Copy link
Contributor

Closed since right now the workflow does not push back to remote anymore.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants