periodic jobs disappear after awhile #1013

Closed
carlsverre opened this issue Mar 31, 2016 · 13 comments

@carlsverre

Nomad version

Nomad v0.3.1

Operating system and Environment details

debian 8.3
Linux ip-10-0-3-181 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt20-1+deb8u4 (2016-02-29) x86_64 GNU/Linux

Issue

After a number of runs, periodic jobs disappear from nomad status and stop running. For example, this is my nomad status output:

ip-10-0-3-181 /home/admin $ nomad status
ID                                Type     Priority  Status
expire-check/periodic-1459373400  batch    50        dead
expire-check/periodic-1459374000  batch    50        dead
expire-check/periodic-1459374600  batch    50        dead

Reproduction steps

Run a periodic job with the following config:

    periodic {
        prohibit_overlap = true
        cron = "*/10 *  *  *  *"
    }

Job file (if appropriate)

job "expire-check" {
    type = "batch"
    datacenters = ["us-east-1"]

    group "expire-check" {
        task "expire-check" {
            driver = "docker"
            config {
                image = "...snip..."
            }
            resources {
                cpu = 1000
                memory = 128
                network { mbits = 100 }
            }
            env {
                ...snip...
            }
        }
    }

    periodic {
        prohibit_overlap = true

        // https://github.com/gorhill/cronexpr#implementation
        // Field name     Mandatory?   Allowed values    Allowed special characters
        // ----------     ----------   --------------    --------------------------
        // Seconds        No           0-59              * / , -
        // Minutes        Yes          0-59              * / , -
        // Hours          Yes          0-23              * / , -
        // Day of month   Yes          1-31              * / , - L W
        // Month          Yes          1-12 or JAN-DEC   * / , -
        // Day of week    Yes          0-6 or SUN-SAT    * / , - L #
        // Year           No           1970–2099         * / , -

        //      m    h  D  M  W
        cron = "*/10 *  *  *  *"
    }
}
@diptanu
Contributor

diptanu commented Mar 31, 2016

@carlsverre Can you please explain what you mean by jobs disappearing? Once a job is run, Nomad will garbage collect it after some time.

Also, what does nomad status expire-check say?

@carlsverre
Author

I thought the idea of a periodic job is that it keeps running periodically? Maybe I misunderstand the feature? When I register the job in Nomad, it runs every 10 minutes for a seemingly random amount of time; then the parent job is, I guess, garbage collected, and the periodic job is never scheduled to run again. Here is my status output after the job has "disappeared":

ip-10-0-3-181 /home/admin $ nomad status
ID                                Type     Priority  Status
expire-check/periodic-1459389000  batch    50        dead
expire-check/periodic-1459389600  batch    50        dead
expire-check/periodic-1459390200  batch    50        dead
expire-check/periodic-1459390800  batch    50        dead
expire-check/periodic-1459391400  batch    50        dead
expire-check/periodic-1459392000  batch    50        dead
expire-check/periodic-1459392600  batch    50        dead
expire-check/periodic-1459393200  batch    50        dead
expire-check/periodic-1459393800  batch    50        dead
expire-check/periodic-1459394400  batch    50        dead
expire-check/periodic-1459395000  batch    50        dead
expire-check/periodic-1459395600  batch    50        dead
expire-check/periodic-1459396200  batch    50        dead
expire-check/periodic-1459396800  batch    50        dead
expire-check/periodic-1459397400  batch    50        dead
expire-check/periodic-1459398000  batch    50        dead
expire-check/periodic-1459398600  batch    50        dead
expire-check/periodic-1459399200  batch    50        dead
expire-check/periodic-1459399800  batch    50        dead
expire-check/periodic-1459400400  batch    50        dead
expire-check/periodic-1459401000  batch    50        dead
expire-check/periodic-1459401600  batch    50        dead
expire-check/periodic-1459402200  batch    50        dead
registry                          service  50        running
web                               service  50        running
worker                            service  50        running

As you can see, the original "expire-check" job has disappeared from Nomad as if it had been garbage collected. Maybe the garbage collector doesn't properly handle periodic batch jobs?

@carlsverre
Author

If it isn't clear, my goal is to run my expire-check job every 10 minutes forever.

@diptanu
Contributor

diptanu commented Mar 31, 2016

@carlsverre Oh, I see what you are saying now. It's probably a bug. We will fix this for 0.3.2; thanks for reporting. We hope to get the release out very soon.

@preichenberger

+1 I'm seeing this too

@dadgar
Copy link
Contributor

dadgar commented Apr 1, 2016

This may already be fixed in master. It would be great if one of you could build and verify. Thanks

@preichenberger

I'll try out the latest master.

@preichenberger

It seems to be fixed in version: Nomad v0.3.2-dev ('52edfe3a4e0ae76447c6c69198d7fa955beb748c+CHANGES')

@carlsverre
Author

Thanks for testing that @preichenberger! I'd prefer to wait for the 0.3.2 release if possible, but will upgrade if that's going to be pretty far out. @diptanu any idea when that is landing?

@dadgar
Copy link
Contributor

dadgar commented Apr 7, 2016

@carlsverre Should be in less than a week.

@dadgar dadgar closed this as completed Apr 7, 2016
@carlsverre
Author

thanks dadgar!

@albertogg

I'm seeing this behavior too, glad it's fixed!

@github-actions

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Dec 24, 2022

5 participants