job usage: archive long term job records in the accounting db #353

Closed
Tracked by #356
garlick opened this issue May 12, 2023 · 4 comments

Comments

@garlick
Member

garlick commented May 12, 2023

As discussed in flux-framework/flux-core#3136, it may be advantageous for flux accounting to do its own job record archival. A suggested approach would be to

  1. define table(s) to be added to the accounting db schema
  2. develop a Python script that uses the job-list interface to request records that have been added since the date of the newest job in the database.
  3. the script could be run periodically by flux cron like other accounting scripts
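
A minimal sketch of steps 1 and 2, assuming a SQLite accounting DB. The `jobs` table layout, the choice of `t_inactive` as the "newest job" cursor, and the `fetch_jobs_since` callable (a stand-in for whatever job-list query the real script would make) are all hypothetical and not taken from flux-accounting:

```python
import sqlite3

# Hypothetical schema: column names are assumptions for illustration only.
SCHEMA = """
CREATE TABLE IF NOT EXISTS jobs (
    id         INTEGER PRIMARY KEY,
    userid     INTEGER NOT NULL,
    t_submit   REAL NOT NULL,
    t_run      REAL,
    t_inactive REAL NOT NULL,
    nnodes     INTEGER
)
"""


def newest_inactive_time(conn):
    """Return t_inactive of the newest archived job, or 0.0 if the table is empty."""
    row = conn.execute("SELECT MAX(t_inactive) FROM jobs").fetchone()
    return row[0] if row[0] is not None else 0.0


def archive_new_jobs(conn, fetch_jobs_since):
    """Archive jobs newer than the newest one already stored.

    fetch_jobs_since(timestamp) is a placeholder for a job-list query; it
    must return an iterable of dicts keyed by the column names above.
    Returns the number of rows actually inserted.
    """
    since = newest_inactive_time(conn)
    rows = [
        (j["id"], j["userid"], j["t_submit"], j.get("t_run"),
         j["t_inactive"], j.get("nnodes"))
        for j in fetch_jobs_since(since)
    ]
    before = conn.total_changes
    with conn:
        # OR IGNORE makes a re-run harmless if a job was already archived.
        conn.executemany("INSERT OR IGNORE INTO jobs VALUES (?, ?, ?, ?, ?, ?)", rows)
    return conn.total_changes - before
```

Because the cursor is read from the table itself, a script like this needs no separate state file between periodic runs.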

With archival moved to flux accounting, the job-archive module in flux-core could be retired and the flux accounting project can evolve the schema as needed to fit its requirements.

@cmoussa1 cmoussa1 self-assigned this May 14, 2023
@cmoussa1 cmoussa1 added the improvement Upgrades to an already existing feature label May 14, 2023
@cmoussa1 cmoussa1 changed the title archive long term job records in the accounting db job usage: archive long term job records in the accounting db May 15, 2023
@cmoussa1
Member

cmoussa1 commented May 15, 2023

Some other benefits of having flux-accounting be responsible for its own job-archive:

  • flux-accounting no longer needs to depend on flux-core's job-archive DB when running the update-usage command to update the job usage for all users/banks (i.e., there is no need to specify a path to the job-archive DB when running the command)
  • flux-accounting can tailor its use of flux-core's job-list tool to grab only the job record data it needs and store it in a table in its own DB; one less database to worry about!

@cmoussa1
Member

Thought I would take some time to provide an update, since I've been experimenting with this over the past week.

I was able to write a Python script that uses Flux's job-list and job-info interfaces to fetch the information needed for flux-accounting's job usage calculation (thanks @chu11 and @grondo for pointing me at this comment!). The script issues an RPC to get the required attributes from job-list and the required information from job-info, and creates a job_record dictionary that holds all of the attributes. After all jobs are fetched and placed into a list of job_record dictionaries, they're inserted into a jobs table that holds each job record. That table can be queried by job-archive-interface.py and update-usage with only minimal changes to both (essentially just pointing them at the flux-accounting DB instead of flux-core's job-archive DB).
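
To make the shape of that flow concrete, here is a rough sketch of the merge-and-insert half only. It does not issue any Flux RPCs: `list_attrs` and `info` are plain dicts standing in for the job-list and job-info responses, and the table columns and helper names are assumptions rather than the actual script:

```python
import json
import sqlite3

# Hypothetical table; the real flux-accounting schema may differ.
CREATE_JOBS = """
CREATE TABLE IF NOT EXISTS jobs (
    id INTEGER PRIMARY KEY,
    userid INTEGER,
    t_submit REAL,
    t_run REAL,
    t_inactive REAL,
    R TEXT,
    jobspec TEXT
)
"""


def make_job_record(list_attrs, info):
    """Merge one job's job-list attributes and job-info data into a flat dict."""
    return {
        "id": list_attrs["id"],
        "userid": list_attrs["userid"],
        "t_submit": list_attrs["t_submit"],
        "t_run": list_attrs.get("t_run"),
        "t_inactive": list_attrs.get("t_inactive"),
        # Structured values are stored as JSON text in this sketch.
        "R": json.dumps(info.get("R")),
        "jobspec": json.dumps(info.get("jobspec")),
    }


def insert_job_records(conn, records):
    """Insert a list of job_record dicts in a single transaction."""
    with conn:
        conn.executemany(
            "INSERT OR REPLACE INTO jobs "
            "(id, userid, t_submit, t_run, t_inactive, R, jobspec) "
            "VALUES (:id, :userid, :t_submit, :t_run, :t_inactive, :R, :jobspec)",
            records,
        )
```

Using named placeholders lets each job_record dictionary be passed to SQLite as-is, so adding an attribute later means touching the dict builder and the SQL statement but not the calling code.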

I'm still pretty early on this but wanted to jot a couple of thoughts down at this point:

  • the data fetched by job-archive-interface.py could probably use some cleanup. There are some attributes it's currently fetching from the job-archive DB that it probably doesn't need anymore (e.g., job-list returns nnodes now), but I think it would make a PR too heavy if I included both the addition of this Python script to fetch new jobs and the change to the data fetched in job-archive-interface.py. So maybe multiple PRs would be good here: 1) transition flux-accounting to using its own job-archive, and 2) clean up data it might not necessarily need.
  • this one might be obvious (it wasn't to me right away): periodically fetching new jobs and adding them to the flux-accounting DB will make the DB grow very large over time. Since job-usage and fair share (by default) only consider jobs up to a certain point, perhaps a purge_old_jobs() function should be added (side note: I think flux-core is considering different methods of storing old jobs "forever" in some sort of historical storage, so it wouldn't be flux-accounting's responsibility to keep jobs forever at this moment).
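
One possible shape for such a purge, as a sketch only. The `t_inactive` column and the `max_age_seconds` parameter are assumptions; the retention window flux-accounting would actually want is not specified here:

```python
import sqlite3
import time


def purge_old_jobs(conn, max_age_seconds, now=None):
    """Delete archived jobs that went inactive more than max_age_seconds ago.

    Assumes a jobs table with a t_inactive column holding a UNIX timestamp.
    Returns the number of rows removed.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_seconds
    with conn:
        cur = conn.execute("DELETE FROM jobs WHERE t_inactive < ?", (cutoff,))
    return cur.rowcount
```

The `now` argument exists only so the cutoff can be pinned in tests; a periodic run would normally leave it unset.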

@vsoch
Member

vsoch commented May 18, 2023

Shameless plug for my PR here, which three people have now needed (and have had to dig up old comments or otherwise ask for help to work around): flux-framework/flux-docs#229

@cmoussa1
Member

cmoussa1 commented May 9, 2024

This should be fixed by #357, so I'll go ahead and close this. I can open more specific issues if any problems arise.

@cmoussa1 cmoussa1 closed this as completed May 9, 2024