job-archive: gap analysis for fields needed to create job usage reports #3136

Closed
cmoussa1 opened this issue Aug 13, 2020 · 20 comments

@cmoussa1
Member

I am in the process of developing a front-end for the job-archive (as required by stakeholders like Ryan Day to create job usage reports). This involves building a JobRecord class which holds job data:

import time


class JobRecord(object):
    '''
    A record of an individual job.
    '''

    def __init__(self, jobid, user, group, project, nnodes, hostlist, sub, start, end):
        self.jobid = jobid
        self.user = user
        self.group = group
        self.project = project
        self.nnodes = nnodes
        self.hostlist = hostlist
        self.sub = sub      # submit time
        self.start = start  # start time
        self.end = end      # end time

    @property
    def elapsed(self):
        # wall-clock runtime; sub/start/end are expected to be struct_time values
        return time.mktime(self.end) - time.mktime(self.start)

    @property
    def queued(self):
        # time spent waiting in the queue before the job started
        return time.mktime(self.start) - time.mktime(self.sub)
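
For reference, constructing one of these from a job-archive row might look something like the sketch below (the column names are the ones from the jobs table, the values are made up, and the epoch timestamps are converted to struct_time since the properties above call time.mktime):

# values are illustrative; the t_* columns in job-archive are epoch seconds
row = {"id": 1234, "userid": 58985, "t_submit": 1597269000.0,
       "t_run": 1597269030.0, "t_inactive": 1597272630.0}

job = JobRecord(
    jobid=row["id"],
    user=row["userid"],   # still a uid; see the username question below
    group=None,           # account/bank is not in job-archive yet
    project=None,
    nnodes=None,          # see the nnodes question below
    hostlist=None,
    sub=time.localtime(row["t_submit"]),
    start=time.localtime(row["t_run"]),
    end=time.localtime(row["t_inactive"]),
)
print(job.queued, job.elapsed)   # 30.0 3600.0 (seconds)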

I have been able to grab most of these fields from the job-archive, but a couple are either not present or raise questions about how to fetch the data:

user - Currently I can grab the user ID from the job-archive, but not a username. Is it planned to include a username field along with a userid? Is there a way to do a lookup to find a username given a user ID?

group (project probably fits in here too) - these are actually the account (and in the future, wckey) fields from flux-accounting. From speaking with Ryan, I believe this is specified when users are submitting jobs; if not specified, it defaults to some bank. Realistically, probably not a high priority item at this moment, but something we will need in the near future.

nnodes - Is there a way to get nnodes from the R or jobspec columns from job-archive? Is it the rank field in R?

bash-4.2$ flux mini submit -N 2 -n 2 hostname
ƒRneAoWNB
bash-4.2$ flux job info ƒRneAoWNB R
{"version":1,"execution":{"R_lite":[{"rank":"0-1","children":{"core":"0"}}]}}
                                     ^-- is this the "rank" field?
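
If it is, a rough way to turn that into an nnodes count might be the following (a pure-Python sketch; idset_count is a made-up helper, and this assumes each broker rank corresponds to one node):

import json

def idset_count(idset):
    # number of members in an RFC 22 idset string like "0-1" or "0,2-3"
    count = 0
    for part in idset.split(","):
        lo, _, hi = part.partition("-")
        count += (int(hi) if hi else int(lo)) - int(lo) + 1
    return count

def nnodes_from_R(R_json):
    # assumes one broker rank per node and disjoint rank idsets per entry
    R = json.loads(R_json)
    return sum(idset_count(entry["rank"]) for entry in R["execution"]["R_lite"])

R = '{"version":1,"execution":{"R_lite":[{"rank":"0-1","children":{"core":"0"}}]}}'
print(nnodes_from_R(R))  # 2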
@grondo
Contributor

grondo commented Aug 13, 2020

user - Currently I can grab the user ID from the job-archive, but not a username. Is it planned to include a username field along with a userid? Is there a way to do a lookup to find a username given a user ID?

I think an assumption of Flux is that there is a global userid space across the center. Hopefully the system on which the flux-accounting scripts are run has a fully populated passwd file, in which case you can use the same code as flux-jobs:

import pwd

def get_username(userid):
    try:
        return pwd.getpwuid(userid).pw_name
    except KeyError:
        return str(userid)

group (project probably fits in here too) - these are actually the account (and in the future, wckey) fields from flux-accounting. From speaking with Ryan, I believe this is specified when users are submitting jobs; if not specified, it defaults to some bank. Realistically, probably not a high priority item at this moment, but something we will need in the near future.

We'll need to determine a spot where the group/account gets set in jobspec. Then you can pull it from there, or it may be important enough to add to job-info and perhaps broken out as a separate key in job-archive?

We may want to open an issue on this in the rfc repo.
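
As a strawman, if it ended up somewhere like attributes.system.bank in jobspec (purely an assumption; the actual location would be decided via the RFC process), pulling it out of the archived jobspec column could look like:

import json

def get_bank(jobspec_json, default="default"):
    # hypothetical: assumes the bank/account lands at attributes.system.bank
    # in jobspec; adjust once the real location is decided
    jobspec = json.loads(jobspec_json)
    return (
        jobspec.get("attributes", {})
        .get("system", {})
        .get("bank", default)
    )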

nnodes - Is there a way to get nnodes from the R or jobspec columns from job-archive? Is it the rank field in R?

Yes, and it looks like job-archive also stores the "ranks" which job-info inferred from the R object.
However, this might get tricky for systems where nodes are not exclusively assigned to jobs. A job could run across 2 ranks with 1 core on each rank, and have utilized a smaller resource set than a job on one node/rank which uses 4 cores... Something to think about. Perhaps we want to account by total number of cores and not nodes by default? Something to discuss with Ryan.
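
For example, counting total cores per job from the parsed R_lite list might look like this (idset_count is a made-up helper, not an existing API):

def idset_count(idset):
    # number of members in an RFC 22 idset string like "0-3" or "0,2-3"
    count = 0
    for part in idset.split(","):
        lo, _, hi = part.partition("-")
        count += (int(hi) if hi else int(lo)) - int(lo) + 1
    return count

def ncores_from_R_lite(r_lite):
    # total cores assigned to the job; each R_lite entry's "children"
    # applies to every rank in its "rank" idset
    return sum(
        idset_count(entry["rank"]) * idset_count(entry["children"]["core"])
        for entry in r_lite
    )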

Also, I wonder if, instead of a JobRecord class, we could somehow reuse the JobInfo class from flux-jobs.py and reduce the duplication of effort?

@cmoussa1
Member Author

cmoussa1 commented Aug 13, 2020

nnodes - Is there a way to get nnodes from the R or jobspec columns from job-archive? Is it the rank field in R?

Yes, and it looks like job-archive also stores the "ranks" which job-info inferred from the R object.
However, this might get tricky for systems where nodes are not exclusively assigned to jobs. A job could run across 2 ranks with 1 core on each rank, and have utilized a smaller resource set than a job on one node/rank which uses 4 cores... Something to think about. Perhaps we want to account by total number of cores and not nodes by default? Something to discuss with Ryan.

Let me go ahead and cc @ryanday36 on this thread so we can get his input on this as well!

I think an assumption of Flux is that there is a global userid space across the center. Hopefully the system on which the flux-accounting scripts are run has a fully populated passwd file, in which case you can use the same code as flux-jobs.

Thank you for pointing this out. I can fetch a username by passing in the userid:

try:
    username = pwd.getpwuid(userid).pw_name
except KeyError:
    username = str(userid)

@ryanday36

Accounting by total number of cores works fine as long as users get charged for all of the cores when we give them a whole node. I.e., in Slurm, a user can run something like 'srun -n1 ...' and only ask for one task on one core, but on most of our clusters we're giving them the whole node anyway, so we need to charge them for all 36 (or whatever) cores.
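
To make the arithmetic concrete (purely illustrative, not an implementation):

CORES_PER_NODE = 36  # illustrative value

def charged_core_hours(nnodes, elapsed_hours, cores_per_node=CORES_PER_NODE):
    # node-exclusive policy: charge for every core on each allocated node,
    # regardless of how many tasks/cores the user actually requested
    return nnodes * cores_per_node * elapsed_hours

print(charged_core_hours(1, 2.0))  # 72.0 core-hours for a 2-hour 'srun -n1'-style job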

Something else that you probably don't need to worry about right now, but might want to think about in terms of designing in some flexibility, is accounting for GPUs or other resources. Some folks from Sandia were recently asking about whether we did anything to account for users requesting GPUs in the context of Slurm. Since we're scheduling our GPU clusters by node currently and don't have any plans to change that, it's not something that we're worried about immediately, but it's not hard to imagine some sort of Sierra-like cluster that we allocate at a sub-node level and want to charge jobs more if they request a GPU.

@grondo
Contributor

grondo commented Aug 13, 2020

Flux records all assigned resources in the resource set R, so accounting for all resources should be possible.

@chu11
Member

chu11 commented Aug 27, 2020

nnodes - Is there a way to get nnodes from the R or jobspec columns from job-archive?

We could also add any number of job-info list outputs to the job-archive DB. The initial set of fields stored in the job-archive DB need not be the final set; I didn't know exactly which ones were important and which were not.

@cmoussa1
Member Author

Thanks @chu11! Yeah, having an nnodes field would be useful. We also talked about including a hostname field to specify which host the job was run on - I asked about this yesterday and @grondo gave a couple of suggestions on how to get this:

The rank will be included in R, and that can be resolved to a hostname. I think when the Fluxion scheduler is used, the nodename will appear in R_lite.

@dongahn
Member

dongahn commented Sep 1, 2020

Yes, Fluxion will have hostnames in its R. The code should be flexible enough to deal with R both without hostnames (as emitted by sched-simple) and with hostnames, though.
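
A small sketch of handling both cases (assuming hostnames, when present, appear under execution.nodelist as described in RFC 20; adjust if the actual key differs):

def nodes_from_R(R):
    # R is the parsed resource set; prefer hostnames when the scheduler
    # recorded them (e.g. Fluxion), otherwise fall back to broker ranks
    # (e.g. sched-simple's R, which carries no hostnames)
    execution = R["execution"]
    if "nodelist" in execution:
        return execution["nodelist"]
    return ["rank" + entry["rank"] for entry in execution["R_lite"]]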

@garlick
Member

garlick commented May 11, 2023

Now that flux-core's interfaces to job-list are more evolved and documented, would it make sense for flux-accounting to take ownership for the job archive db, either by folding its tables in with the accounting db or by starting it as a separate, standalone systemd service?

It feels to me like that puts the database schema in a place where the stakeholders are going to feel more empowered to adapt it to their needs.

If, as discussed in #4336, we end up with another database implementation in flux-core for historical job information, then we can think about whether it makes sense for flux-accounting to switch over to that, but it wouldn't be required.

Edit: I guess I didn't mention that the archive db gets its job information from job-list. I was just thinking that the fact that the job-list now presents as an external interface with a relatively stable API makes it seem like the better interface between core and accounting, rather than using the archive database schema as the interface.

@chu11
Member

chu11 commented May 11, 2023

Just a random side comment about something I did in #4336 that's relevant to this thread. I dump all data from job-list in a JSON blob (the "all" attribute sent to job-list) into the database. So there is a lot more flexibility going forward in terms of database design; we don't have to fixate on some database table forever. And if/when we can start using SQLite with JSON support (or some other DB with JSON support), querying that JSON data can be even easier.
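
For example (the table and column names here are assumptions, not the actual schema from #4336), SQLite's JSON1 json_extract() would let a report reach into that blob directly:

import sqlite3

conn = sqlite3.connect("job-archive.db")  # assumed path
cur = conn.execute(
    # json_extract() is part of SQLite's JSON1 support
    "SELECT json_extract(jobdata, '$.id'), json_extract(jobdata, '$.nnodes') "
    "FROM jobs"
)
for jobid, nnodes in cur:
    print(jobid, nnodes)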

@chu11
Member

chu11 commented May 11, 2023

Now that flux-core's interfaces to job-list are more evolved and documented, would it make sense for flux-accounting to take ownership for the job archive db, either by folding its tables in with the accounting db or by starting it as a separate, standalone systemd service?

I like this idea. This would allow both services to "fork" from each other for their specific needs, and there would be less need to worry about breaking one another with changes.

then we can think about whether it makes sense for flux-accounting to switch over to that, but it wouldn't be required.

I would actually be inclined to say that flux-accounting should just maintain its own, to avoid dependency issues if one ever needs to change.

@cmoussa1
Member Author

cmoussa1 commented May 11, 2023

I guess I didn't mention that the archive db gets its job information from job-list. I was just thinking that the fact that the job-list now presents as an external interface with a relatively stable API makes it seem like the better interface between core and accounting, rather than using the archive database schema as the interface.

This is an interesting thought! Does job-list have a Python API interface? Maybe I could use that instead of using the DB schema (which is what I currently use to calculate usage).

EDIT: i.e., here's the query I currently use for the jobs table:

select_stmt = (
    "SELECT userid,id,t_submit,t_run,t_inactive,ranks,R,jobspec FROM jobs "
)
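
Running it against the archive DB with the stdlib sqlite3 module is roughly this (the DB path is illustrative):

import sqlite3

conn = sqlite3.connect("job-archive.db")  # illustrative path
for userid, jobid, t_submit, t_run, t_inactive, ranks, R, jobspec in conn.execute(select_stmt):
    pass  # build a JobRecord / accumulate usage per user and bank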

@garlick
Member

garlick commented May 11, 2023

I think you'd still want the archive tables since the data available from job-list gets pruned from time to time.

@garlick
Member

garlick commented May 11, 2023

But it could be a Python script in flux-accounting that pulls data periodically from job-list and stores it.

So the above query returns all jobs since the beginning of time. Is that going to become a problem? Should there be a window outside of which jobs should be ignored? If so, then if you own the archive tables, you could prune the database...

@cmoussa1
Member Author

Oh, sorry, I should have been clearer. The query gets refined to only look for jobs after a certain timestamp, because there is a window outside of which jobs are no longer considered part of a user/bank's usage; I believe it can be adjusted by an administrator, but the default cutoff is one month. After a job gets older than the cutoff it no longer affects a user/bank's usage and fair share, so I don't think the flux-accounting code that calculates a user/bank's total job usage looks at those old jobs any more.
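
Concretely, the refinement looks something like this (just a sketch; the exact cutoff handling in flux-accounting may differ):

import sqlite3
import time

CUTOFF_WEEKS = 4  # roughly the one-month default mentioned above
cutoff = time.time() - CUTOFF_WEEKS * 7 * 24 * 3600

conn = sqlite3.connect("job-archive.db")  # illustrative path
select_stmt = (
    "SELECT userid,id,t_submit,t_run,t_inactive,ranks,R,jobspec FROM jobs "
    "WHERE t_inactive > ?"
)
cur = conn.execute(select_stmt, (cutoff,))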

@garlick
Member

garlick commented May 11, 2023

Nice, then if you own the db, you can nix those old records (after backing them up to csv for @ryanday36 of course). Right now we do not ever prune the archive db 😱

Edit: sorry, I didn't mean to oversimplify. There are reporting considerations too that would influence that decision.
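
A sketch of what that pruning could look like (backing the old rows up to CSV first; the cutoff, paths, and table/column names are illustrative):

import csv
import sqlite3
import time

conn = sqlite3.connect("job-archive.db")    # illustrative path
cutoff = time.time() - 26 * 7 * 24 * 3600   # e.g. keep roughly 6 months of jobs

# back up records older than the cutoff before deleting them
with open("archived-jobs.csv", "w", newline="") as f:
    writer = csv.writer(f)
    for row in conn.execute("SELECT * FROM jobs WHERE t_inactive < ?", (cutoff,)):
        writer.writerow(row)

conn.execute("DELETE FROM jobs WHERE t_inactive < ?", (cutoff,))
conn.commit()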

@chu11
Member

chu11 commented May 11, 2023

An aside on this discussion: if flux-accounting were to eventually do its own thing for the job archive, should we drop the job-archive module? Some people out there might be using it ... although I find that unlikely. I'd say remove it, and if anyone complains they could build it separately for themselves?

@garlick
Member

garlick commented May 11, 2023

They could install flux-accounting.

@cmoussa1
Member Author

Sorry, haha, I think I'm a little confused about the end-goal responsibility of flux-accounting, and I want to make sure I understand you correctly. Is the consideration here whether flux-accounting should become responsible for implementing its own job-archive DB, instead of fetching job records from flux-core's job-archive module, where the current job-archive DB is located?

@garlick
Member

garlick commented May 11, 2023

Yep that's what I was throwing out there to see if it stuck to the wall :-)

@garlick
Member

garlick commented Jul 8, 2024

Now that flux-accounting has its own job archive database, seems like we can close this.

@garlick garlick closed this as completed Jul 8, 2024