Leaderboard Creation and Submission Flow #48
Some quick thoughts
- Runners: I feel like runners would have a lot of impact on performance, and there might be different GPUs per runner, so I would be open to having a leaderboard per (GPU x Runner).
- Leaderboard definition: I mostly agree with what you wrote; however, as Mark said, I'd scratch dtype, which should instead be defined by the example input.
- Auth: Maybe I'm just naive, but I'd try going with something like a Discord role that we just give to people for now when they ask for it, granted we need some Admin utils (i.e.
- Score: I think score should be run time for now. We can also add a column to the DB, something like "user_defined_score", which would be extracted in a way similar to the current score but would be optional.
One thing I have wondered (though it is somewhat off-topic at this stage) is whether we should allow multiple runs per (user, problem) combination, and if so, how we determine the score. I could imagine that someone puts a lot of work into making a fast kernel (and has something riding on the result, like a prize), but gets unlucky because the machine they get in Modal is running hot and is underclocked. I don't know if that exact scenario can occur, but surely some operational condition could cause performance to fall below what's expected. Conversely, someone else could get lucky and see better-than-expected performance on a kernel that is not optimal. To ameliorate such situations we might consider (and perhaps you've already considered) allowing multiple runs per combination of user and problem. If we do that, we'd have to think about how to determine the score. If you buy the above reasoning, min and max of the list of scores are bad choices, but median or perhaps average could be good choices.
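If we did go with multiple runs, the median-based aggregation suggested above could be sketched roughly like this (a toy illustration; `aggregate_score` is a hypothetical name, not part of the bot):

```python
from statistics import median

def aggregate_score(run_times: list[float]) -> float:
    """Aggregate multiple runs for one (user, problem) pair.

    Median is robust to a single unlucky run (hot, underclocked
    machine) and to a single lucky one, unlike min or max.
    """
    if not run_times:
        raise ValueError("no runs recorded")
    return median(run_times)
```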
So I think when we run a competition this isn't a concern at all and we should be the ones proofreading and verifying each leaderboard. However, in the general case (e.g. we want this to permanently be on GPU mode as a channel or thing for people to do) then I think the best thing to do is to have maintainers that can accept new competitions, but Mark should probably be the final decider on this because it's his Discord.
This is a very fair point. In general, I think a user should be allowed unlimited submissions for a specific kernel, and the hope is that the reference code should reduce variance by capturing runtime averaged over many runs. The alternative, @b9r5, which I think might be cool, is to have a public leaderboard (just for the competition) that shows immediate scores, while the final standings are determined after we re-run all of the submissions at the very end. The idea being that it should in theory get rid of variance in machine stability, but idk, this was just a thought.
There are a few components of the leaderboard that we need to converge on as a team. Specifically, what exactly a "leaderboard problem" entails, how it is stored in the DB, and what it looks like to the problem creator and problem submitter. This is an open doc, so feel free to modify or provide feedback on any desired changes.
We also need to figure out how runners (e.g. Modal vs. GitHub Actions) factor into the leaderboard. Currently, you can use either runner to submit to the leaderboard, but it's unclear whether they will lead to performance differences. I'm not sure how we should factor this in, because in theory the choice of runner should not affect kernel performance. I personally think we should fix a runner for the leaderboard / not allow submitting to multiple different runners for the same leaderboard.
CC: @S1ro1 @b9r5 @msaroufim
Leaderboard Creation
The general leaderboard creation scheme should include a unique leaderboard ID / name, a leaderboard deadline, and reference code that contains 1) input data generators with fixed shapes, 2) a reference implementation of the kernel, 3) a verifier function that checks the user submission against the reference code, and 4) a function metric() that gets called to verify (using 3) and evaluate the runtime of the user-submitted kernel. It is currently the responsibility of the problem creator to provide this reference code. The problem writer also needs to specify the metric(s) (e.g. runtime, peak activation memory) that they care about -- this unfortunately has not been implemented, and is just abstracted as a "score" currently. [@S1ro1 @b9r5 We should sync on this.]

@msaroufim Not sure what you had in mind, but we also shouldn't allow arbitrary users to just create leaderboards, at least for the main channel, because of spam. There should be some kind of simple / quick verification process on our end.
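To make the four components of the reference code concrete, here is a toy sketch using a trivial elementwise kernel on plain Python lists (all names and signatures are illustrative assumptions, not the actual interface; a real problem would use GPU tensors):

```python
import random
import time

def generate_input(seed: int = 0) -> list[float]:
    """1) Input data generator with a fixed shape."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(1024)]

def reference_kernel(xs: list[float]) -> list[float]:
    """2) Reference implementation of the kernel (here: doubling)."""
    return [x * 2.0 for x in xs]

def verify(submission, xs: list[float], tol: float = 1e-6) -> bool:
    """3) Check the user submission against the reference code."""
    expected = reference_kernel(xs)
    got = submission(xs)
    return len(got) == len(expected) and all(
        abs(a - b) <= tol for a, b in zip(got, expected)
    )

def metric(submission, n_runs: int = 10) -> float:
    """4) Verify the submission (using 3), then time it averaged over n_runs."""
    xs = generate_input()
    if not verify(submission, xs):
        raise ValueError("submission failed verification")
    start = time.perf_counter()
    for _ in range(n_runs):
        submission(xs)
    return (time.perf_counter() - start) / n_runs
```

Averaging inside metric() over many runs is also what would help with the per-run variance discussed earlier in the thread.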
Current proposed command:
/leaderboard create [leaderboard_name] [deadline] [reference code]
Alternative command 1:
/leaderboard create [leaderboard_name] [list of gpu_types] [list of dtypes] [deadline] [reference code]
Alternative command 2:
/leaderboard create [leaderboard_name] [gpu_type] [dtype] [deadline] [reference code]
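On the storage side, a rough sketch of what the two DB tables might hold, including the optional "user_defined_score" column suggested above (all field names here are hypothetical, just to make the shape of the data concrete):

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional

@dataclass
class Leaderboard:
    """One row per leaderboard."""
    name: str                # unique leaderboard ID / name
    deadline: datetime
    reference_code: str      # generator + reference kernel + verifier + metric()
    gpu_types: list[str] = field(default_factory=list)
    dtypes: list[str] = field(default_factory=list)

@dataclass
class Submission:
    """One row per user submission."""
    leaderboard_name: str
    user_id: str
    runner: str              # e.g. "modal" or "github"
    gpu_type: str
    dtype: str
    score: float             # run time, for now
    user_defined_score: Optional[float] = None  # optional extra metric
```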
Leaderboard Submission
Current proposed command:
/leaderboard submit [runner] [leaderboard_name] [gpu_type] [dtype] [script]
Note on score. Currently, the meaning and computation of a score are based entirely on the reference code that a leaderboard creator provides. We don't have a good way of scraping this information from the runners right now, so this is also an important TODO for someone to figure out. Right now, we print out a score: {score:.f} line and look for this pattern in the logs to extract the final score. This is obviously hackable and should not be the final solution.

Leaderboard Display
Currently, the leaderboard display just spits out the top "scores". We can either have the leaderboard spit out the scores (e.g. Wall-clock speed) for a specific GPU type and dtype, or for all available GPU types and dtypes of a particular leaderboard name.
Current proposed command:
/leaderboard show [leaderboard_name] [gpu_type] [dtype]
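A minimal sketch of the sort-and-format step that show could perform, assuming lower run time ranks higher (`render_leaderboard` is a hypothetical helper, not existing bot code):

```python
def render_leaderboard(rows: list[tuple[str, float]], top_n: int = 10) -> str:
    """rows: (user, score) pairs; lower run time ranks higher."""
    ranked = sorted(rows, key=lambda r: r[1])[:top_n]
    return "\n".join(
        f"{rank}. {user}: {score:.4f} s"
        for rank, (user, score) in enumerate(ranked, start=1)
    )
```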
Leaderboard List
We also want to list / provide users with a list of all available leaderboards. TBD
Current proposed command:
/leaderboard list