
Remarks + possible improvements/extensions from v1 PRs #19

Open
DBCerigo opened this issue Sep 9, 2021 · 2 comments
Comments


DBCerigo commented Sep 9, 2021

Compiled feedback, remarks, and the suggested possible improvements and extensions, gained from the v1 PR reviews.

Misc

Remove pandas dep to make pure python package

"worth considering using the stdlib CSV writer here to avoid the v heavyweight pandas dependency"
#13 (comment)
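A minimal sketch of what swapping pandas for the stdlib `csv` module could look like. The function name and the shape of the results (a list of flat dicts) are hypothetical here, not kotsu's actual API:

```python
import csv


def write_results(results, path):
    """Write a list of result dicts to CSV using only the stdlib.

    `results` is assumed to be a list of flat dicts sharing keys,
    e.g. [{"model_id": "m1", "score": 0.9}, ...] (names hypothetical).
    This replaces a pandas DataFrame.to_csv call without the heavy
    pandas dependency.
    """
    if not results:
        return
    fieldnames = list(results[0].keys())
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(results)
```

`csv.DictWriter` covers the quoting and escaping that pandas would otherwise handle, so for a flat results table the dependency buys little.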

Interface

Always pass extendable run_context object to Validations/Models during run

This will replace the specific artefact_directory functionality; instead, the artefact_directory will be made available (if the user sets it) through the context object.
Agreed to implement this once we come across the next piece of context beyond just artefact_directory. I think I would call it run_context to make it clear it is specific to the running and not specific to configuring Validations or Models.
See #17 (comment) and #17 (review)
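One hedged sketch of the run_context idea described above. All names (`RunContext`, `extras`, `run_validation`) are hypothetical illustrations, not kotsu's actual interface:

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class RunContext:
    """Extendable bag of run-specific state passed to validations/models.

    artefact_directory is only populated if the user configures it; future
    run-scoped context can be added as new fields (or via `extras`) without
    changing the validation/model call signatures.
    """
    artefact_directory: Optional[str] = None
    extras: dict = field(default_factory=dict)


def run_validation(validation, model, context: RunContext):
    # The validation receives the whole context object rather than
    # individual keyword arguments like artefact_directory.
    return validation(model, context)
```

Because the context is a single object, adding the "next piece of context beyond just artefact_directory" is a non-breaking change for existing validations.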

Make store functionality be fully implementable by user

Probably by passing a callback function into run, or by returning the results instead of storing them within run
#16 (comment)
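A sketch of the callback variant, assuming a simplified `run` that iterates validation and model specs. The signature and result shape are hypothetical, shown only to illustrate delegating storage to the caller:

```python
def run(model_specs, validation_specs, results_callback=None):
    """Hypothetical run() where storing results is fully user-controlled.

    If results_callback is provided, it is invoked with each result as it
    is produced (e.g. to stream to a database); the full list is also
    returned so the caller can store it however they like.
    """
    results = []
    for validation_spec in validation_specs:
        for model_spec in model_specs:
            # Placeholder for actually running the validation on the model.
            result = {"validation": validation_spec, "model": model_spec}
            if results_callback is not None:
                results_callback(result)
            results.append(result)
    return results
```

Returning the results and accepting a callback are not mutually exclusive; supporting both keeps the simple case simple while letting power users own persistence entirely.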

Move validation instantiation outside of the loop so each is only instantiated once per run

"Is there an argument for putting the validation_spec.make() outside of the loop over model specs? I would have thought that a single validation instance could be shared by all models. Also building the validation environment could be an expensive step so we'd want to avoid repeating unnecessarily"
#13 (comment)
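The restructured loop could look roughly like this. The `.make()` spec interface is taken from the quote above; the surrounding `run` shape is a hypothetical sketch:

```python
def run(model_specs, validation_specs):
    """Sketch: instantiate each validation once, outside the model loop.

    Building a validation environment can be expensive, so a single
    validation instance is shared across all models instead of being
    rebuilt once per (validation, model) pair.
    """
    results = []
    for validation_spec in validation_specs:
        validation = validation_spec.make()  # once per validation, not per model
        for model_spec in model_specs:
            model = model_spec.make()
            results.append(validation(model))
    return results
```

One caveat worth checking before making this change: sharing a single instance assumes validations are stateless across models (or reset themselves), otherwise one model's run could leak state into the next.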

Stronger typing and validation

#13 (comment)
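One possible direction for stronger typing, using `typing.Protocol` for structural interfaces. These protocol definitions are purely illustrative assumptions about what a kotsu model/validation looks like, not the library's actual types:

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class Model(Protocol):
    """Hypothetical structural type for registered models."""

    def predict(self, data: Any) -> Any: ...


@runtime_checkable
class Validation(Protocol):
    """Hypothetical structural type: a validation is a callable on a model
    returning a dict of named metrics."""

    def __call__(self, model: Model) -> dict: ...
```

Protocols keep registration duck-typed (no base class to inherit from) while letting mypy, and optionally `isinstance` checks at registration time, catch malformed models and validations early.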

Docs

In example usage, remove validations as factories in favour of just using bare functions

#18 (comment)

Example project structure

I think this is a worthy addition. We could have a few different "example projects" within the docs. Let's do this once we've got some real world example usage.
#18 (review)

More detailed docs on model/validation registration

#18 (comment)


alex-hh commented Jul 7, 2022

Another thought - it might be useful to think about how to 'package' a benchmark built on kotsu. I.e. if you wanted to develop some ML/data science benchmark and make the code public, so that others can run/extend it, and want to use kotsu to build it, is there some standardised project structure that would work for this use case?

@DBCerigo (Contributor, Author)

@alex-hh yea, that would be super useful. Will mull it over. (Was thinking about it already; thought about packaging up a benchmark as a PyPI package, but without having all the deps pinned exactly, some (more) anxiety creeps in that the benchmarks wouldn't be reproducible. Will mull some more. Super open to ideas on it, so do share if stuff appears to ya.)
