
v2.0.0 - Major scorer rework

@AKuederle released this 24 Oct 10:12

[2.0.0] - 2024-10-24

Added

  • The global cache helper now supports algorithms with multiple action methods by specifying the name of the action
    method you want to cache (see the caching sketch after this list).
    (#118)
  • The global disk cache helper should now be able to cache the action methods of algorithm classes defined in the
    main script.
    (#118)
  • There are new builtin FloatAggregator and MacroFloatAggregator aggregators that should cover many of the use cases
    that previously required custom aggregators (see the aggregator sketch after this list).
    (#118)
  • Scorers now support passing a final_aggregator. This is called after all scoring and aggregation has happened and
    makes it possible to implement complicated "meta" aggregation that depends on the results of all scores of all
    datapoints.
    Note that we are not sure yet whether this should be used only as an escape hatch, with overuse considered an
    anti-pattern, or whether it is exactly the other way around.
    We need to experiment in a couple of real-life applications to figure this out.
    (#120)
  • Dataset classes now have a proper __eq__ implementation.
    (#120)
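
The following is a minimal sketch of what the multi-action caching option could look like in use. The import of global_disk_cache follows the tpcp caching docs, but `action_method_name`, `MyAlgorithm`, and its action methods are hypothetical stand-ins; check the tpcp documentation for the actual parameter name.

```python
from joblib import Memory
from tpcp.caching import global_disk_cache

from my_project import MyAlgorithm  # hypothetical algorithm with actions `detect` and `transform`

# `action_method_name` is a hypothetical keyword for the new option described
# above: it selects which of the algorithm's action methods should be cached.
global_disk_cache(Memory(".cache"), action_method_name="detect")(MyAlgorithm)
```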
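
For the new builtin aggregators, here is a minimal sketch of the intended usage, assuming FloatAggregator is importable from tpcp.validate and is configured with a plain aggregation function (the score function and its metric are placeholders):

```python
import numpy as np
from tpcp.validate import FloatAggregator  # assumed import path

# A configured aggregator instance: the aggregation function is fixed at
# initialization time, the score values are passed in when it is called.
median_agg = FloatAggregator(np.median)

def score(pipeline, datapoint):
    error = 0.0  # placeholder for whatever per-datapoint metric you compute
    # Wrapping the value tells the scorer to aggregate this score with
    # np.median across all datapoints instead of the default aggregation.
    return {"error": median_agg(error)}
```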

Changed

  • Relatively major overhaul of how aggregators in scoring functions work. Before, aggregators were classes that were
    initialized with the value of a score. Now they are instances of a class that are called with the value of a score.
    This change makes it possible to create "configurable" aggregators that receive their configuration at
    initialization time.
    (#118)
    This comes with a couple of breaking changes:
    • The most "user-facing" one is that the NoAgg aggregator is now called no_agg, indicating that it is an instance
      of a class and not a class itself (see the migration sketch after this list).
    • All custom aggregators need to be rewritten, but you will likely find that they are much simpler now.
      (see the reworked examples for custom aggregators)
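
As a rough migration sketch for the NoAgg rename (the import path and the shape of the score function are assumptions based on the description above, with placeholder values):

```python
from tpcp.validate import no_agg  # v1 equivalent: from tpcp.validate import NoAgg

def score(pipeline, datapoint):
    raw_values = [0.1, 0.2]  # placeholder per-sample results
    accuracy = 0.95  # placeholder float score
    # v1 style (no longer works): {"raw": NoAgg(raw_values), ...}
    # v2 style: no_agg is a ready-made instance that you call with the value.
    return {"raw": no_agg(raw_values), "accuracy": accuracy}
```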

Fixed

  • Fixed a massive performance regression introduced in version 0.34.1 that affected people who had tensorflow or
    torch installed but did not use it in their code.
    The reason was that we imported the two modules in the global scope, which made importing tpcp very slow.
    This was particularly noticeable with multiprocessing, as the modules were imported in every worker process.
    We now only import the modules within the clone function, and only if you have imported them before (see the sketch
    after this list).
    (#118)
  • The custom hash function now uses a different way of hashing functions and classes defined in local scopes.
    This should prevent strange pickling errors when just using tpcp normally.
    (#118)
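
The general pattern behind the import fix looks roughly like the following. This is a generic sketch of conditional importing via sys.modules, not tpcp's actual code; clone_value is a hypothetical helper:

```python
import sys


def _get_torch_if_already_imported():
    # Importing torch at module level is expensive, so we only touch it
    # if the user's code has already imported it itself.
    return sys.modules.get("torch")


def clone_value(value):
    torch = _get_torch_if_already_imported()
    if torch is not None and isinstance(value, torch.Tensor):
        return value.clone()  # torch-specific copy
    # ... otherwise fall back to the generic clone logic
    return value
```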

Removed

  • Score functions implemented directly as methods on the pipeline class are no longer supported.
    Score functions now need to be independent functions that take a pipeline instance as their first argument (see
    the sketch below).
    For this reason, it is also no longer supported to pass None as the scoring argument of any validate or optimize
    method.
    (#120)
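
A minimal sketch of the required shape of a score function after this change. The pipeline behavior and the metric are placeholders, and the cross_validate call is only indicative:

```python
from tpcp.validate import cross_validate  # import path as in the tpcp docs

def score(pipeline, datapoint):
    # An independent function: the pipeline instance is the first argument;
    # it is no longer a method on the pipeline class.
    pipeline = pipeline.safe_run(datapoint)
    return {"accuracy": 1.0}  # placeholder metric

# scoring=None is no longer accepted; the score function must be passed explicitly:
# results = cross_validate(my_optimizer, my_dataset, scoring=score)
```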