Benchmarking toolkit wrap up #2462

abidwael · 2022-09-07T08:59:04Z

This PR includes:

- Enabling profiling with LudwigProfiler.
- A README for the benchmarking toolkit.

github-actions · 2022-09-07T10:02:45Z

Unit Test Results

        6 files ±0       6 suites ±0 3h 4m 57s ⏱️ - 4m 20s
  3 409 tests ±0 3 331 ✔️ ±0   78 💤 ±0 0 ❌ ±0
10 227 runs ±0 9 970 ✔️ ±0 257 💤 ±0 0 ❌ ±0

Results for commit 165c4da. ± Comparison against base commit c1a16dd.

♻️ This comment has been updated with latest results.

ludwig/benchmarking/profiler_callbacks.py

justinxzhao

It looks like you're running into an obsolete broken test -- could you rebase?

justinxzhao

Looks mostly good to me. Mainly just a few nits.

ludwig/benchmarking/utils.py

requirements.txt

ludwig/benchmarking/summarize.py

ludwig/benchmarking/examples/process_config.py

abidwael · 2022-09-28T03:26:01Z

Using the LudwigProfiler as part of the callbacks is limited to the local backend for now because of the following error on the Ray backend.

Traceback (most recent call last):
  File "python/ray/_raylet.pyx", line 419, in ray._raylet.prepare_args_internal
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/_private/serialization.py", line 433, in serialize
    return self._serialize_to_msgpack(value)
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/_private/serialization.py", line 411, in _serialize_to_msgpack
    pickle5_serialized_object = self._serialize_to_pickle5(
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/_private/serialization.py", line 373, in _serialize_to_pickle5
    raise e
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/_private/serialization.py", line 368, in _serialize_to_pickle5
    inband = pickle.dumps(
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/cloudpickle/cloudpickle_fast.py", line 73, in dumps
    cp.dump(obj)
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/cloudpickle/cloudpickle_fast.py", line 620, in dump
    return Pickler.dump(self, obj)
TypeError: cannot pickle '_thread.lock' object

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ludwig/benchmarking/benchmark.py", line 122, in benchmark
    benchmark_one(experiment)
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ludwig/benchmarking/benchmark.py", line 94, in benchmark_one
    _, _, _, output_directory = model.experiment(
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ludwig/api.py", line 1093, in experiment
    (train_stats, preprocessed_data, output_directory) = self.train(
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ludwig/api.py", line 557, in train
    train_stats = trainer.train(
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ludwig/backend/ray.py", line 386, in train
    results, self._validation_field, self._validation_metric = runner.run(
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/train/trainer.py", line 350, in run
    iterator = TrainingIterator(
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/train/trainer.py", line 683, in __init__
    self._start_training(
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/train/trainer.py", line 712, in _start_training
    self._run_with_error_handling(
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/train/trainer.py", line 722, in _run_with_error_handling
    return func()
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/train/trainer.py", line 713, in <lambda>
    lambda: self._backend_executor.start_training(
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/train/_internal/utils.py", line 185, in <lambda>
    return lambda *args, **kwargs: ray.get(actor_method.remote(*args, **kwargs))
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/actor.py", line 138, in remote
    return self._remote(args, kwargs)
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/util/tracing/tracing_helper.py", line 425, in _start_span
    return method(self, args, kwargs, *_args, **_kwargs)
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/actor.py", line 184, in _remote
    return invocation(args, kwargs)
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/actor.py", line 171, in invocation
    return actor._actor_method_call(
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/actor.py", line 1164, in _actor_method_call
    object_refs = worker.core_worker.submit_actor_task(
  File "python/ray/_raylet.pyx", line 1730, in ray._raylet.CoreWorker.submit_actor_task
  File "python/ray/_raylet.pyx", line 1735, in ray._raylet.CoreWorker.submit_actor_task
  File "python/ray/_raylet.pyx", line 385, in ray._raylet.prepare_args_and_increment_put_refs
  File "python/ray/_raylet.pyx", line 376, in ray._raylet.prepare_args_and_increment_put_refs
  File "python/ray/_raylet.pyx", line 427, in ray._raylet.prepare_args_internal
TypeError: Could not serialize the argument <function construct_train_func.<locals>.<lambda> at 0x7f45c3d8cd30> for a task or actor ray.train._internal.backend_executor.BackendExecutor.start_training. Check https://docs.ray.io/en/master/ray-core/objects/serialization.html#troubleshooting for more information.

jppgks · 2022-09-28T09:47:26Z

EDIT: Looks like the LudwigProfiler is not serializable because of the thread that is started when entering its context.

Debugging serializability:

>>> from ray.util import inspect_serializability
>>> from ludwig.benchmarking.profiler_callbacks import LudwigProfilerCallback
>>> cb = LudwigProfilerCallback({'experiment_name': '/tmp', 'profiler': {'use_torch_profiler': False, 'logging_interval': 0}})

>>> inspect_serializability(cb, name="LudwigProfilerCallback")
================================================================================
Checking Serializability of <ludwig.benchmarking.profiler_callbacks.LudwigProfilerCallback object at 0xffff9d867b20>
================================================================================
(True, set())

>>> cb.on_preprocess_start()

>>> inspect_serializability(cb, name="LudwigProfilerCallback")
================================================================================
Checking Serializability of <ludwig.benchmarking.profiler_callbacks.LudwigProfilerCallback object at 0xffff9d6c4d90>
================================================================================
!!! FAIL serialization: cannot pickle '_thread.lock' object
    Serializing 'preload' <function Callback.preload at 0xffff7cdbf670>...
    Serializing '_abc_impl' <_abc_data object at 0xffff9d5b0300>...
    !!! FAIL serialization: cannot pickle '_abc_data' object
    WARNING: Did not find non-serializable object in <_abc_data object at 0xffff9d5b0300>. This may be an oversight.
================================================================================
Variable: 

        FailTuple(_abc_impl [obj=<_abc_data object at 0xffff9d5b0300>, parent=<ludwig.benchmarking.profiler_callbacks.LudwigProfilerCallback object at 0xffff9d6c4d90>])

was found to be non-serializable. There may be multiple other undetected variables that were non-serializable. 
Consider either removing the instantiation/imports of these variables or moving the instantiation into the scope of the function/class. 
If you have any suggestions on how to improve this error message, please reach out to the Ray developers on github.com/ray-project/ray/issues/
================================================================================
(False, {FailTuple(_abc_impl [obj=<_abc_data object at 0xffff9d5b0300>, parent=<ludwig.benchmarking.profiler_callbacks.LudwigProfilerCallback object at 0xffff9d6c4d90>])})
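The behaviour in the session above (serializable before `on_preprocess_start()`, not serializable after) can be reproduced without Ray or Ludwig. The following is a minimal sketch using only the standard library; `TinyProfiler` and `is_picklable` are hypothetical names made up for illustration and are not the actual LudwigProfiler implementation. It assumes, as the EDIT note suggests, that the problem is a thread (and its locks) created when the context is entered.

```python
import pickle
import threading


class TinyProfiler:
    """Hypothetical stand-in for a profiler that runs a background thread."""

    def __init__(self):
        # No thread state exists until the context is entered.
        self._stop = None
        self._thread = None

    def __enter__(self):
        # threading.Event and threading.Thread both hold '_thread.lock' objects.
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._stop.wait, daemon=True)
        self._thread.start()
        return self

    def __exit__(self, *exc):
        # Stop the background thread and drop the lock-holding members.
        self._stop.set()
        self._thread.join()
        self._stop = None
        self._thread = None
        return False


def is_picklable(obj):
    """Return True if pickle can serialize obj, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (TypeError, pickle.PicklingError):
        return False


profiler = TinyProfiler()
before = is_picklable(profiler)      # no thread yet
with profiler:
    during = is_picklable(profiler)  # thread and event are alive
after = is_picklable(profiler)       # thread state cleared again
print(before, during, after)
```

The object only becomes non-picklable while the thread state is attached to it, which matches the two `inspect_serializability` results above.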

abidwael · 2022-09-28T21:03:09Z

@jppgks Thanks for investigating! That was the issue I encountered with the Ray backend. I looked at other ways to achieve this (using multiprocessing instead of threading, etc.) but didn't get to a working solution. For now, let's default to the local backend; I will investigate this further after this merge.
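For reference, one general pattern for objects that hold thread state is to exclude that state from pickling via `__getstate__`. The sketch below is not what this PR does (the PR restricts the profiler to the local backend) and has not been tested against Ray; `PicklableProfiler` is a hypothetical class, and only the `logging_interval` name is borrowed from the profiler config shown earlier.

```python
import pickle
import threading


class PicklableProfiler:
    """Hypothetical profiler whose thread state is excluded from pickling."""

    def __init__(self, logging_interval=0.1):
        self.logging_interval = logging_interval
        self._stop = None
        self._thread = None

    def __enter__(self):
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._stop.wait, daemon=True)
        self._thread.start()
        return self

    def __exit__(self, *exc):
        self._stop.set()
        self._thread.join()
        self._stop = None
        self._thread = None
        return False

    def __getstate__(self):
        # Copy the instance state and drop the members that hold thread locks.
        state = self.__dict__.copy()
        state["_stop"] = None
        state["_thread"] = None
        return state


# Pickle while the background thread is running, then restore a copy.
with PicklableProfiler(logging_interval=0.5) as profiler:
    clone = pickle.loads(pickle.dumps(profiler))

print(clone.logging_interval, clone._thread)
```

The restored copy keeps its configuration but has no running thread, so a real implementation would still need to start profiling again on the receiving side. Whether that is acceptable for profiling on Ray workers is an open question here.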

Wael Abid and others added 6 commits August 25, 2022 16:30

first fixes

f6fa87c

changing to ray backend

59135ac

Merge branch 'master' into benchmarking-toolkit-wrap-up

576a88b

support hyperopt

7fa0ec6

Make sure the stratify_colname doesnt have any NaNs

3707609

[pre-commit.ci] auto fixes from pre-commit.com hooks

c607e57

for more information, see https://pre-commit.ci

Wael Abid and others added 19 commits September 7, 2022 11:24

Merge branch 'master' into benchmarking-toolkit-wrap-up

095cf6e

resolve merge conflicts

4913469

[pre-commit.ci] auto fixes from pre-commit.com hooks

16967f0

for more information, see https://pre-commit.ci

debugging hyperopt

ebfe13e

Merge branch 'master' into benchmarking-toolkit-wrap-up

a28e0df

trying gbm fix

5fe6880

saving updated config after process

4e49e40

adding utils for saving updated config and cleaning up after hyperopt

c0f8856

pass in LudwigProfiler in callbacks

c8b6b7c

run pre commit formatting

2ffd276

make preprocess config optional

35a5869

add example benchmarking files

ac3e848

fixed summary printing

e64d169

export summaries to stdout and csv

200692e

add README

9b5e9b9

formatting

d379891

remove config.yaml

633683c

fix merge conflict

68786b2

use new datasets api to load module

0915c11

abidwael requested review from justinxzhao and jppgks September 19, 2022 17:56

abidwael marked this pull request as ready for review September 19, 2022 17:57

jppgks reviewed Sep 19, 2022

ludwig/benchmarking/profiler_callbacks.py

jppgks reviewed Sep 19, 2022

ludwig/benchmarking/profiler_callbacks.py

justinxzhao reviewed Sep 19, 2022

Wael Abid added 3 commits September 19, 2022 23:52

logging info

38197cd

Merge branch 'master' into benchmarking-toolkit-wrap-up

ce9201e

preventing collisions in naming

aa31dc6

justinxzhao reviewed Sep 27, 2022

Wael Abid added 5 commits September 27, 2022 15:03

moved s3fs to requirements_benchmrking.txt

4d286c3

add docstring to export_and_print function

ba54f04

using instantiated logger

f0deb6c

fix styling and add docstring

b309348

making config_path optional

4c64fd6

abidwael force-pushed the benchmarking-toolkit-wrap-up branch from 2a8f7d1 to 4c64fd6 Compare September 27, 2022 23:06

Wael Abid and others added 7 commits September 27, 2022 16:06

formatting

12ab1bb

Merge branch 'master' into benchmarking-toolkit-wrap-up

ac21535

updating logger param

b9b272e

override experiment with the same name

9d9e8fa

LudwigProfiler currently only supports local backend

5616586

update benchmarking config example

e73da4d

formatting

165c4da

jppgks approved these changes Sep 28, 2022

abidwael merged commit 43dc37a into master Sep 28, 2022

abidwael deleted the benchmarking-toolkit-wrap-up branch September 28, 2022 21:03

abidwael mentioned this pull request Oct 10, 2022

Model performace in GitHub actions #2568

Merged
