Benchmarking toolkit wrap up #2462

Merged
merged 40 commits into from
Sep 28, 2022

abidwael
Contributor

@abidwael abidwael commented Sep 7, 2022

This PR includes

  • Enabling profiling with LudwigProfiler.
  • README for the benchmarking toolkit.

@github-actions

github-actions bot commented Sep 7, 2022

Unit Test Results

         6 files  ±0         6 suites  ±0   3h 4m 57s ⏱️ - 4m 20s
  3 409 tests ±0  3 331 ✔️ ±0    78 💤 ±0  0 ±0 
10 227 runs  ±0  9 970 ✔️ ±0  257 💤 ±0  0 ±0 

Results for commit 165c4da. ± Comparison against base commit c1a16dd.

♻️ This comment has been updated with latest results.

@abidwael abidwael marked this pull request as ready for review September 19, 2022 17:57
Contributor

@justinxzhao justinxzhao left a comment

It looks like you're running into an obsolete broken test -- could you rebase?

Contributor

@justinxzhao justinxzhao left a comment

Looks mostly good to me. Mainly just a few nits.

ludwig/benchmarking/utils.py (outdated)
requirements.txt (outdated)
ludwig/benchmarking/summarize.py
ludwig/benchmarking/summarize.py (outdated)
ludwig/benchmarking/examples/process_config.py (outdated)
ludwig/benchmarking/examples/process_config.py (outdated)
@abidwael abidwael force-pushed the benchmarking-toolkit-wrap-up branch from 2a8f7d1 to 4c64fd6 Compare September 27, 2022 23:06
@abidwael
Contributor Author

Using the LudwigProfiler as one of the callbacks is limited to the local backend for now because of the following error.

Traceback (most recent call last):
  File "python/ray/_raylet.pyx", line 419, in ray._raylet.prepare_args_internal
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/_private/serialization.py", line 433, in serialize
    return self._serialize_to_msgpack(value)
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/_private/serialization.py", line 411, in _serialize_to_msgpack
    pickle5_serialized_object = self._serialize_to_pickle5(
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/_private/serialization.py", line 373, in _serialize_to_pickle5
    raise e
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/_private/serialization.py", line 368, in _serialize_to_pickle5
    inband = pickle.dumps(
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/cloudpickle/cloudpickle_fast.py", line 73, in dumps
    cp.dump(obj)
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/cloudpickle/cloudpickle_fast.py", line 620, in dump
    return Pickler.dump(self, obj)
TypeError: cannot pickle '_thread.lock' object

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ludwig/benchmarking/benchmark.py", line 122, in benchmark
    benchmark_one(experiment)
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ludwig/benchmarking/benchmark.py", line 94, in benchmark_one
    _, _, _, output_directory = model.experiment(
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ludwig/api.py", line 1093, in experiment
    (train_stats, preprocessed_data, output_directory) = self.train(
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ludwig/api.py", line 557, in train
    train_stats = trainer.train(
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ludwig/backend/ray.py", line 386, in train
    results, self._validation_field, self._validation_metric = runner.run(
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/train/trainer.py", line 350, in run
    iterator = TrainingIterator(
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/train/trainer.py", line 683, in __init__
    self._start_training(
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/train/trainer.py", line 712, in _start_training
    self._run_with_error_handling(
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/train/trainer.py", line 722, in _run_with_error_handling
    return func()
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/train/trainer.py", line 713, in <lambda>
    lambda: self._backend_executor.start_training(
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/train/_internal/utils.py", line 185, in <lambda>
    return lambda *args, **kwargs: ray.get(actor_method.remote(*args, **kwargs))
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/actor.py", line 138, in remote
    return self._remote(args, kwargs)
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/util/tracing/tracing_helper.py", line 425, in _start_span
    return method(self, args, kwargs, *_args, **_kwargs)
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/actor.py", line 184, in _remote
    return invocation(args, kwargs)
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/actor.py", line 171, in invocation
    return actor._actor_method_call(
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/actor.py", line 1164, in _actor_method_call
    object_refs = worker.core_worker.submit_actor_task(
  File "python/ray/_raylet.pyx", line 1730, in ray._raylet.CoreWorker.submit_actor_task
  File "python/ray/_raylet.pyx", line 1735, in ray._raylet.CoreWorker.submit_actor_task
  File "python/ray/_raylet.pyx", line 385, in ray._raylet.prepare_args_and_increment_put_refs
  File "python/ray/_raylet.pyx", line 376, in ray._raylet.prepare_args_and_increment_put_refs
  File "python/ray/_raylet.pyx", line 427, in ray._raylet.prepare_args_internal
TypeError: Could not serialize the argument <function construct_train_func.<locals>.<lambda> at 0x7f45c3d8cd30> for a task or actor ray.train._internal.backend_executor.BackendExecutor.start_training. Check https://docs.ray.io/en/master/ray-core/objects/serialization.html#troubleshooting for more information.

@jppgks
Contributor

jppgks commented Sep 28, 2022

EDIT: Looks like the LudwigProfiler is not serializable because of the thread that is started when entering its context.

Debugging serializability:

>>> from ray.util import inspect_serializability
>>> from ludwig.benchmarking.profiler_callbacks import LudwigProfilerCallback
>>> cb = LudwigProfilerCallback({'experiment_name': '/tmp', 'profiler': {'use_torch_profiler': False, 'logging_interval': 0}})

>>> inspect_serializability(cb, name="LudwigProfilerCallback")
================================================================================
Checking Serializability of <ludwig.benchmarking.profiler_callbacks.LudwigProfilerCallback object at 0xffff9d867b20>
================================================================================
(True, set())

>>> cb.on_preprocess_start()

>>> inspect_serializability(cb, name="LudwigProfilerCallback")
================================================================================
Checking Serializability of <ludwig.benchmarking.profiler_callbacks.LudwigProfilerCallback object at 0xffff9d6c4d90>
================================================================================
!!! FAIL serialization: cannot pickle '_thread.lock' object
    Serializing 'preload' <function Callback.preload at 0xffff7cdbf670>...
    Serializing '_abc_impl' <_abc_data object at 0xffff9d5b0300>...
    !!! FAIL serialization: cannot pickle '_abc_data' object
    WARNING: Did not find non-serializable object in <_abc_data object at 0xffff9d5b0300>. This may be an oversight.
================================================================================
Variable: 

        FailTuple(_abc_impl [obj=<_abc_data object at 0xffff9d5b0300>, parent=<ludwig.benchmarking.profiler_callbacks.LudwigProfilerCallback object at 0xffff9d6c4d90>])

was found to be non-serializable. There may be multiple other undetected variables that were non-serializable. 
Consider either removing the instantiation/imports of these variables or moving the instantiation into the scope of the function/class. 
If you have any suggestions on how to improve this error message, please reach out to the Ray developers on github.com/ray-project/ray/issues/
================================================================================
(False, {FailTuple(_abc_impl [obj=<_abc_data object at 0xffff9d5b0300>, parent=<ludwig.benchmarking.profiler_callbacks.LudwigProfilerCallback object at 0xffff9d6c4d90>])})
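
The same failure mode can be reproduced without Ray or Ludwig. The sketch below is only an illustration: `ToyProfiler` and `is_picklable` are hypothetical names, and the class is a stand-in for any object that creates a background thread when its context is entered, not the actual LudwigProfiler code. It should be picklable before entering the context and not picklable once the thread and its event exist, because both hold `_thread.lock` objects.

```python
import pickle
import threading


class ToyProfiler:
    """Hypothetical stand-in for a profiler that samples on a background thread."""

    def __init__(self):
        # No thread-related state yet, so the instance is still picklable.
        self._stop = None
        self._thread = None

    def __enter__(self):
        # The Event and the Thread both hold _thread.lock objects internally.
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._stop.wait, daemon=True)
        self._thread.start()
        return self

    def __exit__(self, *exc):
        # Signal the background thread to finish and wait for it.
        self._stop.set()
        self._thread.join()
        return False


def is_picklable(obj):
    """Return True if pickle.dumps(obj) succeeds, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (TypeError, pickle.PicklingError):
        return False


profiler = ToyProfiler()
before = is_picklable(profiler)      # checked before the thread is created
with profiler:
    during = is_picklable(profiler)  # checked while the thread is running

print("picklable before entering context:", before)
print("picklable inside context:", during)
```

This mirrors the two `inspect_serializability` calls above: the first check happens before any thread exists, the second after the context has been entered.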

@abidwael
Contributor Author

@jppgks Thanks for investigating! That was the issue I encountered with the Ray backend. I looked at other ways to achieve this (using multiprocessing instead of threading, etc.) but didn't arrive at a working solution. For now, let's default to the local backend; I'll investigate this further after this merge.
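
For reference, one common way to make such an object picklable is to drop the thread-related attributes in `__getstate__`. This is not what this PR does and is not one of the options tried above. It is a minimal sketch with a hypothetical `PicklableToyProfiler` class, and whether it would be enough for the LudwigProfiler on a Ray backend is untested. Note that the unpickled copy has no running thread, so a worker would have to start its own.

```python
import pickle
import threading


class PicklableToyProfiler:
    """Hypothetical profiler-like object that excludes its thread state when pickled."""

    def __init__(self, interval=0.1):
        self.interval = interval
        self._stop = None
        self._thread = None

    def __enter__(self):
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._stop.wait, daemon=True)
        self._thread.start()
        return self

    def __exit__(self, *exc):
        self._stop.set()
        self._thread.join()
        return False

    def __getstate__(self):
        # Copy the instance state, replacing the attributes that hold locks.
        state = self.__dict__.copy()
        state["_stop"] = None
        state["_thread"] = None
        return state


with PicklableToyProfiler(interval=0.5) as profiler:
    # Pickling now succeeds even though a background thread is running.
    clone = pickle.loads(pickle.dumps(profiler))

print(clone.interval, clone._thread)
```

The trade-off is that configuration survives serialization but the live sampling thread does not, so the copy must re-enter its context on the receiving side before it can collect anything.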

@abidwael abidwael merged commit 43dc37a into master Sep 28, 2022
@abidwael abidwael deleted the benchmarking-toolkit-wrap-up branch September 28, 2022 21:03
3 participants