Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Data] Enabling verbose_stats_logs shadows user function error #46736

Closed
Oblynx opened this issue Jul 22, 2024 · 2 comments
Closed

[Data] Enabling verbose_stats_logs shadows user function error #46736

Oblynx opened this issue Jul 22, 2024 · 2 comments
Labels
bug Something that is supposed to be working; but isn't data Ray Data-related issues triage Needs triage (eg: priority, bug/not-bug, and owning component)

Comments

@Oblynx
Copy link

Oblynx commented Jul 22, 2024

What happened + What you expected to happen

I enable verbose stats logs,
then I create a Ray Dataset with the following execution plan:

Execution plan of Dataset: InputDataBuffer[Input] -> TaskPoolMapOperator[Map(read_dataset)->Map(map_fn)]

Then, I produce data by iterating over the rows:

for row in dataset.iter_rows():

I get this error:

    for row in dataset.iter_rows():
  File "/usr/local/lib/python3.10/dist-packages/ray/data/iterator.py", line 244, in _wrapped_iterator
    for batch in batch_iterable:
  File "/usr/local/lib/python3.10/dist-packages/ray/data/iterator.py", line 161, in _create_iterator
    block_iterator, stats, blocks_owned_by_consumer = self._to_block_iterator()
  File "/usr/local/lib/python3.10/dist-packages/ray/data/_internal/iterator/iterator_impl.py", line 33, in _to_block_iterator
    block_iterator, stats, executor = ds._plan.execute_to_iterator()
  File "/usr/local/lib/python3.10/dist-packages/ray/data/exceptions.py", line 57, in handle_trace
    raise e
  File "/usr/local/lib/python3.10/dist-packages/ray/data/exceptions.py", line 49, in handle_trace
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/ray/data/_internal/plan.py", line 425, in execute_to_iterator
    block_iter = itertools.chain([next(gen)], gen)
  File "/usr/local/lib/python3.10/dist-packages/ray/data/_internal/execution/legacy_compat.py", line 31, in execute_to_legacy_block_iterator
    for bundle in bundle_iter:
  File "/usr/local/lib/python3.10/dist-packages/ray/data/_internal/execution/interfaces/executor.py", line 37, in __next__
    return self.get_next()
  File "/usr/local/lib/python3.10/dist-packages/ray/data/_internal/execution/streaming_executor.py", line 152, in get_next
    self._outer.shutdown(isinstance(e, StopIteration))
  File "/usr/local/lib/python3.10/dist-packages/ray/data/_internal/execution/streaming_executor.py", line 186, in shutdown
    stats_summary_string = self._final_stats.to_summary().to_string(
  File "/usr/local/lib/python3.10/dist-packages/ray/data/_internal/stats.py", line 944, in to_string
    out += "\n" + self.runtime_metrics()
  File "/usr/local/lib/python3.10/dist-packages/ray/data/_internal/stats.py", line 979, in runtime_metrics
    out += fmt_line(summ.base_name, op_total_time)
  File "/usr/local/lib/python3.10/dist-packages/ray/data/_internal/stats.py", line 969, in fmt_line
    return f"* {name}: {fmt(time)} ({time / total_wall_time * 100:.3f}%)\n"
ZeroDivisionError: division by zero

It turns out that there was an error in the user defined function read_dataset, which is not mentioned in the logs.

What I expected

I enabled verbose stats logs because I suspected I might be doing something wrong. Instead of helping me, the verbose logs masked my error and misled me to look for problems in Ray code instead.

I only identified my error once I randomly disabled the verbose logs and saw the real error in the logs.

Versions / Dependencies

  • ray: 2.31.0
  • Python: 3.10.12
  • OS: 5.15.0-1064-azure #73-Ubuntu SMP Tue Apr 30 14:24:24 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

Reproduction script

from ray.data import DataContext, from_items

def read_dataset(id):
  raise ValueError("")

DataContext.get_current().verbose_stats_logs = True
dataset = from_items(range(10)).map(read_dataset)
for row in dataset.iter_rows():
  print(row)

This prints:

2024-07-22 11:21:51,627 INFO streaming_executor.py:109 -- Execution plan of Dataset: InputDataBuffer[Input] -> TaskPoolMapOperator[Map(read_dataset)]
Running 0:   0%|                                                              | 0/10 [00:00<?, ?it/s]2024-07-22 11:21:52,331 ERROR streaming_executor_state.py:455 -- An exception was raised from a task of operator "Map(read_dataset)". Dataset execution will now abort. To ignore this exception and continue, set DataContext.max_errored_blocks.
2024-07-22 11:21:52,332 ERROR exceptions.py:73 -- Exception occurred in Ray Data or Ray Core internal code. If you continue to see this error, please open an issue on the Ray project GitHub page with the full stack trace below: https://github.com/ray-project/ray/issues/new/choose
ray.data.exceptions.SystemException

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.10/dist-packages/ray/data/iterator.py", line 244, in _wrapped_iterator
    for batch in batch_iterable:
  File "/usr/local/lib/python3.10/dist-packages/ray/data/iterator.py", line 161, in _create_iterator
    block_iterator, stats, blocks_owned_by_consumer = self._to_block_iterator()
  File "/usr/local/lib/python3.10/dist-packages/ray/data/_internal/iterator/iterator_impl.py", line 33, in _to_block_iterator
    block_iterator, stats, executor = ds._plan.execute_to_iterator()
  File "/usr/local/lib/python3.10/dist-packages/ray/data/exceptions.py", line 86, in handle_trace
    raise e.with_traceback(None) from SystemException()
ZeroDivisionError: division by zero

Instead, repeating the example but setting: DataContext.get_current().verbose_stats_logs = False prints:

2024-07-22 11:22:22,950 INFO streaming_executor.py:109 -- Execution plan of Dataset: InputDataBuffer[Input] -> TaskPoolMapOperator[Map(read_dataset)]
Running 0:   0%|                                                              | 0/10 [00:00<?, ?it/s]2024-07-22 11:22:22,959 ERROR streaming_executor_state.py:455 -- An exception was raised from a task of operator "Map(read_dataset)". Dataset execution will now abort. To ignore this exception and continue, set DataContext.max_errored_blocks.
                                                                                                    2024-07-22 11:22:22,964  ERROR exceptions.py:63 -- Exception occurred in user code, with the abbreviated stack trace below. By default, the Ray Data internal stack trace is omitted from stdout, and only written to the Ray Data log files at /tmp/ray/session_2024-07-22_11-21-46_834791_3983/logs/ray-data. To output the full stack trace to stdout, set `DataContext.log_internal_stack_trace_to_stdout` to True.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.10/dist-packages/ray/data/iterator.py", line 244, in _wrapped_iterator
    for batch in batch_iterable:
  File "/usr/local/lib/python3.10/dist-packages/ray/data/iterator.py", line 161, in _create_iterator
    block_iterator, stats, blocks_owned_by_consumer = self._to_block_iterator()
  File "/usr/local/lib/python3.10/dist-packages/ray/data/_internal/iterator/iterator_impl.py", line 33, in _to_block_iterator
    block_iterator, stats, executor = ds._plan.execute_to_iterator()
  File "/usr/local/lib/python3.10/dist-packages/ray/data/exceptions.py", line 84, in handle_trace
    raise e.with_traceback(None)
ray.exceptions.RayTaskError(UserCodeException): ray::Map(read_dataset)() (pid=4241, ip=10.244.37.29)
  File "<stdin>", line 2, in read_dataset
ValueError

The above exception was the direct cause of the following exception:

ray::Map(read_dataset)() (pid=4241, ip=10.244.37.29)
  File "/usr/local/lib/python3.10/dist-packages/ray/data/_internal/execution/operators/map_operator.py", line 438, in _map_task
    for b_out in map_transformer.apply_transform(iter(blocks), ctx):
  File "/usr/local/lib/python3.10/dist-packages/ray/data/_internal/execution/operators/map_transformer.py", line 392, in __call__
    for data in iter:
  File "/usr/local/lib/python3.10/dist-packages/ray/data/_internal/execution/operators/map_transformer.py", line 134, in _udf_timed_iter
    output = next(input)
  File "/usr/local/lib/python3.10/dist-packages/ray/data/_internal/execution/operators/map_transformer.py", line 216, in __call__
    yield from self._row_fn(input, ctx)
  File "/usr/local/lib/python3.10/dist-packages/ray/data/_internal/planner/plan_udf_map_op.py", line 255, in transform_fn
    out_row = fn(row)
  File "/usr/local/lib/python3.10/dist-packages/ray/data/_internal/planner/plan_udf_map_op.py", line 130, in fn
    _handle_debugger_exception(e)
  File "/usr/local/lib/python3.10/dist-packages/ray/data/_internal/planner/plan_udf_map_op.py", line 146, in _handle_debugger_exception
    raise UserCodeException() from e
ray.exceptions.UserCodeException

Issue Severity

Medium: It is a significant difficulty but I can work around it.

@Oblynx Oblynx added bug Something that is supposed to be working; but isn't triage Needs triage (eg: priority, bug/not-bug, and owning component) labels Jul 22, 2024
@anyscalesam anyscalesam added the data Ray Data-related issues label Jul 24, 2024
@scottjlee
Copy link
Contributor

@Oblynx For this, you'll want to set the DataContext.log_internal_stack_trace_to_stdout attribute. Two ways to do it:

  • Set the env var RAY_DATA_LOG_INTERNAL_STACK_TRACE_TO_STDOUT=1
  • ray.data.DataContext.get_current().log_internal_stack_trace_to_stdout = True

In later versions of Ray (2.33+), we changed this behavior so that we always show the full internal stack trace: #46647

Feel free to followup with any other questions, or re-open this issue if you are still running into the problem.

@Oblynx
Copy link
Author

Oblynx commented Aug 25, 2024

Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something that is supposed to be working; but isn't data Ray Data-related issues triage Needs triage (eg: priority, bug/not-bug, and owning component)
Projects
None yet
Development

No branches or pull requests

3 participants