
[data] Basic structured logging #47210

Merged

Conversation


@omatthew98 omatthew98 commented Aug 19, 2024

Why are these changes needed?

Adds structured logging to Ray Data. This allows users to configure logging to use any of the following (a sketch of how these modes are selected via environment variables follows the list):

  • A user's custom logging file (existing functionality)
  • A default TEXT logger (existing functionality)
  • A default JSON logger (new functionality)
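
A hedged sketch of how these modes are selected via environment variables (the variable names come from the review discussion below; the config path is illustrative):

    import os

    # Option 1: point Ray Data at a fully custom logging config (existing functionality).
    os.environ["RAY_DATA_LOGGING_CONFIG_PATH"] = "/path/to/my_logging.yaml"  # illustrative path

    # Option 2: keep the default config and only switch the file-log encoding (new functionality).
    # Supported values per this PR: "TEXT" (default) and "JSON".
    # os.environ["RAY_DATA_LOG_ENCODING"] = "JSON"

    import ray.data  # assumed: Ray Data reads these variables when it configures its logging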

Examples:

Code snippet:

import ray
import time

def f(x):
    time.sleep(0.1)  # fast stage: 0.1 s per row
    return x

def g(x):
    time.sleep(1)  # slow stage: 1 s per row, mapped with num_cpus=0.1 below
    return x

ray.data.range(100).map(f).map(g, num_cpus=0.1).materialize()

JSON logging (new)

Console output:

❯ RAY_DATA_LOG_ENCODING="JSON" python log.py
2024-10-09 11:15:55,473	INFO worker.py:1777 -- Started a local Ray instance. View the dashboard at 127.0.0.1:8265
2024-10-09 11:15:56,490	INFO streaming_executor.py:108 -- Starting execution of Dataset. Full logs are in /tmp/ray/session_2024-10-09_11-15-54_688421_42223/logs/ray-data
2024-10-09 11:15:56,490	INFO streaming_executor.py:109 -- Execution plan of Dataset: InputDataBuffer[Input] -> TaskPoolMapOperator[ReadRange->Map(f)] -> TaskPoolMapOperator[Map(g)]
✔️  Dataset execution finished in 10.01 seconds: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:10<00:00, 10.0 row/s]
- ReadRange->Map(f): 0 active, 0 queued, [cpu: 0.0, objects: 0.0B]: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:09<00:00, 10.0 row/s]
- Map(g): 0 active, 0 queued, [cpu: 0.0, objects: 32.0B]: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:09<00:00, 10.0 row/s]

ray-data.log:

❯ cat /tmp/ray/session_2024-10-09_11-15-54_688421_42223/logs/ray-data/ray-data.log
{"asctime": "2024-10-09 11:15:55,963", "levelname": "DEBUG", "message": "Autodetected parallelism=24 based on estimated_available_cpus=12 and estimated_data_size=800.", "filename": "util.py", "lineno": 203, "job_id": "01000000", "worker_id": "01000000ffffffffffffffffffffffffffffffffffffffffffffffff", "node_id": "61a34d5f67987e4e6bcb41ea37b914d90d83a6aa3057a25e731d4840"}
{"asctime": "2024-10-09 11:15:56,489", "levelname": "DEBUG", "message": "Autodetected parallelism=24 based on estimated_available_cpus=12 and estimated_data_size=800.", "filename": "util.py", "lineno": 203, "job_id": "01000000", "worker_id": "01000000ffffffffffffffffffffffffffffffffffffffffffffffff", "node_id": "61a34d5f67987e4e6bcb41ea37b914d90d83a6aa3057a25e731d4840"}
{"asctime": "2024-10-09 11:15:56,489", "levelname": "DEBUG", "message": "Expected in-memory size 800, block size 32.0", "filename": "set_read_parallelism.py", "lineno": 34, "job_id": "01000000", "worker_id": "01000000ffffffffffffffffffffffffffffffffffffffffffffffff", "node_id": "61a34d5f67987e4e6bcb41ea37b914d90d83a6aa3057a25e731d4840"}
{"asctime": "2024-10-09 11:15:56,489", "levelname": "DEBUG", "message": "Size based split factor 1", "filename": "set_read_parallelism.py", "lineno": 42, "job_id": "01000000", "worker_id": "01000000ffffffffffffffffffffffffffffffffffffffffffffffff", "node_id": "61a34d5f67987e4e6bcb41ea37b914d90d83a6aa3057a25e731d4840"}
{"asctime": "2024-10-09 11:15:56,489", "levelname": "DEBUG", "message": "Blocks after size splits 25", "filename": "set_read_parallelism.py", "lineno": 44, "job_id": "01000000", "worker_id": "01000000ffffffffffffffffffffffffffffffffffffffffffffffff", "node_id": "61a34d5f67987e4e6bcb41ea37b914d90d83a6aa3057a25e731d4840"}
{"asctime": "2024-10-09 11:15:56,489", "levelname": "DEBUG", "message": "Using autodetected parallelism=24 for operator ReadRange to satisfy parallelism at least twice the available number of CPUs (12).", "filename": "set_read_parallelism.py", "lineno": 117, "job_id": "01000000", "worker_id": "01000000ffffffffffffffffffffffffffffffffffffffffffffffff", "node_id": "61a34d5f67987e4e6bcb41ea37b914d90d83a6aa3057a25e731d4840"}
{"asctime": "2024-10-09 11:15:56,490", "levelname": "DEBUG", "message": "Estimated num output blocks 25", "filename": "set_read_parallelism.py", "lineno": 132, "job_id": "01000000", "worker_id": "01000000ffffffffffffffffffffffffffffffffffffffffffffffff", "node_id": "61a34d5f67987e4e6bcb41ea37b914d90d83a6aa3057a25e731d4840"}
{"asctime": "2024-10-09 11:15:56,490", "levelname": "INFO", "message": "Starting execution of Dataset. Full logs are in /tmp/ray/session_2024-10-09_11-15-54_688421_42223/logs/ray-data", "filename": "streaming_executor.py", "lineno": 108, "job_id": "01000000", "worker_id": "01000000ffffffffffffffffffffffffffffffffffffffffffffffff", "node_id": "61a34d5f67987e4e6bcb41ea37b914d90d83a6aa3057a25e731d4840"}
{"asctime": "2024-10-09 11:15:56,490", "levelname": "INFO", "message": "Execution plan of Dataset: InputDataBuffer[Input] -> TaskPoolMapOperator[ReadRange->Map(f)] -> TaskPoolMapOperator[Map(g)]", "filename": "streaming_executor.py", "lineno": 109, "job_id": "01000000", "worker_id": "01000000ffffffffffffffffffffffffffffffffffffffffffffffff", "node_id": "61a34d5f67987e4e6bcb41ea37b914d90d83a6aa3057a25e731d4840"}
{"asctime": "2024-10-09 11:15:56,490", "levelname": "DEBUG", "message": "Execution config: ExecutionOptions(resource_limits=ExecutionResources(cpu=inf, gpu=inf, object_store_memory=inf), exclude_resources=ExecutionResources(cpu=0.0, gpu=0.0, object_store_memory=0.0B), locality_with_output=False, preserve_order=False, actor_locality_enabled=False, verbose_progress=True)", "filename": "streaming_executor.py", "lineno": 111, "job_id": "01000000", "worker_id": "01000000ffffffffffffffffffffffffffffffffffffffffffffffff", "node_id": "61a34d5f67987e4e6bcb41ea37b914d90d83a6aa3057a25e731d4840"}
{"asctime": "2024-10-09 11:15:56,511", "levelname": "DEBUG", "message": "ConcurrencyCapBackpressurePolicy initialized with: {InputDataBuffer[Input]: inf, InputDataBuffer[Input] -> TaskPoolMapOperator[ReadRange->Map(f)]: inf, InputDataBuffer[Input] -> TaskPoolMapOperator[ReadRange->Map(f)] -> TaskPoolMapOperator[Map(g)]: inf}", "filename": "concurrency_cap_backpressure_policy.py", "lineno": 37, "job_id": "01000000", "worker_id": "01000000ffffffffffffffffffffffffffffffffffffffffffffffff", "node_id": "61a34d5f67987e4e6bcb41ea37b914d90d83a6aa3057a25e731d4840"}
{"asctime": "2024-10-09 11:15:56,521", "levelname": "DEBUG", "message": "Operator Metrics:\nInput: {'num_inputs_received': 25, 'bytes_inputs_received': 67350, 'num_outputs_taken': 25, 'bytes_outputs_taken': 67350, 'task_submission_backpressure_time': 0, 'obj_store_mem_internal_inqueue_blocks': 0, 'obj_store_mem_internal_inqueue': 0, 'obj_store_mem_internal_outqueue_blocks': 0, 'obj_store_mem_internal_outqueue': 0, 'obj_store_mem_used': 0, 'cpu_usage': 0, 'gpu_usage': 0}\nReadRange->Map(f): {'num_inputs_received': 1, 'bytes_inputs_received': 2694, 'num_task_inputs_processed': 0, 'bytes_task_inputs_processed': 0, 'bytes_inputs_of_submitted_tasks': 2694, 'num_task_outputs_generated': 0, 'bytes_task_outputs_generated': 0, 'rows_task_outputs_generated': 0, 'num_outputs_taken': 0, 'bytes_outputs_taken': 0, 'num_outputs_of_finished_tasks': 0, 'bytes_outputs_of_finished_tasks': 0, 'num_tasks_submitted': 1, 'num_tasks_running': 1, 'num_tasks_have_outputs': 0, 'num_tasks_finished': 0, 'num_tasks_failed': 0, 'block_generation_time': 0, 'task_submission_backpressure_time': 0, 'obj_store_mem_internal_inqueue_blocks': 0, 'obj_store_mem_internal_inqueue': 0, 'obj_store_mem_internal_outqueue_blocks': 0, 'obj_store_mem_internal_outqueue': 0, 'obj_store_mem_pending_task_inputs': 2694, 'obj_store_mem_freed': 0, 'obj_store_mem_spilled': 0, 'obj_store_mem_used': 268435456, 'cpu_usage': 1, 'gpu_usage': 0, 'ray_remote_args': {'num_cpus': 1, 'scheduling_strategy': 'SPREAD'}}\nMap(g): {'num_inputs_received': 0, 'bytes_inputs_received': 0, 'num_task_inputs_processed': 0, 'bytes_task_inputs_processed': 0, 'bytes_inputs_of_submitted_tasks': 0, 'num_task_outputs_generated': 0, 'bytes_task_outputs_generated': 0, 'rows_task_outputs_generated': 0, 'num_outputs_taken': 0, 'bytes_outputs_taken': 0, 'num_outputs_of_finished_tasks': 0, 'bytes_outputs_of_finished_tasks': 0, 'num_tasks_submitted': 0, 'num_tasks_running': 0, 'num_tasks_have_outputs': 0, 'num_tasks_finished': 0, 'num_tasks_failed': 0, 'block_generation_time': 0, 'task_submission_backpressure_time': 0, 'obj_store_mem_internal_inqueue_blocks': 0, 'obj_store_mem_internal_inqueue': 0, 'obj_store_mem_internal_outqueue_blocks': 0, 'obj_store_mem_internal_outqueue': 0, 'obj_store_mem_pending_task_inputs': 0, 'obj_store_mem_freed': 0, 'obj_store_mem_spilled': 0, 'obj_store_mem_used': 0, 'cpu_usage': 0, 'gpu_usage': 0, 'ray_remote_args': {'num_cpus': 0.1}}\n", "filename": "streaming_executor.py", "lineno": 483, "job_id": "01000000", "worker_id": "01000000ffffffffffffffffffffffffffffffffffffffffffffffff", "node_id": "61a34d5f67987e4e6bcb41ea37b914d90d83a6aa3057a25e731d4840"}
{"asctime": "2024-10-09 11:15:56,521", "levelname": "DEBUG", "message": "Execution Progress:", "filename": "streaming_executor.py", "lineno": 466, "job_id": "01000000", "worker_id": "01000000ffffffffffffffffffffffffffffffffffffffffffffffff", "node_id": "61a34d5f67987e4e6bcb41ea37b914d90d83a6aa3057a25e731d4840"}
{"asctime": "2024-10-09 11:15:56,521", "levelname": "DEBUG", "message": "0: - Input: 0 active, 0 queued, [cpu: 0.0, objects: 0.0B], Blocks Outputted: 25/25", "filename": "streaming_executor.py", "lineno": 468, "job_id": "01000000", "worker_id": "01000000ffffffffffffffffffffffffffffffffffffffffffffffff", "node_id": "61a34d5f67987e4e6bcb41ea37b914d90d83a6aa3057a25e731d4840"}
{"asctime": "2024-10-09 11:15:56,521", "levelname": "DEBUG", "message": "1: - ReadRange->Map(f): 1 active, 24 queued \ud83d\udea7, [cpu: 1.0, objects: 256.0MB], Blocks Outputted: 0/None", "filename": "streaming_executor.py", "lineno": 468, "job_id": "01000000", "worker_id": "01000000ffffffffffffffffffffffffffffffffffffffffffffffff", "node_id": "61a34d5f67987e4e6bcb41ea37b914d90d83a6aa3057a25e731d4840"}
{"asctime": "2024-10-09 11:15:56,521", "levelname": "DEBUG", "message": "2: - Map(g): 0 active, 0 queued, [cpu: 0.0, objects: 0.0B], Blocks Outputted: 0/None", "filename": "streaming_executor.py", "lineno": 468, "job_id": "01000000", "worker_id": "01000000ffffffffffffffffffffffffffffffffffffffffffffffff", "node_id": "61a34d5f67987e4e6bcb41ea37b914d90d83a6aa3057a25e731d4840"}
{"asctime": "2024-10-09 11:15:56,521", "levelname": "DEBUG", "message": "Operator InputDataBuffer[Input] completed. Operator Metrics:\n{'num_inputs_received': 25, 'bytes_inputs_received': 67350, 'num_outputs_taken': 25, 'bytes_outputs_taken': 67350, 'task_submission_backpressure_time': 0, 'obj_store_mem_internal_inqueue_blocks': 0, 'obj_store_mem_internal_inqueue': 0, 'obj_store_mem_internal_outqueue_blocks': 0, 'obj_store_mem_internal_outqueue': 0, 'obj_store_mem_used': 0, 'cpu_usage': 0, 'gpu_usage': 0}", "filename": "streaming_executor.py", "lineno": 338, "job_id": "01000000", "worker_id": "01000000ffffffffffffffffffffffffffffffffffffffffffffffff", "node_id": "61a34d5f67987e4e6bcb41ea37b914d90d83a6aa3057a25e731d4840"}
{"asctime": "2024-10-09 11:15:58,879", "levelname": "DEBUG", "message": "Operator InputDataBuffer[Input] -> TaskPoolMapOperator[ReadRange->Map(f)] completed. Operator Metrics:\n{'num_inputs_received': 25, 'bytes_inputs_received': 67350, 'num_task_inputs_processed': 25, 'bytes_task_inputs_processed': 67350, 'bytes_inputs_of_submitted_tasks': 67350, 'num_task_outputs_generated': 25, 'bytes_task_outputs_generated': 800, 'rows_task_outputs_generated': 100, 'num_outputs_taken': 25, 'bytes_outputs_taken': 800, 'num_outputs_of_finished_tasks': 25, 'bytes_outputs_of_finished_tasks': 800, 'num_tasks_submitted': 25, 'num_tasks_running': 0, 'num_tasks_have_outputs': 25, 'num_tasks_finished': 25, 'num_tasks_failed': 0, 'block_generation_time': 10.345995793, 'task_submission_backpressure_time': 2.3538285830000003, 'obj_store_mem_internal_inqueue_blocks': 0, 'obj_store_mem_internal_inqueue': 0, 'obj_store_mem_internal_outqueue_blocks': 0, 'obj_store_mem_internal_outqueue': 0, 'obj_store_mem_pending_task_inputs': 0, 'obj_store_mem_freed': 67350, 'obj_store_mem_spilled': 0, 'obj_store_mem_used': 800, 'cpu_usage': 0, 'gpu_usage': 0, 'ray_remote_args': {'num_cpus': 1, 'scheduling_strategy': 'SPREAD'}}", "filename": "streaming_executor.py", "lineno": 338, "job_id": "01000000", "worker_id": "01000000ffffffffffffffffffffffffffffffffffffffffffffffff", "node_id": "61a34d5f67987e4e6bcb41ea37b914d90d83a6aa3057a25e731d4840"}
{"asctime": "2024-10-09 11:16:01,533", "levelname": "DEBUG", "message": "Operator Metrics:\nInput: {'num_inputs_received': 25, 'bytes_inputs_received': 67350, 'num_outputs_taken': 25, 'bytes_outputs_taken': 67350, 'task_submission_backpressure_time': 0, 'obj_store_mem_internal_inqueue_blocks': 0, 'obj_store_mem_internal_inqueue': 0, 'obj_store_mem_internal_outqueue_blocks': 0, 'obj_store_mem_internal_outqueue': 0, 'obj_store_mem_used': 0, 'cpu_usage': 0, 'gpu_usage': 0}\nReadRange->Map(f): {'num_inputs_received': 25, 'bytes_inputs_received': 67350, 'num_task_inputs_processed': 25, 'bytes_task_inputs_processed': 67350, 'bytes_inputs_of_submitted_tasks': 67350, 'num_task_outputs_generated': 25, 'bytes_task_outputs_generated': 800, 'rows_task_outputs_generated': 100, 'num_outputs_taken': 25, 'bytes_outputs_taken': 800, 'num_outputs_of_finished_tasks': 25, 'bytes_outputs_of_finished_tasks': 800, 'num_tasks_submitted': 25, 'num_tasks_running': 0, 'num_tasks_have_outputs': 25, 'num_tasks_finished': 25, 'num_tasks_failed': 0, 'block_generation_time': 10.345995793, 'task_submission_backpressure_time': 2.3538285830000003, 'obj_store_mem_internal_inqueue_blocks': 0, 'obj_store_mem_internal_inqueue': 0, 'obj_store_mem_internal_outqueue_blocks': 0, 'obj_store_mem_internal_outqueue': 0, 'obj_store_mem_pending_task_inputs': 0, 'obj_store_mem_freed': 67350, 'obj_store_mem_spilled': 0, 'obj_store_mem_used': 768, 'cpu_usage': 0, 'gpu_usage': 0, 'ray_remote_args': {'num_cpus': 1, 'scheduling_strategy': 'SPREAD'}}\nMap(g): {'num_inputs_received': 25, 'bytes_inputs_received': 800, 'num_task_inputs_processed': 1, 'bytes_task_inputs_processed': 32, 'bytes_inputs_of_submitted_tasks': 800, 'num_task_outputs_generated': 1, 'bytes_task_outputs_generated': 32, 'rows_task_outputs_generated': 4, 'num_outputs_taken': 1, 'bytes_outputs_taken': 32, 'num_outputs_of_finished_tasks': 1, 'bytes_outputs_of_finished_tasks': 32, 'num_tasks_submitted': 25, 'num_tasks_running': 24, 'num_tasks_have_outputs': 1, 'num_tasks_finished': 1, 'num_tasks_failed': 0, 'block_generation_time': 4.0186684999999995, 'task_submission_backpressure_time': 3.3788410420000003, 'obj_store_mem_internal_inqueue_blocks': 0, 'obj_store_mem_internal_inqueue': 0, 'obj_store_mem_internal_outqueue_blocks': 0, 'obj_store_mem_internal_outqueue': 0, 'obj_store_mem_pending_task_inputs': 768, 'obj_store_mem_freed': 32, 'obj_store_mem_spilled': 0, 'obj_store_mem_used': 768.0, 'cpu_usage': 2.4000000000000004, 'gpu_usage': 0, 'ray_remote_args': {'num_cpus': 0.1, 'scheduling_strategy': 'SPREAD'}}\n", "filename": "streaming_executor.py", "lineno": 483, "job_id": "01000000", "worker_id": "01000000ffffffffffffffffffffffffffffffffffffffffffffffff", "node_id": "61a34d5f67987e4e6bcb41ea37b914d90d83a6aa3057a25e731d4840"}
{"asctime": "2024-10-09 11:16:01,534", "levelname": "DEBUG", "message": "Execution Progress:", "filename": "streaming_executor.py", "lineno": 466, "job_id": "01000000", "worker_id": "01000000ffffffffffffffffffffffffffffffffffffffffffffffff", "node_id": "61a34d5f67987e4e6bcb41ea37b914d90d83a6aa3057a25e731d4840"}
{"asctime": "2024-10-09 11:16:01,534", "levelname": "DEBUG", "message": "0: - Input: 0 active, 0 queued, [cpu: 0.0, objects: 0.0B], Blocks Outputted: 25/25", "filename": "streaming_executor.py", "lineno": 468, "job_id": "01000000", "worker_id": "01000000ffffffffffffffffffffffffffffffffffffffffffffffff", "node_id": "61a34d5f67987e4e6bcb41ea37b914d90d83a6aa3057a25e731d4840"}
{"asctime": "2024-10-09 11:16:01,534", "levelname": "DEBUG", "message": "1: - ReadRange->Map(f): 0 active, 0 queued, [cpu: 0.0, objects: 768.0B], Blocks Outputted: 25/25", "filename": "streaming_executor.py", "lineno": 468, "job_id": "01000000", "worker_id": "01000000ffffffffffffffffffffffffffffffffffffffffffffffff", "node_id": "61a34d5f67987e4e6bcb41ea37b914d90d83a6aa3057a25e731d4840"}
{"asctime": "2024-10-09 11:16:01,534", "levelname": "DEBUG", "message": "2: - Map(g): 24 active, 0 queued, [cpu: 2.4, objects: 768.0B], Blocks Outputted: 1/25", "filename": "streaming_executor.py", "lineno": 468, "job_id": "01000000", "worker_id": "01000000ffffffffffffffffffffffffffffffffffffffffffffffff", "node_id": "61a34d5f67987e4e6bcb41ea37b914d90d83a6aa3057a25e731d4840"}
{"asctime": "2024-10-09 11:16:06,493", "levelname": "DEBUG", "message": "Operator InputDataBuffer[Input] -> TaskPoolMapOperator[ReadRange->Map(f)] -> TaskPoolMapOperator[Map(g)] completed. Operator Metrics:\n{'num_inputs_received': 25, 'bytes_inputs_received': 800, 'num_task_inputs_processed': 25, 'bytes_task_inputs_processed': 800, 'bytes_inputs_of_submitted_tasks': 800, 'num_task_outputs_generated': 25, 'bytes_task_outputs_generated': 800, 'rows_task_outputs_generated': 100, 'num_outputs_taken': 25, 'bytes_outputs_taken': 800, 'num_outputs_of_finished_tasks': 25, 'bytes_outputs_of_finished_tasks': 800, 'num_tasks_submitted': 25, 'num_tasks_running': 0, 'num_tasks_have_outputs': 25, 'num_tasks_finished': 25, 'num_tasks_failed': 0, 'block_generation_time': 100.44397699799998, 'task_submission_backpressure_time': 3.3788410420000003, 'obj_store_mem_internal_inqueue_blocks': 0, 'obj_store_mem_internal_inqueue': 0, 'obj_store_mem_internal_outqueue_blocks': 0, 'obj_store_mem_internal_outqueue': 0, 'obj_store_mem_pending_task_inputs': 0, 'obj_store_mem_freed': 800, 'obj_store_mem_spilled': 0, 'obj_store_mem_used': 32, 'cpu_usage': 0, 'gpu_usage': 0, 'ray_remote_args': {'num_cpus': 0.1, 'scheduling_strategy': 'SPREAD'}}", "filename": "streaming_executor.py", "lineno": 338, "job_id": "01000000", "worker_id": "01000000ffffffffffffffffffffffffffffffffffffffffffffffff", "node_id": "61a34d5f67987e4e6bcb41ea37b914d90d83a6aa3057a25e731d4840"}
{"asctime": "2024-10-09 11:16:06,498", "levelname": "DEBUG", "message": "Shutting down <StreamingExecutor(StreamingExecutor-26a7573d42af437b94e5fac95c711658, stopped daemon 13186969600)>.", "filename": "streaming_executor.py", "lineno": 182, "job_id": "01000000", "worker_id": "01000000ffffffffffffffffffffffffffffffffffffffffffffffff", "node_id": "61a34d5f67987e4e6bcb41ea37b914d90d83a6aa3057a25e731d4840"}

TEXT logging (unchanged)

Console output:

❯ RAY_DATA_LOG_ENCODING="TEXT" python log.py
2024-09-10 14:02:09,407	INFO worker.py:1777 -- Started a local Ray instance. View the dashboard at 127.0.0.1:8265
2024-09-10 14:02:10,353	INFO streaming_executor.py:108 -- Starting execution of Dataset. Full logs are in /tmp/ray/session_2024-09-10_14-02-08_613994_98769/logs/ray-data
2024-09-10 14:02:10,353	INFO streaming_executor.py:109 -- Execution plan of Dataset: InputDataBuffer[Input] -> TaskPoolMapOperator[ReadRange->Map(f)] -> TaskPoolMapOperator[Map(g)]
✔️  Dataset execution finished in 9.86 seconds: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:09<00:00, 10.1 row/s]
- ReadRange->Map(f): 0 active, 0 queued, [cpu: 0.0, objects: 0.0B]: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:09<00:00, 10.2 row/s]
- Map(g): 0 active, 0 queued, [cpu: 0.0, objects: 96.0B]: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:09<00:00, 10.2 row/s]

ray-data.log:

2024-09-10 14:02:09,843	DEBUG util.py:203 -- Autodetected parallelism=24 based on estimated_available_cpus=12 and estimated_data_size=800.
2024-09-10 14:02:10,353	DEBUG util.py:203 -- Autodetected parallelism=24 based on estimated_available_cpus=12 and estimated_data_size=800.
2024-09-10 14:02:10,353	DEBUG set_read_parallelism.py:34 -- Expected in-memory size 800, block size 32.0
2024-09-10 14:02:10,353	DEBUG set_read_parallelism.py:42 -- Size based split factor 1
2024-09-10 14:02:10,353	DEBUG set_read_parallelism.py:44 -- Blocks after size splits 25
2024-09-10 14:02:10,353	DEBUG set_read_parallelism.py:117 -- Using autodetected parallelism=24 for operator ReadRange to satisfy parallelism at least twice the available number of CPUs (12).
2024-09-10 14:02:10,353	DEBUG set_read_parallelism.py:132 -- Estimated num output blocks 25
2024-09-10 14:02:10,353	INFO streaming_executor.py:108 -- Starting execution of Dataset. Full logs are in /tmp/ray/session_2024-09-10_14-02-08_613994_98769/logs/ray-data
2024-09-10 14:02:10,353	INFO streaming_executor.py:109 -- Execution plan of Dataset: InputDataBuffer[Input] -> TaskPoolMapOperator[ReadRange->Map(f)] -> TaskPoolMapOperator[Map(g)]
2024-09-10 14:02:10,353	DEBUG streaming_executor.py:111 -- Execution config: ExecutionOptions(resource_limits=ExecutionResources(cpu=inf, gpu=inf, object_store_memory=inf), exclude_resources=ExecutionResources(cpu=0.0, gpu=0.0, object_store_memory=0.0B), locality_with_output=False, preserve_order=False, actor_locality_enabled=False, verbose_progress=True)
2024-09-10 14:02:10,383	DEBUG concurrency_cap_backpressure_policy.py:37 -- ConcurrencyCapBackpressurePolicy initialized with: {InputDataBuffer[Input]: inf, InputDataBuffer[Input] -> TaskPoolMapOperator[ReadRange->Map(f)]: inf, InputDataBuffer[Input] -> TaskPoolMapOperator[ReadRange->Map(f)] -> TaskPoolMapOperator[Map(g)]: inf}
2024-09-10 14:02:10,393	DEBUG streaming_executor.py:478 -- Operator Metrics:
Input: {'num_inputs_received': 25, 'bytes_inputs_received': 66350, 'num_outputs_taken': 25, 'bytes_outputs_taken': 66350, 'task_submission_backpressure_time': 0, 'obj_store_mem_internal_inqueue_blocks': 0, 'obj_store_mem_internal_inqueue': 0, 'obj_store_mem_internal_outqueue_blocks': 0, 'obj_store_mem_internal_outqueue': 0, 'obj_store_mem_used': 0, 'cpu_usage': 0, 'gpu_usage': 0}
ReadRange->Map(f): {'num_inputs_received': 1, 'bytes_inputs_received': 2654, 'num_task_inputs_processed': 0, 'bytes_task_inputs_processed': 0, 'bytes_inputs_of_submitted_tasks': 2654, 'num_task_outputs_generated': 0, 'bytes_task_outputs_generated': 0, 'rows_task_outputs_generated': 0, 'num_outputs_taken': 0, 'bytes_outputs_taken': 0, 'num_outputs_of_finished_tasks': 0, 'bytes_outputs_of_finished_tasks': 0, 'num_tasks_submitted': 1, 'num_tasks_running': 1, 'num_tasks_have_outputs': 0, 'num_tasks_finished': 0, 'num_tasks_failed': 0, 'block_generation_time': 0, 'task_submission_backpressure_time': 0, 'obj_store_mem_internal_inqueue_blocks': 0, 'obj_store_mem_internal_inqueue': 0, 'obj_store_mem_internal_outqueue_blocks': 0, 'obj_store_mem_internal_outqueue': 0, 'obj_store_mem_pending_task_inputs': 2654, 'obj_store_mem_freed': 0, 'obj_store_mem_spilled': 0, 'obj_store_mem_used': 268435456, 'cpu_usage': 1, 'gpu_usage': 0, 'ray_remote_args': {'num_cpus': 1, 'scheduling_strategy': 'SPREAD'}}
Map(g): {'num_inputs_received': 0, 'bytes_inputs_received': 0, 'num_task_inputs_processed': 0, 'bytes_task_inputs_processed': 0, 'bytes_inputs_of_submitted_tasks': 0, 'num_task_outputs_generated': 0, 'bytes_task_outputs_generated': 0, 'rows_task_outputs_generated': 0, 'num_outputs_taken': 0, 'bytes_outputs_taken': 0, 'num_outputs_of_finished_tasks': 0, 'bytes_outputs_of_finished_tasks': 0, 'num_tasks_submitted': 0, 'num_tasks_running': 0, 'num_tasks_have_outputs': 0, 'num_tasks_finished': 0, 'num_tasks_failed': 0, 'block_generation_time': 0, 'task_submission_backpressure_time': 0, 'obj_store_mem_internal_inqueue_blocks': 0, 'obj_store_mem_internal_inqueue': 0, 'obj_store_mem_internal_outqueue_blocks': 0, 'obj_store_mem_internal_outqueue': 0, 'obj_store_mem_pending_task_inputs': 0, 'obj_store_mem_freed': 0, 'obj_store_mem_spilled': 0, 'obj_store_mem_used': 0, 'cpu_usage': 0, 'gpu_usage': 0, 'ray_remote_args': {'num_cpus': 0.1}}

2024-09-10 14:02:10,393	DEBUG streaming_executor.py:461 -- Execution Progress:
2024-09-10 14:02:10,393	DEBUG streaming_executor.py:463 -- 0: - Input: 0 active, 0 queued, [cpu: 0.0, objects: 0.0B], Blocks Outputted: 25/25
2024-09-10 14:02:10,393	DEBUG streaming_executor.py:463 -- 1: - ReadRange->Map(f): 1 active, 24 queued 🚧, [cpu: 1.0, objects: 256.0MB], Blocks Outputted: 0/None
2024-09-10 14:02:10,393	DEBUG streaming_executor.py:463 -- 2: - Map(g): 0 active, 0 queued, [cpu: 0.0, objects: 0.0B], Blocks Outputted: 0/None
2024-09-10 14:02:10,393	DEBUG streaming_executor.py:338 -- Operator InputDataBuffer[Input] completed. Operator Metrics:
{'num_inputs_received': 25, 'bytes_inputs_received': 66350, 'num_outputs_taken': 25, 'bytes_outputs_taken': 66350, 'task_submission_backpressure_time': 0, 'obj_store_mem_internal_inqueue_blocks': 0, 'obj_store_mem_internal_inqueue': 0, 'obj_store_mem_internal_outqueue_blocks': 0, 'obj_store_mem_internal_outqueue': 0, 'obj_store_mem_used': 0, 'cpu_usage': 0, 'gpu_usage': 0}
2024-09-10 14:02:13,165	DEBUG streaming_executor.py:338 -- Operator InputDataBuffer[Input] -> TaskPoolMapOperator[ReadRange->Map(f)] completed. Operator Metrics:
{'num_inputs_received': 25, 'bytes_inputs_received': 66350, 'num_task_inputs_processed': 25, 'bytes_task_inputs_processed': 66350, 'bytes_inputs_of_submitted_tasks': 66350, 'num_task_outputs_generated': 25, 'bytes_task_outputs_generated': 800, 'rows_task_outputs_generated': 100, 'num_outputs_taken': 25, 'bytes_outputs_taken': 800, 'num_outputs_of_finished_tasks': 25, 'bytes_outputs_of_finished_tasks': 800, 'num_tasks_submitted': 25, 'num_tasks_running': 0, 'num_tasks_have_outputs': 25, 'num_tasks_finished': 25, 'num_tasks_failed': 0, 'block_generation_time': 10.35352417, 'task_submission_backpressure_time': 2.240482626999999, 'obj_store_mem_internal_inqueue_blocks': 0, 'obj_store_mem_internal_inqueue': 0, 'obj_store_mem_internal_outqueue_blocks': 0, 'obj_store_mem_internal_outqueue': 0, 'obj_store_mem_pending_task_inputs': 0, 'obj_store_mem_freed': 66350, 'obj_store_mem_spilled': 0, 'obj_store_mem_used': 800, 'cpu_usage': 0, 'gpu_usage': 0, 'ray_remote_args': {'num_cpus': 1, 'scheduling_strategy': 'SPREAD'}}
2024-09-10 14:02:15,483	DEBUG streaming_executor.py:478 -- Operator Metrics:
Input: {'num_inputs_received': 25, 'bytes_inputs_received': 66350, 'num_outputs_taken': 25, 'bytes_outputs_taken': 66350, 'task_submission_backpressure_time': 0, 'obj_store_mem_internal_inqueue_blocks': 0, 'obj_store_mem_internal_inqueue': 0, 'obj_store_mem_internal_outqueue_blocks': 0, 'obj_store_mem_internal_outqueue': 0, 'obj_store_mem_used': 0, 'cpu_usage': 0, 'gpu_usage': 0}
ReadRange->Map(f): {'num_inputs_received': 25, 'bytes_inputs_received': 66350, 'num_task_inputs_processed': 25, 'bytes_task_inputs_processed': 66350, 'bytes_inputs_of_submitted_tasks': 66350, 'num_task_outputs_generated': 25, 'bytes_task_outputs_generated': 800, 'rows_task_outputs_generated': 100, 'num_outputs_taken': 25, 'bytes_outputs_taken': 800, 'num_outputs_of_finished_tasks': 25, 'bytes_outputs_of_finished_tasks': 800, 'num_tasks_submitted': 25, 'num_tasks_running': 0, 'num_tasks_have_outputs': 25, 'num_tasks_finished': 25, 'num_tasks_failed': 0, 'block_generation_time': 10.35352417, 'task_submission_backpressure_time': 2.240482626999999, 'obj_store_mem_internal_inqueue_blocks': 0, 'obj_store_mem_internal_inqueue': 0, 'obj_store_mem_internal_outqueue_blocks': 0, 'obj_store_mem_internal_outqueue': 0, 'obj_store_mem_pending_task_inputs': 0, 'obj_store_mem_freed': 66350, 'obj_store_mem_spilled': 0, 'obj_store_mem_used': 768, 'cpu_usage': 0, 'gpu_usage': 0, 'ray_remote_args': {'num_cpus': 1, 'scheduling_strategy': 'SPREAD'}}
Map(g): {'num_inputs_received': 25, 'bytes_inputs_received': 800, 'num_task_inputs_processed': 1, 'bytes_task_inputs_processed': 32, 'bytes_inputs_of_submitted_tasks': 800, 'num_task_outputs_generated': 1, 'bytes_task_outputs_generated': 32, 'rows_task_outputs_generated': 4, 'num_outputs_taken': 1, 'bytes_outputs_taken': 32, 'num_outputs_of_finished_tasks': 1, 'bytes_outputs_of_finished_tasks': 32, 'num_tasks_submitted': 25, 'num_tasks_running': 24, 'num_tasks_have_outputs': 1, 'num_tasks_finished': 1, 'num_tasks_failed': 0, 'block_generation_time': 4.0134345, 'task_submission_backpressure_time': 3.350887582999999, 'obj_store_mem_internal_inqueue_blocks': 0, 'obj_store_mem_internal_inqueue': 0, 'obj_store_mem_internal_outqueue_blocks': 0, 'obj_store_mem_internal_outqueue': 0, 'obj_store_mem_pending_task_inputs': 768, 'obj_store_mem_freed': 32, 'obj_store_mem_spilled': 0, 'obj_store_mem_used': 768.0, 'cpu_usage': 2.4000000000000004, 'gpu_usage': 0, 'ray_remote_args': {'num_cpus': 0.1, 'scheduling_strategy': 'SPREAD'}}

2024-09-10 14:02:15,483	DEBUG streaming_executor.py:461 -- Execution Progress:
2024-09-10 14:02:15,483	DEBUG streaming_executor.py:463 -- 0: - Input: 0 active, 0 queued, [cpu: 0.0, objects: 0.0B], Blocks Outputted: 25/25
2024-09-10 14:02:15,483	DEBUG streaming_executor.py:463 -- 1: - ReadRange->Map(f): 0 active, 0 queued, [cpu: 0.0, objects: 768.0B], Blocks Outputted: 25/25
2024-09-10 14:02:15,483	DEBUG streaming_executor.py:463 -- 2: - Map(g): 24 active, 0 queued, [cpu: 2.4, objects: 768.0B], Blocks Outputted: 1/25
2024-09-10 14:02:20,214	DEBUG streaming_executor.py:338 -- Operator InputDataBuffer[Input] -> TaskPoolMapOperator[ReadRange->Map(f)] -> TaskPoolMapOperator[Map(g)] completed. Operator Metrics:
{'num_inputs_received': 25, 'bytes_inputs_received': 800, 'num_task_inputs_processed': 25, 'bytes_task_inputs_processed': 800, 'bytes_inputs_of_submitted_tasks': 800, 'num_task_outputs_generated': 25, 'bytes_task_outputs_generated': 800, 'rows_task_outputs_generated': 100, 'num_outputs_taken': 25, 'bytes_outputs_taken': 800, 'num_outputs_of_finished_tasks': 25, 'bytes_outputs_of_finished_tasks': 800, 'num_tasks_submitted': 25, 'num_tasks_running': 0, 'num_tasks_have_outputs': 25, 'num_tasks_finished': 25, 'num_tasks_failed': 0, 'block_generation_time': 100.446355003, 'task_submission_backpressure_time': 3.350887582999999, 'obj_store_mem_internal_inqueue_blocks': 0, 'obj_store_mem_internal_inqueue': 0, 'obj_store_mem_internal_outqueue_blocks': 0, 'obj_store_mem_internal_outqueue': 0, 'obj_store_mem_pending_task_inputs': 0, 'obj_store_mem_freed': 800, 'obj_store_mem_spilled': 0, 'obj_store_mem_used': 96, 'cpu_usage': 0, 'gpu_usage': 0, 'ray_remote_args': {'num_cpus': 0.1, 'scheduling_strategy': 'SPREAD'}}
2024-09-10 14:02:20,217	DEBUG streaming_executor.py:182 -- Shutting down <StreamingExecutor(StreamingExecutor-9091586451d54141b2ef511d6f302bde, stopped daemon 13086404608)>.

Related issue number

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Several review threads on python/ray/data/_internal/logging.py and python/ray/data/_internal/logging_json.yaml were resolved (now outdated).
@omatthew98 force-pushed the mowen/ray-data-structured-logging branch from 18a921c to 7c6313d on September 9, 2024 at 22:40
@@ -0,0 +1,34 @@
version: 1
Contributor

What is this YAML file format? Is this something specific to Ray Data? Is there documentation about this?

Contributor Author

The YAML file follows Python's standard logging dictConfig schema (more docs here). We just load the YAML directly with something like:

    import logging.config
    import yaml

    with open(config_path) as file:
        config = yaml.safe_load(file)
    logging.config.dictConfig(config)

We could probably do a better job of explaining that this can be used in the brief logging docs here.

Contributor

maybe add a comment in this file explaining the origin/format of this YAML file?

Contributor

@GeneDer left a comment

No blocker for me, but we should probably add some tests to ensure the behavior, and follow up with product and doc changes.

Also, maybe out of scope, but right now everything is configured through env var. Does it make sense to create a logging config for users to do other configurations such as log level or log file locations?

A review thread on python/ray/data/_internal/logging_json.yaml was resolved (now outdated).
@omatthew98
Contributor Author

No blocker for me, but we should probably add some tests to ensure the behavior, and follow up with product and doc changes.

Also, maybe out of scope, but right now everything is configured through env var. Does it make sense to create a logging config for users to do other configurations such as log level or log file locations?

Users could create their own logging.yaml configuration (which could be forked off either of the default examples we have for TEXT or JSON), and specify the path to be used instead of the default. Does that make sense as a workflow or is there something that would be preferred?

@GeneDer
Contributor

GeneDer commented Sep 9, 2024

No blocker for me, but we should probably add some tests to ensure the behavior, and follow up with product and doc changes.
Also, maybe out of scope, but right now everything is configured through env var. Does it make sense to create a logging config for users to do other configurations such as log level or log file locations?

Users could create their own logging.yaml configuration (which could be forked off either of the default examples we have for TEXT or JSON), and specify the path to be used instead of the default. Does that make sense as a workflow or is there something that would be preferred?

Yep, I understand there are essentially unlimited degrees of freedom for users to define their own YAML, but then they would have to specify everything on their own, right? Just wondering if there is something in between for a user who just needs a small tweak. But totally understand if there is no such use case for Ray Data, no need to over-engineer this.

Contributor

@GeneDer left a comment

LGTM!

@omatthew98
Contributor Author

Yep, I understand there are essentially unlimited degrees of freedom for users to define their own YAML, but then they would have to specify everything on their own, right? Just wondering if there is something in between for a user who just needs a small tweak. But totally understand if there is no such use case for Ray Data, no need to over-engineer this.

I think to keep things simple we will leave it as is. My understanding is that the vast majority of users will use the default settings (maybe not even JSON file logging), and those who do need something custom will likely have the expertise to set things up in whatever way they desire. We can revisit in the future if this becomes a pain point.

@omatthew98 added the go label (add ONLY when ready to merge, run all tests) on Sep 10, 2024
os.path.join(os.path.dirname(__file__), "logging_json.yaml")
)

# Environment variable to specify the encoding of the log messages
Contributor

also comment what options are available?

Comment on lines 103 to 100
environment variable. If the variable isn't set, this function loads the
"logging.yaml" file that is adjacent to this module.
Member

Docstring is slightly out-of-date with the code

Member

@kevin85421 left a comment

Hi @omatthew98, based on the logs in the PR description, it seems that the core context is not being added to the logs in ray-data.log. The main use case for structured logging is to enable users to associate Job ID, Actor ID, and Task ID with the logs.

@kevin85421 self-assigned this on Oct 4, 2024
@hongpeng-guo
Contributor

hongpeng-guo commented Oct 7, 2024

Hi @omatthew98, based on the logs in the PR description, it seems that the core context is not being added to the logs in ray-data.log. The main use case for structured logging is to enable users to associate Job ID, Actor ID, and Task ID with the logs.

cc @omatthew98 I also hit this problem when I was doing the Ray Train structured logging. I think the fix is to use the (): key instead of the class: key when registering the core context filter in logging.yaml. A relevant online discussion is here: https://stackoverflow.com/questions/61456543/logging-in-python-with-yaml-and-filter


filters:
  console_filter:
    (): ray.data._internal.logging.HiddenRecordFilter
  core_context_filter:
    class: ray._private.ray_logging.filters.CoreContextFilter
Contributor

(): ray._private.ray_logging.filters.CoreContextFilter will enable this core context filter, I think.

@omatthew98
Contributor Author

Hi @omatthew98, based on the logs in the PR description, it seems that the core context is not being added to the logs in ray-data.log. The main use case for structured logging is to enable users to associate Job ID, Actor ID, and Task ID with the logs.

@kevin85421 I updated this PR with the suggestion from @hongpeng-guo and it now seems to be including those tags. The example above should be updated if you want to confirm.

Member

@bveeramani left a comment

LGTM

class: ray.data._internal.logging.SessionFileHandler
formatter: ray_json
filename: ray-data.log
filters: [core_context_filter]
Member

Do we need to add the core context filter for the default (text) log?

Contributor Author

I don't think so. The core context filter doesn't actually filter anything out; it just adds metadata to the log records, and for text logs that metadata isn't shown.
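
A tiny illustration of that pattern (an illustrative filter, not the actual CoreContextFilter implementation): a logging filter can attach fields while always returning True.

    import logging

    class AddContextFilter(logging.Filter):
        # Illustrative only: attaches extra fields instead of dropping records.
        def filter(self, record: logging.LogRecord) -> bool:
            record.job_id = "01000000"  # placeholder value
            return True  # never filters anything out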

Comment on lines 102 to 103
If "RAY_DATA_LOG_ENCODING" is specified as "JSON" we will enable JSON reading mode
if using the default logging config.
Member

I'm confused about what JSON reading mode refers to -- does this mean JSON log format?

Contributor Author

Good catch, it should be JSON logging mode not JSON reading.

Comment on lines 14 to 18
# Env. variable to specify the encoding of the file logs when using the default config.
RAY_DATA_LOG_ENCODING = "RAY_DATA_LOG_ENCODING"

# Env. variable to specify the logging config path; use defaults if not set
RAY_DATA_LOGGING_CONFIG_PATH = "RAY_DATA_LOGGING_CONFIG_PATH"
Member

Nit: Maybe rename to make it clear these refer to environment variable names and not the values? I'd think RAY_DATA_LOGGING_CONFIG_PATH refers to the actual path and not the name of an environment variable.

Contributor Author

Going to suffix both of these with _ENV_VAR_NAME to be super explicit.
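
For illustration, the renamed constants might look like this (a sketch of the suggestion above, not necessarily the merged code):

    # Names of the environment variables; their values are read when logging is configured.
    RAY_DATA_LOG_ENCODING_ENV_VAR_NAME = "RAY_DATA_LOG_ENCODING"
    RAY_DATA_LOGGING_CONFIG_PATH_ENV_VAR_NAME = "RAY_DATA_LOGGING_CONFIG_PATH"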

# After configuring logger, warn if RAY_DATA_LOGGING_CONFIG_PATH is used with
# RAY_DATA_LOG_ENCODING, because they are not both supported together.
if config_path is not None and log_encoding is not None:
logger = logging.getLogger("ray.data")
Member

Nit: What's the motivation for using the parent logger rather than this module's logger? Feel like it's more conventional to do logging.getLogger(__name__) unless there's a strong reason to do otherwise

Contributor Author

Don't think there is a solid reason to use the parent logger, will switch to the module logger.

Comment on lines 130 to 133
logger.warning(
"Using `RAY_DATA_LOG_ENCODING` is not supported with "
+ "`RAY_DATA_LOGGING_CONFIG_PATH`"
)
Member

Nit: If these aren't supported, should we just error? Feels like it might be confusing if we allow unsupported combinations (e.g., what's the behavior if you do this?)

Contributor Author

If they specify RAY_DATA_LOGGING_CONFIG_PATH, then that config will be used exactly as the logging configuration. If they try to use RAY_DATA_LOG_ENCODING alongside it, that env var will essentially be ignored. My thought was that if the user is using a custom configuration, that should be the expected behavior. The warning is meant to tell them that the env var doesn't magically make JSON logging work; they will still get logs exactly as configured in their custom config.
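
A rough sketch of the precedence being described (hypothetical helper and parameter names, not the PR's actual code):

    import logging
    import os

    logger = logging.getLogger(__name__)

    def _configure(load_yaml_config, default_text_path, default_json_path):
        # Hypothetical sketch: a custom config path wins; the encoding variable
        # only chooses between the two built-in default configs.
        config_path = os.environ.get("RAY_DATA_LOGGING_CONFIG_PATH")
        log_encoding = os.environ.get("RAY_DATA_LOG_ENCODING")
        if config_path is not None:
            load_yaml_config(config_path)  # user YAML is used verbatim
            if log_encoding is not None:
                logger.warning(
                    "RAY_DATA_LOG_ENCODING is ignored when RAY_DATA_LOGGING_CONFIG_PATH is set."
                )
        elif log_encoding == "JSON":
            load_yaml_config(default_json_path)
        else:
            load_yaml_config(default_text_path)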

Comment on lines 111 to 113
# Dynamically load env vars
config_path = os.environ.get("RAY_DATA_LOGGING_CONFIG_PATH")
log_encoding = os.environ.get("RAY_DATA_LOG_ENCODING")
Member

For my own understanding, what happens if we don't load these dynamically?

Contributor Author

The main difference is that if you edit the environment variables and then call configure_logging again, the changes are picked up, whereas loading them once at module initialization would not pick them up. In our tests we do something similar: we modify the variables and then call configure_logging, so this works better for that. In general it just ensures the most recent environment variable values are used.
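
A minimal sketch of why that matters in tests, assuming configure_logging lives in ray.data._internal.logging (the module this PR touches):

    import os
    from ray.data._internal.logging import configure_logging  # assumed import path

    # Because the env vars are read inside configure_logging(), a test can flip the
    # encoding and reconfigure without re-importing the module.
    os.environ["RAY_DATA_LOG_ENCODING"] = "JSON"
    configure_logging()  # picks up the new value at call time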

omatthew98 and others added 10 commits October 11, 2024 13:21
@omatthew98 force-pushed the mowen/ray-data-structured-logging branch from 4f35375 to b4e0e6a on October 11, 2024 at 20:22
@bveeramani merged commit 04098a6 into ray-project:master on Oct 11, 2024
5 checks passed
ujjawal-khare pushed a commit to ujjawal-khare-27/ray that referenced this pull request Oct 15, 2024
Adds structured logging to Ray Data. This will allow users to configure
logging to use any of the following:
* A user's custom logging file (existing functionality)
* A default TEXT logger (existing functionality)
* A default JSON logger (new functionality)

---------

Signed-off-by: Matthew Owen <mowen@anyscale.com>
Signed-off-by: Matthew Owen <omatthew98@berkeley.edu>
Co-authored-by: Scott Lee <scottjlee@users.noreply.github.com>
Co-authored-by: Hao Chen <chenh1024@gmail.com>
Signed-off-by: ujjawal-khare <ujjawal.khare@dream11.com>
Labels
go (add ONLY when ready to merge, run all tests)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants