Releases: ray-project/ray

Ray-2.53.0

20 Dec 15:16
0de2118

Highlights

  • Ray plans to drop support for Pydantic V1 starting with version 2.56.0. Please see this RFC for details.
  • Ray Data now has support for bounded reading from Kafka and improved Iceberg support.

Ray Data

🎉 New Features

  • Autoscaling: New utilization-based cluster autoscaler for Ray Data workloads (#59353, #59362, #59366). To use this new autoscaler, set RAY_DATA_CLUSTER_AUTOSCALER=V2 (see the sketch after this list).
  • Kafka Datasource: Add Kafka as a native datasource for data ingestion (#58592)
  • Dataset summary API: Add Dataset.summary() API for quick dataset inspection (#58862)
  • Iceberg support: Add Iceberg schema evolution, upsert, and overwrite support (#59210, #59335)
  • Graceful error handling: Add should_continue_on_error for graceful error handling in batch inference (#59212)
  • Datetime compute expressions: Add datetime compute expressions support (#58740)
  • Grouped with_column expressions: Enable expressions for grouped with_column in Ray Data (#58231)
  • Parallelized collation: Parallelize DefaultCollateFn, arrow_batch_to_tensors (#58821)
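
A minimal sketch of opting into the new V2 autoscaler and inspecting a dataset with the new Dataset.summary() API. The environment variable and method names come from the items above; the exact summary contents and the point at which the variable is read are assumptions.

```python
import os

# Opt into the new utilization-based cluster autoscaler (#59353).
# Assumption: the variable must be set before the dataset executes.
os.environ["RAY_DATA_CLUSTER_AUTOSCALER"] = "V2"

import ray

ds = ray.data.range(10_000).map_batches(lambda batch: batch)

# New in this release (#58862): quick dataset inspection.
# The exact fields in the summary are version dependent.
print(ds.summary())
```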

💫 Enhancements

  • Optimized Autoscaler Step Size: Optimize autoscaler to support configurable step size for actor pool scaling (#58726)
  • Improved Streaming Repartition: Improve streaming repartition performance (#58728)
  • Actor init retry: Add actor retry if there's a failure in __init__ (#59105)
  • Fused Repartition + MapBatches: Fuse StreamingRepartition with MapBatches operators to scale collate (#59108)
  • Combined repartitions: Combine consecutive repartitions for efficiency (#59145)
  • Prefetch buffering: Handle prefetch buffering in iter_batches (#58657)
  • HashShuffle block breakdown: HashShuffleAggregator breaks down blocks on finalize (#58603)
  • Backpressure tuning: Tune concurrency cap backpressure object store budget ratio (#58813)
  • Non-string ApproximateTopK: Support non-string items for ApproximateTopK aggregator (#58659)
  • Lance version support: Add version support to read_lance() (#58895)
  • Dashboard metrics: Add time_to_first_batch and get_ref_bundles metrics to data dashboard (#58912)
  • Iter prefetched bytes stats: Add iter_prefetched_bytes statistics tracking (#58900)
  • Configurable batching for iter_batches: Add configurable batching for resolve_block_refs to speed up iter_batches (#58467)
  • Improved dashboard metrics: Improve Ray Data dashboard metrics display (#58667)
  • Histogram percentiles: Update Ray Data histograms to show percentiles in data dashboard (#58650)
  • Deprecated API removal: Remove deprecated read_parquet_bulk API (#58970)
  • Block shaping option: Add disable block shaping option to BlockOutputBuffer (#58757)
  • Removed concurrency lock: Remove concurrency lock for better performance (#56798)

🔨 Fixes

  • Fixes to Unique: Fix support of list types for Unique aggregator (#58916)
  • Parquet NaN fix: Fix reading from written parquet for numpy with NaNs (#59172)
  • Hash Shuffle empty block: Fix empty block sort in hash shuffle operator (#58836)
  • Hive partitioning pushdown: Fix pushdown optimizations with Hive partitioning (#58723)
  • Object Store usage reporting: Fix obj_store_mem_max_pending_output_per_task reporting (#58864)
  • Pyarrow FileSystem serialization fix: Handle filesystem serialization issue in get_parquet_dataset (#57047)
  • Azure UC SAS: Handle Azure UC user delegation SAS (#59393)
  • Async UDF Thread Cleanup: Close threads from async UDF after actor died (#59261)
  • Object Locality Default: Default return 0s for object locality instead of -1s (#58754)

📖 Documentation

  • Added contributing guide to Ray Data documentation (#58589)
  • Added download expression to key user journeys in documentation (#59417)
  • Added Kafka user guide (#58881)
  • Added unstructured data templates from Ray Summit 2025 (#57063)
  • Improved instructions for reading Hugging Face datasets (#58492, #58832)
  • Refined batch-format guidance in docs (#58971)
  • Exposed vision_preprocess and vision_postprocess in VLM docs (#59012)
  • Added instructions for upgrading huggingface_hub (#59109)
  • Added a doc on scaling out expensive collation functions (#58993)

Ray Serve

🎉 New Features

  • Deployment topology visibility. Exposes deployment dependency graphs in Serve REST API, allowing users to visualize and understand the DAG structure of their applications. (#58355)
  • External autoscaler integration. Adds external_scaler_enabled flag to application config, enabling third-party autoscalers to control replica counts. (#57727, #57698)
  • Node rank and local rank support. Extends replica rank system to track node-level and per-node local ranks, enabling better distributed serving coordination for multi-node deployments. (#58477, #58479)
  • Custom batch size function. Allows users to define custom functions for computing logical batch sizes in @serve.batch, useful when batch items have varying weights (e.g., token counts in LLM inference); see the sketch after this list. (#59059)
  • Stateful application-level autoscaling. Adds policy state persistence for custom autoscaling policies, allowing policies to maintain state across control-loop iterations. (#59118)
  • New autoscaling, batching, and routing metrics. Adds Prometheus metrics for autoscaling decisions (ray_serve_deployment_target_replicas, ray_serve_autoscaling_decision_replicas), batching statistics, and router queue latency for improved observability. (#59220, #59232, #59233)
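
A sketch of the custom batch-weighting idea behind the @serve.batch item above. Only standard @serve.batch parameters are used; the actual hook for supplying a custom batch-size function (#59059) is not named in these notes, so it is referenced only in comments rather than guessed.

```python
from ray import serve


def token_count(prompt: str) -> int:
    # Illustrative weight function: long prompts count more toward the
    # logical batch size (the "varying weights" case from #59059).
    return len(prompt.split())


@serve.deployment
class Summarizer:
    # max_batch_size / batch_wait_timeout_s are the long-standing knobs; the
    # new custom batch-size function would plug in alongside them (its
    # parameter name is not given in the notes, so it is omitted here).
    @serve.batch(max_batch_size=64, batch_wait_timeout_s=0.05)
    async def handle_batch(self, prompts: list[str]) -> list[str]:
        return [p[:80] for p in prompts]

    async def __call__(self, prompt: str) -> str:
        return await self.handle_batch(prompt)


app = Summarizer.bind()
```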

💫 Enhancements

  • Smarter downscaling behavior. Prioritizes stopping most recently scaled-up replicas during downscale, preserving long-lived replicas that are optimally placed and fully warmed up. (#52929)
  • Autoscaling performance optimizations. Short-circuits metric aggregation for single time series cases (O(n log n) → O(1)) and lazily evaluates expensive autoscaling context fields to reduce controller CPU usage. (#58962, #58963)
  • Route matching cleanup. Removes redundant route matching logic from replicas since correct route values are now included in RequestMetadata. Also allows multiple methods (GET, PUT) corresponding to a route. (#58927)
  • Deployment wrapper metadata preservation. Wrapper classes from decorators like @ingress now preserve original class metadata (__qualname__, __module__, __doc__, __annotations__). (#58478)
  • Improved type annotations. Enhances generic type annotations on DeploymentHandle, DeploymentResponse, and DeploymentResponseGenerator for better IDE support and type inference. Adds .result() stub to DeploymentResponseGenerator to fix static typing errors. (#59363, #58522)

🔨 Fixes

  • YAML serialization for autoscaling enums. Fixes RepresenterError when using serve build with AggregationFunction enum values in autoscaling config. (#58509)
  • Autoscaling context timestamp fix. Correctly sets last_scale_up_time and last_scale_down_time on autoscaling context. (#59057)
  • Deadlock in chained deployment responses. Fixes hang when awaiting intermediate DeploymentResponse objects in a chain of deployment calls from different event loops. (#59385)
  • FastAPI class-based view inheritance. Fixes make_fastapi_class_based_view to properly handle inherited methods. (#59410)

📖 Documentation

  • Async I/O best practices guide. New documentation covering async programming patterns and best practices for Ray Serve deployments. (#58909)
  • Replica scheduling guide. New documentation covering compact scheduling, placement groups, custom resources, and guidance on when to use each feature. (#59114)

Ray Train

🎉 New Features

  • Worker Placement with Label Selectors: Added label_selector to ScalingConfig. This allows users to control worker placement by targeting specific labeled nodes in the cluster; see the sketch after this list. (#58845, #59414)
  • Multihost JaxTrainer on GPU: Introduced support for JaxTrainer running on GPU machines. (#58322)
  • Checkpoint Consistency Modes: Added CheckpointConsistencyMode to get_all_reported_checkpoints, providing options for handling checkpoint retrieval consistency. (#58271)
  • Per-Dataset Execution Options: DataConfig now supports setting execution_options on a per-dataset basis for finer-grained control over data loading. (#58717)
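
A short sketch of the new label_selector field on ScalingConfig; the label key and value below are illustrative only, and the exact matching semantics are whatever the label-based scheduling docs define.

```python
from ray.train import ScalingConfig
from ray.train.torch import TorchTrainer


def train_loop_per_worker():
    # Placeholder training loop; a real loop would build a model and call
    # ray.train.report(...) with metrics and checkpoints.
    pass


# label_selector is the new ScalingConfig field (#58845, #59414).
# Assumption: it maps node label keys to required values; the key/value here
# are made up for illustration.
scaling_config = ScalingConfig(
    num_workers=4,
    use_gpu=True,
    label_selector={"accelerator-type": "A100"},
)

trainer = TorchTrainer(
    train_loop_per_worker=train_loop_per_worker,
    scaling_config=scaling_config,
)
# trainer.fit()  # run on a cluster that has nodes carrying the matching label
```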

💫 Enhancements

  • Nested Metrics Support: Result.get_best_checkpoint now supports nested metrics, allowing for more flexible metric tracking and checkpoint selection. (#58537)
  • Non-Blocking Checkpoint Retrieval: get_all_reported_checkpoints no longer blocks when only metrics are reported. (#58870)
  • Improved Resource Cleanup: Implemented eager cleanup of data resources and placement groups upon training run failures or aborts, preventing resource leaks. (#58325, #58515)

🔨 Fixes

  • MLflow Compatibility: Updated setup_mlflow API to ensure full compatibility with Ray Train V2. (#58705)
  • Validation for Checkpoint Uploads: A ValueError is now raised if checkpoint_upload_fn fails to return a valid checkpoint. (#58863)

📖 Documentation

  • New API Documentation: Added comprehensive documentation for the ray.train.get_all_reported_checkpoints method. (#58946)

Ray Tune

💫 Enhancements

  • Nested Metrics Support: Result.get_best_checkpoint now supports nested metrics, allowing for more flexible metric tracking and checkpoint selection. (#58537)

Ray LLM

💫 Enhancements

  • Cloud filesystem restructuring with provider-specific implementations (#58469)
  • Bump transformers to 4.57.3 (#58980)
  • Ray Data LLM config refactor (#58298)
  • Update vllm_engine.py to check for VLLM_USE_V1 attribute (#58820)
  • Infer VLLM_RAY_PER_WORKER_GPUS from fractional placement-group bundles automatically (#5...

Ray-2.51.2

29 Nov 00:40
9ac1e61

  • Fix for CVE-2025-62593: reject Sec-Fetch-* and other browser-specific headers in the dashboard browser rejection logic

Ray-2.52.1

28 Nov 02:23
4ebdc0a

  • More robust handling for CVE-2025-62593: test for more browser-specific headers in dashboard browser rejection logic

Ray-2.52.0

21 Nov 19:10
9527a55

Release Highlights

Ray Core:

  • End of Life for Python 3.9 Support: Ray will no longer release Python 3.9 wheels.
  • Token authentication: Ray now supports built-in token authentication across all components including the dashboard, CLI, API clients, and internal services. This provides an additional layer of security for production deployments to reduce the risk of unauthorized code execution. Token authentication is initially off by default. For more information, see: https://docs.ray.io/en/latest/ray-security/token-auth.html

Ray Data:

  • We’ve added a number of improvements for Iceberg, including upserts, predicate and projection pushdown, and overwrite.
  • We’ve added significant improvements to our expressions framework, including temporal, list, tensor, and struct datatype expressions.

Ray Libraries

Ray Data

🎉 New Features:

  • Added predicate pushdown rule that pushes filter predicates past eligible operators (#58150, #58555)
  • Iceberg support for upsert tables, schema updates, and overwrite operations (#58270)
  • Iceberg support for predicate and projection pushdown (#58286)
  • Iceberg write datafiles in write() then commit (#58601)
  • Enhanced Unity Catalog integration (#57954)
  • Namespaced expressions that expose PyArrow functions (#58465)
  • Added version argument to read_delta_lake (#54976)
  • Generator UDF support for map_groups (#58039); see the sketch after this list
  • ApproximateTopK aggregator (#57950)
  • Serialization framework for preprocessors (#58321)
  • Support for temporal, list, tensor, and struct datatypes (#58225)
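
A small sketch of a generator UDF with map_groups, as referenced in the list above; the pandas group format and the toy schema are assumptions for illustration.

```python
import pandas as pd
import ray

ds = ray.data.from_items(
    [{"user": u, "amount": a} for u, a in [("a", 1), ("a", 2), ("b", 5)]]
)


def user_totals(group: pd.DataFrame):
    # Generator UDF (#58039): yield one or more output batches per group
    # instead of returning a single result.
    yield pd.DataFrame(
        {"user": [group["user"].iloc[0]], "total": [group["amount"].sum()]}
    )


totals = ds.groupby("user").map_groups(user_totals)
print(totals.take_all())
```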

💫 Enhancements:

  • Use approximate quantile for RobustScaler preprocessor (#58371)
  • Map batches support for limit pushdown (#57880)
  • Make all map operations zero-copy by default (#58285)
  • Use tqdm_ray for progress reporting from workers (#58277)
  • Improved concurrency cap backpressure tuning (#58163, #58023, #57996)
  • Sample finalized partitions randomly to avoid lens effect (#58456)
  • Allow file extensions starting with '.' (#58339)
  • Set default file_extensions for read_parquet (#56481)
  • URL decode values in parse_hive_path (#57625)
  • Streaming partition enforces row_num per block (#57984)
  • Streaming repartition combines small blocks (#58020)
  • Lower DEFAULT_ACTOR_MAX_TASKS_IN_FLIGHT_TO_MAX_CONCURRENCY_FACTOR to 2 (#58262)
  • Set udf-modifying-row-count default to false (#58264)
  • Cache PyArrow schema operations (#58583)
  • Explain optimized plans (#58074)
  • Ranker interface (#58513)

🔨 Fixes:

  • Fixed renamed columns to be appropriately dropped from output (#58040, #58071)
  • Fixed handling of renames in projection pushdown (#58033, #58037)
  • Fixed broken LogicalOperator abstraction barrier in predicate pushdown rule (#58683)
  • Fixed file size ordering in download partitioning with multiple URI columns (#58517)
  • Fixed HTTP streaming file download by using open_input_stream (#58542)
  • Fixed expression mapping for Pandas (#57868)
  • Fixed reading from zipped JSON (#58214)
  • Fixed MCAP datasource import for better compatibility (#57964)
  • Avoid slicing block when total_pending_rows < target (#58699)
  • Clear queue for manually marked execution_finished operators (#58441)
  • Add exception handling for invalid URIs in download operation (#58464)
  • Fixed progress bar name display (#58451)

📖 Documentation:

  • Documentation for Ray Data metrics (#58610)
  • Simplify and add Ray Data LLM quickstart example (#58330)
  • Convert rST-style to Google-style docstrings (#58523)

🏗 Architecture:

  • Removed stats update thread (#57971)
  • Refactor histogram metrics (#57851)
  • Revisit OpResourceAllocator to make data flow explicit (#57788)
  • Create unit test directory for fast, isolated tests (#58445)
  • Dump verbose ResourceManager telemetry into ray-data.log (#58261)

Ray Train

🎉 New Features:

  • Result::from_path implementation in v2 (#58216)

💫 Enhancements:

  • Exit actor and log appropriately when poll_workers is in terminal state (#58287)
  • Set JAX_PLATFORMS environment variable based on ScalingConfig (#57783)
  • Default to disabling Ray Train collective util timeouts (#58229)
  • Add SHUTTING_DOWN TrainControllerState and improve logging (#57882)
  • Improved error message when calling training function utils outside Ray Train worker (#57863)
  • FSDP2 template: Resume from previous epoch when checkpointing (#57938)
  • Clean up checkpoint config and trainer param deprecations (#58022)
  • Update failure policy log message (#58274)

📖 Documentation:

  • Ray Train Metrics documentation page (#58235)
  • Local mode user guide (#57751)
  • Recommend tree_learner="data_parallel" in examples for distributed LightGBM training (#58709)

Ray Serve

🎉 New Features:

  • Custom request routing with runtime environment support. Users can now define custom request router classes that are safely imported and serialized using the application's runtime environment, enabling advanced routing logic with custom dependencies. (#56855)
  • Custom autoscaling policies with enhanced logging. Deployment-level and application-level autoscaling policies now display their custom policy names in logs, making it easier to debug and monitor autoscaling behavior. (#57878)
  • Audio transcription support in vLLM backend. Ray Serve now supports transcription tasks through the vLLM engine, expanding multimodal capabilities. (#57194)
  • Data parallel attention public API. Introduced a public API for data parallel attention, enabling efficient distributed attention mechanisms for large-scale inference workloads. (#58301)
  • Route pattern tracking in proxy metrics. Proxy metrics now expose actual route patterns (e.g., /api/users/{user_id}) instead of just route prefixes, enabling granular endpoint monitoring without high cardinality issues. Performance impact is minimal (~1% RPS decrease). (#58180)
  • Replica dependency graph construction. Added list_outbound_deployments() method to discover downstream deployment dependencies, enabling programmatic analysis of service topology for both stored and dynamically-obtained handles. (#58345, #58350)
  • Multi-dimensional replica ranking. Introduced ReplicaRank schema with global, node-level, and local ranks to support advanced coordination scenarios like tensor parallelism and model sharding across nodes. (#58471, #58473)
  • Proxy readiness verification. Added a check to ensure proxies are ready to serve traffic before serve.run() completes, improving deployment reliability. (#57723)
  • IPv6 socket support. Ray Serve now supports IPv6 networking for socket communication. (#56147)

💫 Enhancements:

  • Selective throughput optimization flag overrides. Users can now override individual flags set by RAY_SERVE_THROUGHPUT_OPTIMIZED without manually configuring all f...

Ray-2.51.1

01 Nov 03:27
eeb38c7

  • Reuse previous metadata if transferring the same tensor list with nixl (#58309)

Ray-2.51.0

29 Oct 05:33
b6b1fac

Release Highlights

Ray Train:

  • Ray Train v2 is now enabled by default! Ray Train v2 provides usability and stability improvements, as well as new features. For more details, see the REP and Migration Guide. To disable Ray Train v2, set the environment variable RAY_TRAIN_V2_ENABLED=0.

Ray Serve:

  • Application-level autoscaling: Introduces custom autoscaling policies that operate across all deployments in an application, enabling coordinated scaling decisions based on aggregate metrics. This is a significant advancement over per-deployment autoscaling, allowing for more intelligent resource management at the application level.
  • Enhanced autoscaling capabilities with replica-level metrics: Wires up AutoscalingContext with total_running_requests, total_queued_requests, and total_num_requests, plus adds support for min, max, and time-weighted average aggregation functions. These improvements give users fine-grained control to implement sophisticated custom autoscaling policies based on real-time workload metrics.

Ray Libraries

Ray Data

🎉 New Features:

  • Added enhanced support for Unity Catalog integration (#57954, #58049)
  • New expression evaluator infrastructure for improved query optimization (#57778, #57855)
  • Support for SaveMode in write operations (#57946)
  • Added approximate quantile aggregator (#57598)
  • MCAP datasource support for robotics data (#55716)
  • Callback-based stat computation for preprocessors and ValueCounter (#56848)
  • Support for multiple download URIs with improved error handling (#57775)

💫 Enhancements:

  • Improved projection pushdown handling with renamed columns (#58033, #58037, #58040, #58071)
  • Enhanced hash-shuffle performance with better retry policies (#57572)
  • Streamlined concurrency parameter semantics (#57035)
  • Improved execution progress rendering (#56992)
  • Better handling of empty columns in pandas blocks (#57740)
  • Enhanced support for complex data types and column operations (#57271)
  • Reduced memory usage with improved streaming generator backpressure (#57688)
  • Enhanced preemption testing and utilities (#57883)
  • Improved Download operator display names (#57773)
  • Better handling of variable-shaped tensors and tensor columns (#57240)
  • Optimized aggregator execution with out-of-order processing by default (#57753)

🔨 Fixes:

  • Fixed renamed columns to be appropriately dropped from output (#58040, #58071)
  • Fixed handling of renames in projection pushdown (#58033, #58037)
  • Fixed vLLMEngineStage field name inconsistency for images (#57980)
  • Fixed driver hang during streaming generator block metadata retrieval (#56451)
  • Fixed retry policy for hash-shuffle tasks (#57572)
  • Fixed prefetch loop to avoid blocking on fetches (#57613)
  • Fixed empty projection handling (#57740)
  • Fixed errors with concatenation of mixed pyarrow native and extension types (#56811)

📖 Documentation:

  • Updated document embedding benchmark to use canonical Ray Data API (#57977)
  • Improved concurrency-related documentation (#57658)
  • Updated preprocessing and data handling examples

Ray Train

🎉 New features

  • Turn on Train v2 by default (#57857)
  • Top-level ray.train aliases for public APIs (#57758)

💫 Enhancements

  • Raise clear errors when mixing v1/v2 APIs (#57570)
  • JAX backend: add jax.distributed.shutdown() for JaxBackend (#57802)
  • Update TrainingFailedError module (#57865)
  • Improve deprecation handling when ray.train methods are called from ray.tune (#57810)
  • Enable deprecation warnings for legacy XGBoost/LightGBM trainers (#57280)

🔨 Fixes

  • Fix ControllerError triggered by after_worker_group_poll_status errors (#57869)
  • Fix iter_torch_batches use of ray.train.torch.get_device outside Train (#57816)
  • Fix exception-queue race condition in ThreadRunner (#57249)

📖 Documentation

  • Add validation and details to checkpoint docs (#57065)

🏗 Architecture / tests

  • Enable Train v2 across test suites; migrate remaining tests and isolate/disable stragglers (#56868, #57256, #57534, #57722, #57764)
  • Isolate circular-dependency tests and resolve circular imports (#57710, #56921)
  • Replace Checkpoint Manager Pydantic v2 APIs with v1 (#57147)
  • Bump test timeouts (test_util, torch_trainer) (#57939, #57873)

Ray Tune

💫 Enhancements:

  • Updated release tests to import from tune (#57956)
  • Better integration with Train V2 backend

Ray Serve

🎉 New Features:

  • Application-level autoscaling. Introduces support for custom autoscaling policies that operate across all deployments in an application, enabling coordinated scaling decisions based on aggregate metrics; see the sketch after this list. (#57535, #57548, #57637, #57756)
  • Autoscaling metrics aggregation functions. Adds support for min, max, and time-weighted average aggregation over timeseries data, providing more flexible autoscaling control. (#56871)
  • Enhanced autoscaling context with replica-level metrics. Wires up AutoscalingContext constructor arguments to expose total_running_requests, total_queued_requests, and total_num_requests for use in custom autoscaling policies. (#57202)
  • Multiple task consumers in a single application. Ray Serve applications can now run multiple task consumer deployments concurrently. (#56618)
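
A heavily hedged sketch of an application-level autoscaling policy. The notes name the context fields (total_running_requests, total_queued_requests, total_num_requests) but not the policy signature or registration mechanism, so the import path, the callable shape, and the return value below are all assumptions; see the custom autoscaling guide for the real API.

```python
# Import path is an assumption for this sketch; the real location is shown in
# the custom autoscaling documentation referenced below.
from ray.serve.config import AutoscalingContext


def app_level_policy(ctx: AutoscalingContext) -> int:
    # Aggregate request metrics exposed in this release (#57202).
    in_flight = ctx.total_running_requests + ctx.total_queued_requests
    # Assumption: the policy returns a desired replica count (an
    # application-level policy may instead return per-deployment targets).
    # Scale one replica per 10 in-flight requests, with a floor of 1.
    return max(1, int(in_flight // 10))
```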

💫 Enhancements:

  • Reconfigure invoked on replica rank changes. The reconfigure method now receives both user_config and rank parameters when ranks change, enabling replicas to adapt their configuration dynamically; see the sketch after this list. (#57091)
  • Celery adapter configuration improvements. Added default serializer and new configuration fields to enhance Celery integration flexibility. (#56707)
  • AutoscalingContext promoted to public API. The autoscaling context is now officially part of the public API with comprehensive documentation. (#57600)
  • Async inference telemetry. Added telemetry tracking to monitor the number of replicas using asynchronous inference. (#57665)
  • Rank logging verbosity reduced. Changed seven rank-related INFO logs to DEBUG level, reducing log noise during normal operations. (#57831)
  • Controller logging optimized. Removed expensive debug logs from the controller that were costly in large clusters. (#57813)
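
A small sketch of the rank-aware reconfigure behavior from the first item in this list; the argument order and the shape of the rank value are assumptions, since the notes only state that both user_config and rank are passed.

```python
from ray import serve


@serve.deployment(num_replicas=2, user_config={"strategy": "uniform"})
class ShardedModel:
    def __init__(self):
        self.assignment = None

    # Assumption: when ranks change, Serve calls reconfigure with the rank in
    # addition to user_config; the default keeps this compatible with calls
    # that pass only user_config.
    def reconfigure(self, user_config: dict, rank=None):
        self.assignment = (user_config.get("strategy"), rank)


app = ShardedModel.bind()
```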

🔨 Fixes:

  • Max constructor retry count test fixed for Windows. Adjusted test resource requirements to account for Windows process creation overhead compared to Linux forking. (#57541)
  • Streaming test stability improvements. Added synchronization mechanisms to prevent chunk coalescing and rechunking, eliminating test flakiness. (#57592, #57728)
  • Autoscaling test deflaking. Fixed race conditions in application-level autoscaling tests and removed flaky min aggregation test scenario. (#57784, #57967)
  • State API usage test corrected. Fixed a unit test that was broken but not running in CI. (#56948)
  • Controller recovery logging condition fixed. Updated test condition to properly verify debug and JSON logs after controller recovery. (#57568)

📖 Documentation:

  • Custom autoscaling documentation. Added comprehensive guide for implementing custom autoscaling policies with examples and best practices. (#57600)
  • Replica ranks documentation. Documented the replica rank feature, including how ranks are assigned and how to use them in reconfigure methods. (#57649)
  • Application-level autoscaling guide. Added documentation explaining how to configure and use application-level autoscaling policies. (#57756)
  • Autoscaling documentation improvements. Updated serve autoscaling docs with clearer explanations and examples. (#57652)
  • Performance flags documentation. Documented performance-related configuration flags for Ray Serve. (#57845)
  • Metrics documentation fix. Corrected ray_serve_deployment_queued_queries metric name discrepancy in documentation. (#57629)
  • AutoscalingContext import added to examples. Fixed missing import statement in custom autoscaling policy example. (#57876)
  • App builder guide typo corrected. Fixed command syntax error in typed application builder example. (#57634)
  • Celery filesystem broker note. Added warning about using filesystem as a broker in Celery workers. (#57686)
  • Async inference alpha stage warning. Added notice that async inference is in alpha stage. (#57268)

🏗 Architecture refactoring:

  • Autoscaling contro...

Ray-2.50.1

18 Oct 19:21
7cf6817

Ray Core: Fix deadlock when cancelling stale requests on in-order actors (#57746)

Ray-2.50.0

10 Oct 23:06
fc4510f

Release Highlights

Ray Data:
This release offers many updates to Ray Data, including:

  • The default shuffle strategy has changed from sort-based to hash-based. This results in much lower peak memory usage and improved shuffle performance for aggregations.
  • We’ve added a new expression API that enables predicate-based filtering, UDF transformations with with_column, and column aliasing for more powerful data transformations.
  • Ray Data LLM has a number of new enhancements for multimodal data pipelines, including multi-node tensor and pipeline parallelism support per replica and the ability to share vLLM engines across processors.

Ray Core:

Alpha release of Ray Direct Transport (formerly GPU objects) - simply enable it by adding the tensor_transport parameter to the existing native Ray Core API. This keeps GPU data in GPU memory until a transfer is needed, avoiding expensive serialization and copies to and from the Ray object store. It uses efficient data transports such as collective communication libraries (GLOO or NCCL) or point-to-point RDMA (via NVIDIA’s NIXL) to transfer data directly between devices, including both CPUs and GPUs.
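
A rough sketch of the Ray Direct Transport opt-in described above. Where the tensor_transport parameter is attached (here, per actor method via @ray.method) and the transport name string are assumptions based on the description; treat it as illustrative rather than the definitive API.

```python
import ray
import torch


@ray.remote(num_gpus=1)
class Producer:
    # Assumption: tensor_transport is set per method and names one of the
    # supported backends (e.g. a NIXL- or NCCL-based transport).
    @ray.method(tensor_transport="nixl")
    def make(self) -> torch.Tensor:
        return torch.ones(1024, device="cuda")


@ray.remote(num_gpus=1)
class Consumer:
    def consume(self, t: torch.Tensor) -> float:
        # The tensor stays in GPU memory and moves device-to-device,
        # bypassing serialization into the Ray object store.
        return float(t.sum())


producer, consumer = Producer.remote(), Consumer.remote()
ref = producer.make.remote()
print(ray.get(consumer.consume.remote(ref)))
```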

Ray Train:

Local mode support for multi-process training with torchrun, plus enhanced checkpoint management with new upload modes and validation functions.

Ray Serve:

  • Async Inference alpha release - New Ray Serve APIs for long-running asynchronous inference tasks, such as video or large-document processing. Includes support for different message brokers, adapters such as Celery, and dead-letter queues (DLQs).
  • Support for replica ranks - Replica-level ranks are added to support large-model inference use cases such as wide Data Parallel and Expert Parallel setups.
  • FastAPI factory pattern support - Enables using FastAPI plugins that are not serializable via cloudpickle.
  • Throughput optimizations - Enable these using the RAY_SERVE_THROUGHPUT_OPTIMIZED environment variable.

RLlib:
Add StepFailedRecreateEnv exception for users with unsatisfiable environments

Ray Serve/Data LLM:

Improvements to multi-node serving, loading models from remote storage, and sharing resources for efficiency (fractional GPUs, sharing GPUs on a data pipeline with shared stages)

Ray Libraries

Ray Data

🎉 New Features:

  • Expression and Filtering API: A new expression API enables predicate-based filtering, UDF transformations with with_column, and column aliasing for more powerful data transformations; see the sketch after this list (#56716, #56313, #56550, #55915, #55788, #56193, #56596)
  • Added support for projection pushdown into Parquet reads (#56500)
  • New download expression enables efficient loading of data from columns containing URIs with improved performance and error handling (#55824, #56462, #56294, #56852, #57146)
  • New explain() API provides insights into dataset execution plans (#55482)
  • Added streaming_train_test_split to avoid materialization for train/test splits (#56803)
  • Ray Data LLM:
    • Enabled multi-node tensor and pipeline parallelism for LLM processing (#56779)
    • Added chat_template_kwargs parameter for customizing chat templates (#56490)
    • Added support for OpenAI's nested image URL format in multimodal pipelines (#56584)
    • vLLM engines can now be shared across sequential processors for better resource utilization (#55179)
  • Enhanced Dataset.stats() output with input/output row counts per operator (#56040)
  • Added new metrics for task duration, inputs per task, and output blocks (#56958, #56379)
  • Time to first batch metric for better iteration performance monitoring (#55758)
  • Added type-specific aggregators for numerical, categorical, and vector columns (#56610)
  • Added fine-grained concurrency controls with max_task_concurrency and resource allocation options (#56370, #56381)
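
A minimal sketch of the new expression API from the first item of this list; the col() helper and its import path are assumptions for this sketch, and the filter/with_column signatures may differ slightly between versions.

```python
import ray
from ray.data.expressions import col  # import path assumed for this sketch

ds = ray.data.from_items(
    [{"price": p, "qty": q} for p, q in [(10, 2), (250, 1), (40, 5)]]
)

# Predicate-based filtering without a row-level Python UDF.
expensive = ds.filter(expr=col("price") > 100)

# with_column: derive a new column from an expression over existing columns.
priced = expensive.with_column("total", col("price") * col("qty"))
print(priced.take_all())
```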

💫 Enhancements:

  • Join and shuffle improvements:
    • Default shuffle strategy changed from sort-based to hash-based for better performance (#55510)
    • Improved groupby performance with sort-shuffle pull-based approach (#57014)
    • Improved join operations with new abstractions (#57022, #56945, #55759)
  • Tensor type handling improvements:
    • Improved compatibility between PyArrow native types, extension types, and pandas Arrow dtypes (#57566, #57176, #57057)
    • Joins now supported with list/tensor non-key columns (#5648)
    • Enhanced support for variable-shaped tensor arrays with different dimensions (#57240, #56918, #56457)
    • Added serialization/deserialization for PyArrow Extension Arrays (#51972)
  • Removing Parquet metadata fetching in ParquetDatasource (#56105)
  • Resource requirements (num_cpus/gpus, memory) are now top-level parameters in most APIs for easier configuration (#56419)
  • zip() operator now supports combining multiple datasets, not just pairs (#56524)
  • Concurrency parameter now accepts tuples for more flexible configuration (#55867)
  • Write operations now use iterators instead of accumulating blocks in memory (#57108)
  • Reduced memory usage for OneHotEncoder (#56565)
  • Reduced memory usage for schema unification (#55880)
  • Eliminated unnecessary block copying and double execution of arrow conversions (#56569, #56793)
  • Improved Parquet encoding ratio estimation (#56268)
  • Enabled per-block limiting for Limit operator (#55239)
  • Optimized schema handling with deduplication and removed unnecessary unification (#55854, #55926)
  • Improved issue detection with event emission instead of just logs (#55717)
  • Better metric organization and external queue metric handling (#55495, #56604)
  • New backpressure policy based on downstream processing capacity (#55463)

🔨 Fixes:

  • Fixed streaming executor to properly drain output queues (#56941)
  • Improved resource management and reservation for operators (#56319, #57123)
  • Fixed retry logic for hash shuffle operations (#57575)
  • Fixed split_blocks producing empty blocks (#57085)
  • Initialize DataContext after setting src_fn_name in actor worker (#57117)
  • Fixed mongo datasource collStats invocation (#57027)
  • Fixed empty projection handling in ParquetDatasource (#56299)
  • Fixed UnboundLocalError when calling read_parquet with columns and no partitioning (#55820)
  • Fix high memory usage with FileBasedDatasource & ParquetDatasource when using a large number of files (#55978)
  • [llm] Fixed LLM processor deployment with Ray Serve (#57061)
  • [llm] Fixed multimodal image extraction when system prompts are absent (#56435)
  • Ignore metadata for pandas block (#56402)
  • Remove metadata for hashing + truncate warning logs (#56093)

📖 Documentation:

  • Fixed an error in the ray.data.groupby example in the docs (#57036)
  • Updated ray.data.Dataset.map() type hints (#52455)
  • Fixed small typos (#56560, #56587)
  • Fixed documentation for the new execution options resource limits assignment (#56051)
  • Fixed broken code snippets in user guides (#55519)
  • Added autoscaling config for Context docs (#55712)
  • Made object store tuning tips consistent with other pages (#56705)
  • New example of how to perform batch inference with embedding models (#56027)

Ray Train

🎉 New Features:

  • Local mode support for Ray Train V2
    • Add local mode support to Ray Train v2 (num_workers=0). (#55487)
    • Add PyTorch local mode support for multi-process training with torchrun. (#56218)
  • Async checkpoint and validation for Ray Train (see the sketch after this list)
    • Add checkpoint_upload_mode to ray.train.report. (#55637)
    • Add checkpoint_upload_function to ray.train.report. (#56208)
    • Add validate_function and validate_config to ray.train.report. (#56360)
    • Add ray.train.get_all_reported_checkpoints method. (#54555)
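
A sketch of how the reporting and checkpoint-listing APIs named above are called from inside a training worker; the new knobs (checkpoint_upload_mode, checkpoint_upload_function, validate_function/validate_config) are only mentioned in comments because their accepted values are not spelled out in these notes.

```python
import tempfile

import ray.train
from ray.train import Checkpoint


def train_loop_per_worker():
    for epoch in range(3):
        with tempfile.TemporaryDirectory() as tmpdir:
            # ... write model/optimizer state into tmpdir here ...
            ckpt = Checkpoint.from_directory(tmpdir)
            # The new arguments from this release (checkpoint_upload_mode,
            # checkpoint_upload_function, validate_function, validate_config)
            # attach to this call; consult the checkpointing docs for their
            # accepted values rather than guessing them here.
            ray.train.report(metrics={"epoch": epoch}, checkpoint=ckpt)

    # New API (#54555): inspect every checkpoint reported so far.
    reported = ray.train.get_all_reported_checkpoints()
    print(len(reported))
```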

💫 Enhancements:

  • Ray Train V2 Migration
    • Implement BaseWorkerGroup for V1/V2 compatibility. (#57151)
  • Train Controller is always actor + fix tune integration to enable this. (#55556)
  • Refactor AcceleratorSetupCallback to use before_init_train_context. (#56509)
  • Move collective implementations to train_fn_utils. (#55689)
  • Ray Train Framework support enhancements
    • Add hf trainer support for dictionary of datasets. (#56484)
    • Add usage tag key for JaxTrainer. (#55887)
  • Add Torch process group shutdown timeout. (#56182)
  • Ray Train disables blocking get inside async warning. (#56757)
  • ThreadRunner captures exceptions from nested threads. (#55756)
  • Abort reconciliation thread catches ray.util.state.get_actor exception. (#56600)
  • Ray Data Integration
    • Minor rework of get_dataset_shard. (#55825)
    • Create a deepcopy of the data context on the split coordinator process. (#56211)
    • Enable debug logging; fix default actor_locality_enabled. (#56632)
  • Refactor call_with_retry into shared library and use it to retry checkpoint upload. (#56608)
  • Remove Placement Group on Train Run Abort. (#56011)

🔨 Fixes:

  • Fix LightGBM v2 callbacks for Tune only usage. (#57042)
  • Ignore tensorflow test for py312. (#56244)
  • Revising test_jax_trainer flaky test. (#56854)
  • Fix test_jax_trainer imports. (#55799)
  • Fix test_jax_trainer::test_minimal_multihost Flaky Test. (#56548)
  • Disable drop_last flag to fix division by zero in torch dataloader baselines. (#56395)
  • Preload a subset of modules for torch dataloader forkserver multiprocessing. (#56343)

📖 Documentation:

  • Add checkpoint_upload_mode to checkpoint docs. (#56860)
  • Add get_all_reported_checkpoints and ReportedCheckpoint to API docs. (#56174)
  • Fix typo for Instantiating in ray train doc. (#55826)

🏗 Architecture refactoring:

  • Release tests for ray train local mode. (#56862)
  • Migrate tune_rllib_connect_test & tune_cloud_long_running_cloud_storage to ray train v2. (#56844)
  • Add v2 multinode persistence release test. (#56856)
  • Attach a quick checkpoint when reporting metrics. (#56718)
  • Upgrade tune_torch_benchmark to v2. (#56804)
  • Move tune_with_frequent_pausing to Ray Train v2 and tune_tests folder...

Ray-2.49.2

19 Sep 18:10
479fa71

There is no difference between 2.49.2 and 2.49.1, though we needed a patch version for other out-of-band reasons. To fill the awkward blankness, here is a haiku about Ray:

Summit drawing near
Ray advances, step by step
Scaling without end

Ray-2.49.1

03 Sep 00:44
c057f1e

  • Ray Dashboard: Fix issue where GPU metrics are missing (#56006)
  • Ray Data: Fixed regression in handling very large schemas (#56058)