upstream changes #6

Merged
merged 267 commits into from
Feb 14, 2023
3343c76
Add informative progress bar names to map_batches (#31526)
peytondmurray Jan 27, 2023
7b2299b
Enable Log Rotation on Serve (#31844)
andreapiso Jan 27, 2023
ed72ca8
[core][state] Handle driver tasks (#31832)
rickyyx Jan 27, 2023
3f1a880
[serve] Add exponential backoff when retrying replicas (#31436)
zcin Jan 27, 2023
76d7467
[RLlib] Fixed the autorom dependency issue (#31933)
kouroshHakha Jan 27, 2023
15af485
Polish the Dashboard new IA part 2 (#31946)
alanwguo Jan 27, 2023
eab29ca
[Tune] Clarify which `RunConfig` is used when there are multiple plac…
justinvyu Jan 27, 2023
1b20ae9
[docs] Fix linkcheck error and map batches docstring test (#31996)
krfricke Jan 27, 2023
02ca4c9
[Datasets] [Autoscaling Actor Pool - 1/2] Refactor `MapOperator`, exe…
clarkzinzow Jan 27, 2023
ffbd87a
[data] [streaming] [12/n]--- Improve output backpressure reporting an…
ericl Jan 27, 2023
25a7df6
[tune] Fix tune_cloud_* tests fow new Trial constructor arguments (#3…
krfricke Jan 27, 2023
e64b44b
[core] remove legacy memory monitor from task submission codepath (#3…
clarng Jan 27, 2023
2a7dd31
[docs] Revamp Ray core fault tolerance guide (#27573)
stephanie-wang Jan 27, 2023
dd36360
[Serve] [release test] Add max_retries and max_restarts (#32011)
shrekris-anyscale Jan 27, 2023
5d1f2e4
[core][state] Adjust worker side reporting with batches && add debugs…
rickyyx Jan 28, 2023
675c6a0
[Dataset] Exclude breaking test case in `read_parquet_benchmark_singl…
scottjlee Jan 28, 2023
00416d2
[Data] Add tests for remainder of map_batches operations with new opt…
amogkam Jan 28, 2023
b5899d4
[ci/release] Change exponential_backoff_retry to use warn instead of …
cadedaniel Jan 28, 2023
51c5eda
Revert "[core] Fix gcs healthch manager crash when node is removed by…
krfricke Jan 28, 2023
22177cb
[Datasets] [Autoscaling Actor Pool - 2/2] Add autoscaling support to …
clarkzinzow Jan 28, 2023
ef28b5a
[Dashboard] Add cluster utilization graph (#31896)
rkooo567 Jan 28, 2023
e44a7d0
[Datasets] Add logical operator for randomize_block_order() (#31977)
c21 Jan 28, 2023
b58bb93
[docs] collapse navbar (#31994)
maxpumperla Jan 28, 2023
09f45ad
[core] Add code owner to GCS module. (#32018)
fishbone Jan 28, 2023
8e188db
Refactor block_fn out of map-like logical operators (#32021)
c21 Jan 28, 2023
cc6d30a
[train][docs] fix doc search issues, examples gallery & filter (#31635)
maxpumperla Jan 28, 2023
f9fa0b2
[Dashboard] Timeline implemented by a new task backend (#31856)
rkooo567 Jan 28, 2023
20bfcdd
[RLlib] Separate PPO torch regression test, and make it longer (#31892)
ArturNiederfahrenhorst Jan 28, 2023
c889349
Revert "[core][state] Adjust worker side reporting with batches && ad…
krfricke Jan 28, 2023
80d13d1
[docs] simple web crawler example (#31900)
maxpumperla Jan 28, 2023
112a265
[Datasets] [Docs] Add `seealso` to map-related methods (#30579)
bveeramani Jan 29, 2023
1929bb1
[RLlib] Give more time to impala release tests (#31910)
ArturNiederfahrenhorst Jan 29, 2023
6708b31
[docs] remove archive link (#32030)
ericl Jan 29, 2023
cce092b
Fix whitespace in help message for ray cli (#31905)
lukehsiao Jan 30, 2023
3fc2aac
[RLlib] Auto-detect old gym/gymnasium APIs and wrap accordingly. Conf…
sven1977 Jan 30, 2023
d390df8
[RLlib] Reparameterize the construction of TrainerRunner and RLTraine…
kouroshHakha Jan 30, 2023
d26b55b
[RLlib; release tests] Unify RLlib's team tag to be `rllib` (not `rl`…
ArturNiederfahrenhorst Jan 30, 2023
56b7911
[RLlib] Contribution of LeelaChessZero algorithm for playing chess in…
Capiru Jan 30, 2023
e331f6e
[2/n] Stabilize GCS/Autoscaler interface: Drain and Kill Node API (#3…
Jan 30, 2023
907e968
[Core] Remove dead actor checkpoint code (#32045)
jjyao Jan 30, 2023
664c844
Revert "Revert "[core] Fix gcs healthch manager crash when node is re…
fishbone Jan 30, 2023
cc5baaa
[tune] Do not default to reuse_actors=True when mixins are used (#31999)
krfricke Jan 30, 2023
43a0d8f
[metrics] Switch metric view to 5 min by default #32065
ericl Jan 30, 2023
96440cf
[data] [streaming] Fixes to autoscaling actor pool streaming op (#32023)
ericl Jan 30, 2023
baac0a6
[CI] Increase target time for `test_result_throughput_cluster` (#32062)
cadedaniel Jan 30, 2023
fe729aa
[core] Add generic `__ray_ready__` method to Actor classes (#31997)
krfricke Jan 30, 2023
b350f8d
[Serve] Mark `long_running_serve_failure` test as `stable` (#32063)
shrekris-anyscale Jan 30, 2023
fb96935
[core] Reduce the timeout for many nodes actor tests. (#32066)
fishbone Jan 30, 2023
fefd5e3
Fix unit test (#32084)
alanwguo Jan 30, 2023
34e2cd5
[Datasets] Remove the non-useful comment in `map_batches()` (#32020)
c21 Jan 31, 2023
755b56f
simplify metrics pgae (#32089)
ericl Jan 31, 2023
f325ced
[docs] Update top-navigation.js (#32075)
emmyscode Jan 31, 2023
dc974cb
[docs] deploying static ray cluster to K8S with external Redis for fa…
YQ-Wang Jan 31, 2023
b477f4b
fix frontend tests after #32089 (#32097)
alanwguo Jan 31, 2023
8a0e453
[Datasets] Add logical operator for random_shuffle() (#32080)
c21 Jan 31, 2023
06197a5
Add TaskContext for transform function (#32081)
jianoaix Jan 31, 2023
d91d2d6
Advanced Progress Bar (#31750)
alanwguo Jan 31, 2023
3a1709f
[spark] Automatically shut down ray on spark cluster if user does not…
WeichenXu123 Jan 31, 2023
1fdf24e
[Datasets] Add support for string tensor columns in `ArrowTensorArray…
scottjlee Jan 31, 2023
78b8c24
[RLlib] Upgrade tf eager code to no longer use `experimental_relax_sh…
ArturNiederfahrenhorst Jan 31, 2023
61c411f
[RLlib] Change Waterworld v3 to v4 and reinstate indep. MARL test cas…
ArturNiederfahrenhorst Jan 31, 2023
f2b6a6b
[RLlib; docs] Change links and references in code and docs to "Farama…
avnishn Jan 31, 2023
b7746b2
[Datasets] Fix to pass TaskContext in generate_random_shuffle_fn() (#…
c21 Jan 31, 2023
293fe2c
[release] minor fix to pytorch_pbt_failure test when using gpu. (#32070)
xwjiang2010 Jan 31, 2023
5cf61f0
[air] Add test for remote_storage with real hdfs backend. (#31940)
xwjiang2010 Jan 31, 2023
65d904f
[RLlib] [Ray 2.3 release] Marking RLLib release tests as unstable if …
cadedaniel Jan 31, 2023
44a1398
[Datasets] Add logical operator for repartition() (#32102)
c21 Jan 31, 2023
dae13bf
[Core] Expose Internal KV MultiGet operation (#32096)
architkulkarni Jan 31, 2023
e3001e9
Revert "[Datasets] Add support for string tensor columns in `ArrowTen…
scottjlee Jan 31, 2023
ae167f0
[AIR] Add option for per-epoch preprocessor (#31739)
stephanie-wang Jan 31, 2023
7573d49
[observability][autoscaler] Ensure pending nodes is reset to 0 after …
Jan 31, 2023
10d52f7
[tune/execution] Update staged resources in a fixed counter for faste…
krfricke Jan 31, 2023
d15ccfc
Revert "[RLlib] Reparameterize the construction of TrainerRunner and …
architkulkarni Jan 31, 2023
a0b8499
[Dashboard] Better gpu utilization (#32125)
rkooo567 Jan 31, 2023
f28428e
[core] Update the scalability envelop (#32131)
fishbone Jan 31, 2023
b4221c9
Fix docs lint for advanced progress bar (#32124)
alanwguo Jan 31, 2023
2137945
[Datasets] [Operator Fusion - 1/2] Add operator fusion to new executi…
clarkzinzow Jan 31, 2023
12ff13d
[RLlib] Fix waterworld example and test (#32117)
ArturNiederfahrenhorst Jan 31, 2023
3b1e21f
[RLlib] Error out if action_dict is empty in MultiAgentEnv. (#32129)
kouroshHakha Jan 31, 2023
1454e63
[CI] [Datasets] Run Datasets test suites on AIR changes (#32118)
clarkzinzow Feb 1, 2023
909c220
[runtime env] Clarify error message about where to install `smart_ope…
architkulkarni Feb 1, 2023
6ec71d7
[docs] Add exoshuffle use case and move crawler under orchestration (…
ericl Feb 1, 2023
be6b598
Fix dynamic block splitting for new backend. (#32139)
clarkzinzow Feb 1, 2023
5c11090
[Doc] Update the doc to mention dynamic resource update is not allowe…
rkooo567 Feb 1, 2023
13d0982
[ci] disable hdfs test for compat tests. (#32148)
xwjiang2010 Feb 1, 2023
dff4f0a
[core][oom] enable group by parent policy by default (#31976)
clarng Feb 1, 2023
df05cd9
Revert "[Docker] (Kubeflow integration) Add chmod --recursive 777 /ho…
kevin85421 Feb 1, 2023
47bb652
[Core] update grpc to 1.46.6 (#32054)
scv119 Feb 1, 2023
b2c5e63
[Core] Join Ray Jobs API `JobInfo` with GCS `JobTableData` (#31046)
architkulkarni Feb 1, 2023
d74e4c4
[core][state] Adjust worker side reporting with batches && add debugs…
rickyyx Feb 1, 2023
77ac9c2
[Dashboard] Support ray status output to the dashboard job page (#32040)
rkooo567 Feb 1, 2023
5dd1406
[Observability] Unpin open telemetry version for tracing feature (#32…
rkooo567 Feb 1, 2023
fb1e0b0
[RLlib] Fix revert of trainer runner (#32146)
kouroshHakha Feb 1, 2023
cf7bc27
[Core] Pick node from top k by default. (#31868)
scv119 Feb 1, 2023
d4b0a20
[Dashboard] Support actor detail (#32103)
rkooo567 Feb 1, 2023
75419d3
[Datasets] Add logical operator for sort() (#32133)
c21 Feb 1, 2023
b8221bb
Update index.md (#32053)
simran-2797 Feb 1, 2023
12d7d7d
[core] Increase the threshold for pubsub integration test (#32145)
fishbone Feb 1, 2023
174f157
[core] surface OOM error when actor is killed due to OOM (#32107)
clarng Feb 1, 2023
890e034
[Tune] Save and restore stateful callbacks as part of experiment chec…
justinvyu Feb 1, 2023
59f72cf
[Tune] Rename `overwrite_trainable` argument in Tuner restore to `tra…
justinvyu Feb 1, 2023
83e1a2a
Revert "[core] Increase the threshold for pubsub integration test (#3…
scv119 Feb 1, 2023
aad24bd
[core] clean up infeasible tasks submitted by the driver when the dri…
clarng Feb 1, 2023
eb660ce
Done (#32104)
rkooo567 Feb 1, 2023
10c46dc
[core][state][dashboard] Use main threads's task id or actor creation…
rickyyx Feb 1, 2023
666e2d9
[air][tune] replace node:<ip> custom resource with NodeAffinitySchedu…
matthewdeng Feb 1, 2023
24d0376
[Ray release] Moving Atari ROM dependencies to S3 (#32150)
cadedaniel Feb 1, 2023
ff16730
[Core] automatically pick max_pending_lease_requests based on number …
scv119 Feb 1, 2023
e9269ab
[Datasets] Fix filter logic and reuse output buffer (#32160)
c21 Feb 1, 2023
223a9a6
[Core] add ray-core as code-owner for most of the core code-path (#32…
scv119 Feb 1, 2023
4d526c5
Revert "[Core] add ray-core as code-owner for most of the core code-p…
scv119 Feb 1, 2023
f49b1b2
[core][state] Fix task failed time when job finishes (#32161)
rickyyx Feb 1, 2023
6e39b2e
[tune/execution][rfc] Cache ready futures in RayTrialExecutor (#32093)
krfricke Feb 1, 2023
6d39879
[Release] Fix bad import in AIR benchmark (#32175)
Yard1 Feb 1, 2023
1f53e60
[tune] Sync less often and only wait at end of experiment (#32155)
krfricke Feb 1, 2023
d6de1ce
[Tune] Add `Tuner.can_restore(path)` utility for checking if an exper…
justinvyu Feb 1, 2023
a954ab7
[ci][job] Move test_cli_integration to large test (#32171)
rickyyx Feb 2, 2023
74266a2
[Datasets] Add support for string tensor columns in `ArrowTensorArray…
scottjlee Feb 2, 2023
5091217
IA polish for demo (#32158)
alanwguo Feb 2, 2023
ed83715
[spark] Refine some text in Ray on Spark exception messages and warni…
WeichenXu123 Feb 2, 2023
ada5db7
Revert "Revert "[Core] add ray-core as code-owner for most of the cor…
scv119 Feb 2, 2023
29cd2fa
[RLlib] Fix typehint for `explore` argument. (#30734)
cool-RR Feb 2, 2023
a53907c
[RLlib] Add tags option to actor manager (#31803)
avnishn Feb 2, 2023
fdfef1f
[RLlib] Optimize the trainer runner test, add method for shutting dow…
avnishn Feb 2, 2023
b81f0cd
[RLlib] Exclude gpu tag from Examples test suite in RLlib (#32141)
kouroshHakha Feb 2, 2023
b31343a
[air] avoid inconsistency of create filesystem from uri for hdfs case…
yuduber Feb 3, 2023
6f97a83
Revert "Revert "[core] Increase the threshold for pubsub integration …
fishbone Feb 3, 2023
370a574
[core] release test for nested air (tune) oom (#31768)
clarng Feb 3, 2023
8b55e2d
[Docs] Fix typo in Huggingface example notebook (#32218)
davidxia Feb 4, 2023
37c0f76
[docs] fix typo in huggingface_text_classification.ipynb (#32207)
davidxia Feb 4, 2023
715e1b2
[tune/doc] improve `log_to_file` doc. (#32128)
xwjiang2010 Feb 4, 2023
5503bcd
[Dashboard] Turn on new IA by default (#32164)
rkooo567 Feb 4, 2023
276559e
[Doc] [runtime env] Address common question about importing packages …
architkulkarni Feb 6, 2023
2314775
[Serve] Remove logging requirement for `long_running_serve_failure` (…
shrekris-anyscale Feb 6, 2023
095960c
[Datasets] Deflake the test_dataset.py (#32200)
jianoaix Feb 6, 2023
e71e3a7
[Docs] Refactor Ray Workflows API documentation (#32248)
c21 Feb 7, 2023
f3ae74e
Allow overriding the UID of the default grafana dashboard exported by…
alanwguo Feb 7, 2023
8030e51
Remove metrics-based progress-bar endpoints (#31702)
alanwguo Feb 7, 2023
eec9791
clean up raylet client mocks (#32216)
clarng Feb 7, 2023
7432367
Refactor API documentation for job submission (#32252)
c21 Feb 7, 2023
c83111a
[air/benchmarks] Fix typo in tensorflow_benchmark.py script preventin…
krfricke Feb 7, 2023
027965b
[RLlib] Chaining Models in RLModules (#31469)
ArturNiederfahrenhorst Feb 7, 2023
2efee15
[Data] Revise "Getting Started" page (#31989)
bveeramani Feb 7, 2023
773f7bf
[Tune] Add `use_threads=False` in pyarrow syncing (#32256)
Yard1 Feb 7, 2023
ce5a21a
Fix overview page to work with the new DASHBOARD_UID env var (#32279)
alanwguo Feb 7, 2023
9995599
[build_base] [Docker] Add cuda 11.8 images (#32247)
ArturNiederfahrenhorst Feb 7, 2023
cf95514
[Tune] Add repr for ResultGrid class (#31941)
woshiyyya Feb 7, 2023
37580d7
[ci/release] Improve error message when kicking off tests from a comm…
krfricke Feb 7, 2023
00db336
[Core] Fix recursive cancelation crashes the worker when actor task i…
rkooo567 Feb 7, 2023
51efd2f
Increase timeout of stress_test_many_tasks to ensure perf metrics are…
cadedaniel Feb 8, 2023
3fa36d9
[Datasets] Fix book-documentation (#32293)
bveeramani Feb 8, 2023
5e1def0
[AIR] Fix `dtype` type hint in `DLPredictor` methods (#32198)
bveeramani Feb 8, 2023
3f43969
[Datasets] Promote `_create_strict_ragged_ndarray` to public API (#31…
bveeramani Feb 8, 2023
1f77e04
[RLlib] PPO torch RLTrainer (#31801)
kouroshHakha Feb 8, 2023
befad81
[Tune] Replace reference values in a config dict with placeholders (#…
Feb 8, 2023
aa504ae
Make write an operator as part of the execution plan (#32015)
jianoaix Feb 8, 2023
cefd3c4
Reenable autoscaling use in xgboost bench (#32196)
jianoaix Feb 8, 2023
e84fcb1
[Tune] Remove Ray Client references from Tune and Train docs/examples…
justinvyu Feb 8, 2023
bae61d9
[release] Improve handle_result in case of empty fetched result. (#32…
xwjiang2010 Feb 8, 2023
585f8aa
[RLlib] Move minibatching into RLTrainer instead of TrainerRunner (#3…
kouroshHakha Feb 8, 2023
59c62e4
[RLlib] Support empty leafs with NestedDict (#32136)
ArturNiederfahrenhorst Feb 8, 2023
b85eb52
[RLlib] Forward fix for failing PPO Torch RLTrainer test (#32308)
kouroshHakha Feb 8, 2023
d256508
[Doc] Add tips of writing fault tolerant Ray applications (#32191)
jjyao Feb 8, 2023
56606ae
[Telemetry] track num tasks created (#32106)
scv119 Feb 8, 2023
cf1bc83
[core] Fix the GCS memory usage high issue
fishbone Feb 8, 2023
cb5129c
[telemetry] remove extra print #32322
scv119 Feb 8, 2023
468e606
[Ray release infra] Script to compare perf metrics between releases (…
cadedaniel Feb 8, 2023
53260af
[AIR] Add `TorchDetectionPredictor` (#32199)
bveeramani Feb 8, 2023
0466bd3
[RLlib] Make one hidden layer config possible for TorchMLP (#32310)
ArturNiederfahrenhorst Feb 8, 2023
f05eeb4
[data] [streaming] No preserve order by default (#32300)
ericl Feb 8, 2023
3bb73d3
[core] Fix comments and a corner case in #32302 (#32323)
fishbone Feb 8, 2023
22bc1e9
[Serve][Doc] Refactor the Ray Serve API doc (#32307)
sihanwang41 Feb 8, 2023
b73f3eb
[RLlib] Modifications to gpu resource logic in rl_trainer (#32149)
avnishn Feb 9, 2023
6cfb541
[Doc] add job overview diagram (#32050)
scottsun94 Feb 9, 2023
b011d56
[doc] Update running large scale ray cluster doc for 2.3. (#32336)
fishbone Feb 9, 2023
63d922b
[core] Improving failure message when ray processes fail to start on …
cadedaniel Feb 9, 2023
5c1c888
[release] update if xgboost test suite requires result or not. (#32340)
xwjiang2010 Feb 9, 2023
5f0f95a
[core] Update oom docs to reflect latest policy (#32219)
clarng Feb 9, 2023
67d1515
[Datasets] Fix sort nightly test to skip `stats()` if exception is ra…
c21 Feb 9, 2023
b2e7699
Use dataset random_shuffle() for shuffle nightly (#32343)
jianoaix Feb 9, 2023
d653f73
[autoscaler][observability] Better memory formatting (#32337)
Feb 9, 2023
90f8511
[core] Add opt-in flag for Windows and OSX clusters, update `ray star…
stephanie-wang Feb 9, 2023
0e56dff
[data] [streaming] Implement locality-aware actor task assignment (#3…
ericl Feb 9, 2023
f80badc
[RLlib] Remove leela chess from release tests (#32325)
avnishn Feb 9, 2023
69a14e7
[core][state] Task backend improve performance (#32251)
rickyyx Feb 9, 2023
8bf1d03
[docs]Fix wording of Many model training guidance (#32319)
Wendi-anyscale Feb 9, 2023
fc81af1
[core] Fix gRPC callback API destruction issues. (#32151)
fishbone Feb 9, 2023
741b7a0
[Doc] Move actor checkpointing to actor fault tolerance page (#32153)
jjyao Feb 9, 2023
188c411
[Core/Observability] Fix the timeline bugs (#32287)
rkooo567 Feb 9, 2023
2bbe8c1
[core][state] Task Backend - reduce lock contention on debug stats / …
rickyyx Feb 9, 2023
b4ad23a
[Data] Add rule for `ReorderRandomizeBlockOrder` (#32254)
amogkam Feb 10, 2023
4420120
[AIR] Automatically move `DatasetIterator` torch tensors to correct d…
amogkam Feb 10, 2023
492ff7e
[air/execution] Event manager part 2: Implementation (#31811)
krfricke Feb 10, 2023
c9cf2ef
[RLlib] Async trainer manager (#32282)
avnishn Feb 10, 2023
d807ce0
Revert "[core] Add opt-in flag for Windows and OSX clusters, update `…
scv119 Feb 10, 2023
73b52e0
[core][oom] Use retriable lifo policy for dask 3x nightly test (#32361)
clarng Feb 10, 2023
a1938c3
[Train] Fix `use_gpu` with `HuggingFacePredictor` (#32333)
Yard1 Feb 10, 2023
841a4fb
[RLlib] Clean up RLModule (#32328)
kouroshHakha Feb 10, 2023
60fa8fe
[RLlib] Cleanup RLTrainer (#32345)
kouroshHakha Feb 10, 2023
9cbf406
[Bug Fix][Object Store] race condition: Pull Manager will hang in cer…
Catch-Bull Feb 10, 2023
d9a17f2
[Tune] Improve logging, unify trial retry logic, improve trial restor…
xwjiang2010 Feb 10, 2023
35e106a
[Job API] Handle multiple drivers with same job submission id in GCS …
architkulkarni Feb 10, 2023
d8639ab
[Datasets] Not change `map_batches()` UDF name in `Dataset.__repr__` …
c21 Feb 10, 2023
b7e671d
[Metrics] Fix flaky test_task_metrics + fix slow report issue from un…
rkooo567 Feb 10, 2023
db9cfa6
[core][state] State API scale losing data (#32408)
rickyyx Feb 10, 2023
613f4b0
[Tune][Doc] Restructure API reference (#32311)
justinvyu Feb 10, 2023
faeb2cc
[AIR] Allow users to pass `Callable[[torch.Tensor], torch.Tensor]` to…
bveeramani Feb 10, 2023
299d8f0
Add triage label to enhancement and doc issues as well (#32352)
jjyao Feb 10, 2023
6879184
[docs] removing docs referring ray client. (#32209)
scv119 Feb 10, 2023
16a7683
[Doc] Document the top-k default scheduling strategy (#32331)
jjyao Feb 10, 2023
08a8c65
[Datasets] Update Ray Data documentation for lazy execution by defaul…
c21 Feb 10, 2023
bc2de90
[ci][core] Do not set flushing thread niceness for task backend #32439
rickyyx Feb 10, 2023
ed640b6
[Datasets] [Docs] Update docs to reflect lazy-by-default execution mo…
clarkzinzow Feb 10, 2023
dade595
Use retriable_lifo policy for shuffle 1tb nightly test (#32417)
jianoaix Feb 10, 2023
2874e47
[Docs] Fix broken Tune links to overview and intergration (#32442)
ArturNiederfahrenhorst Feb 10, 2023
37086a5
[Autoscaler] Make ~/.bashrc optional in autoscaler commands (#32393)
ckw017 Feb 10, 2023
704fd4a
[core] Force kill worker whose job has exited (#32217)
clarng Feb 10, 2023
9a04119
[Datasets] Make ray.data.from_* APIs lazy. (#32390)
clarkzinzow Feb 10, 2023
b3b0336
Fix doc test for dataset.py (#32458)
c21 Feb 11, 2023
80e982b
[RLlib] Shared encoder MARL unittest and example (#32460)
kouroshHakha Feb 11, 2023
4c52789
[RLlib] Derive SAC model from AlgorithmConfig["model"] instead of MOD…
ArturNiederfahrenhorst Feb 13, 2023
cacc982
[RLlib] Add sample timer to all algorithms' `training_step()` methods…
ArturNiederfahrenhorst Feb 13, 2023
2e9b834
[ActorInit] Fix Bug in Actor creation (#32277)
ijrsvt Feb 13, 2023
997e95e
Fix typo in README.md (#32466)
prrajput1199 Feb 13, 2023
4ffa7fd
[RLlib] Added test version of BC algorithm based on RLModules an RLTr…
kouroshHakha Feb 13, 2023
7e662dd
[tune] Move experiment state/checkpoint/resume management into a sepa…
krfricke Feb 13, 2023
6de3cbe
[Jobs] Improve error message in case of `404` (#31120)
architkulkarni Feb 13, 2023
80f2161
[Datasets] Track bundles object store utilization as soon as they're …
clarkzinzow Feb 13, 2023
e71c63f
[tune/train] clean up tune/train result output (#32234)
xwjiang2010 Feb 13, 2023
e56665e
[ci][core] Calculate actor creation time properly for stress_test_man…
rickyyx Feb 13, 2023
2cee078
[tune] Structure refactor: Raise on import of old modules (#32486)
krfricke Feb 13, 2023
91940e3
[Doc] Add data ingestion clarification for AIR converting existing py…
woshiyyya Feb 13, 2023
71dfd20
[Datasets] Always preserve order for the BulkExecutor. (#32437)
clarkzinzow Feb 14, 2023
421b527
[Tune] Fix docstring failures (#32484)
justinvyu Feb 14, 2023
bc01288
[data] Fix pandas import failures by moving it to a top-level data im…
ericl Feb 14, 2023
a447cbb
[RLlib] Allow MARLModule customization from algorithm config (#32473)
kouroshHakha Feb 14, 2023
efc432b
[tune] Fix resuming from cloud storage (+ test) (#32504)
krfricke Feb 14, 2023
99d00ad
[Doc] Restructure core API docs (#32236)
jjyao Feb 14, 2023
b89457a
Deflake test_dataset.py: split torch tests (#32487)
jianoaix Feb 14, 2023
f0d96c5
Clean up RAY_DATASET_FORCE_LOCAL_METADATA flag (#32483)
jianoaix Feb 14, 2023
3414797
Add write operator in new logical plan (#32440)
jianoaix Feb 14, 2023
66c0533
[Datasets] Add logical operator for aggregate (#32462)
c21 Feb 14, 2023
d092b12
[tune] Fix two tests after structure refactor deprecation (#32517)
krfricke Feb 14, 2023
d87d86f
[AIR][Train][Doc] Restructure API reference (#32360)
justinvyu Feb 14, 2023
19ca00b
Fix autosummary to show docstring of class members (#32520)
jjyao Feb 14, 2023
bf5e721
[core] Add opt-in flag for Windows and OSX clusters, update ray start…
stephanie-wang Feb 14, 2023
9dcb369
[Data] Update DatasetPipeline.to_tf API to match with Dataset.to_tf (…
amogkam Feb 14, 2023
b12c0d1
Revert "[data] Fix pandas import failures by moving it to a top-level…
cadedaniel Feb 14, 2023
e8f1cf6
[Tune] Update trainable `remote_checkpoint_dir` upon actor reuse (#32…
justinvyu Feb 14, 2023
b9f7e19
[docs] setting up grafana and prometheus (#31129)
alanwguo Feb 14, 2023
[docs] simple web crawler example (ray-project#31900)
maxpumperla authored Jan 28, 2023
commit 80d13d16fdeb7c591962192ee200bf2be30d9ecc
5 changes: 5 additions & 0 deletions doc/source/_static/css/custom.css
@@ -316,6 +316,10 @@ img.horizontal-scroll {
float: right;
}

.card-body {
padding: 0.5rem !important;
}

/* Wrap code blocks instead of horizontal scrolling. */
pre {
white-space: pre-wrap;
@@ -325,6 +329,7 @@ pre {
.cell .cell_output {
max-height: 250px;
overflow-y: auto;
font-weight: bold;
}

/* Yellow doesn't render well on light background */
18 changes: 18 additions & 0 deletions doc/source/_static/js/custom.js
@@ -28,6 +28,24 @@ window.addEventListener("scroll", loadVisibleTermynals);
createTermynals();
loadVisibleTermynals();


document.addEventListener("DOMContentLoaded", function() {
  let images = document.getElementsByClassName("fixed-height-img");
  let maxHeight = 0;

  for (let i = 0; i < images.length; i++) {
    if (images[i].height > maxHeight) {
      maxHeight = images[i].height;
    }
  }

  for (let i = 0; i < images.length; i++) {
    let margin = Math.floor((maxHeight - images[i].height) / 2);
    images[i].style.cssText = "margin-top: " + margin + "px !important;" +
      "margin-bottom: " + margin + "px !important;";
  }
});

// Remember the scroll position when the page is unloaded.
window.onload = function() {
let sidebar = document.querySelector("#bd-docs-nav");
1 change: 1 addition & 0 deletions doc/source/_toc.yml
@@ -32,6 +32,7 @@ parts:
- file: ray-core/examples/batch_prediction
- file: ray-core/examples/batch_training
- file: ray-core/examples/automl_for_time_series
- file: ray-core/examples/web-crawler
- file: ray-core/api

- file: cluster/getting-started
4 changes: 2 additions & 2 deletions doc/source/custom_directives.py
@@ -313,10 +313,10 @@ def build_gallery(app):
---
:img-top: {item["image"]}

{item["description"]}

{gh_stars}

{item["description"]}

+++
.. link-button:: {item["website"]}
{ref}
9 changes: 1 addition & 8 deletions doc/source/ray-air/user-guides.rst
@@ -12,14 +12,13 @@ AIR User Guides
.. panels::
:container: text-center
:column: col-md-4 px-2 py-2
-:img-top-cls: pt-5 w-75 d-block mx-auto
+:img-top-cls: pt-5 w-75 d-block mx-auto fixed-height-img

---
:img-top: /ray-air/images/preprocessors.svg

.. https://docs.google.com/drawings/d/1ZIbsXv5vvwTVIEr2aooKxuYJ_VL7-8VMNlRinAiPaTI/edit

+++
.. link-button:: /ray-air/preprocessors
:type: ref
:text: Using Preprocessors
@@ -30,7 +29,6 @@ AIR User Guides

.. https://docs.google.com/drawings/d/15SXGHbKPWdrzx3aTAIFcO2uh_s6Q7jLU03UMuwKSzzM/edit

+++
.. link-button:: trainer
:type: ref
:text: Using Trainers
@@ -41,7 +39,6 @@ AIR User Guides

.. https://docs.google.com/drawings/d/10GZE_6s6ss8PSxLYyzcbj6yEalWO4N7MS7ao8KO7ne0/edit

+++
.. link-button:: air-ingest
:type: ref
:text: Configuring Training Datasets
@@ -52,7 +49,6 @@ AIR User Guides

.. https://docs.google.com/drawings/d/1yMd12iMkyo6DGrFoET1TIlKfFnXX9dfh2u3GSdTz6W4/edit

+++
.. link-button:: /ray-air/tuner
:type: ref
:text: Configuring Hyperparameter Tuning
@@ -63,7 +59,6 @@ AIR User Guides

.. https://docs.google.com/presentation/d/1jfkQk0tGqgkLgl10vp4-xjcbYG9EEtlZV_Vnve_NenQ/edit#slide=id.g131c21f5e88_0_549

+++
.. link-button:: predictors
:type: ref
:text: Using Predictors for Inference
@@ -74,7 +69,6 @@ AIR User Guides

.. https://docs.google.com/drawings/d/1-rg77bV-vEMURXZw5_mIOUFM3FObIIYbFOiYzFJW_68/edit

+++
.. link-button:: /ray-air/examples/serving_guide
:type: ref
:text: Deploying Predictors with Serve
@@ -85,7 +79,6 @@ AIR User Guides

.. https://docs.google.com/drawings/d/1ja1RfNCEFn50B9FHWSemUzwhtPAmVyoak1JqEJUmxs4/edit

+++
.. link-button:: air-deployment
:type: ref
:text: How to Deploy AIR
244 changes: 244 additions & 0 deletions doc/source/ray-core/examples/web-crawler.ipynb
@@ -0,0 +1,244 @@
{
"cells": [
{
"cell_type": "markdown",
"source": [
"# Speed up your web crawler by parallelizing it with Ray"
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "markdown",
"source": [
"In this example we'll quickly demonstrate how to build a simple web scraper in Python and\n",
"parallelize it with Ray Tasks with minimal code changes.\n",
"\n",
"To run this example locally on your machine, please first install `ray` and `beautifulsoup` with\n",
"\n",
"```\n",
"pip install \"beautifulsoup4==4.11.1\" \"ray>=2.2.0\"\n",
"```\n",
"\n",
"First, we'll define a function called `find_links` which takes a starting page (`start_url`) to crawl,\n",
"and we'll take the Ray documentation as an example of such a starting point.\n",
"Our crawler simply extracts all available links from the starting URL that contain a given `base_url`\n",
"(e.g. in our example we only want to follow links on `https://docs.ray.io`, not any external links).\n",
"The `find_links` function is then called recursively with all the links we found this way, until a\n",
"certain depth is reached.\n",
"\n",
"To extract the links from HTML elements on a site, we define a little helper function called\n",
"`extract_links`, which takes care of handling relative URLs properly and sets a limit on the\n",
"number of links returned from a site (`max_results`) to control the runtime of the crawler more easily.\n",
"\n",
"Here's the full implementation:"
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "code",
"execution_count": 154,
"outputs": [],
"source": [
"import requests\n",
"from bs4 import BeautifulSoup\n",
"\n",
"def extract_links(elements, base_url, max_results=100):\n",
"    links = []\n",
"    for e in elements:\n",
"        url = e[\"href\"]\n",
"        if \"https://\" not in url:\n",
"            url = base_url + url\n",
"        if base_url in url:\n",
"            links.append(url)\n",
"    return set(links[:max_results])\n",
"\n",
"\n",
"def find_links(start_url, base_url, depth=2):\n",
"    if depth == 0:\n",
"        return set()\n",
"\n",
"    page = requests.get(start_url)\n",
"    soup = BeautifulSoup(page.content, \"html.parser\")\n",
"    elements = soup.find_all(\"a\", href=True)\n",
"    links = extract_links(elements, base_url)\n",
"\n",
"    for url in links:\n",
"        new_links = find_links(url, base_url, depth-1)\n",
"        links = links.union(new_links)\n",
"    return links"
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "markdown",
"source": [
"Let's define a starting and base URL and crawl the Ray docs to a `depth` of 2."
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "code",
"execution_count": 162,
"outputs": [],
"source": [
"base = \"https://docs.ray.io/en/latest/\"\n",
"docs = base + \"index.html\""
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "code",
"execution_count": 163,
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 19.3 s, sys: 340 ms, total: 19.7 s\n",
"Wall time: 25.8 s\n"
]
},
{
"data": {
"text/plain": "591"
},
"execution_count": 163,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%time len(find_links(docs, base))"
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "markdown",
"source": [
"As you can see, crawling the documentation root recursively like this returns a\n",
"total of `591` pages and the wall time comes in at around 25 seconds.\n",
"\n",
"Crawling pages can be parallelized in many ways.\n",
"Probably the simplest way is to simply start with multiple starting URLs and call\n",
"`find_links` in parallel for each of them.\n",
"We can do this with [Ray Tasks](https://docs.ray.io/en/latest/ray-core/tasks.html) in a straightforward way.\n",
"We simply use the `ray.remote` decorator to wrap the `find_links` function in a task called `find_links_task` like this:"
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "code",
"execution_count": 157,
"outputs": [],
"source": [
"import ray\n",
"\n",
"@ray.remote\n",
"def find_links_task(start_url, base_url, depth=2):\n",
"    return find_links(start_url, base_url, depth)"
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "markdown",
"source": [
"To use this task to kick off a parallel call, the only thing you have to do is use\n",
"`find_links_task.remote(...)` instead of calling the underlying Python function directly.\n",
"\n",
"Here's how you run six crawlers in parallel. The first three (redundantly) crawl\n",
"`docs.ray.io` again, while the other three crawl the main entry points of the Ray RLlib,\n",
"Tune, and Serve libraries, respectively:"
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "code",
"execution_count": 160,
"outputs": [],
"source": [
"links = [find_links_task.remote(f\"{base}{lib}/index.html\", base)\n",
"         for lib in [\"\", \"\", \"\", \"rllib\", \"tune\", \"serve\"]]"
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "code",
"execution_count": 161,
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"591\n",
"591\n",
"105\n",
"204\n",
"105\n",
"CPU times: user 65.5 ms, sys: 47.8 ms, total: 113 ms\n",
"Wall time: 27.2 s\n"
]
}
],
"source": [
"%time for res in ray.get(links): print(len(res))"
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "markdown",
"source": [
"This parallel run crawls around four times the number of pages in roughly the same time as the initial, sequential run.\n",
"Note the use of `ray.get` in the timed run to retrieve the results from Ray (the `remote` call promise gets resolved with `get`).\n",
"\n",
"Of course, there are much smarter ways to create a crawler and efficiently parallelize it, and this example\n",
"gives you a starting point to work from."
],
"metadata": {
"collapsed": false
}
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.6"
}
},
"nbformat": 4,
"nbformat_minor": 0
}
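
Note on the notebook above: `extract_links` treats every href that lacks `https://` as relative and prepends `base_url` by string concatenation, so links such as `../index.html`, `#section` anchors, or `mailto:` addresses end up as malformed or duplicate URLs. Below is a minimal sketch of one possible alternative, not what this commit does. It assumes the helper receives plain href strings and an extra `page_url` argument (both are changes to the notebook's signature made here for illustration) and uses only the standard library's `urllib.parse`.

```python
from urllib.parse import urljoin, urldefrag


def extract_links(hrefs, page_url, base_url, max_results=100):
    """Resolve hrefs against page_url and keep only URLs under base_url.

    Hypothetical variant of the notebook helper: `hrefs` is a list of
    strings and `page_url` is the page the hrefs were found on.
    """
    links = []
    seen = set()
    for href in hrefs:
        # urljoin resolves relative paths and "../" segments, and leaves
        # absolute URLs (including other schemes such as mailto:) untouched.
        absolute = urljoin(page_url, href)
        # Drop the "#fragment" so anchors within one page map to one URL.
        absolute, _fragment = urldefrag(absolute)
        if absolute.startswith(base_url) and absolute not in seen:
            seen.add(absolute)
            links.append(absolute)
    # Same cap as the notebook: at most max_results links per page.
    return set(links[:max_results])
```

With BeautifulSoup, the call site could pass `[e["href"] for e in elements]` together with `start_url`; that wiring is likewise an assumption and is not part of the merged notebook.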
10 changes: 5 additions & 5 deletions doc/source/ray-overview/eco-gallery.yml
@@ -1,8 +1,8 @@
meta:
-section-titles: true
-container: container pb-12
-column: col-md-12 px-2 py-2
-img-top-cls: pt-10 w-50 d-block mx-auto
+section-titles: false
+container: container pb-4
+column: col-md-4 px-1 py-1
+img-top-cls: p-2 w-75 d-block mx-auto fixed-height-img

buttons:
classes: btn-outline-info btn-block
@@ -146,7 +146,7 @@ projects:
random forests, gradient boosting, k-means and DBSCAN, and is designed to
interoperate with the Python numerical and scientific libraries NumPy and SciPy.
website: https://docs.ray.io/en/master/joblib.html
-repo: https://docs.ray.io/en/master/joblib.html
+repo: https://github.com/scikit-learn/scikit-learn
image: ../images/scikit.png
- name: Seldon Alibi Integration
section_title: Seldon Alibi