chore: Rename HydroflowPlus to Hydro #1617

Merged
merged 12 commits into from
Dec 20, 2024
4 changes: 2 additions & 2 deletions .github/workflows/template.yml
Original file line number Diff line number Diff line change
@@ -96,7 +96,7 @@ jobs:
fi

test_hydroflow_plus:
name: Test hydroflow_plus
name: Test hydro_lang
if: ${{ needs.pre_job.outputs.should_skip != 'true' || github.event_name != 'pull_request' }}
timeout-minutes: 10
needs: pre_job
@@ -117,7 +117,7 @@ jobs:
uses: cargo-generate/cargo-generate-action@v0.20.0
with:
name: generated
template: template/hydroflow_plus
template: template/hydro
arguments: "-d hydroflow_git=${{ github.event.pull_request.head.repo.clone_url }} -d hydroflow_branch=${{ github.event.pull_request.head.ref }}"
- name: Move generated project
run: |
2 changes: 1 addition & 1 deletion CONTRIBUTING.md
@@ -12,7 +12,7 @@ Relative to the repository root:
* `hydroflow` is the main Hydroflow package, containing the Hydroflow runtime. It re-exports the
surface syntax macros in `hydroflow_macro` and `hydroflow_lang`. The runtime is the "scheduled
layer" while the surface syntax compiler is the "compiled layer".
* `hydroflow_plus` and related packages contain Hydroflow+, which is a functional syntax built on
* `hydro_lang` and related (hydro_*) packages contain Hydro, which is a functional syntax built on
top of `hydroflow`.
* `hydroflow_datalog` provides a datalog compiler, based on top of the Hydroflow surface syntax.
* `docs` is the [Hydro.run](https://hydro.run/) website. `website_playground` contains the
166 changes: 83 additions & 83 deletions Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

10 changes: 5 additions & 5 deletions Cargo.toml
@@ -13,11 +13,11 @@ members = [
"hydroflow_datalog_core",
"hydroflow_lang",
"hydroflow_macro",
"hydroflow_plus",
"hydroflow_plus_std",
"hydroflow_plus_test",
"hydroflow_plus_test_local",
"hydroflow_plus_test_local_macro",
"hydro_lang",
"hydro_std",
"hydro_test",
"hydro_test_local",
"hydro_test_local_macro",
"lattices",
"lattices_macro",
"multiplatform_test",
2 changes: 1 addition & 1 deletion README.md
@@ -24,7 +24,7 @@ Hydro provides a high-level language for the majority of developers called [Hydr

## Development Setup

See the [quickstart section of the Hydroflow+ book](https://hydro.run/docs/hydroflow_plus/quickstart/) for instructions on installing Rust and getting started with the Hydroflow+ template.
See the [quickstart section of the Hydroflow+ book](https://hydro.run/docs/hydro/quickstart/) for instructions on installing Rust and getting started with the Hydroflow+ template.
Contributor

Needs name change edits. I have a separate PR #1621 with suggested wording, but you'll still need to change the names of directories/files in the docs and links to those in the version in my PR.

Contributor Author

Yep. I will merge #1621 last.

Thanks for putting that CR together!


# A New Approach to Distributed Programming
There have been many frameworks and platforms for distributed programming over the years, with significant tradeoffs. These include:
2 changes: 1 addition & 1 deletion RELEASING.md
@@ -28,7 +28,7 @@ To (dry) run the command locally to spot-check for errors and warnings:
cargo smart-release --update-crates-index \
--no-changelog-preview --allow-fully-generated-changelogs \
--bump-dependencies auto --bump minor \ # or `patch`, `major`, `keep`, `auto`
hydroflow hydroflow_lang hydroflow_macro hydroflow_plus \
hydroflow hydroflow_lang hydroflow_macro hydro_lang \
hydroflow_datalog hydroflow_datalog_core \
hydro_deploy hydro_cli hydroflow_cli_integration \
hydroflow_plus_cli_integration \
81 changes: 81 additions & 0 deletions docs/docs/hydro/consistency.md
@@ -0,0 +1,81 @@
---
sidebar_position: 3
---

# Consistency and Safety
A key feature of Hydro is its integration with the Rust type system to highlight where inconsistent distributed behavior can arise from non-determinism such as batching, timeouts, and message reordering. In this section, we'll walk through the consistency guarantees in Hydro and how to use the **`unsafe`** keyword as an escape hatch when introducing sources of non-determinism.

:::info

Our consistency and safety model is based on the POPL'25 paper [Flo: A Semantic Foundation for Progressive Stream Processing](https://arxiv.org/abs/2411.08274), which covers the formal details and proofs underlying this system.

:::

## Eventual Determinism
Hydro provides strong guarantees on **determinism**, the property that when provided the same inputs, the outputs of the program are always the same. Even when the inputs and outputs are streaming, we can use this property by looking at the **aggregate collection** (i.e. the result of collecting the elements of the stream into a finite collection). This makes it easy to build composable blocks of code without having to worry about runtime behavior such as batching or network delays.

Because Hydro programs can involve network delay, we guarantee **eventual determinism**: given a set of streaming inputs which have arrived, the outputs of the program (which continuously change as inputs arrive) will **eventually** have the same _aggregate_ value.

Again, by focusing on the _aggregate_ value rather than individual outputs, Hydro programs can involve concepts such as retractions (for incremental computation) while still guaranteeing determinism because the _resolved_ output (after processing retractions) will eventually be the same.
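
As a rough illustration of why the aggregate view is robust to batching, the following plain-Rust sketch resolves insertions and retractions into a final multiset. It uses only the standard library and is not Hydro's API; `Update` and `resolve` are hypothetical names introduced here. Splitting or reordering the same updates across batches leaves the resolved result unchanged.

```rust
use std::collections::BTreeMap;

/// A single change to the output collection (hypothetical type for this sketch).
#[derive(Clone, Copy, Debug)]
enum Update {
    Insert(u32),
    Retract(u32),
}

/// Collects batches of updates into the resolved aggregate:
/// a multiset (value -> count) with all retractions applied.
fn resolve(batches: &[Vec<Update>]) -> BTreeMap<u32, i64> {
    let mut counts: BTreeMap<u32, i64> = BTreeMap::new();
    for update in batches.iter().flatten() {
        match *update {
            Update::Insert(v) => *counts.entry(v).or_insert(0) += 1,
            Update::Retract(v) => *counts.entry(v).or_insert(0) -= 1,
        }
    }
    // Drop values whose insertions and retractions cancel out.
    counts.retain(|_, count| *count != 0);
    counts
}
```

For instance, a single batch `[Insert(1), Insert(2), Retract(1)]` and the same three updates delivered as three one-element batches in a different order both resolve to a multiset containing only the value `2`.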

:::note

Much existing literature in distributed systems focuses on consistency levels such as "eventual consistency" which typically correspond to guarantees when reading the state of a _replicated_ object (or set of objects) at a _specific point_ in time. Hydro does not use such a consistency model internally, instead focusing on the values local to each distributed location _over time_. Concepts such as replication, however, can be layered on top of this model.

:::

## Unsafe Operations in Hydro
All **safe** APIs in Hydro (the ones you can call regularly in Rust) guarantee determinism. But it is often necessary to do something non-deterministic, like generate events at a fixed time interval or split an input into arbitrarily sized batches.

Hydro offers APIs for such concepts behind an **`unsafe`** guard. This keyword is typically used to mark Rust functions that may not be memory-safe, but we reuse this in Hydro to mark non-deterministic APIs.

To call such an API, the Rust compiler will ask you to wrap the call in an `unsafe` block. It is typically good practice to also include a `// SAFETY: ...` comment to explain why the non-determinism is there.

```rust,no_run
# use hydro_lang::*;
# let flow = FlowBuilder::new();
# let stream_inputs = flow.process::<()>().source_iter(q!([123]));
use std::time::Duration;

unsafe {
    // SAFETY: intentional non-determinism
    stream_inputs
        .sample_every(q!(Duration::from_secs(1)))
}.for_each(q!(|v| println!("Sample: {:?}", v)))
```

When writing a function with Hydro that involves `unsafe` code, it is important to be extra careful about whether the non-determinism is exposed externally. In some applications, a utility function may involve local non-determinism (such as sending retries), but not expose it outside the function (via deduplication).

But other utilities may expose the non-determinism, in which case they should be marked `unsafe` as well. If the function is public, Rust tooling (Clippy's `missing_safety_doc` lint) will ask you to put a `# Safety` section in its documentation explaining the non-determinism.

```rust
# use hydro_lang::*;
use std::fmt::Debug;
use std::time::Duration;

/// ...
///
/// # Safety
/// This function will non-deterministically print elements
/// from the stream according to a timer.
unsafe fn print_samples<T: Debug, L>(
    stream: Stream<T, Process<L>, Unbounded>
) {
    unsafe {
        // SAFETY: documented non-determinism
        stream
            .sample_every(q!(Duration::from_secs(1)))
    }.for_each(q!(|v| println!("Sample: {:?}", v)))
}
```

## User-Defined Functions
Another source of potential non-determinism is user-defined functions, such as those provided to `map` or `filter`. Hydro allows for arbitrary Rust functions to be called inside these closures, so it is possible to introduce non-determinism which will not be checked by the compiler.

In general, avoid using APIs like random number generators inside transformation functions unless that non-determinism is explicitly documented somewhere.
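
To make the distinction concrete, here is a minimal standard-library sketch (not Hydro code; both function names are hypothetical) contrasting a transformation whose output depends only on its input with one that also depends on the wall clock. The compiler accepts both as ordinary safe functions, which is why this kind of non-determinism has to be caught by review and documentation.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Deterministic: the output depends only on the input element,
/// so it is safe to pass to an operator like `map`.
fn word_len(word: &str) -> usize {
    word.chars().count()
}

/// Non-deterministic: the second component depends on when the call runs,
/// so two runs over the same input can produce different outputs.
fn stamp_with_clock(value: u32) -> (u32, u128) {
    let nanos = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|elapsed| elapsed.as_nanos())
        .unwrap_or(0);
    (value, nanos)
}
```

Calling `word_len("hydro")` always yields the same value, while repeated calls to `stamp_with_clock(7)` agree only on the first component.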

:::info

To help avoid such bugs, we are working on ways to use formal verification tools (such as [Kani](https://model-checking.github.io/kani/)) to check arbitrary Rust code for properties such as determinism and more. But this remains active research for now and is not yet available.

:::
13 changes: 13 additions & 0 deletions docs/docs/hydro/dataflow-programming.mdx
@@ -0,0 +1,13 @@
---
sidebar_position: 1
---

# Dataflow Programming
Hydro uses a dataflow programming model, which will be familiar if you have used APIs like Rust iterators. Instead of using RPCs or async/await to describe distributed computation, Hydro uses **asynchronous streams**, which represent data arriving over time. Streams can represent a series of asynchronous events (e.g. inbound network requests) or a sequence of data items.

Programs in Hydro describe how to **transform** entire collections of data using operators such as `map` (transforming elements one by one), `fold` (aggregating elements into a single value), or `join` (combining elements from multiple streams on matching keys).
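
The three operators named above can be sketched with ordinary Rust iterators and collections. This is only an analogy over bounded, in-memory data using the standard library, not Hydro's stream API, and the function names are hypothetical.

```rust
use std::collections::HashMap;

/// `map`: transform elements one by one.
fn double_all(input: &[i32]) -> Vec<i32> {
    input.iter().map(|x| x * 2).collect()
}

/// `fold`: aggregate all elements into a single value.
fn total(input: &[i32]) -> i32 {
    input.iter().fold(0, |acc, x| acc + x)
}

/// `join`: combine elements from two inputs whose keys match.
fn join_on_key<'a>(
    left: &[(u32, &'a str)],
    right: &[(u32, &'a str)],
) -> Vec<(u32, &'a str, &'a str)> {
    // Index the right side by key so each left element can look up its matches.
    let mut index: HashMap<u32, Vec<&'a str>> = HashMap::new();
    for &(key, value) in right {
        index.entry(key).or_default().push(value);
    }
    let mut joined = Vec::new();
    for &(key, left_value) in left {
        if let Some(matches) = index.get(&key) {
            for &right_value in matches {
                joined.push((key, left_value, right_value));
            }
        }
    }
    joined
}
```

Joining `[(1, "alice"), (2, "bob")]` with `[(1, "book"), (1, "pen"), (3, "lamp")]` pairs `alice` with both `book` and `pen`, and drops the keys that appear on only one side.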

If you are familiar with Spark, Flink or Pandas, you will find Hydro syntax familiar. However, note well that the semantics for asynchronous streams in Hydro differ significantly from bulk analytics systems like those above. In particular, Hydro uses the type system to distinguish between bounded streams (originating from finite data) and unbounded streams (originating from asynchronous input). Moreover, Hydro is designed to handle asynchronous streams of small, independent events very efficiently.

<!-- TODO(shadaj): link to collections section -->
<!-- TODO(shadaj): show example of mermaid graph -->
16 changes: 16 additions & 0 deletions docs/docs/hydro/index.mdx
@@ -0,0 +1,16 @@
---
sidebar_position: 0
---

# Introduction
Hydro is a high-level distributed programming framework for Rust powered by the [Hydroflow runtime](../hydroflow/index.mdx). Unlike traditional architectures such as actors or RPCs, Hydro offers _choreographic_ APIs, where expressions and functions can describe computation that takes place across many locations. It also integrates with [Hydro Deploy](../deploy/index.md) to make it easy to deploy and run Hydro programs in the cloud.

Hydro uses a two-stage compilation approach. Hydro programs are standard Rust programs, which first run on the developer's laptop to generate a _deployment plan_. This plan is then compiled to individual binaries for each machine in the distributed system (enabling zero-overhead abstractions), which are then deployed to the cloud using the generated plan along with specifications of cloud resources.

Hydro has been used to write a variety of high-performance distributed systems, including implementations of classic distributed protocols such as two-phase commit and Paxos. Work is ongoing to develop a distributed systems standard library that will offer these protocols and more as reusable components.

:::caution

The docs for Hydro are still a work in progress. If you have any questions or run into bugs, please file an issue on the [Hydroflow GitHub repository](https://github.com/hydro-project/hydroflow).

:::
Original file line number Diff line number Diff line change
@@ -3,6 +3,6 @@
"position": 2,
"link": {
"type": "doc",
"id": "hydroflow_plus/quickstart/index"
"id": "hydro/quickstart/index"
}
}