Make Getting Started tutorials use non-finance examples and fix up various documentation issues #338

Merged
merged 1 commit into from
Aug 23, 2024
1 change: 1 addition & 0 deletions docs/wiki/api-references/csp.profiler-API.md
Original file line number Diff line number Diff line change
@@ -2,6 +2,7 @@

Users can simply run graphs under a `Profiler()` context to extract profiling information.
The code snippet below runs a graph in profile mode and extracts the profiling data by calling `results()`.
Note that profiling can also be done in real-time with live updating visuals: see the [how-to](Profile-CSP-Code#profiling-a-real-time-cspgraph) guide here.

```python
from csp import profiler
106 changes: 56 additions & 50 deletions docs/wiki/concepts/CSP-Graph.md
@@ -8,107 +8,113 @@

## Anatomy of a `csp.graph`

To reiterate, `csp.graph` methods are called in order to construct the graph and are only executed before the engine is run.
`csp.graph` methods don't do anything special; they are essentially regular Python methods, but they can be defined to accept inputs and generate outputs similar to `csp.nodes`.
This is solely used for type checking.
`csp.graph` methods are called in order to construct the graph and are only executed before the engine is run. A graph is a collection of nodes and adapters which can either be executed as an argument to `csp.run` or composed into a larger graph.
The `csp.graph` decorator is only used for type validation and it is optional when creating a CSP program. A standard Python function without the decorator can also be passed as an argument to `csp.run` if type validation is not required.
`csp.graph` methods can be created to encapsulate components of a graph, and can be called from other `csp.graph` methods in order to help facilitate graph building.

Simple example:

```python
@csp.graph
def calc_symbol_pnl(symbol: str, trades: ts[Trade]) -> ts[float]:
    # sub-graph code needed to compute pnl for given symbol and symbol's trades
    # sub-graph can subscribe to market data for the symbol as needed
    ...
def calc_user_time(session_data: ts[UserSession]) -> ts[float]:
    # sub-graph code needed to compute the time a user spends on a website
    session_time = session_data.logout_time - session_data.login_time
    time_online = csp.stats.sum(session_time)
    return time_online


@csp.graph
def calc_portfolio_pnl(symbols: [str]) -> ts[float]:
    symbol_pnl = []
    for symbol in symbols:
        symbol_trades = trade_adapter.subscribe(symbol)
        symbol_pnl.append(calc_symbol_pnl(symbol, symbol_trades))
def calc_site_traffic(users: List[str]) -> ts[float]:
    user_time = []
    for user in users:
        user_sessions = get_session(user)
        user_time.append(calc_user_time(user_sessions))

    return csp.sum(symbol_pnl)
    return csp.sum(user_time)
```

In this simple example we have a `csp.graph` component `calc_symbol_pnl` which encapsulates computing pnl for a single symbol.
`calc_portfolio_pnl` is a graph that computes portfolio level pnl, it invokes the symbol-level pnl calc for every symbol, then sums up the results for the portfolio level pnl.
In this simple example we compute the total time all users spend on a website. We have a `csp.graph` subcomponent `calc_user_time` which computes the time a single user spends on the site throughout the run.
Then, in `calc_site_traffic` we compute the total user traffic by creating the user-level subgraph for each account and aggregating the results.

## Graph Propagation and Single-dispatch
## Graph Propagation and Single-Dispatch

The CSP graph propagation algorithm ensures that all nodes are executed *once* per engine cycle, and in the correct order.
Correct order means that all input dependencies of a given node are guaranteed to have been evaluated before that node is executed.
Take this graph for example:
The CSP graph propagation algorithm ensures that all nodes are executed *after* any of their dependencies on a given engine cycle.

> \[!IMPORTANT\]
> An *engine cycle* refers to a single execution of a CSP graph. There can be multiple engine cycles at the same *timestamp*; for example, a single data source may have two events both at `2020-01-01 00:00:00`. These events will be executed in two *cycles* that both occur at the same timestamp. Another case where multiple cycles can occur is [csp.feedback](Add-Cycles-in-Graphs).
For example, consider the graph below:

![359407953](https://github.com/Point72/csp/assets/3105306/d9416353-6755-4e37-8467-01da516499cf)

On a given cycle, let's say the `bid` input ticks.
The CSP engine will ensure that **`mid`** is executed, followed by **`spread`** and only once **`spread`**'s output is updated will **`quote`** be called.
When **`quote`** executes it will have the latest values of the `mid` and `spread` calc for this cycle.
Individual nodes are executed in *rank order*, where the rank of a node is defined as the longest path between the node and an input adapter. The "mid" node is at rank 1, "spread" is at rank 2 and "quote" is at rank 3. Therefore, if "bid" ticks on a given engine cycle, "mid" will be executed before "spread" and "quote". Note that the order of node execution *within* a rank is undefined, and users should never rely on the execution order of nodes at the same rank.
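
The rank rule can be sketched in plain Python. This is a toy model, not CSP's scheduler: the `deps` wiring is an assumption chosen to match the ranks stated above (it is not read from the figure), and `rank` and `execution_order` are hypothetical helper names.

```python
from functools import lru_cache

# Assumed wiring: each node maps to the nodes or input adapters it reads from.
deps = {
    "bid": [],
    "ask": [],
    "mid": ["bid", "ask"],
    "spread": ["mid"],
    "quote": ["mid", "spread"],
}


@lru_cache(maxsize=None)
def rank(node: str) -> int:
    """Longest path from `node` back to an input adapter (adapters are rank 0)."""
    inputs = deps[node]
    return 0 if not inputs else 1 + max(rank(dep) for dep in inputs)


def execution_order(ticked: str) -> list:
    """Every node downstream of `ticked`, sorted by rank."""
    downstream = set()
    frontier = [ticked]
    while frontier:
        current = frontier.pop()
        for node, inputs in deps.items():
            if current in inputs and node not in downstream:
                downstream.add(node)
                frontier.append(node)
    # Ties are broken by name only to keep this sketch deterministic;
    # CSP leaves the order within a rank undefined.
    return sorted(downstream, key=lambda n: (rank(n), n))


ranks = {node: rank(node) for node in deps}
order = execution_order("bid")
print(ranks)
print(order)
```

Because a rank is the longest path to an input, a node always has a strictly higher rank than each of its dependencies, so sorting by rank is enough to run dependencies first.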

## Graph Pruning

One should note a subtle optimization technique in CSP graphs.
Any part of a graph that is created at graph building time, but is NOT connected to any output nodes, will be pruned from the graph and will not exist during runtime.
Any node in a graph that is not connected to an output will be pruned from the graph and will not exist during runtime.
An output is defined as either an output adapter or a `csp.node` without any outputs of its own.
The idea here is that we can avoid doing work if it doesn't result in any output being generated.
In general, it's best practice for all `csp.nodes` to be **side-effect free**; in other words, they shouldn't mutate any state outside of the node.
Assuming all nodes are side-effect free, pruning the graph would not have any noticeable effects.
Pruning is an optimization which avoids executing nodes whose result will be discarded.
As a result, it's best practice for any `csp.node` to be **side-effect free**; it shouldn't mutate any state outside of the node.
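
A minimal sketch of the pruning idea, in plain Python rather than CSP itself. The node names, the `inputs_of` mapping and the `prune` helper are all hypothetical; the sketch only assumes that a node survives when some output is reachable from it.

```python
# Hypothetical graph: each node maps to the nodes it reads from.
inputs_of = {
    "source": [],
    "doubled": ["source"],
    "printer": ["doubled"],     # has no outputs of its own, so it counts as an output
    "unused_stat": ["source"],  # produces a result that nothing consumes
}
outputs = {"printer"}


def prune(inputs_of: dict, outputs: set) -> set:
    """Walk backwards from the outputs and keep every node that feeds one."""
    keep = set()
    stack = list(outputs)
    while stack:
        node = stack.pop()
        if node not in keep:
            keep.add(node)
            stack.extend(inputs_of[node])
    return keep


live = prune(inputs_of, outputs)
print(sorted(live))
```

In this toy graph `unused_stat` is dropped. If it had mutated external state, that mutation would silently never happen, which is the reason for the side-effect-free guideline.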

## Executing a Graph

Graphs can be executed using the `csp.run` function. Execution takes place in either real-time or historical mode (see [Execution Modes](Execution-Modes)) depending on the `realtime` argument. Graph execution begins at a `starttime` and ends at an `endtime`; the `endtime` argument can be either a `datetime` that is later than the start *or* a `timedelta` that gives the duration of the run. For example, if we wish to run our `calc_site_traffic` graph over one week of historical data we can execute it with:

```python
csp.run(calc_site_traffic, users=['alice', 'bob'], starttime=start, endtime=timedelta(weeks=1), realtime=False)
```
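
The two accepted forms of `endtime` can be illustrated with a small standalone helper. `resolve_endtime` is a hypothetical name used only for this sketch; it is not part of CSP and does not claim to mirror how `csp.run` handles the argument internally.

```python
from datetime import datetime, timedelta
from typing import Union


def resolve_endtime(starttime: datetime, endtime: Union[datetime, timedelta]) -> datetime:
    """Hypothetical helper: turn either form of `endtime` into an absolute datetime."""
    if isinstance(endtime, timedelta):
        # A duration is measured from the start of the run.
        return starttime + endtime
    return endtime


start = datetime(2024, 1, 1)
from_duration = resolve_endtime(start, timedelta(weeks=1))
from_absolute = resolve_endtime(start, datetime(2024, 1, 8))
print(from_duration, from_absolute)
```

Both calls describe the same one-week run, so which form to use is a matter of convenience.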

## Collecting Graph Outputs

If the `csp.graph` passed to `csp.run` has outputs, the full timeseries will be returned from `csp.run` like so:
There are multiple methods of getting in-process outputs after executing a `csp.graph`. If the graph returns one or more time-series, the full history of those values will be returned from `csp.run`.

**outputs example**
**return example**

```python
import csp
from csp import ts
from datetime import datetime, timedelta

@csp.graph
def my_graph() -> ts[int]:
    return csp.merge(csp.const(1), csp.const(2, timedelta(seconds=1)))
    return csp.merge(csp.const(1), csp.const(2, delay=timedelta(seconds=1)))

if __name__ == '__main__':
    res = csp.run(my_graph, starttime=datetime(2021,11,8))
    print(res)
res = csp.run(my_graph, starttime=datetime(2021,11,8))
```

result:
res:

```raw
{0: [(datetime.datetime(2021, 11, 8, 0, 0), 1), (datetime.datetime(2021, 11, 8, 0, 0, 1), 2)]}
```

Note that the result is a list of `(datetime, value)` tuples.
Note that the result is a list of `(time, value)` tuples. You can have the result returned as two separate NumPy arrays, one for the times and one for the values, by setting `output_numpy=True` in the `run` call.

You can also use [csp.add_graph_output](Base-Adapters-API#cspadd_graph_output) to add outputs.
These do not need to be in the top-level graph called directly from `csp.run`.
```python
res = csp.run(my_graph, starttime=datetime(2021,11,8), output_numpy=True)
```

res:

This gives the same result:
```raw
{0: (array(['2021-11-08T00:00:00.000000000', '2021-11-08T00:00:01.000000000'], dtype='datetime64[ns]'), array([1, 2], dtype=int64))}
```
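
The difference between the two result shapes can be shown without running CSP. The sketch below copies the default result from the output above and splits it into parallel sequences; plain tuples stand in for the NumPy arrays, and the variable names are illustrative only.

```python
from datetime import datetime

# Default result shape, copied from the output above: {key: [(time, value), ...]}
res = {0: [(datetime(2021, 11, 8, 0, 0), 1), (datetime(2021, 11, 8, 0, 0, 1), 2)]}

# Split each series into one sequence of times and one sequence of values,
# which is the layout that `output_numpy=True` returns as arrays.
columns = {key: tuple(zip(*series)) for key, series in res.items()}
times, values = columns[0]
print(times)
print(values)
```

The columnar layout is convenient when the times and values are handed to array-based code separately.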

You can also use [csp.add_graph_output](Base-Adapters-API#cspadd_graph_output) to add outputs.
These do not need to be in the top-level graph called directly from `csp.run`. Users can also specify the amount of history they want stored in the output using the `tick_count` and `tick_history` arguments to `add_graph_output`. For example, if only the last value needs to be stored, set `tick_count=1`.

**add_graph_output example**

```python
@csp.graph
def my_graph():
csp.add_graph_output('a', csp.merge(csp.const(1), csp.const(2, timedelta(seconds=1))))
```
same_thing = csp.merge(csp.const(1), csp.const(2, delay=timedelta(seconds=1)))
csp.add_graph_output('my_name', same_thing)

In addition to python outputs like above, you can set the optional `csp.run` argument `output_numpy` to `True` to get outputs as numpy arrays:

**numpy outputs**

```python
result = csp.run(my_graph, starttime=datetime(2021,11,8), output_numpy=True)
res = csp.run(my_graph, starttime=datetime(2021,11,8))
```

result:
res:

```raw
{0: (array(['2021-11-08T00:00:00.000000000', '2021-11-08T00:00:01.000000000'], dtype='datetime64[ns]'), array([1, 2], dtype=int64))}
{'my_name': [(datetime.datetime(2021, 11, 8, 0, 0), 1), (datetime.datetime(2021, 11, 8, 0, 0, 1), 2)]}
```

Note that the result there is a tuple per output, containing two numpy arrays, one with the datetimes and one with the values.
9 changes: 6 additions & 3 deletions docs/wiki/concepts/CSP-Node.md
@@ -3,7 +3,7 @@
- [Table of Contents](#table-of-contents)
- [Anatomy of a `csp.node`](#anatomy-of-a-cspnode)
- [Basket inputs](#basket-inputs)
- [**Node Outputs**](#node-outputs)
- [Node Outputs](#node-outputs)
- [Basket Outputs](#basket-outputs)
- [Generic Types](#generic-types)

@@ -21,7 +21,7 @@ They may (or may not) generate an output as a result of an input tick.
```python
from datetime import timedelta

@csp.node # 1
@csp.node(name='my_node') # 1
def demo_node(n: int, xs: ts[float], ys: ts[float]) -> ts[float]: # 2
with csp.alarms(): # 3
# Define an alarm time-series of type bool # 4
@@ -52,7 +52,7 @@ def demo_node(n: int, xs: ts[float], ys: ts[float]) -> ts[float]: # 2

Let's review line by line:

1\) Every CSP node must start with the **`@csp.node`** decorator
1\) Every CSP node must start with the **`@csp.node`** decorator. The name of the node will be the name of the function, unless a `name` argument is provided. The name is used when visualizing a graph with `csp.show_graph` or profiling with CSP's built-in [`profiler`](#Profile-csp-code).

2\) CSP nodes are fully typed and type-checking is strictly enforced.
All arguments must be typed, as well as all outputs.
@@ -269,3 +269,6 @@ This allows us to pass in a `ts[int]` for example, and get a `ts[int]` as an out

`const` takes value as an *instance* of type `T`, and returns a timeseries of type `T`.
So we can call `const(5)` and get a `ts[int]` output, or `const('hello!')` and get a `ts[str]` output, etc...

If a value is provided rather than an explicit type argument (for example, to `const`) then CSP resolves the type using internal logic. In some cases, it may be easier to override the automatic type inference.
Users can force a type variable to be a specific value with the `.using` function. For example, `csp.const(1)` will be resolved to a `ts[int]`; if you want to instead force the type to be `float`, do `csp.const.using(T=float)(1)`.