Point72 · AdamGlustein · Aug 23, 2024 · Jul 16, 2024
@@ -2,6 +2,7 @@
 
 Users can simply run graphs under a `Profiler()` context to extract profiling information.
 The code snippet below runs a graph in profile mode and extracts the profiling data by calling `results()`.
+Profiling can also be done in real time with live-updating visuals; see the [how-to guide](Profile-CSP-Code#profiling-a-real-time-cspgraph).
 
 ```python
 from csp import profiler

@@ -8,107 +8,113 @@
 
 ## Anatomy of a `csp.graph`
 
-To reiterate, `csp.graph` methods are called in order to construct the graph and are only executed before the engine is run.
-`csp.graph` methods don't do anything special, they are essentially regular python methods, but they can be defined to accept inputs and generate outputs similar to `csp.nodes`.
-This is solely used for type checking.
+`csp.graph` methods are called in order to construct the graph and are only executed before the engine is run. A graph is a collection of nodes and adapters which can either be executed as an argument to `csp.run` or composed into a larger graph.
+The `csp.graph` decorator is only used for type validation and it is optional when creating a CSP program. A standard Python function without the decorator can also be passed as an argument to `csp.run` if type validation is not required.
 `csp.graph` methods can be created to encapsulate components of a graph, and can be called from other `csp.graph` methods in order to help facilitate graph building.
 
 Simple example:
 
 ```python
 @csp.graph
-def calc_symbol_pnl(symbol: str, trades: ts[Trade]) -> ts[float]:
-    # sub-graph code needed to compute pnl for given symbol and symbol's trades
-    # sub-graph can subscribe to market data for the symbol as needed
-    ...
+def calc_user_time(session_data: ts[UserSession]) -> ts[float]:
+    # sub-graph code needed to compute the time a user spends on a website
+    session_time = session_data.logout_time - session_data.login_time
+    time_online = csp.stats.sum(session_time)
+    return time_online
 
 
 @csp.graph
-def calc_portfolio_pnl(symbols: [str]) -> ts[float]:
-    symbol_pnl = []
-    for symbol in symbols:
-        symbol_trades = trade_adapter.subscribe(symbol)
-        symbol_pnl.append(calc_symbol_pnl(symbol, symbol_trades))
+def calc_site_traffic(users: List[str]) -> ts[float]:
+    user_time = []
+    for user in users:
+        user_sessions = get_session(user)
+        user_time.append(calc_user_time(user_sessions))
 
-    return csp.sum(symbol_pnl)
+    return csp.sum(user_time)
 ```
 
-In this simple example we have a `csp.graph` component `calc_symbol_pnl` which encapsulates computing pnl for a single symbol.
-`calc_portfolio_pnl` is a graph that computes portfolio level pnl, it invokes the symbol-level pnl calc for every symbol, then sums up the results for the portfolio level pnl.
+In this simple example we compute the total time all users spend on a website. We have a `csp.graph` subcomponent `calc_user_time` which computes the time a single user spends on the site throughout the run.
+Then, in `calc_site_traffic` we compute the total user traffic by creating the user-level subgraph for each account and aggregating the results.
 
-## Graph Propagation and Single-dispatch
+## Graph Propagation and Single-Dispatch
 
-The CSP graph propagation algorithm ensures that all nodes are executed *once* per engine cycle, and in the correct order.
-Correct order means, that all input dependencies of a given node are guaranteed to have been evaluated before a given node is executed.
-Take this graph for example:
+The CSP graph propagation algorithm ensures that every node is executed *after* all of its dependencies on a given engine cycle.
+
+> \[!IMPORTANT\]
+> An *engine cycle* refers to a single execution of a CSP graph. There can be multiple engine cycles at the same *timestamp*; for example, a single data source may have two events both at `2020-01-01 00:00:00`. These events will be executed in two *cycles* that both occur at the same timestamp. Another case where multiple cycles can occur is [csp.feedback](Add-Cycles-in-Graphs).
+
+For example, consider the graph below:
 
 ![359407953](https://github.com/Point72/csp/assets/3105306/d9416353-6755-4e37-8467-01da516499cf)
 
-On a given cycle lets say the `bid` input ticks.
-The CSP engine will ensure that **`mid`** is executed, followed by **`spread`** and only once **`spread`**'s output is updated will **`quote`** be called.
-When **`quote`** executes it will have the latest values of the `mid` and `spread` calc for this cycle.
+Individual nodes are executed in *rank order*, where the rank of a node is defined as the longest path between the node and an input adapter. The "mid" node is at rank 1, while "spread" is at rank 2 and "quote" is at rank 3. Therefore, if "bid" ticks on a given engine cycle, then "mid" will be executed before "spread" and "quote". Note that the order of node execution *within* a rank is undefined, and users should never rely on the execution order of nodes at the same rank.
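
The rank definition above can be illustrated with a small pure-Python sketch. This is not CSP's implementation; the wiring in `INPUTS_OF` is an assumption chosen to match the ranks stated in the text (the actual edges are in the figure), and the `rank` helper is hypothetical.

```python
# Assumed wiring for illustration only: node -> the nodes it reads from.
# "bid" and "ask" stand in for input adapters, which sit at rank 0.
INPUTS_OF = {
    "bid": [],
    "ask": [],
    "mid": ["bid", "ask"],
    "spread": ["mid"],
    "quote": ["mid", "spread"],
}


def rank(node: str) -> int:
    """Length of the longest path from any input adapter to `node`."""
    deps = INPUTS_OF[node]
    if not deps:
        return 0
    return 1 + max(rank(dep) for dep in deps)


ranks = {node: rank(node) for node in INPUTS_OF}

# Sorting the non-adapter nodes by rank gives a valid execution order:
# every node appears after all of its dependencies.
execution_order = sorted((n for n in INPUTS_OF if ranks[n] > 0), key=ranks.get)
print(ranks)
print(execution_order)
```

Because "quote" reads from both "mid" and "spread", its rank follows the longer of the two paths, which is why it lands at rank 3 rather than rank 2.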
 
 ## Graph Pruning
 
-One should note a subtle optimization technique in CSP graphs.
-Any part of a graph that is created at graph building time, but is NOT connected to any output nodes, will be pruned from the graph and will not exist during runtime.
+Any node in a graph that is not connected to an output will be pruned from the graph and will not exist during runtime.
 An output is defined as either an output adapter or a `csp.node` without any outputs of its own.
-The idea here is that we can avoid doing work if it doesn't result in any output being generated.
-In general its best practice for all `csp.nodes` to be \***side-effect free**, in other words they shouldn't mutate any state outside of the node.
-Assuming all nodes are side-effect free, pruning the graph would not have any noticeable effects.
+Pruning is an optimization which avoids executing nodes whose result will be discarded.
+As a result, it's best practice for every `csp.node` to be **side-effect free**: a node shouldn't mutate any state outside of itself.
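
As a rough illustration of the pruning rule, the sketch below walks backwards from the outputs and keeps only the nodes it reaches. It is a conceptual model in plain Python, not CSP's internal algorithm; the node names and the `prune` helper are hypothetical.

```python
from typing import Dict, Iterable, List, Set


def prune(inputs_of: Dict[str, List[str]], outputs: Iterable[str]) -> Set[str]:
    """Return the nodes that feed, directly or indirectly, into an output."""
    keep: Set[str] = set()
    stack = list(outputs)
    while stack:
        node = stack.pop()
        if node in keep:
            continue
        keep.add(node)
        # Everything this node reads from must also survive pruning.
        stack.extend(inputs_of[node])
    return keep


# Hypothetical graph: node -> the nodes it reads from.
graph = {
    "source": [],
    "doubled": ["source"],
    "printer": ["doubled"],      # a sink with no outputs of its own
    "unused_stat": ["source"],   # produces a value that nothing consumes
}

kept = prune(graph, outputs=["printer"])
print(sorted(kept))
```

Here `unused_stat` is dropped because no output depends on it. If that node had been relied on for a side effect, such as writing to a global, the side effect would silently never happen.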
+
+## Executing a Graph
+
+Graphs can be executed using the `csp.run` function. Execution takes place in either real-time or historical mode (see [Execution Modes](Execution-Modes)) depending on the `realtime` argument. Graph execution begins at a `starttime` and ends at an `endtime`; the `endtime` argument can either be a `datetime` which is past the start *or* a `timedelta` which is the duration of the run. For example, if we wish to run our `calc_site_traffic` graph over one week of historical data we can execute it with:
+
+```python
+csp.run(calc_site_traffic, users=['alice', 'bob'], starttime=start, endtime=timedelta(weeks=1), realtime=False)
+```
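
The two accepted forms of `endtime` describe the same run window. The sketch below shows the equivalence in plain Python; `resolve_endtime` is a hypothetical helper written for illustration, not a function provided by CSP, and the start date is arbitrary.

```python
from datetime import datetime, timedelta
from typing import Union


def resolve_endtime(starttime: datetime, endtime: Union[datetime, timedelta]) -> datetime:
    """Turn either form of `endtime` into an absolute end of the run."""
    if isinstance(endtime, timedelta):
        # A duration is measured from the start of the run.
        return starttime + endtime
    return endtime


start = datetime(2024, 1, 1)
end_from_duration = resolve_endtime(start, timedelta(weeks=1))
end_from_datetime = resolve_endtime(start, datetime(2024, 1, 8))
print(end_from_duration, end_from_datetime)
```

Passing a `timedelta` is usually more convenient when the start time is computed elsewhere, since the run length stays fixed without recomputing an absolute end.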
 
 ## Collecting Graph Outputs
 
-If the `csp.graph` passed to `csp.run` has outputs, the full timeseries will be returned from `csp.run` like so:
+There are multiple methods of getting in-process outputs after executing a `csp.graph`. If the graph returns one or more time-series, the full history of those values will be returned from `csp.run`.
 
-**outputs example**
+**return example**
 
 ```python
 import csp
 from datetime import datetime, timedelta
 
 @csp.graph
 def my_graph() -> ts[int]:
-    return csp.merge(csp.const(1), csp.const(2, timedelta(seconds=1)))
+    return csp.merge(csp.const(1), csp.const(2, delay=timedelta(seconds=1)))
 
-if __name__ == '__main__':
-    res = csp.run(my_graph, starttime=datetime(2021,11,8))
-    print(res)
+res = csp.run(my_graph, starttime=datetime(2021,11,8))
 ```
 
-result:
+res:
 
 ```raw
 {0: [(datetime.datetime(2021, 11, 8, 0, 0), 1), (datetime.datetime(2021, 11, 8, 0, 0, 1), 2)]}
 ```
 
-Note that the result is a list of `(datetime, value)` tuples.
+Note that the result is a dictionary in which each output maps to a list of `(time, value)` tuples. You can have each output returned as two separate NumPy arrays, one for the times and one for the values, by setting `output_numpy=True` in the `csp.run` call.
 
-You can also use [csp.add_graph_output](Base-Adapters-API#cspadd_graph_output) to add outputs.
-These do not need to be in the top-level graph called directly from `csp.run`.
+```python
+res = csp.run(my_graph, starttime=datetime(2021,11,8), output_numpy=True)
+```
+
+res:
 
-This gives the same result:
+```raw
+{0: (array(['2021-11-08T00:00:00.000000000', '2021-11-08T00:00:01.000000000'], dtype='datetime64[ns]'), array([1, 2], dtype=int64))}
+```
+
+You can also use [csp.add_graph_output](Base-Adapters-API#cspadd_graph_output) to add outputs.
+These do not need to be in the top-level graph called directly from `csp.run`. Users can also specify the amount of history they want stored in the output using the `tick_count` and `tick_history` arguments to `add_graph_output`. For example, if only the last value needs to be stored, set `tick_count=1`.
 
 **add_graph_output example**
 
 ```python
 @csp.graph
 def my_graph():
-    csp.add_graph_output('a', csp.merge(csp.const(1), csp.const(2, timedelta(seconds=1))))
-```
+    same_thing = csp.merge(csp.const(1), csp.const(2, delay=timedelta(seconds=1)))
+    csp.add_graph_output('my_name', same_thing)
 
-In addition to python outputs like above, you can set the optional `csp.run` argument `output_numpy` to `True` to get outputs as numpy arrays:
-
-**numpy outputs**
-
-```python
-result = csp.run(my_graph, starttime=datetime(2021,11,8), output_numpy=True)
+res = csp.run(my_graph, starttime=datetime(2021,11,8))
 ```
 
-result:
+res:
 
 ```raw
-{0: (array(['2021-11-08T00:00:00.000000000', '2021-11-08T00:00:01.000000000'], dtype='datetime64[ns]'), array([1, 2], dtype=int64))}
+{'my_name': [(datetime.datetime(2021, 11, 8, 0, 0), 1), (datetime.datetime(2021, 11, 8, 0, 0, 1), 2)]}
 ```
-
-Note that the result there is a tuple per output, containing two numpy arrays, one with the datetimes and one with the values.
@@ -3,7 +3,7 @@
 - [Table of Contents](#table-of-contents)
 - [Anatomy of a `csp.node`](#anatomy-of-a-cspnode)
 - [Basket inputs](#basket-inputs)
-- [**Node Outputs**](#node-outputs)
+- [Node Outputs](#node-outputs)
 - [Basket Outputs](#basket-outputs)
 - [Generic Types](#generic-types)
 
@@ -21,7 +21,7 @@ They may (or may not) generate an output as a result of an input tick.
 ```python
 from datetime import timedelta
 
-@csp.node  # 1
+@csp.node(name='my_node') # 1
 def demo_node(n: int, xs: ts[float], ys: ts[float]) -> ts[float]: # 2
     with csp.alarms():  # 3
         # Define an alarm time-series of type bool  # 4
@@ -52,7 +52,7 @@ def demo_node(n: int, xs: ts[float], ys: ts[float]) -> ts[float]: # 2
 
 Lets review line by line
 
-1\) Every CSP node must start with the **`@csp.node`** decorator
+1\) Every CSP node must start with the **`@csp.node`** decorator. The name of the node will be the name of the function, unless a `name` argument is provided. The name is used when visualizing a graph with `csp.show_graph` or profiling with CSP's built-in [`profiler`](#Profile-csp-code).
 
 2\) CSP nodes are fully typed and type-checking is strictly enforced.
 All arguments must be typed, as well as all outputs.
@@ -269,3 +269,6 @@ This allows us to pass in a `ts[int]` for example, and get a `ts[int]` as an out
 
 `const` takes value as an *instance* of type `T`, and returns a timeseries of type `T`.
 So we can call `const(5)` and get a `ts[int]` output, or `const('hello!')` and get a `ts[str]` output, etc...
+
+If a value is provided rather than an explicit type argument (for example, to `const`) then CSP resolves the type using internal logic. In some cases, it may be easier to override the automatic type inference.
+Users can force a type variable to be a specific value with the `.using` function. For example, `csp.const(1)` will be resolved to a `ts[int]`; if you want to instead force the type to be `float`, do `csp.const.using(T=float)(1)`.