[Issue 5518][docs] Update docs mainly for Python API (#5536)
* Documentation updates, mainly for Python API

* Corrections from @jennifer88huang

* Remove links to functions-api

* Update functions-develop.md
candlerb authored and Jennifer88huang-zz committed Nov 5, 2019
1 parent c4ffada commit 1a63b08
Showing 7 changed files with 224 additions and 854 deletions.
731 changes: 0 additions & 731 deletions site2/docs/functions-api.md

This file was deleted.

7 changes: 7 additions & 0 deletions site2/docs/functions-debug.md
@@ -6,11 +6,18 @@ sidebar_label: How-to: Debug

You can use the following methods to debug Pulsar Functions:

* [Captured stderr](functions-debug.md#captured-stderr)
* [Use unit test](functions-debug.md#use-unit-test)
* [Debug with localrun mode](functions-debug.md#debug-with-localrun-mode)
* [Use log topic](functions-debug.md#use-log-topic)
* [Use Functions CLI](functions-debug.md#use-functions-cli)

## Captured stderr

Function startup information and captured stderr output are written to `logs/functions/<tenant>/<namespace>/<function>/<function>-<instance>.log`.

This is useful for debugging why a function fails to start.
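
As a sketch, the path pattern above can be filled in programmatically to locate the log for a given instance. The tenant, namespace, and function names below are placeholders, the instance value is assumed to be the numeric instance ID, and the path is assumed to be relative to the Pulsar installation directory.

```python
from pathlib import PurePosixPath


def function_log_path(tenant, namespace, function, instance):
    """Return the log path for one function instance.

    Follows the documented layout:
    logs/functions/<tenant>/<namespace>/<function>/<function>-<instance>.log
    (assumed to be relative to the Pulsar installation directory).
    """
    return PurePosixPath(
        "logs", "functions", tenant, namespace, function,
        "{}-{}.log".format(function, instance),
    )


# Placeholder names; substitute your own tenant, namespace, and function.
print(function_log_path("public", "default", "exclamation", 0))
```

You can then follow that file (for example with `tail -f`) while the function starts up.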

## Use unit test

A Pulsar Function is a function with inputs and outputs, so you can test it in the same way as you test any other function.
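
For example, a native Python function (one that does not import the Pulsar SDK) can be tested with the standard `unittest` module. The sketch below assumes an exclamation-style function like the one in the Pulsar Python examples; the test class name is illustrative.

```python
import unittest


# Native Python function: plain input in, plain output out.
def process(input):
    return "{}!".format(input)


class ExclamationFunctionTest(unittest.TestCase):
    def test_appends_exclamation_mark(self):
        self.assertEqual(process("hello"), "hello!")

    def test_empty_input(self):
        self.assertEqual(process(""), "!")
```

Saved as, say, `test_exclamation.py`, this runs with `python -m unittest test_exclamation`.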
2 changes: 1 addition & 1 deletion site2/docs/functions-deploying.md
@@ -178,7 +178,7 @@ If a Pulsar Function is running in [cluster mode](#cluster-mode), you can **trig

> Triggering a function is ultimately no different from invoking a function by producing a message on one of the function's input topics. The [`pulsar-admin functions trigger`](reference-pulsar-admin.md#trigger) command is essentially a convenient mechanism for sending messages to functions without needing to use the [`pulsar-client`](reference-cli-tools.md#pulsar-client) tool or a language-specific client library.
To show an example of function triggering, let's start with a simple [Python function](functions-api.md#functions-for-python) that returns a simple string based on the input:
To show an example of function triggering, let's start with a simple Python function that returns a simple string based on the input:

```python
# myfunc.py
199 changes: 199 additions & 0 deletions site2/docs/functions-develop.md
@@ -35,6 +35,18 @@ def process(input):
```
For complete code, see [here](https://github.com/apache/pulsar/blob/master/pulsar-functions/python-examples/native_exclamation_function.py).

> Note
> You can write Pulsar Functions in Python 2 or Python 3. However, Pulsar only looks for `python` as the interpreter.
>
> If you're running Pulsar Functions on an Ubuntu system that only supports Python 3, the functions might fail to
> start. In this case, you can make `python` point to `python3` by creating a symlink with `update-alternatives`, as shown below.
> However, your system will fail if you subsequently install any other package that depends on Python 2.x. A solution is under development in
> [Issue 5518](https://github.com/apache/pulsar/issues/5518).
>
> ```bash
> sudo update-alternatives --install /usr/bin/python python /usr/bin/python3 10
> ```
<!--END_DOCUSAURUS_CODE_TABS-->
The following example uses Pulsar Functions SDK.
@@ -702,3 +714,190 @@ To access metrics created by Pulsar Functions, refer to [Monitoring](deploy-moni
Pulsar Functions use [Apache BookKeeper](https://bookkeeper.apache.org) as a state storage interface. A Pulsar installation, including a local standalone installation, includes a deployment of BookKeeper bookies.

Since the Pulsar 2.1.0 release, Pulsar integrates with the Apache BookKeeper [table service](https://docs.google.com/document/d/155xAwWv5IdOitHh1NVMEwCMGgB28M3FyMiQSxEpjE-Y/edit#heading=h.56rbh52koe3f) to store the `State` for functions. For example, a `WordCount` function can store its `counters` state in the BookKeeper table service through the Pulsar Functions State API.

States are key-value pairs, where the key is a string and the value is arbitrary binary data; counters are stored as 64-bit big-endian binary values. Keys are scoped to an individual Pulsar Function and shared between instances of that function.
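
To make the counter encoding concrete, the following sketch converts between an integer and its 8-byte big-endian form with Python's `struct` module. It assumes the value is a signed 64-bit integer (matching Java's `long`); the helper names are hypothetical, and you would only need this if you handle a counter's raw bytes rather than going through the counter API.

```python
import struct

# ">q": big-endian byte order, signed 64-bit integer (8 bytes).
COUNTER_FORMAT = ">q"


def encode_counter(value):
    """Pack an integer the way a counter is stored (8 bytes, big-endian)."""
    return struct.pack(COUNTER_FORMAT, value)


def decode_counter(raw):
    """Unpack 8 big-endian bytes back into an integer."""
    (value,) = struct.unpack(COUNTER_FORMAT, raw)
    return value


raw = encode_counter(42)
print(raw)                  # b'\x00\x00\x00\x00\x00\x00\x00*'
print(decode_counter(raw))  # 42
```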

You can access state within a Pulsar Function using the `putState`, `getState`, `incrCounter`, `getCounter`, and `deleteState` calls on the context object. You can also manage state using the [querystate](pulsar-admin.md#querystate) and [putstate](pulsar-admin.md#putstate) subcommands of `pulsar-admin functions`.

### API

<!--DOCUSAURUS_CODE_TABS-->
<!--Java-->
Currently, Pulsar Functions expose the following APIs for mutating and accessing state. These APIs are available in the [Context](functions-develop.md#context) object when you use Java SDK functions.

#### incrCounter

```java
/**
* Increment the built-in distributed counter referred to by key
* @param key The name of the key
* @param amount The amount to be incremented
*/
void incrCounter(String key, long amount);
```

An application can use `incrCounter` to change the counter of a given `key` by the given `amount`.

#### getCounter

```java
/**
* Retrieve the counter value for the key.
*
* @param key name of the key
* @return the amount of the counter value for this key
*/
long getCounter(String key);
```

An application can use `getCounter` to retrieve the counter of a given `key` mutated by `incrCounter`.

In addition to the counter API, Pulsar also exposes a general key/value API for functions to store
general key/value state.

#### putState

```java
/**
* Update the state value for the key.
*
* @param key name of the key
* @param value state value of the key
*/
void putState(String key, ByteBuffer value);
```

#### getState

```java
/**
* Retrieve the state value for the key.
*
* @param key name of the key
* @return the state value for the key.
*/
ByteBuffer getState(String key);
```

#### deleteState

```java
/**
* Delete the state value for the key.
*
* @param key name of the key
*/
void deleteState(String key);
```

Counters and binary values share the same keyspace, so this deletes either type.

<!--Python-->
Currently, Pulsar Functions expose the following APIs for mutating and accessing state. These APIs are available in the [Context](#context) object when you use Python SDK functions.

#### incr_counter

```python
def incr_counter(self, key, amount):
    """incr the counter of a given key in the managed state"""
```

An application can use `incr_counter` to change the counter of a given `key` by the given `amount`.
If the `key` does not exist, a new key is created.

#### get_counter

```python
def get_counter(self, key):
    """get the counter of a given key in the managed state"""
```

An application can use `get_counter` to retrieve the counter of a given `key` mutated by `incr_counter`.

In addition to the counter API, Pulsar also exposes a general key/value API for functions to store
general key/value state.

#### put_state

```python
def put_state(self, key, value):
    """update the value of a given key in the managed state"""
```

The key is a string, and the value is arbitrary binary data.

#### get_state

```python
def get_state(self, key):
    """get the value of a given key in the managed state"""
```
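
Because state values are binary, structured data has to be serialized before `put_state` and deserialized after `get_state`. The sketch below uses JSON encoded as UTF-8. The function name, the `user-<id>` key format, and the `StubContext` class are hypothetical; the stub (which returns `None` for a missing key) only stands in for the real context so the example runs without a Pulsar cluster, and the `try`/`except` lets the class be defined even when the Pulsar client library is not installed.

```python
import json

try:
    from pulsar import Function  # Pulsar Python SDK base class
except ImportError:
    Function = object  # fallback so the sketch runs without the SDK installed


class LastRecordPerUser(Function):
    """Hypothetical function: remembers the latest JSON record per user."""

    def process(self, item, context):
        record = json.loads(item)
        key = "user-{}".format(record["user"])      # keys are strings
        value = json.dumps(record).encode("utf-8")  # values are bytes
        context.put_state(key, value)


def load_record(context, user):
    """Read a stored record back and decode it (None if absent in the stub)."""
    raw = context.get_state("user-{}".format(user))
    return None if raw is None else json.loads(raw.decode("utf-8"))


class StubContext:
    """Hypothetical in-memory stand-in for the SDK context."""

    def __init__(self):
        self.state = {}

    def put_state(self, key, value):
        self.state[key] = bytes(value)

    def get_state(self, key):
        return self.state.get(key)


ctx = StubContext()
LastRecordPerUser().process('{"user": 7, "page": "/home"}', ctx)
print(load_record(ctx, 7))  # {'user': 7, 'page': '/home'}
```

Check what `get_state` returns for a missing key in your SDK version before relying on the `None` check.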

#### del_counter

```python
def del_counter(self, key):
    """delete the counter of a given key in the managed state"""
```

Counters and binary values share the same keyspace, so this deletes either type.
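
One way to picture the shared keyspace is a single map from string keys to bytes, where a counter is simply a key whose value holds an 8-byte big-endian integer. The `SharedKeyspace` class below is a hypothetical model for illustration, not Pulsar's implementation; it assumes the signed 64-bit big-endian counter encoding described earlier in this section.

```python
import struct


class SharedKeyspace:
    """Hypothetical model: counters and binary values live in one dict."""

    def __init__(self):
        self._data = {}  # str key -> bytes value

    def put_state(self, key, value):
        self._data[key] = bytes(value)

    def get_state(self, key):
        return self._data.get(key)

    def incr_counter(self, key, amount):
        # A missing key counts as 0; store the result as 8 big-endian bytes.
        current = self.get_counter(key)
        self._data[key] = struct.pack(">q", current + amount)

    def get_counter(self, key):
        raw = self._data.get(key)
        return 0 if raw is None else struct.unpack(">q", raw)[0]

    def del_counter(self, key):
        # One keyspace, so this removes a counter or a binary value alike.
        self._data.pop(key, None)


store = SharedKeyspace()
store.put_state("greeting", b"hello")
store.incr_counter("visits", 3)
store.del_counter("greeting")       # deletes a binary value, not just counters
print(store.get_state("greeting"))  # None
print(store.get_counter("visits"))  # 3
```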

<!--END_DOCUSAURUS_CODE_TABS-->

### Query State

A Pulsar Function can use the [State API](#api) to store state in Pulsar's state storage
and to retrieve it from there. Pulsar also provides CLI commands for querying
a function's state.

```shell
$ bin/pulsar-admin functions querystate \
--tenant <tenant> \
--namespace <namespace> \
--name <function-name> \
--state-storage-url <bookkeeper-service-url> \
--key <state-key> \
[--watch]
```

If `--watch` is specified, the CLI will watch the value of the provided `state-key`.

### Example

<!--DOCUSAURUS_CODE_TABS-->
<!--Java-->

{@inject: github:`WordCountFunction`:/pulsar-functions/java-examples/src/main/java/org/apache/pulsar/functions/api/examples/WordCountFunction.java} is a good example
of how an application can store `state` in Pulsar Functions.

```java
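import java.util.Arrays;

import org.apache.pulsar.functions.api.Context;
import org.apache.pulsar.functions.api.Function;

public class WordCountFunction implements Function<String, Void> {
    @Override
    public Void process(String input, Context context) throws Exception {
        // Split the input on periods and bump one counter per resulting token.
        Arrays.asList(input.split("\\.")).forEach(word -> context.incrCounter(word, 1));
        return null;
    }
}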
```

The logic of this `WordCount` function is straightforward:

1. The function first splits the received `String` into multiple words using the regex `\\.` (a Java string literal that matches a literal period).
2. For each `word`, the function increments the corresponding `counter` by 1 (via `incrCounter(key, amount)`).

<!--Python-->

```python
from pulsar import Function

class WordCount(Function):
    def process(self, item, context):
        for word in item.split():
            context.incr_counter(word, 1)
```

The logic of this `WordCount` function is straightforward:

1. The function first splits the received string into words on runs of whitespace (`str.split()` with no arguments).
2. For each `word`, the function increments the corresponding `counter` by 1 (via `incr_counter(key, amount)`).
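
To try this logic without a cluster, you can drive the class with a stand-in context. The `StubContext` below is hypothetical and only implements the two counter calls the example needs; the `try`/`except` import fallback lets the sketch run where the Pulsar client library is not installed.

```python
try:
    from pulsar import Function  # Pulsar Python SDK base class
except ImportError:
    Function = object  # fallback so the sketch runs without the SDK installed


class WordCount(Function):
    def process(self, item, context):
        for word in item.split():
            context.incr_counter(word, 1)


class StubContext:
    """Hypothetical in-memory stand-in for the SDK context (counters only)."""

    def __init__(self):
        self.counters = {}

    def incr_counter(self, key, amount):
        self.counters[key] = self.counters.get(key, 0) + amount

    def get_counter(self, key):
        return self.counters.get(key, 0)


ctx = StubContext()
word_count = WordCount()
word_count.process("the quick brown fox", ctx)
word_count.process("the  lazy\tdog", ctx)  # runs of whitespace also separate words
print(ctx.get_counter("the"))  # 2
print(ctx.get_counter("dog"))  # 1
```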

<!--END_DOCUSAURUS_CODE_TABS-->
6 changes: 3 additions & 3 deletions site2/docs/functions-metrics.md
@@ -7,15 +7,15 @@ sidebar_label: Metrics
Pulsar Functions can publish arbitrary metrics to the metrics interface which can then be queried. This doc contains instructions for publishing metrics using the [Java](#java-sdk) and [Python](#python-sdk) Pulsar Functions SDKs.

> #### Metrics and stats not available through language-native interfaces
> If a Pulsar Function uses the language-native interface for [Java](functions-api.md#java-native-functions) or [Python](#python-native-functions), that function will not be able to publish metrics and stats to Pulsar.
> If a Pulsar Function uses the [language-native interface](functions-develop.md#available-apis) for Java or Python, that function will not be able to publish metrics and stats to Pulsar.
## Accessing metrics

For a guide to accessing metrics created by Pulsar Functions, see the guide to [Monitoring](deploy-monitoring.md) in Pulsar.

## Java SDK

If you're creating a Pulsar Function using the [Java SDK](functions-api.md#java-sdk-functions), the {@inject: javadoc:Context:/pulsar-functions/org/apache/pulsar/functions/api/Context} object has a `recordMetric` method that you can use to register both a name for the metric and a value. Here's the signature for that method:
If you're creating a Pulsar Function using the Java SDK, the {@inject: javadoc:Context:/pulsar-functions/org/apache/pulsar/functions/api/Context} object has a `recordMetric` method that you can use to register both a name for the metric and a value. Here's the signature for that method:

```java
void recordMetric(String metricName, double value);
@@ -40,4 +40,4 @@ This function counts the length of each incoming message (of type `String`) and

## Python SDK

Documentation for the [Python SDK](functions-api.md#python-sdk-functions) is coming soon.
Documentation for the Python SDK is coming soon.
16 changes: 14 additions & 2 deletions site2/docs/functions-overview.md
@@ -108,10 +108,10 @@ class RoutingFunction(Function):
        self.vegetables_topic = "persistent://public/default/vegetables"

    def is_fruit(self, item):
        return item in [b"apple", b"orange", b"pear", b"other fruits..."]

    def is_vegetable(self, item):
        return item in [b"carrot", b"lettuce", b"radish", b"other vegetables..."]

    def process(self, item, context):
        if self.is_fruit(item):
@@ -123,6 +123,18 @@ class RoutingFunction(Function):
            context.get_logger().warn(warning)
```

If this code is stored in `~/router.py`, then you can deploy it in your Pulsar cluster using the [command line](functions-deploy.md#command-line-interface) as follows.

```bash
$ bin/pulsar-admin functions create \
--py ~/router.py \
--classname router.RoutingFunction \
--tenant public \
--namespace default \
--name route-fruit-veg \
--inputs persistent://public/default/basket-items
```
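
The lists in `is_fruit` and `is_vegetable` use bytes literals. Assuming the function is deployed without a custom serde, as in the command above, so that each item arrives as raw bytes, a list of `str` values would never match, because in Python 3 a `bytes` value never compares equal to a `str`:

```python
# An incoming payload, assumed to arrive as raw bytes.
item = "apple".encode("utf-8")

str_fruits = ["apple", "orange", "pear"]
bytes_fruits = [b"apple", b"orange", b"pear"]

print(item in str_fruits)    # False: bytes never equal str in Python 3
print(item in bytes_fruits)  # True
```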

### Functions, messages and message types
Pulsar Functions take byte arrays as inputs and produce byte arrays as output. However, in languages that support typed interfaces (such as Java), you can write typed Functions and bind messages to types in the following ways.
* [Schema Registry](functions-develop.md#schema-registry)