Commit

Merge pull request #38 from apify/copy/jan
Copy/jan
jancurn authored Aug 18, 2024
2 parents 664d6e1 + 411a244 commit 2cf38db
Showing 7 changed files with 143 additions and 101 deletions.
69 changes: 38 additions & 31 deletions README.md
@@ -3,7 +3,7 @@
 **The whitepaper describes a new concept for building serverless microapps called _Actors_,
 which are easy to develop, share, integrate, and build upon.
 Actors are a reincarnation of the UNIX philosophy
-for programs running in the cloud.**
+for programs or agents running in the cloud.**

 By [Jan Čurn](https://apify.com/jancurn),
 [Marek Trunkát](https://apify.com/mtrunkat),
@@ -16,7 +16,9 @@ By [Jan Čurn](https://apify.com/jancurn),
 <!-- toc -->

 - [Introduction](#introduction)
+* [Background](#background)
 * [Overview](#overview)
+* [Apify platform](#apify-platform)
 - [Basic concepts](#basic-concepts)
 * [Input](#input)
 * [Run environment](#run-environment)
@@ -30,7 +32,7 @@ By [Jan Čurn](https://apify.com/jancurn),
 * [Relation to the Actor model](#relation-to-the-actor-model)
 * [Why the name "Actor"](#why-the-name-actor)
 - [Installation and setup](#installation-and-setup)
-* [Apify platform](#apify-platform)
+* [Running on the Apify platform](#running-on-the-apify-platform)
 * [Node.js](#nodejs)
 * [Python](#python)
 * [Command-line interface (CLI)](#command-line-interface-cli)
@@ -72,23 +74,26 @@ By [Jan Čurn](https://apify.com/jancurn),

 ## Introduction

-This document introduces _Actors_,
-a new kind of serverless microapps (or agents, cloud programs, ...) for general-purpose
+This whitepaper introduces _Actors_,
+a new kind of serverless microapps (or agents, cloud programs, functions) for general-purpose
 language-agnostic computing and automation jobs.
-The main design goal for Actors is to make it easy for developers build and ship reusable
-cloud software tools, which are also easy to run
+The main goal for Actors is to make it easy for developers build and ship reusable
+software automation tools, which are also easy to run
 and integrate by other users.

+
+### Background
+
 The Actors were first introduced by [Apify](https://apify.com/) in late 2017,
 as a way to easily build, package, and ship web scraping and web automation tools to customers.
 Over the years, Apify keeps developing the concept and has applied
 it successfully to thousands of real-world use cases in many business areas,
 well beyond the domain of web scraping.

-Drawing on this experience,
+Building on this experience,
 we're releasing this whitepaper to introduce the philosophy of Actors
-to the public and receive feedback on it.
-Our hope is that Actor programming model will eventually become an open standard,
+to the public and receive your feedback on it.
+Our hope is to make the Actor programming model an open standard,
 which will help community to more effectively
 build and ship reusable software automation tools,
 as well as encourage new implementations of the model in other programming languages.
@@ -100,8 +105,8 @@ by the Apify platform, with SDKs for
 [Node.js](https://sdk.apify.com/) and
 [Python](https://pypi.org/project/apify/),
 and a [command-line interface (CLI)](https://docs.apify.com/cli).
-Beware that the frameworks might not yet implement all the features of Actor programming model.
-This is work in progress.
+Beware that the frameworks might not yet implement all the features of Actor programming model
+described in this whitepaper. This is work in progress.


 ### Overview
@@ -114,24 +119,27 @@ or removing duplicates from a large dataset.
 Actors can run as short or as long as necessary, from seconds to hours, even infinitely.

 Basically, Actors are programs packaged as Docker images,
-which accept a well-defined input JSON object, perform
+which accept a well-defined JSON input, perform
 an action, and optionally produce a well-defined JSON output.

 Actors have the following elements:
 - **Dockerfile** which specifies where is the Actor's source code,
-how to build it, and run it
-- **Documentation** in a form of README.md file
+how to build it, and run it.
+- **Documentation** in a form of README.md file.
 - **Input and output schemas** that describe what input the Actor requires,
-and what results it produces
-- Access to an out-of-box **storage system** for Actor data, results, and files
-- **Metadata** such as the Actor name, description, author, and version
+and what results it produces.
+- Access to an out-of-box **storage system** for Actor data, results, and files.
+- **Metadata** such as the Actor name, description, author, and version.

 The documentation and the input/output schemas make it possible for people to easily understand what the Actor does,
 enter the required inputs both in user interface or API,
 and integrate the results of the Actor into their other workflows.
 Actors can easily call and interact with each other, enabling building more complex
 systems on top of simple ones.

+
+### Apify platform
+
 The Actors can be published
 on the [Apify platform](https://apify.com/store),
 which automatically generates a rich website with documentation
@@ -145,15 +153,10 @@ The Apify platform provides an open API, cron-style scheduler, webhooks
 and [integrations](https://apify.com/integrations)
 to services such as Zapier or Make, which make it easy for users
 to integrate Actors into their existing workflows. Additionally, the Actor developers
-can set a price tag for the usage of their Actors, and thus make
-[passive income](https://blog.apify.com/make-regular-passive-income-developing-web-automation-Actors-b0392278d085/)
-to have an incentive to keep developing and improving the Actor for the users.
-
-Currently, Actors can run locally or on the Apify platform. One of the goals of this whitepaper
-is to motivate the community to create new runtime environments outside of Apify.
+can set a price tag for the usage of their Actors, and thus earn income
+and have an incentive to keep developing and improving the Actor for the users.
+For details, see [Monetization](#monetization).

-The ultimate goal of the Actor programming model is to make it as simple as possible
-for people to develop, run, and integrate software automation tools.


 ## Basic concepts
@@ -385,12 +388,13 @@ Our primary focus was always on practical software engineering utility, not an
 implementation of a formal mathematical model.

 For example, our Actors
-do not provide any standard message passing mechanism. The Actors might communicate together
+do not provide any standard message passing mechanism, but they can communicate together
 directly via HTTP requests (see [live-view web server](#live-view-web-server)),
-manipulate each other's operation using the Apify platform API (e.g. abort another Actor),
+manipulate each other's operation via the Apify platform API (e.g. abort another Actor),
 or affect each other by sharing some internal state or storage.
-The Actors simply do not have any formal restrictions,
-and they can access whichever external systems they want.
+The Actors do not have any formal restrictions,
+and they can access whichever external systems they want,
+and thus going beyond the formal mathematical Actor model.


 ### Why the name "Actor"
@@ -403,11 +407,12 @@ And they are related to the Actor model known from the computer science.

 To make it clear Actors are not people, the letter "A" is capitalized.

+
 ## Installation and setup

 Below are steps to start building Actors in various languages and environments.

-### Apify platform
+### Running on the Apify platform

 You can develop and run Actors in [Apify Console](https://console.apify.com/actors) without
 installing any software locally. Just create a free account, and start building Actors
@@ -1174,7 +1179,7 @@ $ actor abort --run-id RUN_ID
 $ kill <PID>
 ```
-<!-- TODO: Include Actor.boot() or not? I'd say yes -->
+<!-- TODO: Include Actor.boot() or not? I'd say yes. See https://github.com/apify/actor-specs/issues/23 -->
 ### Live view web server
@@ -1225,6 +1230,7 @@ https://bob--screenshot-taker.apify.actor
 Currently, the specific Standby mode settings, authentication options, or OpenAPI schema are not part of this Actor specification,
 but they might be in the future introduced as new settings in the `actor.json` file.
+<!-- TODO: Consider unifying Standby mode with Live view web server, they are really two sides of the same thing -->
 ### Migration to another server
@@ -1639,6 +1645,7 @@ The monetization gives developers an incentive to further develop and maintain t
 Actors provide a new way for software developers like you to monetize their skills,
 bringing the creator economy model to SaaS.
+For more details, read our essay [Make passive income developing web automation Actors](https://blog.apify.com/make-regular-passive-income-developing-web-automation-Actors-b0392278d085/).
 ## Future work
34 changes: 24 additions & 10 deletions pages/ACTOR_FILE.md
@@ -1,32 +1,46 @@
 # Actor file specification

-This JSON file must be present at `.actor/actor.json` and contains the main definition of the Actor.
+This JSON file must be present at `.actor/actor.json` and contains an object with the definition of the Actor.

-It looks as follows:
+The file looks as follows:

 ```jsonc
 {
-"actorSpecification": 2, // required
+// Required, indicates that this is an Actor definition file
+"actorSpecification": 1,
+
+// Metadata
 "name": "google-search-scraper",
 "title": "Google Search Scraper",
 "description": "The 200-char description",
-"version": "0.0", // required
-"buildTag": "latest", // if omitted, builds with "latest" tag
+"version": "0.0", // Required
+"buildTag": "latest", // If omitted, builds with "latest" tag
 "environmentVariables": {
 "MYSQL_USER": "my_username",
 "MYSQL_PASSWORD": "@mySecretPassword"
 },
-"dockerfile": "./Dockerfile", // if omitted, it checks "./Dockerfile" and "../Dockerfile"
-"readme": "./ACTOR.md", // if omitted, it checks "./ACTOR.md" and "../README.md"
+
+// Optional min and max memory for running this Actor
+"minMemoryMbytes": 128,
+"maxMemoryMbytes": 4096,
+
+// Links to other Actor defintion files
+"dockerfile": "./Dockerfile", // If omitted, the system looks for "./Dockerfile" and "../Dockerfile"
+"readme": "./README.md", // If omitted, the system looks for "./ACTOR.md" and "../README.md"
+"changelog": "../../../shared/CHANGELOG.md",
+
+// Links to input/output schema files, or inlined schema objects.
 "input": "./input_schema.json",
 "output": "./output_schema.json",
-"minMemoryMbytes": 128, // optional number, min memory in megabytes allowed for running this Actor
-"maxMemoryMbytes": 4096, // optional number, max memory in megabytes allowed for running this Actor
+
+// Links to storage schema files, or inlined schema objects.
 "storages": {
 "keyValueStore": "./key_value_store_schema.json",
-"dataset": "../shared-schemas/dataset_schema.json",
+"dataset": "../shared_schemas/generic_dataset_schema.json",
 "requestQueue": "./request_queue_schema.json"
 },
+
+// Scripts that might be used by the CLI to ease the local Actor development.
 "scripts": {
 "post-create": "npm install",
 "run": "npm start"
Expand Down
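Reviewer note on `pages/ACTOR_FILE.md`: the annotated example above uses `//` comments, which a plain JSON parser rejects. The sketch below is one possible way a tool could read such a file. It strips line comments outside string literals, parses the result, and checks the two fields the example marks as required (`actorSpecification` and `version`). The helper names `strip_line_comments` and `load_actor_definition` are hypothetical and not part of any Apify SDK or CLI, and the check that `minMemoryMbytes` does not exceed `maxMemoryMbytes` is an assumption rather than something the specification states.

```python
import json

# Fields marked "Required" in the annotated example above.
REQUIRED_FIELDS = ("actorSpecification", "version")


def strip_line_comments(text: str) -> str:
    """Remove // comments that sit outside of JSON string literals."""
    out = []
    in_string = False
    escaped = False
    i = 0
    while i < len(text):
        ch = text[i]
        if in_string:
            out.append(ch)
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
            out.append(ch)
        elif text.startswith("//", i):
            # Skip to the end of the line; the newline itself is kept.
            while i < len(text) and text[i] != "\n":
                i += 1
            continue
        else:
            out.append(ch)
        i += 1
    return "".join(out)


def load_actor_definition(text: str) -> dict:
    """Parse an annotated actor.json text and run a few basic checks."""
    definition = json.loads(strip_line_comments(text))
    if not isinstance(definition, dict):
        raise ValueError("actor.json must contain a JSON object")
    missing = [name for name in REQUIRED_FIELDS if name not in definition]
    if missing:
        raise ValueError(f"missing required fields: {', '.join(missing)}")
    # Assumption: when both limits are present, min must not exceed max.
    low = definition.get("minMemoryMbytes")
    high = definition.get("maxMemoryMbytes")
    if low is not None and high is not None and low > high:
        raise ValueError("minMemoryMbytes must not exceed maxMemoryMbytes")
    return definition


EXAMPLE = """
{
    // Required, indicates that this is an Actor definition file
    "actorSpecification": 1,
    "name": "google-search-scraper",
    "version": "0.0", // Required
    "readme": "./README.md",
    "minMemoryMbytes": 128,
    "maxMemoryMbytes": 4096
}
"""

definition = load_actor_definition(EXAMPLE)
print(definition["name"], definition["version"])
```

The comment stripper tracks string state so that values containing `//`, such as URLs, survive intact. It does not handle `/* */` comments or trailing commas, which a full JSONC parser would.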
48 changes: 33 additions & 15 deletions pages/DATASET_SCHEMA.md
@@ -1,6 +1,10 @@
 # Dataset schema file specification

-Dataset storage enables you to sequentially save and retrieve data. Each Actor run is assigned its own dataset, which is created when the first item is stored to it. Datasets usually contain results from web scraping, crawling or data processing jobs. The data can be visualized as a table where each object is a row and its attributes are the columns. The data can be exported in JSON, CSV, XML, RSS, Excel or HTML formats.
+Dataset storage enables you to sequentially store and retrieve data records, in various formats.
+Each Actor run is assigned its own dataset, which is created when the first item is stored to it.
+Datasets usually contain results from web scraping, crawling or data processing jobs.
+The data can be visualized as a table where each object is a row and its attributes are the columns.
+The data can be exported in JSON, CSV, XML, RSS, Excel or HTML formats.

 Dataset can be assigned a schema which describes:

@@ -41,22 +45,35 @@ Uncaught Error: Dataset schema is not compatible with the provided schema
 "title": "Eshop products",
 "description": "Dataset containing the whole product catalog including prices and stock availability.",
-// Not supported yet
+// Define a JSON schema for the dataset fields, including their type, description, etc.
 "fields": {
-"title": "string",
-"imageUrl": "string",
-"priceUsd": "number",
-"manufacturer": {
-"title": "string",
-"url": "number",
+"type": "object",
+"properties": {
+"title": {
+"type": "string",
+"description": "The name of the results",
+},
+"imageUrl": {
+"type": "string",
+"description": "Function executed for each request",
+},
+"priceUsd": {
+"type": "integer",
+"description": "Price of the item",
+},
+"manufacturer": {
+"type": "object",
+"properties": {
+"title": { ... },
+"url": { ... },
+}
+},
+...
 },
-"productVariants": [{
-"color": "?string"
-}],
-...
+"required": ["title"],
 },
+// Define the ways how to present the Dataset to users
 "views": {
 "overview": {
 "title": "Products overview",
@@ -116,7 +133,8 @@ Uncaught Error: Dataset schema is not compatible with the provided schema

 ### JSON schema

-Items of a dataset can be described by a JSON schema definition. Apify platform then ensures that each object accomplies with the provided schema. In the first version only the standard JSON schema will be supported, i.e.:
+Items of a dataset can be described by a JSON schema definition. Apify platform then ensures that each object complies with the provided schema.
+In the first version only the standard JSON schema will be supported, i.e.:


 ```jsonc
@@ -155,7 +173,7 @@ Items of a dataset can be described by a JSON schema definition. Apify platform
 }
 ```

-with simplifed version comming in the near future:
+And potentially with simplified version coming in the future:

 ```jsonc
 {
Expand Down
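Reviewer note on `pages/DATASET_SCHEMA.md`: the text says the platform ensures that each dataset item complies with the JSON schema under `fields`. As a rough illustration only, the sketch below checks an item against the small subset of JSON Schema used in the example (`type`, `properties`, `required`). The `validate` function is a hypothetical, hand-rolled helper and not how the Apify platform implements the check; a real implementation would more likely rely on a complete JSON Schema validator.

```python
# Type predicates for the JSON Schema "type" keyword (subset only).
# bool is excluded from the numeric types because it subclasses int in Python.
TYPE_CHECKS = {
    "string": lambda v: isinstance(v, str),
    "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "boolean": lambda v: isinstance(v, bool),
    "object": lambda v: isinstance(v, dict),
    "array": lambda v: isinstance(v, list),
}


def validate(item, schema, path="item"):
    """Return a list of error strings; an empty list means the item complies."""
    expected = schema.get("type")
    if expected in TYPE_CHECKS and not TYPE_CHECKS[expected](item):
        return [f"{path}: expected {expected}"]
    errors = []
    if isinstance(item, dict):
        # Report missing required fields first, then recurse into present ones.
        for name in schema.get("required", []):
            if name not in item:
                errors.append(f"{path}.{name}: required field is missing")
        for name, subschema in schema.get("properties", {}).items():
            if name in item:
                errors.extend(validate(item[name], subschema, f"{path}.{name}"))
    return errors


# Schema modeled on the "fields" object from the diff above.
FIELDS_SCHEMA = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "imageUrl": {"type": "string"},
        "priceUsd": {"type": "integer"},
    },
    "required": ["title"],
}

valid_errors = validate({"title": "Mug", "priceUsd": 12}, FIELDS_SCHEMA)
invalid_errors = validate({"priceUsd": "12"}, FIELDS_SCHEMA)
print(valid_errors)
print(invalid_errors)
```

Returning a list of errors rather than raising on the first problem lets a caller report every non-compliant field of an item at once.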
4 changes: 2 additions & 2 deletions pages/INPUT_SCHEMA.md
@@ -1,7 +1,7 @@
 # Actor input schema file specification

-NOTE: Currently the Apify platform only supports [input schema v1](https://docs.apify.com/Actors/development/input-schema),
-this document describes how the v2 should look like, but it's not implemented yet.
+NOTE: Currently the Apify platform only supports [input schema version 1](https://docs.apify.com/Actors/development/input-schema),
+this document describes how the version 2 should look like, but it's not implemented yet.

 ## Work in progress

1 change: 0 additions & 1 deletion pages/KEY_VALUE_STORE_SCHEMA.md
@@ -84,7 +84,6 @@ https://api.apify.com/v2/key-value-stores/storeId/keys?prefix=post-images-

 ## TODO(@jancurn)
 - Finalize this text, keep `collections` for now
-- xx
 - What is kv-store schema is used by Actor to define structure of key-value store it operates on,
 but the developer defines a non-compatible record group for "INPUT" prefix?
 Maybe the default kv-stores should be created with a default record group to cover the "INPUT" prefixes