EdgeLake

Transform your edge nodes into members of a permissioned decentralized network, optimized to manage and monitor data and resources at the edge.

  • Deploy EdgeLake instances on your nodes at the edge.
  • Enable services on each node.
  • Stream data from your PLCs, sensors, and applications to the edge nodes.
  • Query the distributed data from a single point (as if the data is hosted in a centralized database).
  • Manage your edge resources from a single point (the network of nodes reflects a Single System Image).

How it Works

  • By deploying EdgeLake on a node, the node joins a decentralized, P2P network of nodes.
  • Using a network protocol and a shared metadata layer, the nodes operate as a single machine that is optimized to capture, host, manage and query data at the edge.
  • The nodes share a metadata layer. The metadata includes policies that describe the data schemas, the data distribution, the participating nodes, security, data ownership and more. The shared metadata is hosted in one of the following:
    • A member node designated as a Master Node.
    • A blockchain (making the network fully decentralized).
  • Each node in the network is configured to provide data services. Examples of services:
    • Capture data via REST, MQTT, gRPC and JSON files.
    • Host data in a local database (such as SQLite, PostgreSQL or MongoDB).
    • Satisfy queries.

When an application issues a query, it is delivered to one of the nodes in the network. This node orchestrates the query as follows: using the shared metadata, it determines which target nodes host the relevant data. The query is transferred to those target nodes, and their replies are aggregated dynamically and returned to the application as a unified reply. This process is similar to MapReduce, except that the target nodes are determined dynamically from the query and the shared metadata. Monitoring of resources operates in a similar way.
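
The orchestration described above can be illustrated with a small, self-contained Python sketch. It is not EdgeLake code: the `METADATA` and `NODE_DATA` dictionaries, the node names, and the functions `run_on_node` and `query_average` are all hypothetical stand-ins, and threads take the place of real network calls. The sketch only shows the pattern: look up the target nodes in the shared metadata, send the query to each, and merge the partial replies.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical shared metadata: which nodes host which table.
METADATA = {
    "temperature": ["operator-1", "operator-2"],
    "pressure": ["operator-3"],
}

# Hypothetical per-node storage standing in for each node's local database.
NODE_DATA = {
    "operator-1": {"temperature": [20.5, 21.0]},
    "operator-2": {"temperature": [19.0]},
    "operator-3": {"pressure": [101.3, 100.9]},
}


def run_on_node(node, table):
    """Target node step: compute a partial (sum, count) over its local rows."""
    rows = NODE_DATA[node].get(table, [])
    return sum(rows), len(rows)


def query_average(table):
    """Orchestrator step: pick targets from the metadata, fan out, merge partials."""
    targets = METADATA.get(table, [])
    if not targets:
        return None  # no node hosts this table
    with ThreadPoolExecutor(max_workers=len(targets)) as pool:
        partials = list(pool.map(lambda node: run_on_node(node, table), targets))
    total = sum(partial[0] for partial in partials)
    count = sum(partial[1] for partial in partials)
    return total / count if count else None
```

Calling `query_average("temperature")` contacts only the two nodes listed for that table, while a table that no node hosts yields `None`. Each node returns a partial sum and count rather than raw rows, so the orchestrator merges small replies instead of moving all the data.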

Deploying an EdgeLake node and making the node a member of a network is done as follows:

  • Download and install the EdgeLake software on the Edge Node.
  • Enable the services that determine the functionalities provided by the node.

Services are enabled by one or a combination of the following:

  • Issuing configuration commands using the Node's Command Line Interface (CLI).
  • Listing configuration commands in script files and associating the node with the files.
  • Listing configuration commands in policies that are hosted in the shared metadata and associating the node with the policies.

The configured services determine the role of a node, which can be one or more of the following:

  • Operator Node - a node that captures data and hosts the data on a local DBMS. Data sources like devices, PLCs and applications deliver data to Operator Nodes for storage.
  • Query Node - a node that orchestrates a query process. Applications deliver their queries to Query Nodes, which interact with the Operator Nodes (that host the data) to return a unified and complete reply for each query.
  • Master Node - a node that replaces a blockchain platform for storage of metadata policies. The network metadata is organized in policies, and users can store it either on a blockchain or on a Master Node.

In a deployed network, devices, sensors, PLCs and applications send their data to Operator Nodes. Data management on each Operator Node is automated.
Queries are satisfied by Query Nodes as if all the distributed data is managed in a centralized database.
The same setup monitors edge resources. For example, users and applications can monitor the CPU, network and disk space of the distributed edge resources from a single point.

Download and Install

Detailed directions for installing EdgeLake can be found in the docker-compose repository.

Prepare Node(s):

  • Install requirements
    • Docker
    • docker-compose
    • make
    
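sudo snap install docker
sudo apt-get -y install docker-compose
sudo apt-get -y install make

# Grant the current non-root user permission to use Docker
USER=$(whoami)
# -f: do not fail if the docker group already exists
sudo groupadd -f docker
sudo usermod -aG docker "${USER}"
# Start a shell with the new group membership (or log out and back in)
newgrp docker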
    
  • Clone docker-compose repository from EdgeLake
    
git clone https://github.com/EdgeLake/docker-compose
cd docker-compose
    

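Before continuing, it can help to confirm that the tools installed above are available on the PATH. The following is a minimal Python sketch and not part of EdgeLake; the helper name `missing_tools` is hypothetical, and it relies only on the standard library's `shutil.which`.

```python
import shutil

# Tools from the requirements list; git is included because the clone step uses it.
REQUIRED_TOOLS = ["docker", "docker-compose", "make", "git"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools from the given list that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing tools:", ", ".join(missing))
    else:
        print("All required tools were found.")
```

An empty result only means the executables were found. It does not verify that the Docker daemon is running or that the current user can access it.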
Deploy EdgeLake:

    
#--- General ---
# Information regarding which EdgeLake node configurations to enable. By default, even if everything is disabled, EdgeLake starts TCP and REST connection services.
NODE_TYPE=master
# Name of the EdgeLake instance
NODE_NAME=anylog-master
# Owner of the EdgeLake instance
COMPANY_NAME=New Company

#--- Networking ---
# Port address used by EdgeLake's TCP protocol to communicate with other nodes in the network
ANYLOG_SERVER_PORT=32048
# Port address used by EdgeLake's REST protocol
ANYLOG_REST_PORT=32049
# Whether the TCP service binds to a specific IP and port (false binds to all IPs)
TCP_BIND=false
# Whether the REST service binds to a specific IP and port (false binds to all IPs)
REST_BIND=false

#--- Blockchain ---
# TCP connection information for Master Node
LEDGER_CONN=127.0.0.1:32048

#--- Advanced Settings ---
# Whether to automatically run a local (or personalized) script at the end of the process
DEPLOY_LOCAL_SCRIPT=false
    
  1. Start the node using the Makefile
    
make up [NODE_TYPE]

# examples
make up master
make up operator
make up query
    

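The configuration values shown above can be sanity-checked before a node is started. The Python sketch below is an illustration only and not an EdgeLake tool: the functions `parse_env` and `check_settings` are hypothetical, and the rules they apply (three node types taken from the `make up` examples, distinct TCP and REST ports, an `IP:PORT` form for `LEDGER_CONN`) are assumptions drawn from the sample settings, not a complete specification.

```python
def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def check_settings(settings):
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    # Assumption: only the node types shown in the make examples are checked.
    if settings.get("NODE_TYPE") not in {"master", "operator", "query"}:
        problems.append("NODE_TYPE should be master, operator or query")
    ports = []
    for key in ("ANYLOG_SERVER_PORT", "ANYLOG_REST_PORT"):
        value = settings.get(key, "")
        if value.isdecimal() and 1 <= int(value) <= 65535:
            ports.append(int(value))
        else:
            problems.append(f"{key} is not a valid port: {value!r}")
    if len(ports) == 2 and ports[0] == ports[1]:
        problems.append("TCP and REST ports must differ")
    # Assumption: LEDGER_CONN has the IP:PORT form used in the sample.
    host, sep, port = settings.get("LEDGER_CONN", "").rpartition(":")
    if not (sep and host and port.isdecimal()):
        problems.append("LEDGER_CONN should look like IP:PORT")
    return problems
```

A typical use is to read the configuration file into a string, pass it to `parse_env`, and print whatever `check_settings` returns before running `make up`.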
Prerequisites and Setup Considerations

| Feature | Requirement |
| --- | --- |
| Operating System | Linux (Ubuntu, RedHat, Alpine, SUSE), Windows, OSX |
| Memory footprint | 100 MB available for EdgeLake deployed without Docker; 300 MB available for EdgeLake deployed with Docker |
| Databases | PostgreSQL installed (optional); SQLite (default, no need to install); MongoDB installed (only if blob storage is needed) |
| CPU | Intel, ARM and AMD are supported. EdgeLake can be deployed on anything from a single-CPU machine to the largest multi-core servers (including gateways and Raspberry Pi). |
| Storage | EdgeLake supports horizontal scaling: nodes (and storage) are added dynamically as needed, which simplifies scaling considerations. Requirements are based on the expected volume and duration of data on each node. EdgeLake supports automated archival and transfer to larger nodes (if needed). |
| Network | Required: a TCP-based network (local TCP-based networks, the internet, and combinations are supported). An overlay network is recommended; most overlay networks can be used transparently, and Nebula is used as the default overlay network. A static IP and 3 ports open and accessible on each node (either via an overlay network or without one). |
| Cloud Integration | Built-in integration using REST, Pub-Sub, and Kafka. |
| Deployment options | Executable (can be deployed as a background process), Docker, or Kubernetes. |

Comments:

  • Databases:

    • SQLite is recommended for smaller nodes and in-memory data.
    • PostgreSQL is recommended for larger nodes.
    • MongoDB is used for blob storage.
    • Multiple databases can be deployed and used on the same node.
  • Network: An Overlay network is recommended for the following reasons:

    • Isolate the network for security considerations.
    • Manage IP and port availability. Without an overlay network, users need to configure and manage the availability of the IPs and ports used.
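
When ports are managed manually, a quick reachability test from another machine can show whether a node's ports are actually open. The following Python sketch is only an illustration; the function name `port_is_reachable` is hypothetical, and it uses the standard library's `socket.create_connection` to attempt a plain TCP connection. It does not speak any EdgeLake protocol.

```python
import socket


def port_is_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

For example, `port_is_reachable("127.0.0.1", 32048)` would test the TCP port from the sample configuration on the local machine. A successful connection shows only that something is listening on that port, not that it is an EdgeLake service.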