Data Service for ADAS and ADS Development

Overview

This is an example of a data service typically used in advanced driver assistance systems (ADAS) and automated driving systems (ADS) development. The data service is composed from the following modular AWS services: Amazon Elastic Kubernetes Service (EKS), Amazon Managed Streaming for Apache Kafka (MSK), Amazon Redshift, Amazon FSx for Lustre, Amazon Elastic File System (EFS), AWS Batch, AWS Step Functions, AWS Secrets Manager, AWS Glue, AWS Fargate, Amazon Elastic Container Registry (ECR), Amazon Virtual Private Cloud (VPC), Amazon EC2, and Amazon S3.

The typical use case addressed by this data service is to serve sensor data from a specified drive scene of interest, either in batch mode as a single rosbag file, or in streaming mode as an ordered series of rosbag files, where each rosbag file contains the drive scene data for a single time step, which by default is 1 second. Each rosbag file is dynamically composed from the drive scene data stored in Amazon S3, using the metadata stored in Amazon Redshift.

Key concepts

The data service runs in Kubernetes Pods in an Amazon EKS cluster configured to use the Horizontal Pod Autoscaler and the EKS Cluster Autoscaler.

An Amazon Managed Streaming for Apache Kafka (MSK) cluster provides the communication channel between the data service and the client. The data service implements a request-response paradigm over Apache Kafka topics. However, the response data is not sent back over the Kafka topics. Instead, the data service stages the response data in Amazon S3, Amazon FSx, or Amazon EFS, as specified in the data client request, and sends the location of the response data back to the data client over the Kafka topics. The data client reads the response data directly from its staged location.
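
To make the request-response paradigm concrete, below is a minimal sketch of a data client in Python, using the kafka-python package. The response topic name and the response JSON fields are illustrative assumptions, not the service's exact wire format:

	import json
	from kafka import KafkaConsumer, KafkaProducer

	servers = "b-1.msk-cluster-1:9092,b-2.msk-cluster-1:9092"

	# Send the data request on the request topic the service listens on.
	producer = KafkaProducer(bootstrap_servers=servers.split(","))
	request = {"kafka_topic": "a2d2", "vehicle_id": "a2d2"}  # remaining fields elided
	producer.send("a2d2", json.dumps(request).encode("utf-8"))
	producer.flush()

	# Response messages carry only the *location* of the staged data;
	# the client reads the rosbag files directly from S3, FSx, or EFS.
	consumer = KafkaConsumer("a2d2-response", bootstrap_servers=servers.split(","))
	for message in consumer:
	    response = json.loads(message.value.decode("utf-8"))
	    print("Response data staged at:", response.get("bag_path"))  # assumed field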

Data client request

Concretely, imagine the data client wants to request drive scene data in rosbag file format from the A2D2 autonomous driving dataset for vehicle id a2d2, drive scene id 20190401145936, starting at timestamp 1554121593909500 (microseconds), and stopping at timestamp 1554122334971448 (microseconds). The data client wants the response to include data only from the front-left camera in the sensor_msgs/Image ROS data type, and the front-left lidar in the sensor_msgs/PointCloud2 ROS data type. The data client wants the response data to be streamed back as a series of rosbag files, each file spanning 1000000 microseconds (1 second) of the drive scene. Finally, the data client wants the response rosbag files to be stored on a shared Amazon FSx file system.

The data client can encode such a data request using the JSON object shown below, and send it to the Kafka bootstrap servers b-1.msk-cluster-1:9092,b-2.msk-cluster-1:9092 on the Apache Kafka topic named a2d2:

{
	"servers": "b-1.msk-cluster-1:9092,b-2.msk-cluster-1:9092",
	"requests": [{
		"kafka_topic": "a2d2", 
		"vehicle_id": "a2d2",
		"scene_id": "20190401145936",
		"sensor_id": ["lidar/front_left", "camera/front_left"],
		"start_ts": 1554121593909500, 
		"stop_ts": 1554122334971448,
		"ros_topic": {"lidar/front_left": "/a2d2/lidar/front_left", 
				"camera/front_left": "/a2d2/camera/front_left"},
		"data_type": {"lidar/front_left": "sensor_msgs/PointCloud2",
				"camera/front_left": "sensor_msgs/Image"},
		"step": 1000000,
		"accept": "fsx/multipart/rosbag",
		"preview": false
	}]
}

For a detailed description of each request field shown in the example above, see data request fields below.
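
As a quick sanity check on the example above, the number of rosbag chunks streamed back follows directly from the requested timespan and the step, assuming the service rounds the final partial step up:

	import math

	start_ts = 1554121593909500  # microseconds
	stop_ts = 1554122334971448   # microseconds
	step = 1000000               # one rosbag file per second of drive scene

	num_chunks = math.ceil((stop_ts - start_ts) / step)
	print(num_chunks)  # 742 rosbag files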

Tutorial: step-by-step guide

Overview

In this tutorial, we use the A2D2 autonomous driving dataset. The high-level outline of this tutorial is as follows:

  • Prerequisites
  • Configure the data service
  • Prepare the A2D2 data
  • Run the data service
  • Run the data service client

Prerequisites

This tutorial assumes you have an AWS Account, and you have Administrator job function access to the AWS Management Console.

To get started:

  • Select your AWS Region. The AWS Regions supported by this project include us-east-1, us-east-2, us-west-2, eu-west-1, eu-central-1, ap-southeast-1, ap-southeast-2, ap-northeast-1, ap-northeast-2, and ap-south-1. The A2D2 dataset used in this tutorial is stored in eu-central-1.
  • Subscribe to Ubuntu Pro 18.04 LTS and Ubuntu Pro 20.04 LTS.
  • If you do not already have an Amazon EC2 key pair, create a new Amazon EC2 key pair. You will need the key pair name to specify the KeyPairName parameter when creating the AWS CloudFormation stack below.
  • You will need an Amazon S3 bucket. If you don't have one, create a new Amazon S3 bucket in the selected AWS region. You will use the S3 bucket name to specify the S3Bucket parameter in the stack. The bucket is used to store the A2D2 data.
  • Use the public internet address of your laptop as the base value for the CIDR you specify in the DesktopRemoteAccessCIDR parameter of the CloudFormation stack you will create below (see the helper sketch after this list).
  • For all passwords used in this tutorial, we recommend strong passwords, following the best practices recommended for the AWS root account user password.
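
If you are unsure of your public internet address, the short Python sketch below looks it up via checkip.amazonaws.com and prints a single-address CIDR; the /32 suffix is appropriate when only your laptop needs access:

	import urllib.request

	# checkip.amazonaws.com returns the caller's public IP as plain text.
	ip = urllib.request.urlopen("https://checkip.amazonaws.com").read().decode().strip()
	print(f"{ip}/32")  # e.g. 1.2.3.4/32, for the DesktopRemoteAccessCIDR parameter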

Configure the data service

Create AWS CloudFormation Stack

The AWS CloudFormation template cfn/mozart.yml in this repository creates AWS Identity and Access Management (IAM) resources, so when you create the CloudFormation Stack using the console, in the Review step, you must check I acknowledge that AWS CloudFormation might create IAM resources.

Create a new AWS CloudFormation stack using the cfn/mozart.yml template. The stack input parameters you must specify are described below:

KeyPairName: This is a required parameter whereby you select the Amazon EC2 key pair name used for SSH access to the desktop. You must have access to the selected key pair's private key to connect to your desktop.
RedshiftMasterUserPassword: This is a required parameter whereby you specify the Redshift database master user password.
DesktopRemoteAccessCIDR: This is a required parameter whereby you specify the public IP CIDR range from where you need remote access to your graphics desktop, e.g. 1.2.3.4/32, or 7.8.0.0/16.
DesktopInstanceType: This is a required parameter whereby you select an Amazon EC2 instance type for the ROS desktop. The default value, g3s.xlarge, may not be available in your selected region; in that case, we recommend you try g4dn.xlarge, or one of the other instance types.
S3Bucket: This is a required parameter whereby you specify the name of the Amazon S3 bucket used to store your data. The S3 bucket must already exist.

For all other stack input parameters, default values are recommended during the first walkthrough. See the complete list of all the template input parameters below.
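
If you prefer to create the stack programmatically rather than through the console, a hedged boto3 sketch follows; the stack name and all parameter values are placeholders that you must replace with your own:

	import boto3

	cfn = boto3.client("cloudformation")  # uses your selected AWS Region

	# For templates larger than 51,200 bytes, upload the template to S3
	# and pass TemplateURL instead of TemplateBody.
	with open("cfn/mozart.yml") as f:
	    template_body = f.read()

	cfn.create_stack(
	    StackName="mozart",  # placeholder stack name
	    TemplateBody=template_body,
	    Capabilities=["CAPABILITY_IAM", "CAPABILITY_NAMED_IAM"],
	    Parameters=[
	        {"ParameterKey": "KeyPairName", "ParameterValue": "my-key-pair"},
	        {"ParameterKey": "RedshiftMasterUserPassword", "ParameterValue": "replace-me"},
	        {"ParameterKey": "DesktopRemoteAccessCIDR", "ParameterValue": "1.2.3.4/32"},
	        {"ParameterKey": "S3Bucket", "ParameterValue": "my-data-bucket"},
	    ],
	)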

Key AWS resources

The key resources in the CloudFormation stack are listed below:

  • A ROS desktop EC2 instance (default type g3s.xlarge)
  • An Amazon EKS cluster with 2 managed node group nodes (default type r5n.8xlarge)
  • An Amazon MSK cluster with 3 broker nodes (default type kafka.m5.2xlarge)
  • An Amazon Redshift cluster with 3 nodes (default type ra3.4xlarge)
  • An Amazon FSx for Lustre file-system (default size 7,200 GiB)
  • An Amazon EFS file-system

Connect to the graphics desktop using SSH

  • Once the stack status in CloudFormation console is CREATE_COMPLETE, find the desktop instance launched in your stack in the Amazon EC2 console, and connect to the instance using SSH as user ubuntu, using your SSH key pair.
  • When you connect to the desktop using SSH and see the message "Cloud init in progress. Machine will REBOOT after cloud init is complete!!", disconnect, and try again after about 20 minutes. The desktop installs the NICE DCV server on first-time startup, and reboots after the install is complete.
  • If you see the message NICE DCV server is enabled!, run the command sudo passwd ubuntu to set a new password for user ubuntu. Now you are ready to connect to the desktop using the NICE DCV client.

Connect to the graphics desktop using NICE DCV Client

  • Download and install the NICE DCV client on your laptop.
  • Use the NICE DCV client to log in to the desktop as user ubuntu.
  • When you first log in to the desktop using the NICE DCV client, you will be asked if you would like to upgrade the OS version. Do not upgrade the OS version.

Now you are ready to proceed with the following steps. For all the commands in this tutorial, we assume the working directory to be ~/amazon-eks-autonomous-driving-data-service on the graphics desktop.

Configure EKS cluster access

In this step, you need AWS credentials for programmatic access for the IAM user, or role, you used to create the AWS CloudFormation stack above. You must not use the AWS credentials for a different IAM user, or role. The AWS credentials are used once to enable EKS cluster access from the ROS desktop, and are automatically removed at the end of this step.

If you used an IAM role to create the CloudFormation stack above, you must manually configure the credentials associated with the IAM role in the ~/.aws/credentials file with the following fields:

	aws_access_key_id=
	aws_secret_access_key=
	aws_session_token=

If you used an IAM user to create the stack, you do not have to manually configure the credentials in ~/.aws/credentials file.

In the working directory, run the command:

	./scripts/configure-eks-auth.sh

On successful execution of this command, you should see the message AWS Credentials Removed.

Setup EKS cluster environment

To set up the EKS cluster environment, in the working directory, run the command:

	./scripts/setup-dev.sh

This step also builds and pushes the data service container image into Amazon ECR.

Prepare the A2D2 data

Before we can run the A2D2 data service, we need to extract the raw A2D2 data into your S3 bucket, extract the metadata from the raw data, and upload the metadata into the Redshift cluster. We execute these three steps using an AWS Step Functions state machine. To create and execute the AWS Step Functions state machine, execute the following command in the working directory:

	./scripts/a2d2-etl-steps.sh

Note the executionArn of the state machine execution in the output of the previous command. To check the status of the execution, use the following command, replacing executionArn below with your value:

	aws stepfunctions describe-execution --execution-arn executionArn

You can also see the status of the Step Functions state machine execution in the Step Functions console.

The state machine execution time depends on many variable factors, and may take anywhere from 12 to 24 hours, or possibly longer. You can see the status and the CloudWatch logs for each AWS Batch job spawned by the Step Functions state machine execution in the AWS Batch console. If a Batch job fails, it is automatically retried.
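
Because the execution can run for many hours, you may prefer to poll the status programmatically. Below is a minimal boto3 sketch; replace the execution_arn value with your executionArn:

	import time
	import boto3

	sfn = boto3.client("stepfunctions")
	execution_arn = "executionArn"  # replace with your executionArn value

	while True:
	    status = sfn.describe_execution(executionArn=execution_arn)["status"]
	    print(status)  # RUNNING, SUCCEEDED, FAILED, TIMED_OUT, or ABORTED
	    if status != "RUNNING":
	        break
	    time.sleep(300)  # poll every 5 minutes; the ETL can take 12 to 24 hours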

Run the data service

For best performance, preload A2D2 data from S3 to FSx. For a quick preview of the data service, you may proceed with the step below.

The data service is deployed using a Helm chart, and runs as a Kubernetes deployment in EKS. To start the data service, execute the following command in the working directory:

	helm install --debug a2d2-data-service ./a2d2/charts/a2d2-data-service/

To verify that the a2d2-data-service deployment is running, execute the command:

	kubectl get pods -n a2d2

Raw input data source and response data staging

The data service can be configured to read raw input data from S3, FSx (see Preload A2D2 data from S3 to FSx), or EFS (see Preload A2D2 data from S3 to EFS). Similarly, the data client can use the accept field in the request to specify whether the response data is staged on S3, FSx, or EFS. The raw data input source is fixed when the data service is deployed. However, the response data staging option is specified in each data client request, and is therefore dynamic.

Below is the Helm chart configuration in values.yaml for the various raw input data source options, with the recommended Kubernetes resource requests for pod memory and CPU:

fsx (default):
	a2d2.requests.memory: "72Gi"
	a2d2.requests.cpu: "8000m"
	configMap.data_store.input: "fsx"

efs:
	a2d2.requests.memory: "32Gi"
	a2d2.requests.cpu: "1000m"
	configMap.data_store.input: "efs"

s3:
	a2d2.requests.memory: "8Gi"
	a2d2.requests.cpu: "1000m"
	configMap.data_store.input: "s3"

For the matching response data staging options in the data client request, see the requests.accept field in data request fields. The recommended response data staging option for the fsx raw data source is "accept": "fsx/multipart/rosbag", for the efs raw data source is "accept": "efs/multipart/rosbag", and for the s3 raw data source is "accept": "s3/multipart/rosbag".
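
The small Python sketch below simply encodes this recommendation as a lookup table, which a client script could use to keep its accept value consistent with the deployed raw input data source:

	# Recommended response data staging option for each raw input data source.
	RECOMMENDED_ACCEPT = {
	    "fsx": "fsx/multipart/rosbag",
	    "efs": "efs/multipart/rosbag",
	    "s3": "s3/multipart/rosbag",
	}

	def accept_for(input_source: str) -> str:
	    """Return the recommended accept value for a raw input data source."""
	    return RECOMMENDED_ACCEPT[input_source]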

Run the data service client

To visualize the response data requested by the A2D2 data client, we will use the rviz tool on the graphics desktop. Open a terminal on the desktop, and run rviz. In the rviz tool, use File>Open Config to select /home/ubuntu/amazon-eks-autonomous-driving-data-service/a2d2/config/a2d2.rviz as the rviz config. You should now see the rviz tool configured with two areas, one for visualizing image data, and the other for visualizing point cloud data.

To run the data client, execute the following command in the working directory:

	python ./a2d2/src/data_client.py \
		--config ./a2d2/config/c-config-ex1.json 1>/tmp/a.out 2>&1 & 

After a brief delay, you should be able to preview the response data in the rviz tool. To preview data from a different drive scene, execute:

	python ./a2d2/src/data_client.py \
		--config ./a2d2/config/c-config-ex2.json 1>/tmp/a.out 2>&1 & 

You can set "preview": false in the data client config file, and run the above command to view the complete response, but before you do that, we recommend that for best performance, preload A2D2 data from S3 to FSx.

It is important that you do not run multiple data client instances concurrently on the ROS desktop, because the response data is played back on ROS topics, and there is only one ROS server running on the ROS desktop. Wait for a data client instance to exit before you start another instance. Aborting the data client manually does not stop the data service pod from producing the response.

Hard reset of the data service

This step is for reference purposes. If at any time you need to do a hard reset of the data service, you can do so by executing:

	helm delete a2d2-data-service

This will delete all data service EKS pods immediately. All in-flight service responses will be aborted. Because the connection between the data client and the data service is asynchronous, the data clients may wait indefinitely, and you may need to clean up the data client processes manually on the ROS desktop using operating system tools. Note that each data client instance spawns multiple Python processes.

Deleting the AWS CloudFormation stack

When you no longer need the data service, you may delete the AWS CloudFormation stack from the AWS CloudFormation console. Deleting the stack will terminate the desktop instance, and delete the EFS and FSx for Lustre file-systems created in the stack. The Amazon S3 bucket is not deleted.

Reference

Data client request fields

Below, we explain the semantics of the various fields in the data client request JSON object.

servers: The servers field identifies the Amazon MSK cluster bootstrap servers endpoint.
delay: The delay specifies the time in seconds that the data client waits before sending the request. Default value is 0.
use_time: (Optional) The use_time specifies whether to use the received time, or the header time, when playing back the received messages. Default value is received.
requests: The JSON document sent by the client to the data service must include an array of one or more data requests for drive scene data.
requests.kafka_topic: The kafka_topic specifies the Kafka topic on which the data request is sent from the client to the data service. The data service listens on this topic.
requests.vehicle_id: The vehicle_id is used to identify the relevant drive scene dataset.
requests.scene_id: The scene_id identifies the drive scene of interest, which in this example is 20190401145936, a string representing the date and time of the drive scene; in general, it could be any unique value.
requests.start_ts: The start_ts (microseconds) specifies the start timestamp for the drive scene data request.
requests.stop_ts: The stop_ts (microseconds) specifies the stop timestamp for the drive scene data request.
requests.ros_topic: The ros_topic is a dictionary mapping sensor ids in the vehicle to ROS topics.
requests.data_type: The data_type is a dictionary mapping sensor ids to ROS data types.
requests.step: The step is the discrete time interval (microseconds) used to discretize the timespan between start_ts and stop_ts. If the requests.accept value contains multipart, the data service responds with a rosbag file for each discrete step (see possible values below).
requests.accept: The accept specifies the response data staging format acceptable to the client. See possible values below.
requests.image: (Optional) The value undistorted undistorts the camera image. Undistorting an image slows down the image frame rate. Default value is the original, distorted image.
requests.lidar_view: (Optional) The value vehicle transforms lidar points to the vehicle frame of reference view. Default value is camera.
requests.preview: If the preview field is set to true, the data service returns requested data over a single time step starting from start_ts, and ignores the stop_ts.

Possible requests.accept field values

Below, we describe the possible values for the requests.accept field:

fsx/multipart/rosbag: Stage response data on the shared Amazon FSx file-system as discretized rosbag chunks.
efs/multipart/rosbag: Stage response data on the shared Amazon EFS file-system as discretized rosbag chunks.
s3/multipart/rosbag: Stage response data on Amazon S3 as discretized rosbag chunks.
fsx/singlepart/rosbag: Stage response data on the shared Amazon FSx file-system as a single rosbag.
efs/singlepart/rosbag: Stage response data on the shared Amazon EFS file-system as a single rosbag.
s3/singlepart/rosbag: Stage response data on Amazon S3 as a single rosbag.
manifest: Return a manifest of metadata containing S3 paths to the raw data. The manifest is returned directly over the Kafka response topic, and is not staged.

AWS CloudFormation template input parameters

This repository provides an AWS CloudFormation template that is used to create the required stack.

Below, we describe the AWS CloudFormation template input parameters. Desktop below refers to the NICE DCV enabled high-performance graphics desktop that acts as the data service client in this tutorial.

DesktopInstanceType: This is a required parameter whereby you select an Amazon EC2 instance type for the desktop running in the AWS cloud. Default value is g3s.xlarge.
DesktopEbsVolumeSize: This is a required parameter whereby you specify the size of the root EBS volume (default size is 200 GB) on the desktop. Typically, the default size is sufficient.
DesktopEbsVolumeType: This is a required parameter whereby you select the EBS volume type. Default is gp3.
DesktopHasPublicIpAddress: This is a required parameter whereby you select whether a public IP address is associated with the desktop. Default value is true.
DesktopRemoteAccessCIDR: This parameter specifies the public IP CIDR range from where you need remote access to your client desktop, e.g. 1.2.3.4/32, or 7.8.0.0/16.
EKSEncryptSecrets: This is a required parameter whereby you select whether encryption of EKS secrets is enabled. Default value is Enabled.
EKSEncryptSecretsKmsKeyArn: This is an optional advanced parameter whereby you specify the AWS KMS key ARN used to encrypt EKS secrets. Leave blank to create a new KMS key.
EKSNodeGroupInstanceType: This is a required parameter whereby you select the EKS node group EC2 instance type. Default value is r5n.8xlarge.
EKSNodeVolumeSizeGiB: This is a required parameter whereby you specify the EKS node group instance EBS volume size. Default value is 200 GiB.
EKSNodeGroupMinSize: This is a required parameter whereby you specify the EKS node group minimum size. Default value is 1 node.
EKSNodeGroupMaxSize: This is a required parameter whereby you specify the EKS node group maximum size. Default value is 8 nodes.
EKSNodeGroupDesiredSize: This is a required parameter whereby you specify the EKS node group initial desired size. Default value is 2 nodes.
FargateComputeType: This is a required parameter whereby you specify the Fargate compute environment type. Allowed values are FARGATE_SPOT and FARGATE. Default value is FARGATE_SPOT.
FargateComputeMax: This is a required parameter whereby you specify the maximum size of the Fargate compute environment in vCPUs. Default value is 1024.
FSxStorageCapacityGiB: This is a required parameter whereby you specify the FSx storage capacity, which must be a multiple of 2400 GiB. Default value is 7200 GiB.
FSxS3ImportPrefix: This is an optional advanced parameter whereby you specify the FSx S3 bucket path prefix for importing data from the S3 bucket. Leave blank to import the complete bucket.
KeyPairName: This is a required parameter whereby you select the Amazon EC2 key pair name used for SSH access to the desktop. You must have access to the selected key pair's private key to connect to your desktop.
KubectlVersion: This is a required parameter whereby you specify the EKS kubectl version. Default value is 1.21.2/2021-07-05.
KubernetesVersion: This is a required parameter whereby you specify the EKS cluster version. Default value is 1.21.
MSKBrokerNodeType: This is a required parameter whereby you specify the type of node to be provisioned for the Amazon MSK brokers.
MSKNumberOfNodes: This is a required parameter whereby you specify the number of MSK broker nodes, which must be >= 2.
PrivateSubnet1CIDR: This is a required parameter whereby you specify the private subnet 1 CIDR within the VPC CIDR. Default value is 172.30.64.0/18.
PrivateSubnet2CIDR: This is a required parameter whereby you specify the private subnet 2 CIDR within the VPC CIDR. Default value is 172.30.128.0/18.
PrivateSubnet3CIDR: This is a required parameter whereby you specify the private subnet 3 CIDR within the VPC CIDR. Default value is 172.30.192.0/18.
PublicSubnet1CIDR: This is a required parameter whereby you specify the public subnet 1 CIDR within the VPC CIDR. Default value is 172.30.0.0/24.
PublicSubnet2CIDR: This is a required parameter whereby you specify the public subnet 2 CIDR within the VPC CIDR. Default value is 172.30.1.0/24.
PublicSubnet3CIDR: This is a required parameter whereby you specify the public subnet 3 CIDR within the VPC CIDR. Default value is 172.30.2.0/24.
RedshiftDatabaseName: This is a required parameter whereby you specify the name of the Redshift database. Default value is mozart.
RedshiftMasterUsername: This is a required parameter whereby you specify the Redshift master user name. Default value is admin.
RedshiftMasterUserPassword: This is a required parameter whereby you specify the Redshift master user password.
RedshiftNodeType: This is a required parameter whereby you specify the type of node to be provisioned for the Redshift cluster. Default value is ra3.4xlarge.
RedshiftNumberOfNodes: This is a required parameter whereby you specify the number of compute nodes in the Redshift cluster, which must be >= 2.
RedshiftPortNumber: This is a required parameter whereby you specify the port number on which the Redshift cluster accepts incoming connections. Default value is 5439.
RosVersion: This is a required parameter whereby you specify the version of ROS. The supported versions are melodic on Ubuntu Bionic, and noetic on Ubuntu Focal. Default value is noetic.
S3Bucket: This is a required parameter whereby you specify the name of the Amazon S3 bucket used to store your data.
UbuntuAMI: This is an optional advanced parameter whereby you specify the Ubuntu AMI (18.04 or 20.04).
VpcCIDR: This is a required parameter whereby you specify the Amazon VPC CIDR for the VPC created in the stack. Default value is 172.30.0.0/16. If you change this value, all the subnet parameters above may need to be set as well.

Preload A2D2 data from S3 to FSx

This step can be executed anytime after the "Configure the data service" step has been completed.

Amazon FSx for Lustre automatically lazy loads data from the configured S3 bucket. Therefore, this step is strictly a performance optimization. However, for maximal performance, it is highly recommended. Execute the following command to start preloading the data from your S3 bucket to the FSx file-system:

	kubectl apply -n a2d2 -f a2d2/fsx/stage-data-a2d2.yaml

To check if the step is complete, execute:

	kubectl get pods stage-fsx-a2d2 -n a2d2

If the pod is still Running, the step has not yet completed. This step takes approximately 5 hours to complete.
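
Rather than re-running the command by hand, you can poll the pod phase from Python. The sketch below shells out to kubectl, and waits until the staging pod leaves the Running phase:

	import subprocess
	import time

	while True:
	    phase = subprocess.check_output(
	        ["kubectl", "get", "pod", "stage-fsx-a2d2", "-n", "a2d2",
	         "-o", "jsonpath={.status.phase}"]).decode()
	    print(phase)
	    if phase in ("Succeeded", "Failed"):
	        break
	    time.sleep(600)  # the preload takes approximately 5 hours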

Preload A2D2 data from S3 to EFS

This step can be executed anytime after the "Configure the data service" step has been completed.

This step is required only if you plan to configure the data service to use EFS as the raw data input source; otherwise, it may be safely skipped. Execute the following command to start preloading the data from your S3 bucket to the EFS file-system:

	kubectl apply -n a2d2 -f a2d2/efs/stage-data-a2d2.yaml

To check if the step is complete, execute:

	kubectl get pods stage-efs-a2d2 -n a2d2

If the pod is still Running, the step has not yet completed. This step takes approximately 6.5 hours to complete.
