This is an example of a data service typically used in advanced driver assistance systems (ADAS), and automated driving systems (ADS) development. The data service is composed from following modular AWS services: Amazon Elastic Kubernetes Service (EKS), Amazon Managed Streaming for Apache Kafka (MSK), Amazon Redshift, Amazon FSx, Amazon Elastic File System (EFS), AWS Batch, AWS Step Functions, AWS Secrets Manager, Amazon Glue, Amazon Fargate, Amazon Elastic Container Registry (ECR), Amazon Virtual Private Cloud (VPC), Amazon EC2, and Amazon S3.
The typical use case addressed by this data service is to serve sensor data from a specified drive scene of interest, either in batch mode as a single rosbag
file, or in streaming mode as an ordered series of rosbag
files, whereby each rosbag
file contains the drive scene data for a single time step, which by default is 1 second. Each rosbag
file is dynamically composed from the drive scene data stored in Amazon S3, using the meta-data stored in Amazon Redshift.
The data service runs in Kubernetes Pods in an Amazon EKS cluster configured to use Horizontal Pod Autoscaler and EKS Cluster Autoscaler.
An Amazon Managed Service For Apache Kafka (MSK) cluster provides the communication channel between the data service, and the client. The data service implements a request-response paradigm over Apache Kafka topics. However, the response data is not sent back over the Kafka topics. Instead, the data service stages the response data in Amazon S3, Amazon FSx, or Amazon EFS, as specified in the data client request, and the location of the response data is sent back to the data client over the Kafka topics. The data client directly reads the response data from its staged location.
Concretely, imagine the data client wants to request drive scene data in rosbag
file format from A2D2 autonomous driving dataset for vehicle id a2d2
, drive scene id 20190401145936
, starting at timestamp 1554121593909500
(microseconds) , and stopping at timestamp 1554122334971448
(microseconds). The data client wants the response to include data only from the front-left camera
in sensor_msgs/Image
ROS data type, and the front-left lidar
in sensor_msgs/PointCloud2
ROS data type. The data client wants the response data to be streamed back chunked in series of rosbag
files, each file spanning 1000000
microseconds of the drive scene. Finally, the data client wants the response rosbag
files to be stored on a shared Amazon FSx file system.
The data client can encode such a data request using the JSON object shown below, and send it to the Kafka bootstrap servers b-1.msk-cluster-1:9092,b-2.msk-cluster-1:9092
on the Apache Kafka topic named a2d2
:
{
"servers": "b-1.msk-cluster-1:9092,b-2.msk-cluster-1:9092",
"requests": [{
"kafka_topic": "a2d2",
"vehicle_id": "a2d2",
"scene_id": "20190401145936",
"sensor_id": ["lidar/front_left", "camera/front_left"],
"start_ts": 1554121593909500,
"stop_ts": 1554122334971448,
"ros_topic": {"lidar/front_left": "/a2d2/lidar/front_left",
"camera/front_left": "/a2d2/camera/front_left"},
"data_type": {"lidar/front_left": "sensor_msgs/PointCloud2",
"camera/front_left": "sensor_msgs/Image"},
"step": 1000000,
"accept": "fsx/multipart/rosbag",
"preview": false
}]
}
For a detailed description of each request field shown in the example above, see data request fields below.
In this tutorial, we use A2D2 autonomous driving dataset. The high-level outline of this tutorial is as follows:
- Prerequisites
- Configure the data service
- Prepare the A2D2 data
- Run the data service
- Run the data service client
This tutorial assumes you have an AWS Account, and you have Administrator job function access to the AWS Management Console.
To get started:
- Select your AWS Region. The AWS Regions supported by this project include, us-east-1, us-east-2, us-west-2, eu-west-1, eu-central-1, ap-southeast-1, ap-southeast-2, ap-northeast-1, ap-northeast-2, and ap-south-1. The A2D2 dataset used in this tutorial is stored in
eu-central-1
. - Subscribe to Ubuntu Pro 18.04 LTS and Ubuntu Pro 20.04 LTS.
- If you do not already have an Amazon EC2 key pair, create a new Amazon EC2 key pair. You will need the key pair name to specify the
KeyName
parameter when creating the AWS CloudFormation stack below. - You will need an Amazon S3 bucket. If you don't have one, create a new Amazon S3 bucket in the selected AWS region. You will use the S3 bucket name to specify the
S3Bucket
parameter in the stack. The bucket is used to store the A2D2 data. - Use the public internet address of your laptop as the base value for the CIDR to specify
DesktopRemoteAccessCIDR
parameter in the CloudFormation stack you will create below. - For all passwords used in this tutorial, we recommend using strong passwords using the best-practices recommended for AWS root account user password.
The AWS CloudFormation template cfn/mozart.yml
in this repository creates AWS Identity and Access Management (IAM) resources, so when you create the CloudFormation Stack using the console, in the Review step, you must check
I acknowledge that AWS CloudFormation might create IAM resources.
Create a new AWS CloudFormation stack using the cfn/mozart.yml
template. The stack input parameters you must specify are described below:
Parameter Name | Parameter Description |
---|---|
KeyPairName | This is a required parameter whereby you select the Amazon EC2 key pair name used for SSH access to the desktop. You must have access to the selected key pair's private key to connect to your desktop. |
RedshiftMasterUserPassword | This is a required parameter whereby you specify the Redshift database master user password. |
DesktopRemoteAccessCIDR | This is a required parameter whereby you specify the public IP CIDR range from where you need remote access to your graphics desktop, e.g. 1.2.3.4/32, or 7.8.0.0/16. |
DesktopInstanceType | This is a required parameter whereby you select an Amazon EC2 instance type for the ROS desktop. The default value, g3s.xlarge , may not be available for your selected region, in which case, we recommend you try g4dn.xlarge , or one of the other instance types. |
S3Bucket | This is a required parameter whereby you specify the name of the Amazon S3 bucket to store your data. The S3 bucket must already exist. |
For all other stack input parameters, default values are recommended during first walkthrough. See complete list of all the template input parameters below.
The key resources in the CloudFormation stack are listed below:
- A ROS desktop EC2 instance (default type
g3s.xlarge
) - An Amazon EKS cluster with 2 managed node group nodes (default type
r5n.8xlarge
) - An Amazon MSK cluster with 3 broker nodes (default type
kafka.m5.2xlarge
) - An Amazon Redshift cluster with 3 nodes (default type
ra3.4xlarge
) - An Amazon Fsx for Lustre file-system (default size 7,200 GiB)
- An Amazon EFS file-system
- Once the stack status in CloudFormation console is
CREATE_COMPLETE
, find the desktop instance launched in your stack in the Amazon EC2 console, and connect to the instance using SSH as userubuntu
, using your SSH key pair. - When you connect to the desktop using SSH, and you see the message
"Cloud init in progress. Machine will REBOOT after cloud init is complete!!"
, disconnect and try later after about 20 minutes. The desktop installs the NICE DCV server on first-time startup, and reboots after the install is complete. - If you see the message
NICE DCV server is enabled!
, run the commandsudo passwd ubuntu
to set a new password for userubuntu
. Now you are ready to connect to the desktop using the NICE DCV client
- Download and install the NICE DCV client on your laptop.
- Use the NICE DCV Client to login to the desktop as user
ubuntu
- When you first login to the desktop using the NICE DCV client, you will be asked if you would like to upgrade the OS version. Do not upgrade the OS version .
Now you are ready to proceed with the following steps. For all the commands in this tutorial, we assume the working directory to be ~/amazon-eks-autonomous-driving-data-service
on the graphics desktop.
In this step, you need AWS credentials for programmatic access for the IAM user, or role, you used to create the AWS CloudFormation stack above. You must not use the AWS credentials for a different IAM user, or role. The AWS credentials are used one-time to enable EKS cluster access from the ROS desktop, and are automatically removed at the end of this step.
If you used an IAM role to create the CloudFormation stack above, you must manually configure the credentials associated with the IAM role in the ~/.aws/credentials
file with the following fields:
aws_access_key_id=
aws_secret_access_key=
aws_session_token=
If you used an IAM user to create the stack, you do not have to manually configure the credentials in ~/.aws/credentials
file.
In the working directory, run the command:
./scripts/configure-eks-auth.sh
At the successful execution of this command, you must see AWS Credentials Removed
.
To setup the eks cluster environment, in the working directory, run the command:
./scripts/setup-dev.sh
This step also builds and pushes the data service container image into Amazon ECR.
Before we can run the A2D2 data service, we need to extract the raw A2D2 data into your S3 bucket, extract the metadata from the raw data, and upload the metadata into the Redshift cluster. We execute these three steps using an AWS Step Functions state machine. To create and execute the AWS Step Functions state machine, execute the following command in the working directory:
./scripts/a2d2-etl-steps.sh
Note the executionArn
of the state machine execution in the output of the previous command. To check the status the status of the execution, use following command, replacing executionArn
below with your value:
aws stepfunctions describe-execution --execution-arn executionArn
You can see the status of the Step Function State Machine execution in the Step Functions console, as well.
The state machine execution time depends on many variable factors and may take anywhere from 12 - 24 hours, or possibly longer. You can see the status and the CloudWatch logs for each AWS Batch job spawned by the Step Function State Machine execution in the AWS Batch console. If a Batch job fails, it is automatically retried.
For best performance, preload A2D2 data from S3 to FSx. For a quick preview of the data service, you may proceed with the step below.
The data service is deployed using an Helm Chart, and runs as a kubernetes deployment
in EKS. To start the data service, execute the following command in the working directory:
helm install --debug a2d2-data-service ./a2d2/charts/a2d2-data-service/
To verify that the a2d2-data-service
deployment is running, execute the command:
kubectl get pods -n a2d2
The data service can be configured to input raw data from S3, FSx (see Preload A2D2 data from S3 to FSx ), or EFS (see Preload A2D2 data from S3 to EFS ). Similarly, the data client can specify the accept
field in the request
to request that the response data be staged on S3, FSx, or EFS. The raw data input source is fixed when the data service is deployed. However, the response data staging option is specified in the data client request, and is therefore dynamic.
Below is the Helm chart configuration in values.yaml
for various raw input data source options, with recommended Kubernetes resource requests for pod memory
and cpu
:
Raw input data source | values.yaml Configuration |
---|---|
fsx (default) |
a2d2.requests.memory: "72Gi" a2d2.requests.cpu: "8000m" configMap.data_store.input: "fsx" |
efs |
a2d2.requests.memory: "32Gi" a2d2.requests.cpu: "1000m" configMap.data_store.input: "efs" |
s3 |
a2d2.requests.memory: "8Gi" a2d2.requests.cpu: "1000m" configMap.data_store.input: "s3" |
For matching response data staging options in data client request, see requests.accept
field in data request fields. The recommended response data staging option for fsx
raw data source is"accept": "fsx/multipart/rosbag"
, for efs
raw data source is"accept": "efs/multipart/rosbag"
, and for s3
raw data source is"accept": "s3/multipart/rosbag"
To visualize the response data requested by the A2D2 data client, we will use rviz tool on the graphics desktop. Open a terminal on the desktop, and run rviz
. In the rviz
tool, use File>Open Config to select /home/ubuntu/amazon-eks-autonomous-driving-data-service/a2d2/config/a2d2.rviz
as the rviz
config. You should see rviz
tool now configured with two areas, one for visualizing image data, and the other for visualizing point cloud data.
To run the data client, execute the following command in the working directory:
python ./a2d2/src/data_client.py \
--config ./a2d2/config/c-config-ex1.json 1>/tmp/a.out 2>&1 &
After a brief delay, you should be able to preview the response data in the rviz
tool To preview data from a different drive scene, execute:
python ./a2d2/src/data_client.py \
--config ./a2d2/config/c-config-ex2.json 1>/tmp/a.out 2>&1 &
You can set "preview": false
in the data client config file, and run the above command to view the complete response, but before you do that, we recommend that for best performance, preload A2D2 data from S3 to FSx.
It is important that you do not run multiple data client instances on the ROS desktop concurrently. This is because the response data is played back on ROS topics, and there is only one ROS server running on the ROS desktop. Wait for a data client instance to exit, before you start another instance. Aborting the data client manually does not stop the data service pod from producing the response.
This step is for reference purposes. If at any time you need to do a hard reset of the data service, you can do so by executing:
helm delete a2d2-data-service
This will delete all data service EKS pods immediately. All in-flight service responses will be aborted. Because the connection between the data client and data service is asynchronous, the data clients may wait indefinitely, and you may need to cleanup the data client processes manually on the ROS desktop using operating system tools. Note, each data client instance spawns multiple Python processes.
When you no longer need the data service , you may delete the AWS CloudFormation stack from the AWS CloudFormation console. Deleting the stack will terminate the desktop instance, and delete the EFS and FSx for Lustre file-systems created in the stack. The Amazon S3 bucket is not deleted.
Below, we explain the semantics of the various fields in the data client request JSON object.
Request field name | Request field description |
---|---|
servers |
The servers identify the AWS MSK Kafka cluster endpoint. |
delay |
The delay specifies the delay in seconds that the data client delays sending the request. Default value is 0 . |
use_time |
(Optional) The use_time specifies whether to use the received time, or header time when playing back the received messages. Default value is received . |
requests |
The JSON document sent by the client to the data service must include an array of one or more data requests for drive scene data. |
requests.kafka_topic |
The kafka_topic specifies the Kafka topic on which the data request is sent from the client to the data service. The data service is listening on the topic. |
requests.vehicle_id |
The vehicle_id is used to identify the relevant drive scene dataset. |
requests.scene_id |
The scene_id identifies the drive scene of interest, which in this example is 20190401145936 , which in this example is a string representing the date and time of the drive scene, but in general could be any unique value. |
requests.start_ts |
The start_ts (microseconds) specifies the start timestamp for the drive scene data request. |
requests.stop_ts |
The stop_ts (microseconds) specifies the stop timestamp for the drive scene data request. |
requests.ros_topic |
The ros_topic is a dictionary from sensor ids in the vehicle to ros topics. |
requests.data_type |
The data_type is a dictionary from sensor ids to ros data types. |
requests.step |
The step is the discreet time interval (microseconds) used to discretize the timespan between start_ts and stop_ts . If requests.accept value contains multipart , the data service responds with a rosbag file for each discreet step : See possible values below. |
requests.accept |
The accept specifies the response data staging format acceptable to the client: See possible values below. |
requests.image |
(Optional) The value undistorted undistorts the camera image. Undistoring an image slows down the image frame rate. Default value is original distorted image. |
requests.lidar_view |
(Optional) The value vehicle transforms lidar points to vehicle frame of reference view. Default value is camera . |
requests.preview |
If the preview field is set to true , the data service returns requested data over a single time step starting from start_ts , and ignores the stop_ts . |
Below we describe the possible values for requests.accept
field:
requests.accept value |
Description |
---|---|
fsx/multipart/rosbag |
Stage response data on shared Amazon FSx file-system as discretized rosbag chunks |
efs/multipart/rosbag |
Stage response data on shared Amazon EFS file-system as discretized rosbag chunks |
s3/multipart/rosbag |
Stage response data on Amazon S3 as discretized rosbag chunks |
fsx/singlepart/rosbag |
Stage response data on shared Amazon FSx file-system as single rosbag |
efs/singlepart/rosbag |
Stage response data on on shared Amazon EFS file-system as single rosbag |
s3/singlepart/rosbag |
Stage response data on Amazon S3 as single rosbag |
manifest |
Return a manifest of meta-data containing S3 paths to the raw data: manifest is returned directly over the Kafka response topic, and is not staged |
This repository provides an AWS CloudFormation template that is used to create the required stack.
Below, we describe the AWS CloudFormation template input parameters. Desktop below refers to the NICE DCV enabled high-performance graphics desktop that acts as the data service client in this tutorial.
Parameter Name | Parameter Description |
---|---|
DesktopInstanceType | This is a required parameter whereby you select an Amazon EC2 instance type for the desktop running in AWS cloud. Default value is g3s.xlarge . |
DesktopEbsVolumeSize | This is a required parameter whereby you specify the size of the root EBS volume (default size is 200 GB) on the desktop. Typically, the default size is sufficient. |
DesktopEbsVolumeType | This is a required parameter whereby you select the EBS volume type (default is gp3). |
DesktopHasPublicIpAddress | This is a required parameter whereby you select whether a Public Ip Address be associated with the Desktop. Default value is true . |
DesktopRemoteAccessCIDR | This parameter specifies the public IP CIDR range from where you need remote access to your client desktop, e.g. 1.2.3.4/32, or 7.8.0.0/16. |
EKSEncryptSecrets | This is a required parameter whereby you select if encryption of EKS secrets is Enabled . Default value is Enabled . |
EKSEncryptSecretsKmsKeyArn | This is an optional advanced parameter whereby you specify the AWS KMS key ARN that is used to encrypt EKS secrets. Leave blank to create a new KMS key. |
EKSNodeGroupInstanceType | This is a required parameter whereby you select EKS Node group EC2 instance type. Default value is r5n.8xlarge . |
EKSNodeVolumeSizeGiB | This is a required parameter whereby you specify EKS Node group instance EBS volume size. Default value is 200 GiB. |
EKSNodeGroupMinSize | This is a required parameter whereby you specify EKS Node group minimum size. Default value is 1 node. |
EKSNodeGroupMaxSize | This is a required parameter whereby you specify EKS Node group maximum size. Default value is 8 nodes. |
EKSNodeGroupDesiredSize | This is a required parameter whereby you specify EKS Node group initial desired size. Default value is 2 nodes. |
FargateComputeType | This is a required parameter whereby you specify Fargate compute environment type. Allowed values are FARGATE_SPOT and FARGATE . Default value is FARGATE_SPOT . |
FargateComputeMax | This is a required parameter whereby you specify maximum size of Fargate compute environment in vCpus. Default value is 1024 . |
FSxStorageCapacityGiB | This is a required parameter whereby you specify the FSx Storage capacity, which must be in multiples of 2400 GiB . Default value is 7200 GiB . |
FSxS3ImportPrefix | This is an optional advanced parameter whereby you specify FSx S3 bucket path prefix for importing data from S3 bucket. Leave blank to import the complete bucket. |
KeyPairName | This is a required parameter whereby you select the Amazon EC2 key pair name used for SSH access to the desktop. You must have access to the selected key pair's private key to connect to your desktop. |
KubectlVersion | This is a required parameter whereby you specify EKS kubectl version. Default value is 1.21.2/2021-07-05 . |
KubernetesVersion | This is a required parameter whereby you specify EKS cluster version. Default value is 1.21 . |
MSKBrokerNodeType | This is a required parameter whereby you specify the type of node to be provisioned for AWS MSK Broker. |
MSKNumberOfNodes | This is a required parameter whereby you specify the number of MSK Broker nodes, which must be >= 2. |
PrivateSubnet1CIDR | This is a required parameter whereby you specify the Private Subnet1 CIDR in Vpc CIDR. Default value is 172.30.64.0/18 . |
PrivateSubnet2CIDR | This is a required parameter whereby you specify the Private Subnet2 CIDR in Vpc CIDR. Default value is 172.30.128.0/18 . |
PrivateSubnet3CIDR | This is a required parameter whereby you specify the Private Subnet3 CIDR in Vpc CIDR. Default value is 172.30.192.0/18 . |
PublicSubnet1CIDR | This is a required parameter whereby you specify the Public Subnet1 CIDR in Vpc CIDR. Default value is 172.30.0.0/24 . |
PublicSubnet2CIDR | This is a required parameter whereby you specify the Public Subnet2 CIDR in Vpc CIDR. Default value is 172.30.1.0/24 . |
PublicSubnet3CIDR | This is a required parameter whereby you specify the Public Subnet3 CIDR in Vpc CIDR. Default value is 172.30.2.0/24 . |
RedshiftDatabaseName | This is a required parameter whereby you specify the name of the Redshift database. Default value is mozart . |
RedshiftMasterUsername | This is a required parameter whereby you specify the name Redshift Master user name. Default value is admin . |
RedshiftMasterUserPassword | This is a required parameter whereby you specify the name Redshift Master user password. |
RedshiftNodeType | This is a required parameter whereby you specify the type of node to be provisioned for Redshift cluster. Default value is ra3.4xlarge . |
RedshiftNumberOfNodes | This is a required parameter whereby you specify the number of compute nodes in the Redshift cluster, which must be >= 2. |
RedshiftPortNumber | This is a required parameter whereby you specify the port number on which the Redshift cluster accepts incoming connections. Default value is 5439 . |
RosVersion | This is a required parameter whereby you specify the version of ROS. The supported versions are melodic on Ubuntu Bionic, and noetic on Ubuntu Focal. Default value is noetic . |
S3Bucket | This is a required parameter whereby you specify the name of the Amazon S3 bucket to store your data. |
UbuntuAMI | This is an optional advanced parameter whereby you specify Ubuntu AMI (18.04 or 20.04). |
VpcCIDR | This is a required parameter whereby you specify the Amazon VPC CIDR for the VPC created in the stack. Default value is 172.30.0.0/16. If you change this value, all the subnet parameters above may need to be set, as well. |
This step can be executed anytime after "Configure the data service" step has been executed
Amazon FSx for Lustre automatically lazy loads data from the configured S3 bucket. Therefore, this step is strictly a performance optimization step . However, for maximal performance, it is highly recommended. Execute following command to start preloading data from your S3 bucket to the FSx file-system:
kubectl apply -n a2d2 -f a2d2/fsx/stage-data-a2d2.yaml
To check if the step is complete, execute:
kubectl get pods stage-fsx-a2d2 -n a2d2
If the pod is still Running
, the step has not yet completed. This step takes approximately 5 hours to complete.
This step can be executed anytime after "Configure the data service" step has been executed
This step is required only if you plan to configure the data service to use EFS as the raw data input source, otherwise, it may be safely skipped. Execute following command to start preloading data from your S3 bucket to the EFS file-system:
kubectl apply -n a2d2 -f a2d2/efs/stage-data-a2d2.yaml
To check if the step is complete, execute:
kubectl get pods stage-efs-a2d2 -n a2d2
If the pod is still Running
, the step has not yet completed. This step takes approximately 6.5 hours to complete.