The Amazon Kinesis Agent is a stand-alone Java software application that offers an easier way to collect and ingest data into Amazon Kinesis services, including Amazon Kinesis Streams and Amazon Kinesis Firehose.
To run tests in IDEA, right-click the tst folder and mark the directory as -> test sources root, then you can run the tests like any other JUnit tests.
If you specify "filepath": "TRACKED_FILE_PATH" as part of metadata in the ADDMETADATA step, the filepath value will be replaced with the current log file's absolute file path.
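As a rough sketch of how this might look inside a flow definition, the snippet below writes an example fragment to a scratch file. Only the ADDMETADATA name, the metadata key, and the "filepath": "TRACKED_FILE_PATH" pair come from the text above. The optionName key is assumed to follow the same pattern as the agent's other processing options, and the file name addmetadata-example.json and the "source" entry are hypothetical.

```shell
# Write an illustrative ADDMETADATA fragment to a scratch file (hypothetical name).
# Assumption: the step is selected via "optionName", like other processing options.
cat > addmetadata-example.json <<'EOF'
{
  "dataProcessingOptions": [
    {
      "optionName": "ADDMETADATA",
      "metadata": {
        "source": "example-app",
        "filepath": "TRACKED_FILE_PATH"
      }
    }
  ]
}
EOF

# Count the lines containing the placeholder that the agent is expected to replace.
grep -c '"TRACKED_FILE_PATH"' addmetadata-example.json
```

If the assumption holds, each record from a tracked file would carry that file's absolute path in place of TRACKED_FILE_PATH.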
- Monitors file patterns and sends new data records to delivery streams
- Handles file rotation, checkpointing, and retry upon failure
- Delivers all data in a reliable, timely, and simple manner
- Emits Amazon CloudWatch metrics to help you better monitor and troubleshoot the streaming process
- Sign up for AWS — Before you begin, you need an AWS account. For more information about creating an AWS account and retrieving your AWS credentials, see AWS Account and Credentials in the AWS SDK for Java Developer Guide.
- Sign up for Amazon Kinesis — Go to the Amazon Kinesis console to sign up for the service and create an Amazon Kinesis stream or Firehose delivery stream. For more information, see Create an Amazon Kinesis Stream in the Amazon Kinesis Streams Developer Guide or Create an Amazon Kinesis Firehose Delivery Stream in the Amazon Kinesis Firehose Developer Guide.
- Minimum requirements — To start the Amazon Kinesis Agent, you need Java 1.7+.
- Using the Amazon Kinesis Agent — For more information about using the Amazon Kinesis Agent to deliver data to Streams and Firehose, see Writing to Amazon Kinesis with Agents and Writing to Delivery Streams with Agents.
After you've downloaded the code from GitHub, you can install the Amazon Kinesis Agent with the following command:
# Optionally, you can set DEBUG=1 in your environment to enable massively
# verbose output of the script
sudo ./setup --install
This setup script downloads all the dependencies and bootstraps the environment for running the Java program.
After the agent is installed, the configuration file can be found in /etc/aws-kinesis/agent.json. You need to modify this configuration file to set the data destinations and AWS credentials, and to point the agent to the files to push. After you complete the configuration, you can make the agent start automatically at system startup with the following command:
sudo chkconfig aws-kinesis-agent on
If you do not want the agent running at system startup, turn it off with the following command:
sudo chkconfig aws-kinesis-agent off
To start the agent manually, use the following command:
sudo service aws-kinesis-agent start
You can make sure the agent is running with the following command:
sudo service aws-kinesis-agent status
You may see messages such as aws-kinesis-agent (pid [PID]) is running...
To stop the agent, use the following command:
sudo service aws-kinesis-agent stop
The agent writes its logs to /var/log/aws-kinesis-agent/aws-kinesis-agent.log
To uninstall the agent, use the following command:
sudo ./setup --uninstall
The installation performed by the setup script has only been tested on the following OS distributions:
- Red Hat Enterprise Linux version 7 or later
- Amazon Linux AMI version 2015.09 or later
- Ubuntu Linux version 12.04 or later
- Debian Linux version 8.6 or later
For other distributions or platforms, you can build the Java project with the following command:
sudo ./setup --build
or by using the Ant target, as you would to build any Java program:
ant [-Dbuild.dependencies=DEPENDENCY_DIR]
If you use the Ant command, you need to download all the dependencies listed in pom.xml before building the Java program. DEPENDENCY_DIR is the directory where you download and store the dependencies. By default, the Amazon Kinesis Agent reads the configuration file from /etc/aws-kinesis/agent.json. You need to create this file if it does not already exist. A sample configuration can be found at ./configuration/release/aws-kinesis-agent.json.
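For orientation, a minimal configuration could be drafted along the lines of the sketch below. It writes the example to a local file rather than directly to /etc/aws-kinesis/agent.json so it can be reviewed first. The flows, filePattern, kinesisStream, and deliveryStream keys follow the agent's documented configuration as far as I know, but treat the bundled sample file as the authoritative reference. The file patterns, stream names, and the output file name agent.json.example are hypothetical.

```shell
# Draft a minimal example configuration in a local file for review.
# The file patterns and stream names below are hypothetical placeholders.
cat > agent.json.example <<'EOF'
{
  "flows": [
    {
      "filePattern": "/tmp/app.log*",
      "kinesisStream": "example-kinesis-stream"
    },
    {
      "filePattern": "/tmp/access.log*",
      "deliveryStream": "example-delivery-stream"
    }
  ]
}
EOF

# After reviewing the draft, copy it into place (requires root):
#   sudo cp agent.json.example /etc/aws-kinesis/agent.json
```

Each entry in flows pairs one file pattern with one destination: kinesisStream for a stream, deliveryStream for a Firehose delivery stream.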
To start the program, use the following command:
java -cp CLASSPATH "com.amazon.kinesis.streaming.agent.Agent"
The Debian package must be built in a Debian/Ubuntu environment. First, build a Docker image with:
docker build .devcontainer/ -t aws-kinesis-ubuntu
Then, run the package command inside the container:
docker run -v $PWD/:/aws-kinesis-agent/ -w /aws-kinesis-agent/ aws-kinesis-ubuntu ./setup --package
This will generate 2 files in the package directory:
- Debian package: amazon-kinesis-agent_1.1-etleap_amd64.deb
- RPM package: amazon-kinesis-agent-1.1-etleap.x86_64.rpm
Both files need to be present in S3. To copy them to the correct bucket, run:
aws s3 cp package/amazon-kinesis-agent_1.1-etleap_amd64.deb s3://datadanze-emr/conf-hadoop2/
aws s3 cp package/amazon-kinesis-agent-1.1-etleap.x86_64.rpm s3://datadanze-emr/conf-hadoop2/
In the java -cp command for starting the program, CLASSPATH is the classpath to your dependencies and the target JAR file that you built in the build step.
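One way to assemble such a classpath is sketched below. The JAR path ./agent.jar and the directory ./dependencies are hypothetical defaults, since the actual locations depend on how and where you built the project. The sketch only prints the launch command so it can be checked before running.

```shell
# Hypothetical locations; adjust to where your build placed the JAR
# and where you stored the downloaded dependencies.
AGENT_JAR="${AGENT_JAR:-./agent.jar}"
DEPENDENCY_DIR="${DEPENDENCY_DIR:-./dependencies}"

# A classpath entry ending in /* tells the JVM to include every JAR in that
# directory. It must stay quoted so the shell does not expand the wildcard.
CLASSPATH="${AGENT_JAR}:${DEPENDENCY_DIR}/*"

# Print the launch command instead of running it, so it can be reviewed first.
printf 'java -cp "%s" %s\n' "${CLASSPATH}" "com.amazon.kinesis.streaming.agent.Agent"
```

Override AGENT_JAR and DEPENDENCY_DIR in the environment to point at your own build output.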
- Pre-process data before sending it to destinations — Amazon Kinesis Agent now supports pre-processing the records parsed from monitored files before sending them to your streams. The processing capability can be enabled by adding the dataProcessingOptions configuration to a file flow. There are three available options for now: SINGLELINE, CSVTOJSON, and LOGTOJSON. For more information, see Writing to Amazon Kinesis with Agents and Writing to Delivery Streams with Agents.
- Ignore compressed files when tailing — Files with compressed extensions, e.g. .gz, .bz2, and .zip, are not tailed.
- Force-kill the program on out-of-memory error — The program will be killed when it runs out of memory.
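To illustrate the dataProcessingOptions setting mentioned in the notes above, the sketch below drafts a flow that converts delimited lines to JSON with CSVTOJSON. The customFieldNames parameter is how I understand the option to be configured, so check it against the guides referenced above. The file pattern, stream name, field names, and output file name are hypothetical.

```shell
# Draft an example flow that applies CSVTOJSON before delivery.
# File pattern, stream name, and field names are hypothetical placeholders.
cat > csvtojson-flow.json.example <<'EOF'
{
  "flows": [
    {
      "filePattern": "/tmp/orders.csv*",
      "deliveryStream": "example-delivery-stream",
      "dataProcessingOptions": [
        {
          "optionName": "CSVTOJSON",
          "customFieldNames": ["order_id", "customer", "amount"]
        }
      ]
    }
  ]
}
EOF
```

Under these assumptions, each delimited line read from the file would be sent as a JSON object keyed by the listed field names.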
- This is the first release.