
Commit 05d7151

keypointt authored and tdas committed
[MINOR][STREAMING][DOCS] Minor changes on kinesis integration
## What changes were proposed in this pull request?

Some minor changes for documentation page "Spark Streaming + Kinesis Integration".

Moved "streaming-kinesis-arch.png" before the bullet list, not in between the bullets.

## How was this patch tested?

Tested manually, on my local machine.

Author: Xin Ren <iamshrek@126.com>

Closes #14097 from keypointt/kinesisDoc.
1 parent 9e2c763 commit 05d7151

File tree

1 file changed (+13, -13 lines)


docs/streaming-kinesis-integration.md

Lines changed: 13 additions & 13 deletions
@@ -111,7 +111,7 @@ A Kinesis stream can be set up at one of the valid Kinesis endpoints with 1 or m
 
 - `[checkpoint interval]`: The interval (e.g., Duration(2000) = 2 seconds) at which the Kinesis Client Library saves its position in the stream. For starters, set it to the same as the batch interval of the streaming application.
 
-- `[initial position]`: Can be either `InitialPositionInStream.TRIM_HORIZON` or `InitialPositionInStream.LATEST` (see Kinesis Checkpointing section and Amazon Kinesis API documentation for more details).
+- `[initial position]`: Can be either `InitialPositionInStream.TRIM_HORIZON` or `InitialPositionInStream.LATEST` (see [`Kinesis Checkpointing`](#kinesis-checkpointing) section and [`Amazon Kinesis API documentation`](http://docs.aws.amazon.com/streams/latest/dev/developing-consumers-with-sdk.html) for more details).
 
 - `[message handler]`: A function that takes a Kinesis `Record` and outputs generic `T`.
 
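For readers skimming this hunk, here is a minimal Scala sketch of how these three parameters reach `KinesisUtils.createStream`, the Scala API the changed page documents. The app, stream, endpoint, and region names are placeholders, and the message handler simply decodes each record payload as UTF-8:

```scala
import java.nio.charset.StandardCharsets

import com.amazonaws.services.kinesis.clientlibrary.lib.worker.InitialPositionInStream
import com.amazonaws.services.kinesis.model.Record
import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kinesis.KinesisUtils

val batchInterval = Seconds(2)
val ssc = new StreamingContext(new SparkConf().setAppName("KinesisParamsSketch"), batchInterval)

// [message handler]: a Record => T function; here T = String, decoding the payload as UTF-8.
val handler = (record: Record) => {
  val buf = record.getData
  val bytes = new Array[Byte](buf.remaining())
  buf.get(bytes)
  new String(bytes, StandardCharsets.UTF_8)
}

val stream = KinesisUtils.createStream[String](
  ssc,
  "MyKinesisApp",                             // [Kinesis app name]: also names the DynamoDB table used for checkpoints
  "myStream",                                 // [Kinesis stream name]
  "https://kinesis.us-east-1.amazonaws.com",  // [endpoint URL]
  "us-east-1",                                // [region name]
  InitialPositionInStream.TRIM_HORIZON,       // [initial position]
  batchInterval,                              // [checkpoint interval]: start with the batch interval
  StorageLevel.MEMORY_AND_DISK_2,
  handler)                                    // [message handler]
```

Without the message handler argument, the same call returns a DStream of raw payload byte arrays.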
@@ -128,14 +128,6 @@ A Kinesis stream can be set up at one of the valid Kinesis endpoints with 1 or m
 Alternatively, you can also download the JAR of the Maven artifact `spark-streaming-kinesis-asl-assembly` from the
 [Maven repository](http://search.maven.org/#search|ga|1|a%3A%22spark-streaming-kinesis-asl-assembly_{{site.SCALA_BINARY_VERSION}}%22%20AND%20v%3A%22{{site.SPARK_VERSION_SHORT}}%22) and add it to `spark-submit` with `--jars`.
 
-*Points to remember at runtime:*
-
-- Kinesis data processing is ordered per partition and occurs at-least once per message.
-
-- Multiple applications can read from the same Kinesis stream. Kinesis will maintain the application-specific shard and checkpoint info in DynamoDB.
-
-- A single Kinesis stream shard is processed by one input DStream at a time.
-
 <p style="text-align: center;">
   <img src="img/streaming-kinesis-arch.png"
        title="Spark Streaming Kinesis Architecture"
@@ -145,6 +137,14 @@ A Kinesis stream can be set up at one of the valid Kinesis endpoints with 1 or m
   <!-- Images are downsized intentionally to improve quality on retina displays -->
 </p>
 
+*Points to remember at runtime:*
+
+- Kinesis data processing is ordered per partition and occurs at-least once per message.
+
+- Multiple applications can read from the same Kinesis stream. Kinesis will maintain the application-specific shard and checkpoint info in DynamoDB.
+
+- A single Kinesis stream shard is processed by one input DStream at a time.
+
 - A single Kinesis input DStream can read from multiple shards of a Kinesis stream by creating multiple KinesisRecordProcessor threads.
 
 - Multiple input DStreams running in separate processes/instances can read from a Kinesis stream.
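To make the shard-to-DStream points in this hunk concrete, here is a minimal Scala sketch, modeled loosely on the bundled KinesisWordCountASL example, that creates one input DStream per shard and unions them. The shard count, app, stream, endpoint, and region values are placeholders:

```scala
import com.amazonaws.services.kinesis.clientlibrary.lib.worker.InitialPositionInStream
import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kinesis.KinesisUtils

object MultiShardSketch {
  def main(args: Array[String]): Unit = {
    val batchInterval = Seconds(2)
    val ssc = new StreamingContext(new SparkConf().setAppName("MultiShardSketch"), batchInterval)

    // One input DStream (receiver) per shard of the stream; shard count is a placeholder.
    val numShards = 2
    val kinesisStreams = (0 until numShards).map { _ =>
      KinesisUtils.createStream(
        ssc, "MyKinesisApp", "myStream",
        "https://kinesis.us-east-1.amazonaws.com", "us-east-1",
        InitialPositionInStream.LATEST, batchInterval, StorageLevel.MEMORY_AND_DISK_2)
    }

    // Union the per-shard streams into a single DStream[Array[Byte]] before processing.
    val unioned = ssc.union(kinesisStreams)
    unioned.count().print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Each of these streams is backed by a receiver that occupies an executor core, so the application needs more cores than receivers to leave room for processing.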
@@ -173,7 +173,7 @@ To run the example,
 
 - Set up Kinesis stream (see earlier section) within AWS. Note the name of the Kinesis stream and the endpoint URL corresponding to the region where the stream was created.
 
-- Set up the environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_KEY with your AWS credentials.
+- Set up the environment variables `AWS_ACCESS_KEY_ID` and `AWS_SECRET_KEY` with your AWS credentials.
 
 - In the Spark root directory, run the example as
 
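A tiny, hypothetical pre-flight check for the two variables this hunk puts into code formatting; the example reads its AWS credentials from these environment variables, so failing fast here is just a convenience:

```scala
// Hypothetical sanity check (not part of the example): fail fast if the credentials
// the instructions ask for are not visible in the environment.
val required = Seq("AWS_ACCESS_KEY_ID", "AWS_SECRET_KEY")
val missing = required.filterNot(sys.env.contains)
require(missing.isEmpty, s"Missing AWS credential variables: ${missing.mkString(", ")}")
```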
@@ -216,6 +216,6 @@ de-aggregate records during consumption.
 
 - Checkpointing too frequently will cause excess load on the AWS checkpoint storage layer and may lead to AWS throttling. The provided example handles this throttling with a random-backoff-retry strategy.
 
-- If no Kinesis checkpoint info exists when the input DStream starts, it will start either from the oldest record available (InitialPositionInStream.TRIM_HORIZON) or from the latest tip (InitialPositionInStream.LATEST). This is configurable.
-- InitialPositionInStream.LATEST could lead to missed records if data is added to the stream while no input DStreams are running (and no checkpoint info is being stored).
-- InitialPositionInStream.TRIM_HORIZON may lead to duplicate processing of records where the impact is dependent on checkpoint frequency and processing idempotency.
+- If no Kinesis checkpoint info exists when the input DStream starts, it will start either from the oldest record available (`InitialPositionInStream.TRIM_HORIZON`) or from the latest tip (`InitialPositionInStream.LATEST`). This is configurable.
+- `InitialPositionInStream.LATEST` could lead to missed records if data is added to the stream while no input DStreams are running (and no checkpoint info is being stored).
+- `InitialPositionInStream.TRIM_HORIZON` may lead to duplicate processing of records where the impact is dependent on checkpoint frequency and processing idempotency.
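One way to act on the idempotency caveat in the last bullet, as a sketch of my own rather than anything from the Spark docs: key each record by its Kinesis sequence number (unique within a shard), so that replays after a `TRIM_HORIZON` restart collapse in a dedupe step or in an upsert-style sink.

```scala
import com.amazonaws.services.kinesis.model.Record
import org.apache.spark.streaming.dstream.DStream

// Hypothetical event type; the sequence number is assigned by Kinesis and is unique within a shard.
case class Event(sequenceNumber: String, payload: Array[Byte])

def toEvent(record: Record): Event = {
  val buf = record.getData
  val bytes = new Array[Byte](buf.remaining())
  buf.get(bytes)
  Event(record.getSequenceNumber, bytes)
}

// Drop duplicates within each batch; the real safeguard is an idempotent sink
// (for example an upsert keyed on sequenceNumber), since at-least-once delivery
// can replay records across batches after a restart from TRIM_HORIZON.
def dedupePerBatch(events: DStream[Event]): DStream[Event] =
  events.transform(_.keyBy(_.sequenceNumber).reduceByKey((first, _) => first).values)
```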
