Skip to content

Conversation

@viirya
Copy link
Member

@viirya viirya commented Jan 1, 2021

What changes were proposed in this pull request?

This patch proposes to add latest offset to source progress for streaming queries.

Why are the changes needed?

Currently we record start and end offsets per source in streaming process. Latest offset is an important information for streaming process but the progress lacks of this info. We can use it to track the process lag and adjust streaming queries. We should add latest offset to source progress.

Does this PR introduce any user-facing change?

Yes, for new metric about latest source offset in source progress.

How was this patch tested?

Unit test. Manually test in Spark cluster:

    "description" : "KafkaV2[Subscribe[page_view_events]]",
    "startOffset" : {
      "page_view_events" : {
        "2" : 582370921,
        "4" : 391910836,
        "1" : 631009201,
        "3" : 406601346,
        "0" : 195799112
      }
    },
    "endOffset" : {
      "page_view_events" : {
        "2" : 583764414,
        "4" : 392338002,
        "1" : 632183480,
        "3" : 407101489,
        "0" : 197304028
      }
    },
    "latestOffset" : {
      "page_view_events" : {
        "2" : 589852545,
        "4" : 394204277,
        "1" : 637313869,
        "3" : 409286602,
        "0" : 203878962
      }
    },
    "numInputRows" : 4999997,
    "inputRowsPerSecond" : 29287.70501405811,

@SparkQA

This comment has been minimized.

@SparkQA
Copy link

SparkQA commented Jan 1, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38176/

@SparkQA
Copy link

SparkQA commented Jan 1, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38176/

@github-actions github-actions bot added the BUILD label Jan 2, 2021
@SparkQA
Copy link

SparkQA commented Jan 2, 2021

Test build #133590 has finished for PR 30988 at commit 5032861.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Member

cc @HeartSaVioR and @xuanyuanking FYI

@HyukjinKwon
Copy link
Member

Looks making sense to me too

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM. Thank you, @viirya and @HyukjinKwon .
Merged to master for Apache Spark 3.2.0.

flyrain pushed a commit to flyrain/spark that referenced this pull request Sep 21, 2021
This patch proposes to add latest offset to source progress for streaming queries.

Currently we record start and end offsets per source in streaming process. Latest offset is an important information for streaming process but the progress lacks of this info. We can use it to track the process lag and adjust streaming queries. We should add latest offset to source progress.

Yes, for new metric about latest source offset in source progress.

Unit test. Manually test in Spark cluster:

```
    "description" : "KafkaV2[Subscribe[page_view_events]]",
    "startOffset" : {
      "page_view_events" : {
        "2" : 582370921,
        "4" : 391910836,
        "1" : 631009201,
        "3" : 406601346,
        "0" : 195799112
      }
    },
    "endOffset" : {
      "page_view_events" : {
        "2" : 583764414,
        "4" : 392338002,
        "1" : 632183480,
        "3" : 407101489,
        "0" : 197304028
      }
    },
    "latestOffset" : {
      "page_view_events" : {
        "2" : 589852545,
        "4" : 394204277,
        "1" : 637313869,
        "3" : 409286602,
        "0" : 203878962
      }
    },
    "numInputRows" : 4999997,
    "inputRowsPerSecond" : 29287.70501405811,
```

Closes apache#30988 from viirya/latest-offset.

Authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
@viirya viirya deleted the latest-offset branch December 27, 2023 18:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants