Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FLINK-31238] [WIP] Use IngestDB to speed up Rocksdb rescaling recovery #23169

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

mayuehappy
Copy link
Contributor

What is the purpose of the change

(For example: This pull request makes task deployment go through the blob server, rather than through RPC. That way we avoid re-transferring them on each deployment (during recovery).)

Brief change log

(for example:)

  • The TaskInfo is stored in the blob store on job creation time as a persistent artifact
  • Deployments RPC transmits only the blob storage reference
  • TaskManagers retrieve the TaskInfo from the blob cache

Verifying this change

Please make sure both new and modified tests in this PR follows the conventions defined in our code quality guide: https://flink.apache.org/contributing/code-style-and-quality-common.html#testing

(Please pick either of the following options)

This change is a trivial rework / code cleanup without any test coverage.

(or)

This change is already covered by existing tests, such as (please describe tests).

(or)

This change added tests and can be verified as follows:

(example:)

  • Added integration tests for end-to-end deployment with large payloads (100MB)
  • Extended integration test for recovery after master (JobManager) failure
  • Added test that validates that TaskInfo is transferred only once across recoveries
  • Manually verified the change by running a 4 node cluster with 2 JobManagers and 4 TaskManagers, a stateful streaming program, and killing one JobManager and two TaskManagers during the execution, verifying that recovery happens correctly.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (yes / no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (yes / no)
  • The serializers: (yes / no / don't know)
  • The runtime per-record code paths (performance sensitive): (yes / no / don't know)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (yes / no / don't know)
  • The S3 file system connector: (yes / no / don't know)

Documentation

  • Does this pull request introduce a new feature? (yes / no)
  • If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)

@mayuehappy mayuehappy changed the title draft:[FLINK-31238] Use IngestDB to speed up Rocksdb rescaling recovery [FLINK-31238] Use IngestDB to speed up Rocksdb rescaling recovery Aug 8, 2023
@flinkbot
Copy link
Collaborator

flinkbot commented Aug 8, 2023

CI report:

Bot commands The @flinkbot bot supports the following commands:
  • @flinkbot run azure re-run the last Azure build

@mayuehappy mayuehappy changed the title [FLINK-31238] Use IngestDB to speed up Rocksdb rescaling recovery [FLINK-31238] [WIP] Use IngestDB to speed up Rocksdb rescaling recovery Aug 8, 2023
@pnowojski pnowojski requested review from StefanRRichter and rkhachatryan and removed request for StefanRRichter September 4, 2023 09:41
@StefanRRichter
Copy link
Contributor

@mayuehappy thanks for the draft PR. General idea of the PR looks good to me, we should still polish it and avoid code duplication. I currently cannot test the PR because the frocksDB dependency is missing. How do you want to proceed with this?

@pnowojski
Copy link
Contributor

It looks like we will need to bump RocksDB from 6.20.x to 8.5.x

8.5.0-ververica-1.0-db-SNASHOT

@curcur or @Myasuka , do you know if there are any known hurdles that would prevent us from upgrading RocksDB atm?

@Myasuka
Copy link
Member

Myasuka commented Sep 5, 2023

@pnowojski I think we might have two small problems to face:

  1. We might face the performance regression on micro-benchmarks after upgrading to RocksDB-8.x, just like we upgrade from 5.x to 6.x. However, I actually try to bump to RocksDB-7.x internally and found performance improvement due to the compaction improvement. Since our micro-benchmarks cannot reflect the compaction problem, we can use nexmark to verify the end-to-end performance then.
  2. Latest RocksDB supports a new platform: s390x, we might not able to compile and get one binary .so file due to lacking machines. However, since our current version does not support that platform, we can launch an announcement/discussion to not support this new platform.

@fredia
Copy link
Contributor

fredia commented Sep 6, 2023

Newer versions of RocksDB are backward compatible, compatibility will not be a hindrance. https://github.com/facebook/rocksdb/wiki/RocksDB-Overview#compatibility
@Myasuka For s390x, I think we can use travis-ci to build one binary .so file if needed.

@pnowojski
Copy link
Contributor

Ok, thanks @Myasuka and @fredia for the input. Yes will need to benchmark this both with micro and ideally macro benchmarks as well. I would hope there won't be a micro benchmark regression, and that 5.x -> 6.x was one of issue, but who knows.

@StefanRRichter
Copy link
Contributor

@mayuehappy I noticed that one required PR against RocksDB (facebook/rocksdb#11646) is still not merged and the conversation is stalling. Do you think it's ok to ping Adam to take another look?

@mayuehappy
Copy link
Contributor Author

It looks like we will need to bump RocksDB from 6.20.x to 8.5.x

8.5.0-ververica-1.0-db-SNASHOT

@curcur or @Myasuka , do you know if there are any known hurdles that would prevent us from upgrading RocksDB atm?

@mayuehappy thanks for the draft PR. General idea of the PR looks good to me, we should still polish it and avoid code duplication. I currently cannot test the PR because the frocksDB dependency is missing. How do you want to proceed with this?
@StefanRRichter
I’m not sure how the Flink community tested Frocksdb-jni before it was released. Is there a maven repository for testing where I can deploy my test-frocksdb-jar ? @Myasuka

@mayuehappy
Copy link
Contributor Author

@mayuehappy I noticed that one required PR against RocksDB (facebook/rocksdb#11646) is still not merged and the conversation is stalling. Do you think it's ok to ping Adam to take another look?

@mayuehappy I noticed that one required PR against RocksDB (facebook/rocksdb#11646) is still not merged and the conversation is stalling. Do you think it's ok to ping Adam to take another look?

@StefanRRichter Thanks for the reminder, I've pinged him and cbi42 under the PR.
BTW, is it possible that we merge this part of the JNI code into frocksdb before they merge into rocksdb?

@StefanRRichter
Copy link
Contributor

@mayuehappy I think there is already a reply on the RocksDB PR. So if you think with their guidance it's now reasonable to assume this can be merged soon, then there is no need to merge early. But otherwise that would be an option to make progress on the Flink side.

@mayuehappy
Copy link
Contributor Author

@flinkbot run azure

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants