importccl: faster Avro imports #45097
miretskiy pushed commits to miretskiy/cockroach that referenced this issue on Feb 25, Feb 26, Feb 28 (twice), and Mar 2, 2020, each with the same message:
Parallelize avro importer to improve its throughput (2.8x improvement). Fixes cockroachdb#45097 Release notes (performance): Faster avro import
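The commit message above names the technique (parallelizing the importer) without showing it. As a rough illustration only, and not the code from the referenced commit, the general idea of fanning record conversion out to several workers can be sketched in Go as follows. Everything here is an assumption made for the example: `convert`, `parallelConvert`, and the comma-separated strings standing in for decoded Avro records are hypothetical and do not come from the CockroachDB source.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// convert stands in for the CPU-bound step of turning one decoded record
// into a row. Hypothetical: a real importer would decode Avro data here,
// not split a comma-separated string.
func convert(line string) []string {
	return strings.Split(line, ",")
}

// parallelConvert distributes record indexes to a fixed number of worker
// goroutines and returns the converted rows in input order.
func parallelConvert(lines []string, workers int) [][]string {
	if workers < 1 {
		workers = 1
	}
	out := make([][]string, len(lines))
	idx := make(chan int)

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range idx {
				// Each index is delivered to exactly one worker, so no two
				// goroutines write to the same slot of out.
				out[i] = convert(lines[i])
			}
		}()
	}

	for i := range lines {
		idx <- i
	}
	close(idx) // lets the workers' range loops finish
	wg.Wait()  // all writes to out are complete after this returns
	return out
}

func main() {
	rows := parallelConvert([]string{"1,alice", "2,bob", "3,carol"}, 2)
	for i, r := range rows {
		fmt.Println(i, r)
	}
}
```

Writing each result into a preallocated slot keeps the output in input order without a mutex. Whether the actual importer preserves order, or how it batches records, is not stated in this issue.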
craig bot pushed a commit that referenced this issue Mar 2, 2020

45269: importccl: Parallelize avro import r=miretskiy a=miretskiy

Parallelize avro importer to improve its throughput (2.8x improvement).

Touches #40374.
Fixes #45097.

Release notes (performance): Faster avro import

45482: storage: integrate Concurrency Manager into Replica request path r=nvanbenschoten a=nvanbenschoten

Related to #41720.
Related to #44976.

This commit integrates the new concurrency package into the storage package. Each Replica is given a concurrency manager, which replaces its existing latch manager and txn wait queue. The change also uses the concurrency manager to simplify the role of the intent resolver. The intent resolver no longer directly handles WriteIntentErrors. As a result, we are able to delete the contention queue entirely.

With this change, all requests are now sequenced through the concurrency manager. When sequencing, latches are acquired and conflicting locks are detected. If any locks are found, the requests wait in lock wait-queues for the locks to be resolved. This is a major deviation from how things currently work because today, even with the contention queue, requests end up waiting for conflicting transactions to commit/abort in the txnWaitQueue after at least one RPC. Now, requests wait directly next to the intents/locks that they are waiting on and react directly to the resolution of these intents/locks.

Once requests are sequenced by the concurrency manager, they are theoretically fully isolated from all conflicting requests. However, this is not strictly true today because we have not yet pulled all replicated locks into the concurrency manager's lock table. We will do so in a future change. Until then, the concurrency manager maintains a notion of "intent discovery", which is integrated into the Replica-level concurrency retry loop.

Performance numbers will be published shortly. This will be followed by performance numbers using the SELECT FOR UPDATE locking (#40205) improvements that this change enables.

45484: sql: simplify connection state machine - stop tracking retry intent r=andreimatei a=andreimatei

Before this patch, the SQL connection state machine had an optimization: if a transaction that hadn't used "SAVEPOINT cockroach_restart" encountered a retriable error that we can't auto-retry, then we'd release the txn's locks eagerly and enter the Aborted state. As opposed to transactions that had used the "SAVEPOINT cockroach_restart", which go to RestartWait. This optimization is a significant complication for the state machine, so this patch is removing it. All transactions now go to RestartWait, and wait for a ROLLBACK to release the locks.

On the flip side, doing "RELEASE SAVEPOINT cockroach_restart" and "ROLLBACK SAVEPOINT cockroach_restart" now works even for transactions that haven't explicitly declared that savepoint, which is nice. Although I don't promise I'll keep it working.

Release note: None

Co-authored-by: Yevgeniy Miretskiy <yevgeniy@cockroachlabs.com>
Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
Co-authored-by: Andrei Matei <andrei@cockroachlabs.com>
Imports can parse Avro files as of #30283 but the speed is not yet optimized.
The desired outcome of this issue is to raise Avro import speed to an acceptable level, ideally comparable to our fastest Import functionality today. Working with partners to determine what speed is sufficient will be part of accepting this issue.