importccl: faster Avro imports #45097

Closed
mwang1026 opened this issue Feb 13, 2020 · 0 comments · Fixed by #45269

@mwang1026

Imports can parse Avro files as of #30283 but the speed is not yet optimized.

The desired outcome of this issue is to bring Avro import speed up to an acceptable level, ideally comparable to our fastest Import functionality today. Working with partners to determine what speed is sufficient will be part of the acceptance criteria for this issue.

miretskiy pushed a commit to miretskiy/cockroach that referenced this issue Feb 25, 2020
Parallelize avro importer to improve its throughput (2.8x improvement).

Fixes cockroachdb#45097

Release notes (performance): Faster avro import
miretskiy pushed a commit to miretskiy/cockroach that referenced this issue Feb 26, 2020
Parallelize avro importer to improve its throughput (2.8x improvement).

Fixes cockroachdb#45097

Release notes (performance): Faster avro import
miretskiy pushed a commit to miretskiy/cockroach that referenced this issue Feb 28, 2020
Parallelize avro importer to improve its throughput (2.8x improvement).

Fixes cockroachdb#45097

Release notes (performance): Faster avro import
miretskiy pushed a commit to miretskiy/cockroach that referenced this issue Mar 2, 2020
Parallelize avro importer to improve its throughput (2.8x improvement).

Fixes cockroachdb#45097

Release notes (performance): Faster avro import
craig bot pushed a commit that referenced this issue Mar 2, 2020
45269: importccl: Parallelize avro import r=miretskiy a=miretskiy

Parallelize avro importer to improve its throughput (2.8x improvement).

Touches #40374.
Fixes #45097.

Release notes (performance): Faster avro import
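
The commit message does not describe how the importer was parallelized. As a rough illustration only, a worker-pool approach to parallelizing per-record conversion might look like the sketch below. The `record` and `row` types, `convert`, and `parallelConvert` are hypothetical names invented for this example and are not taken from the importccl code.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// record stands in for one decoded Avro datum and row for the converted
// output. Both are hypothetical types used only for illustration.
type record struct {
	seq  int
	data string
}

type row struct {
	seq int
	val string
}

// convert is a placeholder for the CPU-heavy datum-to-row conversion.
func convert(r record) row {
	return row{seq: r.seq, val: strings.ToUpper(r.data)}
}

// parallelConvert fans record indexes out to numWorkers goroutines.
// Each worker writes to its own slot in out, so output order matches
// input order and no two goroutines write the same element.
func parallelConvert(records []record, numWorkers int) []row {
	if numWorkers < 1 {
		numWorkers = 1
	}
	out := make([]row, len(records))
	jobs := make(chan int)
	var wg sync.WaitGroup
	for w := 0; w < numWorkers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range jobs {
				out[i] = convert(records[i])
			}
		}()
	}
	for i := range records {
		jobs <- i
	}
	close(jobs)
	wg.Wait()
	return out
}

func main() {
	recs := []record{{0, "a"}, {1, "b"}, {2, "c"}}
	for _, r := range parallelConvert(recs, 2) {
		fmt.Println(r.seq, r.val)
	}
}
```

Writing results by index keeps the output deterministic regardless of which worker finishes first; whether the real importer needs to preserve order is not stated in the source.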

45482: storage: integrate Concurrency Manager into Replica request path r=nvanbenschoten a=nvanbenschoten

Related to #41720.
Related to #44976.

This commit integrates the new concurrency package into the storage package. Each Replica is given a concurrency manager, which replaces its existing latch manager and txn wait queue. The change also uses the concurrency manager to simplify the role of the intent resolver. The intent resolver no longer directly handles WriteIntentErrors. As a result, we are able to delete the contention queue entirely.

With this change, all requests are now sequenced through the concurrency manager. When sequencing, latches are acquired and conflicting locks are detected. If any locks are found, the requests wait in lock wait-queues for the locks to be resolved. This is a major deviation from how things currently work because today, even with the contention queue, requests end up waiting for conflicting transactions to commit/abort in the txnWaitQueue after at least one RPC. Now, requests wait directly next to the intents/locks that they are waiting on and react directly to the resolution of these intents/locks.

Once requests are sequenced by the concurrency manager, they are theoretically fully isolated from all conflicting requests. However, this is not strictly true today because we have not yet pulled all replicated locks into the concurrency manager's lock table. We will do so in a future change. Until then, the concurrency manager maintains a notion of "intent discovery", which is integrated into the Replica-level concurrency retry loop.
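
To make the idea of requests waiting "directly next to" a lock more concrete, here is a toy sketch of a lock table with per-key wait queues. It is not the concurrency package's API: `lockTable`, `acquire`, `release`, and `holder` are hypothetical names, latching is omitted, and ownership is simply handed to the first waiter on release.

```go
package main

import (
	"fmt"
	"sync"
)

// waiter is a request parked in a key's wait queue.
type waiter struct {
	txn string
	ch  chan struct{}
}

// lockTable is a toy stand-in: each key has at most one holder and a
// FIFO queue of waiters.
type lockTable struct {
	mu      sync.Mutex
	holders map[string]string
	waiters map[string][]waiter
}

func newLockTable() *lockTable {
	return &lockTable{
		holders: make(map[string]string),
		waiters: make(map[string][]waiter),
	}
}

// acquire returns once txn holds the lock on key. If the key is already
// locked, the request queues on that key and is woken by release.
func (lt *lockTable) acquire(key, txn string) {
	lt.mu.Lock()
	if _, held := lt.holders[key]; !held {
		lt.holders[key] = txn
		lt.mu.Unlock()
		return
	}
	w := waiter{txn: txn, ch: make(chan struct{})}
	lt.waiters[key] = append(lt.waiters[key], w)
	lt.mu.Unlock()
	<-w.ch // release has already recorded this txn as the holder
}

// release hands the lock to the first waiter, or frees the key.
func (lt *lockTable) release(key string) {
	lt.mu.Lock()
	defer lt.mu.Unlock()
	q := lt.waiters[key]
	if len(q) == 0 {
		delete(lt.holders, key)
		return
	}
	lt.waiters[key] = q[1:]
	lt.holders[key] = q[0].txn
	close(q[0].ch)
}

// holder reports the current holder of key, or "" if it is unlocked.
func (lt *lockTable) holder(key string) string {
	lt.mu.Lock()
	defer lt.mu.Unlock()
	return lt.holders[key]
}

func main() {
	lt := newLockTable()
	lt.acquire("k", "txn1")
	done := make(chan struct{})
	go func() {
		lt.acquire("k", "txn2") // queues behind txn1 if it still holds k
		close(done)
	}()
	lt.release("k")
	<-done
	fmt.Println("holder:", lt.holder("k"))
}
```

The point of the sketch is only that a waiter reacts to the release itself rather than polling or issuing a separate request to find out whether the conflicting transaction finished.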

Performance numbers will be published shortly. This will be followed by performance numbers using the SELECT FOR UPDATE locking (#40205) improvements that this change enables.

45484: sql: simplify connection state machine - stop tracking retry intent r=andreimatei a=andreimatei

Before this patch, the SQL connection state machine had an optimization:
if a transaction that hadn't used "SAVEPOINT cockroach_restart"
encountered a retriable error that we can't auto-retry, then we'd
release the txn's locks eagerly and enter the Aborted state, whereas
transactions that had used "SAVEPOINT cockroach_restart" would go to
RestartWait.
This optimization is a significant complication for the state machine,
so this patch removes it. All transactions now go to RestartWait
and wait for a ROLLBACK to release the locks.
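
The simplified behavior can be sketched as a small transition function. This is a hypothetical model, not the real connection state machine: the state and event names are invented, and the transition from RestartWait back to Open on a savepoint rollback is an assumption about how a retry resumes rather than something this commit message states.

```go
package main

import "fmt"

type txnState int

const (
	stateOpen txnState = iota
	stateRestartWait
	stateNoTxn
)

type event int

const (
	evRetriableErr        event = iota // retriable error that cannot be auto-retried
	evRollback                         // ROLLBACK
	evRollbackToSavepoint              // ROLLBACK SAVEPOINT cockroach_restart
)

// next returns the state after applying ev, and whether the transaction's
// locks are released by that transition.
func next(s txnState, ev event) (txnState, bool) {
	switch {
	case s == stateOpen && ev == evRetriableErr:
		// After the patch, every transaction takes this path, whether or
		// not it declared the savepoint. Locks are kept.
		return stateRestartWait, false
	case s == stateRestartWait && ev == evRollback:
		return stateNoTxn, true
	case s == stateRestartWait && ev == evRollbackToSavepoint:
		// Assumption: the client retries in the same transaction.
		return stateOpen, false
	case s == stateOpen && ev == evRollback:
		return stateNoTxn, true
	}
	return s, false // other combinations are left unmodeled
}

func main() {
	s, released := next(stateOpen, evRetriableErr)
	fmt.Println(s == stateRestartWait, released)
	s, released = next(s, evRollback)
	fmt.Println(s == stateNoTxn, released)
}
```

With a single path out of Open on a non-auto-retriable error, the machine no longer needs to remember whether the client signaled retry intent.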

On the flip side, doing "RELEASE SAVEPOINT cockroach_restart" and
"ROLLBACK SAVEPOINT cockroach_restart" now works even for transactions
that haven't explicitly declared that savepoint, which is nice. Although
I don't promise I'll keep it working.

Release note: None

Co-authored-by: Yevgeniy Miretskiy <yevgeniy@cockroachlabs.com>
Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
Co-authored-by: Andrei Matei <andrei@cockroachlabs.com>
@craig craig bot closed this as completed in 7bd8e7d Mar 3, 2020