
added implementation notes #1

Merged
merged 1 commit into nvanbenschoten/typing-rfc from andreimatei-rfc on Feb 2, 2016

Conversation

andreimatei
Collaborator

No description provided.

@andreimatei andreimatei merged this pull request into nvanbenschoten/typing-rfc Feb 2, 2016
@nvanbenschoten nvanbenschoten deleted the andreimatei-rfc branch February 2, 2016 02:22
nvanbenschoten pushed a commit that referenced this pull request May 29, 2016
This prevents the following deprecation warning from being printed:

(node) v8::ObjectTemplate::Set() with non-primitive values is deprecated
(node) and will stop working in the next major release.

==== JS stack trace =========================================

Security context: 0x3130826c9e59 <JS Object>#0#
    1: .node [module.js:568] [pc=0x1190f72183a4] (this=0x1cc469bec951 <an Object with map 0x7bdf6818161>#1#,module=0x225f801cc5a1 <a Module with map 0x7bdf6818739>#2#,filename=0x225f801cc501 <String[131]: /Users/tamird/src/go/src/github.com/cockroachdb/cockroach/ui/node_modules/fsevents/lib/binding/Release/node-v48-darwin-x64/fse.node>)
    2: load [module.js:458] [pc=0x1190f71396f2] (this=0x225f801cc5a1 <a Module with map 0x7bdf6818739>#2#,filename=0x225f801cc501 <String[131]: /Users/tamird/src/go/src/github.com/cockroachdb/cockroach/ui/node_modules/fsevents/lib/binding/Release/node-v48-darwin-x64/fse.node>)
    3: tryModuleLoad(aka tryModuleLoad) [module.js:417] [pc=0x1190f713921d] (this=0x313082604189 <undefined>,module=0x225f801cc5a1 <a Module with map 0x7bdf6818739>#2#,filename=0x225f801cc501 <String[131]: /Users/tamird/src/go/src/github.com/cockroachdb/cockroach/ui/node_modules/fsevents/
 1: v8::Template::Set(v8::Local<v8::Name>, v8::Local<v8::Data>, v8::PropertyAttribute)
 2: fse::FSEvents::Initialize(v8::Local<v8::Object>)
 3: node::DLOpen(v8::FunctionCallbackInfo<v8::Value> const&)
 4: v8::internal::FunctionCallbackArguments::Call(void (*)(v8::FunctionCallbackInfo<v8::Value> const&))
 5: v8::internal::MaybeHandle<v8::internal::Object> v8::internal::(anonymous namespace)::HandleApiCallHelper<false>(v8::internal::Isolate*, v8::internal::(anonymous namespace)::BuiltinArguments<(v8::internal::BuiltinExtraArguments)1>)
 6: v8::internal::Builtin_HandleApiCall(int, v8::internal::Object**, v8::internal::Isolate*)
 7: 0x1190f700961b
 8: 0x1190f72183a4
nvanbenschoten pushed a commit that referenced this pull request Jan 18, 2018
The current opt test has two parts to it:
  1. Data-driven testing framework that dynamically loads the test
     case and expected result from a data file.
  2. The actual tests.

Already, the tests use multiple different kinds of commands, each
with its own input and result formats. Fast-forward a few years and
this code will become even more complex. This change separates the
parts so that #1 can be reused elsewhere. Even #2 should eventually
be broken up into separate files, one for each major chunk of
functionality (exec, normalization, build, etc.). That will keep our
tests more decoupled from one another and easier to maintain.
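
For context, part #1 resembles what later became the cockroachdb/datadriven library. A minimal sketch of the pattern, in that style, assuming a hypothetical `build` command and `buildAndFormat` helper:

```go
package opt_test

import (
	"testing"

	"github.com/cockroachdb/datadriven"
)

func TestOpt(t *testing.T) {
	// Each directive in the data file is a command plus input; the returned
	// string is diffed against the expected output recorded alongside it.
	datadriven.RunTest(t, "testdata/opt", func(t *testing.T, d *datadriven.TestData) string {
		switch d.Cmd {
		case "build":
			return buildAndFormat(d.Input) // hypothetical helper
		default:
			t.Fatalf("unknown command: %s", d.Cmd)
			return ""
		}
	})
}
```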
nvanbenschoten pushed a commit that referenced this pull request Mar 26, 2019
36144: log: fix call depth for go1.12 panics r=ajwerner a=ajwerner

This PR fixes the following test failure on Linux:

```
$ make test IGNORE_GOVERS=1 PKG=./pkg/util/log
--- FAIL: TestCrashReportingPacket (0.20s)
    --- FAIL: TestCrashReportingPacket/#00 (0.00s)
        crash_reporting_packet_test.go:155: expected crash_reporting_packet_test.go:85: boom, got testing.go:865: boom
    --- FAIL: TestCrashReportingPacket/#1 (0.00s)
        crash_reporting_packet_test.go:155: expected crash_reporting_packet_test.go:93: baam, got testing.go:865: baam
FAIL
```

Interestingly, this is not quite the same failure noted in cockroachdb#35792; maybe that's
macOS-specific?

Release note: None

Co-authored-by: Andrew Werner <ajwerner@cockroachlabs.com>
nvanbenschoten added a commit that referenced this pull request Jul 11, 2019
38782: storage: persist minimum transaction timestamps in intents r=nvanbenschoten a=nvanbenschoten

Fixes cockroachdb#36089.
Informs cockroachdb#37199.

This commit addresses the second concern from cockroachdb#37199 (comment) by implementing its suggestion #1.

It augments the TxnMeta struct to begin storing the transaction's minimum timestamp. This allows pushers to have perfect accuracy into whether an intent is part of a transaction that can eventually be committed or whether it has been aborted by any other pusher and uncommittable.

This allows us to get rid of the false positive cases where a pusher incorrectly detects that a transaction is still active and begins waiting on it. In the worst case, this could lead to transactions waiting out the entire transaction expiration for a contending txn.
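
The shape of the change is roughly the following (a sketch of the idea; field names are illustrative, not the exact protobuf definition):

```go
type hlcTimestamp struct{ WallTime int64 }

// TxnMeta sketch: MinTimestamp is the new field.
type TxnMeta struct {
	ID             string
	WriteTimestamp hlcTimestamp
	// MinTimestamp is the earliest timestamp the transaction could have
	// written at. A pusher that finds no transaction record can use it to
	// decide definitively whether the record could still be created (the
	// txn may yet commit) or not (it was aborted by another pusher and is
	// uncommittable), eliminating the false-positive waits described above.
	MinTimestamp hlcTimestamp
}
```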

@tbg I'm assigning you because you reviewed most of the lazy transaction record stuff (especially cockroachdb#33523), but let me know if you'd like me to find a different reviewer.

Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
nvanbenschoten pushed a commit that referenced this pull request Jul 2, 2020
Previously, the underlying SQL tables holding file metadata and payloads
had the user as the suffix of their names.

As discussed offline, the URI for interacting with the FileTableSystem
will be of the form: `userfile://qualified.table.name/some/file/path`

The `host` is used as a prefix to create names for the underlying SQL
tables that hold the file metadata and payload. The `path` is the
filename entry that will be made in the SQL tables. All operations on
these SQL tables are executed as the user supplied when creating an
instance of the FileTableStorage. This allows us to enforce access
privileges, making this usable in a multi-user cluster.
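
For illustration, here is how such a URI splits into the table-name prefix and the filename (a sketch using the standard library, not the actual cloudimpl parsing code):

```go
package main

import (
	"fmt"
	"net/url"
)

func main() {
	u, err := url.Parse("userfile://qualified.table.name/some/file/path")
	if err != nil {
		panic(err)
	}
	fmt.Println("table prefix:", u.Host) // qualified.table.name
	fmt.Println("filename:", u.Path)     // /some/file/path
}
```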

Additionally, this commit moves all external storage tests to their own
subpackage. Previously, keeping the cloudimpl tests in the outer pkg
worked fine because none of the tests spun up test servers/clusters.
Since this is necessary for testing the new UserFileTableStorage, all
tests have been moved to a subpackage to break the
`server -> cloudimpl -> server` dependency cycle.

This is commit #1 of 2 to integrate the new UserFileTableStorage.

Release note: None
nvanbenschoten pushed a commit that referenced this pull request Apr 2, 2021
We arrived at the previous default rate of 10% back in cockroachdb#59379. This was
back when we were creating real tracing spans for all statements, and
for sampled statements, we were propagating additional stats payloads.
Consequently, what cockroachdb#59379 ended up measuring (and finding acceptable)
was the performance hit we would incur for propagating stats payloads
for statements already using real tracing spans.

Since then, the landscape has changed. Notably we introduced cockroachdb#61777,
which made it so that we were only using real tracing spans for sampled
statements. This was done after performance analysis in cockroachdb#59424 showed
that the use of real tracing spans in all statements resulted in
tremendous overhead, for no real benefit.

What this now leaves us with is a sampling rate that was tuned by only
considering the stats payload overhead. What we want now is to also
consider the overhead of using real tracing spans for sampled
statements, vs. not. Doing this analysis gives us a very different
picture for what the default rate should be.

---

To find out what the overhead for sampled statements is currently, we
experimented with kv95/enc=false/nodes=1/cpu=32. It's a simple
benchmark that does little more than one-off statements, so should give
us a concise picture of the sampling overhead. We ran six experiments
in total (each corresponding to a pair of read+write rows), done in
groups of three (each group corresponding to a table below). Each
run in turn comprises 10 iterations of kv95, and what's varied
between each run is the default sampling rate. We pin a sampling rate of
0.0 as the baseline that effectively switches off sampling entirely (and
tracing), and measure the throughput degradation as we vary the sampling
rate.

                          ops/sec            ops/sec
    --------------------|------------------|------------------
    rate   op      grp  | median    diff   | mean      diff
    --------------------|------------------|------------------
    0.00 / read  / #1   | 69817.90         | 69406.37
    0.01 / read  / #1   | 69300.35  -0.74% | 68717.23  -0.99%
    0.10 / read  / #1   | 67743.35  -2.97% | 67601.81  -2.60%
    0.00 / write / #1   |  3672.55         |  3653.63
    0.01 / write / #1   |  3647.65  -0.68% |  3615.90  -1.03%
    0.10 / write / #1   |  3567.20  -2.87% |  3558.90  -2.59%

                          ops/sec            ops/sec
    --------------------|------------------|------------------
    rate   op      grp  | median    diff   | mean      diff
    --------------------|------------------|------------------
    0.00 / read  / #2   | 69440.80         | 68893.24
    0.01 / read  / #2   | 69481.55  +0.06% | 69463.13  +0.82% (probably in the noise margin)
    0.10 / read  / #2   | 67841.80  -2.30% | 66992.55  -2.76%
    0.00 / write / #2   |  3652.45         |  3625.24
    0.01 / write / #2   |  3657.55  +0.14% |  3654.34  +0.80%
    0.10 / write / #2   |  3570.75  -2.24% |  3526.04  -2.74%

The results above suggest that the current default rate of 10% is too
high, and a 1% rate is much more acceptable.
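
For reference, the diff columns are plain relative changes against the 0.00 baseline; a quick sketch of the arithmetic, using the group #1 read medians above:

```go
package main

import "fmt"

// pctDiff returns the relative throughput change against a baseline.
func pctDiff(baseline, measured float64) float64 {
	return (measured - baseline) / baseline * 100
}

func main() {
	fmt.Printf("%+.2f%%\n", pctDiff(69817.90, 69300.35)) // -0.74% (rate 0.01)
	fmt.Printf("%+.2f%%\n", pctDiff(69817.90, 67743.35)) // -2.97% (rate 0.10)
}
```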

---

The fact that the cost of sampling is largely dominated by tracing is
extremely unfortunate. We have ideas for how that can be improved
(prototyped in cockroachdb#62227), but they're much too invasive to backport to
21.1. It's unfortunate that we only discovered the overhead this late in
the development cycle. It was due to two major reasons:
- cockroachdb#59992 landed late in the cycle, and enabled tracing for realsies (by
  propagating real tracing spans across rpc boundaries). We had done
  sanity checking for the tracing overhead before this point, but failed
  to realize that cockroachdb#59992 would merit re-analysis.
- The test that alerted us to the degradation (tpccbench) had been
  persistently failing for a myriad of other reasons, so we didn't learn
  until too late that tracing was the latest offender. tpccbench also
  doesn't deal with VM overload well (something cockroachdb#62361 hopes to
  address), and after tracing was enabled for realsies, this was the
  dominant failure mode. This resulted in perf data not making its way
  to roachperf, which further hid possible indicators we had a major
  regression on our hands. We also didn't have a healthy process looking
  at roachperf on a continual basis, something we're looking to rectify
  going forward. We would've picked up on this regression had we been
  closely monitoring the kv95 charts.

Release note: None
nvanbenschoten pushed a commit that referenced this pull request Apr 2, 2021
62998: sql: lower default sampling rate to 1% r=irfansharif a=irfansharif

Co-authored-by: irfan sharif <irfanmahmoudsharif@gmail.com>
nvanbenschoten pushed a commit that referenced this pull request Apr 8, 2021
nvanbenschoten added a commit that referenced this pull request Dec 13, 2021
Fixes cockroachdb#58459.
Informs cockroachdb#36431.

This commit fixes a long-standing correctness issue where non-transactional
requests did not ensure single-key linearizability even if they deferred their
timestamp allocation to the leaseholder of their (single) range. They still
don't entirely, because of cockroachdb#36431, but this change brings us one step closer to
the fix we plan to land for cockroachdb#36431 also applying to non-transactional requests.

The change addresses this by giving non-transactional requests uncertainty
intervals. This ensures that they also guarantee single-key linearizability even
with only loose (but bounded) clock synchronization. Non-transactional requests
that use a client-provided timestamp do not have uncertainty intervals and do
not make real-time ordering guarantees.

Unlike transactions, which establish an uncertainty interval on their
coordinator node during initialization, non-transactional requests receive
uncertainty intervals from their target leaseholder, using a clock reading
from the leaseholder's local HLC as the local limit and this clock reading +
the cluster's maximum clock skew as the global limit.
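
Concretely, that construction might look like the following (a sketch with illustrative names, not the actual kvserver code):

```go
// uncertaintyIntervalFor derives a non-transactional request's uncertainty
// interval from the leaseholder's clock, as described above. Timestamps are
// wall-clock nanos for simplicity.
func uncertaintyIntervalFor(leaseholderNowNanos, maxClockOffsetNanos int64) (localLimit, globalLimit int64) {
	localLimit = leaseholderNowNanos
	globalLimit = leaseholderNowNanos + maxClockOffsetNanos
	return localLimit, globalLimit
}
```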

It is somewhat non-intuitive that non-transactional requests need uncertainty
intervals. After all, they receive their timestamp from the leaseholder of the
only range that they talk to, so isn't every value with a commit timestamp
above their read timestamp certainly concurrent? The answer is surprisingly
"no" for the following reasons, so they cannot forgo the use of uncertainty
intervals:

1. the request timestamp is allocated before consulting the replica's lease.
   This means that there are times when the replica is not the leaseholder at
   the point of timestamp allocation, and only becomes the leaseholder later.
   In such cases, the timestamp assigned to the request is not guaranteed to
   be greater than the written_timestamp of all writes served by the range at
   the time of allocation. This is true despite invariants 1 & 2 from above,
   because the replica allocating the timestamp is not yet the leaseholder.

   In cases where the replica that assigned the non-transactional request's
   timestamp takes over as the leaseholder after the timestamp allocation, we
   expect minimumLocalLimitForLeaseholder to forward the local uncertainty
   limit above TimestampFromServerClock, to the lease start time.

2. ignoring reason #1, the request timestamp assigned by the leaseholder is
   equal to its local uncertainty limit and is guaranteed to lead the
   written_timestamp of all writes served by the range at the time the
   request arrived at the leaseholder node. However, this timestamp is still
   not guaranteed to lead the commit timestamp of all of these writes. As a
   result, the non-transactional request needs an uncertainty interval with a
   global uncertainty limit far enough in advance of the local HLC clock to
   ensure that it considers any value that was part of a transaction which
   could have committed before the request was received by the leaseholder to
   be uncertain. Concretely, the non-transactional request needs to consider
   values of the following form to be uncertain:

     written_timestamp < local_limit && commit_timestamp < global_limit

   The value that the non-transactional request is observing may have been
   written on the local leaseholder at time 10, its transaction may have been
   committed remotely at time 20, acknowledged, then the non-transactional
   request may have begun and received a timestamp of 15 from the local
   leaseholder, then finally the value may have been resolved asynchronously
   and moved to timestamp 20 (written_timestamp: 10, commit_timestamp: 20).
   The failure of the non-transactional request to observe this value would
   be a stale read.
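
Phrased as a predicate, the condition above reads as follows (a sketch with wall-clock nanos standing in for real timestamps):

```go
// isUncertain reports whether an observed value must be treated as possibly
// committed before the non-transactional request began:
// written_timestamp < local_limit && commit_timestamp < global_limit.
func isUncertain(writtenTS, commitTS, localLimit, globalLimit int64) bool {
	return writtenTS < localLimit && commitTS < globalLimit
}
```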

----

Conveniently, because non-transactional requests are always scoped to a
single range, those that hit uncertainty errors can always retry on the server,
so these errors never bubble up to the client that initiated the request.

However, before this commit, this wasn't actually possible because we did not
support server-side retries of `ReadWithinUncertaintyIntervalError`. The second
part of this commit adds support for this. This new support allows us to make
the guarantee we want about non-transactional requests never returning these
errors to the client, even though they use uncertainty intervals. It also serves
as a performance optimization for transactional requests, which now benefit from
this new capability. There's some complexity around supporting this form of
server-side retry, because it must be done above latching instead of below, but
recent refactors in cockroachdb#73557 and cockroachdb#73717 have made this possible to support
cleanly.

----

This change is related to cockroachdb#36431 in two ways. First, it allows non-transactional
requests to benefit from our incoming fix for that issue. Second, it unblocks
some of the clock refactors proposed in cockroachdb#72121 (comment),
and by extension cockroachdb#72663. Even though there are clear bugs today, I still don't
feel comfortable removing the `hlc.Clock.Update` in `Store.Send` until we make
this change. We know from cockroachdb#36431 that invariant 1 from
[`uncertainty.D6`](https://github.com/cockroachdb/cockroach/blob/22df0a6658a927e7b65dafbdfc9d790500791993/pkg/kv/kvserver/uncertainty/doc.go#L131)
doesn't hold, and yet I still do think the `hlc.Clock.Update` in `Store.Send`
masks the bugs in this area in many cases. Removing that clock update (I don't
actually plan to remove it, but I plan to disconnect it entirely from operation
timestamps) without first giving non-transactional requests uncertainty intervals
seems like it may reveal these bugs in ways we haven't seen in the past. So I'd
like to land this fix before making that change.

----

Release note (performance improvement): Certain forms of automatically retried
"read uncertainty" errors are now retried more efficiently, avoiding a network
round trip.
nvanbenschoten pushed a commit that referenced this pull request Apr 8, 2022
nvanbenschoten pushed a commit that referenced this pull request Apr 29, 2022
…db#80762 cockroachdb#80773

79911: opt: refactor and test lookup join key column and expr generation r=mgartner a=mgartner

#### opt: simplify fetching outer column in CustomFuncs.findComputedColJoinEquality

Previously, `CustomFuncs.findComputedColJoinEquality` used
`CustomFuncs.OuterCols` to retrieve the outer columns of computed column
expressions. `CustomFuncs.OuterCols` returns the cached outer columns in
the expression if it is a `memo.ScalarPropsExpr`, and falls back to
calculating the outer columns with `memo.BuildSharedProps` otherwise.
Computed column expressions are never `memo.ScalarPropsExpr`s, so we
just use `memo.BuildSharedProps` directly.

Release note: None

#### opt: make RemapCols a method on Factory instead of CustomFuncs

Release note: None

#### opt: use partial-index-reduced filters when building lookup expressions

This commit makes a minor change to `generateLookupJoinsImpl`.
Previously, equality filters were extracted from the original `ON`
filters. Now they are extracted from filters that have been reduced by
partial index implication. This has no effect on behavior because
equality filters that reference columns in two tables cannot exist in
partial index predicates, so they will never be eliminated during
partial index implication.

Release note: None

#### opt: moves some lookup join generation logic to lookup join package

This commit adds a new `lookupjoin` package. Logic for determining the
key columns and lookup expressions for lookup joins has been moved to
`lookupjoin.ConstraintBuilder`. The code was moved with as few changes
as possible, and the behavior does not change in any way. This move will
make it easier to test this code in isolation in the future, and allow
for further refactoring.

Release note: None

#### opt: generalize lookupjoin.ConstraintBuilder API

This commit makes the lookupjoin.ConstraintBuilder API more general to
make unit testing easier in a future commit.

Release note: None

#### opt: add data-driven tests for lookupjoin.ConstraintBuilder

Release note: None

#### opt: add lookupjoin.Constraint struct

The `lookupjoin.Constraint` struct has been added to encapsulate
multiple data structures that represent a strategy for constraining a
lookup join.

Release note: None

80511: pkg/cloud/azure: Support specifying Azure environments in storage URLs r=adityamaru a=nlowe-sx

The Azure Storage cloud provider learned a new parameter, AZURE_ENVIRONMENT,
which specifies which Azure environment the storage account in question
belongs to. This allows cockroach to back up and restore data to Azure
Storage Accounts outside the main Azure Public Cloud. For backwards
compatibility, this defaults to "AzurePublicCloud" if AZURE_ENVIRONMENT
is not specified.
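
A minimal sketch of that default-on-missing behavior (the helper is hypothetical, not the provider's actual code):

```go
package main

import (
	"fmt"
	"net/url"
)

// azureEnvironment is a hypothetical helper showing the fallback described
// above: use AZURE_ENVIRONMENT if present, else the Public Cloud.
func azureEnvironment(q url.Values) string {
	if env := q.Get("AZURE_ENVIRONMENT"); env != "" {
		return env
	}
	return "AzurePublicCloud" // backwards-compatible default
}

func main() {
	q, _ := url.ParseQuery("AZURE_ACCOUNT_NAME=account")
	fmt.Println(azureEnvironment(q)) // AzurePublicCloud
}
```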
 
Fixes cockroachdb#47163
 
## Verification Evidence
 
I spun up a single node cluster:
 
```
nlowe@nlowe-z4l:~/projects/github/cockroachdb/cockroach [feat/47163-azure-storage-support-multiple-environments L|✚ 2] [🗓  2022-04-22 08:25:49]
$ bazel run //pkg/cmd/cockroach:cockroach -- start-single-node --insecure
WARNING: Option 'host_javabase' is deprecated
WARNING: Option 'javabase' is deprecated
WARNING: Option 'host_java_toolchain' is deprecated
WARNING: Option 'java_toolchain' is deprecated
INFO: Invocation ID: 11504a98-f767-413a-8994-8f92793c2ecf
INFO: Analyzed target //pkg/cmd/cockroach:cockroach (0 packages loaded, 0 targets configured).
INFO: Found 1 target...
Target //pkg/cmd/cockroach:cockroach up-to-date:
  _bazel/bin/pkg/cmd/cockroach/cockroach_/cockroach
INFO: Elapsed time: 0.358s, Critical Path: 0.00s
INFO: 1 process: 1 internal.
INFO: Build completed successfully, 1 total action
INFO: Build completed successfully, 1 total action
*
* WARNING: ALL SECURITY CONTROLS HAVE BEEN DISABLED!
*
* This mode is intended for non-production testing only.
*
* In this mode:
* - Your cluster is open to any client that can access any of your IP addresses.
* - Intruders with access to your machine or network can observe client-server traffic.
* - Intruders can log in without password and read or write any data in the cluster.
* - Intruders can consume all your server's resources and cause unavailability.
*
*
* INFO: To start a secure server without mandating TLS for clients,
* consider --accept-sql-without-tls instead. For other options, see:
*
* - https://go.crdb.dev/issue-v/53404/dev
* - https://www.cockroachlabs.com/docs/dev/secure-a-cluster.html
*
*
* WARNING: neither --listen-addr nor --advertise-addr was specified.
* The server will advertise "nlowe-z4l" to other nodes, is this routable?
*
* Consider using:
* - for local-only servers:  --listen-addr=localhost
* - for multi-node clusters: --advertise-addr=<host/IP addr>
*
*
CockroachDB node starting at 2022-04-22 15:25:55.461315977 +0000 UTC (took 2.1s)
build:               CCL unknown @  (go1.17.6)
webui:               http://nlowe-z4l:8080/
sql:                 postgresql://root@nlowe-z4l:26257/defaultdb?sslmode=disable
sql (JDBC):          jdbc:postgresql://nlowe-z4l:26257/defaultdb?sslmode=disable&user=root
RPC client flags:    /home/nlowe/.cache/bazel/_bazel_nlowe/cf6ed4d0d14c8e474a5c30d572846d8a/execroot/cockroach/bazel-out/k8-fastbuild/bin/pkg/cmd/cockroach/cockroach_/cockroach <client cmd> --host=nlowe-z4l:26257 --insecure
logs:                /home/nlowe/.cache/bazel/_bazel_nlowe/cf6ed4d0d14c8e474a5c30d572846d8a/execroot/cockroach/bazel-out/k8-fastbuild/bin/pkg/cmd/cockroach/cockroach_/cockroach.runfiles/cockroach/cockroach-data/logs
temp dir:            /home/nlowe/.cache/bazel/_bazel_nlowe/cf6ed4d0d14c8e474a5c30d572846d8a/execroot/cockroach/bazel-out/k8-fastbuild/bin/pkg/cmd/cockroach/cockroach_/cockroach.runfiles/cockroach/cockroach-data/cockroach-temp4100501952
external I/O path:   /home/nlowe/.cache/bazel/_bazel_nlowe/cf6ed4d0d14c8e474a5c30d572846d8a/execroot/cockroach/bazel-out/k8-fastbuild/bin/pkg/cmd/cockroach/cockroach_/cockroach.runfiles/cockroach/cockroach-data/extern
store[0]:            path=/home/nlowe/.cache/bazel/_bazel_nlowe/cf6ed4d0d14c8e474a5c30d572846d8a/execroot/cockroach/bazel-out/k8-fastbuild/bin/pkg/cmd/cockroach/cockroach_/cockroach.runfiles/cockroach/cockroach-data
storage engine:      pebble
clusterID:           bb3942d7-f241-4d26-aa4a-1bd0d6556e4d
status:              initialized new cluster
nodeID:              1
```
 
I was then able to view the contents of a backup hosted in an azure
government storage account:
 
```
root@:26257/defaultdb> SELECT DISTINCT object_name FROM [SHOW BACKUP 'azure://container/path/to/backup?AZURE_ACCOUNT_NAME=account&AZURE_ACCOUNT_KEY=***&AZURE_ENVIRONMENT=AzureUSGovernmentCloud'] WHERE object_type = 'database';
               object_name
------------------------------------------
  example_database
  ...
(17 rows)
 
Time: 5.859632889s
```
 
Omitting the `AZURE_ENVIRONMENT` parameter, we can see cockroach
defaults to the public cloud where my storage account does not exist:
 
```
root@:26257/defaultdb> SELECT DISTINCT object_name FROM [SHOW BACKUP 'azure://container/path/to/backup?AZURE_ACCOUNT_NAME=account&AZURE_ACCOUNT_KEY=***'] WHERE object_type = 'database';
ERROR: reading previous backup layers: unable to list files for specified blob: Get "https://account.blob.core.windows.net/container?comp=list&delimiter=path%2Fto%2Fbackup&restype=container&timeout=61": dial tcp: lookup account.blob.core.windows.net on 8.8.8.8:53: no such host
```
 
## Tests
 
Two new tests are added to verify that the storage account URL is correctly
built from the provided Azure Environment name, and that the Environment
defaults to the Public Cloud if unspecified for backwards compatibility. I
verified the existing tests pass against a government storage account after
specifying `AZURE_ENVIRONMENT` as `AzureUSGovernmentCloud` in the backup URL
query parameters:
 
```
nlowe@nlowe-mbp:~/projects/github/cockroachdb/cockroachdb [feat/47163-azure-storage-support-multiple-environments| …3] [🗓  2022-04-22 17:38:26]
$ export AZURE_ACCOUNT_NAME=account
nlowe@nlowe-mbp:~/projects/github/cockroachdb/cockroachdb [feat/47163-azure-storage-support-multiple-environments| …3] [🗓  2022-04-22 17:38:42]
$ export AZURE_ACCOUNT_KEY=***
nlowe@nlowe-mbp:~/projects/github/cockroachdb/cockroachdb [feat/47163-azure-storage-support-multiple-environments| …3] [🗓  2022-04-22 17:39:25]
$ export AZURE_CONTAINER=container
nlowe@nlowe-mbp:~/projects/github/cockroachdb/cockroachdb [feat/47163-azure-storage-support-multiple-environments| …3] [🗓  2022-04-22 17:39:48]
$ export AZURE_ENVIRONMENT=AzureUSGovernmentCloud
nlowe@nlowe-mbp:~/projects/github/cockroachdb/cockroachdb [feat/47163-azure-storage-support-multiple-environments| …3] [🗓  2022-04-22 17:40:15]
$ bazel test --test_output=streamed --test_arg=-test.v --action_env=AZURE_ACCOUNT_NAME --action_env=AZURE_ACCOUNT_KEY --action_env=AZURE_CONTAINER --action_env=AZURE_ENVIRONMENT //pkg/cloud/azure:azure_test
INFO: Invocation ID: aa88a942-f3c7-4df6-bade-8f5f0e18041f
WARNING: Streamed test output requested. All tests will be run locally, without sharding, one at a time
INFO: Build option --action_env has changed, discarding analysis cache.
INFO: Analyzed target //pkg/cloud/azure:azure_test (468 packages loaded, 16382 targets configured).
INFO: Found 1 test target...
initialized metamorphic constant "span-reuse-rate" with value 28
=== RUN   TestAzure
=== RUN   TestAzure/simple_round_trip
=== RUN   TestAzure/exceeds-4mb-chunk
=== RUN   TestAzure/exceeds-4mb-chunk/rand-readats
=== RUN   TestAzure/exceeds-4mb-chunk/rand-readats/#00
    cloud_test_helpers.go:226: read 3345 of file at 4778744
=== RUN   TestAzure/exceeds-4mb-chunk/rand-readats/#1
    cloud_test_helpers.go:226: read 7228 of file at 226589
=== RUN   TestAzure/exceeds-4mb-chunk/rand-readats/#2
    cloud_test_helpers.go:226: read 634 of file at 256284
=== RUN   TestAzure/exceeds-4mb-chunk/rand-readats/#3
    cloud_test_helpers.go:226: read 7546 of file at 3546208
=== RUN   TestAzure/exceeds-4mb-chunk/rand-readats/#4
    cloud_test_helpers.go:226: read 24123 of file at 4821795
=== RUN   TestAzure/exceeds-4mb-chunk/rand-readats/cockroachdb#5
    cloud_test_helpers.go:226: read 16899 of file at 403428
=== RUN   TestAzure/exceeds-4mb-chunk/rand-readats/cockroachdb#6
    cloud_test_helpers.go:226: read 29467 of file at 4886370
=== RUN   TestAzure/exceeds-4mb-chunk/rand-readats/cockroachdb#7
    cloud_test_helpers.go:226: read 11700 of file at 1876920
=== RUN   TestAzure/exceeds-4mb-chunk/rand-readats/cockroachdb#8
    cloud_test_helpers.go:226: read 2928 of file at 489781
=== RUN   TestAzure/exceeds-4mb-chunk/rand-readats/cockroachdb#9
    cloud_test_helpers.go:226: read 19933 of file at 1483342
=== RUN   TestAzure/read-single-file-by-uri
=== RUN   TestAzure/write-single-file-by-uri
=== RUN   TestAzure/file-does-not-exist
=== RUN   TestAzure/List
=== RUN   TestAzure/List/root
=== RUN   TestAzure/List/file-slash-numbers-slash
=== RUN   TestAzure/List/root-slash
=== RUN   TestAzure/List/file
=== RUN   TestAzure/List/file-slash
=== RUN   TestAzure/List/slash-f
=== RUN   TestAzure/List/nothing
=== RUN   TestAzure/List/delim-slash-file-slash
=== RUN   TestAzure/List/delim-data
--- PASS: TestAzure (34.81s)
    --- PASS: TestAzure/simple_round_trip (9.66s)
    --- PASS: TestAzure/exceeds-4mb-chunk (16.45s)
        --- PASS: TestAzure/exceeds-4mb-chunk/rand-readats (6.41s)
            --- PASS: TestAzure/exceeds-4mb-chunk/rand-readats/#00 (0.15s)
            --- PASS: TestAzure/exceeds-4mb-chunk/rand-readats/#1 (0.64s)
            --- PASS: TestAzure/exceeds-4mb-chunk/rand-readats/#2 (0.65s)
            --- PASS: TestAzure/exceeds-4mb-chunk/rand-readats/#3 (0.60s)
            --- PASS: TestAzure/exceeds-4mb-chunk/rand-readats/#4 (0.75s)
            --- PASS: TestAzure/exceeds-4mb-chunk/rand-readats/cockroachdb#5 (0.80s)
            --- PASS: TestAzure/exceeds-4mb-chunk/rand-readats/cockroachdb#6 (0.75s)
            --- PASS: TestAzure/exceeds-4mb-chunk/rand-readats/cockroachdb#7 (0.65s)
            --- PASS: TestAzure/exceeds-4mb-chunk/rand-readats/cockroachdb#8 (0.65s)
            --- PASS: TestAzure/exceeds-4mb-chunk/rand-readats/cockroachdb#9 (0.77s)
    --- PASS: TestAzure/read-single-file-by-uri (0.60s)
    --- PASS: TestAzure/write-single-file-by-uri (0.60s)
    --- PASS: TestAzure/file-does-not-exist (1.05s)
    --- PASS: TestAzure/List (2.40s)
        --- PASS: TestAzure/List/root (0.30s)
        --- PASS: TestAzure/List/file-slash-numbers-slash (0.30s)
        --- PASS: TestAzure/List/root-slash (0.30s)
        --- PASS: TestAzure/List/file (0.30s)
        --- PASS: TestAzure/List/file-slash (0.30s)
        --- PASS: TestAzure/List/slash-f (0.30s)
        --- PASS: TestAzure/List/nothing (0.15s)
        --- PASS: TestAzure/List/delim-slash-file-slash (0.15s)
        --- PASS: TestAzure/List/delim-data (0.30s)
=== RUN   TestAntagonisticAzureRead
--- PASS: TestAntagonisticAzureRead (103.90s)
=== RUN   TestParseAzureURL
=== RUN   TestParseAzureURL/Defaults_to_Public_Cloud_when_AZURE_ENVIRONEMNT_unset
=== RUN   TestParseAzureURL/Can_Override_AZURE_ENVIRONMENT
--- PASS: TestParseAzureURL (0.00s)
    --- PASS: TestParseAzureURL/Defaults_to_Public_Cloud_when_AZURE_ENVIRONEMNT_unset (0.00s)
    --- PASS: TestParseAzureURL/Can_Override_AZURE_ENVIRONMENT (0.00s)
=== RUN   TestMakeAzureStorageURLFromEnvironment
=== RUN   TestMakeAzureStorageURLFromEnvironment/AzurePublicCloud
=== RUN   TestMakeAzureStorageURLFromEnvironment/AzureUSGovernmentCloud
--- PASS: TestMakeAzureStorageURLFromEnvironment (0.00s)
    --- PASS: TestMakeAzureStorageURLFromEnvironment/AzurePublicCloud (0.00s)
    --- PASS: TestMakeAzureStorageURLFromEnvironment/AzureUSGovernmentCloud (0.00s)
PASS
Target //pkg/cloud/azure:azure_test up-to-date:
  _bazel/bin/pkg/cloud/azure/azure_test_/azure_test
INFO: Elapsed time: 159.865s, Critical Path: 152.35s
INFO: 66 processes: 2 internal, 64 darwin-sandbox.
INFO: Build completed successfully, 66 total actions
//pkg/cloud/azure:azure_test                                             PASSED in 139.9s
 
INFO: Build completed successfully, 66 total actions
```

80705: kvclient: fix gRPC stream leak in rangefeed client r=tbg,srosenberg a=erikgrinaker

When the DistSender rangefeed client received a `RangeFeedError` message
and propagated a retryable error up the stack, it would fail to close
the existing gRPC stream, causing stream/goroutine leaks.

Release note (bug fix): Fixed a goroutine leak when internal rangefeed
clients received certain kinds of retriable errors.
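
The fix follows a standard gRPC pattern: cancel the per-stream context on every return path, including the retryable-error path. A minimal sketch with stand-in types (the real ones live in the DistSender rangefeed client):

```go
package rangefeed

import "context"

// rangeFeedStream stands in for the gRPC client stream.
type rangeFeedStream interface {
	Recv() (interface{}, error)
}

func runRangefeed(ctx context.Context, open func(context.Context) (rangeFeedStream, error)) error {
	streamCtx, cancel := context.WithCancel(ctx)
	defer cancel() // previously missing when propagating a retryable error
	stream, err := open(streamCtx)
	if err != nil {
		return err
	}
	for {
		if _, err := stream.Recv(); err != nil {
			// cancel() above tears down the gRPC stream before the caller
			// retries, so no stream or goroutine leaks.
			return err
		}
	}
}
```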

80762: joberror: add ConnectionReset/ConnectionRefused to retryable err allow list r=miretskiy a=adityamaru

Bulk jobs will no longer treat `sysutil.IsErrConnectionReset`
and `sysutil.IsErrConnectionRefused` as permanent errors. IMPORT,
RESTORE and BACKUP will treat these errors as transient and retry.
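
A sketch of the allow-list change (the two sysutil predicates are the ones named above; the surrounding function is illustrative, not the actual joberror code):

```go
package joberror

import "github.com/cockroachdb/cockroach/pkg/util/sysutil"

// isRetryableNetworkError reports whether a network error should be treated
// as transient by bulk jobs (IMPORT, RESTORE, BACKUP) rather than permanent.
func isRetryableNetworkError(err error) bool {
	return sysutil.IsErrConnectionReset(err) || sysutil.IsErrConnectionRefused(err)
}
```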

Release note: None

80773: backupccl: break dependency to testcluster r=irfansharif a=irfansharif

Noticed we were building testing library packages when building CRDB
binaries.

    $ bazel query "somepath(//pkg/cmd/cockroach-short, //pkg/testutils/testcluster)"
    //pkg/cmd/cockroach-short:cockroach-short
    //pkg/cmd/cockroach-short:cockroach-short_lib
    //pkg/ccl:ccl
    //pkg/ccl/backupccl:backupccl
    //pkg/testutils/testcluster:testcluster

Release note: None

Co-authored-by: Marcus Gartner <marcus@cockroachlabs.com>
Co-authored-by: Nathan Lowe <nathan.lowe@spacex.com>
Co-authored-by: Erik Grinaker <grinaker@cockroachlabs.com>
Co-authored-by: Aditya Maru <adityamaru@gmail.com>
Co-authored-by: irfan sharif <irfanmahmoudsharif@gmail.com>
nvanbenschoten pushed a commit that referenced this pull request Mar 17, 2024
nvanbenschoten pushed a commit that referenced this pull request May 7, 2024
123517: roachtest: move node-kill operation to pkill/pgrep-based kill approach r=renatolabs a=itsbilal

For some reason, `StopServiceForVirtualCluster` fails with this error on drt clusters:

```
20:23:41 node_kill.go:51: operation status: killing node 1  with signal 15
20:23:41 cluster.go:2148: stoping virtual cluster
20:23:41 operation_impl.go:128: operation failure #1: no service for virtual cluster ""
```

The debug message has a bug: the virtual cluster is set to "system", but it seems like the service discovery process isn't able to determine the cockroach process based on DNS settings in the drt project. This change makes the node-kill operation more DNS-agnostic by looking for the cockroach process.
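
A sketch of the process-based approach named in the PR title (exact patterns and flags in roachtest may differ):

```go
package main

import (
	"fmt"
	"os/exec"
)

// killCockroach signals the cockroach process by matching its command line
// with pkill, so no service discovery or DNS lookup is required.
func killCockroach(signal int) error {
	return exec.Command("pkill", fmt.Sprintf("-%d", signal), "-f", "cockroach").Run()
}

// cockroachAlive reports whether a cockroach process is still running.
func cockroachAlive() bool {
	return exec.Command("pgrep", "-f", "cockroach").Run() == nil
}

func main() {
	_ = killCockroach(15) // SIGTERM, matching the log output above
}
```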

Epic: none

Release note: None

Co-authored-by: Bilal Akhtar <bilal@cockroachlabs.com>