-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
wip: add pgvector implementation #12
base: master
Are you sure you want to change the base?
Conversation
Important Auto Review SkippedMore than 25% of the files skipped due to max files limit. The review is being skipped to prevent a low-quality review. 233 files out of 298 files are above the max files limit of 50. Please upgrade to Pro plan to get higher limits. You can disable this status message by setting the Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (invoked as PR comments)
Additionally, you can add CodeRabbit Configration File (
|
This commit simplifies the assignment of this field on the conn executor's `extraTxnState`. Epic: none Release note: None
Add the recorded txn stats so we can do some testing validation on the stats collected. Epic: none Release note: None
Previously, the `{pgurl}` expansion would force the `cluster` connection parameter to `system`. This means that if the caller previously configured a different default virtual cluster, `pgurl` would ignore that and continue connecting to `system`, which is quite surprising. In this commit, we only set an explicit `cluster` connection parameter if the user passed one; this should allow `pgurl` users to connect to the default tenant by default. Epic: none Release note: None
Use a different server for each subtest, and close it if there was a connection timeout. Also, make the connection timeout longer. Release note: None
Release note: None
Both these options are not supported for roachprod azure yet, resulting in cluster_creation errors. Informs: cockroachdb#120621 Fixes: none Epic: none Release note: none
In cockroachdb#109655, we stopped using the transaction id of a previously executed transaction for `BEGIN`s. This change meant that `BEGIN` will be logged with no transaction id. This led to logs related to cluster setting `sql.log.user_audit.reduced_config.enabled` spewing error messages about a nil transaction (since the logic behind said cluster setting requires us to call on `MemberOfWithAdminOption` -- which uses a transaction for its lookup). This patch ensures that we create an internal transaction before we perform the `MemberOfWithAdminOption` lookup to handle the nil BEGIN statement case. Epic: none Fixes: cockroachdb#123592 Release note: None
123769: roachtest: disable azure tests that use zfs or specify ssd counts r=srosenberg,renatolabs a=DarrylWong Both these options are not supported for roachprod azure yet, resulting in cluster_creation errors. Informs: cockroachdb#120621 Fixes: none Epic: none Release note: none Co-authored-by: DarrylWong <darryl@cockroachlabs.com>
* [x] Update `build/teamcity/internal/release/build-and-publish-patched-go/impl.sh` with the new version and adjust SHA256 sums as necessary. * [x] Adjust `GO_VERSION` and `GO_FIPS_COMMIT` for the FIPS Go toolchain ([source](./teamcity/internal/release/build-and-publish-patched-go/impl-fips.sh)). * [x] Run the `Internal / Cockroach / Build / Toolchains / Publish Patched Go for Mac` build configuration in TeamCity with your latest version of the script above. Note the job depends on another job `Build and Publish Patched Go`. That job prints out the SHA256 of all tarballs, which you will need to copy-paste into `WORKSPACE` (see below). `Publish Patched Go for Mac` is an extra step that publishes the *signed* `go` binaries for macOS. That job also prints out the SHA256 of the Mac tarballs in particular. * [x] Adjust `--@io_bazel_rules_go//go/toolchain:sdk_version` in [.bazelrc](../.bazelrc). * [x] Bump the version in `WORKSPACE` under `go_download_sdk`. You may need to bump [rules_go](https://github.com/bazelbuild/rules_go/releases). Also edit the filenames listed in `sdks` and update all the hashes to match what you built in the step above. * [x] Bump the version in `WORKSPACE` under `go_download_sdk` for the FIPS version of Go (`go_sdk_fips`). * [x] Run `./dev generate bazel` to refresh `distdir_files.bzl`, then `bazel fetch @distdir//:archives` to ensure you've updated all hashes to the correct value. * [x] Bump the default installed version of Go in `bootstrap-debian.sh` ([source](./bootstrap/bootstrap-debian.sh)). Epic: none Release note: None
Release note: None Epic: none
123753: logictest: preserve logs for cockroach-go/testserver in remote execution r=rafiss a=rafiss Epic: None Release note: None Co-authored-by: Rafi Shamim <rafi@cockroachlabs.com>
The pre-data bundle is read from during the restore but we've already read everything we need from it by the time we write the download job, and have marked it as dropped. We don't need to download it just to drop it. Release note: none. Epic: none.
123730: jobs: exempt stats jobs from logs r=dt a=dt Unlike most jobs that represent some meaningful user/system action or event, where the time the event started and ended are rare and often significant, stats are continiously updated by the system as an automatic background process. Unfortunately that process creates a distinct job each time it chooses to update a table, resulting in this constant background process appearing as constant stream of separate jobs, rather than one continuous one. To reduce the amount this stream of jobs adds noise to the logs about significant jobs, this patch moves log messages about job state transitions behind a verbose check for these jobs. Release note: none. Epic: none. Co-authored-by: David Taylor <tinystatemachine@gmail.com>
123791: Revert "changefeedccl: make mock pulsar sink synchronous" r=rharding6373 a=rharding6373 Reverts cockroachdb#123228 This is causing flaky pulsar tests (e.g., cockroachdb#123720) Epic: None Release note: none Co-authored-by: Rachael Harding <rharding6373@users.noreply.github.com>
This separates the metamorphic functionality out into its own package from `pkg/util`. This is useful for some future functionality that I would like to implement: * Simplify/make explicit metamorphic constant logging * Remove the `metamorphic_disable` build tag (see cockroachdb#123319) This is a re-implementation of cockroachdb#102474. Epic: none Release note: None
This commit fixes - what I believe is - an oversight where we applied the huge cost to the plans with full scans of virtual tables when `disallow_full_table_scans` variable is set. The intent behind this penalty is to make plans with full scans prohibitively expensive so that other plans (without full scans) would be preferred, and if there is no other plan, then the query would error out. Implementation of this approach was incomplete for virtual tables: namely, we'd apply the cost penalty, but then we would not actually error out the query. Additionally, given that we don't have stats on virtual tables, we would apply the cost penalty to all full scans of virtual tables. As a result, we would either pick a plan without the full scan (that might actually be worse) or would allow the plan with the full scan to get executed still. We would also favor plans with fewer full scans of virtual tables (which might also be worse than had we not applied the cost penalty). This commit adjusts the code to not apply the cost penalty to full scans of virtual tables. This change is gated based on the new session variable `optimizer_apply_full_scan_penalty_to_virtual_tables` which defaults to `false` on master. We can discuss on the backport PRs whether it should be default to `true` there to require users to opt in for this PR to have an effect. Release note: None
123432: util/metamorphic: introduce `metamorphic` package r=srosenberg,RaduBerinde a=rickystewart This separates the metamorphic functionality out into its own package from `pkg/util`. This is useful for some future functionality that I would like to implement: * Simplify/make explicit metamorphic constant logging * Remove the `metamorphic_disable` build tag (see cockroachdb#123319) This is a re-implementation of cockroachdb#102474. Epic: none Release note: None Co-authored-by: Ricky Stewart <ricky@cockroachlabs.com>
123765: roachprod: don't overwrite tenant in `pgurl` expansion r=DarrylWong,srosenberg a=renatolabs Previously, the `{pgurl}` expansion would force the `cluster` connection parameter to `system`. This means that if the caller previously configured a different default virtual cluster, `pgurl` would ignore that and continue connecting to `system`, which is quite surprising. In this commit, we only set an explicit `cluster` connection parameter if the user passed one; this should allow `pgurl` users to connect to the default tenant by default. Epic: none Release note: None Co-authored-by: Renato Costa <renato@cockroachlabs.com>
Release note: none. Epic: none.
123424: operations: add a few more variants to the ADD COLUMN op r=rafiss a=rafiss This adds some coverage for computed columns and NOT NULL constraints. fixes cockroachdb#122251 Release note: None 123674: rttanalysis: deflake Jobs/jobs_page_type_filtered r=rafiss a=rafiss The test sometimes detects a lower number of round trips, which is actually more desirable anyway. fixes cockroachdb#123591 fixes cockroachdb#123691 Release note: None 123773: pgwire: deflake TestParseClientProvidedSessionParameters r=rafiss a=rafiss Use a different server for each subtest, and close it if there was a connection timeout. Also, make the connection timeout longer. fixes cockroachdb#123484 fixes cockroachdb#123098 Release note: None Co-authored-by: Rafi Shamim <rafi@cockroachlabs.com>
This benchmark runs quicker than BenchmarkDropLargeDatabase and can be used to study how DROP DATABASE scales with larger numbers of objects. Release note: None
Release note: none. Epic: none.
122285: sql,multiregionccl: rename multiregion system DB cluster setting r=rafiss a=rafiss Avoid using the word "preview" in the name. That shouldn't be needed, since the cluster setting is hidden, so users would only know about it if we tell them to enable it. Keeping "preview" in the name isn't a huge problem, but it's just slightly annoying if we decide that we want to keep the setting around even after the feature leaves the "preview" status, since it requires updating docs and any customer code that was using the old name. Also, this removes the mention of the setting in error messages. An error should not refer to an undocumented setting, since a user would have no way of learning more about it. informs: cockroachdb#63365 Epic: CRDB-33032 Release note: None Co-authored-by: Rafi Shamim <rafi@cockroachlabs.com>
The code was iterating over the list of tests, and setting spot VM value to true. But, as the values are passed as copies, the flag was not set. The change is to set the value to the test list itself. Also, removed the 0.5 probability and running all benchmark tests on spot. Fixes: cockroachdb#116778 Epic: none
123733: changefeedccl: skip flaky pulsar tests r=rharding6373 a=wenyihu6 This patch skips flaky tests for pulsar sinks. We will add them back when pulsar is fully supported. Release note: none Fixes: cockroachdb#123718 Fixes: cockroachdb#123719 Fixes: cockroachdb#123720 Fixes: cockroachdb#123721 Co-authored-by: Wenyi Hu <wenyi@cockroachlabs.com>
If a mixed-version test fails before user-provided functions had a chance to run, then it means that *something* went wrong while setting up the cluster. In these cases, it doesn't make sense to create an issue with the team that owns the test as it adds noise (e.g., cockroachdb#123610). With this commit, we now redirect these failures to test-eng, who will be better positioned to diagnose and fix any issues. Epic: none Release note: None
Using the default Set method is sufficient. Release note: None
123810: sql_test: add BenchmarkDropLargeDatabaseWithGenerateTestObjects r=rafiss a=rafiss This benchmark runs quicker than BenchmarkDropLargeDatabase and can be used to study how DROP DATABASE scales with larger numbers of objects. fixes cockroachdb#100850 Release note: None Co-authored-by: Rafi Shamim <rafi@cockroachlabs.com>
123859: sql: expose an ability to request redacted stmt bundles r=yuzefovich a=yuzefovich This commit extends the stmt bundle collection infrastructure to support requesting redacted stmt bundles. Previously, this was only possible via an explicit `EXPLAIN ANALYZE (DEBUG, REDACT)` stmt, but this commit adds the support via overloads to `crdb_internal.request_statement_bundle` builtin as well as the server API. This paves the way for requesting the redacted bundles via the DB Console, but it'll be done separately. To support this we need to add another boolean column to `system.statement_diagnostics_requests` table to indicate whether a request is for the redacted bundle or not. This requires adding a migration in which we populate the existing rows with all `false` values. In order to reduce the amount of code churn and simplify this change, this commit repurposes some of the existing code that we introduced the last time we added a migration for plan-gist matching in stmt bundles. In other words, this commit simultenously removes some stale code and handles the new column. Epic: CRDB-37839 Release note: None Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
This roachtest has failed consistently for the past few days, since being enabled in cockroachdb#123694. Skip while we debug, to avoid noise. Touches: cockroachdb#123989. Release note: None.
124189: sql: simplify usage of some enum settings r=yuzefovich a=yuzefovich This commit adjusts a few enum settings to use `EnumSetting.String` when defining the session variables to clean up the code a bit. Epic: None Release note: None 124237: execinfra: add a sanity check that DistSQL version is not bumped r=yuzefovich a=yuzefovich The last bump was in 23.1, and we won't ever bump DistSQL version in backwards-incompatible going forward. Epic: None Release note: None 124241: Updated the AUTHORS file to include Naveen Setlur r={msbutler} a=navsetlur As stated in the title. Basic PR. Epic: None Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com> Co-authored-by: Naveen Setlur <naveen.setlur@cockroachlabs.com>
124226: sqlstats: remove unnecessary StatementFingerprintID construction r=xinhaoz a=xinhaoz Previously we were constructing the stmt fingerprint id unnecessarily in 2 functions in sqlstats: - The Next function for the StmtStatsIterator - PopAllStats, which returns the sql stats collected thus far and resets the in-memory sql stats containers. The ID is available in the recorded statement in both of these cases- there's no need to reconstruct this value and doing is needlessly expensive. Epic: none Release note: None Co-authored-by: Xin Hao Zhang <xzhang@cockroachlabs.com>
Release note (ops change): Added the sql.pgwire.pipeline.count metric, which is a gauge that measures how many wire protocol commands have been received by the server, but have not yet begun processing. This metric will only grow if clients are using the "pipeline mode" of the Postgres wire protocol.
124231: pgwire: add a metric for number of pipelined requests r=rafiss a=rafiss fixes cockroachdb#123912 Release note (ops change): Added the sql.pgwire.pipeline.count metric, which is a gauge that measures how many wire protocol commands have been received by the server, but have not yet begun processing. This metric will only grow if clients are using the "pipeline mode" of the Postgres wire protocol. Co-authored-by: Rafi Shamim <rafi@cockroachlabs.com>
124177: roachtest: skip multi-store-remove while debugging r=itsbilal,sumeerbhola a=nicktrav This roachtest has failed consistently for the past few days, since being enabled in cockroachdb#123694. Skip while we debug, to avoid noise. Touches: cockroachdb#123989. Release note: None. Epic: None. Co-authored-by: Nick Travers <travers@cockroachlabs.com>
Remove deprecated 23.2 gates - these features will always be enabled in any cluster that a 24.2 node is part of. `TODODelete_V23_1` has a lot of uses and will be removed separately. Epic: none Release note: None
124013: clusterversion: remove deprecated internal versions r=RaduBerinde a=RaduBerinde Remove deprecated 23.2 gates - these features will always be enabled in any cluster that a 24.2 node is part of. `TODODelete_V23_1` has a lot of uses and will be removed separately. Epic: none Release note: None Co-authored-by: Radu Berinde <radu@cockroachlabs.com>
We were incorrectly using the boolean getter for this integer variable. The bug was introduced when this variable was added in 1c1d313. Release note: None
This commit fixes an omission where `cost_scans_with_default_col_size` session variable didn't actually use the corresponding cluster setting for its default value. In other words, the cluster setting actually had no effect. The bug has been present since this variable was introduced in af32d9d. Release note: None
This commit adjusts global defaults for a few session variables to use the postgres style (so that the global default value and session variable value would be stored alike). Release note: None
This commit refactors how we collect session variables and cluster settings into `env.sql` file in the stmt bundle. Previously, we used a hard-coded list of session variables that were always included while ignoring all variables not in the list. If a variable had a value different from the "binary default" (meaning the value that you got right after the cluster startup, before any cluster setting modifications are applied), then it would be in a SET statement that would run on the `bundle recreate`; if the value was the same as "binary default", then the SET statement was commented out. Also, all cluster setting values were included into the file as a comment. This commit makes it so that we include all session variables and cluster settings that differ from their defaults. This allows us to recreate the environment while also reducing the overhead of including everything into `env.sql`. For session variables further clarification is warranted: - all read-only variables have been audited, and most are explicitly excluded. Out of 23 currently present session variables, only 9 were deemed to be possibly useful during investigations, so they are not excluded (one of 9 is `crdb_version` which we print out separately, so it's actually excluded too). - for writable variables, we include it if its value differs from the "binary" default (the one that you get on a fresh cluster) or from the "cluster" default (the one that you get on a fresh session on the current cluster). The latter kind is obtained via the provided `settings.Values` container whereas the former is obtained via a global singleton container. Additionally, this commit adjusts the SET and SET CLUSTER SETTING statements to have single quotes around the setting values that need them. For cluster settings all types work with having single quotes, but for session variables some (like integers) don't work with quotes while others (like strings) need them. This commit adds a map for those that need them as well as a simple test to catch some of the missing ones (the list might be incomplete given that the test exercises the default config). All values are adjusted to fit on a single line (we have some cluster settings that might span multiple lines). It also adjusts `EXPLAIN (OPT, ENV)` to include the list of cluster settings with non-default values. Release note: None
123794: sql: improve env collection in stmt bundles r=yuzefovich a=yuzefovich This PR improves stmt bundle collection to include all session variables and cluster settings that differ from their defaults. Fixes: cockroachdb#119786 Fixes: cockroachdb#123830. Release note: None Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
Due to the complexity of fetching the secrets from the secrets manager, the secrets are now maintained in cloud storage. Fixes: cockroachdb#117125 Epic: none
124099: roachprod: fetch secrets from cloud store r=srosenberg a=nameisbhaskar Due to the complexity of fetching the secrets from the secrets manager, the secrets are now maintained in cloud storage. Fixes: cockroachdb#117125 Epic: none Co-authored-by: Bhaskarjyoti Bora <bhaskar.bora@cockroachlabs.com>
This change adds operations to mutate cluster settings. The values are mutated based on a preset frequency and RNG. For example, if the frequency of a cluster setting operation is set to "30 minutes" it will only change to a new value every 30 minutes, even if the operation has been run multiple times within the same 30-minute window. Running in the same window will just set the same value again. Epic: None Release Note: None
Epic: None Release Note: None
Epic: None Release Notes: None
Previously, to upgrade a cluster it had to be done manually by running several commands to copy over new binaries, drain and stop cockroach and then running the same start command. This change introduces a `deploy` command that works similar as `stage` but also deploys the new version to the cluster by doing all of the above. It does it node for node to ensure a cluster remains in a healthy state especially if a load balancer is used to manage connections. Epic: None Release Note: None
Epic: None Release Notes: None
123461: roachprod: deploy cockroach r=renatolabs,srosenberg a=herkolategan Previously, to upgrade a cluster it had to be done manually by running several commands to copy over new binaries, drain and stop cockroach and then running the same start command. This change introduces a `deploy` command that works similar as `stage` but also deploys the new version to the cluster by doing all of the above. It does it node for node to ensure a cluster remains in a healthy state especially if a load balancer is used to manage connections. Epic: None Release Note: None Co-authored-by: Herko Lategan <herko@cockroachlabs.com>
123806: roachtest: add cluster settings operations r=itsbilal a=herkolategan This change adds operations to mutate cluster settings. The values are mutated based on a preset frequency and RNG. For example, if the frequency of a cluster setting operation is set to "30 minutes" it will only change to a new value every 30 minutes, even if the operation has been run multiple times within the same 30-minute window. Running in the same window will just set the same value again. Epic: None Release Note: None Co-authored-by: Herko Lategan <herko@cockroachlabs.com>
Release note (sql change): implement pgvector encoding, decoding, and operators, without index acceleration.
This change is