Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: support more index-type combinations #35730

Open
knz opened this issue Mar 14, 2019 · 20 comments
Open

sql: support more index-type combinations #35730

knz opened this issue Mar 14, 2019 · 20 comments
Labels
A-sql-pgcompat Semantic compatibility with PostgreSQL C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) meta-issue Contains a list of several other issues. T-sql-queries SQL Queries Team X-anchored-telemetry The issue number is anchored by telemetry references.

Comments

@knz
Copy link
Contributor

knz commented Mar 14, 2019

The following combinations are not supported:

Type Regular index Inverted index
JSON maybe #35706 already supported
VARCHAR already supported #41285
composite types #120531 (no issue yet)
other types already supported (no issue yet)

This is a meta issue filed for tracking and telemetry purposes.

To users who are inconvenience: please leave comments with an explanation of your use case.

Jira issue: CRDB-4560

@knz knz added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) A-sql-pgcompat Semantic compatibility with PostgreSQL X-anchored-telemetry The issue number is anchored by telemetry references. labels Mar 14, 2019
@davidsbond
Copy link

davidsbond commented Dec 4, 2019

Hello, my team are using CockroachDB in production and are finding it great so far. Our current use-case for wanting to index array fields is fairly simple. We have a store of personal information where each row has a TEXT[] type for storing phone numbers.

When we add new people to this table we want to check in advance if one of their phone numbers is already in use by someone else. So we have a query like:

SELECT id FROM person WHERE $1::TEXT = ANY(phone_numbers)

Without indexing it takes around 3 seconds per phone number we search where there are ~450k rows. In some instances, someone could be added who has many phone numbers, or many people who have many phone numbers, which we would individually check, leading to a linear increase in query time.

Perhaps we are approaching this problem incorrectly, but I suspect being able to index this field would yield faster queries.

@knz
Copy link
Contributor Author

knz commented Dec 4, 2019

cc @awoods187 @andy-kimball for triage

@mkharibalaji
Copy link

Hi Team,
We are using it for tagging/labelling purpose inorder to send emails/push notifications for only those whoare tagged with a particular group and in our usecase it's primarily Customer segmentation for sending messages.Now the data has grown large ,we are really in need of an index on the simple array [INT/STRING] asap.

@andy-kimball
Copy link
Contributor

cc @jordanlewis

@jordanlewis
Copy link
Member

The two requests in this thread are for inverted indexes on arrays. Filed #43199 to track this.

@mzimmerman
Copy link

We use an internal unique id for a user. For transactions in the system can affect multiple users, so each transaction logged may have multiple users affected and we look which ones the transaction is applied to. (Usually less than 10). We have millions of users and billions of transactions to log. Not bring able to index on an array has kept me from trying cockroach.

@jordanlewis
Copy link
Member

jordanlewis commented May 20, 2020

Arrays are now indexable with inverted indexes.

@awoods187
Copy link
Contributor

awoods187 commented Jul 8, 2020

I ran into this today for VARCHAR and updated the issue at the top to track

@asubiotto asubiotto added the meta-issue Contains a list of several other issues. label Jul 14, 2020
@jdabrowski
Copy link

Hi,

Arrays are now indexable with inverted indexes, and will be indexable with forward indexes in 20.2.

Is it still planned for 20.2? I'm getting

SQL State : 0A000 Error Code : 0 Message : ERROR: unimplemented: column verified_mobiles is of type varchar[] and thus is not indexable Hint: You have attempted to use a feature that is not yet implemented. See: https://go.crdb.dev/issue-v/35730/v20.2 Location : migrations/postgres/user/V1__user.sql (/file:/app.jar!/migrations/postgres/user/V1__user.sql) Line : 91 Statement : CREATE INDEX IF NOT EXISTS users_verified_mobiles_equals ON users(verified_mobiles)

against cockroachdb/cockroach-unstable:v20.2.0-rc.1

@RaduBerinde
Copy link
Member

@jordanlewis I think you know most about inverted indexes on arrays, can you take a look at the question above>

@jordanlewis
Copy link
Member

@RaduBerinde we do support inverted indexes on arrays.

@jdabrowski correctly points out that we do not support forward indexes on arrays, despite what I said above. We ran into some issues that prevented us from getting them into 20.2 after all.

See: #50662 #50656 #50659 #17154

@jdabrowski
Copy link

@jordanlewis thanks for the answer
any chances for it to fit into 21 release? If not is it on roadmap for 2021?

@ghost
Copy link

ghost commented Apr 6, 2021

We have an issue here with an optimised data storage for placing trades that contain several sub-components.

There need to be two arrays on the object.

  1. Contains the list of markets we are matching for outcomes.
  2. Contains the corresponding array of event outcomes (the specific market position taken by the participant).

The query is made for a market event and corresponding correct outcome.

SELECT from positions p where p.mkt @> [10000] and p.pos @> ARRAY['10000:ln-std:1,10,30']

10000 is the id of the market
ln-std:1,10,30 is the participant's position concerning the market.

@RaduBerinde
Copy link
Member

@bryanhuntesl you should be able to create two inverted indexes on the two arrays (or only one, for the more restrictive condition).

@ghost
Copy link

ghost commented Apr 6, 2021

@bryanhuntesl you should be able to create two inverted indexes on the two arrays (or only one, for the more restrictive condition).

Thanks, @RaduBerinde. I have verified the scenario works with the latest pre-release version (v21.1.0-beta.2).

Executing the following :

CREATE TABLE positions (pos VARCHAR[]);
CREATE INDEX pos_index ON positions USING GIN (pos);
INSERT INTO positions VALUES (ARRAY['ln-std:1,10,30' ]);
EXPLAIN SELECT * FROM positions WHERE pos  @> ARRAY['ln-std:1,10,30'];


On Cockroach cloud (free), produces the following results:


"",distribution,local
"",vectorized,false
index join,"",""
 │,table,positions@primary
 └── scan,"",""
"",missing stats,""
"",table,positions@pos_index
"",spans,"[/ARRAY['ln-std:1,10,30'] - /ARRAY['ln-std:1,10,30']]"
​


Running locally ...

docker run -ti -p26257:26257 cockroachdb/cockroach-unstable:v21.1.0-beta.2 start-single-node --insecuredocker run -ti -p26257:26257 cockroachdb/cockroach-unstable:v21.1.0-beta.2 start-single-node --insecure

distribution: local
vectorized: true
​
• index join
│ estimated row count: 0
│ table: positions@primary
│
└── • scan
      estimated row count: 0 (11% of the table; stats collected 57 seconds ago)
      table: positions@pos_index
      spans: 1 span

@z0mb1ek
Copy link

z0mb1ek commented May 10, 2021

any chance to support for varchar inverted indexes?

@jlinder jlinder added the T-sql-queries SQL Queries Team label Jun 16, 2021
@ajwerner
Copy link
Contributor

any chance to support for varchar inverted indexes?

I think now you can do some form of this with a computed expression over the varchar field which you can get with either a virtual computed column in 21.1 or an expression based index in the upcoming 21.2.

@jordanlewis
Copy link
Member

After #79705 was merged, it's now possible to create trigram indexes on string data that accelerate LIKE, equality, and similarity queries (compatible with pg_trgm).

See #41285 if you would like to vote on the remainder of support necessary for full compatibility with pg_trgm.

@rharding6373
Copy link
Collaborator

Looks like only inverted indexes on VARCHAR is left in this meta issue. We're going to check if there is any code pointing to this issue number, and potentially close this issue.

@rharding6373
Copy link
Collaborator

We still use this issue for telemetry. Moving to backlog.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-sql-pgcompat Semantic compatibility with PostgreSQL C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) meta-issue Contains a list of several other issues. T-sql-queries SQL Queries Team X-anchored-telemetry The issue number is anchored by telemetry references.
Projects
Status: Cold Storage
Development

No branches or pull requests