[YSQL] Index back-filling rows cannot be seen in the same transaction with the INDEX creation unless the transaction is COMMITed. #5240
Comments
Even more concise example, reproducible with ysqlsh:
BEGIN;
CREATE TABLE t(k uuid default gen_random_uuid() PRIMARY KEY, v int); -- could also be serial
INSERT INTO t(v) SELECT * FROM generate_series(1, 10000);
CREATE UNIQUE INDEX t_v_unq ON t(v); -- reproducible with a regular index as well
EXPLAIN SELECT COUNT(*) FROM t WHERE v = 100;
SELECT COUNT(*) FROM t WHERE v = 100;
Output:
The crucial part here is that the index is created after the inserts are done; if you reorder the operations, it works fine.
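For contrast, here is a hedged sketch of the reordered sequence the comment refers to, with the index created before the inserts (same table definition as above):

```sql
-- Reordered variant: the index exists before the rows are inserted,
-- so no back-fill is involved and the in-transaction SELECT sees the row.
BEGIN;
CREATE TABLE t(k uuid DEFAULT gen_random_uuid() PRIMARY KEY, v int);
CREATE UNIQUE INDEX t_v_unq ON t(v);
INSERT INTO t(v) SELECT * FROM generate_series(1, 10000);
SELECT COUNT(*) FROM t WHERE v = 100;  -- expected: 1
COMMIT;
```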
Nice. Right, my
Do you think that this is a manifestation of the same problem? Or is it (likely to be) an independent problem? And what about this?
-- Test 2
BEGIN;
CREATE TABLE t(k uuid default gen_random_uuid() PRIMARY KEY, v int);
INSERT INTO t(v) SELECT * FROM generate_series(1, 10000);
CREATE UNIQUE INDEX t_v_unq ON t(v);
ROLLBACK;
\d t
SELECT count(*) FROM t;
Here's what the
And here's what the
Of course, on PG 11.2, Test 2 leaves no trace of the table or index after the ROLLBACK.
At present, DDL statements are not fully transactional; instead, each is executed as if in its own separate transaction. We have umbrella issue #4192 to track this, but until then I believe this is expected, since the transaction essentially becomes:
BEGIN;
-- DDL, executed separately
INSERT INTO t(v) SELECT * FROM generate_series(1, 10000);
-- DDL, executed separately
ROLLBACK;
As for the
I haven't looked into that part yet; I will do a quick follow-up later on.
That error message looks like the one in issue #4770, but the difference is that that issue deals with range-split tables.
Here is a smaller repro:
CREATE TABLE t (k serial, v int);
INSERT INTO t (v) VALUES (1);
BEGIN;
CREATE INDEX ON t (v);
SELECT * FROM t WHERE v = 1;
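As later comments in this thread suggest, the index becomes safe to use once the transaction that created it commits. A hedged sketch of that workaround, applied to the repro above:

```sql
-- Workaround sketch (based on comments in this thread, not an official fix):
-- commit the CREATE INDEX before relying on it, so the back-filled row
-- is visible to subsequent queries.
CREATE TABLE t (k serial, v int);
INSERT INTO t (v) VALUES (1);
BEGIN;
CREATE INDEX ON t (v);
COMMIT;                         -- index back-fill is now committed
SELECT * FROM t WHERE v = 1;    -- expected: the row (1, 1)
```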
@bllewell Regarding your compressed example, can you try this?
CREATE TABLE IF NOT EXISTS t(i int);
CREATE OR REPLACE PROCEDURE playground()
LANGUAGE plpgsql
AS $body$
BEGIN
DROP INDEX IF EXISTS t_idx;
RAISE NOTICE 'Index dropped';
CREATE INDEX t_idx on t (i);
END;
$body$;
CALL playground();
For me it gives
But I was testing it on the recent master in a debug config.
Here's what I see using my YB-2.2.0.0 one-node cluster on my MacBook:
This seems to be entirely what I'd expect. The result is exactly the same on PG 11.2, except that there it says
Here's a new testcase, inspired by this comment from frozenspider:
Apologies that it's rather baroque. But I wanted it to write a nice essay of the current state before reporting each next observation.
Here's the output, stripped of the
I wrote it like this because it's the best simulation of my procedure that first revealed this bug. It does this
The purpose of the unique indexes is to show that the results satisfy a requirement that the procedure that produces them is supposed to satisfy. I have no interest in having the whole thing be transactional. The steps implemented by the procedure used to be implemented ordinarily, by a series of top-level server calls issued from a

I modified my procedure to issue

Given what I've understood from the analysis from development presented here, my modified approach ought to be reliable. Please comment on the safety of my modified approach. In case this question isn't clear, here's the long-form version:
@bllewell Yes, the index is safe to use after you COMMIT.
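As an illustration of the commit-each-step approach being discussed, here is a hedged, hypothetical sketch (the procedure name and body are my assumptions; the actual procedure is not shown in this thread). PL/pgSQL procedures in PG 11 and YSQL can issue COMMIT when the CALL is not wrapped in an outer transaction block:

```sql
-- Hypothetical sketch of the "commit each step" pattern (not the
-- author's actual procedure). Each COMMIT ends the current transaction,
-- so the CREATE INDEX back-fill is committed before it is relied upon.
CREATE OR REPLACE PROCEDURE populate_and_index()
  LANGUAGE plpgsql
AS $body$
BEGIN
  INSERT INTO t(v) SELECT * FROM generate_series(1, 10000);
  COMMIT;   -- the rows are now committed
  CREATE UNIQUE INDEX t_v_unq ON t(v);
  COMMIT;   -- the index (and its back-fill) is committed and safe to use
END;
$body$;
```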
Jira Link: DB-1465
In the following example, ROW (id = 1, v = 1) cannot be seen when it is selected using index idx_v within a transaction, even though the row can be seen when index idx_id is used. Furthermore, ROW (id = 1, v = 1) was being back-filled when index idx_v was created. On the other hand, ROW (id = 2, v = 2), which is inserted after the idx_v index creation, can be seen by the transaction as expected. This issue was originally reported in the following example. We are working on and analyzing this example and will update the issue accordingly.
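A minimal hedged sketch of the scenario just described (reconstructed from the prose above; the table name, column types, and statement order are assumptions, since the original example is not reproduced here):

```sql
-- Reconstructed sketch of the reported scenario (assumptions noted above).
CREATE TABLE demo (id int PRIMARY KEY, v int);
CREATE INDEX idx_id ON demo (id);
INSERT INTO demo VALUES (1, 1);     -- will be back-filled into idx_v
BEGIN;
CREATE INDEX idx_v ON demo (v);     -- back-fills ROW (id = 1, v = 1)
SELECT * FROM demo WHERE id = 1;    -- visible via idx_id
SELECT * FROM demo WHERE v = 1;     -- reported bug: not visible via idx_v
INSERT INTO demo VALUES (2, 2);     -- inserted after idx_v creation
SELECT * FROM demo WHERE v = 2;     -- visible, as expected
COMMIT;
```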
Observations made on my MacBook, Mojave Version 10.14.6, using a single-node YB-2.2.0.0 cluster.
Here's the minimal testcase. Save it as, say, testcase.sql. Notice that it invokes prepare-explain-execute.sql. Here it is:

When you start testcase.sql in psql, using vanilla PG 11.2, you get the same result for each of the two attempts. When you start the script in ysqlsh using YB-2.2.0.0, the first gives the expected 1, but the second gives the wrong-result zero. Here is the YB version of report.txt:

Notice that the \d t output for the second run, with call p(false), is indistinguishable from that for the first, with call p(true). The same is true for the plans. Obviously the planner thinks that the index is viable, and chooses it, just as you'd expect for a query that uses an identity predicate on a column with a unique index in a biggish table.

The problem first manifested for me in a large code-kit. It sprang up when I made what I thought was an inessential change: encapsulating steps like the ones you see here in a procedure, for re-use and to facilitate reliable timing using a server-side elapsed wall-clock scheme.
What's worse is that the problem first advertised itself by many repetitions of this error message:
The report.txt showed the same basic problem: zero rows from a query whose plan showed that it would use an index. Even more confusing, initially (the maximally user-unfriendly error aside), was that the planner decided, for a similar query with a different selectivity, not to use the index, and so it gave the right results there.

I have been unable to create a small testcase that provokes the XX000...ybctid... error, so I'm including the smallest one that I could manage. See issue-5240.zip. It's noticeably cut down from its progenitor in my real project. Simply unzip it and start 0.sql at the ysqlsh prompt. Notice that the harness runs the basic do-analysis-for-one.sql eight times, but the XX000...ybctid... error occurs just seven times. This is even more confusing because the results are wrong in the same way for each of the eight tests. Of course, it runs faultlessly and gives the right results in vanilla PG 11.2.

Notice the file cr-populate-actor-actor-edges.sql. This is the analogue of the DO block in my minimal testcase. I added the do_commit logic, using a variable in the DECLARE section, when I finally hit on the source of the problem. When every SQL statement is committed immediately, the problem goes away. I decided not to test whether every SQL statement needs to be committed. None should need to be, so I'm leaving that testing to whoever takes on this bug.
Commentary
I understand that in PG, DDLs are transactional, so you have to consider this if you set AUTOCOMMIT to 'off'. But a DO block, and a procedure CALL, each start their own transaction and commit automatically at the end.

I believe that, through YB-2.2.0.0, DDLs are still not properly transactional. But that should not be relevant here, where not one of the SQL statements (in the procedure that the file cr-populate-actor-actor-edges.sql creates, or in my minimal testcase) causes an error, and the procedure therefore autocommits on completion.