Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[YSQL] Index back-filling rows cannot be seen in the same transaction with the INDEX creation unless the transaction is COMMITed. #5240

Open
bllewell opened this issue Jul 28, 2020 · 11 comments
Assignees
Labels
area/ysql Yugabyte SQL (YSQL) kind/bug This issue is a bug priority/medium Medium priority issue

Comments

@bllewell
Copy link
Contributor

bllewell commented Jul 28, 2020

Jira Link: DB-1465
In the following example, ROW (id = 1, v = 1) cannot be seen when being selected using index idx_v within a transaction even though the row can be seen when the index idx_id is used.

Furthermore, ROW (id = 1, v = 1) was being back-filled when index idx_v was created. On the other hand, ROW (id = 2, v = 2) that is inserted after the idx_v index creation can be seen by the transaction as expected.

postgres=# CREATE TABLE tab(id int, v int);
CREATE TABLE
postgres=# CREATE INDEX idx_id ON tab(id);
CREATE INDEX
postgres=# BEGIN;
BEGIN
postgres=# INSERT INTO tab VALUES (1, 1);
INSERT 0 1
postgres=# CREATE INDEX idx_v ON tab(v);
CREATE INDEX
postgres=# INSERT INTO tab VALUES (2, 2);
INSERT 0 1
postgres=# SELECT * FROM tab WHERE id = 1;
 id | v 
----+---
  1 | 1
(1 row)

postgres=# SELECT * FROM tab WHERE id = 2;
 id | v 
----+---
  2 | 2
(1 row)

postgres=# SELECT * FROM tab WHERE v = 1;
 id | v 
----+---
(0 rows)

postgres=# SELECT * FROM tab WHERE v = 2;
 id | v 
----+---
  2 | 2
(1 row)

This issue originally reported in the following example.
We are working and analyzing this example and will update the issue accordingly.

Observations made on my MacBook, Mojave Version 10.14.6, using a single-node YB-2.2.0.0 cluster.

Here's the minimal testcase, Save it as, say, testcase.sql.

set client_min_messages = warning;
drop table if exists t;
create table t(k uuid default gen_random_uuid() primary key, v int);

create or replace procedure p(do_commit in boolean)
  language plpgsql
as $body$
begin
  execute 'drop index if exists t_v_unq';
  if do_commit then commit; end if;

  execute 'truncate table t';
  if do_commit then commit; end if;

  insert into t(v)
  select series.val from generate_series(1, 10000) as series(val);
  if do_commit then commit; end if;

  execute 'create unique index t_v_unq on t(v)';
  if do_commit then commit; end if;
end;
$body$;

\t on
\o report.txt

select 'using "call p(true)"';
call p(true);
\i prepare-explain-execute.sql

select 'using "call p(false)"';
call p(false);
\i prepare-explain-execute.sql

\t off
\o

Notice that it invokes prepare-explain-execute.sql. Here it is:

-- prepare-explain-execute.sql

deallocate all;

\d t

prepare stmt(int)
as select 'count(*) from t: '||count(*)::text from t where v = $1;

explain execute stmt(100);
execute stmt(100);

When you start testcase.sql in psql, using vanilla, PG 11.2 you get the same result:

count(*) from t: 1

for each of the two attempts. When you start the script in ysqlsh using YB-2.2.0.0, the first gives the expected 1. But the second gives the wrong-result zero. Here is the YB version of report.txt:

 using "call p(true)"

 k      | uuid    |           | not null | gen_random_uuid()
 v      | integer |           |          | 

 Aggregate  (cost=4.12..4.13 rows=1 width=32)
   ->  Index Only Scan using t_v_unq on t  (cost=0.00..4.11 rows=1 width=0)
         Index Cond: (v = 100)

 count(*) from t: 1

 using "call p(false)"

 k      | uuid    |           | not null | gen_random_uuid()
 v      | integer |           |          | 

 Aggregate  (cost=4.12..4.13 rows=1 width=32)
   ->  Index Only Scan using t_v_unq on t  (cost=0.00..4.11 rows=1 width=0)
         Index Cond: (v = 100)

 count(*) from t: 0

Notice that the \d t output for the second run, with call p(false), is indistinguishable from that for the first, with call p(true). The same is true for the plans. Obviously the planner thinks that the index is viable, and chooses it—just as you'd expect for a query that uses an identity predicate on a column with a unique index in a biggish table.

The problem first manifest for me in a large code-kit. It sprung up when I made what I thought was an inessential change by encapsulating steps like you see here in a procedure for re-use—and to facilitate reliable timing using a server-side elapsed wall-clock scheme.

What's worse is that the problem first advertised itself by many repetitions of this error message:

ysqlsh:my.sql:27: ERROR:  XX000: Query error: Given ybctid is not associated with any row in table: 0 vs. 1
LOCATION:  HandleYBStatus, pg_yb_utils.c:242

The report.txt showed the same basic problem: zero rows from a query whose plan showed that it would use an index. Even more confusing, initially (the maximally user-unfriendly error aside) was that the planner decided for a similar query with a different selectivity not to use the index—and so it gave the right results there.

I have been unable to create a small testcase that provokes the XX000...ybctid... error. So I'm including the smallest one that I could manage. See issue-5240.zip. It's noticeably cut down from its progenitor in my real project. Simply unzip it and start 0.sql at the ysqlsh prompt. Notice that the harness runs the basic do-analysis-for-one.sql eight times. but the the XX000...ybctid... error.= occurs just seven times. Even more confusing because the results are wrong in the same way for each of the eight tests. Of course, it runs faultlessly and gives the right results in vanilla PG 11.2.

Notice the file cr-populate-actor-actor-edges.sql. This is the analogue to the DO block in my minimal testcase. I added the do_commit logic using a variable in the DECLARE section when I finally hit on the source of the problem. When every SQL statement is committed immediately, the problem goes away.

I decided not to test whether every SQL statement needs to be committed. None should need to be—so I'm leaving that testing to whomever takes on this bug.

Commentary

I understand that in PG, DDLs are transactional—so you have to consider this if you set AUTOCOMMIT to 'off'. But a DO block, and a procedure CALL, start their own transaction and commit automatically at the end.

I believe that, through YB-2.2.0.0 , DDLs are still not properly transactional. But that should not be relevant here where not one of the SQL statements (in the procedure that the file cr-populate-actor-actor-edges.sql creates or in my minimal testcase) causes an error and the procedure therefore autocommits on completion.

@bllewell bllewell added the area/ycql Yugabyte CQL (YCQL) label Jul 28, 2020
@bllewell bllewell changed the title [YSQL[ Wrong results when the planner chooses an apparently viab.e index [YSQL] Wrong results when the planner chooses an apparently viable index Jul 28, 2020
@bllewell
Copy link
Contributor Author

issue-5240.zip

@frozenspider
Copy link
Contributor

frozenspider commented Jul 28, 2020

Even more concise example, reproducible with ysqlsh:

BEGIN;
CREATE TABLE t(k uuid default gen_random_uuid() PRIMARY KEY, v int); -- could also be serial
INSERT INTO t(v) SELECT * FROM generate_series(1, 10000);
CREATE UNIQUE INDEX t_v_unq ON t(v); -- reproducible with a regular index as well
EXPLAIN SELECT COUNT(*) FROM t WHERE v = 100;
SELECT COUNT(*) FROM t WHERE v = 100;

Output:

yugabyte=# EXPLAIN SELECT COUNT(*) FROM t WHERE v = 100;
                                 QUERY PLAN
----------------------------------------------------------------------------
 Aggregate  (cost=4.12..4.12 rows=1 width=8)
   ->  Index Only Scan using t_v_unq on t  (cost=0.00..4.11 rows=1 width=0)
         Index Cond: (v = 100)
(3 rows)

yugabyte=# SELECT COUNT(*) FROM t WHERE v = 100;
 count
-------
     0
(1 row)

The crucial part here is that the index is created after inserts are done, if you reorder the operation it works fine.
cc @jaki

@bllewell
Copy link
Contributor Author

bllewell commented Jul 28, 2020

Nice. Right, my DO block was just a device to start a transaction. BEGIN is yet more explicit. What about this error:

ERROR:  XX000: Query error: Given ybctid is not associated with any row in table: 0 vs. 1

Do you think that this is a manifestation of the same problem? Or is it (likely to be) an independent problem?

And what about this?

-- Test 2
BEGIN;
CREATE TABLE t(k uuid default gen_random_uuid() PRIMARY KEY, v int);
INSERT INTO t(v) SELECT * FROM generate_series(1, 10000);
CREATE UNIQUE INDEX t_v_unq ON t(v);
ROLLBACK;

\d t
SELECT count(*) FROM t;

Here's what the \d t command shows:

 Column |  Type   | Collation | Nullable |      Default      
--------+---------+-----------+----------+-------------------
 k      | uuid    |           | not null | gen_random_uuid()
 v      | integer |           |          | 

And here's s what the select count(*) from t; shows:

 count 
-------
     0

Given that the effects of create table and create index survived rollback (showing that these are not yet transactional), how on earth could the effect of the insert be rolled back? It happened before the later self-committing create index?

Of course, on PG 11.2, Test 2 leaves no trace of the table or index after the rollback.

@frozenspider frozenspider added area/ysql Yugabyte SQL (YSQL) and removed area/ycql Yugabyte CQL (YCQL) labels Jul 28, 2020
@frozenspider
Copy link
Contributor

At present DDL statements are not fully transactional - instead, they are executed somewhat like in their own separate transaction. We have umbrella issue #4192 to track this, but until then - I believe this is expected, since the transaction essentially becomes

BEGIN;
-- DDL, executed separately
INSERT INTO t(v) SELECT * FROM generate_series(1, 10000);
-- DDL, executed separately
ROLLBACK;

As for the

ERROR:  XX000: Query error: Given ybctid is not associated with any row in table: 0 vs. 1

I haven't looked into that part yet, will do a quick follow up later on.

@jaki
Copy link
Contributor

jaki commented Jul 28, 2020

That error message looks like one in issue #4770, but the difference is that that issue deals with range split tables.

@jaki
Copy link
Contributor

jaki commented Jul 28, 2020

Here is a smaller repro

CREATE TABLE t (k serial, v int);
INSERT INTO t (v) VALUES (1);
BEGIN;
CREATE INDEX ON t (v);
SELECT * FROM t WHERE v = 1;
 k | v                                                                                                                                                                                                                                                                                                                      
---+---                                                                                                                                                                                                                                                                                                                     
(0 rows)

@jaki
Copy link
Contributor

jaki commented Jul 28, 2020

  • CREATE INDEX ON t (v ASC) doesn't help
  • CREATE TABLE t (k serial PRIMARY KEY, v int) doesn't help
  • CREATE TABLE t (k serial, v int, PRIMARY KEY (k ASC)) doesn't help
  • BEGIN moved earlier doesn't help
  • BEGIN moved later does help

@frozenspider
Copy link
Contributor

frozenspider commented Jul 28, 2020

@bllewell Regarding your compressed example, can you try this?

CREATE TABLE IF NOT EXISTS t(i int);

CREATE OR REPLACE PROCEDURE playground()
  LANGUAGE plpgsql
AS $body$
BEGIN
  DROP INDEX IF EXISTS t_idx;
  RAISE NOTICE 'Index dropped';
  CREATE INDEX t_idx on t (i);
END;
$body$;

CALL playground();

For me it gives

NOTICE:  index "t_idx" does not exist, skipping
NOTICE:  Index dropped
WARNING:  plancache reference leak: plan 0x11e9bc118 not closed
WARNING:  transaction left non-empty SPI stack
HINT:  Check for missing "SPI_finish" calls.
WARNING:  snapshot 0x11e372120 still active
server closed the connection unexpectedly
	This probably means the server terminated abnormally
	before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!>

But I was testing it on the recent master in debug config.
Note: I'll create a separate issue from this later today

@bllewell
Copy link
Contributor Author

bllewell commented Jul 28, 2020

Here's what I see using my YB-2.2.0.0 one-node cluster on my MacBook:

CALL playground();
NOTICE:  index "t_idx" does not exist, skipping
NOTICE:  Index dropped

\d t
                   Table "u1.t"
 Column |  Type   | Collation | Nullable | Default 
--------+---------+-----------+----------+---------
 i      | integer |           |          | 
Indexes:
    "t_idx" lsm (i HASH)

This sems to be entirely what I'd expect. The result is exactly the same on PG 11.2—except that there it says t_idx" btree (i) as we'd expect.

@bllewell
Copy link
Contributor Author

bllewell commented Aug 1, 2020

Here's a new testcase, inspired by this comment from frozenspider:

At present DDL statements are not fully transactional - instead, they are executed somewhat like in their own separate transaction. We have umbrella issue #4192 to track this, but until then - I believe this is expected...

Apologies that it's rather baroque. But I wanted it to write a nice essay of the current state before reporting each next observation.

create table t (k serial, v text);
alter sequence t_k_seq  cache 100000;
insert into t (v)
select 'v'||ltrim(to_char(g.v*5, '00000')) as val
from generate_series(1, 10000) as g(v);

-- At this point, the rows in t ARE COMMITTED.

prepare t_qry as
select k from t
where v = 'v00100';

\pset null '<null>'
do $body$
declare
  pad constant int := 25;
  t_qry constant text := $$select k from t where v = 'v00100'$$;
  p text not null := '?';
  v text;
begin
  raise info '';
  raise info 'No index yet.';
  raise info '';
  execute 'explain '||t_qry into p;
  raise info '% %', rpad('  plan:', pad), p;
  execute t_qry into v;
  raise info '% %', rpad('  v (1st observation):', pad), v;

  execute 'create index on t (v)';

  raise info '';
  raise info 'Index exists now -- but isn''t committed so ISN''T YET RELIABLE.';
  raise info '';
  execute 'explain '||t_qry into p;
  raise info '% %', rpad('  new plan:', pad), p;
  execute t_qry into v;
  raise info '% %', rpad('  v (2nd observation):', pad), v;

  commit;

  raise info '';
  raise info 'Now the index is committed and IS reliable.';
  raise info '';
  execute t_qry into v;
  raise info '% %', rpad('  v (3rd observation):', pad), v;
end;
$body$;

Here's the output, stripped of the INFO: clutter:

No index yet.

  plan:                   Foreign Scan on t  (cost=0.00..102.50 rows=1000 width=4)
  v (1st observation):    20

Index exists now -- but isn't committed so ISN'T YET RELIABLE.

  new plan:               Index Scan using t_v_idx on t  (cost=0.00..5.22 rows=10 width=4)
  v (2nd observation):    <NULL>

Now the index is committed and IS reliable.

  v (3rd observation):    20

I wrote it like this because it's the best simulation of my procedure that first reveled this bug. It does this

  • Drops two unique indexes on columns of interest in a pre-existing results table.
  • Truncates the table to be ready for new results.
  • Inserts the new results,
  • Re-creates the two unique indexes.
  • Writes timings made with a server-side clock of the critical steps, and other findings, to a second table.

The purpose of the unique indexes is to show that the results satisfy a requirement that the procedure that produces these is supposed to satisfy. I have no interest in having the whole thing being transactional.

The steps implemented by the procedure used to be implemented ordinarily by a series of top-level server calls issued from a .sql script with AUTOCOMMIT set to the default value of 'on'. But it was hugely inconvenient to run it this way. So I encapsulated the whole thing in a stored procedure. This is a very natural way to work in the context of a project that's intended to develop, and test, a new implementation that meets a rather hard requirement where speed (and of course correctness) must be tested after every frequent change that's intended to improve performance. I didn't think to emulate autocommit mode. That's when I hit the bug.

I modified my procedure to issue commit after each individual SQL statement. My own subsequent testing shows that it works reliably now.

Given what I've understaood from the analysis from development presented here, my modified approach ought to be reliable.

Please comment on the safety of my modified approach.

In case this question isn't clear, here's the long form version:

If I commit after every single SQL in a DO block or language plpgsql proc, esp. when at least one of these is a regular insert, update, or commit and a later one is a DDL like create index, will I always (i.e. safely) get the same outcome as if each SQL was done in autocommit mode as a top-level call from a client like ysqlsh?

@frozenspider
Copy link
Contributor

@bllewell Yes, the index is safe to use after you COMMITits creation (either implicitly or explicitly) - this includes using it in a subsequent transaction, as you do in your example.

@nocaway nocaway changed the title [YSQL] Wrong results when the planner chooses an apparently viable index [YSQL] Inserted rows are not seen in the same transaction unless it is COMMITed. Oct 27, 2020
@nocaway nocaway added the kind/bug This issue is a bug label Oct 27, 2020
@nocaway nocaway changed the title [YSQL] Inserted rows are not seen in the same transaction unless it is COMMITed. [YSQL] Inserted rows that are in the same transaction with back-filling INDEX cannot be seen unless the transaction is COMMITed. Oct 27, 2020
@nocaway nocaway changed the title [YSQL] Inserted rows that are in the same transaction with back-filling INDEX cannot be seen unless the transaction is COMMITed. [YSQL] Index back-filling rows cannot be seen in the same transaction with the INDEX creation unless the transaction is COMMITed. Oct 27, 2020
@m-iancu m-iancu added this to YQL-beta Nov 30, 2021
@yugabyte-ci yugabyte-ci added the priority/medium Medium priority issue label Jun 8, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/ysql Yugabyte SQL (YSQL) kind/bug This issue is a bug priority/medium Medium priority issue
Projects
Status: No status
Development

No branches or pull requests

8 participants