Skip to content

Commit

Permalink
#3108 Limited transactional DDL: each DDL statement in its own transa…
Browse files Browse the repository at this point in the history
…ction

Summary:
This diff makes a step towards enabling transactional DDL in YugabyteDB
by switching to transactional execution of multiple YSQL system catalog
read/write operations that happen as part of the same DDL operation.

We are making YSQL system catalog tables transactional at the DocDB
level, backed by the usual transaction status table that resides on
tablet servers. This introduces an additional dependency of the master
on tablet servers, so that accessing YSQL system catalog tables in the
common case now requires tablet servers to be up and running and the
transaction status table to exist, because there could be provisional
records whose status is determined by a status tablet living on tablet
servers. This new dependency could be removed later by making the system
catalog tablet a transaction status tablet in addition to everything
else it already stores.

Previously, we used to create the transaction status table whenever the
first request to create a transactional table (including any YSQL table)
was received. Now, the transaction status table is also required by
operations such as creating a role in YSQL, or anything that reads or
changes the state of YSQL system catalog tables. Therefore, if YSQL is
enabled, we now create the transaction status table automatically on the
master leader right after the number of tablet servers equal to the
cluster's replication factor have registered.

After we've added DocDB transactional capabilities to YSQL system
catalog tables, there are multiple possible choices in how we could wrap
formerly non-transactional DDL operations that access the YSQL system
catalog into transactions.  Ideally, we would want to allow arbitrary
user-controlled transactions to include DDL statements that could be
committed/rolled back similarly to DML statements. That would require us
to commit and roll back the corresponding Yugabyte-level operations
(creating namespaces, tablets, and tablets in the Yugabyte system
catalog, which is also stored in the master but is distinct from the
YSQL system catalog) along with YSQL system catalog changes. This more
complete feature is tracked at #3109 and is not part of this diff.

This diff's main goal is an incremental improvement over the existing
state of completely non-transactional YSQL system catalog changes that
would allow a reader to e.g. see a state of a half-done CREATE TABLE
statement. With this diff, all DDL statements are wrapped in a separate
DocDB-level transaction that is distinct from the regular DocDB-level
transaction matching the user's YSQL transaction. One such separate
DDL-only transaction is created per each top-level DML statement,
including any internal sub-statements that it issues. This should solve
the problem of leaving the effects of a half-done CREATE TABLE statement
in the system catalog, and the problem of observing results of an
incomplete CREATE TABLE statement (e.g. before it sets the default value
of a column, as in #2021). However, this could produce some anomalies,
e.g. #3110, that will be addressed as we implement full transactional
DDL support (see #3109).  Even though the example given in #3110
involves row-level security, this diff does not negatively affect row
level security specificially. Any CREATE TABLE statement executed as
part of a user-controlled transaction will now create and commit rows as
part of a separate DDL-only tranasction, and if the enclosing
user-controlled transaction is running at REPEATABLE READ isolation,
further statements in the user-controlled transaction won't see those
rows. They would still see the newly created table, because internal
read operations on system catalog tables are always done outside of any
transaction.  YugabyteDB's SERIALIZABLE transactions always read the
most recent state of the data and lock it against concurrent changes, so
switching the isolation level to SERIALIZABLE is sufficient to fix
the yb_pg_rowsecurity test.

These separate DDL-only transactions are executed at the REPEATABLE READ
isolation level by default, but can be set to serializable by the
setting the new `--ysql_enable_manual_sys_table_txn_ctl` flag.

We are adding a new flag to table properties, is_ysql_catalog_table. To
initialize this flag for existing clusters, we loop over all YSQL system
catalog tables and replicate the necessary changes to table entities in
the YugabyteDB system catalog, as well as updates to the system catalog
tablet metadata. For the tablet metadata change we use the existing
add_table field in ChangeMetadataRequestPB. This was originally intended
for use with new tables only, but would only produce an error message on
cluster upgrades in release builds (due to a DCHECK), and effectively
replace the old metadata protobuf with the new one.

Test Plan:
Jenkins

Upgrade test:

~/code/yugabyte-db-old/bin/yb-ctl create

Load the Northwind database as described at
https://docs.yugabyte.com/latest/sample-data/northwind/

~/code/yugabyte-db-old/bin/yb-ctl stop

~/code/yugabyte-db-new/bin/yb-ctl start

Look for these messages in the master log:

I1212 22:13:36.832412 2963256 sys_catalog_initialization.cc:312] Made 655 YSQL sys catalog tables transactional
I1212 22:13:36.832451 2963256 sys_catalog_initialization.cc:315] Marking YSQL system catalog as transactional in YSQL catalog config

Verify that the Northwind data is intact.

Repeat the above steps with `--replication_factor=3` as well.

Reviewers: mihnea, sergei, neha

Reviewed By: sergei, neha

Subscribers: bogdan

Differential Revision: https://phabricator.dev.yugabyte.com/D7156
  • Loading branch information
mbautin committed Dec 12, 2019
1 parent b99ed7f commit b68b84b
Show file tree
Hide file tree
Showing 106 changed files with 1,547 additions and 631 deletions.
2 changes: 2 additions & 0 deletions ent/src/yb/integration-tests/cdc_service-int-test.cc
Original file line number Diff line number Diff line change
Expand Up @@ -49,6 +49,7 @@ DECLARE_int32(update_min_cdc_indices_interval_secs);
DECLARE_double(leader_failure_max_missed_heartbeat_periods);
DECLARE_bool(enable_load_balancing);
DECLARE_int32(follower_unavailable_considered_failed_sec);
DECLARE_bool(enable_ysql);

METRIC_DECLARE_entity(cdc);
METRIC_DECLARE_gauge_int64(last_read_opid_index);
Expand Down Expand Up @@ -77,6 +78,7 @@ class CDCServiceTest : public YBMiniClusterTestBase<MiniCluster> {
YBMiniClusterTestBase::SetUp();

MiniClusterOptions opts;
SetAtomicFlag(false, &FLAGS_enable_ysql);
opts.num_tablet_servers = server_count();
opts.num_masters = 1;
cluster_.reset(new MiniCluster(env_.get(), opts));
Expand Down
2 changes: 2 additions & 0 deletions ent/src/yb/integration-tests/encryption-test.cc
Original file line number Diff line number Diff line change
Expand Up @@ -48,6 +48,8 @@ class EncryptionTest : public YBTableTestBase {

bool use_external_mini_cluster() override { return false; }

bool enable_ysql() override { return false; }

int num_tablet_servers() override {
return 3;
}
Expand Down
7 changes: 2 additions & 5 deletions ent/src/yb/integration-tests/snapshot-test.cc
Original file line number Diff line number Diff line change
Expand Up @@ -44,6 +44,7 @@ using namespace std::literals;
DECLARE_uint64(log_segment_size_bytes);
DECLARE_int32(log_min_seconds_to_retain);
DECLARE_bool(tablet_verify_flushed_frontier_after_modifying);
DECLARE_bool(enable_ysql);

namespace yb {

Expand Down Expand Up @@ -100,10 +101,9 @@ class SnapshotTest : public YBMiniClusterTestBase<MiniCluster> {
void SetUp() override {
YBMiniClusterTestBase::SetUp();

flag_saver_.emplace();

FLAGS_log_min_seconds_to_retain = 5;
FLAGS_tablet_verify_flushed_frontier_after_modifying = true;
FLAGS_enable_ysql = false;

MiniClusterOptions opts;
opts.num_tablet_servers = 3;
Expand Down Expand Up @@ -139,8 +139,6 @@ class SnapshotTest : public YBMiniClusterTestBase<MiniCluster> {
cluster_.reset();
}

flag_saver_.reset();

YBMiniClusterTestBase::DoTearDown();
}

Expand Down Expand Up @@ -348,7 +346,6 @@ class SnapshotTest : public YBMiniClusterTestBase<MiniCluster> {
unique_ptr<MasterBackupServiceProxy> proxy_backup_;
RpcController controller_;
std::unique_ptr<client::YBClient> client_;
boost::optional<google::FlagSaver> flag_saver_;
};

TEST_F(SnapshotTest, CreateSnapshot) {
Expand Down
1 change: 1 addition & 0 deletions ent/src/yb/integration-tests/twodc-test.cc
Original file line number Diff line number Diff line change
Expand Up @@ -100,6 +100,7 @@ class TwoDCTest : public YBTest, public testing::WithParamInterface<int> {
SetUpWithParams(std::vector<uint32_t> num_consumer_tablets,
std::vector<uint32_t> num_producer_tablets,
uint32_t replication_factor) {
FLAGS_enable_ysql = false;
// Allow for one-off network instability by ensuring a single CDC RPC timeout << test timeout.
FLAGS_cdc_read_rpc_timeout_ms = (kRpcTimeout / 6) * 1000;
FLAGS_cdc_write_rpc_timeout_ms = (kRpcTimeout / 6) * 1000;
Expand Down
4 changes: 2 additions & 2 deletions ent/src/yb/master/sys_catalog-test_ent.cc
Original file line number Diff line number Diff line change
Expand Up @@ -90,7 +90,7 @@ TEST_F(SysCatalogTest, TestSysCatalogCDCStreamOperations) {
loader->Reset();
ASSERT_OK(sys_catalog->Visit(loader.get()));
ASSERT_EQ(1, loader->streams.size());
ASSERT_TRUE(MetadatasEqual(stream.get(), loader->streams[0]));
ASSERT_METADATA_EQ(stream.get(), loader->streams[0]);

// 2. CHECK DELETE_CDCSTREAM.
ASSERT_OK(sys_catalog->DeleteItem(stream.get(), kLeaderTerm));
Expand Down Expand Up @@ -122,7 +122,7 @@ TEST_F(SysCatalogTest, TestSysCatalogUniverseReplicationOperations) {
loader->Reset();
ASSERT_OK(sys_catalog->Visit(loader.get()));
ASSERT_EQ(1, loader->universes.size());
ASSERT_TRUE(MetadatasEqual(universe.get(), loader->universes[0]));
ASSERT_METADATA_EQ(universe.get(), loader->universes[0]);

// 2. CHECK DELETE_UNIVERSE_REPLICATION.
ASSERT_OK(sys_catalog->DeleteItem(universe.get(), kLeaderTerm));
Expand Down
6 changes: 6 additions & 0 deletions java/yb-client/src/test/java/org/yb/client/TestUtils.java
Original file line number Diff line number Diff line change
Expand Up @@ -66,6 +66,8 @@ public class TestUtils {
"/tmp/ybtest-" + System.getProperty("user.name") + "-" + startTimeMillis + "-" +
new Random().nextInt(Integer.MAX_VALUE);

private static boolean isJenkins = System.getProperty("user.name").equals("jenkins");

private static final AtomicBoolean defaultTestTmpDirCleanupHookRegistered = new AtomicBoolean();

// The amount of time to wait for in addition to the ttl specified.
Expand Down Expand Up @@ -514,4 +516,8 @@ public static <T> List<T> joinLists(List<T> a, List<T> b) {
return joinedList;
}

public static boolean isJenkins() {
return isJenkins;
}

}
22 changes: 6 additions & 16 deletions java/yb-pgsql/src/test/java/org/yb/pgsql/PgRegressRunner.java
Original file line number Diff line number Diff line change
Expand Up @@ -88,6 +88,7 @@ public PgRegressRunner(String schedule, String pgHost, int pgPort, String pgUser
throw new RuntimeException(ex);
}


this.pgSchedule = schedule;
this.pgHost = pgHost;
this.pgPort = pgPort;
Expand Down Expand Up @@ -230,29 +231,18 @@ public void stop() throws InterruptedException, IOException {
).forEach(pathToCopy -> {
String fileName = pathToCopy.toFile().getName();
String relPathStr = pgRegressOutputPath.relativize(pathToCopy).toString();
String fileNameNoExt = FilenameUtils.removeExtension(fileName);
File srcFile = pathToCopy.toFile();
File destFile = new File(getPgRegressDir(), relPathStr);
if ((resultFileNames.contains(fileNameNoExt) || fileName.endsWith(".diffs")) &&
!fileName.endsWith(".source")) {
if ((fileName.endsWith(".out") || fileName.endsWith(".diffs")) &&
!relPathStr.startsWith("expected/")) {
File srcFile = pathToCopy.toFile();
File destFile = new File(getPgRegressDir(), relPathStr);
LOG.info("Copying file " + srcFile + " to " + destFile);
try {
FileUtils.copyFile(srcFile, destFile);
copiedFiles.add(relPathStr);
} catch (IOException ex) {
LOG.warn("Failed copying file " + srcFile + " to " + destFile, ex);
failedFiles.add(relPathStr);
}
} else {
skippedFiles.add(relPathStr);
}
});

String fromWhereToWhere = "from " + pgRegressOutputPath + " to " + getPgRegressDir() + ": ";
LOG.info("Copied files " + fromWhereToWhere + copiedFiles);
LOG.info("Skipped copying files " + fromWhereToWhere + skippedFiles);
if (!failedFiles.isEmpty()) {
LOG.info("Failed copying files " + fromWhereToWhere + failedFiles);
}
}

if (EnvAndSysPropertyUtil.isEnvVarOrSystemPropertyTrue("YB_PG_REGRESS_IGNORE_RESULT")) {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -2157,26 +2157,20 @@ public void testReferencesOnColumns() throws Exception {
"CREATE TABLE test_table(id int PRIMARY KEY, cid int REFERENCES table1(id1))",
PERMISSION_DENIED
);
// TODO remove this DROP once #1383 is implemented.
statement.execute("DROP TABLE test_table");
runInvalidQuery(
statement,
"CREATE TABLE test_table(id int PRIMARY KEY," +
" cid int REFERENCES table1(id1)," +
" did int REFERENCES table1(id2))",
PERMISSION_DENIED
);
// TODO remove this DROP once #1383 is implemented.
statement.execute("DROP TABLE test_table");
runInvalidQuery(
statement,
"CREATE TABLE test_table(id int PRIMARY KEY," +
" cid int REFERENCES table2(id3)," +
" did int REFERENCES table2(id2))",
PERMISSION_DENIED
);
// TODO remove this DROP once #1383 is implemented.
statement.execute("DROP TABLE test_table");
});

withRole(statement, "no_references", () -> {
Expand All @@ -2186,8 +2180,6 @@ public void testReferencesOnColumns() throws Exception {
"CREATE TABLE test_table(id int PRIMARY KEY, cid int REFERENCES table1(id2))",
PERMISSION_DENIED
);
// TODO remove this DROP once #1383 is implemented.
statement.execute("DROP TABLE test_table");
runInvalidQuery(
statement,
"CREATE TABLE test_table(id int PRIMARY KEY, cid int REFERENCES table2(id3))",
Expand Down
Loading

0 comments on commit b68b84b

Please sign in to comment.