Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor(c/driver/postgresql): Factor out Postgres type abstraction and test it independently of the driver #573

Merged
merged 90 commits into from
Apr 20, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
90 commits
Select commit Hold shift + click to select a range
541e9e0
document test build setup
paleolimbot Apr 1, 2023
93d7c51
thinking about converting
paleolimbot Apr 2, 2023
401944b
two converters
paleolimbot Apr 2, 2023
b53ae7b
some postgres type stuff
paleolimbot Apr 3, 2023
c20673f
add some other endian swappers
paleolimbot Apr 3, 2023
f1a056c
maybe nested
paleolimbot Apr 3, 2023
28baa38
type names
paleolimbot Apr 3, 2023
d5464cc
rethinking
paleolimbot Apr 3, 2023
20e67b6
add test to cmake
paleolimbot Apr 3, 2023
3226645
test the basics
paleolimbot Apr 3, 2023
5358db5
test reocrds
paleolimbot Apr 3, 2023
82b6873
test init schema
paleolimbot Apr 3, 2023
b4e8c7c
better unknown support
paleolimbot Apr 3, 2023
ecfacfe
some resolver tests
paleolimbot Apr 3, 2023
fa0ef95
test domain and range and array
paleolimbot Apr 3, 2023
8dc3327
more tests
paleolimbot Apr 3, 2023
9ae2b24
rtti
paleolimbot Apr 3, 2023
602ac44
fix rat
paleolimbot Apr 4, 2023
13e96d1
format
paleolimbot Apr 4, 2023
2f5c2b8
maybe fix windows build
paleolimbot Apr 4, 2023
26aa557
maybe fix ubuntu build
paleolimbot Apr 4, 2023
4104fc4
punt on overflow
paleolimbot Apr 4, 2023
3f05227
maybe fix lint
paleolimbot Apr 4, 2023
ee05ff2
more lint
paleolimbot Apr 4, 2023
8d6ff12
lint
paleolimbot Apr 4, 2023
2a8e34e
maybe some reading
paleolimbot Apr 4, 2023
dcd738c
simplify a bit
paleolimbot Apr 4, 2023
5c6acfe
split + rename
paleolimbot Apr 4, 2023
6367ca2
wire up another test
paleolimbot Apr 4, 2023
e1009cf
more utils
paleolimbot Apr 4, 2023
fcbfa02
fiddling with nested types
paleolimbot Apr 4, 2023
547f491
some more tweaks, add test data
paleolimbot Apr 4, 2023
45f52f7
tidy
paleolimbot Apr 4, 2023
1f8bdc1
fix sign compare
paleolimbot Apr 4, 2023
4baafe3
maybe fix cpptidy
paleolimbot Apr 4, 2023
65f6266
sketching
paleolimbot Apr 4, 2023
ee56a2b
some rethinking of the copy reader class def
paleolimbot Apr 5, 2023
8f9adce
maybe get recursion to work for instantiating converters
paleolimbot Apr 5, 2023
4b143d1
maybe full lifecycle
paleolimbot Apr 5, 2023
f62e62a
totally theoretical header reader
paleolimbot Apr 5, 2023
7abb871
tidy + build errors
paleolimbot Apr 6, 2023
da0afb3
data is a pointer
paleolimbot Apr 6, 2023
a445999
maybe fix tidy + lint
paleolimbot Apr 6, 2023
edd3f21
pass an actual read test!
paleolimbot Apr 6, 2023
2a31194
text passing
paleolimbot Apr 6, 2023
64afbe2
array support
paleolimbot Apr 6, 2023
d8d2406
recor types!
paleolimbot Apr 6, 2023
f10b06c
maybe fix runtime/int
paleolimbot Apr 6, 2023
c8c7d5f
better name for file
paleolimbot Apr 6, 2023
80f5b2c
wire up the postgres_type to the statement
paleolimbot Apr 6, 2023
52cf0fa
maybe remove all uses of the type mapping
paleolimbot Apr 6, 2023
9f9d045
use reverse mapping in tests
paleolimbot Apr 6, 2023
1e13e50
remove type mappinng
paleolimbot Apr 6, 2023
54be7f0
remove type.h/cc
paleolimbot Apr 6, 2023
1d793f7
one more long cast to cover up
paleolimbot Apr 6, 2023
4e08dc5
fix can't convert
paleolimbot Apr 8, 2023
2299964
nix copy reader for now
paleolimbot Apr 8, 2023
eac6666
build + pass tests with mege
paleolimbot Apr 9, 2023
94b8dfc
fix R package build
paleolimbot Apr 9, 2023
88338b3
devirtualize type resolver
paleolimbot Apr 9, 2023
90b2ad2
start tidy of names
paleolimbot Apr 10, 2023
b6ccaee
more typeid help
paleolimbot Apr 10, 2023
d543b62
fix typname
paleolimbot Apr 10, 2023
5f1b109
documentation
paleolimbot Apr 10, 2023
77e77b8
a few more tweaks
paleolimbot Apr 10, 2023
cb97516
factor out building the type resolver
paleolimbot Apr 10, 2023
97110dd
some tweaks to support all internal types
paleolimbot Apr 10, 2023
14e5929
remove some more debugging
paleolimbot Apr 10, 2023
cd72468
remove more debugging
paleolimbot Apr 10, 2023
3ce92a9
handle +test reverse lookup
paleolimbot Apr 10, 2023
e618e59
maybe fix on old gcc
paleolimbot Apr 14, 2023
d892fa3
remove unused
paleolimbot Apr 14, 2023
1bfab0c
maybe fix windows R build
paleolimbot Apr 14, 2023
cb2d5e9
fix sign compare
paleolimbot Apr 14, 2023
92394bf
rename types
paleolimbot Apr 19, 2023
f071c67
use enum class
paleolimbot Apr 19, 2023
bdc3746
Update c/driver/postgresql/postgres_type.h
paleolimbot Apr 19, 2023
187de41
move verbose bits to an implementation file
paleolimbot Apr 19, 2023
ade7d5a
newline at end of file
paleolimbot Apr 19, 2023
de8a81f
better include strategy
paleolimbot Apr 19, 2023
e1c081b
delete vendored files
paleolimbot Apr 19, 2023
6b08ab0
fix R packaging
paleolimbot Apr 19, 2023
bb56c3a
attempt exporting things for tests on windows
paleolimbot Apr 20, 2023
c90a619
maybe more export symbols
paleolimbot Apr 20, 2023
3ab7a84
rename some enums
paleolimbot Apr 20, 2023
ab63163
fix all enum values
paleolimbot Apr 20, 2023
8b54f3a
back to header-only for testing on Windows
paleolimbot Apr 20, 2023
5388371
more undoing of implementation file
paleolimbot Apr 20, 2023
978b5c3
one more definition to remove
paleolimbot Apr 20, 2023
68ff9d8
don't use devel pak
paleolimbot Apr 20, 2023
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 0 additions & 1 deletion .github/workflows/native-unix.yml
Original file line number Diff line number Diff line change
Expand Up @@ -501,7 +501,6 @@ jobs:

- uses: r-lib/actions/setup-r-dependencies@v2
with:
pak-version: devel
extra-packages: any::rcmdcheck, local::../adbcdrivermanager
needs: check
working-directory: r/${{ matrix.config.pkg }}
Expand Down
2 changes: 1 addition & 1 deletion c/driver/postgresql/CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -43,7 +43,6 @@ add_arrow_lib(adbc_driver_postgresql
database.cc
postgresql.cc
statement.cc
type.cc
OUTPUTS
ADBC_LIBRARIES
CMAKE_PACKAGE_NAME
Expand Down Expand Up @@ -77,6 +76,7 @@ if(ADBC_BUILD_TESTS)
PREFIX
adbc
SOURCES
postgres_type_test.cc
postgresql_test.cc
../../validation/adbc_validation.cc
../../validation/adbc_validation_util.cc
Expand Down
33 changes: 33 additions & 0 deletions c/driver/postgresql/CMakeUserPresets.example.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,33 @@
{
"version": 3,
"cmakeMinimumRequired": {
"major": 3,
"minor": 21,
"patch": 0
},
"configurePresets": [
{
"name": "user-local",
"displayName": "(user) local build",
"cacheVariables": {
"CMAKE_BUILD_TYPE": "Debug",
"ADBC_BUILD_TESTS": "ON"
},
"environment": {
"PKG_CONFIG_PATH": ""
}
}
],
"testPresets": [
{
"name": "user-test-preset",
"description": "",
"displayName": "(user) test preset",
"configurePreset": "user-local",
"environment": {
"CTEST_OUTPUT_ON_FAILURE": "1",
"ADBC_POSTGRESQL_TEST_URI": "postgresql://localhost:5432/postgres?user=postgres&password=password"
}
}
]
}
13 changes: 13 additions & 0 deletions c/driver/postgresql/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -50,10 +50,23 @@ $ docker run -it --rm \
postgres
```

Alternatively use the `docker compose` provided by ADBC to manage the test
database container.

```shell
$ docker compose up postgres_test
# When finished:
# docker compose down postgres_test
```

Then, to run the tests, set the environment variable specifying the
PostgreSQL URI before running tests:

```shell
$ export ADBC_POSTGRESQL_TEST_URI=postgresql://localhost:5432/postgres?user=postgres&password=password
$ ctest
```

Users of VSCode can use the CMake extension with the supplied CMakeUserPresets.json
example to supply the required CMake and environment variables required to build and
run tests.
2 changes: 1 addition & 1 deletion c/driver/postgresql/connection.cc
Original file line number Diff line number Diff line change
Expand Up @@ -58,7 +58,7 @@ AdbcStatusCode PostgresConnection::Init(struct AdbcDatabase* database,
}
database_ =
*reinterpret_cast<std::shared_ptr<PostgresDatabase>*>(database->private_data);
type_mapping_ = database_->type_mapping();
type_resolver_ = database_->type_resolver();
return database_->Connect(&conn_, error);
}

Expand Down
8 changes: 5 additions & 3 deletions c/driver/postgresql/connection.h
Original file line number Diff line number Diff line change
Expand Up @@ -23,7 +23,7 @@
#include <adbc.h>
#include <libpq-fe.h>

#include "type.h"
#include "postgres_type.h"

namespace adbcpq {
class PostgresDatabase;
Expand All @@ -41,11 +41,13 @@ class PostgresConnection {
AdbcStatusCode SetOption(const char* key, const char* value, struct AdbcError* error);

PGconn* conn() const { return conn_; }
const std::shared_ptr<TypeMapping>& type_mapping() const { return type_mapping_; }
const std::shared_ptr<PostgresTypeResolver>& type_resolver() const {
return type_resolver_;
}

private:
std::shared_ptr<PostgresDatabase> database_;
std::shared_ptr<TypeMapping> type_mapping_;
std::shared_ptr<PostgresTypeResolver> type_resolver_;
PGconn* conn_;
bool autocommit_;
};
Expand Down
220 changes: 179 additions & 41 deletions c/driver/postgresql/database.cc
Original file line number Diff line number Diff line change
Expand Up @@ -19,6 +19,8 @@

#include <cstring>
#include <memory>
#include <utility>
#include <vector>

#include <adbc.h>
#include <libpq-fe.h>
Expand All @@ -29,52 +31,13 @@
namespace adbcpq {

PostgresDatabase::PostgresDatabase() : open_connections_(0) {
type_mapping_ = std::make_shared<TypeMapping>();
type_resolver_ = std::make_shared<PostgresTypeResolver>();
}
PostgresDatabase::~PostgresDatabase() = default;

AdbcStatusCode PostgresDatabase::Init(struct AdbcError* error) {
// Connect to validate the parameters.
PGconn* conn = nullptr;
AdbcStatusCode final_status = Connect(&conn, error);
if (final_status != ADBC_STATUS_OK) {
return final_status;
}

// Build the type mapping table.
const std::string kTypeQuery = R"(
SELECT
oid,
typname,
typreceive
FROM
pg_catalog.pg_type
)";

pg_result* result = PQexec(conn, kTypeQuery.c_str());
ExecStatusType pq_status = PQresultStatus(result);
if (pq_status == PGRES_TUPLES_OK) {
int num_rows = PQntuples(result);
for (int row = 0; row < num_rows; row++) {
const uint32_t oid = static_cast<uint32_t>(
std::strtol(PQgetvalue(result, row, 0), /*str_end=*/nullptr, /*base=*/10));
const char* typname = PQgetvalue(result, row, 1);
const char* typreceive = PQgetvalue(result, row, 2);

type_mapping_->Insert(oid, typname, typreceive);
}
} else {
SetError(error, "Failed to build type mapping table: ", PQerrorMessage(conn));
final_status = ADBC_STATUS_IO;
}
PQclear(result);

// Disconnect since PostgreSQL connections can be heavy.
{
AdbcStatusCode status = Disconnect(&conn, error);
if (status != ADBC_STATUS_OK) final_status = status;
}
return final_status;
return RebuildTypeResolver(error);
}

AdbcStatusCode PostgresDatabase::Release(struct AdbcError* error) {
Expand Down Expand Up @@ -121,4 +84,179 @@ AdbcStatusCode PostgresDatabase::Disconnect(PGconn** conn, struct AdbcError* err
}
return ADBC_STATUS_OK;
}

// Helpers for building the type resolver from queries
static inline int32_t InsertPgAttributeResult(
pg_result* result, const std::shared_ptr<PostgresTypeResolver>& resolver);

static inline int32_t InsertPgTypeResult(
pg_result* result, const std::shared_ptr<PostgresTypeResolver>& resolver);

AdbcStatusCode PostgresDatabase::RebuildTypeResolver(struct AdbcError* error) {
PGconn* conn = nullptr;
AdbcStatusCode final_status = Connect(&conn, error);
if (final_status != ADBC_STATUS_OK) {
return final_status;
}

// We need a few queries to build the resolver. The current strategy might
// fail for some recursive definitions (e.g., arrays of records of arrays).
// First, one on the pg_attribute table to resolve column names/oids for
// record types.
const std::string kColumnsQuery = R"(
SELECT
attrelid,
attname,
atttypid
FROM
pg_catalog.pg_attribute
ORDER BY
attrelid, attnum
)";

// Second, a query of the pg_type table. This query may need a few attempts to handle
// recursive definitions (e.g., record types with array column). This currently won't
// handle range types because those rows don't have child OID information. Arrays types
// are inserted after a successful insert of the element type.
const std::string kTypeQuery = R"(
SELECT
oid,
typname,
typreceive,
typbasetype,
typarray,
typrelid
FROM
pg_catalog.pg_type
WHERE
(typreceive != 0 OR typname = 'aclitem') AND typtype != 'r' AND typreceive::TEXT != 'array_recv'
ORDER BY
oid
)";

// Create a new type resolver (this instance's type_resolver_ member
// will be updated at the end if this succeeds).
auto resolver = std::make_shared<PostgresTypeResolver>();

// Insert record type definitions (this includes table schemas)
pg_result* result = PQexec(conn, kColumnsQuery.c_str());
ExecStatusType pq_status = PQresultStatus(result);
if (pq_status == PGRES_TUPLES_OK) {
InsertPgAttributeResult(result, resolver);
} else {
SetError(error, "Failed to build type mapping table: ", PQerrorMessage(conn));
final_status = ADBC_STATUS_IO;
}

PQclear(result);

// Attempt filling the resolver a few times to handle recursive definitions.
int32_t max_attempts = 3;
for (int32_t i = 0; i < max_attempts; i++) {
result = PQexec(conn, kTypeQuery.c_str());
ExecStatusType pq_status = PQresultStatus(result);
if (pq_status == PGRES_TUPLES_OK) {
InsertPgTypeResult(result, resolver);
} else {
SetError(error, "Failed to build type mapping table: ", PQerrorMessage(conn));
final_status = ADBC_STATUS_IO;
}

PQclear(result);
if (final_status != ADBC_STATUS_OK) {
break;
}
}

// Disconnect since PostgreSQL connections can be heavy.
{
AdbcStatusCode status = Disconnect(&conn, error);
if (status != ADBC_STATUS_OK) final_status = status;
}

if (final_status == ADBC_STATUS_OK) {
type_resolver_ = std::move(resolver);
}

return final_status;
}

static inline int32_t InsertPgAttributeResult(
pg_result* result, const std::shared_ptr<PostgresTypeResolver>& resolver) {
int num_rows = PQntuples(result);
std::vector<std::pair<std::string, uint32_t>> columns;
uint32_t current_type_oid = 0;
int32_t n_added = 0;

for (int row = 0; row < num_rows; row++) {
const uint32_t type_oid = static_cast<uint32_t>(
std::strtol(PQgetvalue(result, row, 0), /*str_end=*/nullptr, /*base=*/10));
const char* col_name = PQgetvalue(result, row, 1);
const uint32_t col_oid = static_cast<uint32_t>(
std::strtol(PQgetvalue(result, row, 2), /*str_end=*/nullptr, /*base=*/10));

if (type_oid != current_type_oid && !columns.empty()) {
resolver->InsertClass(current_type_oid, columns);
columns.clear();
current_type_oid = type_oid;
n_added++;
}

columns.push_back({col_name, col_oid});
}

if (!columns.empty()) {
resolver->InsertClass(current_type_oid, columns);
n_added++;
}

return n_added;
}

static inline int32_t InsertPgTypeResult(
pg_result* result, const std::shared_ptr<PostgresTypeResolver>& resolver) {
int num_rows = PQntuples(result);
PostgresTypeResolver::Item item;
int32_t n_added = 0;

for (int row = 0; row < num_rows; row++) {
const uint32_t oid = static_cast<uint32_t>(
std::strtol(PQgetvalue(result, row, 0), /*str_end=*/nullptr, /*base=*/10));
const char* typname = PQgetvalue(result, row, 1);
const char* typreceive = PQgetvalue(result, row, 2);
const uint32_t typbasetype = static_cast<uint32_t>(
std::strtol(PQgetvalue(result, row, 3), /*str_end=*/nullptr, /*base=*/10));
const uint32_t typarray = static_cast<uint32_t>(
std::strtol(PQgetvalue(result, row, 4), /*str_end=*/nullptr, /*base=*/10));
const uint32_t typrelid = static_cast<uint32_t>(
std::strtol(PQgetvalue(result, row, 5), /*str_end=*/nullptr, /*base=*/10));

// Special case the aclitem because it shows up in a bunch of internal tables
if (strcmp(typname, "aclitem") == 0) {
typreceive = "aclitem_recv";
}

item.oid = oid;
item.typname = typname;
item.typreceive = typreceive;
item.class_oid = typrelid;
item.base_oid = typbasetype;

int result = resolver->Insert(item, nullptr);

// If there's an array type and the insert succeeded, add that now too
if (result == NANOARROW_OK && typarray != 0) {
std::string array_typname = StringBuilder("_", typname);
item.oid = typarray;
item.typname = array_typname.c_str();
item.typreceive = "array_recv";
item.child_oid = oid;

resolver->Insert(item, nullptr);
}
}

return n_added;
}

} // namespace adbcpq
9 changes: 6 additions & 3 deletions c/driver/postgresql/database.h
Original file line number Diff line number Diff line change
Expand Up @@ -24,7 +24,7 @@
#include <adbc.h>
#include <libpq-fe.h>

#include "type.h"
#include "postgres_type.h"

namespace adbcpq {
class PostgresDatabase {
Expand All @@ -42,12 +42,15 @@ class PostgresDatabase {

AdbcStatusCode Connect(PGconn** conn, struct AdbcError* error);
AdbcStatusCode Disconnect(PGconn** conn, struct AdbcError* error);
const std::shared_ptr<PostgresTypeResolver>& type_resolver() const {
return type_resolver_;
}

const std::shared_ptr<TypeMapping>& type_mapping() const { return type_mapping_; }
AdbcStatusCode RebuildTypeResolver(struct AdbcError* error);

private:
int32_t open_connections_;
std::string uri_;
std::shared_ptr<TypeMapping> type_mapping_;
std::shared_ptr<PostgresTypeResolver> type_resolver_;
};
} // namespace adbcpq
Loading