Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use rust bindgen for reading/writing the PostgreSQL control file. #6

Closed
wants to merge 3 commits into from

Conversation

hlinnaka
Copy link
Contributor

No description provided.

@ericseppanen
Copy link
Contributor

My findings so far on the use of transmute:

Since ControlFileData contains bools, transmute from a byte array is unsound (if the byte contains anything but 0 or 1, undefined behavior will result). If we expand our use of bindgen further, I expect we'll discover more ways to trigger dangerous things. So I'd prefer to avoid transmute on anything auto-generated: it's not obvious what types are in there, now or in the future.

I hoped there would be a way to attach #[derive(Serialize, Deserialize)] to the generated structs, but this doesn't seem possible right now. There are a few crates out there that also have this problem, and they seem to be using regex workarounds (example).

I also looked around at some of the "safe transmute" crates and RFC 2981. While there is a lot of potential there, I didn't find anything that's able to solve this problem now.

Some possible ways forward:
a) Continue with transmute until we can figure out how to do something better. Perhaps add a big warning comment, to remind us to fix this before it gets out of hand.
b) Go back to hand-written structs. We can replace the encode/decode functions with #[derive(Serialize, Deserialize)] if we like.
c) Build a procedural macro that can apply #[derive(Serialize, Deserialize)].
d) Run the bindgen output through a text regex to insert Serialize, Deserialize in the right place.

(d) seems crude, but it's so much less work than (c) that I think it's probably the right way forward, unless we expect to only have a few C structs imported with bindgen, then handwritten is probably fine.

@hlinnaka
Copy link
Contributor Author

My findings so far on the use of transmute:

Since ControlFileData contains bools, transmute from a byte array is unsound (if the byte contains anything but 0 or 1, undefined behavior will result). If we expand our use of bindgen further, I expect we'll discover more ways to trigger dangerous things. So I'd prefer to avoid transmute on anything auto-generated: it's not obvious what types are in there, now or in the future.

Perhaps we could map Postgres bool to Rust's u8 instead of bool. Not sure how to tell bindgen to do that.

I hoped there would be a way to attach #[derive(Serialize, Deserialize)] to the generated structs, but this doesn't seem possible right now. There are a few crates out there that also have this problem, and they seem to be using regex workarounds (example).

Yeah, that would've been nice.

I also looked around at some of the "safe transmute" crates and RFC 2981. While there is a lot of potential there, I didn't find anything that's able to solve this problem now.

Some possible ways forward:
a) Continue with transmute until we can figure out how to do something better. Perhaps add a big warning comment, to remind us to fix this before it gets out of hand.
b) Go back to hand-written structs. We can replace the encode/decode functions with #[derive(Serialize, Deserialize)] if we like.
c) Build a procedural macro that can apply #[derive(Serialize, Deserialize)].
d) Run the bindgen output through a text regex to insert Serialize, Deserialize in the right place.

(d) seems crude, but it's so much less work than (c) that I think it's probably the right way forward, unless we expect to only have a few C structs imported with bindgen, then handwritten is probably fine.

I'm inclined to do a) for now, and revisit this and perhaps switch to d) if we need many more of these structs. If we're lucky, this gets addressed in bindgen in the meanwhile.

We have a few hand-written structs in the WAL decoding routines that mirror Postgres structs, but those are fairly short and we only need to read them, not write. But if we had a more general solution, would be nice to use it for those too.

@ericseppanen
Copy link
Contributor

ericseppanen commented Apr 19, 2021

Perhaps we could map Postgres bool to Rust's u8 instead of bool. Not sure how to tell bindgen to do that.

I looked for a way to tell bindgen to remap bool to u8 and didn't find one. I think we'd have to write something to post-process the entire bindgen output file. Or trick it by patching the input file with #define bool uint8_t or something like that.

@ericseppanen
Copy link
Contributor

Do we want to be able to convert &[u8] to &ControlFileData (and vice versa)?

Right now it looks like there's a copy happening, unless the compiler can optimize it away.

@ericseppanen
Copy link
Contributor

There's an (undocumented?) standard practice in Rust, which is that whenever you use unsafe, you add a SAFETY comment explaining why this code can't cause bad things to happen. An example from the standard library:

        // SAFETY: the align is guaranteed by Rust to be a power of two and
        // the size+align combo is guaranteed to fit in our address space. As a
        // result use the unchecked constructor here to avoid inserting code
        // that panics if it isn't optimized well enough.
        unsafe { Layout::from_size_align_unchecked(size, align) }

@ericseppanen ericseppanen mentioned this pull request Apr 24, 2021
@ericseppanen
Copy link
Contributor

Should this be closed? The code was merged a while ago under another PR.

@hlinnaka
Copy link
Contributor Author

Should this be closed? The code was merged a while ago under another PR.

Yeah, I'll close this. We had some good discussion and suggestion here, but nothing concrete pending.

@hlinnaka hlinnaka closed this May 11, 2021
@hlinnaka hlinnaka deleted the controlfile-bindgen branch May 11, 2021 08:38
knizhnik added a commit that referenced this pull request Apr 4, 2024

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
## Problem

Running test_pageserver_restarts_under_workload in POR #7275 I get the
following assertion failure in prefetch:
```
#5  0x00005587220d4bf0 in ExceptionalCondition (
    conditionName=0x7fbf24d003c8 "(ring_index) < MyPState->ring_unused && (ring_index) >= MyPState->ring_last", 
    fileName=0x7fbf24d00240 "/home/knizhnik/neon.main//pgxn/neon/pagestore_smgr.c", lineNumber=644)
    at /home/knizhnik/neon.main//vendor/postgres-v16/src/backend/utils/error/assert.c:66
#6  0x00007fbf24cebc9b in prefetch_set_unused (ring_index=1509) at /home/knizhnik/neon.main//pgxn/neon/pagestore_smgr.c:644
#7  0x00007fbf24cec613 in prefetch_register_buffer (tag=..., force_latest=0x0, force_lsn=0x0)
    at /home/knizhnik/neon.main//pgxn/neon/pagestore_smgr.c:891
#8  0x00007fbf24cef21e in neon_prefetch (reln=0x5587233b7388, forknum=MAIN_FORKNUM, blocknum=14110)
    at /home/knizhnik/neon.main//pgxn/neon/pagestore_smgr.c:2055

(gdb) p ring_index
$1 = 1509
(gdb) p MyPState->ring_unused
$2 = 1636
(gdb) p MyPState->ring_last
$3 = 1636
```

## Summary of changes

Check status of `prefetch_wait_for`

## Checklist before requesting a review

- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.

## Checklist before merging

- [ ] Do not forget to reformat commit message to not include the above
checklist

---------

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
problame added a commit that referenced this pull request Jan 14, 2025
This approach here doesn't work because it slows down all the responses.

the workload() thread gets stuck in auth, prob with zero pipeline depth

  0x00007fa28fe48e63 in epoll_wait () from /lib/x86_64-linux-gnu/libc.so.6
  (gdb) bt
  #0  0x00007fa28fe48e63 in epoll_wait () from /lib/x86_64-linux-gnu/libc.so.6
  #1  0x0000561a285bf44e in WaitEventSetWaitBlock (set=0x561a292723e8, cur_timeout=9999, occurred_events=0x7ffd1f11c970, nevents=1)
      at /home/christian/src/neon//vendor/postgres-v16/src/backend/storage/ipc/latch.c:1535
  #2  0x0000561a285bf338 in WaitEventSetWait (set=0x561a292723e8, timeout=9999, occurred_events=0x7ffd1f11c970, nevents=1, wait_event_info=117440512)
      at /home/christian/src/neon//vendor/postgres-v16/src/backend/storage/ipc/latch.c:1481
  #3  0x00007fa2904a7345 in call_PQgetCopyData (shard_no=0, buffer=0x7ffd1f11cad0) at /home/christian/src/neon//pgxn/neon/libpagestore.c:703
  #4  0x00007fa2904a7aec in pageserver_receive (shard_no=0) at /home/christian/src/neon//pgxn/neon/libpagestore.c:899
  #5  0x00007fa2904af471 in prefetch_read (slot=0x561a292863b0) at /home/christian/src/neon//pgxn/neon/pagestore_smgr.c:644
  #6  0x00007fa2904af26b in prefetch_wait_for (ring_index=0) at /home/christian/src/neon//pgxn/neon/pagestore_smgr.c:596
  #7  0x00007fa2904b489d in neon_read_at_lsnv (rinfo=..., forkNum=MAIN_FORKNUM, base_blockno=0, request_lsns=0x7ffd1f11cd60, buffers=0x7ffd1f11cd30, nblocks=1, mask=0x0)
      at /home/christian/src/neon//pgxn/neon/pagestore_smgr.c:3024
  #8  0x00007fa2904b4f34 in neon_read_at_lsn (rinfo=..., forkNum=MAIN_FORKNUM, blkno=0, request_lsns=..., buffer=0x7fa28b969000) at /home/christian/src/neon//pgxn/neon/pagestore_smgr.c:3104
  #9  0x00007fa2904b515d in neon_read (reln=0x561a292ef448, forkNum=MAIN_FORKNUM, blkno=0, buffer=0x7fa28b969000) at /home/christian/src/neon//pgxn/neon/pagestore_smgr.c:3146
  #10 0x0000561a285f1ed5 in smgrread (reln=0x561a292ef448, forknum=MAIN_FORKNUM, blocknum=0, buffer=0x7fa28b969000) at /home/christian/src/neon//vendor/postgres-v16/src/backend/storage/smgr/smgr.c:567
  #11 0x0000561a285a528a in ReadBuffer_common (smgr=0x561a292ef448, relpersistence=112 'p', forkNum=MAIN_FORKNUM, blockNum=0, mode=RBM_NORMAL, strategy=0x0, hit=0x7ffd1f11cf1b)
      at /home/christian/src/neon//vendor/postgres-v16/src/backend/storage/buffer/bufmgr.c:1134
  #12 0x0000561a285a47e3 in ReadBufferExtended (reln=0x7fa28ce1c888, forkNum=MAIN_FORKNUM, blockNum=0, mode=RBM_NORMAL, strategy=0x0)
      at /home/christian/src/neon//vendor/postgres-v16/src/backend/storage/buffer/bufmgr.c:781
  #13 0x0000561a285a46ad in ReadBuffer (reln=0x7fa28ce1c888, blockNum=0) at /home/christian/src/neon//vendor/postgres-v16/src/backend/storage/buffer/bufmgr.c:715
  #14 0x0000561a2811d511 in _bt_getbuf (rel=0x7fa28ce1c888, blkno=0, access=1) at /home/christian/src/neon//vendor/postgres-v16/src/backend/access/nbtree/nbtpage.c:852
  #15 0x0000561a2811d1b2 in _bt_metaversion (rel=0x7fa28ce1c888, heapkeyspace=0x7ffd1f11d9f0, allequalimage=0x7ffd1f11d9f1) at /home/christian/src/neon//vendor/postgres-v16/src/backend/access/nbtree/nbtpage.c:747
  #16 0x0000561a28126220 in _bt_first (scan=0x561a292d0348, dir=ForwardScanDirection) at /home/christian/src/neon//vendor/postgres-v16/src/backend/access/nbtree/nbtsearch.c:1465
  #17 0x0000561a28121a07 in btgettuple (scan=0x561a292d0348, dir=ForwardScanDirection) at /home/christian/src/neon//vendor/postgres-v16/src/backend/access/nbtree/nbtree.c:246
  #18 0x0000561a28111afa in index_getnext_tid (scan=0x561a292d0348, direction=ForwardScanDirection) at /home/christian/src/neon//vendor/postgres-v16/src/backend/access/index/indexam.c:583
  #19 0x0000561a28111d14 in index_getnext_slot (scan=0x561a292d0348, direction=ForwardScanDirection, slot=0x561a292d01a8) at /home/christian/src/neon//vendor/postgres-v16/src/backend/access/index/indexam.c:675
  #20 0x0000561a2810fbcc in systable_getnext (sysscan=0x561a292d0158) at /home/christian/src/neon//vendor/postgres-v16/src/backend/access/index/genam.c:512
  #21 0x0000561a287a1ee1 in SearchCatCacheMiss (cache=0x561a292a0f80, nkeys=1, hashValue=3028463561, hashIndex=1, v1=94670359561576, v2=0, v3=0, v4=0)
      at /home/christian/src/neon//vendor/postgres-v16/src/backend/utils/cache/catcache.c:1440
  #22 0x0000561a287a1d8a in SearchCatCacheInternal (cache=0x561a292a0f80, nkeys=1, v1=94670359561576, v2=0, v3=0, v4=0) at /home/christian/src/neon//vendor/postgres-v16/src/backend/utils/cache/catcache.c:1360
  #23 0x0000561a287a1a4f in SearchCatCache (cache=0x561a292a0f80, v1=94670359561576, v2=0, v3=0, v4=0) at /home/christian/src/neon//vendor/postgres-v16/src/backend/utils/cache/catcache.c:1214
  #24 0x0000561a287be060 in SearchSysCache (cacheId=10, key1=94670359561576, key2=0, key3=0, key4=0) at /home/christian/src/neon//vendor/postgres-v16/src/backend/utils/cache/syscache.c:817
  #25 0x0000561a287be66f in GetSysCacheOid (cacheId=10, oidcol=1, key1=94670359561576, key2=0, key3=0, key4=0) at /home/christian/src/neon//vendor/postgres-v16/src/backend/utils/cache/syscache.c:1055
  #26 0x0000561a286319a5 in get_role_oid (rolname=0x561a29270568 "cloud_admin", missing_ok=true) at /home/christian/src/neon//vendor/postgres-v16/src/backend/utils/adt/acl.c:5251
  #27 0x0000561a283d42ca in check_hba (port=0x561a29268de0) at /home/christian/src/neon//vendor/postgres-v16/src/backend/libpq/hba.c:2493
  #28 0x0000561a283d5537 in hba_getauthmethod (port=0x561a29268de0) at /home/christian/src/neon//vendor/postgres-v16/src/backend/libpq/hba.c:3067
  #29 0x0000561a283c6fd7 in ClientAuthentication (port=0x561a29268de0) at /home/christian/src/neon//vendor/postgres-v16/src/backend/libpq/auth.c:395
  #30 0x0000561a287dc943 in PerformAuthentication (port=0x561a29268de0) at /home/christian/src/neon//vendor/postgres-v16/src/backend/utils/init/postinit.c:247
  #31 0x0000561a287dd9cd in InitPostgres (in_dbname=0x561a29270588 "postgres", dboid=0, username=0x561a29270568 "cloud_admin", useroid=0, load_session_libraries=true, override_allow_connections=false,
      out_dbname=0x0) at /home/christian/src/neon//vendor/postgres-v16/src/backend/utils/init/postinit.c:929
  #32 0x0000561a285fa10b in PostgresMain (dbname=0x561a29270588 "postgres", username=0x561a29270568 "cloud_admin") at /home/christian/src/neon//vendor/postgres-v16/src/backend/tcop/postgres.c:4293
  #33 0x0000561a28524ce4 in BackendRun (port=0x561a29268de0) at /home/christian/src/neon//vendor/postgres-v16/src/backend/postmaster/postmaster.c:4465
  #34 0x0000561a285245da in BackendStartup (port=0x561a29268de0) at /home/christian/src/neon//vendor/postgres-v16/src/backend/postmaster/postmaster.c:4193
  #35 0x0000561a285209c4 in ServerLoop () at /home/christian/src/neon//vendor/postgres-v16/src/backend/postmaster/postmaster.c:1782
  #36 0x0000561a2852030f in PostmasterMain (argc=3, argv=0x561a291c5fc0) at /home/christian/src/neon//vendor/postgres-v16/src/backend/postmaster/postmaster.c:1466
  #37 0x0000561a283dd987 in main (argc=3, argv=0x561a291c5fc0) at /home/christian/src/neon//vendor/postgres-v16/src/backend/main/main.c:238
github-merge-queue bot pushed a commit that referenced this pull request Feb 6, 2025

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
## Problem

Found another pgcopydb segfault in error handling

```bash
2025-02-06 15:30:40.112 51299 ERROR  pgsql.c:2330              [TARGET -738302813] FATAL:  terminating connection due to administrator command
2025-02-06 15:30:40.112 51298 ERROR  pgsql.c:2330              [TARGET -1407749748] FATAL:  terminating connection due to administrator command
2025-02-06 15:30:40.112 51297 ERROR  pgsql.c:2330              [TARGET -2073308066] FATAL:  terminating connection due to administrator command
2025-02-06 15:30:40.112 51300 ERROR  pgsql.c:2330              [TARGET 1220908650] FATAL:  terminating connection due to administrator command
2025-02-06 15:30:40.432 51300 ERROR  pgsql.c:2536              [Postgres] FATAL:  terminating connection due to administrator command
2025-02-06 15:30:40.513 51290 ERROR  copydb.c:773              Sub-process 51300 exited with code 0 and signal Segmentation fault
2025-02-06 15:30:40.578 51299 ERROR  pgsql.c:2536              [Postgres] FATAL:  terminating connection due to administrator command
2025-02-06 15:30:40.613 51290 ERROR  copydb.c:773              Sub-process 51299 exited with code 0 and signal Segmentation fault
2025-02-06 15:30:41.253 51298 ERROR  pgsql.c:2536              [Postgres] FATAL:  terminating connection due to administrator command
2025-02-06 15:30:41.314 51290 ERROR  copydb.c:773              Sub-process 51298 exited with code 0 and signal Segmentation fault
2025-02-06 15:30:43.133 51297 ERROR  pgsql.c:2536              [Postgres] FATAL:  terminating connection due to administrator command
2025-02-06 15:30:43.215 51290 ERROR  copydb.c:773              Sub-process 51297 exited with code 0 and signal Segmentation fault
2025-02-06 15:30:43.215 51290 ERROR  indexes.c:123             Some INDEX worker process(es) have exited with error, see above for details
2025-02-06 15:30:43.215 51290 ERROR  indexes.c:59              Failed to create indexes, see above for details
2025-02-06 15:30:43.232 51271 ERROR  copydb.c:768              Sub-process 51290 exited with code 12
```

```bashadmin@ip-172-31-38-164:~/pgcopydb$ gdb /usr/local/pgsql/bin/pgcopydb core
GNU gdb (Debian 13.1-3) 13.1
Copyright (C) 2023 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "aarch64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/local/pgsql/bin/pgcopydb...
[New LWP 51297]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/aarch64-linux-gnu/libthread_db.so.1".
Core was generated by `pgcopydb: create index ocr.ocr_pipeline_step_results_version_pkey             '.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x0000aaaac3a4b030 in splitLines (lbuf=lbuf@entry=0xffffd8b86930, buffer=<optimized out>) at string_utils.c:630
630				*newLinePtr = '\0';
(gdb) bt
#0  0x0000aaaac3a4b030 in splitLines (lbuf=lbuf@entry=0xffffd8b86930, buffer=<optimized out>) at string_utils.c:630
#1  0x0000aaaac3a3a678 in pgsql_execute_log_error (pgsql=pgsql@entry=0xffffd8b87040, result=result@entry=0x0, 
    sql=sql@entry=0xffff81fe9be0 "CREATE UNIQUE INDEX IF NOT EXISTS ocr_pipeline_step_results_version_pkey ON ocr.ocr_pipeline_step_results_version USING btree (id, transaction_id);", 
    debugParameters=debugParameters@entry=0xaaaaec5f92f0, context=context@entry=0x0) at pgsql.c:2322
#2  0x0000aaaac3a3bbec in pgsql_execute_with_params (pgsql=pgsql@entry=0xffffd8b87040, 
    sql=0xffff81fe9be0 "CREATE UNIQUE INDEX IF NOT EXISTS ocr_pipeline_step_results_version_pkey ON ocr.ocr_pipeline_step_results_version USING btree (id, transaction_id);", paramCount=paramCount@entry=0, 
    paramTypes=paramTypes@entry=0x0, paramValues=paramValues@entry=0x0, context=context@entry=0x0, parseFun=parseFun@entry=0x0) at pgsql.c:1649
#3  0x0000aaaac3a3c468 in pgsql_execute (pgsql=pgsql@entry=0xffffd8b87040, sql=<optimized out>) at pgsql.c:1522
#4  0x0000aaaac3a245f4 in copydb_create_index (specs=specs@entry=0xffffd8b8ec98, dst=dst@entry=0xffffd8b87040, index=index@entry=0xffff81f71800, ifNotExists=<optimized out>) at indexes.c:846
#5  0x0000aaaac3a24ca8 in copydb_create_index_by_oid (specs=specs@entry=0xffffd8b8ec98, dst=dst@entry=0xffffd8b87040, indexOid=<optimized out>) at indexes.c:410
#6  0x0000aaaac3a25040 in copydb_index_worker (specs=specs@entry=0xffffd8b8ec98) at indexes.c:297
#7  0x0000aaaac3a25238 in copydb_start_index_workers (specs=specs@entry=0xffffd8b8ec98) at indexes.c:209
#8  0x0000aaaac3a252f4 in copydb_index_supervisor (specs=specs@entry=0xffffd8b8ec98) at indexes.c:112
#9  0x0000aaaac3a253f4 in copydb_start_index_supervisor (specs=0xffffd8b8ec98) at indexes.c:57
#10 copydb_start_index_supervisor (specs=specs@entry=0xffffd8b8ec98) at indexes.c:34
#11 0x0000aaaac3a51ff4 in copydb_process_table_data (specs=specs@entry=0xffffd8b8ec98) at table-data.c:146
#12 0x0000aaaac3a520dc in copydb_copy_all_table_data (specs=specs@entry=0xffffd8b8ec98) at table-data.c:69
#13 0x0000aaaac3a0ccd8 in cloneDB (copySpecs=copySpecs@entry=0xffffd8b8ec98) at cli_clone_follow.c:602
#14 0x0000aaaac3a0d2cc in start_clone_process (pid=0xffffd8b743d8, copySpecs=0xffffd8b8ec98) at cli_clone_follow.c:502
#15 start_clone_process (copySpecs=copySpecs@entry=0xffffd8b8ec98, pid=pid@entry=0xffffd8b89788) at cli_clone_follow.c:482
#16 0x0000aaaac3a0d52c in cli_clone (argc=<optimized out>, argv=<optimized out>) at cli_clone_follow.c:164
#17 0x0000aaaac3a53850 in commandline_run (command=command@entry=0xffffd8b9eb88, argc=0, argc@entry=22, argv=0xffffd8b9edf8, argv@entry=0xffffd8b9ed48) at /home/admin/pgcopydb/src/bin/pgcopydb/../lib/subcommands.c/commandline.c:71
#18 0x0000aaaac3a01464 in main (argc=22, argv=0xffffd8b9ed48) at main.c:140
(gdb) 

```

The problem is most likely that the following call returned a message in
a read-only memory segment where we cannot replace \n with \0 in
string_utils.c splitLines() function
```C
char *message = PQerrorMessage(pgsql->connection);
```

## Summary of changes

modified the patch to also address this problem
knizhnik pushed a commit that referenced this pull request Feb 10, 2025
## Problem

Found another pgcopydb segfault in error handling

```bash
2025-02-06 15:30:40.112 51299 ERROR  pgsql.c:2330              [TARGET -738302813] FATAL:  terminating connection due to administrator command
2025-02-06 15:30:40.112 51298 ERROR  pgsql.c:2330              [TARGET -1407749748] FATAL:  terminating connection due to administrator command
2025-02-06 15:30:40.112 51297 ERROR  pgsql.c:2330              [TARGET -2073308066] FATAL:  terminating connection due to administrator command
2025-02-06 15:30:40.112 51300 ERROR  pgsql.c:2330              [TARGET 1220908650] FATAL:  terminating connection due to administrator command
2025-02-06 15:30:40.432 51300 ERROR  pgsql.c:2536              [Postgres] FATAL:  terminating connection due to administrator command
2025-02-06 15:30:40.513 51290 ERROR  copydb.c:773              Sub-process 51300 exited with code 0 and signal Segmentation fault
2025-02-06 15:30:40.578 51299 ERROR  pgsql.c:2536              [Postgres] FATAL:  terminating connection due to administrator command
2025-02-06 15:30:40.613 51290 ERROR  copydb.c:773              Sub-process 51299 exited with code 0 and signal Segmentation fault
2025-02-06 15:30:41.253 51298 ERROR  pgsql.c:2536              [Postgres] FATAL:  terminating connection due to administrator command
2025-02-06 15:30:41.314 51290 ERROR  copydb.c:773              Sub-process 51298 exited with code 0 and signal Segmentation fault
2025-02-06 15:30:43.133 51297 ERROR  pgsql.c:2536              [Postgres] FATAL:  terminating connection due to administrator command
2025-02-06 15:30:43.215 51290 ERROR  copydb.c:773              Sub-process 51297 exited with code 0 and signal Segmentation fault
2025-02-06 15:30:43.215 51290 ERROR  indexes.c:123             Some INDEX worker process(es) have exited with error, see above for details
2025-02-06 15:30:43.215 51290 ERROR  indexes.c:59              Failed to create indexes, see above for details
2025-02-06 15:30:43.232 51271 ERROR  copydb.c:768              Sub-process 51290 exited with code 12
```

```bashadmin@ip-172-31-38-164:~/pgcopydb$ gdb /usr/local/pgsql/bin/pgcopydb core
GNU gdb (Debian 13.1-3) 13.1
Copyright (C) 2023 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "aarch64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/local/pgsql/bin/pgcopydb...
[New LWP 51297]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/aarch64-linux-gnu/libthread_db.so.1".
Core was generated by `pgcopydb: create index ocr.ocr_pipeline_step_results_version_pkey             '.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x0000aaaac3a4b030 in splitLines (lbuf=lbuf@entry=0xffffd8b86930, buffer=<optimized out>) at string_utils.c:630
630				*newLinePtr = '\0';
(gdb) bt
#0  0x0000aaaac3a4b030 in splitLines (lbuf=lbuf@entry=0xffffd8b86930, buffer=<optimized out>) at string_utils.c:630
#1  0x0000aaaac3a3a678 in pgsql_execute_log_error (pgsql=pgsql@entry=0xffffd8b87040, result=result@entry=0x0, 
    sql=sql@entry=0xffff81fe9be0 "CREATE UNIQUE INDEX IF NOT EXISTS ocr_pipeline_step_results_version_pkey ON ocr.ocr_pipeline_step_results_version USING btree (id, transaction_id);", 
    debugParameters=debugParameters@entry=0xaaaaec5f92f0, context=context@entry=0x0) at pgsql.c:2322
#2  0x0000aaaac3a3bbec in pgsql_execute_with_params (pgsql=pgsql@entry=0xffffd8b87040, 
    sql=0xffff81fe9be0 "CREATE UNIQUE INDEX IF NOT EXISTS ocr_pipeline_step_results_version_pkey ON ocr.ocr_pipeline_step_results_version USING btree (id, transaction_id);", paramCount=paramCount@entry=0, 
    paramTypes=paramTypes@entry=0x0, paramValues=paramValues@entry=0x0, context=context@entry=0x0, parseFun=parseFun@entry=0x0) at pgsql.c:1649
#3  0x0000aaaac3a3c468 in pgsql_execute (pgsql=pgsql@entry=0xffffd8b87040, sql=<optimized out>) at pgsql.c:1522
#4  0x0000aaaac3a245f4 in copydb_create_index (specs=specs@entry=0xffffd8b8ec98, dst=dst@entry=0xffffd8b87040, index=index@entry=0xffff81f71800, ifNotExists=<optimized out>) at indexes.c:846
#5  0x0000aaaac3a24ca8 in copydb_create_index_by_oid (specs=specs@entry=0xffffd8b8ec98, dst=dst@entry=0xffffd8b87040, indexOid=<optimized out>) at indexes.c:410
#6  0x0000aaaac3a25040 in copydb_index_worker (specs=specs@entry=0xffffd8b8ec98) at indexes.c:297
#7  0x0000aaaac3a25238 in copydb_start_index_workers (specs=specs@entry=0xffffd8b8ec98) at indexes.c:209
#8  0x0000aaaac3a252f4 in copydb_index_supervisor (specs=specs@entry=0xffffd8b8ec98) at indexes.c:112
#9  0x0000aaaac3a253f4 in copydb_start_index_supervisor (specs=0xffffd8b8ec98) at indexes.c:57
#10 copydb_start_index_supervisor (specs=specs@entry=0xffffd8b8ec98) at indexes.c:34
#11 0x0000aaaac3a51ff4 in copydb_process_table_data (specs=specs@entry=0xffffd8b8ec98) at table-data.c:146
#12 0x0000aaaac3a520dc in copydb_copy_all_table_data (specs=specs@entry=0xffffd8b8ec98) at table-data.c:69
#13 0x0000aaaac3a0ccd8 in cloneDB (copySpecs=copySpecs@entry=0xffffd8b8ec98) at cli_clone_follow.c:602
#14 0x0000aaaac3a0d2cc in start_clone_process (pid=0xffffd8b743d8, copySpecs=0xffffd8b8ec98) at cli_clone_follow.c:502
#15 start_clone_process (copySpecs=copySpecs@entry=0xffffd8b8ec98, pid=pid@entry=0xffffd8b89788) at cli_clone_follow.c:482
#16 0x0000aaaac3a0d52c in cli_clone (argc=<optimized out>, argv=<optimized out>) at cli_clone_follow.c:164
#17 0x0000aaaac3a53850 in commandline_run (command=command@entry=0xffffd8b9eb88, argc=0, argc@entry=22, argv=0xffffd8b9edf8, argv@entry=0xffffd8b9ed48) at /home/admin/pgcopydb/src/bin/pgcopydb/../lib/subcommands.c/commandline.c:71
#18 0x0000aaaac3a01464 in main (argc=22, argv=0xffffd8b9ed48) at main.c:140
(gdb) 

```

The problem is most likely that the following call returned a message in
a read-only memory segment where we cannot replace \n with \0 in
string_utils.c splitLines() function
```C
char *message = PQerrorMessage(pgsql->connection);
```

## Summary of changes

modified the patch to also address this problem
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants