osm-logical: store needs_commit in own context #12

mmd-osm · 2020-07-07T20:32:23Z

This commit moves the needs_commit flag to an own context, based on the following example provided in the Postgresql sources: https://github.com/postgres/postgres/blob/master/contrib/test_decoding/test_decoding.c

I changed some of the function names to make it clearer that they belong to this plugin. This might come in handy, if this code is somehow involved in a memory dump (which we hope will not happen in the future).
Added new function calls to MemoryContextDelete(data->context); / MemoryContextSwitchTo(old); / MemoryContextReset(data->context); ... I'm not 100% happy with the level of code duplication introduced.

Before moving this to production next year, we should try to find a plugin expert on the Postgres dev mailing list to review this code.

postgresql-plugin/osm-logical.c

+                                                       ALLOCSET_SMALL_INITSIZE,
+                                                       ALLOCSET_SMALL_MAXSIZE);
+#endif
+


joto · 2020-07-08T16:39:26Z

First: You should not do unrelated changes (here the function renaming) in the same PR, it just makes everything hard to review. At least put it into a separate commit. Okay for now, I have read through it, but something to remember for next time.

Second: That seems all hugely complicated. Can't we just malloc/free a bool and put the pointer into output_plugin_private? Its not that we are storing lots of data. I grepped through the postgres source code and output_plugin_private is never touched outside any plugins, so chances are this will just work. If you can find a PostgreSQL expert great, but I don't think we want to wait for that for too long.

joto · 2020-07-08T16:51:35Z

Come to think of it, we don't even need malloc. We can store the single bit we need in the pointer itself.

mmd-osm · 2020-07-08T17:21:03Z

First: unrelated changes: ok, noted.

Can't we just malloc/free a bool and put the pointer into output_plugin_private?

If you check out the other logical replication implementation on https://wiki.postgresql.org/wiki/Logical_Decoding_Plugins, you'll notice that at least two other implementations use exactly the same approach to store a single boolean value in the context:

To be honest, as this code is running directly in the Postgresql process, I'd be extremely reluctant to use anything which isn't exactly following those published examples. If this doesn't work for any reason, we can go back to the Postgresql folks and ask them to fix their samples or help us with debugging, etc.. If we're using some clever shortcuts, chances are that no one will be able to help us out, if something goes wrong, or changes between versions.

joto · 2020-07-08T18:36:18Z

Frankly, it all seems like cargo-cult to me, but if your change works, we can just merge it and be done with it. Do you think this is okay now or do you want to ask some PostgreSQL people?

mmd-osm · 2020-07-08T20:21:08Z

As we don't have to move this to production in the next few weeks, I wouldn't mind leaving this PR open for a bit. In the meantime I can try finding someone from the PostgreSQL community who could review the code.

I know this code looks terribly complicated. At the time this feature was introduced in PostgreSQL back in 2014, about 10-15 developers implemented/reviewed the logical decoding feature (which happens to include test-decoding.c we're using as a baseline). I can only assume that the way it was implemented was for a good reason, and everyone involved didn't object to it.

mmd-osm · 2020-07-10T08:08:19Z

Quick update on the recent code changes: I got rid of some of the code duplication by using a switch statement, similar to https://github.com/eulerto/wal2json/blob/master/wal2json.c#L1470. I added an empty REORDER_BUFFER_CHANGE_DELETE branch (we don't delete anything in those 3 tables) to avoid triggering an assert in that highly unlikely "delete" case.

Come to think about wal2json, this plugin is available as Ubuntu package and would probably fulfill all of our requirements: you can filter on a list of tables, get a JSON entry for each change (with format=2) that includes all fields + field values, optionally include the LSN, etc. It's a bit unfortunate that I haven't investigated this plugin in more detail earlier on, but I guess that ship has sailed now.

mmd-osm · 2020-07-10T20:23:26Z

(Note: As this pull request includes multiple topics now, I will split them up again later into smaller chunks and multiple PRs).

See #12

joto · 2020-07-14T14:50:50Z

I have now implemented that "simple version" of this in 75b08df, storing the single bit we need in the pointer itself. This way we have something that should work. If this turns out to be not enough, we can come back to this PR.

mmd-osm · 2020-07-14T16:38:36Z

I did some debugging the other day and have a better understanding about the memory context now. The idea is to instantiate our own global memory manager, which manages the memory that the Postgres functions we’re calling inside our plugin need (in case they need any memory on their own). After we’re done we’re resetting the context and all potentially allocated memory is released again. I guess this is all done to avoid nasty memory leaks.

Plugins typically store a pointer to that memory context in their own defined and allocated structure, which happens to double as a good location to store some other stuff, like Boolean flags. So the overall motivation looks quite different now, and it’s difficult to find some good explanation about those details.

I found this blog post quite helpful: https://blog.pgaddict.com/posts/introduction-to-memory-contexts - it gives a good overview about the different concepts in place.

I hope this explanation made some sense...

joto · 2020-09-23T14:10:24Z

@mmd-osm Have you found some PostgreSQL people to review the code and/or explain why my simpler solution doesn't work?

I still think this cargo cult and will close this unless somebody convinces me this is needed.

mmd-osm · 2020-09-23T14:20:58Z

I'm not sure if you read my previous comment I wrote on July 14. Do you have some questions on it, or like to add some comments? The current version of code doesn't handle the requirement to manage memory for Postgresql methods, which are being called from the plugin. Did you consider this part when calling this stuff "cargo cult"?

I pinged two of the plugin authors in openstreetmap/operations#154 (comment), but got no reply.

Unfortunately, I don't have any time to drive this topic any further at the moment.

joto · 2020-09-23T15:15:52Z

Yes, I have read that comment but I don't understand how this applies. We are not allocating memory. Maybe PostgreSQL does allocate memory somewhere behind our backs, but if it does it uses whatever memory context it already has and that's not our concern. The code we currently have does work, so its not that there is no memory context setup already, otherwise we would see some kind of crash.

mmd-osm · 2020-09-23T17:15:45Z

Right, Postgresql would already have a "Logical decoding context" in place when calling our plugin code (see https://github.com/postgres/postgres/blob/master/src/backend/replication/logical/logical.c#L154-L158 + later calls of MemoryContextSwitchTo).
The Postgresql functions we're calling from within the plugin code might allocate some memory on their own using exactly that context, hence the code doesn't crash.

I think the reason why lots of other plugins used their own memory context is to isolate the plugin code from the rest of the Postgresql core. Any (potential) memory leak in the plugin code would be contained to the plugin itself, and not impact a memory context which we don't own. It's a bit like everyone keeping their own space tidy, it reduces the risk that the whole database will have an issue.

joto · 2020-09-27T08:25:28Z

I am closing this for now as it isn't actionable. We can reopen this if needed.

mmd-osm commented Jul 8, 2020

View reviewed changes

postgresql-plugin/osm-logical.c

ALLOCSET_SMALL_INITSIZE,

ALLOCSET_SMALL_MAXSIZE);

#endif

This comment was marked as outdated.

Sign in to view

mmd-osm marked this pull request as ready for review July 8, 2020 16:19

joto mentioned this pull request Jul 9, 2020

New way of generating replication diff openstreetmap/operations#438

Closed

2 tasks

osm-logical: store needs_commit in own context

010c94b

mmd-osm marked this pull request as draft July 10, 2020 20:30

This was referenced Jul 10, 2020

osm-logical: add pg_osmlogical_ prefix to callback functions #15

Merged

Osm-logical: sanity checks #16

Closed

joto added a commit that referenced this pull request Jul 14, 2020

Use output_plugin_private in context instead of static variable

75b08df

See #12

mmd-osm mentioned this pull request Jul 20, 2020

Need a new way of doing diff replication openstreetmap/operations#154

Closed

joto closed this Sep 27, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

osm-logical: store needs_commit in own context #12

osm-logical: store needs_commit in own context #12

mmd-osm commented Jul 7, 2020 •

edited

Loading

This comment was marked as outdated.

joto commented Jul 8, 2020

joto commented Jul 8, 2020

mmd-osm commented Jul 8, 2020 •

edited

Loading

joto commented Jul 8, 2020

mmd-osm commented Jul 8, 2020

mmd-osm commented Jul 10, 2020 •

edited

Loading

mmd-osm commented Jul 10, 2020 •

edited

Loading

joto commented Jul 14, 2020

mmd-osm commented Jul 14, 2020 •

edited

Loading

joto commented Sep 23, 2020

mmd-osm commented Sep 23, 2020 •

edited

Loading

joto commented Sep 23, 2020

mmd-osm commented Sep 23, 2020

joto commented Sep 27, 2020

osm-logical: store needs_commit in own context #12

osm-logical: store needs_commit in own context #12

Conversation

mmd-osm commented Jul 7, 2020 • edited Loading

This comment was marked as outdated.

joto commented Jul 8, 2020

joto commented Jul 8, 2020

mmd-osm commented Jul 8, 2020 • edited Loading

joto commented Jul 8, 2020

mmd-osm commented Jul 8, 2020

mmd-osm commented Jul 10, 2020 • edited Loading

mmd-osm commented Jul 10, 2020 • edited Loading

joto commented Jul 14, 2020

mmd-osm commented Jul 14, 2020 • edited Loading

joto commented Sep 23, 2020

mmd-osm commented Sep 23, 2020 • edited Loading

joto commented Sep 23, 2020

mmd-osm commented Sep 23, 2020

joto commented Sep 27, 2020

mmd-osm commented Jul 7, 2020 •

edited

Loading

mmd-osm commented Jul 8, 2020 •

edited

Loading

mmd-osm commented Jul 10, 2020 •

edited

Loading

mmd-osm commented Jul 10, 2020 •

edited

Loading

mmd-osm commented Jul 14, 2020 •

edited

Loading

mmd-osm commented Sep 23, 2020 •

edited

Loading