Added draft proposal for WFLY-15659: Transaction SlotStore config #446

Open · wants to merge 30 commits into main

Conversation


@jhalliday commented Nov 16, 2021


=== Dev Contacts

* mailto:{email}[{author}]


You may add me or yourself, or both, here.


=== Nice-to-Have Requirements

Extend the server dependency model to allow use of the Persistent Memory library, mashona, to support SlotStore use on pmem hardware. Optionally, the components consuming it could simply bundle their own copies, trading version flexibility against footprint.


If it is only for transactions and Infinispan, could we initially just add the library as a resource in the module.xml file?

Contributor


It would be really problematic for the WildFly build to deal with two different versions of the same library being provided from the same feature pack, which is what we'd be talking about with WildFly's own use of mashona.

Is it expected that different consumers of mashona, e.g. Narayana and Infinispan, aren't going to be able to align on a consistent mashona version?

If not, the simplest approach is to provide a separate module, consistent with how most artifacts in WildFly are provided.
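
Purely for illustration, a standalone module could look roughly like the sketch below; the module name, namespace version, jar version and the logging dependency are assumptions rather than a settled layout.

```xml
<!-- Hypothetical module.xml for a standalone mashona module. The module name,
     namespace version, jar version and the jboss-logging dependency are
     assumptions, not a settled layout. -->
<module xmlns="urn:jboss:module:1.9" name="io.mashona.logwriting">
    <resources>
        <resource-root path="mashona-logwriting-1.0.0.Final.jar"/>
    </resources>
    <dependencies>
        <module name="org.jboss.logging"/>
    </dependencies>
</module>
```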

Author


The upstreams for Narayana and Infinispan will inevitably go through periods when they diverge somewhat in the version of the mashona library they use, since they don't release, or subsequently get updated in WF, in lockstep. On the other hand, they should generally agree on which version of the mashona library API they use, as with e.g. jboss-logging, so it mostly shouldn't matter if the WildFly pom overrides the minor/patch version that the upstream prefers for the sake of unity.
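
To make that override concrete, here's a rough sketch of the kind of dependencyManagement entry the WildFly pom could use to pin a single mashona version; the coordinates and version shown are assumptions, not the real entries.

```xml
<!-- Hypothetical dependencyManagement entry pinning one mashona version for all
     consumers; groupId, artifactId and version are illustrative only. -->
<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>io.mashona</groupId>
            <artifactId>mashona-logwriting</artifactId>
            <version>1.0.0.Final</version>
        </dependency>
    </dependencies>
</dependencyManagement>
```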

* Testing of the pmem options will require appropriate hardware, though this can be simulated by system configuration (similar to a RAM disk)

== Community Documentation
////


We need something in the jbosstm docs and in the WildFly transaction model.

Contributor


@bstansberry
Contributor

@jmesnil FYI


The current options include filesystem-based stores using either a file per transaction or an append-only log (reusing code from HornetQ / ActiveMQ Artemis), or a JDBC database.

Narayana upstream now also offers the SlotStore, a filesystem-based store that employs an efficient memory-mapping approach. Additionally and uniquely, this store can utilise Persistent Memory (pmem) hardware, where available, for very fast transaction logging.
Contributor


Is this expected to be more efficient than the journal store even without pmem hardware?

Author


YMMV.
The journal model works by gathering a number of tx log records into a single disk flush, which is great if you happen to have a lot of concurrent tx going on. With the trend to smaller deployments, containers with only one microservice and such, that's less of an advantage. It comes at the cost of higher-latency transactions, as each must wait to join the next batch. It also forces global ordering on the tx, which is needed if you're a resource manager (databases, message systems) because data updates have to respect causal order, but it's unnecessary overhead for the tx manager, for which the tx aren't ordered.
The SlotStore does one disk flush per tx, which at first glance makes it really inefficient. However, modern SSDs can sustain a much higher flush rate than HDDs could, and indeed benefit from the added concurrency, as they can internally stripe the writes better than an HDD with few heads can. It also means each tx can flush immediately instead of waiting for a batch fill/timer, which can reduce latency. At some point you hit a scale ceiling where batching is still beneficial. On pmem that's crazy high, since a flush is in the same cost ballpark as the thread coordination. On an enterprise SSD not quite so much, but it's at a higher point than many smaller deployments with low tx concurrency ever reach.
To be fair, if its batch interval is tuned to the SSD it's on, the journal can be almost as good even at lower concurrency, since you'll essentially be running batches of size 1, though it still has more thread coordination overhead than the SlotStore. Not that anyone ever tunes it, and the defaults we ship are... somewhat dated compared to modern hardware capabilities. But that's a whole other discussion.
So, not guaranteed to be a win for everyone, but helpful in some use cases.


Extending the server's transaction management model to allow configuration of these options would allow users to access the new functionality of this component.

Although the SlotStore code is part of Narayana, the mashona library used to support use on pmem is independent and may also be utilised by other components requiring similar hardware support, e.g. Infinispan and messaging. For this reason, it may be suited to packaging as a separate module.
Contributor


I talked elsewhere about ways to use the subsystem code to allow such a module to be optionally provisioned. But it seems mashona-logwriting is a 32kb jar so that seems like extreme overkill. :)

Author


Indeed. I think the decision probably revolves more around the build and version flexibility than the footprint.


=== Hard Requirements

Extend the server management model to facilitate configuration of the new SlotStore transaction log type.
Contributor


Some details on what config options will be available would be good.

Author


Config is at two levels: telling the tx engine to use the SlotStore and where to put it on the filesystem, then sizing it. Because it's memory mapped and Java doesn't like unmapping things, you pretty much have to pick the sizing ahead of time. So: number of slots (roughly the number of concurrent tx you expect) and size of each slot (how much information each tx record contains). Both of those are relatively small, such that the best bet may be to just overprovision significantly by default. I'm almost tempted not to expose the sizing params at all (as with many of the 100+ tx config options, they could still be tweaked by system properties, just not through the model), but maybe that's just inviting trouble.
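
Purely as an illustration of those two levels, and not a proposed schema, a subsystem fragment might end up looking something like this; the element name, attribute names, namespace version and the default values are all hypothetical.

```xml
<!-- Hypothetical transactions subsystem fragment; the slot-store element, its
     attributes and the namespace version are illustrative only. -->
<subsystem xmlns="urn:jboss:domain:transactions:6.0">
    <!-- Level 1: select the SlotStore and say where it lives on the filesystem. -->
    <!-- Level 2: sizing, fixed ahead of time because the file is memory mapped:
         number of slots ~ expected concurrent tx, bytes per slot ~ size of a tx record. -->
    <slot-store relative-to="jboss.server.data.dir" path="tx-slot-store"
                number-of-slots="1024" bytes-per-slot="4096"/>
</subsystem>
```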


* Testing of the SlotStore itself can be accomplished by using the same transaction tests that exercise existing store types, but changing the server config to use the new store type.

* Testing of the pmem options will require appropriate hardware, though this can be simulated by system configuration (similar to a RAM disk)
Contributor


How much of this is covered within Narayana testing?

Author


The ObjectStore interface: all of it. The SlotStore implementation of it: no idea. Theoretically all of it, but that would require running all the store tests against each store implementation, which makes for quite a large matrix, and I don't know what the current CI setup does. pmem: none, as the tx CI doesn't have any. I run the mashona tests on real pmem hardware for each release, but don't run the tx test suite, though that should be possible. If the CI for that has to live somewhere, it feels like it's better on the Narayana side, using fake pmem or just the SlotStore on SSD, rather than on the mashona side. The advantage of having real pmem hardware is in accurate perf numbers for e.g. regression testing, not in functional testing.


* Testing of the new server configuration options will require new tests, patterned on those for existing store configurations.

* Testing of the SlotStore itself can be accomplished by using the same transaction tests that exercise existing store types, but changing the server config to use the new store type.
Contributor


Are the relevant tests (somewhat) known? Are they fairly concentrated or are we talking about running significant chunks of the testsuite with an adjusted config?

Author


The WF tests? Anything that hits the tx store, which by default is anything running a tx across two resources, e.g. an MDB writing to a database. Historically I'd have been more worried that the set wasn't big enough, rather than that it was too large to run efficiently, but my knowledge of the app server test coverage is out of date, to put it mildly. Coverage can be supplemented somewhat by running with the 1PC optimization disabled, such that any tx with even a single resource gets logged to the store, but I'd guess that's still not huge. What's the approach for non-default store configs today? Is the full WF suite run for each of the fs store, journal and JDBC store, or does the bulk of that get exercised only upstream in Narayana testing?

=== Testing By
// Put an x in the relevant field to indicate if testing will be done by Engineering or QE.
// Discuss with QE during the Kickoff state to decide this
* [ ] Engineering


I think it will be the case that Engineering tests this, so I guess this should be checked?


/cc @mmusgrov

@jhalliday
Author

Time to pick this up again after my PTO. If there are no more questions, I guess the next step is to redraft the PR with all the additional material from the Q&A here so we can move on to implementation?
