Future HLA Import Parallelisation work #735

Open
benbelow opened this issue Apr 13, 2022 · 1 comment
Labels
donor-import: Work relates to the donor ingest process, which imports and pre-processes donor information
performance: Relates to improving the performance of a part of Atlas

Comments

@benbelow
Contributor

Context:

This mainly comes about because .NET Core 3 does not support Distributed Transactions: dotnet/runtime#715

The DataRefresh code and the ongoing DonorUpdate code share the same underlying code for processing and updating HLAs in the Matching Database.

We would like that code to be both fast and fully transactional. That is, updates for any given donor should either be fully written or fail completely, and a failed write should leave nothing behind in the database.

Unfortunately .NET Core 3 doesn’t support the tools necessary to get BOTH of these.

TransactionScope is specifically designed to provide transactionality over large blocks of code, and it works very well. But (as of .NET Core 3) it doesn’t support Distributed Transactions, and it needs them in order to support multiple parallel connections. It handles multiple sequential SQL connections just fine, but not parallel ones.
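As a rough illustration of the sequential case that does work, the sketch below wraps a series of awaited per-locus writes in a single TransactionScope. The names `DonorHlaUpdateSketch`, `UpdateDonorHlaAsync` and `WriteLocusAsync` are hypothetical, and the write is simulated with a delay rather than a real SQL connection, so this only shows the shape of the pattern, not the Atlas implementation.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Transactions;

public static class DonorHlaUpdateSketch
{
    // Hypothetical stand-in for a per-locus SQL write. A real write would open
    // a connection here, which would enlist in the ambient transaction.
    private static async Task<bool> WriteLocusAsync(string locus)
    {
        await Task.Delay(10); // simulated I/O
        // Report whether the ambient transaction is still visible after the await.
        return Transaction.Current != null;
    }

    // Runs the per-locus writes one after another inside a single scope.
    // Returns true if every write observed the ambient transaction.
    public static async Task<bool> UpdateDonorHlaAsync(IEnumerable<string> loci)
    {
        var allWritesSawTransaction = true;

        // AsyncFlowOption.Enabled lets the ambient transaction flow across awaits.
        using (var scope = new TransactionScope(TransactionScopeAsyncFlowOption.Enabled))
        {
            foreach (var locus in loci)
            {
                var sawTransaction = await WriteLocusAsync(locus);
                allWritesSawTransaction = allWritesSawTransaction && sawTransaction;
            }

            // Without this call, disposing the scope rolls everything back.
            scope.Complete();
        }

        return allWritesSawTransaction;
    }
}
```

If any write throws before `scope.Complete()` is reached, the scope is disposed without completing and the transaction is rolled back, which is the all-or-nothing behaviour described above.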

Unfortunately, you need multiple parallel connections to be able to write the per-locus HLA data to the DB efficiently. Changing from Task.WhenAll(DoPerLocusWrite) to foreach() { await DoPerLocusWrite() } dropped the performance by 25-35%. (Note: parallel inserts on a single connection were attempted, but MARS doesn’t actually allow for parallel execution. It just interleaves them, so you don’t gain anything.)

So as it stands we can EITHER have it fast OR have it transactional. The code allows either option based on a boolean.
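A minimal sketch of what such a boolean-controlled switch might look like is below. `PerLocusWriteSketch`, `WriteAllLoci`, `PeakConcurrency` and the `parallel` flag are hypothetical names, and `DoPerLocusWrite` is simulated with a delay plus a counter of in-flight writes, purely so the difference between the two modes is observable without a database.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class PerLocusWriteSketch
{
    private static readonly object Gate = new object();
    private static int inFlight;
    private static int peakConcurrency;

    // Highest number of simulated writes that were in progress at the same time.
    public static int PeakConcurrency { get { lock (Gate) { return peakConcurrency; } } }

    public static void Reset() { lock (Gate) { inFlight = 0; peakConcurrency = 0; } }

    // Hypothetical stand-in for writing one locus's HLA data on its own connection.
    private static async Task DoPerLocusWrite(string locus)
    {
        lock (Gate)
        {
            inFlight++;
            peakConcurrency = Math.Max(peakConcurrency, inFlight);
        }

        await Task.Delay(50); // simulated SQL round trip

        lock (Gate) { inFlight--; }
    }

    // parallel == true  : start every write, then await them all ("Fast").
    // parallel == false : await each write before starting the next, which is
    //                     the mode that can sit inside a TransactionScope.
    public static async Task WriteAllLoci(IReadOnlyCollection<string> loci, bool parallel)
    {
        if (parallel)
        {
            await Task.WhenAll(loci.Select(locus => DoPerLocusWrite(locus)));
        }
        else
        {
            foreach (var locus in loci)
            {
                await DoPerLocusWrite(locus);
            }
        }
    }
}
```

In the parallel mode all simulated writes are in flight together, whereas in the sequential mode only one is ever in progress, which is where the throughput difference comes from.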

We’ve currently opted for:

- The HLA processing in DataRefresh opts to be “Fast”. It needs the performance boost, and covers for the missing transactionality with the batches and overlaps-on-continue functionality.
- The ongoing DonorUpdates opt to be “Transactional”. They aren’t yet known to need the performance, and they have bigger problems with transactionality due to less control over message replay and less orderliness.

These choices are controlled independently by appSettings.

Task

If .NET starts to support Distributed Transactions (theoretically in .NET 5? See the thread linked above) then we should trial allowing the writes to be parallel AND transactional (just change the await code controlled by the boolean). You MUST perf test it in detail! See the HighVolume DonorUpdate tests. Hopefully this will be a very quick ~30% win, once .NET catches up.

Alternatively, if we conclude that the performance of the DonorUpdates is inadequate and the HLA writes are the only remaining bottleneck (AND running them in parallel would be enough to solve the performance!!) then we would need to look into alternative ways to manage the transactionality of the DonorUpdates. That’s going to be a LOT of extra work, and it will be worth doing a lot of serious perf analysis of other bottlenecks before you get to that!

@benbelow added the performance and donor-import labels on Apr 13, 2022
@benbelow
Contributor Author

The original ticket description above was taken from AN JIRA, raised in Jul 2020.

The linked issue did not make it into .NET 5, nor into .NET 6.

It is currently scheduled for .NET 7, at which point we should be able to capitalise on the performance gains described above.
