Skip to content
This repository has been archived by the owner on Sep 27, 2019. It is now read-only.

RWSet performance fixes, LockFreeArray performance fixes and tests #1401

Merged
merged 18 commits into from
Jun 15, 2018

Conversation

mbutrovich
Copy link
Contributor

@mbutrovich mbutrovich commented Jun 11, 2018

LockFreeArray:

  • Changed functions that unconditionally returned true to be void.
  • Rewrote FindValid to no longer be O(n) lookups. This is used in DataTable:GetTileGroup() and should be fast.
  • Added Doxygen comments.
  • Added more tests.

TimestampOrderingTransactionManger:

  • Keep track of last accessed tile_group_id and if it's the same avoid going to the StorageManager’s CuckooHashMap for the TileGroup.

TransactionContext:

  • Simplified conditionals to what should be equivalent logic. Reduced lookups for Release mode, increased lookups in Debug mode (by moving lookups that only served for PELOTON_ASSERTs into individual lookups inside the ASSERTs)
  • Removed the iterator hints since Intel docs show they're not used:
    https://software.intel.com/en-us/node/506175
  • Removed is_written_ and insert_count_ members and related logic since they were not used for anything.

@pervazea is going to help me out and grab callgrind numbers when he has time to demonstrate these improvements.

return;
}
rw_set_.insert(rw_set_it, std::make_pair(location, RWType::READ));
rw_set_[location] = RWType::READ;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we have still two lookups here, one in line 99 and one in line 103. Is it even possible to do this with a single lookup? No, because some other thread could have changed the RW set between these to lines?

Copy link
Contributor Author

@mbutrovich mbutrovich Jun 12, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As best as I can tell, based on our current semantics we can't get away from two lookups on Read, or Delete.

Copy link
Contributor

@tcm-marcel tcm-marcel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I have some questions about the RW set, but maybe I just missed something again.

PELOTON_ASSERT(rw_set_.count(location) == 0 ||
(rw_set_[location] != RWType::DELETE &&
rw_set_[location] != RWType::INS_DEL));
rw_set_[location] = RWType::UPDATE;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think some functionality got lost here. If rw_type == RWType::INSERT it should not be update to RWType::UPDATE.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, I'll rearrange that logic. I was trying to get away from two lookups in the case of an insert but it won't allow that.

return false;
}
if (rw_set_it != rw_set_.end() && rw_set_it->second == RWType::INSERT) {
rw_set_it->second = RWType::INS_DEL;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am just wondering - what does happen, if the value of rw_set_it->second changes between line 130 and line 131/133. Do we still do the right thing?

Copy link
Contributor Author

@mbutrovich mbutrovich Jun 12, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect it's a race that isn't handled in the current semantics of our RWSet. The hope is that we'll eventually get rid of the INS_DEL type anyway when we stop reusing tuple slots.

{
LockFreeArray<value_type> array;

value_type invalid_value = 6288;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Random number?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just any sentinel value for this test that wasn't inserted will do.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use INVALID_OID instead.

@mbutrovich
Copy link
Contributor Author

@pervazea ran a microbenchmark of this branch against master that showed a drop in calls to StorageManager::GetTileGroup (and thereby CuckooMap::Find) from 1229 to 1083.

Copy link
Contributor

@poojanilangekar poojanilangekar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall. I have two main concerns.

  1. I am not entirely sure if using count on the unordered_map is a good idea. It seems semantically wrong.
  2. I think you need to revert the change in return type of RecordDelete. I would definitely check with @apavlo or @yingjunwu about why we returned different values at different conditions. And why this didn't trigger a test failure.
  3. The is_written_ and insert_count_ can be used to make our commits/aborts faster. Because with MVCC, you'd be running such a transaction under snapshot isolation by default. Please change that and talk to Andy about probably handling that condition.

Lastly, can you please add some numbers for this branch vs the master. (Probably run TPC-C & YCSB)? So at a later stage we'd have a point of reference for our performance.

@@ -655,12 +655,20 @@ ResultType TimestampOrderingTransactionManager::CommitTransaction(

// TODO (Pooja): This might be inefficient since we will have to get the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can remove this TODO, your code has done what the TODO was about.

@@ -807,11 +815,20 @@ ResultType TimestampOrderingTransactionManager::AbortTransaction(
// Iterate through each item pointer in the read write set
// TODO (Pooja): This might be inefficient since we will have to get the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, you can remove this TODO.

@@ -80,102 +78,63 @@ void TransactionContext::Init(const size_t thread_id,

isolation_level_ = isolation;

is_written_ = false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am a little unsure about removing the is_written_ and insert_count_. These variables can be used to make Commits and Aborts faster. If you figure out that the is_written_ is false, you can take an entirely different code path and treat it like a READ_ONLY transaction.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's effectively what @lmwnshn is doing in #1402.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is okay if we get rid of insert_count_ but we should definitely use is_written_. It will help with multi statement transactions that are not explicitly set to read only.

}

void TransactionContext::RecordRead(const ItemPointer &location) {

PELOTON_ASSERT(rw_set_.count(location) == 0 ||
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why count? Shouldn't this be a find? It makes sense to usecount withunordered_multimap, the unordered_map can never contain more than one element with the same key.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't really make a difference, but sure I'll change it.

}

bool TransactionContext::RecordDelete(const ItemPointer &location) {
void TransactionContext::RecordDelete(const ItemPointer &location) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please revert this.
We used to return true in case location contained RWType::INSERT and false otherwise. I believe we should maintain these semantics.

Copy link
Contributor Author

@mbutrovich mbutrovich Jun 12, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That value was never captured in any of the calls and struck me as cruft from other CC implementations but I can change it back.

{
LockFreeArray<value_type> array;

value_type invalid_value = 6288;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use INVALID_OID instead.

@mbutrovich
Copy link
Contributor Author

mbutrovich commented Jun 12, 2018

@poojanilangekar Regarding comment 2, there are no tests for transaction_context and for TimestampOrderingTransactionManager, well... :(

Maybe I can adapt the tests I wrote for GC fixes to have EXPECTs that test TOTM. It would be helpful to have some sort of baselines if we're making changes to TOTM's guts.

For performance, our numbers still seem way too variable to take away anything meaningful, but I'll see if a bunch of runs will smooth that out.

@coveralls
Copy link

coveralls commented Jun 12, 2018

Coverage Status

Coverage increased (+0.07%) to 77.029% when pulling 21253c4 on mbutrovich:friday_night into 308a669 on cmu-db:master.

@mbutrovich
Copy link
Contributor Author

Like I mentioned, hard to take away too much from oltpbench runs right now due to variability. I ran master and the friday_night branch with the attached configs on my laptop:

TPC-C (scale factor 4, 4 terminals, 60 seconds, repeated 10 times):
master: mu: 334.63, sigma: 15.48
friday_night: mu: 347.39, sigma: 40.44

YCSB (scale factor 1000, 4 terminals, read only, 60 seconds, repeated 10 times):
master: mu: 16671.82, sigma: 67.89
friday_night: mu: 16669.69, sigma: 92.82

oltpbench_configs.zip

@mbutrovich
Copy link
Contributor Author

I also did some sampling with dtrace, configs from the previous comment:

TPC-C

Sampled calls to StorageManager::GetTileGroup (cuckoo hash lookups) from CommitTransaction:
master: 7317
friday_night: 6880

Sampled calls to tbb::internal_find:
master: 8032
friday_night: 7815

YCSB read-only

Sampled calls to StorageManager::GetTileGroup (cuckoo hash lookups) from CommitTransaction:
master: 2475
friday_night: 2266

Sampled calls to tbb::internal_find:
master: 6331
friday_night: 6070

@poojanilangekar
Copy link
Contributor

@mbutrovich Do you have an idea about why the performance of this branch has higher variance?

Yes, it would be great if you could add a couple of tests, to the TimestampOrderingTransactionManager.

@mbutrovich
Copy link
Contributor Author

@poojanilangekar Luck of the draw with Peloton and oltpbench, really. Also this is on my laptop where I don't have a ton of control over background tasks. I ran them again today:

TPC-C (scale factor 4, 4 terminals, 60 seconds, repeated 10 times):
master: mu: 332.59, sigma: 11.98
friday_night: mu: 333.06, sigma: 13.33

YCSB (scale factor 1000, 4 terminals, read only, 60 seconds, repeated 10 times):
master: mu: 16689.99, sigma: 127.87
friday_night: mu: 16706.10, sigma: 122.95

Again, still tough to take too much away from oltpbench right now. Regarding tests for TOTM, not sure this is the PR for it.

I'll put the is_written_ flag back.

@tomasic
Copy link
Member

tomasic commented Jun 13, 2018 via email

@poojanilangekar
Copy link
Contributor

LGTM. I think this should be merged in once the build passes.

@mbutrovich
Copy link
Contributor Author

@tomasic I just ran dtrace to approximate abort rates (never tried Peloton's internal stats, and not sure how much they slow the system down). dtrace dropped throughput by ~10%, but:

TPC-C:
CommitTransaction() samples: 27261
AbortTransaction() samples: 1

YCSB:
CommitTransaction() samples: 17997
AbortTransaction() samples: 0

It doesn't seem like aborts are an issue, at least under these oltpbench configs on my laptop.

@tomasic
Copy link
Member

tomasic commented Jun 13, 2018 via email

@tli2 tli2 merged commit e53e5b4 into cmu-db:master Jun 15, 2018
@mbutrovich mbutrovich deleted the friday_night branch June 15, 2018 15:49
mtunique pushed a commit to mtunique/peloton that referenced this pull request Apr 16, 2019
RWSet performance fixes, LockFreeArray performance fixes and tests
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants