Optimize for serial commits in 2PC #2345
Conversation
Seems to me now all synchronization is mutex-based. Will we get more improvement if we do the following?
And I personally prefer to have a separate WriteImpl method for the mutex-based write, to make the implementation clean. Not sure what @siying thinks.
Great! I think this is a reasonable direction to go. My main comments are:
(1) have a clear separation of the new logic for MyRocks and the current behavior to other RocksDB users.
(2) think through the design of the locking so that it is easier to understand and can perform better.
Also, just think about what will make the code easier to maintain going forward: patching the code everywhere, or having a new function and reusing code via helper functions. I'm worried that mixing the two locking approaches everywhere in the code may make it harder to maintain in the future.
db/db_impl_write.cc
Outdated
@@ -395,9 +441,20 @@ Status DBImpl::WriteToWAL(const autovector<WriteThread::Writer*>& write_group,
  WriteBatchInternal::SetSequence(merged_batch, sequence);

  Slice log_entry = WriteBatchInternal::Contents(merged_batch);
  if (concurrent) {
    // We need to lock mutex_ since logs_ etc. might change concurrently
    mutex_.Lock();
Holding this lock in LogWriter::AddRecord() makes me nervous. This heavy-duty lock can block reads and compactions. Even if it is not doing I/O, LogWriter::AddRecord() is still a relatively expensive operation: it calculates a CRC for everything and makes several copies. Is there a way to define another mutex just to protect those WAL-related data structures? You already defined two more mutexes; a third one doesn't feel like a big deal.
Makes sense. It would require more work since log_ (and its family) are assumed to be protected by mutex_ all over the code. But it is doable.
include/rocksdb/db.h
Outdated
  // Flush the WAL memory buffer to the file. If sync is true, it calls SyncWAL
  // afterwards.
  virtual Status FlushWAL(bool sync) {
    throw std::runtime_error("FlushWAL not implemented");
Should return Status::NotSupported() instead of a runtime error.
Sure.
db/db_impl_write.cc
Outdated
    serial_memtable_guarantee = std::unique_ptr<InstrumentedMutexLock>(
        new InstrumentedMutexLock(&mem_mutex_));
  }
  const bool batch_writes = false;
What will be the interface eventually? Add one more parameter to WriteImpl()?
Yeah I am thinking something like options.disable_group_batch and options.enable_wal_buffer
db/version_set.h
Outdated
    last_sequence_.store(s, std::memory_order_release);
  }

  void SetLastToBeWrittenSequence(uint64_t s) {
    assert(s >= last_to_be_written_sequence_);
    last_to_be_written_sequence_.store(s, std::memory_order_release);
These are only accessed within the DB mutex, so I assume relaxed ordering is enough.
Makes sense. Let me double-check and confirm.
SetLastToBeWrittenSequence is used only during initialization, so its stronger ordering guarantee does not make much difference in performance. The newly added FetchAddLastToBeWrittenSequence could be accessed outside the mutex, and its ordering guarantees cannot be relaxed.
It might sound paranoid, but how about we stick to either memory_order_seq_cst or memory_order_relaxed, to make it easier to reason about correctness?
For this function I can increase it to memory_order_seq_cst, but for FetchAddLastToBeWrittenSequence it might affect performance. I can run a test and will increase it if the performance is not affected.
db/db_impl_write.cc
Outdated
  auto l2bws = versions_->LastToBeWrittenSequence();
  assert(l2bws >= last_sequence);
  versions_->SetLastToBeWrittenSequence(l2bws + count);
  last_sequence = l2bws;
How do you make sure that the entries are written to the WAL in sequence number order if the lock is released after this point? Should we get the sequence number under the same mutex just before writing to the WAL? Or maybe never release the mutex until then.
I moved this logic inside WriteToWAL, under the mutex.
db/db_impl_write.cc
Outdated
@@ -180,6 +206,9 @@ Status DBImpl::WriteImpl(const WriteOptions& write_options,
  // that nobody else can be writing to these particular stats.
  // We're optimistic, updating the stats before we successfully
  // commit. That lets us release our leader status early.
  if (!batch_writes) {
    stat_mutex_.Lock();
Which stat introduces a data race risk?
AddDBStats is not atomic.
void AddDBStats(InternalDBStatsType type, uint64_t value) {
  auto& v = db_stats_[type];
  v.store(v.load(std::memory_order_relaxed) + value,
          std::memory_order_relaxed);
}
StatisticsImpl::recordTick seems to be fine (using fetch_add) and PERF_TIMER_* seem to be thread-local and thus ok.
db/db_impl.cc
Outdated
@@ -2520,6 +2543,9 @@ Status DBImpl::IngestExternalFile(

  // Stop writes to the DB
  WriteThread::Writer w;
  mutex_.Unlock();
  mem_mutex_.Lock();
To play safe, I suggest we only do this extra locking logic in a mode that is only enabled in MyRocks. I'm scared of potential deadlock here.
Yeah me too ;) In the past two days I have been fixing deadlocks in unit tests. Let me think further to see how we can simplify this.
db/db_impl.h
Outdated
@@ -791,6 +797,8 @@ class DBImpl : public DB {
  // NOTE: should never acquire options_file_mutex_ and mutex_ at the
  // same time.
  InstrumentedMutex options_files_mutex_;
  InstrumentedMutex mem_mutex_;
  InstrumentedMutex stat_mutex_;
Document the lock holding order of all those mutexes.
Done.
db/db_impl_write.cc
Outdated
  std::unique_ptr<InstrumentedMutexLock> serial_memtable_guarantee;
  if (lock_memtable) {
    serial_memtable_guarantee = std::unique_ptr<InstrumentedMutexLock>(
        new InstrumentedMutexLock(&mem_mutex_));
This needs to be avoided in the non-MyRocks case.
Also, consider avoiding this malloc. We could add a function to InstrumentedMutexLock like set_mutex().
In the new approach I am using a separate write group, and the mem_mutex_ is no longer needed.
Thanks for the feedback @yiwu-arbug. A couple of questions:
PS: My first impl had this split into two workflows. Then I realized there was much duplication between the two (stats, etc.) and figured that a single function would make keeping them consistent easier in the future. I am open to suggestions though.
@maysamyabandeh seems I didn't understand the patch very well. Will sync with you offline.
Thanks @yiwu-arbug for your comment. I enabled batching for preparers as you suggested and the throughput increased to 49k. I guess it is because it reduces competition with the serial commit over the shared logs.
@maysamyabandeh updated the pull request - view changes
@maysamyabandeh updated the pull request - view changes
@maysamyabandeh updated the pull request - view changes
Haven't deep dived into the logic; just some ideas for reorganizing the code:
- Move the new logic in WriteImpl() into WriteImpl2PCPrepare() and WriteImpl2PCCommit() to avoid mixing the normal write path with the 2PC write path. Also remove disable_memtable from WriteImpl(), since it is used by 2PC only.
- Have WriteImpl2PCCommit() (or even FlushWAL()) update the DB stats, to avoid the need for stat_mutex_.
- Not sure if it is a good idea, but you could reuse WriteThread::newest_memtable_writer_ to replace nonmem_write_thread_.
I think this diff makes sense. In the MyRocks case, we anyway need a way to give MyRocks fine control of WAL writing and WAL flushing. I believe this is a good change to have anyway.
From a code-organization point of view, this is much better than the previous version. I still think it can be made even better if we have a separate lightweight PrepareToWALBuffer() call and build it on top of @yiwu-arbug's diff.
db/db_impl_write.cc
Outdated
  total_log_size_ += log_entry.size();
  alive_log_files_.back().AddSize(log_entry.size());
  if (concurrent) {
I suggest we put lines 423 to 434 into a separate helper function, and then directly call it from a function like DBImpl::PrepareToWALBuffer() where the mutex is held. And we can do the same for lines 470-478.
done
db/db_impl_write.cc
Outdated
  WriteBatchInternal::SetSequence(merged_batch, sequence);

  Slice log_entry = WriteBatchInternal::Contents(merged_batch);
  log::Writer* log_writer = logs_.back().writer;
  status = log_writer->AddRecord(log_entry);
Maybe whether to write to the buffer should be passed in on a per-record basis. In that case, if someone issues a write without going through this prepare API, it will call AddRecord() with the flush option, so it should still work.
Can you explain more? We are currently enabling the WAL buffer with an option, which can be enabled independently from 2PC.
  // If true WAL is not flushed automatically after each write. Instead it
  // relies on manual invocation of FlushWAL to write the WAL buffer to its
  // file.
  bool manual_wal_flush = false;
Can we just use one option?
The only place I think we need an option is whether to grab the mutex for normal writes.
If we have a direct PrepareToWALBuffer() function, there isn't a second queue, so the extra queue logic can be avoided.
Two options are better. One might enable the WAL buffer without the 2PC commit improvements: https://groups.google.com/d/msg/rocksdb/ngNeSSQHfao/_AV_KIf1CAAJ
Can you explain more? The second queue was added after Yi's suggestion, which made the code much cleaner and even improved throughput a bit.
If we have a direct PrepareToWALBuffer() function, there isn't a second queue, so the extra queue logic can be avoided.
Why do we need a queue if we don't do group commit and don't care whether the output order follows the input order? I also don't understand why it can improve throughput. Can you explain it?
- In theory the other queue should help: without a queue, 128 prepare threads compete with the commit on wal_mutex_, but with a queue only one thread does that. In practice I have seen mixed results: in some runs it improved throughput by 3-4k.
- Without a queue, other parts of the code that want to be the sole writer would still need to hold a mutex. The combination of a mutex and entering the queue unbatched was a bit ugly. Instead they now enter two queues.
There is an existing use case for only WAL buffering: https://groups.google.com/d/msg/rocksdb/ngNeSSQHfao/_AV_KIf1CAAJ
This feature is going to stay even after we discard concurrent WAL writes in the future, so it does need its own config.
To me, the WAL buffer is not a performance config; it is more of a durability one. Normal users should not be bothered with playing around with it. Advanced users can make use of it only after changing their application to call FlushWAL at the right times (like MyRocks does). So I do not think we should count such options toward the tuning-complexity issue that we have.
Can this user also use the same option?
Then we would also use two write queues for this user, which is irrelevant to the WAL buffer feature they intended to use.
I do think the WAL buffer is something that will stay in the code base for a long time. The other feature is temporary and will be removed entirely once we push the commits to an earlier stage.
OK then.
As a separate thing, maybe it's a good time to split DBOptions into an advanced part and put it into advanced_options.h too, just like what we do for CFOptions.
Makes sense. Opening a task for it.
Force-pushed from bc8f85a to 77db97e
@maysamyabandeh updated the pull request - view changes
@maysamyabandeh has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
@maysamyabandeh updated the pull request - view changes - changes since last import
Force-pushed from 6c3582d to 696484a
@maysamyabandeh updated the pull request - view changes - changes since last import
@maysamyabandeh updated the pull request - view changes - changes since last import
@maysamyabandeh has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Haven't finished reading the whole patch. Some minor comments.
@@ -607,6 +610,26 @@ int DBImpl::FindMinimumEmptyLevelFitting(ColumnFamilyData* cfd,
  return minimum_level;
}

Status DBImpl::FlushWAL(bool sync) {
move to db_impl_write.cc?
Wouldn't it make sense for it to be in the same file that SyncWAL is in?
I think SyncWAL can go to db_impl_write.cc too. But, yeah, let's leave them here.
db/db_impl.cc
Outdated
                   s.ToString().c_str());
  }
  if (!sync) {
    ROCKS_LOG_DEBUG(immutable_db_options_.info_log, "FlushWAL-nosync");
improve the logging message?
sure
db/db_impl.cc
Outdated
  WriteThread::Writer w;
  write_thread_.EnterUnbatched(&w, &mutex_);
  WriteThread::Writer nonmem_w;
  nonmem_write_thread_.EnterUnbatched(&nonmem_w, &mutex_);
Call this only for 2PC?
makes sense.
db/version_set.h
Outdated
    last_to_be_written_sequence_.store(s, std::memory_order_release);
  }

  uint64_t FetchAddLastToBeWrittenSequence(uint64_t s) {
Update PipelinedWriteImpl() to use this method?
PipelinedWriteImpl has a nice technique for avoiding the to_be_written_sequence_number. When we move forward with the next steps and move the write to an earlier phase, we might remove this function entirely. Then rolling back the changes from PipelinedWriteImpl would be difficult.
It seems to me the technique is the same. What's the difference?
I ended up adding an extra last_to_be_written_sequence_ that has to be set properly all over the code base: initialization, file ingestion, ... But PipelinedWriteImpl contains the complexity inside itself and does not expose the existence of this separate seq number to the rest of the code base.
I agree that these two counters should be consolidated. Both code paths have two counters: one for the next seq to write to the WAL and one for the seq visible to reads. We don't have a reason to keep two variables for the same thing.
@siying I guess you mean for long-term. For the purpose of this patch we do need two seq numbers.
db/db_impl_write.cc
Outdated
@@ -62,6 +62,11 @@ Status DBImpl::WriteImpl(const WriteOptions& write_options,
                         WriteBatch* my_batch, WriteCallback* callback,
                         uint64_t* log_used, uint64_t log_ref,
                         bool disable_memtable) {
  // The current implementation does not support sync with concurrent writes
  assert(!concurrent_writes_ || !write_options.sync);
  if (concurrent_writes_ && write_options.sync) {
The name concurrent_writes_ confused me. It feels to me that if it is false, writes cannot be executed concurrently. Can we rename to something more specific, like buffer_wal_writes_in_prepare_ or something like that?
Sure. I am trying to avoid 2PC-specific names here. I will change it to concurrent_wal_writes_.
concurrent_wal_writes_ still feels confusing to me, as we don't actually make all WAL writes concurrent, but only prepare writes, or prepare writes with one commit write. We aren't making commit WAL writes concurrent. Maybe it can be something like concurrent_prepare_wal_writes_, or concurrent_prepare_.
db/db_impl_write.cc
Outdated
@@ -62,6 +62,11 @@ Status DBImpl::WriteImpl(const WriteOptions& write_options,
                         WriteBatch* my_batch, WriteCallback* callback,
                         uint64_t* log_used, uint64_t log_ref,
                         bool disable_memtable) {
  // The current implementation does not support sync with concurrent writes
  assert(!concurrent_writes_ || !write_options.sync);
We don't tend to assert on user requests, unless they are used in a radically wrong way, like calling key() with !Valid(). The risk is that we crash a user's whole service over a per-request misconfiguration if they run in debug mode.
Sure.
db/db_impl_write.cc
Outdated
  write_thread_.EnterAsBatchGroupLeader(&w, &write_group);
  last_batch_group_size_.fetch_add(batch_group_size,
                                   std::memory_order_relaxed);
Can you explain this?
last_batch_group_size_ is now accessed concurrently, so I changed it to std::atomic. It is reset when it is read by the rate limiter.
For the 2pc-optimal case, you don't go through the throttling logic at all (if I understand correctly), so why keep this last_batch_group_size_ updated at all? For the non-2pc-optimal case, there is a performance regression in replacing a normal set with an atomic update.
I looked at it more. Since the throttling is done only in commit(), we should only update last_batch_group_size_ in commit too. The concurrent code may count the same write twice when calculating the write rate, once in prepare and once in commit, and this is wrong. We should only count it once. If we only update it in commit(), then we don't need this atomic update; a simple set would be good enough.
So if the last commit writes 1 byte while the last prepares have written 1 GB to the WAL, should we not throttle the writes to the WAL?
Correct. The throttle is to protect memtable and compaction. Writing to WAL is OK.
@@ -584,6 +716,38 @@ Status DBImpl::WriteToWAL(const WriteThread::WriteGroup& write_group,
  return status;
}

Status DBImpl::ConcurrentWriteToWAL(const WriteThread::WriteGroup& write_group,
The name confuses me. It's not really concurrent, it's just non-flushing, right?
It is actually concurrent. With the new changes we could have two threads trying to write to the WAL at the same time: the thread aggregating prepares and the thread writing the commit.
Not flushing the WAL buffer is a totally separate feature. It happens to be in the same patch since these two features together give the performance boost.
I don't get it. The WAL is one single file. How can you write to it concurrently?
The name is meant to say that this function can be called concurrently. Inside, the actual write to the WAL is serialized. I chose this name since WriteToWAL was previously assumed to be called by only one thread at a time; ConcurrentWriteToWAL is supposed to suggest that this function can now be called concurrently.
OK
db/db_impl_write.cc
Outdated
  assert(log_size != nullptr);
  Slice log_entry = WriteBatchInternal::Contents(&merged_batch);
  *log_size = log_entry.size();
  log::Writer* log_writer = logs_.back().writer;
For the normal path, is this inside the DB mutex? I moved it from here to inside the DB mutex to avoid a data race bug: 5dad9d6
Are we getting the bug back here?
Can you tell me more about the bug? If you assume that logs_.back() could change concurrently, how can you safely write through the writer pointer you got from logs_.back()? You could then write to a stale WAL.
No, I'm not assuming logs_.back() can change. Still, we got a segfault in stress tests. I am explaining in another comment.
db/db_impl_write.cc
Outdated
  if (!logs_.empty()) {
    // Always flush the buffer of the last log before switching to a new one
    log::Writer* cur_log_writer = logs_.back().writer;
    cur_log_writer->WriteBuffer();
I don't get why it isn't inside log_write_mutex_.
From the sync policy on logs_:
// - back() and items with getting_synced=true are not popped,
// - it follows that write thread with unlocked mutex_ can safely access
//   back() and items with getting_synced=true.
...
std::deque<LogWriterNumber> logs_;
@@ -645,6 +668,7 @@ Status DBImpl::SyncWAL() {
    need_log_dir_sync = !log_dir_synced_;
  }

  TEST_SYNC_POINT("DBWALTest::SyncWALNotWaitWrite:1");
I believe every access to logs_ should grab log_write_mutex_. In SyncWAL(), that is not the case.
The access policy for logs_ is a bit more fine-tuned:
904 // Log files that aren't fully synced, and the current log file.
905 // Synchronization:
906 // - push_back() is done from write thread with locked mutex_,
907 // - pop_front() is done from any thread with locked mutex_,
908 // - back() and items with getting_synced=true are not popped,
909 // - it follows that write thread with unlocked mutex_ can safely access
910 // back() and items with getting_synced=true.
911 // - When concurrent write threads is enabled, back() and push_back() must be
912 // called within log_write_mutex_
913 std::deque<LogWriterNumber> logs_;
We would potentially lose performance if we made it stricter than it already is in the code base.
@maysamyabandeh this may be old, but I realized later that back() needs to be protected by the same mutex as push_back() and pop_front() too. It actually caused a segfault in stress tests. I think the problem may be that the deque can be updated copy-on-write style, so that back() is not safe to call while push_back() or pop_front() is executing, even if they don't touch the element we are accessing with back(). I would continue to play it safe: everything should be protected by the same mutex. In the non-2pc-optimal case it is the DB mutex, and in the 2pc-optimal case it can be log_write_mutex_.
Sure. Will do.
Regarding the current sync guarantees, I believe the idea was that push_back() could not be called while back() is accessed (otherwise back() would be stale). pop_front() could be called, but according to our documentation it should be safe to call pop_front() and back() concurrently on a deque. If it is not, I will update our documentation.
The internet doesn't document any thread-safety guarantee for these operations, so I assume it is safer for every operation to be locked, unless the threads are only doing reads.
Yeah I could not find anything either. I updated the sync documentation of logs_ to clarify that.
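The locking discipline the thread converges on -- a single mutex guarding every deque operation, including back() -- can be sketched as below. This is a hypothetical, simplified stand-in (GuardedLogQueue is not RocksDB code); the mutex here plays the role of log_write_mutex_ and the deque the role of logs_.

```cpp
#include <cassert>
#include <deque>
#include <mutex>
#include <thread>

// Every operation -- push_back(), pop_front(), and notably back() --
// takes the same mutex, since std::deque gives no guarantee that back()
// is safe while another thread mutates the container.
class GuardedLogQueue {
 public:
  void PushBack(int log_number) {
    std::lock_guard<std::mutex> guard(mu_);
    logs_.push_back(log_number);
  }
  void PopFront() {
    std::lock_guard<std::mutex> guard(mu_);
    if (!logs_.empty()) logs_.pop_front();
  }
  int Back() {
    std::lock_guard<std::mutex> guard(mu_);
    return logs_.empty() ? -1 : logs_.back();
  }

 private:
  std::mutex mu_;
  std::deque<int> logs_;
};

// Exercise back() concurrently with pop_front(); with one shared mutex
// this is safe. Pops iterations-1 elements, so the last one survives.
int StressBackWhilePopping(int iterations) {
  GuardedLogQueue q;
  for (int i = 0; i < iterations; ++i) q.PushBack(i);
  std::thread popper([&] {
    for (int i = 0; i + 1 < iterations; ++i) q.PopFront();
  });
  int last = -1;
  for (int i = 0; i < iterations; ++i) last = q.Back();
  (void)last;
  popper.join();
  return q.Back();
}
```

Under this scheme the unguarded fast path on back() is lost, which is exactly the performance trade-off debated above.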
@maysamyabandeh updated the pull request - view changes - changes since last import
@maysamyabandeh has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
@maysamyabandeh updated the pull request - view changes - changes since last import
@maysamyabandeh has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
@yiwu-arbug @siying any more comments?
Summary: RocksDB has recently added a FlushWAL API which will improve upon the performance of MySQL 2PC (more details here facebook/rocksdb#2345). This patch adds support for using the FlushWAL API in MyRocks and also matches flush_log_at_trx_commit with innodb_flush_log_at_trx_commit behaviour. Finally, it updates the submodule to include the removal of an unneeded assertion in the write path, which was tripped by this change. Test Plan: Sysbench testing is ongoing Reviewers: alexyang, yoshinori Reviewed By: yoshinori Subscribers: webscalesql-eng@fb.com, mcallaghan, sdong, myabandeh Differential Revision: https://phabricator.intern.facebook.com/D5503719 Tasks: 19690529 Signature: t1:5503719:1502127815:a1831a7c7ef1f9966952e4f78bf9fde7d3a53761
@maysamyabandeh - the test
@ajkr thanks. Let me take a look at it.
Upstream commit ID : fb-mysql-5.6.35/8b9734948e0023f6adeadb536375f22016d8e521 Summary: RocksDB has recently added a FlushWAL API which will improve upon the performance of MySQL 2PC (more details here facebook/rocksdb#2345). This patch adds support for using the FlushWAL API in MyRocks and also matches flush_log_at_trx_commit with innodb_flush_log_at_trx_commit behaviour. Finally, it updates the submodule to include the removal of an unneeded assertion in the write path, which was tripped by this change. Reviewed By: yoshinorim Differential Revision: D5503719 fbshipit-source-id: c29f0a2
Throughput: 46k tps in our sysbench settings (filling the details later)
The idea is to have the simplest change that gives us a reasonable boost
in 2PC throughput.
Major design changes:
- The WAL internal buffer is no longer flushed after each write; instead it is flushed before critical operations (WAL copy via fs) or when FlushWAL is called by MySQL. Flushing the WAL buffer is also protected via mutex_.
- Two sequence numbers are tracked: last seq, which is the last visible sequence number for reads, and last seq for write, which is the next sequence number that should be used to write to WAL/memtable. This allows a memtable write to run in parallel with WAL writes.
- There can now be parallel writers, which changes a major assumption in the code base. To accommodate that, i) only one WriteImpl that intends to write to the memtable is allowed at a time, via mem_mutex_ -- which is fine since in 2PC almost all of the memtable writes come via the group commit phase, which is serial anyway; ii) all the parts of the code base that assumed to be the only writer (via EnterUnbatched) also acquire mem_mutex_; iii) stat updates are protected via a stat_mutex_.
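The two-sequence-number split described above can be illustrated with a minimal stand-alone sketch. This is not RocksDB's actual implementation -- the class and member names below are made up -- but it shows the idea: writers atomically claim a range from the write-side counter, do the WAL/memtable work, and only then advance the read-visible counter.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical sketch: last_write_seq_ is "last seq for write" (the next
// numbers to hand out to writers), last_visible_seq_ is "last seq" (what
// readers are allowed to see). Because visibility is published separately,
// a memtable insert can proceed in parallel with WAL writes that are not
// yet visible to reads.
class SequenceAllocator {
 public:
  // Claim `count` sequence numbers for a write; returns the first one.
  uint64_t AllocateForWrite(uint64_t count) {
    return last_write_seq_.fetch_add(count, std::memory_order_relaxed) + 1;
  }
  // After the WAL/memtable insertion completes, make the batch visible.
  void Publish(uint64_t last_seq_of_batch) {
    last_visible_seq_.store(last_seq_of_batch, std::memory_order_release);
  }
  uint64_t LastVisible() const {
    return last_visible_seq_.load(std::memory_order_acquire);
  }

 private:
  std::atomic<uint64_t> last_write_seq_{0};
  std::atomic<uint64_t> last_visible_seq_{0};
};
```

For example, a writer inserting a 3-entry batch gets sequence numbers 1..3, while readers keep seeing 0 until Publish(3) runs.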
Note: the first commit has the approach figured out but is not clean.
Submitting the PR anyway to get early feedback on the approach. If
we are ok with the approach, I will go ahead with these updates:
0) Rebase with Yi's pipelining changes
1) …consistent with all unit tests. Will make this optional via a config.
2) …serial commit of 2PC taken into account.
3) …releasing mutex_ beforehand (the same way EnterUnbatched does). This needs to be cleaned up.