Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor write method in WalManager to improve performance #162

Closed
ygf11 opened this issue Aug 1, 2022 · 5 comments · Fixed by #189
Closed

Refactor write method in WalManager to improve performance #162

ygf11 opened this issue Aug 1, 2022 · 5 comments · Fixed by #189
Labels
A-analytic-engine Area: Analytic Engine feature New feature or request

Comments

@ygf11
Copy link
Contributor

ygf11 commented Aug 1, 2022

Description

After #159, we pass dyn Payload to write method of WalManager, this may has performance issue.
It is better to decouple encoder from write method.

Proposal
We can define WalEncoder(maybe other name), like:

pub enum WalEncoder {
    Rocks(RocksLogEncoding),
    Obkv(ObkvLogEncoding)
}

impl WalEncoder {
   pub fn encode_to_write_batch<P:Payload>(&self, log_batch: &LogWriteBatch<P>) -> Result<(WriteBatch, u64)> {
       match self {
         WalEncoder::Rocks(encoder) => encoder.encode(payload),
         WalEncoder::Obkv(encoder) => encoder.encode(payload),
       }
   }
}

Encoder can get from walManager.

pub trait WalManager {
   fn encoder(&self) -> Encoder;
   fn write(&self, wb: WriteBatch) -> Result<()>,
}

So we need encode data before write to WalManager.

   let (wb, max_sequence_num) = self.wal.encoder().encode::<XXXPayload>(log_batch)?;
   self.wal.write(wb);
   ...

Additional context

cc: @waynexia @ShiKaiWi

@ygf11 ygf11 added the feature New feature or request label Aug 1, 2022
@waynexia waynexia added the A-analytic-engine Area: Analytic Engine label Aug 2, 2022
@waynexia
Copy link
Member

waynexia commented Aug 2, 2022

Thanks for writing this up. Here are some thoughts:

WAL backend like RocksDB or OBKV is only responsible for persisting things. I think we need not dispatch the data encoding method on the backend. I.e., WAL's write method should operate on bytes (Vec<u8> or dyn Payload) rather than a specific data type (what we used to do). However, Payload::encode_to() is writing bytes to a mutable buffer, this lazy materialization may reduce some copies compare to pre-encode the data.

So the trade offs are: (a) accept the dynamic dispatch overhead and achieve lazy materialization or (b) remove dynamic dispatch and do encode work in advance.

At a glance both are all not large consumption. But I cannot figure out which is more efficient if I have to rank them. Maybe we need a benchmark to help us to made decision. Personally I prefer the second approach (also the way you proposed). It looks more neat to me 👀

@ygf11
Copy link
Contributor Author

ygf11 commented Aug 2, 2022

However, Payload::encode_to() is writing bytes to a mutable buffer, this lazy materialization may reduce some copies compare to pre-encode the data.

Yes, aggree. pre-encode has this drawback.

Let me try to add a benchmark first.

@ShiKaiWi
Copy link
Member

ShiKaiWi commented Aug 3, 2022

At a glance both are all not large consumption. But I cannot figure out which is more efficient if I have to rank them. Maybe we need a benchmark to help us to made decision. Personally I prefer the second approach (also the way you proposed). It looks more neat to me 👀

Provided a high throughout reducing some extra copies may make a difference to the performance. However, let the benchmark decide.

Thanks @ygf11 .

@ShiKaiWi
Copy link
Member

ShiKaiWi commented Aug 11, 2022

It seems the two LogEncoding are almost the same, can we use a common LogEncoding?

https://github.com/CeresDB/ceresdb/blob/8cccedd01b2ac33768a6678f133e51a394468bda/wal/src/rocks_impl/encoding.rs#L18-L24

https://github.com/CeresDB/ceresdb/blob/8cccedd01b2ac33768a6678f133e51a394468bda/wal/src/table_kv_impl/encoding.rs#L100-L106

Sure. It seems the LogEncoding in the table_kv_impl/encoding.rs is a copy from rocks_impl/encoding.rs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-analytic-engine Area: Analytic Engine feature New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants