The README is really well written; 3 questions I hope you can answer #116

Closed
chenzx opened this issue Sep 22, 2017 · 3 comments

Comments

@chenzx

chenzx commented Sep 22, 2017

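1. Does Paxos address the performance problem of master-slave replication mainly by mixing synchronous and asynchronous copying? If so, that seems far too simple. It looks comparable to binlog synchronization in a multi-master MySQL cluster. But I don't understand why this is used only in the FE and not in the BE. Isn't Palo an MPP OLAP system? Where does the requirement for real-time data updates come from? I don't quite follow.

2. The part about MVCC updating multiple tables, bumping version numbers, and so on is not explained very clearly. With an MVCC like Clojure's STM, a failed transaction is automatically rolled back and retried. MVCC in MySQL InnoDB, on the other hand, still needs row-level locks. So how does MVCC here support highly concurrent cross-table update transactions?

3. For the LLVM part of the code: is this a custom implementation written with reference to Impala's, or is it mostly copied from Impala's implementation? I see that the variables and classes are all named palo* and the like.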

@imay
Contributor

imay commented Sep 24, 2017

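  1. BDB JE is used to make the metadata highly available. The volume of metadata updates is small, so this approach fits. The BE currently takes data in through batch loads, so this kind of consistency scheme is not a good fit there.
  2. Making multiple tables take effect atomically also relies on Palo's batch-load model. The version number is recorded in the metadata and is raised only after the data for all tables is ready, which guarantees that all the tables take effect together. You can think of it as a table-level lock.
  3. At this stage the LLVM code is Impala's own. It is named Palo* because this code lived inside Baidu for a long time and we modified and renamed it internally.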
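
As a rough illustration of point 2, here is a minimal Python sketch of version-gated visibility. It is not Palo's code: the `Catalog` class, its method names, and the in-memory dictionaries are all hypothetical. It only models the idea of staging a batch per table and then raising a single version number in the metadata.

```python
from dataclasses import dataclass, field


@dataclass
class Catalog:
    """Hypothetical metadata store: one visible version shared by all tables."""
    visible_version: int = 0
    # table name -> {version -> rows loaded at that version}
    tables: dict = field(default_factory=dict)

    def stage(self, table, version, rows):
        """Write a batch for `table` at a version that readers cannot see yet."""
        self.tables.setdefault(table, {})[version] = list(rows)

    def publish(self, version, expected_tables):
        """Raise the visible version only if every table has staged its batch."""
        missing = [t for t in expected_tables
                   if version not in self.tables.get(t, {})]
        if missing:
            raise RuntimeError(f"not ready: {missing}")
        # One metadata update switches all tables at once.
        self.visible_version = version

    def read(self, table):
        """Readers see only batches at or below the visible version."""
        batches = self.tables.get(table, {})
        return [row
                for v in sorted(batches)
                if v <= self.visible_version
                for row in batches[v]]
```

In this sketch a reader calling `read` before `publish` sees none of the staged rows, and after `publish` it sees the new rows of every table together, which is the "take effect together" behaviour described above. A real system would also have to persist and replicate the version number, which this sketch leaves out.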

@chenzx
Author

chenzx commented Sep 24, 2017

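Thank you very much! Based on your answer:

1. This is rather disappointing to me. So it is not the same mechanism as Tencent/Alibaba using Paxos for binlog synchronization?
2. The MVCC granularity doesn't seem that fine; it is not at the level of Oracle's.

So Palo's data mainly comes from batch loads into the BE? As an MPP database, it is mainly meant for OLAP-style applications rather than large-scale concurrent OLTP? I was thinking that MPP can't seem to handle distributed transactions and distributed JOIN queries...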

@imay
Contributor

imay commented Sep 24, 2017

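OLAP is designed mainly for analytics; the emphasis is on scanning and computing over large volumes of data efficiently. OLTP is more about serving transactional requests, and the emphasis is on isolation between transactions.

Also, MPP can handle the distributed joins you mentioned. Distributed transactions and MPP are two completely orthogonal things; they do not exclude each other and can perfectly well be implemented in the same system. The reason not to implement them is that doing so would add complexity to the system; it is better to be especially good in one area.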
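
To make the distributed join claim concrete, here is a minimal single-process Python sketch of a hash-partitioned (shuffle) join, one common way MPP engines run joins across nodes. It is an illustration under assumptions, not Palo's implementation: the function names are hypothetical, and the "nodes" are just list partitions.

```python
from collections import defaultdict


def shuffle(rows, key_index, num_nodes):
    """Route each row to a node chosen by hashing its join key."""
    partitions = [[] for _ in range(num_nodes)]
    for row in rows:
        partitions[hash(row[key_index]) % num_nodes].append(row)
    return partitions


def local_hash_join(left, right, left_key, right_key):
    """Plain in-memory hash join, run independently on one node."""
    table = defaultdict(list)
    for row in left:
        table[row[left_key]].append(row)
    return [l + r for r in right for l in table.get(r[right_key], [])]


def distributed_join(left, right, left_key, right_key, num_nodes=3):
    """Shuffle both inputs by join key, join per node, then gather results."""
    left_parts = shuffle(left, left_key, num_nodes)
    right_parts = shuffle(right, right_key, num_nodes)
    result = []
    for node in range(num_nodes):
        result.extend(local_hash_join(left_parts[node], right_parts[node],
                                      left_key, right_key))
    return result
```

Because both inputs are partitioned with the same hash on the join key, rows that can match always land on the same node, so each node can join its share without coordinating with the others. No transaction machinery is involved, which is consistent with the point that joins and distributed transactions are separate concerns.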

HappenLee added a commit to HappenLee/incubator-doris that referenced this issue Sep 7, 2021
starocean999 pushed a commit to starocean999/incubator-doris that referenced this issue Jul 26, 2023
yiguolei pushed a commit that referenced this issue Sep 28, 2023
The current multi-window plan generation has a problem with the project sequence, for example:

+--LogicalWindow ( windowExpressions=[avg(sum_sales#115) WindowSpec(...) AS `avg_monthly_sales`#116, rank() WindowSpec(...) AS `rn`#117], ...)
and the corresponding physical plan is:

+--PhysicalWindow[6572]@16 ( windowFrameGroup=(Funcs=[avg(sum_sales#115) WindowSpec(...) AS `avg_monthly_sales`#116], ... )
    +--PhysicalWindow[6568]@29 ( windowFrameGroup=(Funcs=[rank() WindowSpec(...) AS `rn`#117], ...] )
If the final plan is generated as follows:

MultiCastDataSinks
STREAM DATA SINK
  EXCHANGE ID: 20
  HASH_PARTITIONED: rn[#208], i_brand[#202], cc_name[#203], i_category[#201]
Before we eventually resolve the multi-window issue, we add a projection as follows and force a mapping, but this will not cover all potential problems.

MultiCastDataSinks
STREAM DATA SINK
  EXCHANGE ID: 20
  HASH_PARTITIONED: rn[#219], i_brand[#213], cc_name[#214], i_category[#212]
  PROJECTIONS: i_category[#184], i_brand[#185], cc_name[#186], d_year[#187], d_moy[#188], sum_sales[#189], avg_monthly_sales[#191], rn[#190]
  PROJECTION TUPLE: 20
vinlee19 pushed a commit to vinlee19/doris that referenced this issue Oct 7, 2023
…24912)
