Ginkgo

Ginkgo is an in-memory distributed data management and processing system for big data applications. It runs on clusters of commodity servers and aims to provide real-time analytics on relational datasets.

Highlights

1. Fast massively parallel execution engine.

Ginkgo relies on highly parallel query processing to accelerate data analysis. Query evaluation is distributed across the cluster to use its aggregate computing power, and each node executes its share in a multi-threaded fashion to exploit modern many-core hardware. Because data distributions are unpredictable, a static scheduling policy can leave computing resources idle. To maximize resource utilization, Ginkgo applies elastic execution to the pipelines of the query DAG, filling these resource bubbles by resizing the pipeline width on the fly.
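The idea of resizing pipeline width can be illustrated with a small sketch. This is not Ginkgo's scheduler: the function name `rebalance_widths`, the use of pending row counts as the load signal, and the proportional policy are all assumptions chosen for illustration.

```python
def rebalance_widths(pending_rows, total_workers):
    """Split total_workers across pipelines in proportion to pending work.

    pending_rows maps a (hypothetical) pipeline name to its unprocessed rows.
    Every pipeline that still has work keeps at least one worker.
    """
    active = {name: rows for name, rows in pending_rows.items() if rows > 0}
    if not active:
        return {name: 0 for name in pending_rows}
    if total_workers < len(active):
        raise ValueError("need at least one worker per active pipeline")

    total = sum(active.values())
    # Reserve one worker per active pipeline, then share the rest proportionally.
    spare = total_workers - len(active)
    widths = {name: 1 + (spare * rows) // total for name, rows in active.items()}

    # Integer division can leave a few workers unassigned; give them to the
    # pipelines with the most pending work.
    leftover = total_workers - sum(widths.values())
    for name in sorted(active, key=active.get, reverse=True)[:leftover]:
        widths[name] += 1

    # Pipelines with no pending work release all of their workers.
    for name in pending_rows:
        widths.setdefault(name, 0)
    return widths
```

Calling the function again whenever the pending counts change is what fills the idle slots: once a pipeline runs out of work, its workers are handed to the pipelines that are still busy.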

2. Real-time and transactional data ingestion.

Many analytical systems load data in bulk with a long delay, so queries scan stale data. Ginkgo employs a real-time data ingestion module that continuously ingests fresh external data into the partitions of the in-memory cluster and then asynchronously flushes it to HDFS for persistence. To resolve read/write conflicts on the cluster, it introduces a metadata-based protocol that converts each distributed transaction into multiple single-site transactions, for raw data and metadata respectively. As a result, Ginkgo can produce a lightweight snapshot for query execution.
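A minimal single-node sketch of how a metadata write can decide what a query sees is shown below. It is an illustration under stated assumptions, not Ginkgo's protocol: the class `Partition` and its methods are hypothetical, there is one writer per partition, and distribution and the HDFS flush are not modeled.

```python
import threading


class Partition:
    """Append-only in-memory partition; metadata decides which rows are visible."""

    def __init__(self):
        self._rows = []        # raw data, may include rows not yet published
        self._committed = 0    # metadata: number of rows visible to queries
        self._meta_lock = threading.Lock()

    def append_raw(self, batch):
        """Raw-data step: store rows; they stay invisible until committed."""
        self._rows.extend(batch)
        return len(self._rows)

    def commit(self, upto):
        """Metadata step: publish every row before position `upto`."""
        with self._meta_lock:
            self._committed = max(self._committed, upto)

    def snapshot(self):
        """A snapshot is just the committed row count at this moment."""
        with self._meta_lock:
            return self._committed

    def scan(self, snapshot):
        """Read only the rows that were committed when the snapshot was taken."""
        return self._rows[:snapshot]
```

A query that takes a snapshot before a later `commit` keeps reading the same prefix of rows, even while new rows are being appended, which is why the snapshot is cheap: it is a single integer rather than a copy of the data.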

3. Efficient in-memory data processing.

Ginkgo employs a large set of optimization techniques to achieve efficient in-memory data processing, including batch-at-a-time processing, cache-sensitive operators, column pruning, data compression, SIMD-based optimization, code generation, and lock-free, concurrent processing structures. These optimizations work together and enable Ginkgo to process up to gigabytes of data per second per thread.
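Two of the techniques listed above, batch-at-a-time processing and column pruning, can be sketched in a few lines. The column-oriented dictionary layout and the function names `scan_batches` and `sum_where` are assumptions made for this example and do not reflect Ginkgo's internal storage format or operators.

```python
def scan_batches(table, columns, batch_size):
    """Yield batches that contain only the requested columns (column pruning).

    `table` maps a column name to a list of values; all lists have equal length.
    """
    num_rows = len(next(iter(table.values()))) if table else 0
    for start in range(0, num_rows, batch_size):
        yield {name: table[name][start:start + batch_size] for name in columns}


def sum_where(table, value_col, filter_col, threshold, batch_size=1024):
    """SUM(value_col) WHERE filter_col > threshold, one batch at a time."""
    total = 0
    for batch in scan_batches(table, [value_col, filter_col], batch_size):
        values, keys = batch[value_col], batch[filter_col]
        # The filter and the aggregate run over a whole batch per iteration,
        # and columns the query does not mention are never touched.
        total += sum(v for v, k in zip(values, keys) if k > threshold)
    return total
```

Processing a batch per call amortizes per-row overhead, and reading only the referenced columns reduces the memory traffic of each scan.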

Ginkgo is currently developed at East China Normal University. If you have any questions about this project, please contact us.

Email: ginkgo.bigdata@gmail.com

Quick Start

To try Ginkgo, follow the Quick Start guide. For more information, see the Wiki.

Current Team Members

Chuliang Weng, Professor.
Zhifang Li, Ph.D. Student.
Shangwei Wu, Ph.D. Student.
Xiaoshuang Peng, Ph.D. Student.
Xiaopeng Fan, Ph.D. Student.
Zewen Sun, Ph.D. Student.
Yingtong Xiong, Postgraduate Student.
Zeyu He, Postgraduate Student.
Beicheng Peng, Postgraduate Student.

Former Team Members

Qiuli Huang, Zhuhe Fang, Zhenhui Zhao, Tingting Sun, Minqi Zhou, Li Wang, Lei Zhang, Shaochan Dong, Xinzhou Zhang, Yu Kai, Yongfeng Li, Lin Gu
