Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature: multi-catalog #4947

Merged
merged 56 commits into from
May 10, 2022
Merged

Conversation

dantengsky
Copy link
Member

@dantengsky dantengsky commented Apr 19, 2022

I hereby agree to the terms of the CLA available at: https://databend.rs/dev/policies/cla/

Summary

new feature: multiple catalogs

  • objects of catalog, e.g table, can be addressed as catalog.database.table
    only the current ut/it are covered, there are definitely other codes that should be modified to fully support this
  • all the objects that are kept in databend meta server belong to the default catalog
  • new catalog can be registered by using the CatalogManager
  • currently, a simple impl of hive catalog is provided, which can be registered by using de configs.
    • registered as catalog name hive
    • can resolve metadata by using hive meta store
    • a dummy implementation of hive database/table
      • only resolves simple and limit number of column types
      • returns empty dataset for queries
  • by default, the cargo feature hive of crate databend-query is NOT enabled
    • to enable it
      cargo build --bin databend-query --features hive
    • but the crate common-hive-meta-store will be built, even the hive feature is not specified
      • the only extra build time dependency is the thrift compiler which is built into dev docker images
      • if there is a way to conditionally compile a number of workspace, pls let me know.
  • A dedicate github action test_stateful_hive_standalone is added
    • only linux + amd64 are tested
    • setups hive cluster (and simple test data)

Changelog

  • New Feature
  • Not for changelog (changelog entry is not required)

Related Issues

Fixes #issue

@vercel
Copy link

vercel bot commented Apr 19, 2022

The latest updates on your projects. Learn more about Vercel for Git ↗︎

1 Ignored Deployment
Name Status Preview Updated
databend ⬜️ Ignored (Inspect) May 10, 2022 at 10:17AM (UTC)

@mergify
Copy link
Contributor

mergify bot commented Apr 19, 2022

Thanks for the contribution!
I have applied any labels matching special text in your PR Changelog.

Please review the labels and make any necessary changes.

@mergify mergify bot added pr-feature this PR introduces a new feature to the codebase pr-not-for-changelog labels Apr 19, 2022
@dantengsky
Copy link
Member Author

the files that thrift generated are HUGE. later, they will be generated by "build.rs"

@dantengsky dantengsky force-pushed the feat-multi-catalog branch 2 times, most recently from 9830127 to 0d119f1 Compare April 21, 2022 17:17
@dantengsky dantengsky force-pushed the feat-multi-catalog branch from 6f703cc to 5b9aaac Compare May 10, 2022 04:30
@dantengsky dantengsky force-pushed the feat-multi-catalog branch from f37c5bc to 337f4ba Compare May 10, 2022 07:19
@dantengsky dantengsky force-pushed the feat-multi-catalog branch from ed4098b to c72186d Compare May 10, 2022 08:10
@dantengsky dantengsky changed the title [WIP] Feature: multi-catalog Feature: multi-catalog May 10, 2022
@dantengsky dantengsky marked this pull request as ready for review May 10, 2022 10:51
@dantengsky dantengsky requested a review from Xuanwo May 10, 2022 11:00
engine_name: "FUSE".to_string(),
comment: "FUSE Storage Engine".to_string(),
support_order_key: true,
// pub fn catalog_name(&self) -> Result<String> {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove those

ctx: Arc<QueryContext>,
operations: Vec<DataBlock>,
overwrite: bool,
_ctx: Arc<QueryContext>,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rename these

@@ -242,7 +242,8 @@ impl FuseTable {
table_info: &TableInfo,
new_snapshot_location: String,
) -> Result<UpsertTableOptionReply> {
let catalog = ctx.get_catalog();
// TODO catalog name
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use arg catalog_name

@@ -200,7 +207,7 @@ impl AsyncSource for FuseHistorySource {
let tenant_id = self.ctx.get_tenant();
let tbl = self
.ctx
.get_catalog()
.get_catalog(&self.catalog_name)? // TODO pass in this guy
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clean up TODO

ctx: Arc<QueryContext>,
operations: Vec<DataBlock>,
overwrite: bool,
_ctx: Arc<QueryContext>,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rename these

@@ -96,7 +95,8 @@ impl ColumnsTable {
ctx: Arc<QueryContext>,
) -> Result<Vec<(String, String, DataField)>> {
let tenant = ctx.get_tenant();
let catalog = ctx.get_catalog();
// TODO replace default with real cat
let catalog = ctx.get_catalog("default")?;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use constant DEFAULT_CATALOG

let table_engine_descriptors = ctx.get_catalog().get_table_engines();
async fn get_full_data(&self, ctx: Arc<QueryContext>) -> Result<DataBlock> {
// TODO passin catalog name
let table_engine_descriptors = ctx.get_catalog("default")?.get_table_engines();
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use the CONSTANT

@@ -41,7 +40,8 @@ impl AsyncSystemTable for TablesTable {

async fn get_full_data(&self, ctx: Arc<QueryContext>) -> Result<DataBlock> {
let tenant = ctx.get_tenant();
let catalog = ctx.get_catalog();
// TODO pass catalog in or embed catalog in table info?
let catalog = ctx.get_catalog("default")?;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use the constant

query: "SELECT COUNT(system.databases.name) AS name FROM system.databases WHERE system.databases.name = 'xxx'",
expect: "NormalQuery { filter: (name = xxx), aggregate: [COUNT(name)], projection: [COUNT(name) as name] }",
},
// TODO confirm this
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's going on herer?

@@ -92,7 +93,7 @@ async fn test_block_pruner() -> Result<()> {
interpreter.execute(None).await?;

// get table
let catalog = ctx.get_catalog();
let catalog = ctx.get_catalog("default")?;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use constant

@@ -229,7 +231,7 @@ async fn test_block_pruner_monotonic() -> Result<()> {
order_keys: vec![],
};

let catalog = ctx.get_catalog();
let catalog = ctx.get_catalog("default")?;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use constant

@@ -100,6 +102,10 @@ impl TestFixture {
gen_db_name(&self.prefix)
}

pub fn default_catalog_name(&self) -> String {
"default".to_owned()
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use constant

GRANT SELECT ON 'db01'.'tb1' TO 'test-grant'@'localhost'
GRANT SELECT ON 'db01'.'tb1' TO 'test-grant'@'localhost'
GRANT SELECT ON 'default'.* TO 'test-grant-role'
GRANT ALL ON 'default'.'default'.* TO 'test-grant'@'localhost'
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

append docs about privilege in the PR summary

@BohuTANG
Copy link
Member

@dantengsky

LGTM.
I have seen your comments, I suggest refining them in another PR.
I will merge this monster PR first.

@BohuTANG BohuTANG merged commit 47af3e1 into databendlabs:main May 10, 2022
}
}

println!("cargo:rerun-if-changed=build.rs");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to move this line to the front?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure about this : (
will moving this line to the front prevent us from some unexpected results? I am not quite familiar with the "mechanism" there.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Me too...

But most build.rs will place the println at the very first line. It's better to keep fit with them.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

got it & thanks. I will fix it in PR #5284

@BohuTANG
Copy link
Member

The review also welcomed on this PR cc @andylokandy @Xuanwo

@BohuTANG BohuTANG requested a review from andylokandy May 10, 2022 11:31
BohuTANG added a commit that referenced this pull request May 10, 2022
@dantengsky dantengsky mentioned this pull request May 22, 2022
4 tasks
@FANNG1 FANNG1 mentioned this pull request Jun 22, 2022
12 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
need-review pr-feature this PR introduces a new feature to the codebase
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants