Implement of Directory table #390

Merged
merged 1 commit into from
Apr 24, 2024

Conversation

wenchaozhang-123
Contributor

@wenchaozhang-123 wenchaozhang-123 marked this pull request as draft March 4, 2024 08:43
@wenchaozhang-123 wenchaozhang-123 marked this pull request as ready for review March 4, 2024 09:06
@wenchaozhang-123 wenchaozhang-123 marked this pull request as draft March 5, 2024 03:19
@wenchaozhang-123 wenchaozhang-123 force-pushed the directory_table branch 12 times, most recently from a5439d5 to 37c68d8 Compare March 19, 2024 06:49
@wenchaozhang-123 wenchaozhang-123 marked this pull request as ready for review March 19, 2024 07:53
@wenchaozhang-123 wenchaozhang-123 marked this pull request as draft March 25, 2024 06:54
@wenchaozhang-123 wenchaozhang-123 marked this pull request as ready for review March 26, 2024 06:14
@wenchaozhang-123 wenchaozhang-123 force-pushed the directory_table branch 2 times, most recently from c330a04 to 129654f Compare March 26, 2024 09:49
@wenchaozhang-123 wenchaozhang-123 force-pushed the directory_table branch 2 times, most recently from 5d4c849 to 8c65b3a Compare March 28, 2024 10:17
@wenchaozhang-123 wenchaozhang-123 marked this pull request as draft April 1, 2024 02:24
@wenchaozhang-123 wenchaozhang-123 marked this pull request as ready for review April 2, 2024 01:52
@wenchaozhang-123 wenchaozhang-123 force-pushed the directory_table branch 14 times, most recently from cdf0f6c to 086e3c6 Compare April 23, 2024 10:49
@wenchaozhang-123 wenchaozhang-123 force-pushed the directory_table branch 9 times, most recently from 90cfc0d to 64faf50 Compare April 24, 2024 10:28
Implement the directory table feature in this commit. A directory table is a
new kind of relation used to organize unstructured data files in a specified
tablespace. The data files themselves are stored in that tablespace, while the
tuples that record their metadata (relative_path, md5, size, etc.) are stored
in a normal table.

We support both local and remote directory tables. A local directory table
uses a local tablespace, while a remote directory table uses a DFS tablespace,
which is implemented in our enterprise extension.

We support COPY BINARY FROM to upload a file to a directory table, the
directory_table UDF to get file content, and the remove_file UDF to remove a
file from a directory table. In addition, we implement a tool called cbload
for uploading files to a directory table. To support DFS directory tables, we
also add catalog tables such as gp_storage_server and gp_storage_user_mapping,
which are shared across all databases.

Some usage examples follow.

-- Create an oss_server that points to endpoint:
CREATE STORAGE SERVER oss_server OPTIONS
(protocol 'qingstor', endpoint 'pek3b.qingstor.com', https 'true', virtual_host 'false');

-- Create a user mapping to access oss_server
CREATE STORAGE USER MAPPING FOR CURRENT_USER STORAGE SERVER oss_server OPTIONS
(accesskey 'KGCPPHVCHRDSYFEAWLLC', secretkey '0SJIWiIATh6jOlmAas23q6hOAGBI1BnsnvgJmTs');

-- Create a local tablespace
CREATE TABLESPACE dirtable_spc location '/data/dirtable_spc';

-- Create a local directory table
CREATE DIRECTORY TABLE dirtable TABLESPACE dirtable_spc;

-- Upload a file to the directory table with COPY BINARY
COPY BINARY dirtable FROM '/data/file1.csv' 'file1';

-- Select directory table
SELECT * FROM dirtable;
SELECT * FROM directory_table('dirtable');

-- Remove file from directory table
SELECT remove_file('dirtable', 'file1');
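
As a rough sketch of how the metadata tuples described above might be inspected after an upload, the query below selects individual metadata fields for one file. The column names relative_path, size, and md5 are assumptions taken from the description in this commit message, and the actual schema may differ. It reuses the dirtable and 'file1' names from the examples above.

```sql
-- Assumption: the directory table exposes relative_path, size and md5
-- columns, as listed in the commit description; check with \d dirtable.
SELECT relative_path, size, md5
FROM dirtable
WHERE relative_path = 'file1';
```

Running this before and after remove_file('dirtable', 'file1') would be one way to check that the metadata row goes away together with the file.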

Co-authored-by: Mu Guoqing <muguoqing@hashdata.cn>
Reviewed-by: Yang Yu <yangyu@hashdata.cn>
             Yang Jianghua <yjhjstz@gmail.com>
@my-ship-it my-ship-it merged commit 20add92 into apache:main Apr 24, 2024
9 checks passed