
improve pyiceberg CLI #1784


Open
kevinjqliu opened this issue Mar 11, 2025 · 4 comments · May be fixed by #1828
Labels: good first issue (Good for newcomers)

Comments

@kevinjqliu
Contributor

Feature Request / Improvement

Based on issues described in #1771

  1. We should make it clear that the CLI falls back to the default catalog when no --catalog parameter is given. For example, pyiceberg list uses the default entry in the .pyiceberg.yaml file.

  2. We should fix how the CLI handles the order of the parameters passed to it. For example, pyiceberg list --catalog hive does not override the catalog, but pyiceberg --catalog hive list does.

@iting0321

Hi, I can work on this issue.
Could you assign the issue to me?

@kevinjqliu
Contributor Author

sure @iting0321 happy to help review :)

@iting0321

Hi, I have some questions.
If the command is pyiceberg list, I need to read the default entry in the catalog. However, what if default is not set in the catalog?

Additionally, if the command is pyiceberg list --catalog hive, should I simply return a command-order error, or should I fall back to the default catalog and return the same result as pyiceberg list?


Also, I would like to know whether you can provide an example of .pyiceberg.yaml that I can test locally. I am a bit confused about the content of .pyiceberg.yaml. For example, can we set the same uri prefix for both hive and default?

catalog:
  hive:
    uri: thrift://localhost:9083
    s3.endpoint: http://localhost:9100
    s3.access-key-id: admin
    s3.secret-access-key: adminadmin
    s3.region: us-east-1
  default:
    uri: thrift://default-catalog:9083

@kevinjqliu
Contributor Author

@iting0321 here's the current documentation for the CLI: https://py.iceberg.apache.org/cli/

In general, the CLI requires a connection to the catalog. This can be done by passing the catalog configs via parameters, such as pyiceberg --uri ... list or by reading from the config file (~/.pyiceberg.yaml).
By default, the CLI will read the default entry in the config file. To read other entries, you can use pyiceberg --catalog foo list

> However, what if default is not set in the catalog?

this should error because the CLI cannot connect to any catalog
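
As a rough illustration of the lookup described above (not PyIceberg's actual implementation), catalog selection could be sketched as follows. The helper name select_catalog_config and the in-memory dict standing in for the parsed ~/.pyiceberg.yaml are assumptions made for this example, and the sketch ignores catalog settings passed directly as parameters such as --uri.

```python
from typing import Any, Dict, Optional

# Stand-in for the parsed contents of ~/.pyiceberg.yaml,
# using the entries from the example earlier in this thread.
CONFIG: Dict[str, Any] = {
    "catalog": {
        "hive": {"uri": "thrift://localhost:9083"},
        "default": {"uri": "thrift://default-catalog:9083"},
    }
}


def select_catalog_config(
    config: Dict[str, Any], catalog_name: Optional[str] = None
) -> Dict[str, Any]:
    """Hypothetical helper: pick the named entry, or "default" if no name is given."""
    name = catalog_name or "default"
    catalogs = config.get("catalog") or {}
    if name not in catalogs:
        # Without a matching entry there is nothing to connect to, so fail loudly.
        raise ValueError(f"Catalog '{name}' is not defined in the config file")
    return catalogs[name]
```

With this sketch, calling select_catalog_config(CONFIG) returns the default entry, select_catalog_config(CONFIG, "hive") returns the hive entry, and a config without a default entry raises an error when no name is given.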

> if the command is pyiceberg list --catalog hive

it would be nice to not enforce the order of the parameters. I think pyiceberg list --catalog hive should work the same as pyiceberg --catalog hive list
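
One way to get order-independent parsing is to register the shared option on both the top-level parser and each subcommand. The sketch below shows the idea with the standard library's argparse; it is not the PyIceberg CLI code, and the program name demo-cli and the helper names build_parser and resolve_catalog are made up for illustration.

```python
import argparse
from typing import List


def build_parser() -> argparse.ArgumentParser:
    # Define --catalog once, then attach it to the top-level parser and to
    # every subcommand. SUPPRESS keeps a subcommand's "unset" state from
    # overwriting a value that was given before the subcommand name.
    common = argparse.ArgumentParser(add_help=False)
    common.add_argument("--catalog", default=argparse.SUPPRESS)

    parser = argparse.ArgumentParser(prog="demo-cli", parents=[common])
    subcommands = parser.add_subparsers(dest="command", required=True)
    subcommands.add_parser("list", parents=[common])
    return parser


def resolve_catalog(argv: List[str]) -> str:
    """Return the catalog name chosen on the command line, or "default"."""
    args = build_parser().parse_args(argv)
    return getattr(args, "catalog", "default")
```

Here resolve_catalog(["--catalog", "hive", "list"]) and resolve_catalog(["list", "--catalog", "hive"]) both select hive, while resolve_catalog(["list"]) falls back to default.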

> Also, I would like to know whether you can provide an example of .pyiceberg.yaml that I can test locally. I am a bit confused about the content of .pyiceberg.yaml. For example, can we set the same uri prefix for both hive and default?

Your example looks correct. You can set the same uri for both if you like. hive and default are just names you give to specific configs; you can call them whatever you want, as long as you refer to the same name in the CLI command.
