[Improvement]: Use CachingCatalog to reduce the time cost of IcebergCatalogWrapper#loadTable #1794

Closed
3 tasks done
Tracked by #2176
zhongqishang opened this issue Aug 4, 2023 · 4 comments · Fixed by #1795

Comments

@zhongqishang
Contributor

Search before asking

  • I have searched in the issues and found no similar issues.

What would you like to be improved?

Every time an ArcticTable is obtained, it goes through a complete table load (org.apache.iceberg.catalog.Catalog#loadTable). We can cache the loaded table to reduce the time cost of getting an ArcticTable.

How should we improve?

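Wrap the underlying catalog with org.apache.iceberg.CachingCatalog so that repeated loadTable calls in IcebergCatalogWrapper are served from the cache.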
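
To make the idea concrete, here is a minimal, self-contained sketch of the caching pattern. It is not the Iceberg or Arctic implementation: `SimpleCatalog`, `CountingCatalog` and `CachingSimpleCatalog` are hypothetical names, and a table is represented by a plain string. In Iceberg itself the wrapper would, as far as I know, be obtained through `CachingCatalog.wrap(catalog)`.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a catalog; NOT the Iceberg Catalog interface.
interface SimpleCatalog {
  String loadTable(String identifier);
}

// Simulates an expensive catalog and counts how many full loads happen.
class CountingCatalog implements SimpleCatalog {
  final AtomicInteger loads = new AtomicInteger();

  @Override
  public String loadTable(String identifier) {
    loads.incrementAndGet();
    return "table:" + identifier;
  }
}

// Caches loaded tables by identifier so repeated lookups skip the full load.
class CachingSimpleCatalog implements SimpleCatalog {
  private final SimpleCatalog delegate;
  private final Map<String, String> cache = new ConcurrentHashMap<>();

  CachingSimpleCatalog(SimpleCatalog delegate) {
    this.delegate = delegate;
  }

  @Override
  public String loadTable(String identifier) {
    // Only calls the delegate when the identifier is not cached yet.
    return cache.computeIfAbsent(identifier, delegate::loadTable);
  }
}
```

With this wrapper, loading the same identifier twice triggers only one call to the backing catalog, which is the saving this issue is after.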

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Subtasks

No response

Code of Conduct

@shidayang
Contributor

I am more concerned about how to obtain the latest state if there is caching.

@zhongqishang
Contributor Author

zhongqishang commented Aug 4, 2023

> I am more concerned about how to obtain the latest state if there is caching.

Could we call refresh() on the cached table to obtain the latest state?
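
A rough sketch of what I mean, under simplifying assumptions: `MetadataStore` and `CachedTableHandle` are hypothetical classes, and the table state is reduced to a single snapshot id. The cached handle keeps the state it saw at load time until refresh() is called.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical metadata store holding the latest committed snapshot id.
class MetadataStore {
  final AtomicLong currentSnapshot = new AtomicLong(1);
}

// Hypothetical cached table handle; only mimics the idea of a refresh() method.
class CachedTableHandle {
  private final MetadataStore store;
  private long snapshotId;

  CachedTableHandle(MetadataStore store) {
    this.store = store;
    this.snapshotId = store.currentSnapshot.get(); // state as of load time
  }

  long snapshotId() {
    return snapshotId;
  }

  // Re-reads the current snapshot pointer without rebuilding the handle.
  void refresh() {
    snapshotId = store.currentSnapshot.get();
  }
}
```

A caller that needs the latest state calls refresh() on the cached handle first; callers that can tolerate slightly stale data skip it.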

@shidayang
Contributor

shidayang commented Aug 4, 2023

I think TableRuntimeRefreshExecutor must use a fresh table; cached tables can be used elsewhere as appropriate.
Does loadTable significantly impact performance now?

@zhoujinsong
Contributor

I believe that adding CachingCatalog can indeed reduce access to the metadata system in scenarios where tables are loaded frequently, such as repeated read operations from the Dashboard.

However, if a refresh is performed every time a table is loaded, the saving may be small compared with simply loading the table each time.

If this is true, we may need to determine where to force table refresh and manually refresh when needed.
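
One possible shape for this, purely as an illustration: `RefreshAwareTableCache` and its `forceRefresh` flag are hypothetical, and the loader function stands in for a full loadTable call. Callers that must see the latest state (for example a refresh executor) pass `true`; read-heavy callers such as the Dashboard pass `false`.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical table cache with an explicit "force refresh" switch.
class RefreshAwareTableCache {
  private final Function<String, String> loader; // stands in for a full loadTable
  private final Map<String, String> cache = new ConcurrentHashMap<>();

  RefreshAwareTableCache(Function<String, String> loader) {
    this.loader = loader;
  }

  String load(String identifier, boolean forceRefresh) {
    if (forceRefresh) {
      // Bypass the cache, reload, and replace the cached entry.
      String fresh = loader.apply(identifier);
      cache.put(identifier, fresh);
      return fresh;
    }
    // Serve from the cache, loading only on the first access.
    return cache.computeIfAbsent(identifier, loader);
  }
}
```

A time-based expiration on cache entries would be an alternative to the explicit flag, at the cost of a bounded staleness window for every caller.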
