Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: index tag info into elasticsearch immediately after ui change #883

Merged
merged 4 commits into from
Jan 27, 2021

Conversation

tianruzhou-db
Copy link
Contributor

@tianruzhou-db tianruzhou-db commented Jan 22, 2021

Signed-off-by: tianru zhou tianru.zhou@databricks.com

Summary of Changes

After add / delete tags for tables in UI, update corresponding tags info in elasticsearch synchronously.
Related SearchService PR: amundsen-io/amundsensearchlibrary#172

Tests

  • pass all unit tests locally
  • manual end-to-end test

Documentation

CheckList

Make sure you have checked all steps below to ensure a timely review.

  • PR title addresses the issue accurately and concisely. Example: "Updates the version of Flask to v1.0.2"
  • PR includes a summary of changes, including screenshots of any UI changes.
  • PR adds unit tests, updates existing unit tests, OR documents why no test additions or modifications are needed.
  • In case of new functionality, my PR adds documentation that describes how to use it.
    • All the public python functions and the classes in the PR contain docstrings that explain what it does
  • PR passes all tests documented in the developer guide

Signed-off-by: tianru zhou <tianru.zhou@databricks.com>
Copy link
Member

@feng-tao feng-tao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

partial review, will continue in the afternoon

@@ -377,6 +377,75 @@ def get_tags() -> Response:
return make_response(payload, HTTPStatus.INTERNAL_SERVER_ERROR)


def _update_metadata_service(table_key: str, method: str, tag: str) -> int:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is the method only used in tag update?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe called def _update_metadata_tag ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, it's only used in tag update. will make the change

return status_code


def _update_search_service(table_key: str, method: str, tag: str) -> int:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe called def _update_search_tag ?

return status_code


def _update_search_service(table_key: str, method: str, tag: str) -> int:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would be good to add a docstring here to discuss the flow


def _update_search_service(table_key: str, method: str, tag: str) -> int:
searchservice_base = app.config['SEARCHSERVICE_BASE']
searchservice_get_table_url = f'{searchservice_base}/search_table'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you add a todo which we should update for dashboard tag in the future as well (current search update doesn't support dashboard)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do

database = table_key[:table_key.find('://')]
schema = table_key[table_key.find('.') + 1:table_key.rfind('/')]
table = table_key[table_key.rfind('/') + 1:]
cluster = table_key[table_key.find('://') + 3:table_key.find('.')]
Copy link
Member

@feng-tao feng-tao Jan 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ttannis @allisonsuarez does FE have any util to retrieve the different field from a table key?
This looks a bit cryptic, could be rewritten as:

import re
m = re.match(r'([^./]+)://([^./]+)\.([^./]+)\/([^./]+)', table_key)
db=m.group(1),
cluster=m.group(2),
schema=m.group(3),
table =m.group(4)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in amundsen_application.api.utils.metadata_utils we have TableUri which can be used to parse


raw_data_map = json.loads(get_table_response.text)
# key is unique, thus (database, cluster, schema, table) should uniquely identify the table
table = raw_data_map['results'][0]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does the result have only single element?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, if everything works correctly, we can add checks here


old_tags_list = table['tags']
new_tags_list = [item for item in old_tags_list if item['tag_name'] != tag]
if method != 'DELETE':
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we delete a tag, will it update the ES index as well ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the logic is exactly the same as put

amundsen_application/api/metadata/v0.py Outdated Show resolved Hide resolved
@@ -43,15 +44,17 @@ def request_metadata(*, # type: ignore
client=app.config['METADATASERVICE_REQUEST_CLIENT'],
headers=headers,
timeout_sec=timeout_sec,
data=data)
data=data,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any reason we can't use data but need another json input param?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For json input, we can pass dict directly. I try to use json.dumps(dict) combined with data, but seems that it can't be recognized by search service.

@feng-tao
Copy link
Member

CI fails as well

Signed-off-by: tianru zhou <tianru.zhou@databricks.com>
Signed-off-by: tianru zhou <tianru.zhou@databricks.com>
Signed-off-by: tianru zhou <tianru.zhou@databricks.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants