fix: fix table post/put api bug #172

Merged

feng-tao merged 4 commits into amundsen-io:master from tianruzhou-db:fix-table-api

Jan 27, 2021

Contributor

tianruzhou-db commented Jan 22, 2021

Signed-off-by: tianru zhou <tianru.zhou@databricks.com>

Summary of Changes

fix table post/put api bugs

The old API definition used OpenAPI 2 syntax (the request body was declared inside the parameters section), which caused a confusing exception: components within template.yml could not be found.
The data in the request body for table/user PUT/POST should be a list, but the current parser syntax does not handle lists correctly.
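
For reference, a minimal sketch of the difference the first point describes, written as Python dicts rather than YAML. The `TableFields` schema name, the `index`/`data` property layout, and the helper `body_parameters` are illustrative assumptions, not copied from the project's template.yml. OpenAPI 2 declares a request body as a parameter with `in: body`, while OpenAPI 3 uses a separate `requestBody` object.

```python
# OpenAPI 2 style: the body is declared as one of the operation's parameters.
# (Schema name "TableFields" is an assumption for illustration.)
openapi2_put = {
    "parameters": [
        {
            "in": "body",
            "name": "data",
            "schema": {
                "type": "array",
                "items": {"$ref": "#/definitions/TableFields"},
            },
        }
    ]
}

# OpenAPI 3 style: the body moves to a dedicated requestBody object and
# shared schemas are referenced under #/components/schemas.
openapi3_put = {
    "requestBody": {
        "content": {
            "application/json": {
                "schema": {
                    "type": "object",
                    "properties": {
                        "index": {"type": "string"},
                        "data": {
                            "type": "array",
                            "items": {"$ref": "#/components/schemas/TableFields"},
                        },
                    },
                }
            }
        }
    }
}


def body_parameters(operation: dict) -> list:
    """Return OpenAPI 2 style `in: body` parameters found on an operation."""
    return [p for p in operation.get("parameters", []) if p.get("in") == "body"]
```

A check like `body_parameters(...)` could be used to spot leftover OpenAPI 2 body declarations in a spec that is otherwise written in OpenAPI 3.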

Related FE PR: amundsen-io/amundsenfrontendlibrary#883

Tests

pass all unit tests locally
manual end-to-end test for table put/post api

Documentation

What documentation did you add or modify and why? Add any relevant links then remove this line

CheckList

Make sure you have checked all steps below to ensure a timely review.

- PR title addresses the issue accurately and concisely. Example: "Updates the version of Flask to v1.0.2"
  - In case you are adding a dependency, check if the license complies with the ASF 3rd Party License Policy.
- PR includes a summary of changes.
- PR adds unit tests, updates existing unit tests, OR documents why no test additions or modifications are needed.
- In case of new functionality, my PR adds documentation that describes how to use it.
  - All the public functions and the classes in the PR contain docstrings that explain what it does
- PR passes make test


          fix: fix table post/put api bug

8e979a2

Signed-off-by: tianru zhou <tianru.zhou@databricks.com>

tianruzhou-db requested review from allisonsuarez, bolkedebruin, dikshathakur3119, feng-tao, jinhyukchang, mgorsk1, verdan and a team as code owners

January 22, 2021 02:10


          fix syntax issue v1

24693e8

Signed-off-by: tianru zhou <tianru.zhou@databricks.com>

Member

feng-tao commented Jan 22, 2021

Could you post the frontend PR here as well? Just for reference, this will fix the issue where adding a new tag doesn't trigger an update of the search index.

tianruzhou-db mentioned this pull request

fix: index tag info into elasticsearch immediately after ui change amundsen-io/amundsenfrontendlibrary#883

Merged

5 tasks

feng-tao reviewed

Member

feng-tao left a comment

Added a few nits, let me know what you think. Great stuff: this fixes a long-standing bug (dating back to 2018...)

search_service/api/document.py Outdated

-                          data = self.schema(many=True, strict=False).loads(args.get('data')).data
+                          table_dict_list = []
+                          for table_str in args.get('data'):
+                              table_dict_list.append(literal_eval(table_str))

Member

feng-tao Jan 22, 2021

Is table_str actually a tablefields object? The naming is a bit confusing. Also, we typically don't put the type in the variable name.

Member

feng-tao Jan 22, 2021

what are we using literal_eval here for?

Contributor Author

tianruzhou-db Jan 22, 2021

table_str is a string representation of a dict, like '{"key1": val1, "key2": val2}'.
literal_eval converts table_str into a dict first; otherwise json.dumps would dump a list of strings instead of a list of dicts.
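
A small sketch of the behaviour described in this reply, using only the standard library. The payload values (`tbl-1`, `orders`, and so on) are made up for illustration.

```python
import json
from ast import literal_eval

# Hypothetical payload: each element of `data` arrives as the string form of a dict.
raw_items = ["{'id': 'tbl-1', 'name': 'orders'}", "{'id': 'tbl-2', 'name': 'users'}"]

# Dumping the strings directly yields a JSON list of strings.
as_strings = json.dumps(raw_items)

# Parsing each string first yields a JSON list of objects.
tables = [literal_eval(item) for item in raw_items]
as_objects = json.dumps(tables)

# literal_eval only accepts Python literals, so arbitrary expressions are rejected.
rejected = False
try:
    literal_eval("__import__('os').getcwd()")
except ValueError:
    rejected = True
```

Note that `literal_eval` parses Python literal syntax, not JSON: a string containing JSON-only tokens such as `null` or `true` would raise `ValueError`, so `json.loads` is the better fit when the items are strict JSON.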

search_service/api/document.py Outdated

-                          data = self.schema(many=True, strict=False).loads(args.get('data')).data
+                          table_dict_list = []
+                          for table_str in args.get('data'):
+                              table_dict_list.append(literal_eval(table_str))

Member

feng-tao Jan 22, 2021

you could also simplify the above as table_objs = [literal_eval(table_str) for table_str in args.get('data')].

Member

feng-tao Jan 22, 2021

Does the endpoint work for user, table, and dashboard as well? It would be good to add a todo for anything that is missing.

Contributor Author

tianruzhou-db Jan 22, 2021

For user, we already have a different endpoint. For dashboard, we need to think about it: I'm not sure whether the fields are the same as for table. If not, we can also create a new one.

search_service/models/table.py Outdated

-                  name: str
-                  key: str
+                  id: str
+                  database: Optional[str] = None

Member

feng-tao Jan 22, 2021

why are we changing them to optional str?

Contributor Author

tianruzhou-db Jan 22, 2021

Some fields may be None in Elasticsearch. For those fields, a required str makes this call fail: data = self.schema(many=True, strict=False).loads(args.get('data')).data. But yes, in our use case I think database, cluster, schema, and table are required.
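
A rough analogy for the trade-off discussed here, using standard-library dataclasses instead of the attrs/marshmallow stack the service actually uses. The class names `StrictTable` and `LenientTable` and the sample document are hypothetical. It only illustrates that a required field with no value in the source document fails at load time, while an Optional field with a None default does not.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StrictTable:
    id: str
    database: str  # required: loading fails when the document has no value


@dataclass
class LenientTable:
    id: str
    database: Optional[str] = None  # optional: a missing value falls back to None


# Hypothetical Elasticsearch document that carries no `database` value.
doc = {"id": "tbl-1"}

try:
    StrictTable(**doc)
    strict_ok = True
except TypeError:
    # The required field is missing, so construction fails.
    strict_ok = False

lenient = LenientTable(**doc)
```

The cost of the lenient form is that callers which genuinely require `database` have to check for None themselves.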

search_service/models/table.py Outdated

-                  tags: List[Tag]
-                  badges: List[Tag]
+                  tags: List[Tag] = []
+                  badges: List[Tag] = []

Member

feng-tao Jan 22, 2021

Typically we use None as the default value for a list (e.g., https://docs.python-guide.org/writing/gotchas/#mutable-default-arguments)

Contributor Author

tianruzhou-db Jan 22, 2021

sounds good, I just wanted to be consistent with original code
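
A short sketch of the gotcha the linked guide describes, plus the per-instance default pattern. It uses standard-library dataclasses as a stand-in for the attrs model in this PR; `append_shared`, `append_fresh`, and `TableSketch` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Optional


def append_shared(item: str, bucket: List[str] = []) -> List[str]:
    # The default list is created once, at function definition time,
    # and is then shared by every call that omits `bucket`.
    bucket.append(item)
    return bucket


def append_fresh(item: str, bucket: Optional[List[str]] = None) -> List[str]:
    # Using None as the default gives each call its own new list.
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket


@dataclass
class TableSketch:
    id: str
    # default_factory builds a new list for every instance.
    tags: List[str] = field(default_factory=list)


first = append_shared("a")
second = append_shared("b")  # same list object as `first`

fresh_a = append_fresh("a")
fresh_b = append_fresh("b")  # independent list
```

After these calls `first` and `second` refer to the same list containing both items, while `fresh_a` and `fresh_b` each hold one item.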

search_service/models/table.py Outdated

                   # The following properties are lightly-transformed properties from the normal table object:
-                  column_names: List[str]
+                  column_names: List[str] = []

Member

feng-tao Jan 22, 2021

same for other list

search_service/models/table.py

                   column_descriptions: List[str] = []
                   programmatic_descriptions: List[str] = []
                   # The following are search-only properties:
                   total_usage: int = 0
                   schema_description: Optional[str] = attr.ib(default=None)
                   def get_id(self) -> str:
-                      # uses the table key as the document id in ES
-                      return self.key
+                      return self.id

Member

feng-tao Jan 22, 2021

this id is same as table key or the ES document id?

Contributor Author

tianruzhou-db Jan 22, 2021

The ES document id is always the source of truth, but we can also use key as the ES document id (assuming key is unique).

search_service/models/table.py Outdated

+                  def get_attrs_dict(self) -> dict:
+                      attrs_dict = self.__dict__.copy()
+                      attrs_dict['tags'] = [str(tag) for tag in self.tags]
+                      attrs_dict['badges'] = [str(tag) for tag in self.badges]

Member

feng-tao Jan 22, 2021

[str(badge) for badge in self.badges] ?

Contributor Author

tianruzhou-db Jan 22, 2021

sounds good.
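
A self-contained sketch of what `get_attrs_dict` in the hunk above appears to do, with the reviewer's rename applied. The `Tag` class with a `tag_name` field and a `__str__` method, and the `TableSketch` model, are assumptions for illustration; dataclasses stand in for the attrs model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Tag:
    tag_name: str  # assumed field name

    def __str__(self) -> str:
        return self.tag_name


@dataclass
class TableSketch:
    id: str
    tags: List[Tag] = field(default_factory=list)
    badges: List[Tag] = field(default_factory=list)

    def get_attrs_dict(self) -> dict:
        # Shallow copy, so replacing keys below does not touch the instance.
        attrs_dict = self.__dict__.copy()
        attrs_dict['tags'] = [str(tag) for tag in self.tags]
        attrs_dict['badges'] = [str(badge) for badge in self.badges]
        return attrs_dict


table = TableSketch(id="tbl-1", tags=[Tag("pii")], badges=[Tag("gold")])
attrs = table.get_attrs_dict()
```

Because the copy is shallow, any other mutable attribute would still be shared between the instance and the returned dict; only `tags` and `badges` are replaced with new lists here.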

search_service/proxy/elasticsearch.py

@@ -116,6 +116,8 @@ def _get_search_result(self, page_index: int,
                       for hit in response:
                           try:
+                              es_metadata = hit.__dict__.get('meta', {})

Member

feng-tao Jan 22, 2021

It has been a while. Could you add an example of the return value of hit.__dict__ to the docstring? Is it getting the existing ES document?

Contributor Author

tianruzhou-db Jan 23, 2021

Sure, will do.
Yes, it's getting the existing ES document.

search_service/proxy/elasticsearch.py

@@ -124,6 +126,7 @@ def _get_search_result(self, page_index: int,
                               for attr, val in es_payload.items():
                                   if attr in model.get_attrs():
                                       result[attr] = self._get_instance(attr=attr, val=val)
+                              result['id'] = self._get_instance(attr='id', val=es_metadata['id'])

Member

feng-tao Jan 22, 2021

the id is ES document id?

Contributor Author

tianruzhou-db Jan 23, 2021

yes
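
A rough sketch of the flow in the two elasticsearch.py hunks above, using a stand-in object rather than a real search hit. `FakeHit`, `build_result`, the `_d_` payload key, and all sample values are assumptions for illustration; the only parts taken from the diff are reading `meta` from `hit.__dict__`, filtering payload attributes against the model's attributes, and setting `result['id']` from the metadata id.

```python
from typing import Any, Dict, Set


class FakeHit:
    """Stand-in for a search hit: payload fields plus a `meta` entry in __dict__."""

    def __init__(self, payload: Dict[str, Any], meta: Dict[str, Any]) -> None:
        self._d_ = payload  # assumed location of the document fields
        self.meta = meta    # assumed to carry the ES document id


def build_result(hit: FakeHit, allowed_attrs: Set[str]) -> Dict[str, Any]:
    es_metadata = hit.__dict__.get('meta', {})
    es_payload = hit.__dict__.get('_d_', {})
    # Keep only the payload fields the model knows about.
    result = {attr: val for attr, val in es_payload.items() if attr in allowed_attrs}
    # The document id comes from the hit metadata, not from the payload.
    result['id'] = es_metadata['id']
    return result


hit = FakeHit(
    payload={"name": "orders", "schema": "sales", "unknown_field": 1},
    meta={"id": "doc-123", "index": "fake_index"},
)
result = build_result(hit, allowed_attrs={"name", "schema"})
```

One thing this makes visible: if `meta` is absent, the `{}` fallback means `es_metadata['id']` raises `KeyError`, which in the real method would be handled by whatever the surrounding `try` block catches.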

tests/unit/api/document/test_document_tables_api.py

                   @patch('search_service.api.document.get_proxy_client')
                   def test_put(self, get_proxy: MagicMock, RequestParser: MagicMock) -> None:
                       mock_proxy = get_proxy.return_value = Mock()
-                      RequestParser().parse_args.return_value = dict(data='{}', index='fake_index')
+                      RequestParser().parse_args.return_value = dict(data=[], index='fake_index')

Member

feng-tao Jan 22, 2021

Could you add a unit test that updates multiple values, since that was broken before?

Contributor Author

tianruzhou-db Jan 23, 2021

Sounds good, will do.
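
A sketch of what a multi-item test along these lines might look like. It is self-contained, so it does not patch the real `search_service` modules: `put_documents` is a hypothetical stand-in for the resource's put handler, and `update_document` is an assumed proxy method name.

```python
import unittest
from ast import literal_eval
from unittest.mock import Mock


def put_documents(parser, proxy):
    """Hypothetical handler: parse each item in `data` and pass the list to the proxy."""
    args = parser.parse_args()
    data = [literal_eval(item) for item in args.get('data')]
    proxy.update_document(data=data, index=args.get('index'))  # assumed method name
    return {}, 200


class TestPutMultiple(unittest.TestCase):
    def test_put_multiple_tables(self) -> None:
        parser = Mock()
        parser.parse_args.return_value = dict(
            data=["{'id': 'tbl-1'}", "{'id': 'tbl-2'}"],
            index='fake_index',
        )
        proxy = Mock()

        body, status = put_documents(parser, proxy)

        self.assertEqual(status, 200)
        # Both items should reach the proxy as dicts, in order.
        proxy.update_document.assert_called_once_with(
            data=[{'id': 'tbl-1'}, {'id': 'tbl-2'}],
            index='fake_index',
        )
```

In the real test module the parser and proxy would presumably come from the existing `@patch` decorators rather than being built by hand.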

Contributor Author

tianruzhou-db commented Jan 22, 2021

Could you post the frontend PR here as well? Just for reference, this will fix the issue where adding a new tag doesn't trigger an update of the search index.

Please refer to Summary of Changes

tianruzhou-db added 2 commits

January 26, 2021 17:50


          address PR comments

01e0999

Signed-off-by: tianru zhou <tianru.zhou@databricks.com>


          fix lint test failure

56ac659

Signed-off-by: tianru zhou <tianru.zhou@databricks.com>

feng-tao approved these changes

feng-tao merged commit 38bccba into amundsen-io:master

feng-tao mentioned this pull request

Feature Proposal amundsen-io/amundsen#909

Closed
