-
Notifications
You must be signed in to change notification settings - Fork 66
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: fix table post/put api bug #172
Conversation
Signed-off-by: tianru zhou <tianru.zhou@databricks.com>
Signed-off-by: tianru zhou <tianru.zhou@databricks.com>
could you post the frontend PR here as well? Just for reference this will fix the issue when adding new doesn't trigger / update the search index. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add a few nits, let me know what you think, great stuff. This fixes a long time bug (dated back 2018...)
search_service/api/document.py
Outdated
data = self.schema(many=True, strict=False).loads(args.get('data')).data | ||
table_dict_list = [] | ||
for table_str in args.get('data'): | ||
table_dict_list.append(literal_eval(table_str)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is table_str actually a tablefields object? the naming is a bit confusing. also we typically don't put type behind the variable for naming.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what are we using literal_eval
here for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
table_str is a string representation of dict, like '{"key1": val1, "key2": val2}'
literal_eval
will concert the table_str into a dict first, or json.dumps
will dump a list of string
instead of a list of dict
search_service/api/document.py
Outdated
data = self.schema(many=True, strict=False).loads(args.get('data')).data | ||
table_dict_list = [] | ||
for table_str in args.get('data'): | ||
table_dict_list.append(literal_eval(table_str)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you could also simplify the above as table_objs = [literal_eval(table_str) for table_str in args.get('data')]
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does the endpoint work for user,table,dashboard as well? would be good to add a todo for things that are missing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For user, we already have a different endpoint, for dashboard, we need to think about it, not sure whether the fields are the same as table, if not, we can also create a new one.
search_service/models/table.py
Outdated
name: str | ||
key: str | ||
id: str | ||
database: Optional[str] = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why are we changing them to optional str?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some fields may be None
in elasticsearch, for these fields, required str will fail this method: data = self.schema(many=True, strict=False).loads(args.get('data')).data
. But yeah, in our use case, I think database, cluster, schema, table
are required
search_service/models/table.py
Outdated
tags: List[Tag] | ||
badges: List[Tag] | ||
tags: List[Tag] = [] | ||
badges: List[Tag] = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typically we put None for list default value (e.g https://docs.python-guide.org/writing/gotchas/#mutable-default-arguments)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sounds good, I just wanted to be consistent with original code
search_service/models/table.py
Outdated
# The following properties are lightly-transformed properties from the normal table object: | ||
column_names: List[str] | ||
column_names: List[str] = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same for other list
column_descriptions: List[str] = [] | ||
programmatic_descriptions: List[str] = [] | ||
# The following are search-only properties: | ||
total_usage: int = 0 | ||
schema_description: Optional[str] = attr.ib(default=None) | ||
|
||
def get_id(self) -> str: | ||
# uses the table key as the document id in ES | ||
return self.key | ||
return self.id |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this id is same as table key or the ES document id?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ES document id is always root of truth, but we can also use key as the ES document id (assuming key is unique)
search_service/models/table.py
Outdated
def get_attrs_dict(self) -> dict: | ||
attrs_dict = self.__dict__.copy() | ||
attrs_dict['tags'] = [str(tag) for tag in self.tags] | ||
attrs_dict['badges'] = [str(tag) for tag in self.badges] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[str(badge) for badge in self.badges] ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sounds good.
@@ -116,6 +116,8 @@ def _get_search_result(self, page_index: int, | |||
|
|||
for hit in response: | |||
try: | |||
es_metadata = hit.__dict__.get('meta', {}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it has been a while, could you add an example on the return value of hit.__dict__
for docstring ? is it getting the existing ES document?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, will do.
yes, it's getting the existing ES documnet.
@@ -124,6 +126,7 @@ def _get_search_result(self, page_index: int, | |||
for attr, val in es_payload.items(): | |||
if attr in model.get_attrs(): | |||
result[attr] = self._get_instance(attr=attr, val=val) | |||
result['id'] = self._get_instance(attr='id', val=es_metadata['id']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the id is ES document id?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes
@@ -33,7 +33,7 @@ def test_post(self, get_proxy: MagicMock, RequestParser: MagicMock) -> None: | |||
@patch('search_service.api.document.get_proxy_client') | |||
def test_put(self, get_proxy: MagicMock, RequestParser: MagicMock) -> None: | |||
mock_proxy = get_proxy.return_value = Mock() | |||
RequestParser().parse_args.return_value = dict(data='{}', index='fake_index') | |||
RequestParser().parse_args.return_value = dict(data=[], index='fake_index') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you add a unit test to update multiple values as it is broken before?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good, will do.
Please refer to Summary of Changes |
Signed-off-by: tianru zhou <tianru.zhou@databricks.com>
Signed-off-by: tianru zhou <tianru.zhou@databricks.com>
Signed-off-by: tianru zhou tianru.zhou@databricks.com
Summary of Changes
fix table post/put api bugs
old api definition contains openapi 2 syntax(put
request body
withinparameters
part), which causes weird exception: components withintemplate.yml
can't be found.data
within request body for table/user put/post should be a list, current parser syntax doesn't work correctly for lists.Related FE PR: amundsen-io/amundsenfrontendlibrary#883
Tests
Documentation
What documentation did you add or modify and why? Add any relevant links then remove this line
CheckList
Make sure you have checked all steps below to ensure a timely review.
make test