Removing quotes from udf_metadata_key #1026

Merged

Conversation

hershd23
Contributor

@hershd23 hershd23 commented Sep 1, 2023

Replacing udf_metadata_key from string_literal to ID_LITERAL
modified: evadb/parser/evadb.lark
Removed .value from key_value_pair[0] post the change in type
modified: evadb/parser/lark_visitor/_functions.py
Replaced string key to ID_LITERAL in test query
modified: test/unit_tests/parser/test_parser.py

Solves #1010
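
As a rough illustration of the change described above (not EvaDB's actual Lark grammar), the sketch below parses metadata pairs where the key is a bare identifier and the value is a quoted string. The token names `ID_LITERAL` and `STRING_LITERAL` come from this PR; the regular expressions, the `parse_metadata` helper, and the sample input are assumptions made for illustration only.

```python
import re

# Hypothetical token patterns; EvaDB's real grammar lives in evadb/parser/evadb.lark.
ID_LITERAL = r"[A-Za-z_][A-Za-z0-9_]*"
STRING_LITERAL = r"'[^']*'"

# One metadata pair: a bare identifier key followed by a quoted string value.
PAIR = re.compile(rf"\s*({ID_LITERAL})\s+({STRING_LITERAL})")


def parse_metadata(text: str) -> dict:
    """Parse `KEY 'value' KEY 'value' ...` into a dict (illustrative helper)."""
    result = {}
    pos = 0
    while pos < len(text):
        if text[pos:].strip() == "":
            break  # only trailing whitespace is left
        match = PAIR.match(text, pos)
        if match is None:
            # A quoted key such as 'task' ends up here: it is not an identifier.
            raise ValueError(f"unexpected token at position {pos}: {text[pos:]!r}")
        key, value = match.group(1), match.group(2)
        # The key is already a plain string; only the value has quotes to strip.
        result[key] = value[1:-1]
        pos = match.end()
    return result
```

With this shape, a query fragment such as `TASK 'text-classification'` parses, while the old quoted-key form `'task' 'text-classification'` is rejected at the key.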

Contributor

@github-actions github-actions bot left a comment

👋 Hello @hershd23, thanks for submitting an EVA DB PR 🙏 To allow your work to be integrated as seamlessly as possible, we advise you to:

  • ✅ Verify that your PR is up-to-date with georgia-tech-db/eva master branch. If your PR is behind you can update your code by clicking the 'Update branch' button or by running git pull and git merge master locally.
  • ✅ Verify that all EVA DB Continuous Integration (CI) checks are passing.
  • ✅ Reduce changes to the absolute minimum required for your bug fix or feature addition.

	modified:   evadb/parser/evadb.lark
	modified:   test/unit_tests/parser/test_parser.py
Contributor Author

hershd23 commented Sep 1, 2023

Once the tests pass, I will go ahead and update the docs as well.

	modified:   README.md
	modified:   docs/source/reference/evaql/create.rst
	modified:   docs/source/reference/udfs/model-train.rst
Replacing udf_metadata_key in Ludwig test
	modified:   test/integration_tests/long/test_model_train.py
Contributor Author

hershd23 commented Sep 1, 2023

Ran the integration test for training the model. It parsed the query correctly and invoked training but gave the following error. Creating an issue for this

ERROR    evadb.utils.logging_manager:plan_executor.py:182 Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument weight in method wrapper_CUDA__native_batch_norm)
Traceback (most recent call last):
  File "/home/hershd23/Desktop/evadb/evadb/executor/plan_executor.py", line 178, in execute_plan
    yield from output
  File "/home/hershd23/Desktop/evadb/evadb/executor/project_executor.py", line 34, in exec
    batch = apply_project(batch, self.target_list, self.catalog())
  File "/home/hershd23/Desktop/evadb/evadb/executor/executor_utils.py", line 42, in apply_project
    batches = [expr.evaluate(batch) for expr in project_list]
  File "/home/hershd23/Desktop/evadb/evadb/executor/executor_utils.py", line 42, in <listcomp>
    batches = [expr.evaluate(batch) for expr in project_list]
  File "/home/hershd23/Desktop/evadb/evadb/expression/function_expression.py", line 129, in evaluate
    outcomes = self._apply_function_expression(func, batch, **kwargs)
  File "/home/hershd23/Desktop/evadb/evadb/expression/function_expression.py", line 188, in _apply_function_expression
    return func_args.apply_function_expression(func)
  File "/home/hershd23/Desktop/evadb/evadb/models/storage/batch.py", line 173, in apply_function_expression
    return Batch(expr(self._frames))
  File "/home/hershd23/Desktop/evadb/evadb/udfs/abstract/abstract_udf.py", line 36, in __call__
    return self.forward(args[0])
  File "/home/hershd23/Desktop/evadb/evadb/udfs/ludwig.py", line 33, in forward
    predictions, _ = self.model.predict(frames, return_type=pd.DataFrame)
  File "/home/hershd23/Desktop/evadb/env/lib/python3.10/site-packages/ludwig/api.py", line 895, in predict
    predictions = predictor.batch_predict(
  File "/home/hershd23/Desktop/evadb/env/lib/python3.10/site-packages/ludwig/models/predictor.py", line 142, in batch_predict
    preds = self._predict(batch)
  File "/home/hershd23/Desktop/evadb/env/lib/python3.10/site-packages/ludwig/models/predictor.py", line 188, in _predict
    outputs = self._predict_on_inputs(inputs)
  File "/home/hershd23/Desktop/evadb/env/lib/python3.10/site-packages/ludwig/models/predictor.py", line 324, in _predict_on_inputs
    return self.dist_model(inputs)
  File "/home/hershd23/Desktop/evadb/env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/hershd23/Desktop/evadb/env/lib/python3.10/site-packages/ludwig/models/ecd.py", line 136, in forward
    combiner_outputs = self.combine(encoder_outputs)
  File "/home/hershd23/Desktop/evadb/env/lib/python3.10/site-packages/ludwig/models/ecd.py", line 81, in combine
    return self.combiner(encoder_outputs)
  File "/home/hershd23/Desktop/evadb/env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/hershd23/Desktop/evadb/env/lib/python3.10/site-packages/ludwig/combiners/combiners.py", line 451, in forward
    hidden, aggregated_mask, masks = self.tabnet(hidden)
  File "/home/hershd23/Desktop/evadb/env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/hershd23/Desktop/evadb/env/lib/python3.10/site-packages/ludwig/modules/tabnet_modules.py", line 113, in forward
    features = self.batch_norm(features)  # [b_s, i_s]
  File "/home/hershd23/Desktop/evadb/env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/hershd23/Desktop/evadb/env/lib/python3.10/site-packages/torch/nn/modules/batchnorm.py", line 171, in forward
    return F.batch_norm(
  File "/home/hershd23/Desktop/evadb/env/lib/python3.10/site-packages/torch/nn/functional.py", line 2450, in batch_norm
    return torch.batch_norm(
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument weight in method wrapper_CUDA__native_batch_norm)

Collaborator

xzdandy commented Sep 1, 2023

Thanks for creating an issue! Did you run the experiments on ada-00 or ada-01?

Collaborator

@xzdandy xzdandy left a comment

PR itself looks good. Thanks!

Contributor Author

hershd23 commented Sep 2, 2023

Actually, I ran it on my personal machine, which has a GPU.

Collaborator

xzdandy commented Sep 2, 2023

We can merge this PR. We can investigate the GPU issue in #1028. Thanks!

@xzdandy xzdandy added the Feature Request ✨ New feature or request label Sep 2, 2023
@xzdandy xzdandy added this to the v0.3.3 milestone Sep 2, 2023
Collaborator

xzdandy commented Sep 2, 2023

It just came to mind that you may also need to update the Hugging Face functions.

Contributor Author

hershd23 commented Sep 2, 2023

It just came to mind that you may also need to update the Hugging Face functions.

Oh okay, @xzdandy. I will check that today.

@hershd23 hershd23 force-pushed the remove_quotes_udf_metadata_key branch from 18bd15c to 0011642 Compare September 4, 2023 20:17
	modified:   benchmark/text_summarization/text_summarization_with_evadb.py
	modified:   docs/source/benchmarks/text_summarization.rst
	modified:   docs/source/overview/concepts.rst
	modified:   docs/source/reference/ai/hf.rst
	modified:   docs/source/reference/ai/openai.rst
	modified:   docs/source/reference/ai/yolo.rst
	modified:   docs/source/usecases/object-detection.rst
	modified:   docs/source/usecases/question-answering.rst
	modified:   docs/source/usecases/text-summarization.rst
	modified:   evadb/parser/utils.py
	modified:   evadb/udfs/udf_bootstrap_queries.py
	modified:   test/benchmark_tests/test_benchmark_pytorch.py
	modified:   test/integration_tests/long/interfaces/relational/test_relational_api.py
	modified:   test/integration_tests/long/test_error_handling_with_ray.py
	modified:   test/integration_tests/long/test_huggingface_udfs.py
	modified:   test/integration_tests/long/test_reuse.py
	modified:   docs/source/benchmarks/text_summarization.rst
Contributor Author

hershd23 commented Sep 4, 2023

@xzdandy @gaurav274 can you take a look at the changes? I have incorporated the changes for the HuggingFace UDFs as well.

Contributor Author

hershd23 commented Sep 4, 2023

I also see some changes in the docs causing the docs CI to fail. Should I try to fix them (best effort) in this PR?

Collaborator

@xzdandy xzdandy left a comment

Thanks Hersh for the contribution! Doc failures will be fixed in #1035.

Verified on Python 3.10

  • All notebook testcases passed
  • test/integration_tests/long/test_model_train.py passed when running as an independent test
  • 4 testcases failed when running bash script/test/test.sh -m "LONG INTEGRATION"
FAILED test/integration_tests/long/test_model_train.py::ModelTrainTests::test_ludwig_automl - evadb.executor.executor_utils.ExecutorError: No best trial found. Please check if you specified the correct default metric (metric_...
FAILED test/integration_tests/long/test_udf_executor.py::UDFExecutorTest::test_should_create_udf_with_metadata - lark.exceptions.UnexpectedToken: Unexpected token Token('STRING_LITERAL', "'CACHE'") at line 6, column 19.
FAILED test/integration_tests/long/test_udf_executor.py::UDFExecutorTest::test_should_return_empty_metadata_list_if_udf_is_removed - lark.exceptions.UnexpectedToken: Unexpected token Token('STRING_LITERAL', "'CACHE'") at line 6, column 19.
FAILED test/integration_tests/long/interfaces/relational/test_relational_api.py::RelationalAPI::test_create_udf_with_relational_api - AssertionError: "CREA[50 chars]Face 'task' 'automatic-speech-recognition' 'mo[22 chars]ase'" != "CREA[50 chars]Face TASK 'automatic...

A quick check of the error message indicates that test/integration_tests/long/test_model_train.py::ModelTrainTests::test_ludwig_automl failed because GPU memory from other testcases is not freed. If the error is not relevant to this PR's change, feel free to create an issue, and we will investigate separately!

Minors:

  • remove .lock_preprocessing from the pr.

Update:

	deleted:    .lock_preprocessing
Converted CACHE and BATCH from string literals to UIDs
	modified:   test/integration_tests/long/test_udf_executor.py
Contributor Author

hershd23 commented Sep 5, 2023

Hi @xzdandy, I have made the following changes:

  • removed .lock_preprocessing which was created unknowingly
  • Converted CACHE and BATCH to UID structure in tests
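
The test update in the second item amounts to dropping the quotes around each metadata key. A small, hypothetical helper (not part of EvaDB) that does this for a known set of keys might look like the sketch below; the helper name, the default key set, and the sample query fragment in the usage note are assumptions.

```python
import re


def unquote_metadata_keys(query: str, keys=("CACHE", "BATCH")) -> str:
    """Replace 'KEY' with KEY for the given keys, leaving other literals alone."""
    alternation = "|".join(re.escape(k) for k in keys)
    # Match a quoted key only when another quoted literal (its value) follows,
    # so that a value which merely ends the statement is not touched.
    pattern = re.compile(rf"'({alternation})'(?=\s+')", re.IGNORECASE)
    return pattern.sub(lambda m: m.group(1), query)
```

For example, `unquote_metadata_keys("TYPE Classification 'CACHE' 'TRUE' 'BATCH' 'FALSE'")` yields `TYPE Classification CACHE 'TRUE' BATCH 'FALSE'`. The lookahead is only a heuristic: a value that happens to equal a key name and is followed by another quoted literal would also be rewritten, so manual review of the result is still advisable.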

For the other two issues

We have made all udf_metadata_keys into UIDs. Once parsed, these are processed and ultimately stored as lowercase strings in the metadata_map. This test takes the query and args passed through the API and converts them back into the query itself. To solve this, I need to find the query builder that converts the API constructs into an SQL-style query. I believe @gaurav274 mentioned that there are some use cases where Python API data is converted to a query and then run, to leverage the optimizations built on the query.

Can you point me to the query builder code? I'll try to correct this there.

UPDATE : Figured it out
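
The round trip described in this comment can be sketched as follows, under stated assumptions: keys arrive as identifiers, are stored lowercased, and are rendered back unquoted when a statement is turned into query text. `normalize_metadata` and `render_metadata` are hypothetical names, not EvaDB functions, and the uppercase rendering is an assumption chosen to match the `TASK` form that appears in the failing assertion above.

```python
def normalize_metadata(pairs):
    """Store keys as lowercase strings, as described for metadata_map."""
    return {key.lower(): value for key, value in pairs}


def render_metadata(metadata: dict) -> str:
    """Render metadata as `KEY 'value'` pairs: unquoted key, quoted value."""
    return " ".join(f"{key.upper()} '{value}'" for key, value in metadata.items())
```

A string-matching test then has to expect the unquoted form, for instance `TASK 'automatic-speech-recognition'` rather than `'task' 'automatic-speech-recognition'`.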

direct string matching test cases
	modified:   evadb/parser/create_udf_statement.py
Contributor Author

hershd23 commented Sep 5, 2023

@xzdandy I made the changes; can you take a look?

Contributor Author

hershd23 commented Sep 5, 2023

Are we good to go with this? Can we merge, or are we waiting for @gaurav274's review?

Collaborator

xzdandy commented Sep 5, 2023

Are we good to go with this? Can we merge, or are we waiting for @gaurav274's review?

I am fixing all merge conflicts now.

Contributor Author

hershd23 commented Sep 5, 2023

Done done, I have fixed merge conflicts

Collaborator

xzdandy commented Sep 5, 2023

Done done, I have fixed merge conflicts

Give me a minute to update the forecasting feature to use the new UDF syntax.

@xzdandy xzdandy merged commit e535896 into georgia-tech-db:staging Sep 5, 2023
Contributor Author

hershd23 commented Sep 5, 2023

Thanks @xzdandy for all your help!

@hershd23 hershd23 mentioned this pull request Sep 5, 2023
2 tasks
jiashenC pushed a commit that referenced this pull request Sep 5, 2023
Replacing udf_metadata_key from string_literal to ID_LITERAL
`	modified:   evadb/parser/evadb.lark`
Removed .value from key_value_pair[0] post the change in type
`	modified:   evadb/parser/lark_visitor/_functions.py`
Replaced string key to ID_LITERAL in test query
`	modified:   test/unit_tests/parser/test_parser.py`

Solves #1010

---------

Co-authored-by: xzdandy <xzdandy@gmail.com>