Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unit test calculate_vwap() #782

Closed
2 tasks
DanilYachmenev opened this issue Apr 11, 2024 · 11 comments
Closed
2 tasks

Unit test calculate_vwap() #782

DanilYachmenev opened this issue Apr 11, 2024 · 11 comments
Assignees
Labels
Outsource Task for interns

Comments

@DanilYachmenev
Copy link
Collaborator

There is a calculate_vwap() in im_v2/common/data/transform/transform_utils.py (link) that is being used in amp/research_amp/cc/notebooks/master_tradability_analysis.py and amp/research_amp/cc/notebooks/Master_Analysis_CrossSectionalLearning.py

We need to add unit tests using synthetic test data. Create a simple short dataframe with 2 columns: 1 for price and 1 for volume, use a pd.Timestamp index.
Run the function on it and compare the outcome with string signature (for ref see tests in amp/helpers/test/test_hpandas.py e.g. Test_merge_dfs1)

Add 2 unit tests:

  • No resample_kwargs
  • With resample_kwargs

See unit test doc to follow the code style

Assigning @samarth9008 to re-distribute

FYI @gpsaggese

@samarth9008
Copy link
Collaborator

@DanilYachmenev Thanks for the issue

@aish-nidhi
Copy link
Collaborator

Thanks @samarth9008 for assigning the issue.
I will be working on the synthetic data set for the test today and will start with the no resample_kwargs test.

@gpsaggese
Copy link
Contributor

@aish-nidhi feel free to post a plan or do a draft PR (e.g., only a sketch/skeleton, we won't expect any convention to be followed, you just have to mark it as draft and ask for a architectural review).
The worst case scenario is misinterpreting the specs and come back with something correct but different than was it requested.

@aish-nidhi
Copy link
Collaborator

aish-nidhi commented Apr 15, 2024 via email

@gpsaggese
Copy link
Contributor

@aish-nidhi can you reply on the GUI (the link is in the email) instead of replying to the GH email so that we don't get all the garbage above.

@aish-nidhi
Copy link
Collaborator

@gpsaggese Will keep that in mind.

aish-nidhi pushed a commit that referenced this issue Apr 16, 2024
@aish-nidhi
Copy link
Collaborator

aish-nidhi commented Apr 17, 2024

Hey @DanilYachmenev @samarth9008 ,

Just had a quick question. I was trying to run my test when I realised that both the files mentioned in the issue are not using the function calculate_vwap() from transform_utils.py. Instead they are actually using calculate_vwap_twap() function from transform.py.

Below is a jackpy search result output:-

(amp.client_venv) ➜  sorrentum1 git:(SorrTask782_Unit_test_calculate_vwap_function) ✗ jackpy "calculate_vwap"
research_amp/transform.py:20:def calculate_vwap_twap(df: pd.DataFrame, resampling_rule: str) -> pd.DataFrame:
research_amp/cc/notebooks/CMTask1704_Get_raw_CC_data.py:93:def calculate_vwap_twap(df: pd.DataFrame, resampling_rule: str) -> pd.DataFrame:
research_amp/cc/notebooks/CMTask1704_Get_raw_CC_data.py:293:vwap_twap_df = calculate_vwap_twap(data_hist_ccxt, resampling_rule)
research_amp/cc/notebooks/Master_Analysis_CrossSectionalLearning.py:139:df = ramptran.calculate_vwap_twap(
research_amp/cc/notebooks/master_tradability_analysis.py:122:vwap_twap_df = ramptran.calculate_vwap_twap(ohlcv_cc, resampling_rule)
im_v2/common/data/transform/test/test_transform_utils.py:803:    def test_calculate_vwap_with_resample_kwargs(self) -> None:
im_v2/common/data/transform/test/test_transform_utils.py:809:    def test_calculate_vwap_no_resample_kwargs(self) -> None:
im_v2/common/data/transform/transform_utils.py:374:def calculate_vwap(
./Untitled.py:42:vwap_twap_df = ramptran.calculate_vwap_twap(ohlcv_cc, resampling_rule)
(amp.client_venv) ➜  sorrentum1 git:(SorrTask782_Unit_test_calculate_vwap_function) ✗ 

Can you point me to somewhere this function is used? Should I continue with testing?

@DanilYachmenev
Copy link
Collaborator Author

@aish-nidhi good note, indeed, calculate_vwap() is not used in the lib
Instead, calculate_vwap_twap() is
And it needs to be tested too, so please add tests for it.

So here you need to create a test DataFrame with "full_symbol" (str, see examples in the codebase), "close" (float) and "volume" (float) columns.
Also use a pd.Timestamp index with 1-min freq, make like ~20 rows

then just add 2 tests for 2 different resampling_rule values

Use string signature approach in order to compare outputs.
See tests in helpers/test/test_hpandas.py for reference (and unit test doc ofc)

@aish-nidhi
Copy link
Collaborator

Hey @DanilYachmenev @samarth9008,
As mentioned by you, I put full_symbol as "str". But in the calculate_vwap_twap() function, there is a call to convert_to_multiindex() where full_symbol is being passed as assert_id_col and that function asserts if the column is np.int64.

image image

How would you want me to address this issue? Should I convert the "full_symbol" to "np.int64"? The convert_to_multiindex() function is being used at too many places, so making alterations to it might first need an effect analysis of it.

@DanilYachmenev
Copy link
Collaborator Author

DanilYachmenev commented Apr 19, 2024

... in the calculate_vwap_twap() function, there is a call to convert_to_multiindex() where full_symbol is being passed as assert_id_col and that function asserts if the column is np.int64.

@aish-nidhi

Indeed, true. calculate_vwap_twap() calls convert_to_multiindex() which has been refactored to process input data with asset id columns which is reflected in unit tests for it
see Test_convert_to_multiindex, you can use approach to building test dataframe from it

For some reason, functions from amp/research_amp/transform.py were not updated to this behaviour, unpaid tech debt
usage of calculate_vwap_twap() for now is limited only by some obsolete notebooks but can be useful in research soon

Therefore, replace "full_symbol" column name to "asset_id" and create test dataframe with "asset_id" that has int values
then function will work properly and you can write unit tests

aish-nidhi pushed a commit that referenced this issue Apr 23, 2024
samarth9008 added a commit that referenced this issue May 3, 2024
* Issue #782 draft PR with code skeleton

* Issue #782 unit tests for function calculate_vwap_twap with 2 different resampling rules

* Fix for ambiguous truth value of multiindex column names

* Linter corrections on file

* Changes for PR comment

* Modifications to address PR comments

* Adding full-stop to comments

---------

Co-authored-by: Aishwarya Nidhi <aishwaryanidhi@Aishwaryas-Air.cgocable.net>
Co-authored-by: Samarth KaPatel <samarth.kapatel5@gmail.com>
Kev-Daran pushed a commit to Kev-Daran/kaizenflow that referenced this issue May 8, 2024
* Issue causify-ai#782 draft PR with code skeleton

* Issue causify-ai#782 unit tests for function calculate_vwap_twap with 2 different resampling rules

* Fix for ambiguous truth value of multiindex column names

* Linter corrections on file

* Changes for PR comment

* Modifications to address PR comments

* Adding full-stop to comments

---------

Co-authored-by: Aishwarya Nidhi <aishwaryanidhi@Aishwaryas-Air.cgocable.net>
Co-authored-by: Samarth KaPatel <samarth.kapatel5@gmail.com>
DIPOLAWSON pushed a commit that referenced this issue May 9, 2024
* Issue #782 draft PR with code skeleton

* Issue #782 unit tests for function calculate_vwap_twap with 2 different resampling rules

* Fix for ambiguous truth value of multiindex column names

* Linter corrections on file

* Changes for PR comment

* Modifications to address PR comments

* Adding full-stop to comments

---------

Co-authored-by: Aishwarya Nidhi <aishwaryanidhi@Aishwaryas-Air.cgocable.net>
Co-authored-by: Samarth KaPatel <samarth.kapatel5@gmail.com>
Nadiyahlw added a commit that referenced this issue May 9, 2024
* Revert "Add DATA605 directory (#876)" (#884)

This reverts commit 00e90d2.

* SorrTask786 Unit test for split_positive_and_negative_parts() (#883)

* Added Unit test for split_positive_and_negative_parts()

* Resolved comments

* Added remaining comments

* deleted unnecessary files

* "Fixes"

* Comment resolution

* Comments resolved.

---------

Co-authored-by: Samarth KaPatel <samarth.kapatel5@gmail.com>

* SorrTask-782 Unit tests for calculate_vwap_twap() (#801)

* Issue #782 draft PR with code skeleton

* Issue #782 unit tests for function calculate_vwap_twap with 2 different resampling rules

* Fix for ambiguous truth value of multiindex column names

* Linter corrections on file

* Changes for PR comment

* Modifications to address PR comments

* Adding full-stop to comments

---------

Co-authored-by: Aishwarya Nidhi <aishwaryanidhi@Aishwaryas-Air.cgocable.net>
Co-authored-by: Samarth KaPatel <samarth.kapatel5@gmail.com>

* SorrTask-891 Unit tests for compute epoch function (#901)

* Issue 891 - Unit test code skeleton

* Issue-891 3 unit tests with each available unit value for Series input.

* Issue-891 Unit test of compute epoch function for dataframe input with default unit

* Adding missed full stop for comment

* PR comment fixes

* Removing todo from function

* PR comment fixes

---------

Co-authored-by: Aishwarya Nidhi <aishwaryanidhi@Aishwaryas-Air.cgocable.net>
Co-authored-by: Samarth KaPatel <samarth.kapatel5@gmail.com>

* SorTask903 Unit test convert_seconds_to_pandas_minutes()  (#907)

* Added unit test for function convert_seconds_to_minutes

* Nits

---------

Co-authored-by: Shaunak  Dhande <77265046+Shaunak01@users.noreply.github.com>
Co-authored-by: neha2801-create <77967216+neha2801-create@users.noreply.github.com>
Co-authored-by: Samarth KaPatel <samarth.kapatel5@gmail.com>
Co-authored-by: Aishwarya Nidhi <nidhiaishwarya8@gmail.com>
Co-authored-by: Aishwarya Nidhi <aishwaryanidhi@Aishwaryas-Air.cgocable.net>
@Nadiyahlw Nadiyahlw mentioned this issue May 9, 2024
@samarth9008
Copy link
Collaborator

Not used anymore so closing

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Outsource Task for interns
Projects
None yet
Development

No branches or pull requests

4 participants