Resolve issues while running Automatic update of daily frequency data (from yahoo finance) for US region #1358
Please merge master to fix the CI error. @HyeongminMoon
Regarding the second point of your description, my guess is that it is not possible to determine whether qlib_dir exists, so GetData is executed every time update_data_to_bin is run. This doesn't affect the result, but it takes some unnecessary time. Is my guess correct?
Correct.
If that's the case, I think changing the
No, I think this is not from
Windows cannot create folders with certain special names, and "prn" is one of them. To ensure that users on all systems can use qlib, we add a "_qlib_" prefix to these special names, so the "prn" seen in
I'll try this too. Thanks.
All issues are now resolved. I also checked the script once again.
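For readers following the `_qlib_` prefix discussion above, here is a minimal sketch of the idea in Python. The helper names (`symbol_to_dirname`, `dirname_to_symbol`, `has_feature_dir`) and the exact reserved-name list are assumptions made for illustration; they are not taken from this PR or from qlib's code. The sketch shows why an existence check has to strip the prefix before comparing a directory name with a symbol.

```python
# Hypothetical helpers for illustration only; not qlib's actual implementation.
PREFIX = "_qlib_"

# Assumed subset of Windows reserved device names that cannot be used as folder names.
RESERVED_NAMES = (
    {"CON", "PRN", "AUX", "NUL"}
    | {f"COM{i}" for i in range(1, 10)}
    | {f"LPT{i}" for i in range(1, 10)}
)


def symbol_to_dirname(symbol: str) -> str:
    """Return a folder name that is safe on Windows for the given symbol."""
    if symbol.upper() in RESERVED_NAMES:
        return PREFIX + symbol
    return symbol


def dirname_to_symbol(dirname: str) -> str:
    """Undo symbol_to_dirname by removing the prefix if it is present."""
    if dirname.startswith(PREFIX):
        return dirname[len(PREFIX):]
    return dirname


def has_feature_dir(symbol: str, existing_dirnames) -> bool:
    """Check whether a symbol already has a folder, ignoring the prefix and case."""
    known = {dirname_to_symbol(name).lower() for name in existing_dirnames}
    return symbol.lower() in known


# Usage: "prn" is stored as "_qlib_prn", yet it is still found by the check.
dirs_on_disk = [symbol_to_dirname(s) for s in ["aapl", "prn"]]
print(dirs_on_disk, has_feature_dir("PRN", dirs_on_disk))
```

Comparing the raw folder name `_qlib_prn` with the symbol `prn` would report the folder as missing, which is consistent with the extra download time described in this thread.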
@@ -289,7 +289,18 @@ def __init__(

    def _executor(self, file_path: Path):
        file_path = Path(file_path)
        df = pd.read_csv(file_path)

        default_na = pd._libs.parsers.STR_NA_VALUES
It would be great if we could add more context and comments here.
It is hard to understand the code without an explanation of motivation.
Thank you for the review! I added explanatory comments :)
It looks perfect now.
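As background for the `default_na` line in the diff above: by default `pandas.read_csv` converts a set of strings (including `NA`) to missing values, which can corrupt a symbol column. The sketch below shows one way to keep such a symbol as text while still treating empty cells in other columns as missing. It assumes a ticker literally named `NA` and a made-up three-column CSV; it is an illustration under those assumptions, not the code merged in this PR. Note that `pd._libs.parsers.STR_NA_VALUES` is a private pandas attribute.

```python
import io

import pandas as pd

# Private pandas constant referenced in the diff: strings read_csv treats as missing.
default_na = sorted(pd._libs.parsers.STR_NA_VALUES)

# For the symbol column only, stop treating the string "NA" as missing.
symbol_na = [value for value in default_na if value != "NA"]

# Made-up sample data: the symbol is "NA" and one close price is empty.
csv_text = "symbol,date,close\nNA,2022-01-03,10.5\nNA,2022-01-04,\n"

# Default behaviour: the whole symbol column becomes NaN.
naive = pd.read_csv(io.StringIO(csv_text))

# Read only the header first so that per-column NA rules can be built.
columns = pd.read_csv(io.StringIO(csv_text), nrows=0).columns

safe = pd.read_csv(
    io.StringIO(csv_text),
    dtype={"symbol": str},
    keep_default_na=False,  # disable the global defaults, then re-add them per column
    na_values={col: (symbol_na if col == "symbol" else default_na) for col in columns},
)

print(naive["symbol"].isna().all())
print(safe["symbol"].tolist())
```

Passing a dict to `na_values` together with `keep_default_na=False` limits the exception to the symbol column, so numeric columns keep their usual missing-value handling.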
… (from yahoo finance) for US region (microsoft#1358)
* Update YahooNormalizeUS1dExtend(microsoft#1196)
* Prevent pandas read_csv errors while running update_data_to_bin for US region
* Fix parse_index error while running update_data_to_bin for US region
* prevent pandas.read_csv error on specific symbol names
* Reordering parameters for better rendering
* removes prefix during feature_dir existence checking
* add explanation comments
Description
Following README.md, I tried the automatic update for the US region.
The current implementation is based on the CN index, so I ran into some issues:
(AttributeError: module 'collector' has no attribute 'YahooNormalizeUS1dExtend' while running update_data_to_bin #1196)
-> resolved at commit b2a76df
-> Not resolved. I worked around this by removing "PRN" from qlib_data/us_index/instruments/all.txt, but I have no idea how to integrate this into the code. Anyway, this is not a critical issue (no error; it just takes about 2 minutes longer).
-> resolved at commit a9cb66b
-> resolved at commit 0be6b99
I am happy to accept any guidance you may have. Thanks.
Motivation and Context
Motivated by #1196, this makes the automatic update available for the US region.
We will be able to use the automatic update (with crontab!) for the US region once this request is merged.
How Has This Been Tested?
pytest qlib/tests/test_all_pipeline.py under the upper directory of qlib.
python scripts/data_collector/yahoo/collector.py update_data_to_bin --qlib_data_1d_dir <user data dir> --trading_date <start date> --end_date <end date> --region us
Screenshots of Test Results (if appropriate):
Types of changes