tushare_backend local storage loses data when fetching historical K-line quotes #4

Open
mapicccy opened this issue Aug 10, 2022 · 2 comments

mapicccy (Owner) commented Aug 10, 2022

Background:
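Tushare Pro imposes many limits on how often its API can be called, so to work around them the backend currently stores data locally. In addition, Tushare Pro cannot provide the current day's quotes during trading hours (its data is refreshed at around 4 p.m.), so real-time quotes are fetched from the Tencent API and appended to the DataFrame. The real-time quotes are not persisted.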
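To make the setup above concrete, here is a minimal sketch of a date-keyed local cache with a real-time row appended in memory. It is not the backend's actual code: `fetch_history`, `fetch_realtime`, `get_price`, the JSON file format and the `<code>_<YYYYMMDD>.json` naming are all hypothetical stand-ins, and plain lists of dicts replace the DataFrame so the example runs with the standard library only.

```python
import json
import tempfile
from datetime import date
from pathlib import Path


# Hypothetical stand-ins for the two data sources; the backend's real
# functions and signatures are not shown in this issue.
def fetch_history(code):
    """Pretend Tushare Pro call: daily bars, published at around 4 p.m."""
    return [{"date": "20220808", "close": 10.0},
            {"date": "20220809", "close": 10.2}]


def fetch_realtime(code):
    """Pretend Tencent call: the current intraday quote."""
    return {"date": "20220810", "close": 10.3}


def get_price(code, data_dir, today):
    # Cache file is keyed by the day it is written (%Y%m%d).
    path = Path(data_dir) / f"{code}_{today.strftime('%Y%m%d')}.json"
    if path.exists():
        history = json.loads(path.read_text())  # served locally, no Tushare call
    else:
        history = fetch_history(code)            # one rate-limited Tushare call
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(json.dumps(history))
    # The real-time row is appended in memory only and never persisted.
    return history + [fetch_realtime(code)]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(get_price("000001.SZ", d, date(2022, 8, 10)))
```

The second call on the same day is served from the file, which is what saves Tushare quota; the Tencent row is rebuilt on every call because it is never written back.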

Problem:
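Because real-time quotes are not persisted, and to force the locally persisted data to be refreshed on the next day (including non-trading days), persisted data files are indexed by today's date (`%Y%m%d`) in their file names. This date index is what introduces the bug: when replaying historical data, a local persisted file's name contains the current day, but the data inside it only runs up to the previous trading_date (which is not the current day). If the current day is a trading day, fetching quotes for the current day then returns the previous trading day's data.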
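The mismatch can be reproduced with a toy cache. This sketch assumes a hypothetical `<code>_<YYYYMMDD>.json` naming scheme and an in-memory dict in place of the `data` folder; `cache_file_name`, `bar_for` and `is_stale` are illustrative names, not functions from the repo. The file is named after 2022-08-10 because it was written that day, but it was written before Tushare published that day's bar.

```python
from datetime import date


def cache_file_name(code, written_on):
    # The file name carries the day the cache was written, not the day of
    # the newest bar it contains.
    return f"{code}_{written_on.strftime('%Y%m%d')}.json"


# Hypothetical cache written intraday on 2022-08-10 (a trading day): Tushare
# had not yet published that day's bar, so the newest stored row is 08-09.
CACHE = {
    cache_file_name("000001.SZ", date(2022, 8, 10)): [
        {"date": "20220808", "close": 10.0},
        {"date": "20220809", "close": 10.2},
    ],
}


def bar_for(code, day):
    """Naive lookup that trusts the file name, reproducing the bug."""
    rows = CACHE.get(cache_file_name(code, day))
    return rows[-1] if rows else None  # assumed to be `day`'s bar


def is_stale(code, day):
    """True when the newest stored row is older than the day in the file name."""
    rows = CACHE.get(cache_file_name(code, day))
    return bool(rows) and rows[-1]["date"] < day.strftime("%Y%m%d")


if __name__ == "__main__":
    day = date(2022, 8, 10)
    print(bar_for("000001.SZ", day))   # the 2022-08-09 bar, not 08-10
    print(is_stale("000001.SZ", day))  # True
```

Comparing `%Y%m%d` strings works because that format sorts lexicographically in date order.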

Solution:
Delete the `data` folder in the working directory, which holds the persisted data. This is only a workaround.
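The workaround amounts to removing one directory. A minimal sketch, assuming the folder is literally named `data` and sits directly under the working directory; `clear_local_cache` is a hypothetical helper, not part of the repo. Since the cache exists to save Tushare calls, the next run will presumably have to re-fetch everything and spend rate-limit quota.

```python
import shutil
import tempfile
from pathlib import Path


def clear_local_cache(workdir="."):
    """Delete the `data` folder under `workdir`; a no-op if it is absent."""
    data_dir = Path(workdir) / "data"
    shutil.rmtree(data_dir, ignore_errors=True)
    return not data_dir.exists()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as wd:
        (Path(wd) / "data").mkdir()
        (Path(wd) / "data" / "000001.SZ_20220810.json").write_text("[]")
        print(clear_local_cache(wd))  # True
```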

mapicccy self-assigned this Aug 10, 2022
mapicccy (Owner, Author) commented Aug 10, 2022

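I have not yet found a good way to handle this issue, and changing the persistence logic could introduce other problems. To keep the repo relatively stable, only the workaround is provided for now.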

mapicccy pinned this issue Aug 10, 2022
mapicccy added a commit that referenced this issue Aug 10, 2022
Related to the issue #4

Link: #4
Signed-off-by: Guanjun <guanjun@linux.alibaba.com>
mapicccy added a commit that referenced this issue Aug 26, 2022
This patch is a mitigation for the data corruption
mentioned in issue #4.

Every time get_price is invoked with the tushare backend, the
persistent data is updated. This consumes more I/O
bandwidth, but that is acceptable: data correctness matters most.

NOTE: a mitigation does not mean the issue is fully resolved. When
get_price is called on adjacent days, the persisted data for the
earlier day will reflect the last get_price call made on that day.
This behavior does make sense, though: no fetch, no update.

Signed-off-by: Guanjun <guanjun@linux.alibaba.com>
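The mitigation described in this commit is essentially a write-through cache: instead of writing the day's file once, every call overwrites it. The sketch below illustrates that idea only; `get_price_write_through`, the `fetch_history` callable and the file naming are hypothetical, and the assumption that each call re-fetches fresh data is ours, since the commit does not show where the updated data comes from.

```python
import json
import tempfile
from datetime import date
from pathlib import Path


def get_price_write_through(code, data_dir, today, fetch_history):
    """Refresh the cache file keyed by `today` on every call.

    `fetch_history` is a hypothetical callable standing in for the Tushare
    request; how the real backend obtains the fresh data is not shown.
    """
    path = Path(data_dir) / f"{code}_{today.strftime('%Y%m%d')}.json"
    rows = fetch_history(code)
    path.parent.mkdir(parents=True, exist_ok=True)
    # Overwrite unconditionally: extra I/O on every call, but the file always
    # holds the result of the most recent call made on `today`.
    path.write_text(json.dumps(rows))
    return rows


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        day = date(2022, 8, 26)
        get_price_write_through("000001.SZ", d, day,
                                lambda c: [{"date": "20220825", "close": 10.0}])
        # A later call on the same day replaces the earlier snapshot.
        get_price_write_through("000001.SZ", d, day,
                                lambda c: [{"date": "20220825", "close": 10.0},
                                           {"date": "20220826", "close": 10.4}])
        print((Path(d) / "000001.SZ_20220826.json").read_text())
```

This is why the NOTE above holds: once a day ends, its file is frozen at whatever the last call on that day fetched.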
mapicccy (Owner, Author) commented:

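Commit 4faa857 (tushare: A mitigation for data corrupt) provides a mitigation for this problem.
When screening stocks with a strategy or backtesting historical data, if persisted trading data for a given day exists locally (i.e., the working directory contains a `data/` folder), then the trading data for that day is whatever the last `get_price` call on that historical day fetched. If that "last call" happened after the market closed, the data is correct; if it happened intraday, the data is not that day's closing result.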

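This change is reasonable.
Three factors dictate the current design: tushare cannot provide real-time data, tushare enforces very strict call limits, and real-time data is fetched from Tencent and spliced in.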
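Given the behavior described in this comment, whether a persisted day is trustworthy depends only on when its file was last written. A hedged sketch of that check: the 16:00 cut-off is an assumption taken from the "around 4 p.m." refresh mentioned in the issue background, using the file's modification time as the "last `get_price`" time is also an assumption, and `last_write_time` and `snapshot_is_final` are hypothetical helpers.

```python
import os
from datetime import datetime, time

# Assumption: Tushare Pro has published the day's bars by about 16:00, per
# the "around 4 p.m." remark in this issue; the exact cut-off may differ.
REFRESH_TIME = time(16, 0)


def last_write_time(path):
    """Local time at which a cache file was last modified."""
    return datetime.fromtimestamp(os.path.getmtime(path))


def snapshot_is_final(last_written, refresh_time=REFRESH_TIME):
    """True if the last write of a day's cache file happened after the refresh.

    Files are keyed by the day they are written, so only the time of day of
    the last write matters: before the refresh, the stored data is not that
    day's closing result.
    """
    return last_written.time() >= refresh_time


if __name__ == "__main__":
    print(snapshot_is_final(datetime(2022, 8, 26, 10, 30)))  # False: intraday
    print(snapshot_is_final(datetime(2022, 8, 26, 17, 0)))   # True: after refresh
```

Exposing `refresh_time` as a parameter keeps the cut-off adjustable if the actual publication time turns out to be different.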
