Support Highfreq Backtest with the Model/Rule/RL Strategy #408

Closed
wants to merge 28 commits into from

Collaborator

@bxdd bxdd commented Apr 30, 2021

Description

Support Highfreq Backtest with the Model/Rule/RL Strategy

Motivation and Context

How Has This Been Tested?

  • Pass the test by running pytest qlib/tests/test_all_pipeline.py from the parent directory of qlib.
  • If you are adding a new feature, test on your own test scripts.

Screenshots of Test Results (if appropriate):

  1. Pipeline test:
  2. Your own tests:

Types of changes

  • Fix bugs
  • Add new feature
  • Update documentation

@@ -2,12 +2,12 @@
# Licensed under the MIT License.

from .order import Order
Collaborator

You could move the core framework out of the contrib folder now.

Collaborator

You can move them later

@@ -0,0 +1,145 @@
# Copyright (c) Microsoft Corporation.
Collaborator

Our framework is a multi-level trading system rather than a single high-frequency trading system.
The folder deserves a better name than highfreq.

},
},
"backtest": {
"start_time": trade_start_time,
Collaborator

Part of this config looks like it belongs to env.



def parse_freq(freq):
freq = freq.lower()
Collaborator

Add docs

ret_freq.extend(self._get_report_freq(env_config["kwargs"]["sub_env"]))
return ret_freq

def _cal_risk_analysis_scaler(self, freq):
Collaborator

We can combine these into risk_analysis and make it more powerful

raise ValueError("sample freq must be xmin, xd, xw, xm")


def get_sample_freq_calendar(start_time=None, end_time=None, freq="day", **kwargs):
Collaborator

Please document the **kwargs.
What can they be?

try:
_calendar = Cal.calendar(start_time=start_time, end_time=end_time, freq=freq, **kwargs)
freq, freq_sam = freq, None
except ValueError:
Collaborator

This part is not intuitive.
Let's discuss it later.

qlib/contrib/strategy/rule_strategy.py (outdated, resolved)
self.instruments = D.instruments(instruments)
self.freq = freq

def _convert_index_format(self, df):
Collaborator

This function appears multiple times.
It would be better to move it into utils.


for k, v in kwargs.items():
if hasattr(self, k):
setattr(self, k, v)
Collaborator

Give a warning in the else branch.
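
A minimal sketch of what this comment asks for. Only the hasattr/setattr loop comes from the diff; the Resettable class, its reset method, and the warning text are hypothetical stand-ins for the class under review.

```python
import warnings


class Resettable:
    """Hypothetical stand-in for the class under review."""

    def __init__(self, freq="day"):
        self.freq = freq

    def reset(self, **kwargs):
        for k, v in kwargs.items():
            if hasattr(self, k):
                setattr(self, k, v)
            else:
                # Surface typos and unsupported options instead of dropping them silently.
                warnings.warn(f"reset() ignored unknown attribute {k!r}", stacklevel=2)


obj = Resettable()
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    obj.reset(freq="1min", frq="5min")  # "frq" is a deliberate typo
```

With the warning in place, the misspelled keyword is reported while the valid one is still applied.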

@Derek-Wds Derek-Wds added the enhancement New feature or request label May 10, 2021
@bxdd bxdd requested a review from you-n-g May 12, 2021 16:50
Collaborator Author

bxdd commented May 13, 2021

I added a notebook file, workflow.ipynb, to show the multi-level reports; its size is 5k. I temporarily modified .gitignore so that it could be committed. This file serves the same purpose as workflow_by_code.ipynb. Is it necessary? If not, I will reset the related commit. @you-n-g

"n_drop": 5,
},
},
"env": {
Collaborator

"env" -> "executor"?

"class": "SimulatorExecutor",
"module_path": "qlib.contrib.backtest.executor",
"kwargs": {
"step_bar": "day",
Collaborator

Let's figure out a better name for step_bar and freq

self._init_sub_trading(order_list)
sub_execute_state = self.sub_env.get_init_state()
while not self.sub_env.finished():
_order_list = self.sub_strategy.generate_order_list(sub_execute_state)
Collaborator

Why can't current be global?

Collaborator

Why can't we pass an account instead of sub_execute_state? The account should contain everything the strategy needs other than the information in exchange.

from ..rl.interpreter import ActionInterpreter, StateInterpreter


class BaseStrategy(BaseTradeCalendar):
Collaborator

Strategy should not stateless.

Collaborator

After reading the implementation of TWAP, I agree that it is hard for the strategy to be stateless.

_interpret_state = self.state_interpretor.interpret(
execute_result=execute_state, **self.action_interpret_kwargs
)
_policy_action = self.policy.step(_interpret_state)
Collaborator

self.policy(_interpret_state)

# Licensed under the MIT License.


class Faculty:
Collaborator

Why use this name?

self.__dict__["_faculty"].update(*args, **kwargs)


common_faculty = Faculty()
Collaborator

Singleton is not enough for our scenario.

@@ -24,6 +24,7 @@ def __init__(self, stock_id, amount, trade_date, direction, factor):
self.amount = amount
# amount of successfully completed orders
self.deal_amount = 0
self.trade_date = trade_date
self.start_time = start_time
Collaborator

Define the semantics of start_time and end_time (e.g. whether each endpoint is inclusive or exclusive).
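
One way to pin this down, sketched under the assumption that an order's time window is closed on both ends. The TimeWindow name and its contains method are hypothetical and not part of the PR; a half-open convention would work just as well as long as it is documented.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class TimeWindow:
    """Hypothetical closed interval [start_time, end_time]."""

    start_time: datetime
    end_time: datetime

    def contains(self, t: datetime) -> bool:
        # Both endpoints are inclusive under this assumed convention.
        return self.start_time <= t <= self.end_time


window = TimeWindow(datetime(2021, 5, 10, 9, 30), datetime(2021, 5, 10, 9, 59))
```

Stating the convention in one place keeps the exchange, the executor, and the strategies from each guessing whether the last bar belongs to the window.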


def update_stock_price(self, stock_id, price):
self.position[stock_id]["price"] = price

def update_stock_count(self, stock_id, count):
self.position[stock_id]["count"] = count
def update_stock_count(self, stock_id, bar, count):
Collaborator

Would it be better to unify the names bar and freq?

del p["cash"]
del p["today_account_value"]
del p["now_account_value"]
positions = pd.DataFrame.from_dict(p, orient="index")
Collaborator

Please import pandas and numpy

else:
if raw_count > sam_count:
raise ValueError("raw freq must be higher than sampling freq")
_calendar_minute = np.unique(
Collaborator

Why do we have to implement such a complicated version?
Would the following logic be simpler?

div = freq_target // freq_orig
cal_target = cal_orig[::div]
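
The suggestion above can be sketched with plain Python lists, assuming both frequencies are expressed in minutes and the target is an integer multiple of the original. downsample_calendar is a hypothetical helper, not a function from the PR, and the sketch ignores trading-session boundaries: a stride that does not divide the number of bars per day would drift across days.

```python
from datetime import datetime, timedelta


def downsample_calendar(cal_orig, freq_orig, freq_target):
    """Keep every (freq_target // freq_orig)-th timestamp of a sorted calendar."""
    if freq_target % freq_orig != 0:
        raise ValueError("target freq must be an integer multiple of the original freq")
    div = freq_target // freq_orig  # integer stride; plain `/` would yield a float
    return cal_orig[::div]


start = datetime(2021, 5, 10, 9, 30)
cal_1min = [start + timedelta(minutes=i) for i in range(10)]
cal_5min = downsample_calendar(cal_1min, freq_orig=1, freq_target=5)
```

Integer division matters here because a float stride is rejected by slicing.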

Collaborator

add docstring

start sampling time, by default None
end_time : Union[str, pd.Timestamp], optional
end sampling time, by default None
fields : Union[str, List[str]], optional
Collaborator

In what scenario do we need to resample only some of the fields?

else feature.loc[(slice(None), selector_datetime), fields]
)
if feature.empty:
return None
Collaborator

Returning the empty feature would be more reasonable.

from ..data.dataset.utils import get_level_index

datetime_level = get_level_index(feature, level="datetime") == 0
if isinstance(feature, pd.Series):
Collaborator

If we don't filter fields in this function, it will be unnecessary to use different logic for pd.Series and pd.DataFrame.


from ..data.dataset.utils import get_level_index

datetime_level = get_level_index(feature, level="datetime") == 0
Collaborator

How do you make sure the datetime is sorted?

Collaborator

lazy_sort_index
qlib.utils

index.is_lexsorted()

"class": "SimulatorExecutor",
"module_path": "qlib.contrib.backtest.executor",
"kwargs": {
"step_bar": "day",
Collaborator

Please send all the new names to the group for discussion

self.current = Position(cash=init_cash)
self._reset_report()

def _cal_benchmark(self, benchmark_config, freq):
Collaborator

Move it to report.py

@@ -83,9 +165,13 @@ def update_order(self, order, trade_val, cost, trade_price):
self.current.update_order(order, trade_val, cost, trade_price)
self.update_state_from_order(order, trade_val, cost, trade_price)

def update_daily_end(self, today, trader):
def update_bar_count(self):
Collaborator

Giving Account an interface is doable


_execute_state = trade_env.get_init_state()
while not trade_env.finished():
_order_list = trade_strategy.generate_order_list(_execute_state)
Collaborator

for example, decision


_execute_state = trade_env.get_init_state()
while not trade_env.finished():
_order_list = trade_strategy.generate_order_list(_execute_state)
Collaborator

List the sharing granularity and send it to the group for discussion.

@@ -51,6 +56,9 @@ def __init__(
target on this day).
index: MultipleIndex(instrument, pd.Datetime)
"""
self.freq = freq
self.start_time = start_time
self.end_time = end_time
Collaborator

1 2 3 4 5

1 2 3 4
[1, 4]
[1, 4.5]



class Exchange:
def __init__(
self,
trade_dates=None,
freq="day",
Collaborator

turnover limit threshing

@bxdd bxdd requested review from rk2900, ultmaster and you-n-g May 24, 2021 18:57
Return the proportion of your total value you will used in investment.
Dynamically risk_degree will result in Market timing.
"""
# It will use 95% amoutn of your total value by default
Collaborator

amount

if rely_trade_decision is not None:
self.rely_trade_decision = rely_trade_decision

def generate_trade_decision(self, execute_state):
Collaborator

What is the format/interface of execute_state? What am I expected to get from execute_state when I write a new strategy?

What is the interface of return value of generate_trade_decision?

if "trade_account" in common_infra:
self.trade_position = common_infra.get("trade_account").current

def reset(self, level_infra: dict = None, common_infra: dict = None, rely_trade_decision=None, **kwargs):
Collaborator

What is the interface of:

  • level_infra
  • common_infra
  • rely_trade_decision

When is reset expected to be called?

Collaborator Author

bxdd commented May 25, 2021

This PR is closed; see #438.

@bxdd bxdd closed this May 25, 2021
@@ -35,7 +35,7 @@ def parse_freq(freq: str) -> Tuple[int, str]:
raise ValueError(
"freq format is not supported, the freq should be like (n)month/mon, (n)week/w, (n)day/d, (n)minute/min"
)
_count = int(match_obj.group(1) if match_obj.group(1) else "1")
_count = int(match_obj.group(1)) if match_obj.group(1) is None else 1
Collaborator Author

@bxdd bxdd May 27, 2021

Should it be int(match_obj.group(1)) if match_obj.group(1) else 1?
Example: when calling parse_freq("min"), match_obj.group(1) == '' rather than None.
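
To make the difference concrete, here is a minimal sketch of the parsing logic with the truthiness check proposed above. The regular expression is an assumption reconstructed from the units listed in the error message in the diff, not the PR's actual pattern, and the returned unit is left un-normalized.

```python
import re
from typing import Tuple

# Assumed pattern: an optional count followed by one of the units named in the error message.
_FREQ_RE = re.compile(r"^([0-9]*)(month|mon|week|w|day|d|minute|min)$")


def parse_freq(freq: str) -> Tuple[int, str]:
    match_obj = _FREQ_RE.match(freq.lower())
    if match_obj is None:
        raise ValueError(f"freq format is not supported: {freq!r}")
    # group(1) is "" (not None) when no count is given, so test truthiness.
    _count = int(match_obj.group(1)) if match_obj.group(1) else 1
    return _count, match_obj.group(2)
```

Under this pattern the "is None" variant never takes the int(...) branch, so an explicit count such as "5min" would silently be parsed as 1.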

Labels
enhancement New feature or request
4 participants