feat: Refactor predictor #186

Merged
merged 17 commits into from
May 23, 2022
5 changes: 4 additions & 1 deletion .git_bin_path
@@ -1,7 +1,7 @@
{"leaf_name": "data/test", "leaf_file": ["data/test/batch_criteo_sample.tfrecord", "data/test/criteo_sample.tfrecord", "data/test/dwd_avazu_ctr_deepmodel_10w.csv", "data/test/embed_data.csv", "data/test/lookup_data.csv", "data/test/tag_kv_data.csv", "data/test/test.csv", "data/test/test_sample_weight.txt", "data/test/test_with_quote.csv"]}
{"leaf_name": "data/test/export", "leaf_file": ["data/test/export/data.csv"]}
{"leaf_name": "data/test/hpo_test/eval_val", "leaf_file": ["data/test/hpo_test/eval_val/events.out.tfevents.1597889819.j63d04245.sqa.eu95"]}
{"leaf_name": "data/test/inference", "leaf_file": ["data/test/inference/lookup_data_test80.csv", "data/test/inference/taobao_infer_data.txt"]}
{"leaf_name": "data/test/inference", "leaf_file": ["data/test/inference/lookup_data_test80.csv", "data/test/inference/taobao_infer_data.txt", "data/test/inference/taobao_infer_rtp_data.txt"]}
{"leaf_name": "data/test/inference/fg_export_multi", "leaf_file": ["data/test/inference/fg_export_multi/saved_model.pb"]}
{"leaf_name": "data/test/inference/fg_export_multi/assets", "leaf_file": ["data/test/inference/fg_export_multi/assets/pipeline.config"]}
{"leaf_name": "data/test/inference/fg_export_multi/variables", "leaf_file": ["data/test/inference/fg_export_multi/variables/variables.data-00000-of-00001", "data/test/inference/fg_export_multi/variables/variables.index"]}
@@ -20,6 +20,9 @@
{"leaf_name": "data/test/inference/tb_multitower_placeholder_rename_export", "leaf_file": ["data/test/inference/tb_multitower_placeholder_rename_export/saved_model.pb"]}
{"leaf_name": "data/test/inference/tb_multitower_placeholder_rename_export/assets", "leaf_file": ["data/test/inference/tb_multitower_placeholder_rename_export/assets/pipeline.config"]}
{"leaf_name": "data/test/inference/tb_multitower_placeholder_rename_export/variables", "leaf_file": ["data/test/inference/tb_multitower_placeholder_rename_export/variables/variables.data-00000-of-00001", "data/test/inference/tb_multitower_placeholder_rename_export/variables/variables.index"]}
{"leaf_name": "data/test/inference/tb_multitower_rtp_export", "leaf_file": ["data/test/inference/tb_multitower_rtp_export/saved_model.pb"]}
{"leaf_name": "data/test/inference/tb_multitower_rtp_export/assets", "leaf_file": ["data/test/inference/tb_multitower_rtp_export/assets/pipeline.config"]}
{"leaf_name": "data/test/inference/tb_multitower_rtp_export/variables", "leaf_file": ["data/test/inference/tb_multitower_rtp_export/variables/variables.data-00000-of-00001", "data/test/inference/tb_multitower_rtp_export/variables/variables.index"]}
{"leaf_name": "data/test/latest_ckpt_test", "leaf_file": ["data/test/latest_ckpt_test/model.ckpt-500.data-00000-of-00001", "data/test/latest_ckpt_test/model.ckpt-500.index", "data/test/latest_ckpt_test/model.ckpt-500.meta"]}
{"leaf_name": "data/test/rtp", "leaf_file": ["data/test/rtp/taobao_fg_pred.out", "data/test/rtp/taobao_test_bucketize_feature.txt", "data/test/rtp/taobao_test_feature.txt", "data/test/rtp/taobao_test_input.txt", "data/test/rtp/taobao_train_bucketize_feature.txt", "data/test/rtp/taobao_train_feature.txt", "data/test/rtp/taobao_train_input.txt", "data/test/rtp/taobao_valid.csv", "data/test/rtp/taobao_valid_feature.txt"]}
{"leaf_name": "data/test/tb_data", "leaf_file": ["data/test/tb_data/taobao_ad_feature_gl", "data/test/tb_data/taobao_clk_edge_gl", "data/test/tb_data/taobao_multi_seq_test_data", "data/test/tb_data/taobao_multi_seq_train_data", "data/test/tb_data/taobao_noclk_edge_gl", "data/test/tb_data/taobao_test_data", "data/test/tb_data/taobao_test_data_for_expr", "data/test/tb_data/taobao_test_data_kd", "data/test/tb_data/taobao_train_data", "data/test/tb_data/taobao_train_data_for_expr", "data/test/tb_data/taobao_train_data_kd", "data/test/tb_data/taobao_user_profile_gl"]}
5 changes: 4 additions & 1 deletion .git_bin_url
@@ -1,7 +1,7 @@
{"leaf_path": "data/test", "sig": "656d73b4e78d0d71e98120050bc51387", "remote_path": "data/git_oss_sample_data/data_test_656d73b4e78d0d71e98120050bc51387"}
{"leaf_path": "data/test/export", "sig": "c2e5ad1e91edb55b215ea108b9f14537", "remote_path": "data/git_oss_sample_data/data_test_export_c2e5ad1e91edb55b215ea108b9f14537"}
{"leaf_path": "data/test/hpo_test/eval_val", "sig": "fef5f6cd659c35b713c1b8bcb97c698f", "remote_path": "data/git_oss_sample_data/data_test_hpo_test_eval_val_fef5f6cd659c35b713c1b8bcb97c698f"}
{"leaf_path": "data/test/inference", "sig": "e2c4b0f07ff8568eb7b8e1819326d296", "remote_path": "data/git_oss_sample_data/data_test_inference_e2c4b0f07ff8568eb7b8e1819326d296"}
{"leaf_path": "data/test/inference", "sig": "9725274cad0f27baf561ebfaf7946846", "remote_path": "data/git_oss_sample_data/data_test_inference_9725274cad0f27baf561ebfaf7946846"}
{"leaf_path": "data/test/inference/fg_export_multi", "sig": "c6690cef053aed9e2011bbef90ef33e7", "remote_path": "data/git_oss_sample_data/data_test_inference_fg_export_multi_c6690cef053aed9e2011bbef90ef33e7"}
{"leaf_path": "data/test/inference/fg_export_multi/assets", "sig": "7fe7a4525f5d46cc763172f5200e96e0", "remote_path": "data/git_oss_sample_data/data_test_inference_fg_export_multi_assets_7fe7a4525f5d46cc763172f5200e96e0"}
{"leaf_path": "data/test/inference/fg_export_multi/variables", "sig": "1f9aad9744382c6d5b5f152d556d9b30", "remote_path": "data/git_oss_sample_data/data_test_inference_fg_export_multi_variables_1f9aad9744382c6d5b5f152d556d9b30"}
@@ -20,6 +20,9 @@
{"leaf_path": "data/test/inference/tb_multitower_placeholder_rename_export", "sig": "dc05357e52fd574cba48165bc67af906", "remote_path": "data/git_oss_sample_data/data_test_inference_tb_multitower_placeholder_rename_export_dc05357e52fd574cba48165bc67af906"}
{"leaf_path": "data/test/inference/tb_multitower_placeholder_rename_export/assets", "sig": "750925c4866bf1db8c3188f604271c72", "remote_path": "data/git_oss_sample_data/data_test_inference_tb_multitower_placeholder_rename_export_assets_750925c4866bf1db8c3188f604271c72"}
{"leaf_path": "data/test/inference/tb_multitower_placeholder_rename_export/variables", "sig": "56850b4506014ce1bd3ba9b6d60e2770", "remote_path": "data/git_oss_sample_data/data_test_inference_tb_multitower_placeholder_rename_export_variables_56850b4506014ce1bd3ba9b6d60e2770"}
{"leaf_path": "data/test/inference/tb_multitower_rtp_export", "sig": "f1bc6238cfab648812afca093da5dd6b", "remote_path": "data/git_oss_sample_data/data_test_inference_tb_multitower_rtp_export_f1bc6238cfab648812afca093da5dd6b"}
{"leaf_path": "data/test/inference/tb_multitower_rtp_export/assets", "sig": "ae1cc9ec956fb900e5df45c4ec255c4b", "remote_path": "data/git_oss_sample_data/data_test_inference_tb_multitower_rtp_export_assets_ae1cc9ec956fb900e5df45c4ec255c4b"}
{"leaf_path": "data/test/inference/tb_multitower_rtp_export/variables", "sig": "efe52ef308fd6452f3b67fd04cdd22bd", "remote_path": "data/git_oss_sample_data/data_test_inference_tb_multitower_rtp_export_variables_efe52ef308fd6452f3b67fd04cdd22bd"}
{"leaf_path": "data/test/latest_ckpt_test", "sig": "d41d8cd98f00b204e9800998ecf8427e", "remote_path": "data/git_oss_sample_data/data_test_latest_ckpt_test_d41d8cd98f00b204e9800998ecf8427e"}
{"leaf_path": "data/test/rtp", "sig": "76cda60582617ddbb7cd5a49eb68a4b9", "remote_path": "data/git_oss_sample_data/data_test_rtp_76cda60582617ddbb7cd5a49eb68a4b9"}
{"leaf_path": "data/test/tb_data", "sig": "c8136915b6e5e9d96b18448cf2e21d3d", "remote_path": "data/git_oss_sample_data/data_test_tb_data_c8136915b6e5e9d96b18448cf2e21d3d"}
22 changes: 16 additions & 6 deletions docs/source/models/mind.md
@@ -85,7 +85,7 @@ model_config:{
# use the same number of capsules for all users
const_caps_num: true
}

simi_pow: 20
l2_regularization: 1e-6
time_id_fea: "seq_ts_gap"
@@ -101,7 +101,7 @@ model_config:{
- dnn:
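- hidden_units: number of channels in each dnn layer
- use_bn: whether to use batch_norm, defaults to true
- item_dnn: dnn parameters for the item side, configured the same way as user_dnn
- note: batch_norm cannot be used on the item side
- pre_capsule_dnn: configuration of the dnn applied before the capsule layer
- optional, configured the same way as user_dnn and item_dnn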
@@ -117,7 +117,7 @@ model_config:{
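- squash_pow: power applied to the squash, which keeps the vector values from becoming too small after squashing
- simi_pow: factor applied to the similarity, which amplifies the differences between interests
- embedding_regularization: regularization added to the embedding part to prevent overfitting
- user_seq_combine:
- CONCAT: multiple seqs are fused by concatenation
- SUM: multiple seqs are fused by summation; the default is SUM
- time_id_fea: name of the time_id feature, corresponding to a feature defined in feature_config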
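squash_pow is described above as a power applied to the squash so that vectors do not shrink too much. A minimal sketch, assuming the standard capsule squash and assuming the power is applied to its scaling factor (this document does not confirm where EasyRec applies it); `squash` is a hypothetical helper name:

```python
import math
from typing import List


def squash(vec: List[float], squash_pow: float = 1.0) -> List[float]:
    """Capsule squash: scale the unit vector by (|v|^2 / (1 + |v|^2)) ** squash_pow.

    Assumption: squash_pow is applied to the scaling factor, as sketched here.
    """
    norm_sq = sum(x * x for x in vec)
    if norm_sq == 0.0:
        return [0.0 for _ in vec]  # avoid dividing by a zero norm
    norm = math.sqrt(norm_sq)
    # The factor lies in (0, 1); raising it to a power below 1 pushes it toward 1,
    # so short vectors are shrunk less than with the plain squash.
    scale = (norm_sq / (1.0 + norm_sq)) ** squash_pow
    return [scale * x / norm for x in vec]
```

Under this assumption, a squash_pow in the 0.1 to 1.0 range only ever enlarges the squashed length, which matches the stated purpose of the parameter.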
@@ -128,6 +128,7 @@ model_config:{
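- A time_id can be added to the behavior sequence features: the time_id goes through a 1-dimension embedding, a softmax is taken over the time dimension, and the result is multiplied with the embeddings of the other sequence features

- time_id values can be computed as follows:

- training data: Math.round((2 * Math.log1p((labelTime - itemTime) / 60.) / Math.log(2.))) + 1
- inference: Math.round((2 * Math.log1p((currentTime - itemTime) / 60.) / Math.log(2.))) + 1
- the times here (labelTime, itemTime, currentTime) are in seconds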
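The time_id formula and the time weighting described above can be sketched in Python. This is an illustration under assumptions, not EasyRec code: `time_id` and `time_weighted_sequence` are hypothetical names, the 1-dimension embedding is modeled as a plain dict from time_id to a scalar, and rounding uses `floor(x + 0.5)` to mimic JavaScript's `Math.round` (Python's built-in `round` rounds halves to even).

```python
import math
from typing import Dict, List


def time_id(reference_time: float, item_time: float) -> int:
    """Bucketize the time gap in seconds.

    reference_time is labelTime for training data and currentTime at inference.
    """
    gap_minutes = (reference_time - item_time) / 60.0
    x = 2 * math.log1p(gap_minutes) / math.log(2.0)
    # Math.round rounds .5 upward, so floor(x + 0.5) is used instead of round(x)
    return math.floor(x + 0.5) + 1


def time_weighted_sequence(time_ids: List[int],
                           time_embedding: Dict[int, float],
                           seq_embeddings: List[List[float]]) -> List[List[float]]:
    """Weight each sequence step by a softmax over its 1-dim time_id embedding."""
    scores = [time_embedding[t] for t in time_ids]  # one scalar per step
    max_score = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]  # softmax over the time dimension
    return [[w * x for x in emb] for w, emb in zip(weights, seq_embeddings)]
```

For example, an item clicked 3 minutes (180 seconds) before the label time gets `time_id(180, 0) == 5`, while a zero gap gives 1.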
@@ -136,17 +137,19 @@ model_config:{

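- Use incremental training: it prevents negative sampling from leaking future data.

- Use HPO to search squash_pow (range 0.1 to 1.0) and simi_pow (range 10 to 100).

- The metrics to watch are recall, precision, and interest loss; look at all three together.

- Generate training samples from site-wide click data: site-wide behavior is richer, which benefits MIND training.

- Data cleaning:

- Remove items with too few behaviors when constructing the behavior sequences
- Exclude crawlers and fraudulent users

- Data sampling:

- By default, MIND is trained with clicks as the target
- If the business metric is transactions, the transaction samples can be resampled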
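The data cleaning and sampling tips above can be sketched as two small preprocessing steps. This is a minimal illustration under assumptions: behavior sequences are held in memory as a dict from user id to item id list, each sample carries a hypothetical `event` field, and `filter_rare_items`, `oversample`, `min_count`, and `times` are illustrative names, not EasyRec options.

```python
from collections import Counter
from typing import Dict, List


def filter_rare_items(sequences: Dict[str, List[str]], min_count: int) -> Dict[str, List[str]]:
    """Drop items seen fewer than min_count times before building behavior sequences."""
    counts = Counter(item for seq in sequences.values() for item in seq)
    filtered = {}
    for user, seq in sequences.items():
        kept = [item for item in seq if counts[item] >= min_count]
        if kept:  # users whose whole sequence was removed are dropped
            filtered[user] = kept
    return filtered


def oversample(samples: List[Dict[str, str]], target_event: str, times: int) -> List[Dict[str, str]]:
    """Repeat samples of target_event (e.g. transactions) so each appears `times` times."""
    out = []
    for sample in samples:
        repeat = times if sample["event"] == target_event else 1
        out.extend([sample] * repeat)
    return out
```

Crawler and fraud filtering is omitted because it depends on business-specific rules.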

@@ -155,9 +158,11 @@ model_config:{
[MIND_demo.config](https://easyrec.oss-cn-beijing.aliyuncs.com/config/mind_on_taobao_neg_sam.config)

### Evaluation

Offline evaluation mainly looks at the hitrate on the test set. See the [evaluation](https://easyrec.oss-cn-beijing.aliyuncs.com/docs/recall_eval.pdf) document.

#### Evaluation SQL

```sql
pai -name tensorflow1120_cpu_ext
-Dscript='oss://easyrec/deploy/easy_rec/python/tools/hitrate.py'
@@ -204,15 +209,18 @@ pai -name tensorflow1120_cpu_ext
- 1: Inner Product similarity
- emb_dim: dimension of the user / item representation vectors
- top_k: the top_k results of the knn retrieval are used to compute the hitrate
- recall_type:
- u2i: user to item retrieval

#### Evaluation results

The following two tables are output:

- mind_hitrate_details:

- the hitrate of each user = user_hits / user_recalls
- format:

```text
id : bigint
topk_ids : string
@@ -221,10 +229,12 @@ pai -name tensorflow1120_cpu_ext
bad_ids : string
bad_dists : string
```

- mind_total_hitrate:

- the average hitrate = SUM(user_hits) / SUM(user_recalls)
- format:

```text
hitrate : double
```
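The two hitrate definitions above differ: the total is a ratio of sums, not the mean of the per-user hitrates. A minimal sketch, assuming user_hits counts the ground-truth items that appear in a user's top_k retrieved ids and user_recalls is the number of ground-truth items for that user (hitrate.py may define these differently); `user_hitrate` and `total_hitrate` are hypothetical names:

```python
from typing import Dict, List, Sequence, Tuple


def user_hitrate(topk_ids: Sequence[str], gt_ids: Sequence[str]) -> Tuple[int, int]:
    """Return (user_hits, user_recalls) for one user."""
    topk = set(topk_ids)
    hits = sum(1 for item in gt_ids if item in topk)
    return hits, len(gt_ids)


def total_hitrate(results: Dict[str, Tuple[List[str], List[str]]]) -> float:
    """SUM(user_hits) / SUM(user_recalls) over all users."""
    total_hits = 0
    total_recalls = 0
    for topk_ids, gt_ids in results.values():
        hits, recalls = user_hitrate(topk_ids, gt_ids)
        total_hits += hits
        total_recalls += recalls
    return total_hits / total_recalls if total_recalls else 0.0
```

For two users with per-user hitrates 1/2 and 3/4, the total hitrate is 4/6 ≈ 0.667, whereas averaging the per-user values would give 0.625.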
7 changes: 5 additions & 2 deletions docs/source/pre_check.md
@@ -3,30 +3,32 @@
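Training often fails because of dirty data or configuration errors, so a pre-check feature was developed.
Turn on check mode during training, or run the pre_check script before training: it checks the data_config and part of the train_config, scans all of the data, and, when it finds an anomaly, reports the relevant information along with suggested fixes.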
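The kind of scan described above can be illustrated with a minimal sketch. It is not EasyRec's pre_check implementation: the schema format, the function name `pre_check_csv`, and the messages are hypothetical, and the sketch only covers two checks (column count and value type) on CSV input.

```python
import csv
import io
from typing import List, Tuple

# Hypothetical schema: (column name, type) pairs standing in for data_config input fields.
Schema = List[Tuple[str, type]]


def pre_check_csv(text: str, schema: Schema, delimiter: str = ",") -> List[str]:
    """Scan every row and collect readable problems with a suggested fix."""
    problems = []
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    for line_no, row in enumerate(reader, start=1):
        if len(row) != len(schema):
            problems.append(
                f"line {line_no}: expected {len(schema)} columns, got {len(row)}; "
                "check the delimiter or the number of input fields")
            continue
        for (name, typ), value in zip(schema, row):
            if typ is str or value == "":
                continue  # strings always parse; empty values are skipped in this sketch
            try:
                typ(value)
            except ValueError:
                problems.append(
                    f"line {line_no}: column '{name}' value {value!r} is not {typ.__name__}; "
                    "fix the data or change the field type")
    return problems
```

Scanning all rows instead of stopping at the first error mirrors the stated goal of screening the full dataset, at the cost of extra time, which is why check mode slows down training.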


### Commands

#### Local

Option 1: run the pre_check script:

```bash
PYTHONPATH=. python easy_rec/python/tools/pre_check.py --pipeline_config_path samples/model_config/din_on_taobao.config --data_input_path data/test/check_data/csv_data_for_check
```

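Option 2: turn on check mode during training (off by default):

This option slows down training; turning on check mode is not recommended for routine online training.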

```bash
python -m easy_rec.python.train_eval --pipeline_config_path samples/model_config/din_on_taobao.config --check_mode
```

- pipeline_config_path: path to the config file
- data_input_path: path of the data to check; if not specified, train_input_path and eval_input_path from the pipeline_config_path config are used
- check_mode: defaults to False


#### On PAI

Option 1: run the pre_check script:

```sql
pai -name easy_rec_ext -project algo_public
-Dcmd='check'
@@ -42,6 +44,7 @@ pai -name easy_rec_ext -project algo_public
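Option 2: turn on check mode during training (off by default):

This option slows down training; turning on check mode is not recommended for routine online training.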

```sql
pai -name easy_rec_ext -project algo_public
-Dcmd='train'