Skip to content

Code for our WSDM 2022 paper titled "The Datasets Dilemma: How Much Do We Really Know About Recommendation Datasets?"

License

Notifications You must be signed in to change notification settings

almightyGOSU/TheDatasetsDilemma

Repository files navigation

Code for the WSDM 2022 paper

The Datasets Dilemma: How Much Do We Really Know About Recommendation Datasets?


Hello! :)

This repository contains the source code, as well as other useful information, for the paper "The Datasets Dilemma: How Much Do We Really Know About Recommendation Datasets?" in WSDM 2022.

The paper is available here: Paper (Best Paper Award Runner-up)

For a quick overview of the paper, you can refer to these slides: The Datasets Dilemma Slides

Reference

Please consider citing our work if you find it useful, thank you!

@inproceedings{10.1145/3488560.3498519,
  author = {Chin, Jin Yao and Chen, Yile and Cong, Gao},
  title = {The Datasets Dilemma: How Much Do We Really Know About Recommendation Datasets?},
  year = {2022},
  isbn = {9781450391320},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  url = {https://doi.org/10.1145/3488560.3498519},
  doi = {10.1145/3488560.3498519},
  booktitle = {Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining},
  pages = {141–149},
  numpages = {9},
  keywords = {datasets, item recommendation, evaluation, data characteristics},
  location = {Virtual Event, AZ, USA},
  series = {WSDM '22}
}

Outline

In our paper, we try to address the "datasets dilemma" using 3 main steps.

  1. How are different datasets being utilised in recent papers?
    • Are there any patterns?
    • Code can be found in the ./Step 1/ folder (Please refer to its README file)
  2. What are the similarities as well as differences between various datasets?
    • Can we define them using objective measures?
    • Code can be found in the ./Step 2/ folder (Please refer to its README file)
  3. If the choice of datasets used could influence the observations and/or conclusions obtained
    • Empirical study using a variety of item recommendation algorithms
    • Code can be found in the ./Step 3/ folder (Please refer to its README file)

The ./Datasets/ folder

Environment Setup

  1. Python 3.6.8
  2. PyTorch 1.4.0
  3. Tensorflow 2.3.0
  4. numpy 1.17.2
  5. pandas 0.25.3
  6. matplotlib 3.3.2
  7. scikit-learn 0.23.2
  8. scipy 1.3.0
  9. scikit-optimize 0.8.1
  10. mlxtend 0.18.0 (for frequent itemset mining)
  11. implicit 0.4.4 (for Weighted Matrix Factorization (WMF))

Analyses & experiments were conducted on a Ubuntu server with version 16.04.6 LTS, and conda 4.8.4.

About

Code for our WSDM 2022 paper titled "The Datasets Dilemma: How Much Do We Really Know About Recommendation Datasets?"

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published