A list of awesome papers and resources on recommender systems built on large language models (LLMs).
🎉 News: Our LLM4Rec survey, A Survey on Large Language Models for Recommendation, has been released.
Related work and projects will be updated continuously.
If our work has been helpful to you, please consider citing our survey. Thank you.
@article{llm4recsurvey,
author = {Likang Wu and Zhi Zheng and Zhaopeng Qiu and Hao Wang and Hongchao Gu and Tingjia Shen and Chuan Qin and Chen Zhu and Hengshu Zhu and Qi Liu and Hui Xiong and Enhong Chen},
title = {A Survey on Large Language Models for Recommendation},
journal = {CoRR},
volume = {abs/2305.19860},
year = {2023}
}
Note: "Tuning" here only indicates whether the LLM itself has been tuned.
Name | Venue | Year |
---|---|---|
Large Language Models for Recommendation: Progresses and Future Directions | SIGIR-AP | 2023 |
Tutorial on Large Language Models for Recommendation | RecSys | 2023 |
Name | Scene | Tasks | Information | URL |
---|---|---|---|---|
Amazon Review | Commerce | Seq Rec/CF Rec | This is a large crawl of product reviews from Amazon. Ratings: 82.83 million, Users: 20.98 million, Items: 9.35 million, Timespan: May 1996 - July 2014 | link |
Amazon-M2 | Commerce | Seq Rec/CF Rec | A large dataset of anonymized user sessions with their interacted products collected from multiple language sources at Amazon. It includes 3,606,249 train sessions, 361,659 test sessions, and 1,410,675 products. | link |
Steam | Game | Seq Rec/CF Rec | Reviews represent a great opportunity to break down the satisfaction and dissatisfaction factors around games. Reviews: 7,793,069, Users: 2,567,538, Items: 15,474, Bundles: 615 | link |
MovieLens | Movie | General | The dataset consists of 4 sub-datasets, which describe users' ratings of movies and their free-text tagging activities on MovieLens, a movie recommendation service. | link |
Yelp | Commerce | General | It contains 6,990,280 reviews, 150,346 businesses, 200,100 pictures, 11 metropolitan areas, and 908,915 tips by 1,987,897 users, plus over 1.2 million business attributes such as hours, parking, and availability. | link |
Douban | Movie, Music, Book | Seq Rec/CF Rec | This dataset includes three domains, i.e., movie, music, and book, and different kinds of raw information, i.e., ratings, reviews, item details, user profiles, tags (labels), and date. | link |
MIND | News | General | MIND contains about 160k English news articles and more than 15 million impression logs generated by 1 million users. Each news article contains textual content, including title, abstract, body, category, and entities. | link |
U-NEED | Commerce | Conversation Rec | U-NEED consists of 7,698 fine-grained annotated pre-sales dialogues, 333,879 user behaviors, and 332,148 product knowledge tuples. | link |
PixelRec | Short Video | Seq Rec/CF Rec | PixelRec is a large dataset of cover images collected from a short-video recommender system, comprising approximately 200 million user-image interactions, 30 million users, and 400,000 video cover images. The texts and other aggregated attributes of the videos are also included. | link |
KuaiSAR | Video | Search and Rec | KuaiSAR contains genuine search and recommendation behaviors of 25,877 users, 6,890,707 items, 453,667 queries, and 19,664,885 actions collected over 19 days on the Kuaishou app. | link |
Tenrec | Video, Article | General | Tenrec is a large-scale benchmark dataset for recommendation systems. It contains around 5 million users and 140 million interactions. | link |
NineRec | Video, Article | General | NineRec is a TransRec dataset suite that includes a large-scale source domain recommendation dataset and nine diverse target domain recommendation datasets. Each item in NineRec is represented by a text description and a high-resolution cover image. | link |
MicroLens | Video | General | MicroLens is a very large micro-video recommendation dataset containing one billion user-item interactions, 34 million users, and one million micro-videos. It includes multimodal information about the videos and serves as a benchmark for content-driven micro-video recommendation research. | link |
The following effective open-source projects can be adapted to build recommender systems on Chinese text data, which makes them especially useful for individual researchers!
Project | Year |
---|---|
Qwen-7B | 2023 |
baichuan-7B | 2023 |
YuLan-chat | 2023 |
Chinese-LLaMA-Alpaca | 2023 |
THUDM/ChatGLM-6B | 2023 |
FreedomIntelligence/LLMZoo Phoenix | 2023 |
bloomz-7b1 | 2023 |
LianjiaTech/BELLE | 2023 |
We hope this summary helps your work.