GitHub - KwaiKEG/Kuaipedia: the world's first large-scale multi-modal short-video encyclopedia, where the primitive units are items, aspects, and short videos.

🤗 Dataset • 📃 Paper

Kuaipedia is developed by the Knowledge Engineering Group at Kuaishou (KwaiKEG), in collaboration with HIT and HKUST. It is the world's first large-scale multi-modal short-video encyclopedia where the primitive units are items, aspects, and short videos.

Items are a set of entities and concepts, such as Shiba Inu, Moon, and Galileo Galilei, each of which can be edited on a single Wikipedia page. An item may have a title, a subtitle, a summary, attributes, and other detailed information.
Aspects are a set of keywords or keyphrases attached to items. These keywords describe specific aspects of an item. For example, "selection", "food-protecting", and "color" for the item Shiba Inu, or "formation", "surface conditions", and "how-to-draw" for the item Moon.
Videos are a set of short videos whose duration does not exceed 5 minutes. In the paper, we focus only on the knowledge videos we detected, where we follow the OECD in defining knowledge as:
- Know-what refers to knowledge about facts. E.g. How many people live in New York?
- Know-why refers to scientific knowledge of the principles and laws of nature. E.g. Why does the earth revolve around the sun?
- Know-how refers to skills or the capability to do something. E.g. How to cook bacon in the oven.

Please refer to the paper for more details.

Kuaipedia: a Large-scale Multi-modal Short-video Encyclopedia [Manuscript]

News

2023.06 - Released a subset of the [data]
2022.11 - An industry first! Kuaishou proposes Kuaipedia (快知), a hundred-million-scale multi-modal short-video encyclopedia system [Synced (机器之心)] [The Paper (澎湃)] [Zhihu (知乎)] [CSDN] [51CTO] [ITHome (IT之家)]
2022.08 - Received the "Outstanding Project Award" at Kuaishou AI Day.

Data

We are excited to release a subset of Kuaipedia featuring the most popular wiki entries, together with our experimental results. Sample files can be found in the ./data folder, accompanied by a README.md file that explains each field.

To download the full subset and the experimental results of Kuaipedia, please go to huggingface/dataset/kuaipedia, or use the following link:

link: https://pan.baidu.com/s/1yUB97aL2rBVt-Q0c6sYIcw code: kwyw

The raw video can be found by concatenating video_id with the prefix kuaishou.com/short-video. E.g. kuaishou.com/short-video/3xwwuqndapzs6nu.
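
The concatenation described above can be sketched in a few lines of Python. This is a minimal illustration, not part of the released tooling: the function name `video_url` and the `https://` scheme are assumptions added here, while the prefix and the example video_id are taken from the text above.

```python
# Prefix given in this README for locating raw videos.
VIDEO_URL_PREFIX = "kuaishou.com/short-video"


def video_url(video_id: str, scheme: str = "https") -> str:
    """Join the prefix and a video_id into a URL.

    `video_url` is a hypothetical helper; the scheme is an assumption,
    since the README only gives the scheme-less prefix.
    """
    video_id = video_id.strip()
    if not video_id:
        raise ValueError("video_id must be a non-empty string")
    return f"{scheme}://{VIDEO_URL_PREFIX}/{video_id}"


if __name__ == "__main__":
    # Example video_id from the README.
    print(video_url("3xwwuqndapzs6nu"))
```

The same helper can be mapped over the video_id field of the released files to obtain one link per video.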

If you're experiencing any issues with downloading the data file, please don't hesitate to reach out to myscarletpan@gmail.com for assistance.

Statistics

	Full Dump	Subset Dump
#Items	> 26 million	51,702
#Aspects	> 2.5 million	1,074,539
#Videos	> 200 million	769,096

The comparative results with the baseline models are as follows (P: precision, R: recall):

Model	Item P	Item R	Item-Aspect P	Item-Aspect R
Random	87.7	49.8	36.4	49.6
LR	90.4	68.3	55.1	2.7
T5-small	93.7	76.1	79.3	58.5
BERT-base	94.3	77.8	81.5	62.7
GPT-3.5	90.5	86.4	41.8	95.7
Ours	94.7	79.7	83.0	65.7
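
For readers who want a single number per model, the sketch below combines each precision/recall pair from the table into an F1 score (the harmonic mean of the two). It assumes that P and R denote precision and recall in percent. F1 is not reported in this README, so the printed values are derived here for illustration only, and the names `f1_score` and `REPORTED` are hypothetical.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same unit as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (Item P, Item R, Item-Aspect P, Item-Aspect R), copied from the table above.
REPORTED = {
    "Random": (87.7, 49.8, 36.4, 49.6),
    "LR": (90.4, 68.3, 55.1, 2.7),
    "T5-small": (93.7, 76.1, 79.3, 58.5),
    "BERT-base": (94.3, 77.8, 81.5, 62.7),
    "GPT-3.5": (90.5, 86.4, 41.8, 95.7),
    "Ours": (94.7, 79.7, 83.0, 65.7),
}

if __name__ == "__main__":
    for model, (item_p, item_r, ia_p, ia_r) in REPORTED.items():
        # Derived values, not figures reported by the authors.
        print(f"{model}\t{f1_score(item_p, item_r):.1f}\t{f1_score(ia_p, ia_r):.1f}")
```

Because F1 penalizes an imbalance between precision and recall, it makes the trade-off in rows with very uneven P and R easier to compare at a glance.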

Feel free to explore and utilize this valuable dataset for your research and projects.

Reference

@article{Kuaipedia22,
  author    = {Haojie Pan and
               Yuzhou Zhang and
               Zepeng Zhai and
               Ruiji Fu and
               Ming Liu and
               Yangqiu Song and
               Zhongyuan Wang and
               Bing Qin
               },
  title     = {{Kuaipedia:} a Large-scale Multi-modal Short-video Encyclopedia},
  journal   = {CoRR},
  volume    = {abs/2211.00732},
  year      = {2022}
}

Acknowledgment

In addition to the contributors to the paper, we also appreciate the efforts and help of Jingrun Zhang, Yuelei Li, Lijun Mei, Chunguang Pan, Xing Hu, Lingyu Zou, Yang Li, Dexing Yang, Wenzheng Zhao, Guixin Qiu, Lin Yang, Meijuan Yang, Teng Tu, Xinyi Zheng, Yunhui Guo, and others who contributed to this project.

Contact Us

If you are interested in Kuaipedia and would like to see more cases, please contact us by e-mail at myscarletpan@gmail.com.
