Consensus Categorization (C²) for MercadoLivre Data Challenge 2019

The C² method is a supervised and transductive learning method based on the Consensus Clustering method that I investigated during my doctorate at ICMC-USP.

Here, the C² method has been adapted to handle the dataset provided by the Meli Data Challenge 2019.

In short, the C² method has the following steps:

1. Preprocess product titles by removing stopwords (English, Portuguese, and Spanish), numbers, and special characters. Source: meli/preprocess.py
2. Learn a textual representation for product titles by using fastText word embeddings. These embeddings are useful for initializing the classification models.
3. Draw different samples of the dataset by sampling both instances and features. Source: meli/sampling.py
4. Train a classification model on each sample. Diversity among the solutions found by the classification models is important. Source: meli/models.py
5. Build a heterogeneous network with the following node types: products, terms, and classification models. Some nodes are labeled using the training set and the categories predicted by the classification models. The network is then regularized by a consensus function, which returns the final categorization. Source: meli/consensus.py
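
To make the preprocessing and sampling steps more concrete, here is a minimal sketch using only the Python standard library. It is not the code in meli/preprocess.py or meli/sampling.py: the function names, the tiny stopword set, and the sampling fractions are assumptions chosen for illustration.

```python
import random
import re

# Tiny illustrative stopword set (an assumption). The steps above mention
# full English, Portuguese, and Spanish stopword lists.
STOPWORDS = {"the", "of", "for", "de", "para", "com", "el", "la", "con"}


def preprocess_title(title):
    """Lowercase a title and drop numbers, special characters, and stopwords."""
    text = title.lower()
    # Replace every run of non-letter characters (digits, punctuation,
    # underscores, whitespace) with a single space.
    text = re.sub(r"[\W\d_]+", " ", text)
    tokens = [tok for tok in text.split() if tok not in STOPWORDS]
    return " ".join(tokens)


def sample_instances_and_features(n_instances, vocabulary,
                                  instance_frac=0.8, feature_frac=0.8, seed=0):
    """Pick a random subset of row indices and a random subset of terms."""
    rng = random.Random(seed)
    n_rows = max(1, int(n_instances * instance_frac))
    n_feats = max(1, int(len(vocabulary) * feature_frac))
    row_ids = sorted(rng.sample(range(n_instances), n_rows))
    # Sort the vocabulary first so the draw is reproducible for a given seed.
    features = sorted(rng.sample(sorted(vocabulary), n_feats))
    return row_ids, features


print(preprocess_title("Capa para Celular 2019!!"))  # capa celular
rows, feats = sample_instances_and_features(10, {"capa", "celular", "tv", "led", "cabo"})
print(len(rows), len(feats))  # 8 4
```

Calling the sampling function with different seeds yields different views of the data, which is one simple way to obtain the model diversity that step 4 asks for.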

The C² method ranked fourth on the private leaderboard of the Meli Data Challenge 2019. It can be improved either by adding more classification models or by tuning the consensus function.
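
As a rough illustration of how labels can spread over a heterogeneous network of products, terms, and model predictions, the sketch below runs a generic label propagation with clamped seed nodes. It is not the consensus function in meli/consensus.py: the node naming scheme (for example "model:A=cases" for a category predicted by a model), the averaging rule, and the number of iterations are all assumptions.

```python
from collections import defaultdict


def propagate_labels(edges, seed_labels, n_iter=20):
    """Generic label propagation; seed nodes keep their label (clamping)."""
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    for node in seed_labels:
        neighbors[node]  # make sure isolated seed nodes are registered too

    # Each node holds a score per category; seeds start with full confidence.
    scores = {node: {} for node in neighbors}
    for node, label in seed_labels.items():
        scores[node] = {label: 1.0}

    for _ in range(n_iter):
        updated = {}
        for node, nbrs in neighbors.items():
            if node in seed_labels:
                updated[node] = scores[node]  # clamped: never overwritten
                continue
            # Unlabeled nodes take the average of their neighbors' scores.
            total = defaultdict(float)
            for nbr in nbrs:
                for label, value in scores[nbr].items():
                    total[label] += value / len(nbrs)
            updated[node] = dict(total)
        scores = updated

    # Final category of each node: the label with the highest score.
    return {node: max(s, key=s.get) for node, s in scores.items() if s}


# Hypothetical toy network: product 1 is a labeled training product;
# products 2 and 3 are unlabeled but linked to model predictions.
edges = [
    ("product:1", "term:capa"),
    ("product:2", "term:capa"),
    ("product:2", "model:A=cases"),
    ("product:3", "term:tv"),
    ("product:3", "model:A=tvs"),
]
seeds = {"product:1": "cases", "model:A=cases": "cases", "model:A=tvs": "tvs"}
result = propagate_labels(edges, seeds)
print(result["product:2"], result["product:3"])  # cases tvs
```

In a scheme like this, adding more classification models adds more labeled nodes to the network, and tuning the consensus function corresponds to changing how neighbor scores are weighted and combined.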

Requirements and Dependencies

python 3
numpy
pandas
keras
gensim
pickle (part of the Python standard library, no installation needed)
tqdm
sklearn (installed as scikit-learn)
networkx
nltk
fasttext (compiled from source code)

How to use?

A Jupyter notebook describes all the steps for executing the C² method. Some parts need to be adapted to your hardware (for example, if you have multiple GPUs).

The Jupyter notebook is available here: meli2019.ipynb.

License

This software is available under the MIT license.

Repository: rmarcacini/cc-meli2019

Folders and files

Latest commit

History

Repository files navigation

Consensus Categorization (C²) for MercadoLivre Data Challenge 2019

Requirements and Dependencies

How to use?

License

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages