Consensus Categorization for MercadoLivre Data Challenge 2019

rmarcacini/cc-meli2019

Consensus Categorization (C²) for MercadoLivre Data Challenge 2019

The C² method is a supervised and transductive learning method based on the Consensus Clustering method that I investigated during my doctorate at ICMC-USP.

Here, the C² method has been adapted to the dataset provided for the Meli Data Challenge 2019.

In short, the C² method has the following steps:

  • Preprocess product titles by removing stopwords (English, Portuguese, and Spanish), numbers, and special characters. Source: meli/preprocess.py.
  • Learn a textual representation of product titles using fastText word embeddings. These embeddings are useful for initializing the classification models.
  • Draw multiple samples of the dataset by sampling both instances and features. Source: meli/sampling.py
  • Train a classification model on each sample. The models should be diverse, i.e., they should produce different solutions. Source: meli/models.py
  • Build a heterogeneous network with three node types: products, terms, and classification models. Some nodes are labeled using the training set and the categories predicted by the classification models. The network is then regularized through a consensus function that returns the final categorization. Source: meli/consensus.py
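
The preprocessing step can be illustrated with a minimal sketch. This is not the code in meli/preprocess.py: the function name `preprocess_title` and the tiny `STOPWORDS` set are hypothetical, and a real run would use complete English, Portuguese, and Spanish stopword lists (for example, those shipped with nltk).

```python
import re
from typing import List

# Hypothetical stopword set, kept tiny for illustration only; a real
# pipeline would load full English, Portuguese and Spanish lists.
STOPWORDS = {
    "the", "and", "for", "with",      # English
    "de", "da", "do", "para", "com",  # Portuguese
    "el", "la", "con", "y",           # Spanish
}

# Runs of characters that are not letters: whitespace, punctuation,
# symbols (\W), digits (\d) and underscores. Accented letters are kept,
# because \w is Unicode-aware for str patterns in Python 3.
NON_LETTERS = re.compile(r"[\W\d_]+")


def preprocess_title(title: str) -> List[str]:
    """Lowercase a title, drop numbers and special characters, remove stopwords."""
    text = NON_LETTERS.sub(" ", title.lower())
    return [tok for tok in text.split() if tok not in STOPWORDS]


print(preprocess_title("Kit 3 Panelas de Alumínio para Cozinha!"))
# ['kit', 'panelas', 'alumínio', 'cozinha']
```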
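
Sampling both instances and features can be sketched as bootstrap sampling of rows combined with random feature subsets (a random-subspace scheme). This is an assumption about how meli/sampling.py works; the function name `draw_samples`, its parameters, and the default fractions are hypothetical.

```python
import random
from typing import List, Tuple


def draw_samples(n_instances: int, n_features: int, n_samples: int,
                 instance_frac: float = 0.8, feature_frac: float = 0.5,
                 seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Return n_samples pairs of (row indices, feature indices).

    Rows are drawn with replacement (bootstrap); features are drawn
    without replacement (random subspace). The fractions are
    illustrative defaults, not values from the original project.
    """
    rng = random.Random(seed)  # seeded for reproducible samples
    k_rows = max(1, int(instance_frac * n_instances))
    k_feats = max(1, int(feature_frac * n_features))
    samples = []
    for _ in range(n_samples):
        rows = [rng.randrange(n_instances) for _ in range(k_rows)]
        feats = sorted(rng.sample(range(n_features), k_feats))
        samples.append((rows, feats))
    return samples


# One classifier would be trained per (rows, feats) pair, so each model
# sees a different view of the data, which encourages diversity.
samples = draw_samples(n_instances=1000, n_features=300, n_samples=5)
```

Varying both the rows and the features is one common way to obtain the diverse models the consensus step relies on; other diversity sources (different architectures or hyperparameters) would fit the same pipeline.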
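
The heterogeneous network and consensus step can be illustrated with a simple iterative label propagation. This is a swapped-in technique for illustration, not necessarily the consensus function in meli/consensus.py. The function name `consensus`, the node naming scheme, and the choice of one node per (model, predicted category) pair are all assumptions; labeled nodes (training products and model nodes) keep their category, and every other node repeatedly takes the normalized sum of its neighbors' category scores.

```python
from collections import defaultdict
from typing import Dict, List, Set


def consensus(titles: Dict[str, List[str]],
              predictions: Dict[str, Dict[str, str]],
              train_labels: Dict[str, str],
              n_iter: int = 10) -> Dict[str, str]:
    """Label products by propagating categories over a heterogeneous graph.

    titles:       product id -> preprocessed title tokens
    predictions:  model name -> {product id -> predicted category}
    train_labels: product id -> known category (training set)
    """
    adj: Dict[str, Set[str]] = defaultdict(set)
    seeds: Dict[str, str] = {}  # labeled nodes, clamped during propagation

    def link(u: str, v: str) -> None:
        adj[u].add(v)
        adj[v].add(u)

    # Product-term edges.
    for pid, tokens in titles.items():
        for tok in tokens:
            link("p:" + pid, "t:" + tok)
    # Hypothetical design: one node per (model, category), linked to the
    # products that model assigned to the category, labeled with it.
    for model, preds in predictions.items():
        for pid, cat in preds.items():
            node = f"m:{model}:{cat}"
            link("p:" + pid, node)
            seeds[node] = cat
    # Training products are labeled with their known category.
    for pid, cat in train_labels.items():
        if "p:" + pid in adj:
            seeds["p:" + pid] = cat

    categories = sorted(set(seeds.values()))
    scores = {n: {c: 0.0 for c in categories} for n in adj}
    for n, c in seeds.items():
        scores[n][c] = 1.0

    for _ in range(n_iter):
        new_scores = {}
        for n, nbrs in adj.items():
            if n in seeds:
                new_scores[n] = scores[n]
                continue
            # Sum neighbor scores per category, then normalize to 1.
            agg = {c: sum(scores[m][c] for m in nbrs) for c in categories}
            total = sum(agg.values())
            new_scores[n] = {c: v / total for c, v in agg.items()} if total else agg
        scores = new_scores

    # Each reached product takes its highest-scoring category.
    return {
        n[2:]: max(s, key=s.get)
        for n, s in scores.items()
        if n.startswith("p:") and sum(s.values()) > 0
    }


# Toy data: products "b" and "d" are unlabeled and receive categories
# from shared terms and from the two models' predictions.
titles = {"a": ["panelas", "aluminio"], "b": ["panelas", "inox"],
          "c": ["camera", "digital"], "d": ["camera", "zoom"]}
predictions = {"m1": {"b": "COOKWARE", "d": "CAMERAS"},
               "m2": {"b": "COOKWARE", "d": "CAMERAS"}}
train_labels = {"a": "COOKWARE", "c": "CAMERAS"}
print(consensus(titles, predictions, train_labels))
```

Because term nodes connect test products to labeled training products, a category can win even when the models disagree, which is the intuition behind letting the network arbitrate between model predictions.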

The C² method ranked fourth on the private leaderboard of the Meli Data Challenge 2019. It can be improved further by adding more classification models or by tuning the consensus function.

Requirements and Dependencies

  • python 3
  • numpy
  • pandas
  • keras
  • gensim
  • pickle
  • tqdm
  • sklearn
  • networkx
  • nltk
  • fasttext (compiled from source code)

How to use

There is a Jupyter notebook describing all the steps for running the C² method. Some parts may need to be adapted to your hardware (for example, if you have multiple GPUs).

The jupyter notebook is available here: meli2019.ipynb.

License

This software is available under the MIT license.
