LisaHoek/dmozclassification

Web classification using DMOZ

Lisa Hoek - Radboud University, Nijmegen

This is the GitHub repository for my Computing Science bachelor's thesis.
It contains the following notebooks and Python programs:

  • ObtainDMOZdata.ipynb: Google Colab notebook that reads the DMOZ dump content.rdf.u8 and stores it in a dataframe
  • Lisa_Thesis0_join.json: Zeppelin notebook that retrieves the DMOZ and CC data and joins them
  • Lisa_Thesis1_getWebContent.json: Zeppelin notebook that retrieves the web page content for each URL
  • Lisa_Thesis2_classifier(I,II,III).json: Zeppelin notebooks, each of which selects the data and its labels, then trains, tests and evaluates the classifier
  • plotMatrix-I,II,III: Python programs that visualise the confusion matrices
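
To give an idea of what the first step involves, here is a minimal sketch of reading `ExternalPage` entries from an RDF dump into rows. It is not the code from ObtainDMOZdata.ipynb. The sample data, the element layout (`ExternalPage`, `d:Title`, `d:Description`, `topic`) and the helper names `local_name` and `read_external_pages` are assumptions made for illustration, so check them against the actual content.rdf.u8 file before relying on them.

```python
import io
import xml.etree.ElementTree as ET

# Tiny made-up sample imitating the assumed layout of content.rdf.u8.
# This is NOT real DMOZ data; the namespaces and fields are assumptions.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<RDF xmlns:r="http://www.w3.org/TR/RDF/"
     xmlns:d="http://purl.org/dc/elements/1.0/"
     xmlns="http://dmoz.org/rdf/">
  <ExternalPage about="http://example.org/">
    <d:Title>Example</d:Title>
    <d:Description>An example page.</d:Description>
    <topic>Top/Arts/Animation</topic>
  </ExternalPage>
  <ExternalPage about="http://example.com/news">
    <d:Title>News</d:Title>
    <topic>Top/News</topic>
  </ExternalPage>
</RDF>
"""


def local_name(tag):
    """Drop the '{namespace}' prefix that ElementTree puts in front of tag names."""
    return tag.rsplit("}", 1)[-1]


def read_external_pages(source):
    """Collect one dict per ExternalPage element (hypothetical helper).

    `source` is a file name or a binary file object. The file is parsed
    incrementally, so the whole dump is never held as one string.
    """
    rows = []
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local_name(elem.tag) != "ExternalPage":
            continue
        # Map each child's local tag name to its stripped text.
        fields = {local_name(child.tag): (child.text or "").strip() for child in elem}
        rows.append({
            "url": elem.get("about"),
            "title": fields.get("Title", ""),
            "description": fields.get("Description", ""),
            "topic": fields.get("topic", ""),
        })
        elem.clear()  # drop the element's children once they have been read
    return rows


rows = read_external_pages(io.BytesIO(SAMPLE))
for row in rows:
    print(row["url"], row["topic"])
```

The resulting list of dicts can be passed to `pandas.DataFrame(rows)` to obtain a dataframe with one row per URL.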
