
This repository reproduces Google's implementations of the Topics API for the Web and for Android.


privacysandstorm/topics_classifier

topics classifier

This repository reproduces Google's implementations of the Topics API for the Web and for Android. It is mainly used in my research on the privacy and utility guarantees of these proposals: PETS'24 and SecWeb'24.

Getting started

Clone this repository, then install the required dependencies. A Dockerfile is provided under .devcontainer/; see here for direct integration with VS Code or for manual deployment instructions.

Usage

usage: python3 classify.py [-h] -mv {chrome1,chrome4,chrome5,android1,android2} -ct {topics-api,model-only,raw-model} -i INPUTS [INPUTS ...] [-id [INPUTS_DESCRIPTION ...]] [-ohr]

Reimplementations of the Topics API

options:
  -h, --help            show this help message and exit
  -id [INPUTS_DESCRIPTION ...], --inputs_description [INPUTS_DESCRIPTION ...]
                        additional input description(s) (for android classification)
  -ohr, --output_human_readable
                        make output human readable, does not work with --classification_type raw-model

required optional arguments:
  -mv {chrome1,chrome4,chrome5,android1,android2}, --model_version {chrome1,chrome4,chrome5,android1,android2}
                        model version to use
  -ct {topics-api,model-only,raw-model}, --classification_type {topics-api,model-only,raw-model}
                        type of classification: either run the full Topics classification (override+model+filtering), the model only (model+filtering), or get the raw classification by the model
  -i INPUTS [INPUTS ...], --inputs INPUTS [INPUTS ...]
                        input(s) to classify
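The three classification types differ only in which stages of the pipeline run: override list, then model, then filtering. The sketch below is a minimal, hypothetical illustration of that flow, not the repository's code. `classify`, `filter_scores`, the toy override list, and the toy model are made-up stand-ins, and the filtering step assumes "keep at most `max_topics` topics scoring above `threshold`" with placeholder values rather than the parameters Google ships with each model.

```python
from typing import Callable, Dict, List, Union

# Hypothetical types: an override list maps an input (e.g. a domain) to
# topic IDs, and a model maps an input to per-topic scores.
OverrideList = Dict[str, List[int]]
Model = Callable[[str], Dict[int, float]]


def filter_scores(scores: Dict[int, float], max_topics: int = 3,
                  threshold: float = 0.1) -> List[int]:
    """Keep at most `max_topics` topic IDs scoring above `threshold`.

    Both defaults are illustrative placeholders, not Google's values.
    """
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [topic for topic, score in ranked[:max_topics] if score > threshold]


def classify(item: str, classification_type: str, override: OverrideList,
             model: Model) -> Union[List[int], Dict[int, float]]:
    """Route one input through the stages selected by `classification_type`."""
    if classification_type == "raw-model":
        # Raw scores straight from the model: no override, no filtering.
        return model(item)
    if classification_type == "topics-api" and item in override:
        # Full pipeline: an override-list hit bypasses the model entirely.
        return override[item]
    if classification_type in ("topics-api", "model-only"):
        # Model + filtering (also the topics-api fallback on an override miss).
        return filter_scores(model(item))
    raise ValueError(f"unknown classification type: {classification_type}")


if __name__ == "__main__":
    toy_override = {"example.com": [1]}  # made-up topic ID

    def toy_model(item: str) -> Dict[int, float]:
        return {10: 0.7, 20: 0.2, 30: 0.05, 40: 0.15}  # made-up scores

    for ct in ("topics-api", "model-only", "raw-model"):
        print(ct, classify("example.com", ct, toy_override, toy_model))
```

With the toy data, topics-api returns [1] because of the override hit, model-only returns [10, 20, 40] after filtering, and raw-model returns the unfiltered score dictionary.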

Supported versions

  • chrome1

    • Web model version: 1
    • Override list: 9 254 domains (about 10k)
    • Web taxonomy version: 1 (349 topics)
  • chrome4

    • Web model version: 4
    • Override list: 47 128 domains (about 50k); 625 of these domains are incorrectly formatted in the list shipped by Google, see here
    • Web taxonomy version: 2 (469 topics)
    • Utility buckets version: 1 (first introduced in this version)
  • chrome5

    • Web model version: 5
    • Override list: 45 270 domains (about 45k)
    • Web taxonomy version: 2 (469 topics)
    • Utility buckets version: 1
    • Note: the only change from chrome4 is the modified override list, see here
  • android1

    • Android model version: 1
    • Override list: 10 012 apps (about 10k)
    • Android taxonomy version: 1 (349 topics)
  • android2

    • Android model version: 2
    • Override list: 10 014 apps (about 10k)
    • Android taxonomy version: 2 (446 topics)
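The version list and the command-line options can be combined into a small wrapper that checks arguments before invoking the classifier. This is a minimal sketch under stated assumptions: `SUPPORTED_VERSIONS`, its field names, and `build_command` are hypothetical helpers, not part of the repository. It only uses the short flags shown in the usage message, and rejecting `-id` for web models is a choice of this wrapper based on the help text ("for android classification"), not confirmed behavior of classify.py.

```python
import shlex
from typing import Dict, List, Optional, Sequence

# Facts copied from the "Supported versions" list; the field names are
# our own choice and do not come from the repository.
SUPPORTED_VERSIONS: Dict[str, Dict[str, object]] = {
    "chrome1": {"platform": "web", "model": 1, "taxonomy": 1, "topics": 349},
    "chrome4": {"platform": "web", "model": 4, "taxonomy": 2, "topics": 469},
    "chrome5": {"platform": "web", "model": 5, "taxonomy": 2, "topics": 469},
    "android1": {"platform": "android", "model": 1, "taxonomy": 1, "topics": 349},
    "android2": {"platform": "android", "model": 2, "taxonomy": 2, "topics": 446},
}
CLASSIFICATION_TYPES = ("topics-api", "model-only", "raw-model")


def build_command(model_version: str, classification_type: str,
                  inputs: Sequence[str],
                  descriptions: Optional[Sequence[str]] = None,
                  human_readable: bool = False) -> List[str]:
    """Assemble a classify.py argument list from the documented flags."""
    if model_version not in SUPPORTED_VERSIONS:
        raise ValueError(f"unsupported model version: {model_version}")
    if classification_type not in CLASSIFICATION_TYPES:
        raise ValueError(f"unknown classification type: {classification_type}")
    if not inputs:
        raise ValueError("at least one input is required")
    if human_readable and classification_type == "raw-model":
        # Documented restriction: -ohr does not work with raw-model.
        raise ValueError("-ohr is not supported with raw-model")
    if descriptions and SUPPORTED_VERSIONS[model_version]["platform"] != "android":
        # Wrapper policy (an assumption): -id is documented for Android only.
        raise ValueError("-id is only meant for android classification")

    cmd = ["python3", "classify.py", "-mv", model_version,
           "-ct", classification_type, "-i", *inputs]
    if descriptions:
        cmd += ["-id", *descriptions]
    if human_readable:
        cmd.append("-ohr")
    return cmd


if __name__ == "__main__":
    print(shlex.join(build_command("chrome5", "topics-api", ["example.com"])))
```

Running this prints `python3 classify.py -mv chrome5 -ct topics-api -i example.com`. To actually classify, the returned list could be passed to `subprocess.run(cmd, check=True)` from the repository root, assuming the dependencies from Getting started are installed.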

If a new model for the Topics API has been released and is not available here yet, please let me know by contacting me or opening an issue.

Languages

  • Python 87.0%
  • Shell 9.4%
  • Dockerfile 3.6%