This repository has been archived by the owner on May 18, 2024. It is now read-only.

SenseTW/sieve


Sieve collects data from various sources for training algorithms. It currently pulls data from sense.tw.

Configuration

  1. Copy sieve.conf.template to sieve.conf and edit it.

Installation

$ make all
# Create a Google Sheets credential `client_secret.json` by following <https://pygsheets.readthedocs.io/en/latest/authorizing.html>
$ pipenv run ./annotation_to_gsheets.py
# On first run, the script creates the credential file `sheets.googleapis.com-python.json` through non-local OAuth authorization.
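The first-run behavior above follows a common credential-caching pattern. The sketch below shows that pattern in Python. It is not the code used by pygsheets or by this repository, and `run_oauth_flow` is a hypothetical stand-in for the interactive OAuth step. Once the file exists, later runs skip authorization. This is presumably also why the Kubernetes setup ships `sheets.googleapis.com-python.json` in a ConfigMap instead of authorizing inside the container.

```python
import json
from pathlib import Path
from typing import Callable


def load_or_create_credentials(path: Path, run_oauth_flow: Callable[[], dict]) -> dict:
    """Return cached credentials, running the OAuth flow only when no cache exists."""
    if path.exists():
        # Later runs (or a container with the file mounted) skip authorization.
        return json.loads(path.read_text())
    # First run: hypothetical interactive, non-local OAuth step.
    creds = run_oauth_flow()
    path.write_text(json.dumps(creds))
    return creds
```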

Usage

$ pipenv run ./annotation_to_gsheets.py

The resulting data are collected in https://drive.google.com/drive/folders/1lFemgEeleSVN7BwU7_2LzkjBLAgoFg-n?usp=sharing.

Google Drive

  • An index of all tabular data sheets is saved in the sheet "Sense.tw Tabular Index", with columns: id, title, uri, last_updated.
  • Each annotated link is saved to its own sheet, titled with the linked page's title, with columns: id, target, tags.
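The following minimal sketch models the two sheet layouts as plain row lists. It assumes that tags are stored in one comma-separated cell and that `last_updated` is an ISO 8601 UTC timestamp. Both formats are assumptions, and so are the names `Annotation`, `page_rows`, and `upsert_index`. The sketch does not call the Google Sheets API.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timezone

INDEX_HEADER = ["id", "title", "uri", "last_updated"]  # "Sense.tw Tabular Index"
PAGE_HEADER = ["id", "target", "tags"]  # one sheet per annotated link


@dataclass
class Annotation:
    """Hypothetical annotation record with the per-link sheet's fields."""
    id: str
    target: str
    tags: list[str]


def page_rows(annotations: list[Annotation]) -> list[list[str]]:
    """Build the rows of one per-link sheet, header first."""
    rows = [list(PAGE_HEADER)]
    for a in annotations:
        # Assumption: all tags go into a single comma-separated cell.
        rows.append([a.id, a.target, ", ".join(a.tags)])
    return rows


def upsert_index(index_rows, sheet_id, title, uri, now=None):
    """Insert or update the index row for one per-link sheet (in place)."""
    if now is None:
        now = datetime.now(timezone.utc)
    stamp = now.isoformat(timespec="seconds")
    for row in index_rows[1:]:  # skip the header row
        if row[0] == sheet_id:
            row[1:] = [title, uri, stamp]
            return index_rows
    index_rows.append([sheet_id, title, uri, stamp])
    return index_rows


index = [list(INDEX_HEADER)]
when = datetime(2018, 1, 1, tzinfo=timezone.utc)
upsert_index(index, "sheet-1", "Example page", "https://example.com/", now=when)
print(index[1])  # ['sheet-1', 'Example page', 'https://example.com/', '2018-01-01T00:00:00+00:00']
```

The index is keyed on the sheet `id`. As a result, re-running the export updates an existing index row instead of appending a duplicate.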

Docker

docker build -t asia.gcr.io/ggv-notetool/sieve:latest .

Kubernetes

  • Create the ConfigMap before deploying:
kubectl create configmap sieve-config --from-file=sheets.googleapis.com-python.json --from-file=sieve.conf
kubectl create -f gcloud/sieve.yaml
