`dataskop-scrapers`

Scrapers, parsers, data wrangling and utilities for TikTok and YouTube.

Dev setup

We store large files with Git LFS, manage the monorepo with Turborepo, and publish new releases with Changesets.

Release a new version

TikTok scraper: schaufel

TikTok utilities for DataSkop

Deployment of the TikTok scraper

We have a specific setup to run the scraper on the server.

Requirements

a Mullvad subscription (you need to change the code if you choose another VPN provider)
a 'Logs Data Platform' instance in Gravelines (GRA) on OVH
an NPM_GITHUB_AUTH token to read private packages on GitHub

Create the dot env file `docker/.env`

# schaufel / DataSkop
PLATFORM_URL=https://dataskop-platform-url.net
SERIOUS_PROTECTION=basic-auth-pw
API_KEY=drf-api-key

# gluetun
VPN_SERVICE_PROVIDER=mullvad
VPN_TYPE=wireguard
WIREGUARD_PRIVATE_KEY=private-key
WIREGUARD_ADDRESSES=ip-address
SERVER_CITIES=a-city
DOT=off

# Send Logs to OVH
_X-OVH-TOKEN=ovh-logs-data-stream-token
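
Before starting the containers, it can help to confirm that `docker/.env` defines every key from the example above. The following is a minimal sketch and not part of the repository: the function name `check_env_file` and the `required_keys` variable are assumptions made for illustration, and the key list simply mirrors the example.

```shell
#!/usr/bin/env bash
# Sketch only: `check_env_file` and `required_keys` are hypothetical names.
# The key list mirrors the example dot env file; adjust it to your setup.
required_keys="PLATFORM_URL SERIOUS_PROTECTION API_KEY VPN_SERVICE_PROVIDER VPN_TYPE WIREGUARD_PRIVATE_KEY WIREGUARD_ADDRESSES SERVER_CITIES DOT _X-OVH-TOKEN"

# check_env_file FILE
# Prints every expected key that has no "KEY=" line in FILE.
# Returns 0 if all keys are present, 1 if any are missing, 2 if FILE does not exist.
check_env_file() {
  local file="$1" missing=0 key
  if [ ! -f "$file" ]; then
    echo "no such file: $file" >&2
    return 2
  fi
  for key in $required_keys; do
    # Only lines that start with "KEY=" count, so commented-out keys are reported.
    if ! grep -q -- "^${key}=" "$file"; then
      echo "$key"
      missing=1
    fi
  done
  return "$missing"
}
```

After sourcing the snippet, `check_env_file docker/.env` lists the missing keys, one per line. It only checks that a key is present, not that its value is valid.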

Deploy script

#!/usr/bin/env bash
# deploy.sh: replace `sshlocation` with your SSH host and `the_token` with your NPM_GITHUB_AUTH token.

rsync -avz --exclude node_modules --exclude .git --exclude docker/volume --exclude docker/gluetun-volume --exclude test . sshlocation:~/code/schaufel
ssh sshlocation "cd code/schaufel && NPM_GITHUB_AUTH=the_token docker-compose up --detach --build"
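
The script above writes the token and the host directly into the file. One possible variation, shown here as a sketch under stated assumptions, reads the token from the environment and supports a dry run that prints the commands instead of executing them. The names `run`, `deploy`, `excludes` and `DRY_RUN` are hypothetical and do not exist in the repository; the `rsync`, `ssh` and `docker-compose` invocations are the ones from the script above.

```shell
#!/usr/bin/env bash
# Sketch only: `run`, `deploy`, `excludes` and DRY_RUN are hypothetical names.

# Paths left out of the upload, copied from the rsync call in the deploy script.
excludes="node_modules .git docker/volume docker/gluetun-volume test"

# run CMD [ARG...]: print the command if DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# deploy HOST [DIR]: sync the working tree to HOST and rebuild the containers.
# DIR is relative to the remote home directory and defaults to code/schaufel.
deploy() {
  local host="$1" dir="${2:-code/schaufel}" e
  if [ -z "$host" ]; then
    echo "usage: deploy HOST [DIR]" >&2
    return 1
  fi
  # Stop early rather than start a build with an empty token.
  if [ -z "${NPM_GITHUB_AUTH:-}" ]; then
    echo "NPM_GITHUB_AUTH is not set" >&2
    return 1
  fi
  # Reuse the positional parameters as the list of --exclude options.
  set --
  for e in $excludes; do
    set -- "$@" --exclude "$e"
  done
  run rsync -avz "$@" . "${host}:${dir}" || return
  run ssh "$host" "cd ${dir} && NPM_GITHUB_AUTH=${NPM_GITHUB_AUTH} docker-compose up --detach --build"
}
```

With the token exported in the current shell, `DRY_RUN=1 deploy sshlocation` shows what would be executed. Note that the dry run prints the token in clear text, so avoid it in shared terminals or CI logs.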

Commands

Merge test lookups (Playwright) into dev lookups

cd packages/schaufel-cli
npm run merge-lookups ~/Library/Application\ Support/Electron/databases/lookup.json ~/Library/Application\ Support/dataskop-electron/databases/lookup.json

Scrape TikTok videos

Connect to any Mullvad server, then run:

cd packages/schaufel-cli
npm run scrape-meta https://www.tiktok.com/@newmartina/video/7232019489674562842 https://www.tiktok.com/@victordemartrin/video/7228575676335443226
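
When the list of videos grows, keeping the URLs in a text file may be more convenient than typing them on the command line. The helper below is a sketch, not part of the repository: `tiktok_video_urls` and the file name `urls.txt` are hypothetical, and the URL pattern is an assumption derived from the two example URLs above.

```shell
#!/usr/bin/env bash
# Sketch only: `tiktok_video_urls` is a hypothetical helper.

# tiktok_video_urls FILE
# Prints the lines of FILE that look like TikTok video URLs, one per line.
# Blank lines, comments and other URLs are dropped.
# The pattern is an assumption based on the example URLs in this README.
tiktok_video_urls() {
  grep -E '^https://www\.tiktok\.com/@[^/]+/video/[0-9]+$' "$1"
}
```

From `packages/schaufel-cli`, the filtered URLs could then be passed on with `npm run scrape-meta $(tiktok_video_urls urls.txt)`. The command substitution is left unquoted on purpose so that each URL becomes its own argument.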

License

MIT
