Skip to content

Scrapes the CVC Refranero Multingüe database for commonly used Spanish idioms

Notifications You must be signed in to change notification settings

purohit/refranero-scraper

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

17 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

This program crawls Spanish idioms from the Refranero Multingüe, maintained by the Centro Virtual Cervantes: http://cvc.cervantes.es/lengua/refranero/Default.aspx. The center reserves all rights (reservados todos los derechos), so you likely can only use the data non-commercially.

It outputs all idioms with Spanish definitions and their usage. If you just want the output, see the idioms.tsv file included here.

Usage:

0. Install dependencies:

    go get .

1. Build the refranero-scraper program:

    go build

2. Get link slugs for all idioms:

    refranero-scraper -print-slugs > slugs.txt

3. Parse link slugs, and print out the idioms and definitions. It could take a while to parse these ~2000 pages.

    < slugs.txt | refranero-scraper -read-slugs > idioms_and_definitions.txt

About

Scrapes the CVC Refranero Multingüe database for commonly used Spanish idioms

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages