text-search

install:

./env.sh
source ./virtualenv/bin/activate
pip3 install -r requirements.txt

usage:

python3 search/scraper.py

tests:

pytest

The goal of this script is to have a way to search through text content retrieved from various web apis.

For example, thinking about tumblr, it would be convenient to search through all text content of one blog, including tags, comments, quotes, and titles, not just through specific tags or by scrolling through the archive.

Another example would be searching a user's spotify playlists for a song, based only on fragments of a title, artist, or album name.

HTML processing loosely based on https://first-web-scraper.readthedocs.io/en/latest/

Future goal: enable fuzzy matching by building a graph of words connected by edit distance.

Input options:

 -f --fresh         # clear out and refresh data (takes time)
 -n --num_pages     # number of pages/batches to download

Returns: clickable links to results

DATA STORAGE:

dictionary:
keyword : [postID1, postID2...]

second dictionary:
postID : (Object)

// DEPRECATED //

Input options: search for multiple words appearing in the same post (order irrelevant) // 'downloaded' flag should prevent the request call and instead search a pre-processed dataset that was already downloaded (but therefore doesn't include the newest posts) // 'short-circuit' flag should quit looking when the provided argument (number) of matches has been found // -q quiet: don't print full text // -u update: finds and adds newest posts to data set // ordered phrases somehow

SEARCH:

lookup all keywords
intersection (or union, based on arguments) of return sets
return links + post text

Name		Name	Last commit message	Last commit date
Latest commit History 22 Commits
search		search
tests		tests
.gitignore		.gitignore
README.md		README.md
env.sh		env.sh
requirements.txt		requirements.txt
run.sh		run.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

text-search

About

Releases

Packages

Languages

zjipsen/text-search

Folders and files

Latest commit

History

Repository files navigation

text-search

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages