
# job-agg

Scan jobs and aggregate them.

This is a learning project for me more than anything else.

It aims to detect adverts posted by different agents that are actually the same job by doing loose comparisons. That way, if you see seven adverts for a job, you'll know it is one job, not seven!
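The loose-comparison idea could be sketched like this. This is an illustration only, not the project's actual logic: the `normalise` helper, the field names, and the 0.85 similarity threshold are all assumptions.

```python
from difflib import SequenceMatcher

def normalise(text):
    """Lowercase and collapse whitespace so trivial differences don't matter."""
    return " ".join(text.lower().split())

def same_job(a, b, threshold=0.85):
    """Treat two adverts as one job when title and company are near-identical."""
    title_sim = SequenceMatcher(
        None, normalise(a["title"]), normalise(b["title"])).ratio()
    company_sim = SequenceMatcher(
        None, normalise(a["company"]), normalise(b["company"])).ratio()
    return title_sim >= threshold and company_sim >= threshold

adverts = [
    {"title": "Senior Perl Developer", "company": "Acme Ltd"},
    {"title": "Senior Perl  developer", "company": "ACME Ltd"},
    {"title": "Python Engineer", "company": "Widgets Inc"},
]

# Group adverts: each advert joins the first group it loosely matches,
# otherwise it starts a new group.
groups = []
for ad in adverts:
    for group in groups:
        if same_job(group[0], ad):
            group.append(ad)
            break
    else:
        groups.append([ad])

print(len(groups))  # the 3 adverts collapse into 2 distinct jobs
```

A real aggregator would probably also compare location and posting date, but the grouping loop stays the same shape.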

## Running a spider

My first working spiders can be invoked like this to get a file of JSON lines:

```
cd jobscraper/jobscraper
scrapy crawl jobs-perl-org -o jobs.perl.org.jl
scrapy crawl jobsite-co-uk -o jobsite.jl
scrapy crawl jobsite-single -o j.jl
```

To view the JSON Lines in a friendly way, do:

```
cat jobs.perl.org.jl | jq .
```
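For scripted post-processing, the same JSON Lines output can be read with plain Python. A minimal sketch; the sample file and its `title` field are assumptions, not the real spider output:

```python
import json

def read_jobs(path):
    """Yield one dict per line of a JSON Lines file, skipping blank lines."""
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                yield json.loads(line)

# Write a tiny sample file so the example is self-contained.
sample = '{"title": "Perl Developer"}\n\n{"title": "Python Engineer"}\n'
with open("sample.jl", "w", encoding="utf-8") as fh:
    fh.write(sample)

jobs = list(read_jobs("sample.jl"))
print(len(jobs))  # → 2
```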

## History

I created a new project on GitHub first, so that the Python .gitignore file would be created for me, and then did this:
