
# job-agg

Scan jobs and aggregate them.

This is a learning project for me more than anything else.

It aims to detect adverts posted by different agents that are actually the same job by doing loose comparisons. That way, if you see seven adverts for a job, you'll know it is one job, not seven!
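The loose-comparison idea could be sketched like this. This is an illustration only, not the project's actual logic: the `normalise` helper, the field names, and the 0.85 similarity threshold are all assumptions.

```python
from difflib import SequenceMatcher

def normalise(text):
    """Lowercase and collapse whitespace so trivial differences don't matter."""
    return " ".join(text.lower().split())

def same_job(a, b, threshold=0.85):
    """Treat two adverts as one job when title and company are near-identical."""
    title_sim = SequenceMatcher(
        None, normalise(a["title"]), normalise(b["title"])).ratio()
    company_sim = SequenceMatcher(
        None, normalise(a["company"]), normalise(b["company"])).ratio()
    return title_sim >= threshold and company_sim >= threshold

adverts = [
    {"title": "Senior Perl Developer", "company": "Acme Ltd"},
    {"title": "Senior Perl  developer", "company": "ACME Ltd"},
    {"title": "Python Engineer", "company": "Widgets Inc"},
]

# Group adverts: each advert joins the first group it loosely matches,
# otherwise it starts a new group.
groups = []
for ad in adverts:
    for group in groups:
        if same_job(group[0], ad):
            group.append(ad)
            break
    else:
        groups.append([ad])

print(len(groups))  # the 3 adverts collapse into 2 distinct jobs
```

A real aggregator would probably also compare location and posting date, but the grouping loop stays the same shape.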

## Running a spider

My first working spiders can be invoked like this to get a file of JSON lines:

```
cd jobscraper/jobscraper
scrapy crawl jobs-perl-org -o jobs.perl.org.jl
scrapy crawl jobsite-co-uk -o jobsite.jl
scrapy crawl jobsite-single -o j.jl
```

To view the JSON Lines in a friendly way, do:

```
cat jobs.perl.org.jl | jq .
```
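For scripted post-processing, the same JSON Lines output can be read with plain Python. A minimal sketch; the sample file and its `title` field are assumptions, not the real spider output:

```python
import json

def read_jobs(path):
    """Yield one dict per line of a JSON Lines file, skipping blank lines."""
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                yield json.loads(line)

# Write a tiny sample file so the example is self-contained.
sample = '{"title": "Perl Developer"}\n\n{"title": "Python Engineer"}\n'
with open("sample.jl", "w", encoding="utf-8") as fh:
    fh.write(sample)

jobs = list(read_jobs("sample.jl"))
print(len(jobs))  # → 2
```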

## History

I created a new project on GitHub first, so that the Python .gitignore file would be created for me, and then did this:
