A basic API to scrape Craigslist.
Most useful for viewing posts across a broad geographic area or for viewing posts within a specific timeframe.
Requires the following packages:
- bs4 (BeautifulSoup)
- shutil
- requests
- datetime
- PyQt5
- subprocess
Note: shutil, datetime, and subprocess are part of the Python standard library. bs4, requests, and PyQt5 should be available from standard distributions, such as Anaconda.
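A quick way to confirm the third-party packages are importable before running the scraper is sketched below. The helper name `missing_packages` and the `REQUIRED` list are illustrative and not part of CLAPI.

```python
import importlib.util

# Third-party packages from the list above (the rest ship with Python).
REQUIRED = ["bs4", "requests", "PyQt5"]


def missing_packages(names=REQUIRED):
    """Return the names in `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages: %s" % ", ".join(missing))
    else:
        print("All required packages are installed.")
```
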
from CLAPI import CraigsList

# Collect every Craigslist city in the chosen states
cities = []
for state in ['AL', 'AK', 'AZ', 'AR']:
    cities += CraigsList.GetCitiesByState(state)

# The prompt takes hours; dividing by 24 converts the value to days
hours = float(input('Posts in the last x hours >> '))/24
query = input('Query >> ')

posts = []  # accumulate posts across all cities
for city in cities:
    print('Parsing %s...' % city)
    cl = CraigsList(city, query, CraigsList.SORT_RELEVANT, lookback=hours)
    posts += cl.posts

CraigsList.OpenViewer(posts, maxImgs=3)
The above example scrapes the posts made during the lookback period for every city with a Craigslist site in the specified states. The posts are presented in a simple PyQt5 GUI for rapid browsing, and buttons in the GUI open the associated post webpage or post location.
Note: If you use a browser other than Chrome, modify the subprocess call in the MainWindowHandlers.py file so that it launches the appropriate browser.
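The actual call in MainWindowHandlers.py is not shown here, so the following is only a sketch of what a replacement might look like. The function names `browser_command` and `open_post` are hypothetical, and the executable name (`firefox` below) is an assumption that depends on your platform and PATH. When no browser is given, the sketch falls back to the standard-library `webbrowser` module, which opens the system default browser.

```python
import subprocess
import webbrowser


def browser_command(url, browser="firefox"):
    """Build the argument list used to launch `browser` on `url`."""
    return [browser, url]


def open_post(url, browser=None):
    """Open `url` in the named browser, or in the system default browser."""
    if browser is None:
        # webbrowser.open picks whichever browser the OS has registered
        return webbrowser.open(url)
    # Launch the named executable without waiting for it to exit
    subprocess.Popen(browser_command(url, browser))
    return True
```

Passing the command as a list, rather than a single shell string, avoids quoting problems with URLs that contain characters such as `&`.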
Be aware that this program creates a temporary directory called 'tmp' in your current working directory, into which the Craigslist thumbnail images are downloaded. When the program exits without errors, this temporary directory is deleted.
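Since the directory is only described as being deleted on a clean exit, it may be left behind after a crash. The sketch below shows one way to remove it by hand. The helper `remove_leftover_tmp` is hypothetical and not part of CLAPI.

```python
import os
import shutil


def remove_leftover_tmp(base_dir=None):
    """Delete the 'tmp' directory under `base_dir` (default: cwd), if present.

    Returns True if a directory was removed, False otherwise.
    """
    base_dir = base_dir or os.getcwd()
    tmp_path = os.path.join(base_dir, "tmp")
    if os.path.isdir(tmp_path):
        # rmtree removes the directory and everything inside it
        shutil.rmtree(tmp_path)
        return True
    return False
```

This removes any directory named 'tmp' in the given location, so check that it holds nothing but downloaded thumbnails before calling it.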