This project allows users to upload a CSV file containing URLs, scrape metadata (title, description, keywords), and store results securely using FastAPI, PostgreSQL, Redis, and Celery.
- User Authentication (OAuth2 + Access Tokens)
- Upload CSV containing URLs
- Scrape Metadata asynchronously using Celery workers
- Store Data in PostgreSQL Cloud
- Monitor Performance using Prometheus + Grafana
- Deploy via Docker
- CI/CD automated using GitHub Actions (Deploys Docker image to DockerHub)
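
The scraping step collects three fields per URL: title, description, and keywords. As a rough illustration of how those fields could be pulled out of a fetched page, here is a minimal sketch using only the Python standard library. The names `MetadataParser` and `extract_metadata` are hypothetical and are not taken from this repository; the project's Celery workers may use a different parser entirely.

```python
from html.parser import HTMLParser


class MetadataParser(HTMLParser):
    """Collects the <title> text and the description/keywords <meta> tags."""

    def __init__(self):
        super().__init__()
        self.metadata = {"title": "", "description": "", "keywords": ""}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            attr_map = dict(attrs)
            # Meta names are case-insensitive in practice, so normalise them.
            name = (attr_map.get("name") or "").lower()
            if name in ("description", "keywords"):
                self.metadata[name] = (attr_map.get("content") or "").strip()

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Title text may arrive in several chunks, so accumulate it.
        if self._in_title:
            self.metadata["title"] += data


def extract_metadata(html: str) -> dict:
    """Return a dict with title, description, and keywords (empty if absent)."""
    parser = MetadataParser()
    parser.feed(html)
    parser.close()
    parser.metadata["title"] = parser.metadata["title"].strip()
    return parser.metadata
```

Fields missing from a page come back as empty strings, which keeps the result shape uniform for storage.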

`git clone <repo_url>`

`cd url-scraper`

Create a `.env` file with the following:

`DATABASE_URL=your_postgresql_cloud_url`

`REDIS_URL=your_redis_cloud_url`

For local development, use:

`DATABASE_URL=your_postgresql_cloud_url`

`REDIS_URL=redis://localhost:6379/0`
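
A minimal sketch of how the application might read these two variables at startup is shown below. It assumes the values are already present in the process environment (exported by the shell, Docker, or a loader such as python-dotenv), and it does not parse the `.env` file itself. The function name `load_settings` and the fallback to the local Redis URL are illustrative assumptions, not code from this repository.

```python
import os


def load_settings(env=None) -> dict:
    """Read DATABASE_URL and REDIS_URL from a mapping (os.environ by default)."""
    env = os.environ if env is None else env

    database_url = env.get("DATABASE_URL")
    if not database_url:
        # Fail early rather than at the first database query.
        raise RuntimeError("DATABASE_URL is not set")

    # Assumption: fall back to the local-development Redis URL when unset.
    redis_url = env.get("REDIS_URL", "redis://localhost:6379/0")

    return {"database_url": database_url, "redis_url": redis_url}
```

Passing a plain dict instead of `os.environ` makes the function easy to exercise in tests without touching the real environment.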

`pip install -r requirements.txt`

`docker-compose up -d redis`

`docker-compose up --build`

Celery is used for asynchronous scraping.

`celery -A utils.celery_worker.celery_app worker --loglevel=info`

To monitor tasks, start the Celery Flower dashboard:

`celery -A utils.celery_worker.celery_app flower`

| Method | Endpoint | Description |
|---|---|---|
| POST | /auth/register | Register new user |
| POST | /auth/login | Login and get access token |
| POST | /auth/token | OAuth2 token authentication |

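
As an illustration of calling the token endpoint from a client, the sketch below builds (but does not send) a form-encoded POST request. It assumes `/auth/token` follows the standard OAuth2 password grant with `username` and `password` form fields, which this README does not confirm. The helper name `build_token_request` and the base URL are likewise hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_token_request(base_url: str, username: str, password: str) -> Request:
    """Build a POST request for an OAuth2 password-grant token endpoint."""
    # Assumption: the endpoint expects standard OAuth2 password-grant fields.
    body = urlencode(
        {"grant_type": "password", "username": username, "password": password}
    ).encode("ascii")

    return Request(
        url=f"{base_url.rstrip('/')}/auth/token",
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```

The request can then be sent with `urllib.request.urlopen(...)`; the shape of the JSON response depends on the server and is not specified here.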

| Method | Endpoint | Description |
|---|---|---|
| POST | /upload-csv/ | Upload a CSV file of URLs |
| GET | /task-status/{task_id} | Check scraping task status |
| GET | /results/{task_id} | View scraped data from task_id |
| GET | /urls | View scraped metadata of a user |

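
The `/upload-csv/` endpoint takes a CSV file of URLs. The following is a minimal sketch of how uploaded bytes could be turned into a clean URL list before tasks are queued. It assumes UTF-8 input, accepts URLs in any column, and skips header cells and anything that is not an `http` or `https` URL. The function name `extract_urls` is hypothetical, and the project's real parsing rules may differ.

```python
import csv
import io
from urllib.parse import urlparse


def extract_urls(csv_bytes: bytes) -> list:
    """Return unique http(s) URLs found in the CSV, in first-seen order."""
    # "utf-8-sig" also strips a byte-order mark left by some spreadsheet tools.
    text = csv_bytes.decode("utf-8-sig")

    urls = []
    seen = set()
    for row in csv.reader(io.StringIO(text, newline="")):
        for cell in row:
            candidate = cell.strip()
            try:
                parsed = urlparse(candidate)
            except ValueError:
                # Malformed input (for example a broken IPv6 literal): skip it.
                continue
            is_http = parsed.scheme in ("http", "https") and parsed.netloc
            if is_http and candidate not in seen:
                seen.add(candidate)
                urls.append(candidate)
    return urls
```

Deduplicating up front avoids scraping the same page twice when a file repeats a URL.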

`docker-compose up --build`

- Deployed URL: CSV URL Metadata Scraper
- Create a PostgreSQL database and get the `DATABASE_URL`
- Deploy using Render's FastAPI template
Our Docker image is automatically built and pushed to DockerHub on every CI/CD pipeline execution:
- DockerHub URL: anotherpersonwhodontknow/url_scraper
To pull and run the image locally:

`docker pull anotherpersonwhodontknow/url_scraper`

`docker run -p 8000:8000 anotherpersonwhodontknow/url_scraper`

`docker run -d --name=prometheus -p 9090:9090 prom/prometheus`

`docker run -d --name=grafana -p 3000:3000 grafana/grafana`

- Add Prometheus as a data source in Grafana.
- Create dashboards for API request count and error rates.
This project is open-source and available under the MIT License.