Add more options to Crawlee CLI for crawler creation #414

Closed
siddiqkaithodu opened this issue Aug 9, 2024 · 0 comments · Fixed by #538

@siddiqkaithodu
Contributor

At the moment, the Crawlee CLI offers only two options when creating a new crawler:

  1. Beautiful Soup
  2. Playwright

It would be great if we could add more alternatives, such as selecting the HTTP request module (like curl-cffi or HTTPX) and the parser module (like Parsel). This would allow the project to be bootstrapped with the chosen dependencies.

Additionally, it would be very helpful to be able to specify the start URL (or domain) we want to crawl directly in the CLI, similar to how it’s done in Scrapy.
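
For illustration, a project bootstrapped with, say, Parsel as the parser and HTTPX as the HTTP client could generate an entry point roughly like the sketch below. The import paths and class names are assumptions based on a recent Crawlee for Python release (older versions exposed e.g. `crawlee.parsel_crawler` instead of `crawlee.crawlers`), so they may not match the template exactly:

```python
import asyncio

# Assumed import paths for a recent Crawlee for Python release; they may differ per version.
from crawlee.crawlers import ParselCrawler, ParselCrawlingContext
from crawlee.http_clients import HttpxHttpClient


async def main() -> None:
    crawler = ParselCrawler(
        # The HTTP client would be chosen at bootstrap time (e.g. HTTPX vs. curl-cffi).
        http_client=HttpxHttpClient(),
        max_requests_per_crawl=10,
    )

    @crawler.router.default_handler
    async def handler(context: ParselCrawlingContext) -> None:
        context.log.info(f'Processing {context.request.url}')
        await context.push_data({
            'url': context.request.url,
            'title': context.selector.css('title::text').get(),
        })
        await context.enqueue_links()

    # The start URL would come from a CLI prompt or argument instead of being hardcoded.
    await crawler.run(['https://crawlee.dev'])


if __name__ == '__main__':
    asyncio.run(main())
```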

@janbuchar janbuchar added enhancement New feature or request. t-tooling Issues with this label are in the ownership of the tooling team. labels Aug 9, 2024
@janbuchar janbuchar self-assigned this Sep 20, 2024
@vdusek vdusek changed the title Feature Request: Add More Options to Crawlee CLI for Crawler Creation Add more options to Crawlee CLI for crawler creation Nov 27, 2024
Mantisus pushed a commit to Mantisus/crawlee-python that referenced this issue Dec 10, 2024
This adds a unified `crawlee/project_template` template. The original
`playwright` and `beautifulsoup` templates are kept for compatibility
with older versions of the CLI.

The user is now prompted for package manager type (pip, poetry), crawler
type, start URL and whether or not Apify integration should be set up.
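
For context, a cookiecutter-style template context encoding these prompts could look roughly like the dict below. The keys and defaults are hypothetical, not the template's actual `cookiecutter.json`; in cookiecutter, a list value is rendered as a choice prompt and a scalar as a free-text prompt with that default:

```python
# Hypothetical template context mirroring what a cookiecutter.json could hold.
# List values become choice prompts; scalar values become free-text defaults.
project_context = {
    'project_name': 'my-crawlee-project',
    'package_manager': ['pip', 'poetry'],
    'crawler_type': ['beautifulsoup', 'parsel', 'playwright'],
    'http_client': ['httpx', 'curl-impersonate'],
    'start_url': 'https://crawlee.dev',
    'enable_apify_integration': ['no', 'yes'],
}
```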

- closes apify#317
- closes apify#414 (http client selection is not implemented)
- closes apify#511
- closes apify#495

### TODO

- [x] http client selection
- [x] disable the poetry option if it isn't installed
- [x] rectify the pip-based setup (see the sketch after this list)
  1. **manual dependency installation** - no automatic installation, just dump a requirements.txt and tell the user to handle it any way they want
  2. **pip+venv** - dump a requirements.txt, make a virtualenv (.venv) using the current Python interpreter, install the requirements and tell the user to activate it
     - ~~should be disabled if the `venv` module is not present~~ it's part of the stdlib
- [x] test the whole thing on Windows (mainly the various package manager configurations)
- [x] fix how cookiecutter.json is read (it is not present when installing via pip)
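
A minimal sketch of the pip+venv flow described above, using only the standard library. The function name, paths, and the requirements file name are illustrative; the template's actual post-generation hook may differ:

```python
import subprocess
import sys
import venv
from pathlib import Path


def setup_pip_project(project_dir: Path) -> None:
    """Create a .venv with the current interpreter and install requirements.txt into it."""
    venv_dir = project_dir / '.venv'

    # Create the virtual environment with pip available (venv is stdlib, so no availability check is needed).
    venv.EnvBuilder(with_pip=True).create(venv_dir)

    # Resolve the pip executable inside the new environment (the layout differs on Windows).
    bin_dir = 'Scripts' if sys.platform == 'win32' else 'bin'
    pip = venv_dir / bin_dir / ('pip.exe' if sys.platform == 'win32' else 'pip')

    # Install the dumped requirements and tell the user to activate the environment themselves.
    subprocess.run([str(pip), 'install', '-r', str(project_dir / 'requirements.txt')], check=True)
    print(f'Dependencies installed. Activate the environment with: {venv_dir / bin_dir / "activate"}')
```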