Optimize package footprint by removing unnecessary deps #1077

@vdusek

Description

The current installation of the Crawlee package, together with all its direct and transitive dependencies, occupies ~75 MB: the virtual environment grows from 6.6 MB to 82 MB. For context, Scrapy occupies ~88.4 MB.

(venv) $ du -h venv/ --max-depth 0
6.6M	venv/
(venv) $ pip install crawlee
...
(venv) $ du -h venv/ --max-depth 0
82M	venv/

Most of this size comes from a handful of large dependencies (zstandard, brotli, pygments and pydantic-core together account for roughly 40 MB), some of which may not be strictly necessary for the core functionality. Below is a detailed breakdown of the dependency tree, the individual package sizes, and the total size of each of Crawlee's direct dependencies.

Dependency tree

crawlee v0.6.4
    ├── apify-fingerprint-datapoints v0.0.2
    ├── browserforge v1.2.3
    │   └── click v8.1.8
    ├── cachetools v5.5.2
    ├── colorama v0.4.6
    ├── docutils v0.21.2
    ├── eval-type-backport v0.2.2
    ├── httpx[brotli, http2, zstd] v0.28.1
    │   ├── anyio v4.8.0
    │   │   ├── idna v3.10
    │   │   └── sniffio v1.3.1
    │   ├── certifi v2025.1.31
    │   ├── httpcore v1.0.7
    │   │   ├── certifi v2025.1.31
    │   │   └── h11 v0.14.0
    │   ├── idna v3.10
    │   ├── brotli v1.1.0 (extra: brotli)
    │   ├── h2 v4.2.0 (extra: http2)
    │   │   ├── hpack v4.1.0
    │   │   └── hyperframe v6.1.0
    │   └── zstandard v0.23.0 (extra: zstd)
    ├── more-itertools v10.6.0
    ├── psutil v7.0.0
    ├── pydantic v2.10.6
    │   ├── annotated-types v0.7.0
    │   ├── pydantic-core v2.27.2
    │   │   └── typing-extensions v4.12.2
    │   └── typing-extensions v4.12.2
    ├── pydantic-settings v2.6.1
    │   ├── pydantic v2.10.6 (*)
    │   └── python-dotenv v1.0.1
    ├── pyee v12.1.1
    │   └── typing-extensions v4.12.2
    ├── rich v13.9.4
    │   ├── markdown-it-py v3.0.0
    │   │   └── mdurl v0.1.2
    │   └── pygments v2.19.1
    ├── sortedcollections v2.1.0
    │   └── sortedcontainers v2.4.0
    ├── tldextract v5.1.3
    │   ├── filelock v3.17.0
    │   ├── idna v3.10
    │   ├── requests v2.32.3
    │   │   ├── certifi v2025.1.31
    │   │   ├── charset-normalizer v3.4.1
    │   │   ├── idna v3.10
    │   │   └── urllib3 v2.3.0
    │   └── requests-file v2.1.0
    │       └── requests v2.32.3 (*)
    ├── typing-extensions v4.12.2
    └── yarl v1.18.3
        ├── idna v3.10
        ├── multidict v6.1.0
        └── propcache v0.3.0
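The issue does not say which tool produced this tree. As an independent, minimal sketch (assuming the packages are installed in the current environment), a similar tree can be printed from installed metadata with the standard library's `importlib.metadata`. The helpers `normalize`, `parse_requirement` and `print_tree` are hypothetical, and the regex parsing is a deliberate simplification of PEP 508: only `extra == "..."` markers are honored, so dependencies gated on other markers (such as `python_version`) are still followed and may print as "(not installed)".

```python
import re
from importlib import metadata
from typing import AbstractSet, Optional, Set, Tuple

# Simplified Requires-Dist parsing (assumption: good enough for a quick look).
# A full PEP 508 parser, e.g. the `packaging` library, would also evaluate
# markers such as python_version; this sketch only honors `extra == "..."`.
_NAME_AND_EXTRAS = re.compile(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)\s*(?:\[([^\]]*)\])?")
_EXTRA_MARKER = re.compile(r"""extra\s*==\s*["']([^"']+)["']""")


def normalize(name: str) -> str:
    """Normalize a name so that 'typing_extensions' matches 'typing-extensions'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def parse_requirement(req: str) -> Tuple[str, Set[str], Optional[str]]:
    """Split a requirement into (name, extras it requests, extra that gates it)."""
    spec, _, marker = req.partition(";")
    match = _NAME_AND_EXTRAS.match(spec)
    if match is None:
        raise ValueError(f"cannot parse requirement: {req!r}")
    extras = {normalize(e.strip()) for e in (match.group(2) or "").split(",") if e.strip()}
    gate = _EXTRA_MARKER.search(marker)
    return match.group(1), extras, normalize(gate.group(1)) if gate else None


def print_tree(dist: str, extras: AbstractSet[str] = frozenset(), depth: int = 0,
               seen: Optional[Set[str]] = None) -> None:
    """Print the installed dependency tree of `dist`, marking repeats with (*)."""
    seen = set() if seen is None else seen
    indent = "    " * depth
    try:
        version = metadata.version(dist)
    except metadata.PackageNotFoundError:
        print(f"{indent}{dist} (not installed)")
        return
    key = normalize(dist)
    if key in seen:
        print(f"{indent}{dist} v{version} (*)")  # already expanded elsewhere
        return
    seen.add(key)
    print(f"{indent}{dist} v{version}")
    for req in metadata.requires(dist) or []:
        name, requested, gate = parse_requirement(req)
        if gate is not None and gate not in extras:
            continue  # only needed for an extra that was not requested
        print_tree(name, requested, depth + 1, seen)
```

Calling `print_tree("crawlee")` inside the venv should print a tree shaped like the one above; httpx's brotli, http2 and zstd extras are followed only because the parent requirement requests them.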

Package sizes

23M     .venv/lib/python3.13/site-packages/zstandard
7.2M    .venv/lib/python3.13/site-packages/_brotli.cpython-313-x86_64-linux-gnu.so
4.9M    .venv/lib/python3.13/site-packages/pygments
4.7M    .venv/lib/python3.13/site-packages/pydantic_core
2.4M    .venv/lib/python3.13/site-packages/docutils
1.9M    .venv/lib/python3.13/site-packages/pydantic
1.1M    .venv/lib/python3.13/site-packages/yarl
1.1M    .venv/lib/python3.13/site-packages/rich
1.1M    .venv/lib/python3.13/site-packages/crawlee
1.0M    .venv/lib/python3.13/site-packages/psutil
836K    .venv/lib/python3.13/site-packages/apify_fingerprint_datapoints
784K    .venv/lib/python3.13/site-packages/propcache
484K    .venv/lib/python3.13/site-packages/urllib3
456K    .venv/lib/python3.13/site-packages/multidict
452K    .venv/lib/python3.13/site-packages/charset_normalizer
436K    .venv/lib/python3.13/site-packages/anyio
376K    .venv/lib/python3.13/site-packages/markdown_it
368K    .venv/lib/python3.13/site-packages/tldextract
368K    .venv/lib/python3.13/site-packages/click
352K    .venv/lib/python3.13/site-packages/idna
328K    .venv/lib/python3.13/site-packages/httpx
324K    .venv/lib/python3.13/site-packages/httpcore
308K    .venv/lib/python3.13/site-packages/certifi
260K    .venv/lib/python3.13/site-packages/h2
236K    .venv/lib/python3.13/site-packages/h11
228K    .venv/lib/python3.13/site-packages/requests
228K    .venv/lib/python3.13/site-packages/more_itertools
228K    .venv/lib/python3.13/site-packages/hpack
136K    .venv/lib/python3.13/site-packages/pydantic_settings
132K    .venv/lib/python3.13/site-packages/typing_extensions.py
124K    .venv/lib/python3.13/site-packages/sortedcontainers
120K    .venv/lib/python3.13/site-packages/browserforge
80K     .venv/lib/python3.13/site-packages/colorama
60K     .venv/lib/python3.13/site-packages/filelock
52K     .venv/lib/python3.13/site-packages/pyee
52K     .venv/lib/python3.13/site-packages/dotenv
44K     .venv/lib/python3.13/site-packages/hyperframe
36K     .venv/lib/python3.13/site-packages/mdurl
36K     .venv/lib/python3.13/site-packages/cachetools
28K     .venv/lib/python3.13/site-packages/sortedcollections
24K     .venv/lib/python3.13/site-packages/annotated_types
16K     .venv/lib/python3.13/site-packages/sniffio
16K     .venv/lib/python3.13/site-packages/eval_type_backport
8.0K    .venv/lib/python3.13/site-packages/_virtualenv.py
8.0K    .venv/lib/python3.13/site-packages/requests_file.py
4.0K    .venv/lib/python3.13/site-packages/_virtualenv.pth
4.0K    .venv/lib/python3.13/site-packages/brotli.py

Extracted via:

du -sh .venv/lib/python*/site-packages/* | sort -hr

Total size per direct dependency (each total includes the dependency's transitive dependencies)

  • httpx[brotli, http2, zstd]: 32.732M
  • pydantic-settings: 6.944M
  • pydantic: 6.756M
  • rich: 6.412M
  • yarl: 2.692M
  • docutils: 2.4M
  • tldextract: 2.260M
  • psutil: 1.0M
  • apify-fingerprint-datapoints: 836K
  • browserforge: 488K
  • more-itertools: 228K
  • pyee: 184K
  • sortedcollections: 152K
  • typing-extensions: 132K
  • colorama: 80K
  • cachetools: 36K
  • eval-type-backport: 16K
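These totals can be reproduced by adding each direct dependency's own size to the sizes of all its transitive dependencies, treating 1M as 1000K. For example, pydantic's 6.756M is pydantic (1.9M) + pydantic_core (4.7M) + annotated_types (24K) + typing_extensions (132K). Because shared packages are counted under every direct dependency that pulls them in, the totals overlap, and adding them together double-counts. The sketch below reproduces this arithmetic for a subset of the tree; the dictionaries are hand-copied from the listings above, and `closure`, `total_kb` and `combined_kb` are hypothetical helpers.

```python
# Sizes in KB, hand-copied from the `du` listing above (1M treated as 1000K,
# which is what the per-dependency totals appear to use).
SIZES_KB = {
    "pydantic": 1900,
    "pydantic-core": 4700,
    "annotated-types": 24,
    "typing-extensions": 132,
    "pydantic-settings": 136,
    "python-dotenv": 52,  # installed as the `dotenv` directory
    "pyee": 52,
}

# Edges hand-copied from the dependency tree above (subset only).
DEPS = {
    "pydantic": ["annotated-types", "pydantic-core", "typing-extensions"],
    "pydantic-core": ["typing-extensions"],
    "pydantic-settings": ["pydantic", "python-dotenv"],
    "pyee": ["typing-extensions"],
}


def closure(package: str) -> set[str]:
    """Return the package plus all of its transitive dependencies."""
    seen: set[str] = set()
    stack = [package]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(DEPS.get(current, []))
    return seen


def total_kb(package: str) -> int:
    """Per-dependency total as in the list above: shared packages included."""
    return sum(SIZES_KB[name] for name in closure(package))


def combined_kb(packages: list[str]) -> int:
    """Actual footprint of several dependencies: each package counted once."""
    union: set[str] = set()
    for package in packages:
        union |= closure(package)
    return sum(SIZES_KB[name] for name in union)


direct = ["pydantic-settings", "pydantic", "pyee"]
print({name: total_kb(name) for name in direct})
# {'pydantic-settings': 6944, 'pydantic': 6756, 'pyee': 184}
print("naive sum:", sum(total_kb(name) for name in direct))  # naive sum: 13884
print("deduplicated:", combined_kb(direct))  # deduplicated: 6996
```

The same overlap applies across the full list, for instance idna (under httpx, tldextract and yarl) and certifi (under httpx and tldextract).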

Goal

The goal is to identify the dependencies that contribute most to the overall footprint and, where possible, remove or replace them without compromising Crawlee's functionality.

Metadata

Labels: t-tooling (issues with this label are in the ownership of the tooling team)
