Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove build_from_settings() and deprecate from_settings() code #6540

Merged
merged 10 commits into from
Nov 14, 2024

Conversation

wRAR
Copy link
Member

@wRAR wRAR commented Nov 12, 2024

Fixes #6534

  • Deprecate calling from_settings() in build_from_crawler() and MiddlewareManager._build_from_settings()
  • Redo from_crawler()/from_settings() in MiddlewareManager
  • Deprecate ISpiderLoader.from_settings() and add ISpiderLoader.from_crawler() without breaking verifyClass It looks like SpiderLoader cannot take a crawler and it's created without build_from_crawler() anyway.
  • Reshuffle from_crawler()/from_settings() in MediaPipeline and its subclasses in a backwards-compatible way

@wRAR wRAR added this to the Scrapy 2.12 milestone Nov 12, 2024
@wRAR wRAR marked this pull request as ready for review November 14, 2024 08:18
@wRAR
Copy link
Member Author

wRAR commented Nov 14, 2024

Most changes here are simple: if a class had from_settings() but not from_crawler() we want it to have from_crawler() but it can keep from_settings() as an implementation detail that is not called by the public interfaces.

Summary of changes:

  1. If a Scrapy class had from_settings() but not from_crawler() it now has from_crawler(). One exception is ISpiderLoader that is created before the crawler.
  2. Scrapy classes' from_settings() now produce a deprecation warning.
  3. When a class instance (Scrapy or not) is created via build_from_crawler(), and it has from_settings() but not from_crawler() a deprecation warning will be produced.
  4. from_settings() and from_crawler() methods of MiddlewareManager are switched so that from_crawler() is preferred and the crawler instance is used whenever possible, with appropriate deprecation warnings. As there is an (unlikely to be used in practice) code path that needs to create middleware instances while not having a crawler instance, a simplified copy of build_from_settings() was added to the class.
  5. The initialization code of MediaPipeline and its subclasses is improved to be more in line with the usual from_crawler() pattern, more on this below.

@wRAR
Copy link
Member Author

wRAR commented Nov 14, 2024

Regarding the media pipelines.

The current initialization order is as follows:

  1. There is from_settings() that is unlikely to be called directly as it leaves out some important bits so I don't care about this scenarion. Same for calling __init__() directly.
  2. The pipelines are created by Scrapy via from_crawler() which needs to continue being the entry point. The current MediaPipeline.from_crawler() creates the object by calling cls.from_settings() or cls() and then adds the aforementioned important bits that need a crawler instance.
  3. from_settings() is overridden in FilesPipeline (and, we must assume, in some custom MediaPipeline subclasses) and creates the object via its constructor.
  4. The constructors are non-trivial and access the settings but not the crawler. In subclasses they can also have additional arguments not available in the base class.

What we want is to move the initialization out of MediaPipeline.from_crawler() so that from_crawler() can be overridden (it currently can't be, if you want to also call the super one). It logically follows from this that we want the constructors to take the crawler instance and, from this, that they should take the settings from there, not passed separately. This leaves MediaPipeline.from_crawler() to be trivial and safely overridable.

Now, it is possible to subclass either MediaPipeline or FilesPipeline and override or not override from_crawler(), from_settings() and/or __init__() with or without calling the super ones. I don't think we can test or even support all of these cases, though we can try. I think we need some directions in the "backwards-incompatible changes" release notes section, though in most cases the users should only get a bunch of deprecation warnings.

Copy link

codecov bot commented Nov 14, 2024

Codecov Report

Attention: Patch coverage is 39.58333% with 87 lines in your changes missing coverage. Please review.

Project coverage is 32.98%. Comparing base (bcef965) to head (929d665).
Report is 13 commits behind head on master.

Files with missing lines Patch % Lines
scrapy/pipelines/files.py 33.33% 22 Missing ⚠️
scrapy/pipelines/media.py 21.73% 18 Missing ⚠️
scrapy/pipelines/images.py 30.00% 14 Missing ⚠️
scrapy/middleware.py 47.61% 11 Missing ⚠️
scrapy/dupefilters.py 46.15% 7 Missing ⚠️
scrapy/extensions/feedexport.py 0.00% 5 Missing ⚠️
scrapy/core/downloader/contextfactory.py 62.50% 3 Missing ⚠️
scrapy/spidermiddlewares/urllength.py 66.66% 3 Missing ⚠️
scrapy/mail.py 77.77% 2 Missing ⚠️
scrapy/extensions/statsmailer.py 0.00% 1 Missing ⚠️
... and 1 more
Additional details and impacted files
@@             Coverage Diff             @@
##           master    #6540       +/-   ##
===========================================
- Coverage   87.83%   32.98%   -54.86%     
===========================================
  Files         162      162               
  Lines       12011    12085       +74     
  Branches     1544     1557       +13     
===========================================
- Hits        10550     3986     -6564     
- Misses       1163     7972     +6809     
+ Partials      298      127      -171     
Files with missing lines Coverage Δ
scrapy/extensions/memusage.py 35.64% <100.00%> (-12.88%) ⬇️
scrapy/extensions/statsmailer.py 0.00% <0.00%> (ø)
scrapy/utils/misc.py 32.87% <0.00%> (-67.13%) ⬇️
scrapy/mail.py 43.43% <77.77%> (-33.24%) ⬇️
scrapy/core/downloader/contextfactory.py 49.25% <62.50%> (-38.89%) ⬇️
scrapy/spidermiddlewares/urllength.py 50.00% <66.66%> (-39.29%) ⬇️
scrapy/extensions/feedexport.py 31.02% <0.00%> (-54.24%) ⬇️
scrapy/dupefilters.py 40.27% <46.15%> (-58.11%) ⬇️
scrapy/middleware.py 65.27% <47.61%> (-32.97%) ⬇️
scrapy/pipelines/images.py 28.57% <30.00%> (+1.74%) ⬆️
... and 2 more

... and 138 files with indirect coverage changes

@wRAR wRAR merged commit ab5cb7c into scrapy:master Nov 14, 2024
30 of 33 checks passed
@wRAR wRAR deleted the build_from_settings branch November 14, 2024 15:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Don't ship build_from_settings()
2 participants