Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RootPathMiddleware: group items in packages #944

Merged
merged 9 commits into from
Jun 8, 2022

Conversation

yolile
Copy link
Member

@yolile yolile commented Jun 2, 2022

closes #942

@yolile yolile marked this pull request as draft June 2, 2022 17:57
@yolile
Copy link
Member Author

yolile commented Jun 2, 2022

@jpmckinney I will update the tests if you are ok with the approach

kingfisher_scrapy/spidermiddlewares.py Show resolved Hide resolved
kingfisher_scrapy/spidermiddlewares.py Outdated Show resolved Hide resolved
kingfisher_scrapy/spidermiddlewares.py Outdated Show resolved Hide resolved
kingfisher_scrapy/spidermiddlewares.py Show resolved Hide resolved
kingfisher_scrapy/spidermiddlewares.py Outdated Show resolved Hide resolved
kingfisher_scrapy/spidermiddlewares.py Outdated Show resolved Hide resolved
kingfisher_scrapy/spidermiddlewares.py Outdated Show resolved Hide resolved
kingfisher_scrapy/spidermiddlewares.py Outdated Show resolved Hide resolved
kingfisher_scrapy/spidermiddlewares.py Show resolved Hide resolved
kingfisher_scrapy/spidermiddlewares.py Outdated Show resolved Hide resolved
@yolile yolile marked this pull request as ready for review June 5, 2022 21:56
@yolile yolile requested a review from jpmckinney June 5, 2022 21:57
…bject.

chore: Move some calculations outside of for-loops.
test: Test RootPathMiddleware with FileItem as input.
docs: Clarify what spider middlewares modify. Add comments to test cases.
@jpmckinney
Copy link
Member

jpmckinney commented Jun 7, 2022

I added a commit. I'm not sure what the appropriate logic should be for the last test case I added: https://github.com/open-contracting/kingfisher-collect/pull/944/files#diff-1d540996fd6e3d41c94656e00cd66d49ee9e5951ca8fc23a00a500475958a498R428-R431

The PR before my commit would not yield anything if the root_path pointed to an empty object. However, the middleware before the PR would yield an item (like after my commit).

@yolile
Copy link
Member Author

yolile commented Jun 7, 2022

I'm not sure what the appropriate logic should be for the last test case

Hmm, for data use, I guess it is better to drop/not yield empty objects. However, for checking the data and informing the publisher, I think leaving the data as it comes is preferred.

@jpmckinney
Copy link
Member

Yes - and hopefully it's rarely or never the case that it is empty, so shouldn't be too much a problem for users.

@yolile yolile merged commit 5576124 into main Jun 8, 2022
@yolile yolile deleted the 942-improve-root-path-middleware branch June 8, 2022 00:06
@jpmckinney
Copy link
Member

FYI, I added a commit to not yield anything if a root_path points to an empty array, to simplify the logic 862dd0b

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Limit the number of FileItem with spider middlewares
2 participants