-
Notifications
You must be signed in to change notification settings - Fork 11
feat: Handle request list user input #326
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request is neither linked to an issue or epic nor labeled as adhoc!
Fix typing issues WIP
TODO: Finish test for it. WIP
Add test for checking all genrated request properties.
Add helper class for input keys Add top level Input class for handling actor inputs.
Add few more tests for regex
It had too many assumptions that users might not be interested in. Users should create such Input helper classes based on their specific inputs and their names.
Update TCH001, TCH002, TCH003 uses.
c237c66
to
6ff9e90
Compare
Sorry for the force-push. Due to my previous mainly Gerrit-based experience I was expecting this Rebase button in Github to do something more clean in the log... |
@@ -141,6 +141,9 @@ indent-style = "space" | |||
docstring-quotes = "double" | |||
inline-quotes = "single" | |||
|
|||
[tool.ruff.lint.flake8-type-checking] | |||
runtime-evaluated-base-classes = ["pydantic.BaseModel", "crawlee.configuration.Configuration"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even though crawlee.configuration.Configuration inherits from pydantic_settings.BaseSettings and that one from pydantic.BaseModel, ruff has some issues in following inheritance hierarchy too far. So until that changes, some models will have to be explicitly mentioned even though they inherit from pydantic.BaseModel.
See closed issue where this is described as known limitation to certain degree:
astral-sh/ruff#8725
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking pretty good! Just a bunch of nits to iron out...
src/apify/storages/request_list.py
Outdated
headers=request_input.headers, | ||
user_data=request_input.user_data, | ||
) | ||
for request_input in simple_url_inputs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Btw thinking about this class reminded me of this: apify/crawlee#2466
Basically, in JS Crawlee, we used to instantiate all the requests at the beginning (like here), but that ate up much more memory than just keeping the POJO objects (I guess dict
s in Python) and instantiating the Request
class only once we really need it. Maybe Python handles this better (has slimmer class instances).
It was at that time we learned that users love to make >1.000.000 requests long RLs 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is a valid point, but I think it will be done later in apify/crawlee-python#99 as the current class is openly not as mature as the JS version:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, let's make sure the SDK supports everything the platform can offer first, then we can optimize.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One last suggestion 🙂
src/apify/storages/_request_list.py
Outdated
url_input_adapter = TypeAdapter(list[Union[_RequestsFromUrlInput, _SimpleUrlInput]]) | ||
|
||
|
||
# @docs_group('Classes') # Not yet available in crawlee |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PR here #328
src/apify/storages/request_list.py
Outdated
headers=request_input.headers, | ||
user_data=request_input.user_data, | ||
) | ||
for request_input in simple_url_inputs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, let's make sure the SDK supports everything the platform can offer first, then we can optimize.
Co-authored-by: Jan Buchar <jan.buchar@apify.com>
Add helper function to handle request list inputs.
Closes: #310