refactor!: Preparation for v2 release #210

Merged on Aug 27, 2024 (69 commits)
Changes from 52 commits
Commits
70ff5cf
Update pyproject.toml
janbuchar Jun 4, 2024
12d3a09
Use storage code from crawlee, add ApifyStorageClient
janbuchar Jun 4, 2024
fd848e9
Consolidate config files
janbuchar Jun 5, 2024
5787323
Remove obsolete tests
janbuchar Jun 5, 2024
38debc4
Use id argument
janbuchar Jun 6, 2024
051373c
Remove obsolete utils
janbuchar Jun 7, 2024
b026bc0
Rework configuration parsing
janbuchar Jun 7, 2024
79817e7
Implement force_cloud option
janbuchar Jun 10, 2024
fe70d9a
Fix Actor.get_env
janbuchar Jun 10, 2024
01a3d23
Use RecurringTask from crawlee
janbuchar Jun 10, 2024
c943210
Use timedelta instead of numbers of seconds/milliseconds in Actor class
janbuchar Jun 10, 2024
57a402c
Turns out there shouldn't be any recurring tasks in Actor
janbuchar Jun 12, 2024
1620dfe
Implement PlatformEventManager
janbuchar Jun 12, 2024
fd20bba
Use LocalEventManager when not on platform
janbuchar Jun 13, 2024
ee0e959
Remove dual properties and methods
janbuchar Jun 13, 2024
a084bcf
Use crypto utils from crawlee where possible
janbuchar Jun 13, 2024
4b3b581
Remove obsolete tests
janbuchar Jun 13, 2024
47c2c13
Migrate ProxyConfiguration
janbuchar Jun 13, 2024
7abbe38
Use correct types in RequestQueueClient
janbuchar Jun 18, 2024
b5765d4
Update Actor class
janbuchar Jun 27, 2024
23b822d
Update ProxyConfiguration class
janbuchar Jun 27, 2024
2988a15
Add all known and supported config options
janbuchar Jun 27, 2024
697087e
Update tests
janbuchar Jun 27, 2024
c565adb
Use crawlee 0.10
janbuchar Jul 16, 2024
9743d4b
Use newer python in CI
janbuchar Jul 16, 2024
dfaadb3
Merge remote-tracking branch 'origin/master' into v2
janbuchar Jul 17, 2024
4057752
Resolve lint errors
janbuchar Jul 17, 2024
6b9c93f
mypy: ignore assignment to method
janbuchar Jul 17, 2024
6b45143
Ignore untyped imports from scrapy
janbuchar Aug 6, 2024
ea7b418
Fix scrapy integration
janbuchar Aug 7, 2024
e228a44
Fix type errors in integration tests
janbuchar Aug 7, 2024
087fb12
Increase crawlee dependency version
janbuchar Aug 7, 2024
dbacef7
Fix more type errors
janbuchar Aug 7, 2024
5a57a14
Update test
janbuchar Aug 7, 2024
2714e8d
Remove obsolete test case
janbuchar Aug 8, 2024
34a9e62
Fix handling of configuration on platform
janbuchar Aug 9, 2024
0547bb3
Hackily fix cloud storage usage
janbuchar Aug 9, 2024
57057d2
Fix force_cloud tests
janbuchar Aug 9, 2024
325fdd0
Fix interval events
janbuchar Aug 9, 2024
a388cc1
Fix unit test
janbuchar Aug 9, 2024
60f5468
Lint
janbuchar Aug 9, 2024
7a829d3
Fix actor log test
janbuchar Aug 9, 2024
76a5ea5
Fix get_public_url test
janbuchar Aug 9, 2024
7488a11
Fix actor_lifecycle test
janbuchar Aug 9, 2024
d11cb4f
Fix request queue stuff
janbuchar Aug 9, 2024
b7e7d8c
Remove old consts
janbuchar Aug 12, 2024
855972b
Use CrawleeLogFormatter
janbuchar Aug 12, 2024
9aa4ea3
Remove asserts
janbuchar Aug 12, 2024
01c1dc7
Update test
janbuchar Aug 12, 2024
845071f
actor -> Actor
janbuchar Aug 12, 2024
10bb6d5
Update to work with a future Crawlee
janbuchar Aug 15, 2024
391c8de
Clear services on actor exit
janbuchar Aug 15, 2024
1650b7a
Remove Actor.main
janbuchar Aug 22, 2024
0690a7f
Update log format in test
janbuchar Aug 23, 2024
16cdd93
Exclude new Request fields from API request bodies
janbuchar Aug 23, 2024
1edbbfd
Keep a client_key
janbuchar Aug 23, 2024
1effda8
Update for compatibility with Crawlee 0.3
janbuchar Aug 26, 2024
9d70986
Reorganize imports
janbuchar Aug 26, 2024
e3c39e8
Comment
janbuchar Aug 26, 2024
f900c58
Put Actor() back
janbuchar Aug 26, 2024
cf53178
Replace Werkzeug to save some disk space
janbuchar Aug 26, 2024
85b5603
event_manager -> _platform_event_manager
janbuchar Aug 26, 2024
68903cb
Hide non-public members
janbuchar Aug 26, 2024
c944c15
Hide non-public parts of apify_storage_client
janbuchar Aug 26, 2024
c129c68
Remove useless type info from docstrings
janbuchar Aug 27, 2024
64159dc
Add Configuration field descriptions
janbuchar Aug 27, 2024
e4261ae
Shuffle around some imports
janbuchar Aug 27, 2024
671b2ce
Remove obsolete stuff
janbuchar Aug 27, 2024
53c06a6
Fix integration tests
janbuchar Aug 27, 2024
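Commit c943210 above replaces plain numbers of seconds/milliseconds with `timedelta` in the Actor class. The diff for that commit is not shown here, so the following is only a minimal sketch of what such a change tends to look like for calling code; `seconds_to_wait` is a hypothetical helper, not part of the SDK.

```python
from __future__ import annotations

from datetime import timedelta


def seconds_to_wait(timeout: timedelta | None = None) -> float | None:
    """Hypothetical helper: convert an optional timedelta to float seconds."""
    return None if timeout is None else timeout.total_seconds()


# With a bare number the caller has to remember whether the unit is seconds
# or milliseconds. With timedelta the unit is explicit at the call site.
print(seconds_to_wait(timedelta(minutes=1, seconds=30)))  # 90.0
```

The conversion to a float happens in one place, at the boundary where a lower-level API needs a number.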
2 changes: 1 addition & 1 deletion CONTRIBUTING.md
@@ -50,7 +50,7 @@ tests with HTML coverage report execute `make unit-tests-cov`.

## Integration tests

-We have integration tests which build and run actors using the Python SDK on the Apify Platform. To run these tests,
+We have integration tests which build and run Actors using the Python SDK on the Apify Platform. To run these tests,
you need to set the `APIFY_TEST_USER_API_TOKEN` environment variable to the API token of the Apify user you want to
use for the tests, and then start them with `make integration-tests`.

2 changes: 1 addition & 1 deletion docs/02-guides/02-beautiful-soup.mdx
@@ -36,7 +36,7 @@ async def main():
max_depth = actor_input.get('max_depth', 1)

if not start_urls:
-Actor.log.info('No start URLs specified in actor input, exiting...')
+Actor.log.info('No start URLs specified in Actor input, exiting...')
await Actor.exit()

# Enqueue the starting URLs in the default request queue
4 changes: 2 additions & 2 deletions docs/02-guides/03-playwright.mdx
@@ -29,7 +29,7 @@ To create Actors which use Playwright, start from the [Playwright & Python](http
On the Apify platform, the Actor will already have Playwright and the necessary browsers preinstalled in its Docker image,
including the tools and setup necessary to run browsers in headful mode.

-When running the Actor locally, you'll need to finish the Playwright setup yourself before you can run the actor.
+When running the Actor locally, you'll need to finish the Playwright setup yourself before you can run the Actor.

<Tabs groupId="operating-systems">
<TabItem value="unix" label="Linux / macOS" default>
@@ -69,7 +69,7 @@ async def main():
max_depth = actor_input.get('max_depth', 1)

if not start_urls:
-Actor.log.info('No start URLs specified in actor input, exiting...')
+Actor.log.info('No start URLs specified in Actor input, exiting...')
await Actor.exit()

# Enqueue the starting URLs in the default request queue
2 changes: 1 addition & 1 deletion docs/02-guides/04-selenium.mdx
@@ -53,7 +53,7 @@ async def main():
max_depth = actor_input.get('max_depth', 1)

if not start_urls:
-Actor.log.info('No start URLs specified in actor input, exiting...')
+Actor.log.info('No start URLs specified in Actor input, exiting...')
await Actor.exit()

# Enqueue the starting URLs in the default request queue
2 changes: 1 addition & 1 deletion docs/02-guides/05-scrapy.mdx
@@ -87,7 +87,7 @@ class TitleSpider(scrapy.Spider):
if link_url.startswith(('http://', 'https://')):
yield scrapy.Request(link_url)

-# Pushes the scraped items into the actor's default dataset
+# Pushes the scraped items into the Actor's default dataset
class ActorDatasetPushPipeline:
async def process_item(self, item, spider):
item_dict = ItemAdapter(item).asdict()
2 changes: 1 addition & 1 deletion docs/03-concepts/04-actor-events.mdx
@@ -91,7 +91,7 @@ async def main():
# Save the state when the `PERSIST_STATE` event happens
async def save_state(event_data):
nonlocal processed_items
-Actor.log.info('Saving actor state', extra=event_data)
+Actor.log.info('Saving Actor state', extra=event_data)
await Actor.set_value('STATE', processed_items)

Actor.on(ActorEventTypes.PERSIST_STATE, save_state)
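The hunk above registers a listener that saves a counter whenever the persist-state event fires. As a rough illustration of that pattern without the SDK, the sketch below uses a hypothetical `TinyEventEmitter` and a plain dict in place of the key-value store; the event name `'persistState'` and the payload are assumptions made for the example, not taken from this diff.

```python
import asyncio
from typing import Any, Awaitable, Callable

Listener = Callable[[Any], Awaitable[None]]


class TinyEventEmitter:
    """Hypothetical stand-in for an event manager; not the Apify API."""

    def __init__(self) -> None:
        self._listeners: dict[str, list[Listener]] = {}

    def on(self, event: str, listener: Listener) -> None:
        # Register an async listener for the given event name.
        self._listeners.setdefault(event, []).append(listener)

    async def emit(self, event: str, data: Any) -> None:
        # Await every listener registered for this event, in order.
        for listener in self._listeners.get(event, []):
            await listener(data)


async def main() -> dict[str, Any]:
    store: dict[str, Any] = {}  # stands in for the key-value store
    emitter = TinyEventEmitter()
    processed_items = 0

    async def save_state(event_data: Any) -> None:
        # The closure reads the counter's current value when the event fires.
        store['STATE'] = processed_items

    emitter.on('persistState', save_state)  # assumed event name

    for _ in range(3):
        processed_items += 1

    await emitter.emit('persistState', {'isMigrating': False})  # assumed payload
    return store


print(asyncio.run(main()))  # {'STATE': 3}
```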
22 changes: 0 additions & 22 deletions mypy.ini

This file was deleted.

36 changes: 34 additions & 2 deletions pyproject.toml
@@ -1,6 +1,6 @@
[project]
name = "apify"
-version = "1.7.3"
+version = "2.0.0"
description = "Apify SDK for Python"
readme = "README.md"
license = { text = "Apache Software License" }
@@ -20,7 +20,7 @@ classifiers = [
"Topic :: Software Development :: Libraries",
]

-requires-python = ">=3.8"
+requires-python = ">=3.9"

# We use inclusive ordered comparison clause for non-Apify packages intentionally in order to enhance the Apify SDK's
# compatibility with a wide range of external packages. This decision was discussed in detail in the following PR:
@@ -31,13 +31,15 @@ dependencies = [
"aiofiles >= 22.1.0",
"aioshutil >= 1.0",
"colorama >= 0.4.6",
+"crawlee >= 0.3.0",
"cryptography >= 39.0.0",
"httpx >= 0.24.0",
"psutil >= 5.9.0",
"pyee >= 11.0.0",
"sortedcollections >= 2.0.0",
"typing-extensions >= 4.1.0",
"websockets >= 10.1",
+"werkzeug >= 3.0.0",
]

[project.optional-dependencies]
@@ -90,12 +92,16 @@ line-length = 150
[tool.ruff.lint]
select = ["ALL"]
ignore = [
"A002", # Argument is shadowing a Python builtin
"ANN101", # Missing type annotation for `self` in method
"ANN102", # Missing type annotation for `cls` in method
"ANN401", # Dynamically typed expressions (typing.Any) are disallowed in {filename}
"BLE001", # Do not catch blind exception
"C901", # `{name}` is too complex
"COM812", # This rule may cause conflicts when used with the formatter
"D100", # Missing docstring in public module
"D104", # Missing docstring in public package
"D107", # Missing docstring in `__init__`
"EM", # flake8-errmsg
"G004", # Logging statement uses f-string
"ISC001", # This rule may cause conflicts when used with the formatter
@@ -155,3 +161,29 @@ known-local-folder = ["apify"]

[tool.ruff.lint.pydocstyle]
convention = "google"
+
+[tool.basedpyright]
+typeCheckingMode = "standard"
+
+[tool.pytest.ini_options]
+asyncio_mode = "auto"
+timeout = 1200
+
+[tool.mypy]
+python_version = "3.9"
+plugins = ["pydantic.mypy"]
+files = ["scripts", "src", "tests"]
+check_untyped_defs = true
+disallow_incomplete_defs = true
+disallow_untyped_calls = true
+disallow_untyped_decorators = true
+disallow_untyped_defs = true
+no_implicit_optional = true
+warn_redundant_casts = true
+warn_return_any = true
+warn_unreachable = true
+warn_unused_ignores = true
+
+[[tool.mypy.overrides]]
+module = ['scrapy', 'scrapy.*', 'sortedcollections']
+ignore_missing_imports = true
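For readers less familiar with these mypy flags, here is a small sketch of code written to satisfy `disallow_untyped_defs` and `no_implicit_optional` as configured above. The function `clamp` is purely illustrative and does not come from this repository.

```python
from __future__ import annotations

from typing import Optional


def clamp(value: int, upper: Optional[int] = None) -> int:
    """Return value, limited to upper when an upper bound is given."""
    # Under no_implicit_optional, `upper: int = None` would be rejected,
    # so the Optional is spelled out. Under disallow_untyped_defs, every
    # parameter and the return type need an annotation.
    if upper is None:
        return value
    return min(value, upper)


print(clamp(5), clamp(5, 3))  # 5 3
```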
3 changes: 0 additions & 3 deletions pytest.ini

This file was deleted.

10 changes: 2 additions & 8 deletions src/apify/_crypto.py
@@ -1,10 +1,10 @@
from __future__ import annotations

import base64
-import secrets
from typing import Any

from apify_shared.utils import ignore_docs
+from crawlee._utils.crypto import crypto_random_object_id
from cryptography.exceptions import InvalidTag as InvalidTagException
from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric import padding, rsa
@@ -125,13 +125,7 @@ def _load_public_key(public_key_file_base64: str) -> rsa.RSAPublicKey:
    return public_key


-def crypto_random_object_id(length: int = 17) -> str:
-    """Python reimplementation of cryptoRandomObjectId from `@apify/utilities`."""
-    chars = 'abcdefghijklmnopqrstuvwxyzABCEDFGHIJKLMNOPQRSTUVWXYZ0123456789'
-    return ''.join(secrets.choice(chars) for _ in range(length))
-
-
-def decrypt_input_secrets(private_key: rsa.RSAPrivateKey, input: Any) -> Any:  # noqa: A002
+def decrypt_input_secrets(private_key: rsa.RSAPrivateKey, input: Any) -> Any:
    """Decrypt input secrets."""
    if not isinstance(input, dict):
        return input
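This hunk drops the local `crypto_random_object_id` helper, and the import hunk above pulls the same name from `crawlee._utils.crypto` instead. For reference, a standalone sketch of what the removed helper did is given below; `random_object_id` and `ALPHABET` are illustrative names, and nothing here claims to match crawlee's implementation.

```python
import secrets
import string

# Letters and digits: the same 62 characters as the removed helper's alphabet.
ALPHABET = string.ascii_letters + string.digits


def random_object_id(length: int = 17) -> str:
    """Illustrative equivalent of the removed helper, using the secrets module."""
    # secrets.choice draws from a cryptographically strong random source.
    return ''.join(secrets.choice(ALPHABET) for _ in range(length))


print(len(random_object_id()))  # 17
```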
3 changes: 0 additions & 3 deletions src/apify/_memory_storage/__init__.py

This file was deleted.

71 changes: 0 additions & 71 deletions src/apify/_memory_storage/file_storage_utils.py

This file was deleted.
