Skip to content

PlaywrightCrawler silently switches POST requests to GET #1201

@phughesion-h3

Description

@phughesion-h3

The POST request example doesn't work with PlaywrightCrawler. Instead, it silently gets switched to a GET request in the background, and the Crawlee Request object still has the original request data.

Result: context.request is the original Request object that you prepared (a POST request), but context.response is the playwright Response object that corresponds to the GET request's response, so the response you get is not actually for the request you intended to make.

Questions:

  1. Is playwright capable of making POST requests?
  2. This should not fail/switch silently and should scream loudly at the user that this happens under the hood.

If you switch the below example to use ParselCrawler instead, it works as expected.

import asyncio
from urllib.parse import urlencode

from crawlee import Request
from crawlee.crawlers import ParselCrawler, ParselCrawlingContext, PlaywrightCrawler, PlaywrightCrawlingContext
from crawlee.proxy_configuration import ProxyConfiguration

crawler = PlaywrightCrawler(
    proxy_configuration=ProxyConfiguration(
        proxy_urls=["http://localhost:8080"]
    )
)

@crawler.router.default_handler
async def default_handler(context: PlaywrightCrawlingContext) -> None:
    print("Default handler called")
    if isinstance(context, ParselCrawlingContext):
        print(f"Headers: {context.http_response.headers}")
    elif isinstance(context, PlaywrightCrawlingContext):
        print(f"Headers: {context.response.headers}")

async def main() -> None:
    # Prepare a POST request to the form endpoint.
    request = Request.from_url(
        url='http://localhost:80/login',
        method='POST',
        headers={'content-type': 'application/x-www-form-urlencoded'},
        payload=urlencode(
            {
                'username': 'admin',
                'password': 'password123'
            }
        ).encode('utf-8'),

    )

    await crawler.run([request])

if __name__ == '__main__':
    asyncio.run(main())

Metadata

Metadata

Assignees

Labels

bugSomething isn't working.t-toolingIssues with this label are in the ownership of the tooling team.

Type

No type

Projects

No projects

Relationships

None yet

Development

No branches or pull requests

Issue actions