The new version seems to introduce new issues #71

Closed
D4Vinci opened this issue Dec 7, 2024 · 8 comments

Comments

@D4Vinci

D4Vinci commented Dec 7, 2024

After updating, the library always crashes on the page.goto method with this error:

File ~/.venv/lib/python3.12/site-packages/scrapling/engines/pw.py:222, in PlaywrightEngine.fetch(self, url)
    219     page.add_init_script(path=js_bypass_path('screen_props.js'))
    220     page.add_init_script(path=js_bypass_path('playwright_fingerprint.js'))
--> 222 res = page.goto(url, referer=generate_convincing_referer(url) if self.google_search else None)
    223 page.wait_for_load_state(state="domcontentloaded")
    224 if self.network_idle:

File ~/.venv/lib/python3.12/site-packages/rebrowser_playwright/sync_api/_generated.py:9006, in Page.goto(self, url, timeout, wait_until, referer)
   8945 def goto(
   8946     self,
   8947     url: str,
   (...)
   8953     referer: typing.Optional[str] = None,
   8954 ) -> typing.Optional["Response"]:
   8955     """Page.goto
   8956 
   8957     Returns the main resource response. In case of multiple redirects, the navigation will resolve with the first
   (...)
   9002     Union[Response, None]
   9003     """
   9005     return mapping.from_impl_nullable(
-> 9006         self._sync(
   9007             self._impl_obj.goto(
   9008                 url=url, timeout=timeout, waitUntil=wait_until, referer=referer
   9009             )
   9010         )
   9011     )

    [... skipping hidden 1 frame]

File ~/.venv/lib/python3.12/site-packages/rebrowser_playwright/_impl/_page.py:551, in Page.goto(self, url, timeout, waitUntil, referer)
    544 async def goto(
    545     self,
    546     url: str,
   (...)
    549     referer: str = None,
    550 ) -> Optional[Response]:
--> 551     return await self._main_frame.goto(**locals_to_params(locals()))

File ~/.venv/lib/python3.12/site-packages/rebrowser_playwright/_impl/_frame.py:145, in Frame.goto(self, url, timeout, waitUntil, referer)
    135 async def goto(
    136     self,
    137     url: str,
   (...)
    140     referer: str = None,
    141 ) -> Optional[Response]:
    142     return cast(
    143         Optional[Response],
    144         from_nullable_channel(
--> 145             await self._channel.send("goto", locals_to_params(locals()))
    146         ),
    147     )

File ~/.venv/lib/python3.12/site-packages/rebrowser_playwright/_impl/_connection.py:61, in Channel.send(self, method, params)
     60 async def send(self, method: str, params: Dict = None) -> Any:
---> 61     return await self._connection.wrap_api_call(
     62         lambda: self._inner_send(method, params, False),
     63         self._is_internal_type,
     64     )

File ~/.venv/lib/python3.12/site-packages/rebrowser_playwright/_impl/_connection.py:528, in Connection.wrap_api_call(self, cb, is_internal)
    526     return await cb()
    527 except Exception as error:
--> 528     raise rewrite_error(error, f"{parsed_st['apiName']}: {error}") from None
    529 finally:
    530     self._api_zone.set(None)

Exception: Page.goto: Connection closed while reading from the driver

I'm using macOS and the latest Playwright browsers. Have you read about the changes to Chrome headless mode in Playwright 1.49? I think they're related.

microsoft/playwright#33566

@nwebson
Contributor

nwebson commented Dec 7, 2024

Could you share your code causing the error?

@D4Vinci
Author

D4Vinci commented Dec 7, 2024

@nwebson Here's a simplified version of the code:

from browserforge.headers import Browser, HeaderGenerator
from rebrowser_playwright.sync_api import sync_playwright

DEFAULT_STEALTH_FLAGS = ['--no-pings', '--incognito', '--test-type', '--lang=en-US', '--mute-audio', '--no-first-run', '--disable-sync', '--hide-scrollbars', '--disable-logging', '--start-maximized', '--enable-async-dns', '--disable-breakpad', '--disable-infobars', '--accept-lang=en-US', '--use-mock-keychain', '--disable-translate', '--disable-extensions', '--disable-voice-input', '--window-position=0,0', '--disable-wake-on-wifi', '--ignore-gpu-blocklist', '--enable-tcp-fast-open', '--enable-web-bluetooth', '--disable-hang-monitor', '--password-store=basic', '--disable-cloud-import', '--disable-default-apps', '--disable-print-preview', '--disable-dev-shm-usage', '--metrics-recording-only', '--disable-crash-reporter', '--disable-partial-raster', '--disable-gesture-typing', '--disable-checker-imaging', '--disable-prompt-on-repost', '--force-color-profile=srgb', '--font-render-hinting=none', '--no-default-browser-check', '--aggressive-cache-discard', '--disable-component-update', '--disable-cookie-encryption', '--disable-domain-reliability', '--disable-threaded-animation', '--disable-threaded-scrolling', '--enable-simple-cache-backend', '--disable-background-networking', '--disable-session-crashed-bubble', '--enable-surface-synchronization', '--disable-image-animation-resync', '--disable-renderer-backgrounding', '--disable-ipc-flooding-protection', '--prerender-from-omnibox=disabled', '--safebrowsing-disable-auto-update', '--disable-offer-upload-credit-cards', '--disable-features=site-per-process', '--disable-background-timer-throttling', '--disable-new-content-rendering-timeout', '--run-all-compositor-stages-before-draw', '--disable-client-side-phishing-detection', '--disable-backgrounding-occluded-windows', '--disable-layer-tree-host-memory-pressure', '--autoplay-policy=no-user-gesture-required', '--disable-offer-store-unmasked-wallet-cards', '--disable-blink-features=AutomationControlled', '--webrtc-ip-handling-policy=disable_non_proxied_udp', 
'--disable-component-extensions-with-background-pages', '--force-webrtc-ip-handling-policy=disable_non_proxied_udp', '--enable-features=NetworkService,NetworkServiceInProcess,TrustTokens,TrustTokensAlwaysAllowIssuance', '--blink-settings=primaryHoverType=2,availableHoverTypes=2,primaryPointerType=4,availablePointerTypes=4', '--disable-features=AudioServiceOutOfProcess,IsolateOrigins,site-per-process,TranslateUI,BlinkGenPropertyTrees']


def generate_headers():
    return HeaderGenerator(
        browser=[Browser(name='chrome', min_version=130)],
        os='macos',
        device='desktop'
    ).generate()


with sync_playwright() as p:
    browser = p.chromium.launch(
        headless=False, args=DEFAULT_STEALTH_FLAGS, ignore_default_args=['--enable-automation', '--disable-popup-blocking'], chromium_sandbox=True, channel='chromium'
    )
    context = browser.new_context(
        locale='en-US',
        is_mobile=False,
        has_touch=False,
        proxy=None,
        color_scheme='dark',  # Bypasses the 'prefersLightColor' check in creepjs
        user_agent=generate_headers().get('User-Agent'),
        device_scale_factor=2,
        service_workers="allow",
        ignore_https_errors=True,
        extra_http_headers={},
        screen={"width": 1920, "height": 1080},
        viewport={"width": 1920, "height": 1080},
        permissions=["geolocation", 'notifications'],
    )
    page = context.new_page()
    # Inject each Scrapling bypass script, in the same order as before
    bypasses_dir = '/Users/karim/Desktop/Scrapling/scrapling/engines/toolbelt/bypasses'
    for script in ('webdriver_fully.js', 'window_chrome.js', 'navigator_plugins.js', 'pdf_viewer.js',
                   'notification_permission.js', 'screen_props.js', 'playwright_fingerprint.js'):
        page.add_init_script(path=f'{bypasses_dir}/{script}')
    page.set_default_navigation_timeout(30000)
    page.set_default_timeout(30000)
    res = page.goto('https://antcpt.com/eng/information/demo-form/recaptcha-3-test-score.html', referer=None)
    page.wait_for_load_state(state="domcontentloaded")
    html = page.content()
    page.close()

The scripts are from the Scrapling library bypasses folder: https://github.com/D4Vinci/Scrapling/tree/main/scrapling/engines/toolbelt/bypasses

While putting this code together, I discovered that the issue is triggered by page.add_init_script when more than one script is injected. I tried disabling all the scripts and then enabling each one on its own, and the error didn't happen; it only occurs when two or more scripts are injected with page.add_init_script.
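The isolation step described above (each script alone passes, several together crash) can be automated. This is a minimal sketch, assuming a hypothetical `run_with_scripts` callable that you supply yourself: it should open a fresh page, inject the given scripts with `page.add_init_script`, call `page.goto`, and let any exception propagate. Neither `find_failing_sets` nor `run_with_scripts` is part of Scrapling or rebrowser-playwright.

```python
from itertools import combinations
from typing import Callable, Sequence


def find_failing_sets(
    scripts: Sequence[str],
    run_with_scripts: Callable[[Sequence[str]], None],
    max_size: int = 2,
) -> list[tuple[str, ...]]:
    """Try every combination of up to `max_size` scripts and return those that raise.

    `run_with_scripts` is a hypothetical, caller-supplied callable: it should open a
    fresh page, inject each script with page.add_init_script, navigate, and raise
    on failure (e.g. "Page.goto: Connection closed while reading from the driver").
    """
    failing = []
    for size in range(1, max_size + 1):
        for combo in combinations(scripts, size):
            try:
                run_with_scripts(combo)
            except Exception:
                failing.append(combo)
    return failing
```

If no single script shows up in the result but every pair does, the number of `add_init_script` calls, rather than any one script's contents, is the likely trigger, which matches what was observed here.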

Tested the same code without page.add_init_script: no error.
Tested the same code with plain Playwright instead of rebrowser_playwright: no error.
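For anyone stuck on a rebrowser-playwright build that predates the fix in rebrowser-patches 1.0.17, one possible stopgap, given that a single add_init_script call did not crash in the tests above, is to merge all the bypass scripts into one source string and inject it once. This is a sketch under assumptions: each script is wrapped in its own function scope so top-level `const`/`let` names cannot collide, which also means top-level declarations no longer become globals; scripts that rely on that would need adjusting. `combine_init_scripts` and `add_combined_init_script` are hypothetical helper names; only `page.add_init_script(script=...)` is the real Playwright API.

```python
from pathlib import Path
from typing import Iterable


def combine_init_scripts(sources: Iterable[str]) -> str:
    """Merge several JS sources into one script, giving each its own scope.

    Each source runs inside its own arrow-function IIFE, so top-level const/let
    names from different scripts cannot collide. Assumption: no script relies on
    its top-level declarations becoming globals. The surrounding try/catch keeps
    one throwing script from stopping the ones after it, roughly matching the
    behaviour of separate add_init_script calls.
    """
    wrapped = [
        "try {\n(() => {\n" + src + "\n})();\n} catch (e) {\nconsole.error(e);\n}"
        for src in sources
    ]
    return "\n".join(wrapped)


def add_combined_init_script(page, paths: Iterable[str]) -> None:
    """Read each file and inject all of them with a single add_init_script call."""
    sources = [Path(p).read_text(encoding="utf-8") for p in paths]
    page.add_init_script(script=combine_init_scripts(sources))
```

In the repro, this would replace the seven separate `page.add_init_script(path=...)` calls with one `add_combined_init_script(page, paths)` call, where `paths` holds the same seven file paths in the same order. Upgrading to a release that includes the fix remains the cleaner option.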

@vlrevolution

Are those scripts really needed? They seem to duplicate what rebrowser is already doing.

@D4Vinci
Author

D4Vinci commented Dec 8, 2024

@vlrevolution That's not true; these scripts add a lot that the current patches don't cover. But regardless, anyone who calls page.add_init_script twice will hit this issue, so there's no point in dodging it.

@vlrevolution

I see, and yes, of course; that was off-topic. The root issue remains.

@D4Vinci
Copy link
Author

D4Vinci commented Dec 10, 2024

Hey @nwebson @vlrevolution, any updates on this?

@nwebson
Copy link
Contributor

nwebson commented Dec 10, 2024

Thanks for reporting this bug; I just fixed it in the new release:
https://github.com/rebrowser/rebrowser-patches/releases/tag/1.0.17

nwebson closed this as completed Dec 10, 2024
@D4Vinci
Author

D4Vinci commented Dec 10, 2024

Thanks @nwebson, I've just tested it as well.
