The timeout is not being respected #811

Open
matheus-rossi opened this issue Nov 19, 2024 · 9 comments

Comments

@matheus-rossi

Describe the bug
The timeout set on the configuration is not being respected.

Expected behavior
Timeout to be respected

Code

from scrapegraphai.graphs import SmartScraperGraph

config = {
    "llm": {
        "api_key": "my_key",
        "model": "google_genai/gemini-1.5-flash",
        "timeout": 120,
        "temperature": 0,
    },
    "timeout": 120,
}
response = SmartScraperGraph(
    prompt="MY_PROMPT",
    source="MY_HTML",
    config=config,
).run()

Errors:

Chunk Processing

Timeout error: Response took longer than 30 seconds
2024-11-19 10:25:18.405 | ERROR    | invalid literal for int() with base 10: 'Response timeout exceeded during chunk processing'

or

Merge

Timeout error: Response took longer than 30 seconds
2024-11-19 10:28:17.721 | ERROR    |  invalid literal for int() with base 10: 'Response timeout exceeded during merge'
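
The second line of each error is the message Python emits when `int()` receives a non-numeric string. That suggests the timeout text is reaching a place where a number is expected, though this is only an assumption, since the thread does not show the code path involved. A minimal reproduction of the message itself, independent of ScrapeGraphAI:

```python
# Reproduces only the ValueError text seen in the log; it does not
# call ScrapeGraphAI or claim to mirror its internals.
message = "Response timeout exceeded during merge"

try:
    int(message)
except ValueError as exc:
    error_text = str(exc)

print(error_text)
```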

@matheus-rossi changed the title from "Timeout not being respected" to "The timeout is not being respected" on Nov 19, 2024

@VinciGit00
Collaborator

Hi @matheus-rossi, thank you for the feedback.

@VinciGit00
Collaborator

Please update to the new beta

@matheus-rossi
Author

I've updated to scrapegraphai = "v1.31.1-beta.1"

I tried a lot of configs, for example:

config = {
    "llm": {
        "api_key": "key",
        "model": "google_genai/gemini-1.5-flash-latest",
        "timeout": 120,
        "temperature": 0,
    },
    "timeout": 120,
    "verbose": verbose,
}

but I am still getting the same error:

Timeout error: Response took longer than 30 seconds
2024-11-20 12:02:45.151 | ERROR    | src.utils.aiutils:_scrape_content:328 - invalid literal for int() with base 10: 'Response timeout exceeded'
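
For reference, the behavior expected here is that a configured `timeout` takes precedence over the 30-second value named in the error. A minimal sketch of that lookup follows. The helper `run_with_timeout`, the constant `DEFAULT_TIMEOUT`, and `slow_call` are hypothetical names for illustration and are not part of ScrapeGraphAI:

```python
import concurrent.futures
import time

# Assumed fallback, taken from the "longer than 30 seconds" error text.
DEFAULT_TIMEOUT = 30


def run_with_timeout(func, config, *args):
    """Run func(*args), waiting at most config["timeout"] seconds."""
    # The configured value wins; the default applies only when it is absent.
    timeout = config.get("timeout", DEFAULT_TIMEOUT)
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    try:
        future = pool.submit(func, *args)
        # Raises concurrent.futures.TimeoutError if the call is too slow.
        return future.result(timeout=timeout)
    finally:
        # Return immediately instead of waiting for a still-running worker.
        pool.shutdown(wait=False)


def slow_call(seconds):
    """Stand-in for a slow LLM request."""
    time.sleep(seconds)
    return "done"
```

With `{"timeout": 120}` the wait limit is 120 seconds, and with `{}` it falls back to 30. Note that the worker thread is not killed on timeout; the caller simply stops waiting for it.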

@bongoexe

Same issue here! I can't fix it

@VinciGit00
Collaborator

Ah ok I understand now

@VinciGit00
Collaborator

VinciGit00 commented Nov 20, 2024

Please try again with the new beta.

@marcovins

Same bug here!

@matheus-rossi
Author

The version scrapegraphai = "v1.31.1-beta.2" works for me, but scrapegraphai = "v1.31.1" introduces a new bug in my use case.

When I pass the HTML to the SmartScraperGraph, it complains that my HTML is not a valid URL.

You can see the related changes here: GitHub Pull Request.

@FayzulSaimun

FayzulSaimun commented Nov 24, 2024

@matheus-rossi I had the same issue with local HTML. I had to change the URL validation rules:

if not bool(re.match(url_pattern, source)):
    if source.startswith("<!DOCTYPE html>") or source.startswith("<html"):
        return False  # Skip URL validation for local HTML content
    raise ValueError(
        f"Invalid URL format: {source}. URL must start with http(s):// and contain a valid domain."
    )
return True
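
The fragment above sits inside a function and relies on a `url_pattern` that is not shown in this thread. A self-contained sketch of the same idea, assuming a simplified hypothetical pattern and a hypothetical function name `validate_source` (neither is taken from the library):

```python
import re

# Hypothetical, simplified pattern; the library's real `url_pattern`
# is not shown in this thread and may be stricter.
URL_PATTERN = re.compile(r"^https?://[^\s/$.?#][^\s]*$", re.IGNORECASE)


def validate_source(source: str) -> bool:
    """Return True for a URL and False for inline HTML; raise otherwise."""
    if URL_PATTERN.match(source):
        return True
    # Tolerate leading whitespace and any letter case in the HTML prefix.
    if source.lstrip().lower().startswith(("<!doctype html", "<html")):
        return False  # Local HTML content, so URL validation is skipped
    raise ValueError(
        f"Invalid URL format: {source}. "
        "URL must start with http(s):// and contain a valid domain."
    )
```

Here `validate_source("https://example.com")` returns `True`, an `<html>...` string returns `False`, and anything else raises `ValueError`. Unlike the original fragment, this version also accepts HTML with leading whitespace or a lowercase doctype.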
