Hi there,
While testing the OpenDeepSearch code, I found that the WebScraper does not return useful results: the content variable in ExtractionResult is simply None. Is this the expected behavior?
I turned on the debug flag in the WebScraper class and found the following error:
Using Jina Reranker
Debug: Attempting extraction with strategy: no_extraction
Debug: URL: https://www.nps.gov/articles/000/july-2nd-1881-a-second-assassination.htm
Debug: Strategy config: <crawl4ai.extraction_strategy.NoExtractionStrategy object at 0x7cd6a9885390>
[INIT].... → Crawl4AI 0.5.0.post4
[FETCH]... ↓ https://www.nps.gov/articles/000/july-2nd-1881-a-s... | Status: True | Time: 2.03s
[SCRAPE].. ◆ https://www.nps.gov/articles/000/july-2nd-1881-a-s... | Time: 0.115s
[COMPLETE] ● https://www.nps.gov/articles/000/july-2nd-1881-a-s... | Status: True | Total: 2.16s
extraction_config.name no_extraction
Debug: Processed content: None
Debug: Exception occurred during extraction:
Traceback (most recent call last):
File "/content/OpenDeepSearch/src/opendeepsearch/context_scraping/crawl4ai_scraper.py", line 183, in extract
extraction_result.raw_markdown_length = len(result.markdown_v2.raw_markdown)
^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/crawl4ai/async_webcrawler.py", line 72, in __getattr__
return getattr(self._results[0], attr)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/pydantic/main.py", line 986, in __getattr__
return super().__getattribute__(item) # Raises AttributeError if appropriate
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/crawl4ai/models.py", line 216, in markdown_v2
raise AttributeError(
AttributeError: The 'markdown_v2' attribute is deprecated and has been removed. Please use 'markdown' instead, which now returns a MarkdownGenerationResult, with
following properties:
- raw_markdown: The raw markdown string
- markdown_with_citations: The markdown string with citations
- references_markdown: The markdown string with references
- fit_markdown: The markdown string with fit text
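It seems that the markdown_v2 attribute has been removed in the installed Crawl4AI version (0.5.0.post4, per the log above), and the access to result.markdown_v2.raw_markdown at line 183 of crawl4ai_scraper.py is what raises the exception. Any ideas?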
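For reference, here is a minimal sketch of a possible workaround, based only on what the error message says (that `markdown` now returns an object with a `raw_markdown` property). The helper name `get_raw_markdown` and the two stand-in result classes are hypothetical and exist only for illustration; they are not part of Crawl4AI or OpenDeepSearch, and I have not checked this against every Crawl4AI release.

```python
from types import SimpleNamespace
from typing import Any, Optional


def get_raw_markdown(result: Any) -> Optional[str]:
    """Return the raw markdown string from a crawl result, or None.

    Hypothetical helper: tries the attribute layout named in the
    deprecation message first, then falls back to the older one.
    """
    # Layout described in the error message: result.markdown.raw_markdown
    markdown = getattr(result, "markdown", None)
    raw = getattr(markdown, "raw_markdown", None)
    if isinstance(raw, str):
        return raw

    # Older layout: result.markdown_v2.raw_markdown. getattr() with a
    # default swallows the AttributeError raised by the removed attribute.
    legacy = getattr(result, "markdown_v2", None)
    raw = getattr(legacy, "raw_markdown", None)
    if isinstance(raw, str):
        return raw

    # Last resort: markdown itself may be a plain string.
    return markdown if isinstance(markdown, str) else None


class NewStyleResult:
    """Stand-in that mimics the behavior seen in the traceback."""

    markdown = SimpleNamespace(raw_markdown="# new")

    @property
    def markdown_v2(self):
        raise AttributeError("The 'markdown_v2' attribute is deprecated")


class OldStyleResult:
    """Stand-in for a result that still exposes markdown_v2."""

    markdown = "# old"
    markdown_v2 = SimpleNamespace(raw_markdown="# old raw")


if __name__ == "__main__":
    print(get_raw_markdown(NewStyleResult()))
    print(get_raw_markdown(OldStyleResult()))
```

With something like this, the failing line could compute the length from `get_raw_markdown(result)` and treat `None` as length 0, instead of reading `result.markdown_v2` directly.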