I used the GitHub search to find a similar question and didn't find it.
I am sure that this is a bug in LangChain rather than my code.
The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).
I posted a self-contained, minimal, reproducible example. A maintainer can copy it and run it AS IS.
Example Code
The following code returns URLs that still carry leading and trailing whitespace.
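from langchain_community.document_loaders import SitemapLoader

# Load only the sitemap entries that match the filter
loader = SitemapLoader(
    "https://docs.snowflake.com/sitemap.xml",
    filter_urls=["https://docs.snowflake.com/en/sql-reference-data-types"],
)
data = loader.load()
print(data[0].metadata)  # 'source' and 'loc' include surrounding whitespace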
Error Message and Stack Trace (if applicable)
No response
Description
The metadata returned by the loader contains URLs that are not stripped of surrounding whitespace:
{'source': '\n https://docs.snowflake.com/en/sql-reference-data-types\n ', 'loc': '\n https://docs.snowflake.com/en/sql-reference-data-types\n '}
They should instead be as follows:
{'source': 'https://docs.snowflake.com/en/sql-reference-data-types', 'loc': 'https://docs.snowflake.com/en/sql-reference-data-types'}
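Until the loader strips these values itself, a possible workaround is to clean the metadata on the caller's side. The sketch below is not the library's fix. `strip_metadata` is a hypothetical helper name, and its `(meta, _content)` signature is chosen on the assumption that the installed `SitemapLoader` accepts a `meta_function` callback of that shape. If it does not, the same helper can be applied to each `doc.metadata` after `load()`.

```python
from typing import Any


def strip_metadata(meta: dict, _content: Any = None) -> dict:
    """Return a copy of `meta` with surrounding whitespace removed from string values."""
    cleaned = {
        key: value.strip() if isinstance(value, str) else value
        for key, value in meta.items()
    }
    # Mirror "loc" into "source" only when "source" is missing.
    if "loc" in cleaned:
        cleaned.setdefault("source", cleaned["loc"])
    return cleaned


# Metadata as reported in this issue
raw = {
    "source": "\n https://docs.snowflake.com/en/sql-reference-data-types\n ",
    "loc": "\n https://docs.snowflake.com/en/sql-reference-data-types\n ",
}
print(strip_metadata(raw))
```

Applied after loading, this could look like `doc.metadata = strip_metadata(doc.metadata)` for each returned document. Non-string values are passed through unchanged.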
System Info
System Information
OS: Darwin
OS Version: Darwin Kernel Version 24.3.0: Thu Jan 2 20:22:58 PST 2025; root:xnu-11215.81.4~3/RELEASE_ARM64_T8132
Python Version: 3.11.11 (main, Feb 13 2025, 11:29:41) [Clang 16.0.0 (clang-1600.0.26.6)]
aiohttp<4.0.0,>=3.8.3: Installed. No version info available.
async-timeout<5.0.0,>=4.0.0;: Installed. No version info available.
dataclasses-json<0.7,>=0.5.7: Installed. No version info available.
httpx: 0.27.2
httpx-sse<1.0.0,>=0.4.0: Installed. No version info available.
jsonpatch<2.0,>=1.33: Installed. No version info available.
langchain-anthropic;: Installed. No version info available.
langchain-aws;: Installed. No version info available.
langchain-azure-ai;: Installed. No version info available.
langchain-cohere;: Installed. No version info available.
langchain-community;: Installed. No version info available.
langchain-core<1.0.0,>=0.3.51: Installed. No version info available.
langchain-deepseek;: Installed. No version info available.
langchain-fireworks;: Installed. No version info available.
langchain-google-genai;: Installed. No version info available.
langchain-google-vertexai;: Installed. No version info available.
langchain-groq;: Installed. No version info available.
langchain-huggingface;: Installed. No version info available.
langchain-mistralai;: Installed. No version info available.
langchain-ollama;: Installed. No version info available.
langchain-openai;: Installed. No version info available.
langchain-perplexity;: Installed. No version info available.
langchain-text-splitters<1.0.0,>=0.3.8: Installed. No version info available.
langchain-together;: Installed. No version info available.
langchain-xai;: Installed. No version info available.
langchain<1.0.0,>=0.3.23: Installed. No version info available.
langsmith-pyo3: Installed. No version info available.
langsmith<0.4,>=0.1.125: Installed. No version info available.
langsmith<0.4,>=0.1.17: Installed. No version info available.
numpy<3,>=1.26.2: Installed. No version info available.
openai-agents: Installed. No version info available.
opentelemetry-api: 1.27.0
opentelemetry-exporter-otlp-proto-http: Installed. No version info available.
opentelemetry-sdk: 1.27.0
orjson: 3.10.7
packaging: 24.1
packaging<25,>=23.2: Installed. No version info available.
pydantic: 2.11.3
pydantic-settings<3.0.0,>=2.4.0: Installed. No version info available.
pydantic<3.0.0,>=2.5.2;: Installed. No version info available.
pydantic<3.0.0,>=2.7.4: Installed. No version info available.
pydantic<3.0.0,>=2.7.4;: Installed. No version info available.
pytest: 8.3.3
PyYAML>=5.3: Installed. No version info available.
requests: 2.32.3
requests-toolbelt: 1.0.0
requests<3,>=2: Installed. No version info available.
rich: 14.0.0
SQLAlchemy<3,>=1.4: Installed. No version info available.
tenacity!=8.4.0,<10,>=8.1.0: Installed. No version info available.
tenacity!=8.4.0,<10.0.0,>=8.1.0: Installed. No version info available.
typing-extensions>=4.7: Installed. No version info available.
zstandard: 0.23.0
dosubot (bot) added the 🤖:bug label (Related to a bug, vulnerability, unexpected error with an existing feature) on Apr 14, 2025
givemelove added a commit to givemelove/langchain that referenced this issue on Apr 14, 2025