langchain-xparse

LangChain integration for the xParse Pipeline API, covering document parsing, chunking, and embedding. The loader supports the parse, chunk, and embed stages only; the extract stage is not supported.

Installation

From PyPI:

pip install langchain-xparse

Local editable install:

pip install -e .

Configuration

Set your TextIn credentials (available from the TextIn Workspace):

export XPARSE_APP_ID="your-app-id"
export XPARSE_SECRET_CODE="your-secret-code"

Or pass them when creating the loader:

from langchain_xparse import XParseLoader

loader = XParseLoader(
    file_path="doc.pdf",
    app_id="your-app-id",
    secret_code="your-secret-code",
)
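
If you rely on the environment variables, it can help to check that both are set before constructing a loader. The following is a minimal sketch using only the standard library; `missing_xparse_env` is a hypothetical helper and not part of langchain-xparse.

```python
import os

# Environment variable names taken from the configuration section above.
REQUIRED_VARS = ("XPARSE_APP_ID", "XPARSE_SECRET_CODE")


def missing_xparse_env(env=None):
    """Return the names of required variables that are unset or empty.

    Hypothetical helper for illustration; not provided by langchain-xparse.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Calling `missing_xparse_env()` with no arguments inspects the current process environment; passing a dict makes the check easy to exercise in tests.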

Usage

Basic (parse only)

from langchain_xparse import XParseLoader

loader = XParseLoader(file_path="example.pdf")
docs = loader.load()
print(docs[0].page_content[:200])
print(docs[0].metadata)  # source, category, element_id, filename, page_number, ...
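
Since each returned document carries metadata such as `page_number`, a common follow-up is to regroup the parsed elements by page. A small sketch, assuming only that each document exposes `.page_content` and `.metadata` as LangChain documents do; `group_by_page` is a hypothetical helper, not part of langchain-xparse.

```python
from collections import defaultdict


def group_by_page(docs):
    """Group page_content strings by the `page_number` metadata key.

    Works on any objects exposing `.page_content` and `.metadata`.
    Documents without a page number are collected under the key None.
    """
    pages = defaultdict(list)
    for doc in docs:
        pages[doc.metadata.get("page_number")].append(doc.page_content)
    return dict(pages)
```

With the loader above, this would be called as `group_by_page(loader.load())`.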

Lazy load

for doc in loader.lazy_load():
    pass  # replace with your own handling, e.g. process(doc)

Async

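async def main():  # `async for` must run inside a coroutine
    async for doc in loader.alazy_load():
        pass  # replace with your own handling, e.g. process(doc)
# Run with: asyncio.run(main())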

Convenience params (parse + chunk, or parse + chunk + embed)

loader = XParseLoader(
    file_path="doc.pdf",
    parse_provider="textin",
    chunk_strategy="by_title",
    chunk_max_characters=500,
    chunk_overlap=50,
)
# Or with embed:
loader = XParseLoader(
    file_path="doc.pdf",
    parse_provider="textin",
    chunk_strategy="basic",
    chunk_max_characters=1000,
    embed_provider="qwen",
    embed_model_name="text-embedding-v4",
)
docs = loader.load()
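
For intuition about what a maximum chunk size and an overlap mean in character-based chunking generally, here is a plain sliding-window sketch. It is a generic illustration only and is not xParse's chunking algorithm; the actual `basic` and `by_title` strategies may split on element or title boundaries and behave differently. `sliding_chunks` is a hypothetical function.

```python
def sliding_chunks(text, max_characters, overlap):
    """Split text into windows of at most max_characters that share `overlap` characters."""
    if max_characters <= 0:
        raise ValueError("max_characters must be positive")
    if not 0 <= overlap < max_characters:
        raise ValueError("overlap must satisfy 0 <= overlap < max_characters")
    step = max_characters - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + max_characters])
        if start + max_characters >= len(text):
            break  # the last window already reaches the end of the text
        start += step
    return chunks
```

Each chunk after the first begins with the last `overlap` characters of the previous one, which is why a larger overlap yields more chunks for the same text.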

Custom stages (advanced)

loader = XParseLoader(
    file_path="doc.pdf",
    stages=[
        {"type": "parse", "config": {"provider": "textin"}},
        {"type": "chunk", "config": {"strategy": "by_page", "max_characters": 800}},
    ],
)
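
If the stage list is assembled from application settings, building it in one place keeps the dict shape consistent. A sketch that reproduces the structure shown above; `build_stages` is a hypothetical helper, and only the keys that appear in the example (`type`, `config`, `provider`, `strategy`, `max_characters`) are used.

```python
def build_stages(parse_provider, chunk_strategy=None, max_characters=None):
    """Build a stages list in the shape used by the example above.

    Hypothetical helper for illustration; not provided by langchain-xparse.
    """
    stages = [{"type": "parse", "config": {"provider": parse_provider}}]
    if chunk_strategy is not None:
        chunk_config = {"strategy": chunk_strategy}
        if max_characters is not None:
            chunk_config["max_characters"] = max_characters
        stages.append({"type": "chunk", "config": chunk_config})
    return stages


stages = build_stages("textin", chunk_strategy="by_page", max_characters=800)
```

The resulting list can then be passed as `stages=stages` when creating the loader.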

Multiple files

loader = XParseLoader(file_path=["a.pdf", "b.pdf"])
for doc in loader.lazy_load():
    print(doc.metadata.get("source"), doc.page_content[:50])
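
To build the list of paths from a folder rather than typing it out, the standard library's `pathlib` is enough. A minimal sketch; `collect_pdfs` is a hypothetical helper, and the `*.pdf` pattern is case-sensitive on case-sensitive filesystems.

```python
from pathlib import Path


def collect_pdfs(directory):
    """Return sorted string paths of the .pdf files directly inside `directory`."""
    return sorted(str(p) for p in Path(directory).glob("*.pdf") if p.is_file())
```

The result can be passed directly, for example `XParseLoader(file_path=collect_pdfs("example_docs"))`.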

File-like object

When passing a file-like object instead of a path, you must set metadata_filename:

with open("doc.pdf", "rb") as f:
    loader = XParseLoader(file=f, metadata_filename="doc.pdf")
    docs = loader.load()
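
If the document is already in memory (for example, bytes received from an upload), a `BytesIO` buffer is the usual file-like stand-in. This sketch assumes `XParseLoader` accepts any binary file-like object for `file`, which the example above only demonstrates with an opened file; `named_buffer` is a hypothetical helper.

```python
import io
from pathlib import PurePath


def named_buffer(data, source_name):
    """Wrap raw bytes in a BytesIO and derive a bare filename for metadata."""
    return io.BytesIO(data), PurePath(source_name).name


buffer, filename = named_buffer(b"%PDF-1.7 ...", "uploads/2024/doc.pdf")
# Assumption: any binary file-like object is accepted for `file`.
# loader = XParseLoader(file=buffer, metadata_filename=filename)
# docs = loader.load()
```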
