Using a TextSplitter on multiple documents with filetype="recursive_paths" fails #11

Closed
@rfishermonteith

Using a TextSplitter on multiple documents with filetype="recursive_paths" fails with the error below.

This seems to be fixed by changing https://github.com/thiswillbeyourgithub/wdoc/blame/main/wdoc/utils/misc.py#L459 to:

return text_splitters[task][modelname] 
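
To illustrate why this one-line change would matter, here is a minimal sketch of how a two-level splitter cache could produce the error reported in this issue. It is an assumption about the cause, not a copy of wdoc's code: FakeSplitter, get_splitter_buggy and get_splitter_fixed are hypothetical names, and only text_splitters, task, modelname and transform_documents come from the issue. Under this reading, the first document builds and returns a real splitter, while every later document hits the cache and gets back the inner dict, which would explain why the failure only shows up with multiple documents.

```python
# Hypothetical sketch: apart from `text_splitters`, `task`, `modelname` and
# `transform_documents`, none of these names are taken from wdoc's source.


class FakeSplitter:
    """Stand-in for a real text splitter object."""

    def transform_documents(self, docs):
        # A real splitter would chunk each document; this one passes them through.
        return list(docs)


# Assumed cache layout: task -> modelname -> splitter.
text_splitters = {}


def get_splitter_buggy(task, modelname):
    """Return a cached splitter, with the suspected indexing mistake."""
    if task in text_splitters and modelname in text_splitters[task]:
        # Suspected bug: only the first level is indexed, so the caller
        # receives the inner dict instead of the splitter.
        return text_splitters[task]
    text_splitters.setdefault(task, {})[modelname] = FakeSplitter()
    return text_splitters[task][modelname]


def get_splitter_fixed(task, modelname):
    """Same lookup, indexing both levels as proposed in this issue."""
    if task in text_splitters and modelname in text_splitters[task]:
        return text_splitters[task][modelname]
    text_splitters.setdefault(task, {})[modelname] = FakeSplitter()
    return text_splitters[task][modelname]
```

With the buggy variant, the second call for the same task and model returns a dict, and calling transform_documents on it raises AttributeError. With the fixed variant, both calls return the same splitter object.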

Command I'm running:

python -m wdoc \
  --path="data_for_wdoc" \
  --filetype="recursive_paths" \
  --task=search \
  --query="How can I make wdoc run faster?" \
  --query_retrievers='default_multiquery' \
  --top_k=auto_200_500 \
  --llms_api_bases="{'model':'http://localhost:11434','query_eval_model':'http://localhost:11434'}" \
  --modelname="ollama/gemma2:2b" \
  --query_eval_modelname="ollama/gemma2:2b" \
  --recursed_filetype="txt" \
  --pattern="*.txt"

Error:

Error when loading doc with filetype txt: ''dict' object has no attribute 'transform_documents''. Arguments: {'llm_name': 'ollama/gemma2:2b', 'task': 'search', 'temp_dir': PosixPath('XXXX'), 'path': 'data_for_wdoc/fe061b430a2c4991a002f039c8ca6cb9.txt', 'filetype': 'txt', 'recur_parent_id': '206b66c9-9d44-4138-a413-fc1561d601a3', 'file_hash': '74a0d0bb291717058af1'}
Line number: 340
Full traceback:
  File "XXXX/venv/lib/python3.11/site-packages/wdoc/utils/loaders.py", line 340, in load_one_doc_wrapped
    out = load_one_doc(**doc_kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "<@beartype(wdoc.utils.loaders.load_one_doc) at 0x12b15aca0>", line 205, in load_one_doc

  File "XXXX/venv/lib/python3.11/site-packages/wdoc/utils/loaders.py", line 507, in load_one_doc
    docs = text_splitter.transform_documents(docs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

I'm also seeing some issues with recursed_filetype, which I'll open a separate issue for.
