Closed
Description
Using a TextSplitter on multiple documents with filetype="recursive_paths" fails with the error below.
This seems to be fixed by changing https://github.com/thiswillbeyourgithub/wdoc/blame/main/wdoc/utils/misc.py#L459 to:
return text_splitters[task][modelname]
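A minimal sketch of what I think is going on, assuming the splitters are cached in a dict nested by task and then by model name. The names DummySplitter, get_splitter_buggy and get_splitter_fixed are made up for illustration and are not wdoc's actual code; only text_splitters, task, modelname and transform_documents come from the report above.

```python
from typing import Dict, List


class DummySplitter:
    """Stand-in for a real text splitter; it only mimics transform_documents."""

    def __init__(self, chunk_size: int) -> None:
        self.chunk_size = chunk_size

    def transform_documents(self, docs: List[str]) -> List[str]:
        # Naive fixed-width chunking, just so there is something callable.
        return [
            doc[i : i + self.chunk_size]
            for doc in docs
            for i in range(0, len(doc), self.chunk_size)
        ]


# Hypothetical cache layout: task -> modelname -> splitter.
text_splitters: Dict[str, Dict[str, DummySplitter]] = {}


def get_splitter_buggy(task: str, modelname: str):
    text_splitters.setdefault(task, {}).setdefault(modelname, DummySplitter(8))
    # Only one level of indexing: this hands back the inner dict.
    return text_splitters[task]


def get_splitter_fixed(task: str, modelname: str) -> DummySplitter:
    text_splitters.setdefault(task, {}).setdefault(modelname, DummySplitter(8))
    # Both levels of indexing: this hands back the splitter itself.
    return text_splitters[task][modelname]
```

With the one-level lookup, the caller receives a dict, so calling transform_documents on it raises an AttributeError of the same shape as the one in the error below. With the two-level lookup the caller gets an object that actually has the method.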
Command I'm running:
python -m wdoc \
    --path="data_for_wdoc" \
    --filetype="recursive_paths" \
    --task=search \
    --query="How can I make wdoc run faster?" \
    --query_retrievers='default_multiquery' \
    --top_k=auto_200_500 \
    --llms_api_bases="{'model':'http://localhost:11434','query_eval_model':'http://localhost:11434'}" \
    --modelname="ollama/gemma2:2b" \
    --query_eval_modelname="ollama/gemma2:2b" \
    --recursed_filetype="txt" \
    --pattern="*.txt"
Error:
Error when loading doc with filetype txt: ''dict' object has no attribute 'transform_documents''. Arguments: {'llm_name': 'ollama/gemma2:2b', 'task': 'search', 'temp_dir': PosixPath('XXXX'), 'path': 'data_for_wdoc/fe061b430a2c4991a002f039c8ca6cb9.txt', 'filetype': 'txt', 'recur_parent_id': '206b66c9-9d44-4138-a413-fc1561d601a3', 'file_hash': '74a0d0bb291717058af1'}
Line number: 340
Full traceback:
  File "XXXX/venv/lib/python3.11/site-packages/wdoc/utils/loaders.py", line 340, in load_one_doc_wrapped
    out = load_one_doc(**doc_kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<@beartype(wdoc.utils.loaders.load_one_doc) at 0x12b15aca0>", line 205, in load_one_doc
  File "XXXX/venv/lib/python3.11/site-packages/wdoc/utils/loaders.py", line 507, in load_one_doc
    docs = text_splitter.transform_documents(docs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
I'm seeing some issues with using recursed_filetype, which I'll open a separate issue for.