Using a TextSplitter on multiple documents with filetype="recursive_paths" fails #11

Using a TextSplitter on multiple documents with filetype="recursive_paths" fails with the error shown below.

This seems to be fixed by changing https://github.com/thiswillbeyourgithub/wdoc/blame/main/wdoc/utils/misc.py#L459 to:

```python
return text_splitters[task][modelname] 
```
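
A minimal sketch of the suspected failure mode follows. It assumes the splitters are cached in a nested dict keyed by task and then by model name, and that the cache-hit path returns the inner dict instead of the splitter. `FakeSplitter`, `get_splitter_buggy`, and `get_splitter_fixed` are hypothetical names for illustration only, not wdoc's actual code.

```python
from collections import defaultdict


class FakeSplitter:
    """Hypothetical stand-in for a real text splitter object."""

    def transform_documents(self, docs):
        return list(docs)


# Assumed cache layout: task -> modelname -> splitter
text_splitters = defaultdict(dict)


def get_splitter_buggy(task, modelname):
    if modelname in text_splitters[task]:
        # Suspected bug: returns the inner dict, not the cached splitter
        return text_splitters[task]
    text_splitters[task][modelname] = FakeSplitter()
    return text_splitters[task][modelname]


def get_splitter_fixed(task, modelname):
    if modelname not in text_splitters[task]:
        text_splitters[task][modelname] = FakeSplitter()
    # Index by both keys so the caller always gets a splitter
    return text_splitters[task][modelname]
```

Under these assumptions the first document would load fine (cache miss), and only the second and later documents would receive a dict, which would match the failure appearing when several documents are loaded.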

Command I'm running:

```bash
python -m wdoc \
  --path="data_for_wdoc" \
  --filetype="recursive_paths" \
  --task=search \
  --query="How can I make wdoc run faster?" \
  --query_retrievers='default_multiquery' \
  --top_k=auto_200_500 \
  --llms_api_bases="{'model':'http://localhost:11434','query_eval_model':'http://localhost:11434'}" \
  --modelname="ollama/gemma2:2b" \
  --query_eval_modelname="ollama/gemma2:2b" \
  --recursed_filetype="txt" \
  --pattern="*.txt"
```


Error:
```
Error when loading doc with filetype txt: ''dict' object has no attribute 'transform_documents''. Arguments: {'llm_name': 'ollama/gemma2:2b', 'task': 'search', 'temp_dir': PosixPath('XXXX'), 'path': 'data_for_wdoc/fe061b430a2c4991a002f039c8ca6cb9.txt', 'filetype': 'txt', 'recur_parent_id': '206b66c9-9d44-4138-a413-fc1561d601a3', 'file_hash': '74a0d0bb291717058af1'}
Line number: 340
Full traceback:
  File "XXXX/venv/lib/python3.11/site-packages/wdoc/utils/loaders.py", line 340, in load_one_doc_wrapped
    out = load_one_doc(**doc_kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "<@beartype(wdoc.utils.loaders.load_one_doc) at 0x12b15aca0>", line 205, in load_one_doc

  File "XXXX/venv/lib/python3.11/site-packages/wdoc/utils/loaders.py", line 507, in load_one_doc
    docs = text_splitter.transform_documents(docs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
```

I'm seeing some issues with using `recursed_filetype`, which I'll open a separate issue for.
