Skip to content

Space start validator crashes on empty completion #535

Closed
@EliahKagan

Description

@EliahKagan

Describe the bug

When running openai tools fine_tunes.prepare_data, a completion that is empty, or that becomes empty after a remediation is applied, causes openai.validators.completions_space_start_validator to raise an unhandled IndexError. @tuliren has contributed a fix in #406. I'm opening this issue to give information about the bug and its impact. Merging #406 would fix this bug.

#125 added the test that finds this bug, test_long_examples_validator. The bug is in the code under test (not in the test itself). The bug precedes that pull request.

Since #153, numpy and pandas are not installed by default, and the test is skipped in their absence. Installing with pip install -e '.[dev]' and running pytest does not reveal the bug for this reason, whereas pip install -e '.[dev,datalib]' does show the failure.

I haven't found a commit in the history of the main branch in which the test has been present but able to pass. It looks like it may have originally been run with modified data tailored to the problem in #121 (which #125 fixes), and then stopped failing for many developers once #153 was merged, due to being skipped.

When actually using openai tools fine_tunes.prepare_data, I think it is unusual to include an empty completion or one that would become empty due to a remediation, so I think the user impact of this bug is low. But it seems to me that the ongoing situation where not all tests are able to pass may be confusing to new contributors. (I personally find it confusing.) Fortunately, the fix in #406 is straightforward.

To Reproduce

  1. On GNU/Linux or macOS, prepare an environment using Python 3.7, 3.8, 3.9, 3.10, or 3.11. (Although I've manually verified that the bug affects Windows, and test_long_examples_validator does not pass on Windows, other issues obscure this one.) I tested this locally on Ubuntu 22.04 with all those versions of Python, and on CI with ubuntu-latest and macos-latest using a workflow derived from tuliren's.

  2. In that environment, create an editable install, including at least the datalib extra, by running:
    pip install '.[dev,datalib]'

  3. Run: pytest
    Or, for the whole traceback, run: pytest -vv
    Or, for the whole traceback and just that one test, run: pytest -vvk test_long_examples_validator

Code snippets

x[0] raises IndexError when x is empty in:

lambda x: ("" if x[0] == " " else " ") + x

This appears in the add_space_start local function, in completions_space_start_validator, in openai/validators.py.

OS

All (tested mainly in Ubuntu 22.04.2 and macOS 12.6.7).

Python version

Python 3.7.17, 3.8.17, 3.9.17, 3.10.12, and 3.11.4.

Library version

Tested on b82a3f7 (main branch), though a range of versions are affected.

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions