Description
Describe the bug
When running openai tools fine_tunes.prepare_data
, a completion that is empty, or that becomes empty after a remediation is applied, causes openai.validators.completions_space_start_validator
to raise an unhandled IndexError
. @tuliren has contributed a fix in #406. I'm opening this issue to give information about the bug and its impact. Merging #406 would fix this bug.
#125 added the test that finds this bug, test_long_examples_validator
. The bug is in the code under test (not in the test itself). The bug precedes that pull request.
Since #153, numpy
and pandas
are not installed by default, and the test is skipped in their absence. Installing with pip install -e '.[dev]'
and running pytest
does not reveal the bug for this reason, whereas pip install -e '.[dev,datalib]'
does show the failure.
I haven't found a commit in the history of the main branch in which the test has been present but able to pass. It looks like it may have originally been run with modified data tailored to the problem in #121 (which #125 fixes), and then stopped failing for many developers once #153 was merged, due to being skipped.
When actually using openai tools fine_tunes.prepare_data
, I think it is unusual to include an empty completion or one that would become empty due to a remediation, so I think the user impact of this bug is low. But it seems to me that the ongoing situation where not all tests are able to pass may be confusing to new contributors. (I personally find it confusing.) Fortunately, the fix in #406 is straightforward.
To Reproduce
-
On GNU/Linux or macOS, prepare an environment using Python 3.7, 3.8, 3.9, 3.10, or 3.11. (Although I've manually verified that the bug affects Windows, and
test_long_examples_validator
does not pass on Windows, other issues obscure this one.) I tested this locally on Ubuntu 22.04 with all those versions of Python, and on CI withubuntu-latest
andmacos-latest
using a workflow derived from tuliren's. -
In that environment, create an editable install, including at least the
datalib
extra, by running:
pip install '.[dev,datalib]'
-
Run:
pytest
Or, for the whole traceback, run:pytest -vv
Or, for the whole traceback and just that one test, run:pytest -vvk test_long_examples_validator
Code snippets
x[0]
raises IndexError
when x
is empty in:
lambda x: ("" if x[0] == " " else " ") + x
This appears in the add_space_start
local function, in completions_space_start_validator
, in openai/validators.py
.
OS
All (tested mainly in Ubuntu 22.04.2 and macOS 12.6.7).
Python version
Python 3.7.17, 3.8.17, 3.9.17, 3.10.12, and 3.11.4.
Library version
Tested on b82a3f7 (main branch), though a range of versions are affected.