<Please write a descriptive title> #1099

Open
amrothemich opened this issue Aug 14, 2024 · 0 comments · May be fixed by lapp0/outlines#88 or #1154

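### Describe the issue as clearly as possible:

With `outlines.generate.json`, a field declared with pydantic's `conint(gt=0, lt=13)` is not reliably restricted to that range: generation can return an out-of-range integer (518 in the run below), which then fails pydantic validation.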

### Steps/code to reproduce the bug:

The following script, seeded with 2, produces an out-of-range age:

```python
from pydantic import BaseModel, conint

import outlines

class Character(BaseModel):
    age: conint(gt=0, lt=13)



model = outlines.models.transformers("EleutherAI/pythia-70m")

# Construct structured sequence generator
generator = outlines.generate.json(model, Character)

import torch
rng = torch.Generator()
rng.manual_seed(2)

character = generator("Give me a character description", rng=rng)
print(repr(character))
```

The same script with a different seed (0) shows that the model is capable of generating a valid response:

```python
from pydantic import BaseModel, conint

import outlines

class Character(BaseModel):
    age: conint(gt=0, lt=13)



model = outlines.models.transformers("EleutherAI/pythia-70m")

# Construct structured sequence generator
generator = outlines.generate.json(model, Character)

import torch
rng = torch.Generator()
rng.manual_seed(0)

character = generator("Give me a character description", rng=rng)
print(repr(character))
```
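
To make the constraint concrete: `conint(gt=0, lt=13)` uses exclusive bounds, so the only valid ages are 1 through 12. The sketch below restates that check with the standard library only. `age_in_bounds` is a hypothetical helper written for illustration, not part of outlines or pydantic, and the sample strings are made up apart from 518, which is the value reported in the error message.

```python
import json


def age_in_bounds(raw: str, gt: int = 0, lt: int = 13) -> bool:
    """Return True if the JSON object in `raw` has an integer `age` with gt < age < lt."""
    age = json.loads(raw)["age"]
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(age, int) and not isinstance(age, bool) and gt < age < lt


# A value inside the range, then the out-of-range value from the report.
print(age_in_bounds('{"age": 7}'))
print(age_in_bounds('{"age": 518}'))
```

Every string returned by the generator should pass a check like this before pydantic ever sees it, which is the behaviour I was expecting from structured generation.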


### Expected result:

I would expect any result returned to satisfy the age requirements. However, sometimes it doesn't, and the output then fails pydantic validation.

### Error message:

```shell
---------------------------------------------------------------------------
ValidationError                           Traceback (most recent call last)
Cell In[22], line 19
     16 rng = torch.Generator()
     17 rng.manual_seed(2)
---> 19 character = generator("Give me a character description", rng=rng)
     20 print(repr(character))

File c:\Users\acasey\.conda\envs\callm\lib\site-packages\outlines\generate\api.py:230, in SequenceGenerator.__call__(self, prompts, max_tokens, stop_at, rng)
    225 generated = self.tokenizer.decode(generated_token_ids)
    226 stripped = [
    227     self.strip_stop_sequences(sequence, stop_sequences)
    228     for sequence in generated
    229 ]
--> 230 formatted = [self.format_sequence(sequence) for sequence in stripped]
    232 # We reshape the output to (batch_size, sample_size)
    233 output: List[List[FormattedOutput]] = list()

File c:\Users\acasey\.conda\envs\callm\lib\site-packages\outlines\generate\api.py:230, in <listcomp>(.0)
    225 generated = self.tokenizer.decode(generated_token_ids)
    226 stripped = [
    227     self.strip_stop_sequences(sequence, stop_sequences)
    228     for sequence in generated
    229 ]
--> 230 formatted = [self.format_sequence(sequence) for sequence in stripped]
    232 # We reshape the output to (batch_size, sample_size)
    233 output: List[List[FormattedOutput]] = list()

File c:\Users\acasey\.conda\envs\callm\lib\site-packages\outlines\generate\json.py:50, in json.<locals>.<lambda>(x)
     48     regex_str = build_regex_from_schema(schema, whitespace_pattern)
     49     generator = regex(model, regex_str, sampler)
---> 50     generator.format_sequence = lambda x: schema_object.parse_raw(x)
     51 elif callable(schema_object):
     52     schema = pyjson.dumps(get_schema_from_signature(schema_object))

File c:\Users\acasey\.conda\envs\callm\lib\site-packages\typing_extensions.py:2499, in deprecated.__call__.<locals>.wrapper(*args, **kwargs)
   2496 @functools.wraps(arg)
   2497 def wrapper(*args, **kwargs):
   2498     warnings.warn(msg, category=category, stacklevel=stacklevel + 1)
-> 2499     return arg(*args, **kwargs)

File c:\Users\acasey\.conda\envs\callm\lib\site-packages\pydantic\main.py:1080, in BaseModel.parse_raw(cls, b, content_type, encoding, proto, allow_pickle)
   1073     error: pydantic_core.InitErrorDetails = {
   1074         # The type: ignore on the next line is to ignore the requirement of LiteralString
   1075         'type': pydantic_core.PydanticCustomError(type_str, str(exc)),  # type: ignore
   1076         'loc': ('__root__',),
   1077         'input': b,
   1078     }
   1079     raise pydantic_core.ValidationError.from_exception_data(cls.__name__, [error])
-> 1080 return cls.model_validate(obj)

File c:\Users\acasey\.conda\envs\callm\lib\site-packages\pydantic\main.py:503, in BaseModel.model_validate(cls, obj, strict, from_attributes, context)
    501 # `__tracebackhide__` tells pytest and some other tools to omit this function from tracebacks
    502 __tracebackhide__ = True
--> 503 return cls.__pydantic_validator__.validate_python(
    504     obj, strict=strict, from_attributes=from_attributes, context=context
    505 )

ValidationError: 1 validation error for Character
age
  Input should be less than 13 [type=less_than, input_value=518, input_type=int]
    For further information visit https://errors.pydantic.dev/2.5/v/less_than


```

### Outlines/Python version information:

Outlines version: 0.0.46

```shell
(callm) C:\Users\acasey\Documents\callm>python -c "import sys; print('Python', sys.version)"
Python 3.9.18 (main, Sep 11 2023, 14:09:26) [MSC v.1916 64 bit (AMD64)]

(callm) C:\Users\acasey\Documents\callm>pip freeze
WARNING: Ignoring invalid distribution - (c:\users\acasey\appdata\roaming\python\python39\site-packages)
WARNING: Ignoring invalid distribution -rotobuf (c:\users\acasey\appdata\roaming\python\python39\site-packages)
WARNING: Ignoring invalid distribution -otobuf (c:\users\acasey\appdata\roaming\python\python39\site-packages)
accelerate==0.26.1
aiohttp==3.9.3
aiosignal==1.3.1
annotated-types==0.6.0
anyio==4.2.0
asttokens==2.4.1
async-timeout==4.0.3
attrs==23.2.0
beautifulsoup4==4.12.3
certifi==2023.11.17
charset-normalizer==2.0.12
click==8.1.7
cloudpickle==3.0.0
colorama==0.4.6
coloredlogs==15.0.1
comm==0.2.1
contourpy==1.2.1
cycler==0.12.1
datasets==2.16.1
debugpy==1.8.0
decorator==5.1.1
dill==0.3.7
diskcache==5.6.3
distro==1.9.0
emoji==2.10.1
et-xmlfile==1.1.0
exceptiongroup==1.2.0
executing==2.0.1
filelock==3.13.1
fonttools==4.51.0
frozenlist==1.4.1
fsspec==2023.10.0
google==3.0.0
greenlet==3.0.3
guidance==0.1.15
h11==0.14.0
httpcore==1.0.2
httpx==0.26.0
huggingface-hub==0.24.5
humanfriendly==10.0
idna==3.6
importlib-metadata==7.0.1
importlib_resources==6.4.0
interegular==0.3.3
ipykernel==6.29.0
ipython==8.18.1
jedi==0.19.1
Jinja2==3.1.3
joblib==1.3.2
json-converter==0.5.0
jsonformer @ git+https://github.com/posionus/jsonformer@26ee0c08fc15daadba718c1aeaf1a0767f385e69
jsonpatch==1.33
jsonpointer==3.0.0
jsonschema==4.23.0
jsonschema-specifications==2023.12.1
jupyter_client==8.6.0
jupyter_core==5.7.1
kiwisolver==1.4.5
langchain==0.2.13
langchain-core==0.2.30
langchain-huggingface==0.0.3
langchain-text-splitters==0.2.2
langsmith==0.1.99
lark==1.2.2
llvmlite==0.43.0
lmql==0.0.2.1
MarkupSafe==2.1.4
matplotlib==3.8.4
matplotlib-inline==0.1.6
medical-ontology @ file:///C:/Users/acasey/Documents/medical_ontology/python/dist/medical_ontology-2.15.0-py3-none-any.whl
mplcyberpunk==0.7.1
mpmath==1.3.0
multidict==6.0.5
multiprocess==0.70.15
nest-asyncio==1.6.0
networkx==3.2.1
nltk==3.8.1
numba==0.60.0
numpy==1.26.3
openai==1.9.0
openpyxl==3.1.2
optimum==1.21.3
ordered-set==4.1.0
orjson==3.10.7
outlines==0.0.46
packaging==23.2
pandas==2.2.0
parso==0.8.3
peft==0.7.1
pillow==10.2.0
platformdirs==4.1.0
prompt-toolkit==3.0.43
protobuf==3.20.3
psutil==5.9.8
pure-eval==0.2.2
pyairports==2.1.1
pyarrow==15.0.0
pyarrow-hotfix==0.6
pycountry==24.6.1
pydantic==2.5.3
pydantic_core==2.14.6
pydot==3.0.1
pyformlang==1.0.10
Pygments==2.17.2
pyparsing==3.1.2
pyreadline3==3.4.1
python-dateutil==2.8.2
pytz==2023.3.post1
pywin32==306
PyYAML==6.0.1
pyzmq==25.1.2
referencing==0.35.1
regex==2023.12.25
requests==2.27.1
rpds-py==0.20.0
safetensors==0.4.2
scikit-learn==1.4.0
scipy==1.12.0
sentence-transformers==3.0.1
sentencepiece==0.1.99
six==1.16.0
sniffio==1.3.0
soupsieve==2.5
SQLAlchemy==2.0.32
stack-data==0.6.3
stanza==1.7.0
sympy==1.12
tenacity==8.5.0
termcolor==2.4.0
threadpoolctl==3.2.0
tiktoken==0.7.0
tokenizers==0.19.1
toml==0.10.2
torch==2.1.2
torchvision==0.16.2
tornado==6.4
tqdm==4.66.1
traitlets==5.14.1
transformers==4.44.0
typing_extensions==4.9.0
tzdata==2023.4
urllib3==1.26.18
vprof==0.38
wcwidth==0.2.13
xxhash==3.4.1
yarl==1.9.4
zipp==3.17.0
```

### Context for the issue:

No response
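
One possible explanation, offered as an assumption rather than a confirmed diagnosis: `generate.json` constrains output with a regex built from the JSON Schema (the traceback shows `build_regex_from_schema`), and if the integer pattern does not take the schema's exclusive bounds into account, any integer is allowed at generation time. The sketch below uses only `re` to show the difference. `UNBOUNDED_INT` is an illustrative pattern, not copied from the outlines source, and `bounded_int_pattern` is a hypothetical helper.

```python
import re

# Illustrative pattern for "any JSON integer" (an assumption, not outlines' actual regex).
UNBOUNDED_INT = r"-?(0|[1-9][0-9]*)"


def bounded_int_pattern(gt: int, lt: int) -> str:
    """Build an alternation matching exactly the integers strictly between gt and lt."""
    values = range(gt + 1, lt)
    if not values:
        raise ValueError("no integers satisfy the bounds")
    # Longer alternatives first, so a prefix-based matcher cannot stop at "1" when "12" is meant.
    alternatives = sorted((str(v) for v in values), key=len, reverse=True)
    return "(" + "|".join(alternatives) + ")"


AGE = bounded_int_pattern(0, 13)

# Compare what each pattern accepts for in-range and out-of-range candidates.
for candidate in ("7", "12", "518"):
    print(
        candidate,
        bool(re.fullmatch(UNBOUNDED_INT, candidate)),
        bool(re.fullmatch(AGE, candidate)),
    )
```

Enumerating the values only makes sense for small ranges like this one; a wide range would need a digit-wise range pattern instead.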
