Fix #229: Add cloudpickle support for type-annotated parse_func #305

Closed

Conversation

devin-ai-integration[bot]

@devin-ai-integration devin-ai-integration bot commented Jan 6, 2025

Fix #229: Add cloudpickle support for type-annotated parse_func

Changes:

  • Add CustomPickler class extending HuggingFace's Pickler
  • Implement hybrid serialization approach for type annotations
  • Add comprehensive test suite for CustomPickler functionality
  • Fix type annotations in test_prompt.py
  • Update documentation with type annotation examples

Testing:

  • Added dedicated test suite in test_custom_pickler.py
  • All tests passing locally including path normalization and type annotation tests
  • Updated existing tests to use proper type hints
  • Verified function serialization works with type annotations

Link to Devin run: https://app.devin.ai/sessions/a1c6d0d5a504429aa767cd230d4a2a42

- Replace datasets.utils._dill with cloudpickle for better type annotation support
- Update function serialization to handle type annotations properly
- Add cloudpickle dependency to pyproject.toml
- Update documentation and examples with type annotation best practices
- Maintain backward compatibility with existing code patterns

Co-Authored-By: ryan@bespokelabs.ai <ryan@bespokelabs.ai>

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add "(aside)" to your comment to have me ignore it.
  • Look at CI failures and help fix them

⚙️ Control Options:

  • Disable automatic comment and CI monitoring

devin-ai-integration bot and others added 2 commits January 6, 2025 22:17
Co-Authored-By: ryan@bespokelabs.ai <ryan@bespokelabs.ai>
Co-Authored-By: ryan@bespokelabs.ai <ryan@bespokelabs.ai>
@shreyaspimpalgaonkar

Looks good to me

@shreyaspimpalgaonkar

shreyaspimpalgaonkar@Shreyass-MacBook-Air curator % poetry run python tests/local_parse_fn.py
2025-01-06 14:30:58,916 - bespokelabs.curator.request_processor.online.base_online_request_processor - INFO - Automatically set max_requests_per_minute to 30000
2025-01-06 14:30:58,916 - bespokelabs.curator.request_processor.online.base_online_request_processor - INFO - Automatically set max_tokens_per_minute to 150000000
Processing OpenAIOnlineRequestProcessor requests: 100%|██████████████████████████████████████████| 2/2 [00:03<00:00,  1.85s/it]
2025-01-06 14:31:02,631 - bespokelabs.curator.request_processor.online.base_online_request_processor - INFO - Processing complete. Results saved to /Users/shreyaspimpalgaonkar/.cache/curator/90e16f1dd405ff44/responses_0.jsonl
2025-01-06 14:31:02,631 - bespokelabs.curator.request_processor.online.base_online_request_processor - INFO - Status tracker: Tasks - Started: 2, In Progress: 0, Succeeded: 2, Failed: 0, Already Completed: 0
Errors - API: 0, Rate Limit: 0, Other: 0, Total: 0
                                      topic                                               poem
0       Urban loneliness in a bustling city  In a city of strangers, the crowd pushes near,...
1       Urban loneliness in a bustling city  Amid the bustle, I stand still,  \nA tiny isla...
2  Beauty of Bespoke Labs's Curator library  In the heart of Bespoke Labs, where wonders un...
3  Beauty of Bespoke Labs's Curator library  Amidst warm oak and whispers, a sanctuary bloo...

@RyanMarten RyanMarten changed the base branch from dev to main January 6, 2025 23:05
  if func is None:
      return xxh64("").hexdigest()

  file = BytesIO()
- Pickler(file, recurse=True).dump(func)
+ file.write(cloudpickle.dumps(func))
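
For readers skimming the hunk above: the function hash is a digest of whatever bytes the serializer produces, so the choice of serializer feeds directly into the hash. The following is a minimal stdlib-only sketch of that pattern, not the project's implementation. `hashlib.sha256` stands in for `xxh64`, and the helpers `code_bytes`, `name_only`, and `fingerprint` are hypothetical names that stand in for the `Pickler` and `cloudpickle` calls.

```python
import hashlib


def code_bytes(func):
    """Hypothetical stand-in serializer: bytecode plus constants of a simple function."""
    code = func.__code__
    return code.co_code + repr(code.co_consts).encode("utf-8")


def name_only(func):
    """A second hypothetical serializer that only looks at the function's name."""
    return func.__qualname__.encode("utf-8")


def fingerprint(func, dumps=code_bytes):
    """Digest the bytes that `dumps(func)` produces; None hashes like empty input."""
    if func is None:
        return hashlib.sha256(b"").hexdigest()
    return hashlib.sha256(dumps(func)).hexdigest()


def double(x):
    return x * 2


def triple(x):
    return x * 3


# Same function, same serializer: the digest is stable.
print(fingerprint(double) == fingerprint(double))
# Different function bodies: the digest changes.
print(fingerprint(double) == fingerprint(triple))
# Same function, different serializer: the digest also changes.
print(fingerprint(double) == fingerprint(double, dumps=name_only))
```

The last comparison is the relevant one for this change: swapping the serializer alters the bytes being hashed even when the function itself is untouched.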

This will break other logic supported by the HuggingFace pickler; see #230. Is there a better solution than just replacing the pickler? Also, I still don't understand the root cause here.

@shreyaspimpalgaonkar shreyaspimpalgaonkar Jan 7, 2025


I see. I also don't like replacing the pickler. Btw, I was not able to replicate the error on my machine exactly, but I got some warnings using the _dill pickler, which vanished when cloudpickle was used. I'll try to replicate the error on main now.


For this snippet,

from typing import List

from datasets import Dataset
from pydantic import BaseModel, Field

from bespokelabs import curator

# Create a dataset object for the topics you want to create the poems.
topics = Dataset.from_dict(
    {"topic": ["Urban loneliness in a bustling city", "Beauty of Bespoke Labs's Curator library"]}
)


# Define a class to encapsulate a list of poems.
class Poem(BaseModel):
    poem: str = Field(description="A poem.")


class Poems(BaseModel):
    poems_list: List[Poem] = Field(description="A list of poems.")


# removing the type annotation for `poems` works fine.
def parse_func(row, poems: Poems):
    return [{"topic": row["topic"], "poem": p.poem} for p in poems.poems_list]


# We define a Prompter that generates poems which gets applied to the topics dataset.
poet = curator.LLM(
    prompt_func=lambda row: f"Write two poems about {row['topic']}.",
    model_name="gpt-4o-mini",
    response_format=Poems,
    parse_func=parse_func,
)
poem = poet(topics)
print(poem.to_pandas())

I get the following warnings:

/Users/shreyaspimpalgaonkar/Library/Caches/pypoetry/virtualenvs/bespokelabs-curator-Ky8xEGex-py3.10/lib/python3.10/site-packages/dill/_dill.py:414: PicklingWarning: Cannot locate reference to <class '__main__.Poem'>.
  StockPickler.save(self, obj, save_persistent_id)
/Users/shreyaspimpalgaonkar/Library/Caches/pypoetry/virtualenvs/bespokelabs-curator-Ky8xEGex-py3.10/lib/python3.10/site-packages/dill/_dill.py:414: PicklingWarning: Cannot pickle <class '__main__.Poem'>: __main__.Poem has recursive self-references that trigger a RecursionError.
  StockPickler.save(self, obj, save_persistent_id)
/Users/shreyaspimpalgaonkar/Library/Caches/pypoetry/virtualenvs/bespokelabs-curator-Ky8xEGex-py3.10/lib/python3.10/site-packages/dill/_dill.py:414: PicklingWarning: Cannot locate reference to <class '__main__.Poems'>.
  StockPickler.save(self, obj, save_persistent_id)
/Users/shreyaspimpalgaonkar/Library/Caches/pypoetry/virtualenvs/bespokelabs-curator-Ky8xEGex-py3.10/lib/python3.10/site-packages/dill/_dill.py:414: PicklingWarning: Cannot pickle <class '__main__.Poems'>: __main__.Poems has recursive self-references that trigger a RecursionError.
  StockPickler.save(self, obj, save_persistent_id)
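
Since the output above consists of warnings rather than an exception, a reproduction script only notices them if it opts in. Below is a small sketch of two standard ways to do that with the `warnings` module. `PicklingWarningStandIn` and `serialize` are hypothetical placeholders for dill's `PicklingWarning` and the real serialization call, so that the sketch runs without third-party packages.

```python
import warnings


class PicklingWarningStandIn(UserWarning):
    """Hypothetical stand-in for dill's PicklingWarning."""


def serialize(obj):
    """Hypothetical serializer that warns instead of failing."""
    warnings.warn(f"Cannot locate reference to {obj!r}.", PicklingWarningStandIn)
    return b""


# Option 1: record the warnings so they can be inspected afterwards.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    serialize("Poem")

messages = [
    str(w.message) for w in caught if issubclass(w.category, PicklingWarningStandIn)
]
print(messages)

# Option 2: promote this warning category to an exception inside the block.
with warnings.catch_warnings():
    warnings.simplefilter("error", PicklingWarningStandIn)
    try:
        serialize("Poem")
        raised = False
    except PicklingWarningStandIn:
        raised = True
print(raised)
```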


This warning appears because the classes are defined in the same module as the script (`__main__`); it doesn't show up if they're imported from a different module.
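
To illustrate why removing the annotation sidesteps the warning: an annotated function keeps a live reference to the annotation's class in `__annotations__`, so a serializer that walks the function can reach that class, while the unannotated version carries no such reference. This is a minimal sketch that assumes plain classes in place of the pydantic models from the snippet; `parse_func_annotated` and `parse_func_plain` are illustrative names.

```python
# Plain classes stand in for the pydantic models (an assumption made so the
# sketch runs without third-party packages).
class Poem:
    def __init__(self, poem):
        self.poem = poem


class Poems:
    def __init__(self, poems_list):
        self.poems_list = poems_list


def parse_func_annotated(row, poems: Poems):
    return [{"topic": row["topic"], "poem": p.poem} for p in poems.poems_list]


def parse_func_plain(row, poems):
    return [{"topic": row["topic"], "poem": p.poem} for p in poems.poems_list]


# The annotated function references the Poems class object itself.
print(parse_func_annotated.__annotations__)
# The unannotated function has nothing for a serializer to follow.
print(parse_func_plain.__annotations__)
# When this file is run as a script, the class's module is "__main__".
print(Poems.__module__)
```

Both functions behave identically at call time; only the metadata attached to the function object differs.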


This seems like a low prio issue since it only throws a warning in a very restricted scenario. I'll close this PR and deprioritize this bug for the launch.

devin-ai-integration bot and others added 4 commits January 7, 2025 01:05
…zation

Co-Authored-By: ryan@bespokelabs.ai <ryan@bespokelabs.ai>
- Add dedicated test suite for CustomPickler functionality
- Test path normalization and type annotation support
- Fix return type annotations in test_prompt.py prompt functions
- Add proper type hints for test functions

Part of #229: Implement CustomPickler for function serialization

Co-Authored-By: ryan@bespokelabs.ai <ryan@bespokelabs.ai>
Co-Authored-By: ryan@bespokelabs.ai <ryan@bespokelabs.ai>
Co-Authored-By: ryan@bespokelabs.ai <ryan@bespokelabs.ai>
@vutrung96 vutrung96 closed this Jan 7, 2025
@shreyaspimpalgaonkar shreyaspimpalgaonkar deleted the devin/1736201728-fix-parse-func-pickle branch January 8, 2025 21:14
Development

Successfully merging this pull request may close these issues.

Error: "No default __reduce__ due to non-trivial __cinit__"
2 participants