feat: Preliminary implementation of self-instruct pipeline #1276

AveryYay · 2024-12-04T14:40:18Z

Description

A basic implementation of the self-instruct pipeline.

Motivation and Context

Why is this change required? What problem does it solve?
close #1214

Types of changes

What types of changes does your code introduce? Put an x in all the boxes that apply:

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds core functionality)
Breaking change (fix or feature that would cause existing functionality to change)
Documentation (update in the documentation)
Example (update in the folder of example)

Checklist

Go over all the following points, and put an x in all the boxes that apply.
If you are unsure about any of these, don't hesitate to ask. We are here to help!

I have read the CONTRIBUTION guide. (required)
My change requires a change to the documentation.
I have updated the tests accordingly. (required for a bug fix or a new feature)
I have updated the documentation accordingly.

Wendong-Fan

Thanks @AveryYay ! Overall looks great, left some comments below

camel/synthetic_datagen/self_instruct/filter/filter_function.py

camel/synthetic_datagen/self_instruct/filter/filter_registry.py

Wendong-Fan · 2024-12-14T10:54:32Z

camel/synthetic_datagen/self_instruct/self_instruct.py

+        clf_prompt += " Only answer yes or no"
+        response = self.agent.step(clf_prompt)
+        result = response.msgs[0].content.strip().lower()
+        return result in ["yes", "true"]


We can make the agent's response more robust by using structured output: define a BaseModel like the one below.

class AgentResponse(BaseModel):
    answer: bool = Field(
        ...,
        description="Indicates whether the task is classification (True/False).",
    )

Sure, I will look into this!

Yeah, ideally we replace all string splitting on LLM-produced text with structured output. Potentially relevant is my approach in #1289, though it's still under review.
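To make the difference concrete, here is a minimal sketch comparing the string check from the diff with a typed parse of the reply. It is not the code from this PR: it assumes the agent is asked to reply with a JSON object such as {"answer": true}, and it uses a standard-library dataclass as a stand-in for the pydantic BaseModel suggested above so that it runs without extra dependencies. The helper names parse_agent_response and fragile_yes_no are hypothetical.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentResponse:
    # Stand-in for the suggested pydantic model; True means the task
    # is a classification task.
    answer: bool


def parse_agent_response(raw: str) -> AgentResponse:
    """Parse a JSON reply such as '{"answer": true}' and check its type."""
    data = json.loads(raw)  # raises a ValueError subclass on non-JSON text
    if not isinstance(data, dict):
        raise ValueError(f"expected a JSON object, got {type(data).__name__}")
    answer = data.get("answer")
    if not isinstance(answer, bool):
        raise ValueError(f"'answer' must be a JSON boolean, got {answer!r}")
    return AgentResponse(answer=answer)


def fragile_yes_no(raw: str) -> bool:
    """The string comparison from the diff, kept here for comparison."""
    return raw.strip().lower() in ["yes", "true"]
```

With the string check, a reply like "Yes." is silently treated as "no" because of the trailing period. With the typed parse, anything that is not a well-formed boolean answer raises an error that the caller can handle or retry on.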

camel/synthetic_datagen/self_instruct/self_instruct.py

camel/synthetic_datagen/self_instruct/filter/filter_function.py

Asher-hss · 2024-12-17T22:23:42Z

examples/synthetic_datagen/self_instruct/data_output.json

Hi Avery, great job! Just a small question: why are you using data where is_classification is set to false?

The seed data contains classification instructions as well, but most of the data points have is_classification set to false. The code you're looking at is an example case where I only sampled 6 datapoints, so it's likely that all 6 samples happened to have is_classification as false. If we want to deal with the classification instructions, we could insert more datapoints into the seed.
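The sampling argument above can be checked with a little arithmetic. The sketch below computes the probability that a sample drawn without replacement contains no classification task at all. The function name and any counts passed to it are illustrative assumptions, not figures taken from this PR's seed file.

```python
from math import comb


def prob_no_classification(total: int, n_classification: int, k: int) -> float:
    """Probability that k seed tasks drawn without replacement from
    `total` tasks include none of the `n_classification` classification tasks.
    """
    if not 0 <= n_classification <= total:
        raise ValueError("n_classification must be between 0 and total")
    if not 0 <= k <= total:
        raise ValueError("k must be between 0 and total")
    non_classification = total - n_classification
    # Ways to pick k non-classification tasks over ways to pick any k tasks.
    return comb(non_classification, k) / comb(total, k)
```

For example, prob_no_classification(total=175, n_classification=25, k=6) uses made-up counts to estimate how often a 6-task sample would contain only non-classification tasks. Plugging in the real counts from the seed file would show how often the example output is expected to look like this.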

CaelumF

Some thoughts left. I really like the typed composable filter approach btw!
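For readers who have not opened the diff, a typed, composable filter setup could look roughly like the sketch below. This is an illustration of the general pattern only, not the code in filter_function.py or filter_registry.py: the class names, the registry layout, and the build_filters and passes_all helpers are all assumptions.

```python
from abc import ABC, abstractmethod
from typing import Any, Callable, Dict, List


class FilterFunction(ABC):
    """Common interface: every filter answers keep (True) or drop (False)."""

    @abstractmethod
    def apply(self, instruction: str) -> bool:
        ...


class NonEmptyFilter(FilterFunction):
    """Hypothetical filter that drops blank instructions."""

    def apply(self, instruction: str) -> bool:
        return bool(instruction.strip())


class KeywordFilter(FilterFunction):
    """Hypothetical filter that drops instructions containing any keyword."""

    def __init__(self, keywords: List[str]) -> None:
        self.keywords = [k.lower() for k in keywords]

    def apply(self, instruction: str) -> bool:
        lowered = instruction.lower()
        return not any(k in lowered for k in self.keywords)


# Hypothetical registry: maps a config name to a filter constructor.
FILTER_REGISTRY: Dict[str, Callable[..., FilterFunction]] = {
    "non_empty": NonEmptyFilter,
    "keyword": KeywordFilter,
}


def build_filters(config: Dict[str, Dict[str, Any]]) -> List[FilterFunction]:
    """Instantiate filters from a {name: constructor kwargs} mapping."""
    return [FILTER_REGISTRY[name](**kwargs) for name, kwargs in config.items()]


def passes_all(instruction: str, filters: List[FilterFunction]) -> bool:
    """An instruction is kept only if every filter accepts it."""
    return all(f.apply(instruction) for f in filters)
```

A pipeline could then be configured with build_filters({"non_empty": {}, "keyword": {"keywords": ["image"]}}) and call passes_all on each generated instruction. Adding a new filter only requires a new subclass and one registry entry.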

Wendong-Fan · 2024-12-18T17:02:56Z

camel/synthetic_datagen/self_instruct/filter/filter_function.py

+    r"""Filters instructions based on their word count.
+
+    Args:
+    r"""Filters instructions based on their word count.
+
+    Args:
+        min_len (int): The minimum word count required for an instruction.
+            (default::obj:`5`)
+        max_len (int): The maximum word count allowed for an instruction.
+            (default::obj:`200`)
+    """
+    """


The docstring here needs to be fixed: it is duplicated.

I have fixed this locally and will push it later.
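For context, a word-count filter matching the docstring in the diff might look like the sketch below. Only min_len, max_len, and their defaults of 5 and 200 come from the docstring. The class name LengthFilter, the apply method, and the whitespace-based word split are assumptions made for illustration.

```python
class LengthFilter:
    """Hypothetical word-count filter mirroring the docstring in the diff."""

    def __init__(self, min_len: int = 5, max_len: int = 200) -> None:
        if min_len > max_len:
            raise ValueError("min_len must not exceed max_len")
        self.min_len = min_len
        self.max_len = max_len

    def apply(self, instruction: str) -> bool:
        """Return True if the instruction's word count is within bounds."""
        # Assumption: words are whitespace-separated tokens.
        word_count = len(instruction.split())
        return self.min_len <= word_count <= self.max_len
```

With the defaults, LengthFilter().apply("Write a short poem about autumn rain") keeps the seven-word instruction, while a two-word instruction is dropped.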

camel/synthetic_datagen/self_instruct/filter/filter_function.py

Wendong-Fan

Thanks @AveryYay !

Preliminary implementation of self-instruct pipeline

bbf4784

AveryYay linked an issue Dec 4, 2024 that may be closed by this pull request

[Feature Request] Self-instruct pipeline refine #1214

Closed

2 tasks

AveryYay self-assigned this Dec 4, 2024

AveryYay added 6 commits December 6, 2024 10:35

Finalized the implementation && added filter registry

dad29f0

Added tests

2487ea1

Cleaned up

ad25bc5

Removed print statement

8ad6902

Changed import order

44ab8bb

Minor clean up

e78d417

Wendong-Fan marked this pull request as ready for review December 7, 2024 13:36

Wendong-Fan added the New Feature label Dec 7, 2024

Wendong-Fan added this to the Sprint 18 milestone Dec 7, 2024

Wendong-Fan and others added 6 commits December 7, 2024 21:45

update dependency and small format fix

ddaca3f

Merge branch 'master' into feat/self-instruct-pipeline

e8b49fb

# Conflicts: # poetry.lock

Minor clean up

b8d6f1d

pre-commit fixes

7bba958

pre-commit fix

6e1485f

refactor

1453c1c

Wendong-Fan requested review from raywhoelse, Wendong-Fan, Asher-hss and CaelumF December 9, 2024 15:20

Moved tests to the right location

6f4a778

Wendong-Fan reviewed Dec 14, 2024

AveryYay and others added 2 commits December 16, 2024 17:22

docstring format

be890c5

Co-authored-by: Wendong-Fan <133094783+Wendong-Fan@users.noreply.github.com>

Style fix

c2dc4a6

AveryYay changed the title ~~Preliminary implementation of self-instruct pipeline~~ feat: Preliminary implementation of self-instruct pipeline Dec 17, 2024

Asher-hss reviewed Dec 17, 2024

CaelumF reviewed Dec 18, 2024

Utilized AgentResponse(BaseModel) to produce structured response

42157c2

Wendong-Fan reviewed Dec 18, 2024

AveryYay and others added 3 commits December 18, 2024 12:37

Added reward model as a filter function

cbcdae3

Merge branch 'master' into feat/self-instruct-pipeline

38a2888

update

872eeae

Wendong-Fan approved these changes Dec 22, 2024

Wendong-Fan merged commit e7a86dc into master Dec 22, 2024
6 checks passed

Wendong-Fan deleted the feat/self-instruct-pipeline branch December 22, 2024 19:49

AveryYay commented Dec 4, 2024 •

edited by mohamadkav

Wendong-Fan left a comment

Wendong-Fan Dec 14, 2024

AveryYay Dec 16, 2024

CaelumF Dec 17, 2024 •

edited

Asher-hss Dec 17, 2024

AveryYay Dec 17, 2024

CaelumF left a comment

Wendong-Fan Dec 18, 2024

AveryYay Dec 18, 2024

Wendong-Fan left a comment
