[Feature Request] More data format support #1412

Open
1 of 2 tasks
Wendong-Fan opened this issue Jan 7, 2025 · 1 comment
Labels: Data (Related to camel data processing), New Feature, P0 (Task with high level priority)

Wendong-Fan (Member) commented Jan 7, 2025

Required prerequisites

Motivation

Support more data formats, such as alignment data.

Instruction Data
Reference implementation in CAMEL: https://github.com/camel-ai/camel/blob/master/camel/messages/conversion/alpaca.py

Conversation Data
Reference implementation in CAMEL: https://github.com/camel-ai/camel/blob/master/camel/messages/conversion/conversation_models.py

TODO:

Alignment Data
Description: Data used to align the model's responses with human values, ethical guidelines, and desired behaviors, especially when handling sensitive topics.

Example: https://huggingface.co/datasets/Magpie-Align/Magpie-Qwen2.5-Math-Pro-300K-v0.1

Question and Answer (QA) Data
Description: Pairs of questions and their corresponding answers to help the model generate accurate responses to various queries.

{
  "question": "What is the capital city of Australia?",
  "answer": "The capital city of Australia is Canberra."
}
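A minimal sketch of how a QA record in the shape above could be parsed and validated, assuming each record arrives as a JSON object with `question` and `answer` keys. `QARecord`, `from_json`, and `to_json` are hypothetical names for illustration, not part of CAMEL's API.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class QARecord:
    """Hypothetical container for one question-answer pair."""

    question: str
    answer: str

    @classmethod
    def from_json(cls, raw: str) -> "QARecord":
        data = json.loads(raw)
        # Both fields must be present and hold non-empty strings.
        for key in ("question", "answer"):
            value = data.get(key)
            if not isinstance(value, str) or not value.strip():
                raise ValueError(f"QA record needs a non-empty string {key!r}")
        return cls(question=data["question"], answer=data["answer"])

    def to_json(self) -> str:
        # Serialize back to the same two-key JSON object.
        return json.dumps(asdict(self), ensure_ascii=False)


raw = (
    '{"question": "What is the capital city of Australia?", '
    '"answer": "The capital city of Australia is Canberra."}'
)
record = QARecord.from_json(raw)
print(record.answer)  # The capital city of Australia is Canberra.
```

Extra keys are ignored here; a stricter loader could reject them instead.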

Code Data
Description: Code snippets paired with descriptions or requirements, essential for training models to understand, generate, and debug code in various programming languages.

{
  "description": "Write a Python function that returns the factorial of a number.",
  "code": "def factorial(n):\n    if n == 0:\n        return 1\n    else:\n        return n * factorial(n-1)"
}
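One check a code-data loader might perform is confirming that the `code` field is syntactically valid before a record is accepted. The sketch below assumes the snippet is Python (as in the example) and uses the standard-library `ast` module; `validate_code_record` is a hypothetical helper, not an existing CAMEL function.

```python
import ast
import json


def validate_code_record(raw: str) -> list[str]:
    """Check a code record and return the names of top-level functions it defines.

    Assumes the `code` field holds Python source, as in the example above.
    """
    data = json.loads(raw)
    for key in ("description", "code"):
        if not isinstance(data.get(key), str) or not data[key].strip():
            raise ValueError(f"code record needs a non-empty string {key!r}")
    try:
        tree = ast.parse(data["code"])  # syntax check only; nothing is executed
    except SyntaxError as exc:
        raise ValueError(f"'code' is not valid Python: {exc}") from exc
    return [node.name for node in tree.body if isinstance(node, ast.FunctionDef)]


record = {
    "description": "Write a Python function that returns the factorial of a number.",
    "code": "def factorial(n):\n    if n == 0:\n        return 1\n    else:\n        return n * factorial(n-1)",
}
print(validate_code_record(json.dumps(record)))  # ['factorial']
```

Parsing rather than executing keeps validation safe for untrusted snippets; records in other languages would need a different checker.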

Classification Data
Description: Texts labeled with specific categories or tags, useful for tasks like sentiment analysis, topic classification, or spam detection.

{
  "text": "I absolutely loved the new movie! The storyline was gripping and the characters were well-developed.",
  "label": "Positive Sentiment"
}
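A sketch of loading classification records from JSON Lines (one object per line) while rejecting labels outside a fixed vocabulary. The `ALLOWED_LABELS` set, the second sample record, and `load_classification_jsonl` are hypothetical; the issue itself only shows the "Positive Sentiment" example.

```python
import json
from collections import Counter

# Hypothetical label vocabulary; the issue itself only shows "Positive Sentiment".
ALLOWED_LABELS = {"Positive Sentiment", "Negative Sentiment", "Neutral Sentiment"}


def load_classification_jsonl(lines):
    """Parse JSON Lines classification records, rejecting unknown labels."""
    records = []
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # tolerate blank lines between records
        data = json.loads(line)
        if not isinstance(data.get("text"), str):
            raise ValueError(f"line {lineno}: missing string 'text'")
        if data.get("label") not in ALLOWED_LABELS:
            raise ValueError(f"line {lineno}: unknown label {data.get('label')!r}")
        records.append((data["text"], data["label"]))
    return records


sample = [
    json.dumps({
        "text": "I absolutely loved the new movie! The storyline was gripping and the characters were well-developed.",
        "label": "Positive Sentiment",
    }),
    # Hypothetical second record, added only to show the label count.
    json.dumps({"text": "The plot dragged and the ending fell flat.", "label": "Negative Sentiment"}),
]
records = load_classification_jsonl(sample)
label_counts = Counter(label for _, label in records)
print(label_counts)
```

Counting labels at load time is a cheap way to spot class imbalance before training.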

Entity Recognition Data
Description: Texts annotated with entities such as names, dates, and locations, used to train models to identify and categorize key information within text.

{
  "text": "Apple Inc. was founded by Steve Jobs and Steve Wozniak in Cupertino, California, in 1976.",
  "entities": [
    {"entity": "Apple Inc.", "type": "Organization"},
    {"entity": "Steve Jobs", "type": "Person"},
    {"entity": "Steve Wozniak", "type": "Person"},
    {"entity": "Cupertino, California", "type": "Location"},
    {"entity": "1976", "type": "Date"}
  ]
}
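The entity format above lists entity strings without character offsets. A sketch of resolving them to spans, assuming each entity string appears verbatim in `text`; `EntitySpan` and `to_spans` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EntitySpan:
    start: int  # inclusive character offset into the text
    end: int  # exclusive character offset
    surface: str  # the entity string as written in the text
    label: str  # entity type, e.g. "Person"


def to_spans(record: dict) -> list[EntitySpan]:
    """Resolve each annotated entity to character offsets in record['text'].

    Uses the first occurrence of each entity string; raises if one is missing.
    """
    text = record["text"]
    spans = []
    for ent in record["entities"]:
        start = text.find(ent["entity"])
        if start == -1:
            raise ValueError(f"entity {ent['entity']!r} not found in text")
        spans.append(EntitySpan(start, start + len(ent["entity"]), ent["entity"], ent["type"]))
    return spans


record = {
    "text": "Apple Inc. was founded by Steve Jobs and Steve Wozniak in Cupertino, California, in 1976.",
    "entities": [
        {"entity": "Apple Inc.", "type": "Organization"},
        {"entity": "Steve Jobs", "type": "Person"},
        {"entity": "Steve Wozniak", "type": "Person"},
        {"entity": "Cupertino, California", "type": "Location"},
        {"entity": "1976", "type": "Date"},
    ],
}
for span in to_spans(record):
    print(span.start, span.end, span.label, span.surface)
```

Because `str.find` returns only the first match, a text that mentions the same entity twice would be ambiguous; storing start and end offsets alongside each entity is one way the format could avoid that.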

Solution

No response

Alternatives

No response

Additional context

No response

Wendong-Fan added the Data, New Feature, call for contribution, and P0 labels on Jan 7, 2025.
Wendong-Fan added this to the Sprint 20 milestone on Jan 7, 2025.
Wendong-Fan (Member, Author) commented:

Lead: @mohamadkav; support and review: @liuxukun2000, @willshang76

Wendong-Fan changed the milestone from Sprint 20 to Sprint 21 on Jan 9, 2025.
Development: No branches or pull requests

4 participants