[Feature Request] More data format support #1412

Wendong-Fan · 2025-01-07T12:09:33Z

Required prerequisites

I have searched the Issue Tracker and Discussions and confirmed that this hasn't already been reported. (+1 or comment there if it has.)
Consider asking first in a Discussion.

Motivation

Support more data formats, such as alignment data.

Instruction Data
Reference implementation in camel: https://github.com/camel-ai/camel/blob/master/camel/messages/conversion/alpaca.py

Conversation Data
Reference implementation in camel: https://github.com/camel-ai/camel/blob/master/camel/messages/conversion/conversation_models.py

TODO:

Alignment Data
Description: Scenarios that align the model's responses with human values, ethical guidelines, and desired behaviors, especially for handling sensitive topics.

Example: https://huggingface.co/datasets/Magpie-Align/Magpie-Qwen2.5-Math-Pro-300K-v0.1

Question and Answer (QA) Data
Description: Pairs of questions and their corresponding answers to help the model generate accurate responses to various queries.

{
  "question": "What is the capital city of Australia?",
  "answer": "The capital city of Australia is Canberra."
}
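A minimal sketch of how a QA record like the one above might be mapped onto an Alpaca-style instruction record, in the spirit of the instruction data reference. `QAItem` and `to_alpaca_dict` are hypothetical names used only for illustration and are not part of camel's API; the `instruction`/`input`/`output` keys are assumed from the commonly used Alpaca layout.

```python
from dataclasses import dataclass


@dataclass
class QAItem:
    """Hypothetical container for one question/answer pair."""

    question: str
    answer: str

    def to_alpaca_dict(self) -> dict:
        # Assumption: Alpaca-style records use instruction/input/output keys.
        # A plain QA pair has no separate input, so it is left empty.
        return {
            "instruction": self.question,
            "input": "",
            "output": self.answer,
        }


item = QAItem(
    question="What is the capital city of Australia?",
    answer="The capital city of Australia is Canberra.",
)
record = item.to_alpaca_dict()
```

Keeping the conversion as a method on the record would let each new format define its own mapping without touching the others.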

Code Data
Description: Code snippets paired with descriptions or requirements, essential for training models to understand, generate, and debug code in various programming languages.

{
  "description": "Write a Python function that returns the factorial of a number.",
  "code": "def factorial(n):\n    if n == 0:\n        return 1\n    else:\n        return n * factorial(n-1)"
}
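For code data, one possible sanity check is to confirm that the `code` field at least parses. The sketch below assumes Python-only snippets and uses the standard library `ast` module; `CodeItem` and `is_valid_python` are hypothetical names, not an existing camel interface.

```python
import ast
from dataclasses import dataclass


@dataclass
class CodeItem:
    """Hypothetical container for one description/code pair."""

    description: str
    code: str

    def is_valid_python(self) -> bool:
        # ast.parse raises SyntaxError when the snippet is not valid Python.
        # This checks syntax only; it does not run or test the code.
        try:
            ast.parse(self.code)
        except SyntaxError:
            return False
        return True


code_item = CodeItem(
    description="Write a Python function that returns the factorial of a number.",
    code=(
        "def factorial(n):\n"
        "    if n == 0:\n"
        "        return 1\n"
        "    else:\n"
        "        return n * factorial(n-1)"
    ),
)
```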

Classification Data
Description: Texts labeled with specific categories or tags, useful for tasks like sentiment analysis, topic classification, or spam detection.

{
  "text": "I absolutely loved the new movie! The storyline was gripping and the characters were well-developed.",
  "label": "Positive Sentiment"
}
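A classification record could be validated against a fixed label set when it is constructed. This is only a sketch: `ClassificationItem` is a hypothetical name, and the label set below is an assumption, since the example above shows only "Positive Sentiment".

```python
from dataclasses import dataclass

# Assumed label set for illustration; only "Positive Sentiment" appears
# in the example record.
ALLOWED_LABELS = {"Positive Sentiment", "Negative Sentiment", "Neutral Sentiment"}


@dataclass
class ClassificationItem:
    """Hypothetical container for one labeled text."""

    text: str
    label: str

    def __post_init__(self) -> None:
        # Reject labels outside the assumed set as early as possible.
        if self.label not in ALLOWED_LABELS:
            raise ValueError(f"unknown label: {self.label!r}")


sample = ClassificationItem(
    text="I absolutely loved the new movie! The storyline was gripping "
    "and the characters were well-developed.",
    label="Positive Sentiment",
)
```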

Entity Recognition Data
Description: Texts with annotated entities such as names, dates, locations, etc., training models to identify and categorize key information within the text.

{
  "text": "Apple Inc. was founded by Steve Jobs and Steve Wozniak in Cupertino, California, in 1976.",
  "entities": [
    {"entity": "Apple Inc.", "type": "Organization"},
    {"entity": "Steve Jobs", "type": "Person"},
    {"entity": "Steve Wozniak", "type": "Person"},
    {"entity": "Cupertino, California", "type": "Location"},
    {"entity": "1976", "type": "Date"}
  ]
}
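For entity recognition data, a simple consistency check is that every annotated entity string actually occurs in the text. The sketch below assumes the record shape shown above; `Entity`, `EntityRecognitionItem`, and `missing_entities` are hypothetical names for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Entity:
    """One annotated entity, mirroring the "entity"/"type" keys above."""

    entity: str
    type: str


@dataclass
class EntityRecognitionItem:
    """Hypothetical container for a text and its annotated entities."""

    text: str
    entities: List[Entity]

    def missing_entities(self) -> List[str]:
        # Return annotated entity strings that do not occur in the text.
        return [e.entity for e in self.entities if e.entity not in self.text]


ner_item = EntityRecognitionItem(
    text=(
        "Apple Inc. was founded by Steve Jobs and Steve Wozniak "
        "in Cupertino, California, in 1976."
    ),
    entities=[
        Entity("Apple Inc.", "Organization"),
        Entity("Steve Jobs", "Person"),
        Entity("Steve Wozniak", "Person"),
        Entity("Cupertino, California", "Location"),
        Entity("1976", "Date"),
    ],
)
```

An empty result from `missing_entities()` means every annotation can be located in the text by exact substring match; it says nothing about whether the entity types are correct.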

Solution

No response

Alternatives

No response

Additional context

No response

Wendong-Fan · 2025-01-09T17:16:07Z

lead: @mohamadkav ; support & review: @liuxukun2000 @willshang76

Wendong-Fan added the Data (Related to camel data processing), New Feature, call for contribution, and P0 (Task with high level priority) labels Jan 7, 2025

Wendong-Fan added this to the Sprint 20 milestone Jan 7, 2025

Wendong-Fan added this to Project Camel Jan 7, 2025

Wendong-Fan removed the call for contribution label Jan 8, 2025

Wendong-Fan assigned mohamadkav, liuxukun2000 and willshang76 Jan 8, 2025

Wendong-Fan modified the milestones: Sprint 20, Sprint 21 Jan 9, 2025
