While the authors haven't claimed outright that they're replicating o1, the use of "o1" in the name and the strawberry icon (which strongly evokes OpenAI's o1 branding) make it seem closely related. In my opinion, that's a bit misleading, especially since the actual methodology doesn't appear to align with what's known about o1.
The report uses the Open-O1 CoT Dataset (Filtered) for most of its data—over two-thirds of it, in fact. They also mention generating some additional data using MCTS, but the details on how this is done are a bit sparse. Specifically, using a "confidence score" to guide MCTS data generation seems risky, since it might just amplify the model’s inherent biases, as discussed in issue #13. It’d be great to see more transparency on how data quality is handled here.
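To make the concern concrete, here is a minimal sketch of what confidence-guided MCTS selection could look like. The per-token confidence formula, the `step_reward` averaging, and the `Node` structure are my assumptions for illustration, not the report's confirmed implementation:

```python
import math
from dataclasses import dataclass, field

@dataclass
class Node:
    """One reasoning step in the search tree."""
    step_text: str
    parent: "Node | None" = None
    children: list["Node"] = field(default_factory=list)
    visits: int = 0
    value: float = 0.0  # accumulated confidence reward

def token_confidence(logprob: float, topk_logprobs: list[float]) -> float:
    """Confidence of a single token: its probability mass relative to the
    top-k alternatives at that position. This exact formula is an
    assumption about what a "confidence score" could mean, not the
    authors' confirmed definition."""
    return math.exp(logprob) / sum(math.exp(lp) for lp in topk_logprobs)

def step_reward(logprobs: list[float], topk: list[list[float]]) -> float:
    """Average token confidence over a reasoning step, used as the MCTS
    reward. The circularity is visible here: the same model that
    generated the step also scores it, so any systematic bias in the
    model is reinforced rather than corrected."""
    scores = [token_confidence(lp, alts) for lp, alts in zip(logprobs, topk)]
    return sum(scores) / len(scores)

def select_child(node: Node, c: float = 1.4) -> Node:
    """Standard UCT selection over children: exploitation plus an
    exploration bonus that shrinks as a child is visited more."""
    def uct(child: Node) -> float:
        if child.visits == 0:
            return float("inf")
        return (child.value / child.visits
                + c * math.sqrt(math.log(node.visits) / child.visits))
    return max(node.children, key=uct)
```

If the reward is indeed derived from the model's own token probabilities like this, the search would preferentially expand whatever the model already finds likely, which is exactly the bias-amplification worry raised in issue #13.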
The fine-tuning approach, using Qwen2-7B-Instruct with CoT data, is a fairly conventional method, and using MCTS to guide reasoning steps has already been explored in prior work such as AlphaMath and AlphaZero-like Tree-Search, which may offer valuable insights for future updates.
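For context, the fine-tuning stage as described appears to be standard supervised fine-tuning on CoT traces. A minimal sketch with Hugging Face `transformers` might look like the following; the dataset file, chat-template formatting, and hyperparameters are illustrative guesses, not the report's actual configuration:

```python
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

# Hypothetical CoT dataset in {"prompt": ..., "response": ...} JSONL format;
# the report's actual data mix and preprocessing may differ.
dataset = load_dataset("json", data_files="cot_data.jsonl")["train"]

model_name = "Qwen/Qwen2-7B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

def tokenize(example):
    # Render prompt plus CoT response with the model's chat template,
    # then tokenize the full conversation as one training sequence.
    text = tokenizer.apply_chat_template(
        [{"role": "user", "content": example["prompt"]},
         {"role": "assistant", "content": example["response"]}],
        tokenize=False,
    )
    return tokenizer(text, truncation=True, max_length=2048)

tokenized = dataset.map(tokenize, remove_columns=dataset.column_names)

# Causal-LM collator: labels are the input ids with padding masked out.
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

# Illustrative hyperparameters only; the report does not specify these.
args = TrainingArguments(
    output_dir="marco-o1-sft",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,
    learning_rate=1e-5,
    num_train_epochs=2,
    bf16=True,
)

trainer = Trainer(model=model, args=args,
                  train_dataset=tokenized, data_collator=collator)
trainer.train()
```

Nothing here is novel, which is the point: the contribution would have to come from the data generation and the MCTS integration rather than the training recipe itself.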
In conclusion, while the current work presents some interesting ideas, the name "Marco-o1" and the strawberry icon could easily lead to misunderstandings about its relationship to OpenAI's o1 model. I'm hopeful the team will continue to refine their approach and release more innovative updates. Looking forward to seeing where this work goes!
As emphasized in the Limitations section, this research work is inspired by OpenAI's o1 (from which the name is also derived). This work aims to explore potential approaches to shed light on the currently unclear technical roadmap for large reasoning models. In addition, our focus is on open-ended questions, and we have observed interesting phenomena in multilingual applications. However, we must acknowledge that the current model primarily exhibits o1-like reasoning characteristics, and its performance still falls short of a fully realized "o1" model. This is not a one-time effort, and we remain committed to continuous optimization and ongoing improvement.
We will let you know about any updates. If you have any ideas about the "actual methodology", please share them with us (you are also welcome to join this research project).