1 Distillation with Reasoning: can DeepSeek R1 Teach Better Than Humans?
wilburmelrose4 edited this page 2025-02-14 05:15:42 +00:00


Including reasoning "chains of thought" (CoT) in a model's output significantly improves answer quality, but it also increases inference cost. Distillation transfers reasoning knowledge from an expensive teacher model to a more economical student model, reducing overall inference cost.

    Each example in the original dataset included:

  1. A human expert's chain of thought.
  2. The final answer.

    We expanded this dataset by adding:

    Synthetic R1 reasoning, i.e., the CoT generated by DeepSeek R1.
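A minimal sketch of what this dataset expansion could look like. The field names and the `teacher_generate` helper are hypothetical stand-ins (a real implementation would call the DeepSeek R1 API), not details from the original write-up:

```python
# Sketch: augment each example with a synthetic reasoning chain
# produced by a teacher model. `teacher_generate` is a placeholder
# for an actual call to DeepSeek R1.

def teacher_generate(question: str) -> str:
    # Placeholder: a real version would query the teacher model and
    # return its chain of thought for `question`.
    return f"<think>step-by-step reasoning for: {question}</think>"

def expand_dataset(examples: list[dict]) -> list[dict]:
    expanded = []
    for ex in examples:
        expanded.append({
            "question": ex["question"],
            "human_cot": ex["human_cot"],  # 1. human expert's chain of thought
            "answer": ex["answer"],        # 2. the final answer
            "r1_cot": teacher_generate(ex["question"]),  # synthetic R1 CoT
        })
    return expanded

data = [{"question": "2+2?", "human_cot": "Add 2 and 2.", "answer": "4"}]
print(expand_dataset(data)[0]["r1_cot"])
```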

    Then, we fine-tuned three variants of the model (using LoRA on Llama-3.1-8B-Instruct), each with a different training target:

    - Direct Answer Only: generate the final answer without revealing reasoning.
    - Human Expert CoT: generate the final answer alongside a reasoning chain resembling the human expert's.
    - Synthetic R1 CoT: generate the final answer alongside DeepSeek R1's synthetic reasoning chain.

    The table below summarizes average accuracy and reasoning length:

    - Note: The accuracy for the 5-shot baseline may differ from numbers reported elsewhere due to different evaluation setups. The key focus is on comparing relative performance across distillation approaches, not on beating other models.
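The three training targets above can be sketched as follows. The tag format and variant names here are illustrative assumptions, not the study's actual prompt templates:

```python
# Sketch: build the supervision target string for each fine-tuning
# variant. Variant names and formatting are illustrative only.

def build_target(example: dict, variant: str) -> str:
    if variant == "direct_answer":
        # Direct Answer Only: final answer, no reasoning shown.
        return example["answer"]
    if variant == "human_cot":
        # Human Expert CoT: human reasoning chain, then the answer.
        return f"{example['human_cot']}\nAnswer: {example['answer']}"
    if variant == "synthetic_r1_cot":
        # Synthetic R1 CoT: DeepSeek R1's reasoning chain, then the answer.
        return f"{example['r1_cot']}\nAnswer: {example['answer']}"
    raise ValueError(f"unknown variant: {variant}")

ex = {"answer": "4", "human_cot": "Add 2 and 2.", "r1_cot": "<think>2+2=4</think>"}
print(build_target(ex, "direct_answer"))  # -> 4
```

Each variant is then fine-tuned on the same questions with its corresponding target, so any accuracy difference is attributable to the style of supervision rather than the data.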

    From this study, synthetic reasoning CoTs from DeepSeek R1 appear superior to human-expert CoTs at improving performance, albeit at a higher inference cost due to their longer length.
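The cost trade-off follows from the fact that inference cost scales roughly with the number of generated tokens. The token counts below are purely illustrative, not figures from the study:

```python
# Sketch: relative inference cost scales roughly linearly with the
# number of output tokens generated per response.
# Token counts below are hypothetical.

def relative_cost(avg_output_tokens: int, baseline_tokens: int) -> float:
    return avg_output_tokens / baseline_tokens

direct_tokens = 50   # answer only (hypothetical)
r1_cot_tokens = 600  # answer plus long synthetic reasoning chain (hypothetical)
print(f"{relative_cost(r1_cot_tokens, direct_tokens):.0f}x")
```

So a distilled student that emits long R1-style chains buys its accuracy gain with proportionally more output tokens per query.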

    Fireworks AI Inference and Fine-Tuning Platform

    DeepSeek R1 is available on the Fireworks AI platform. A user-friendly distillation interface will soon become part of FireOptimizer. If you need earlier access, please contact us to explore options.

    Conclusions

    By incorporating reasoning-based data through distillation, organizations can significantly improve model performance without bearing the full burden of human-annotated datasets. DeepSeek R1's ability to produce long, high-quality reasoning chains makes it a powerful teacher model, showing that, in some cases, the machine might just out-teach the human.