Introducing SafeCoder (#1391)

* Create asset folder * Upload safecoder post thumbnail * Delete blank * Add safecoder.md blog post * Update _blog.yml Added SafeCoder post * Update safecoder.md Updated vmware permalink * change assets (#1392) * Add: zh/dpo-trl.md (#1390) * update soc3-zn * Update _blog.yml Try to resolve conflicts * Update: proofreading zh/ethics-soc-3.md * add how-to-generate cn version Signed-off-by: Yao, Matrix <matrix.yao@intel.com> * unity game in hf space translation completed * Update: punctuations of how-to-generate.md * hf-bitsandbytes-integration cn done Signed-off-by: Yao, Matrix <matrix.yao@intel.com> * Proofread hf-bitsandbytes-integration.md * Proofread: red-teaming.md * Update: add red-teaming to zh/_blog.yml * Update _blog.yml * Update: add red-teaming to zh/_blog.yml Fix: red-teaming title in zh/_blog.yml * Fix: red-teaming PPLM translation * deep-learning-with-proteins cn done Signed-off-by: Yao, Matrix <matrix.yao@intel.com> * Add: stackllama.md * if blog translation completed * Update unity-in-spaces.md Add a link for AI game * Update if.md Fix “普罗大众” to “普惠大众” * deep-learning-with-proteins cn done Signed-off-by: Yao, Matrix <matrix.yao@intel.com> * add starcoder cn Signed-off-by: Yao, Matrix <matrix.yao@intel.com> Update: formatting and punctuations of starcoder.md * add starcoder cn Signed-off-by: Yao, Matrix <matrix.yao@intel.com> * Update: proofreading zh/unity-in-spaces.md * fix(annotated-diffusion.md): fix image shape desc in PIL and Tensor (#1080) modifiy the comment after ToTensor with the correct image shape CHW * Add text-to-video blog (#1058) Adds an overview of text-to-video generative models, task specific challenges, datasets, and more. 
Co-authored-by: Omar Sanseviero <osanseviero@gmail.com> Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Fix broken link in text-to-video.md (#1083) * Update: proofreading zh/unity-in-spaces.md Fix: incorrect _blog.yml format * Update: proofreading zh/deep-learning-with-proteins.md * update ethics-diffusers-cn (#6) * update ethics-diffusers * update ethics-diffusers --------- Co-authored-by: Zhongdong Yang <zhongdong_y@outlook.com> * Update: proofreading zh/ethics-diffusers.md * 1. introducing-csearch done (#11) 2. text-to-video done Signed-off-by: Yao, Matrix <matrix.yao@intel.com> * Update: proofread zh/text-to-video.md * Update: proofreading zh/introducing-csearch.md * generative-ai-models-on-intel-cpu cn done (#13) Signed-off-by: Yao, Matrix <matrix.yao@intel.com> Update: proofread zh/generative-ai-models-on-intel-cpu.md Signed-off-by: Yang, Zhongdong <zhongdong_y@outlook.com> * add starchat-alpha zh translation (#10) * Preparing blogpost annoucing `safetensors` security audit + official support. (#1096) * Preparing blogpost annoucing `safetensors` security audit + official support. * Taking into account comments + Grammarly. * Update safetensors-official.md * Apply suggestions from code review Co-authored-by: Omar Sanseviero <osanseviero@gmail.com> * Update safetensors-official.md * Apply suggestions from code review Co-authored-by: Luc Georges <McPatate@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Luc Georges <McPatate@users.noreply.github.com> * Apply suggestions from code review * Update safetensors-official.md Co-authored-by: Luc Georges <McPatate@users.noreply.github.com> * Apply suggestions from code review * Adding thumbnail. * Include changes from Stella. * Update safetensors-official.md * Update with Stella's comments. * Remove problematic sentence. * Rename + some rephrasing. 
* Apply suggestions from code review Co-authored-by: DeltaPenrose <128761972+DeltaPenrose@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: DeltaPenrose <128761972+DeltaPenrose@users.noreply.github.com> * Update safetensors-security-audit.md Co-authored-by: DeltaPenrose <128761972+DeltaPenrose@users.noreply.github.com> * Last fixes. * Apply suggestions from code review Co-authored-by: DeltaPenrose <128761972+DeltaPenrose@users.noreply.github.com> --------- Co-authored-by: Omar Sanseviero <osanseviero@gmail.com> Co-authored-by: Luc Georges <McPatate@users.noreply.github.com> Co-authored-by: DeltaPenrose <128761972+DeltaPenrose@users.noreply.github.com> * Hotfixing safetensors. (#1131) * Removing the checklist formatting is busted. (#1132) * Update safetensors-security-audit.md (#1134) * [time series transformers] update dataloader API (#1135) * update dataloader API * revert comment * add back Cached transform * New post: Hugging Face and IBM (#1130) * Initial version * Minor fixes * Update huggingface-and-ibm.md Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Update huggingface-and-ibm.md Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Resize image * Update blog index --------- Co-authored-by: Julien Simon <julsimon@huggingface.co> Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Show authors of safetensors blog post (#1137) Update: proofread zh/starchat-alpha.md * add megatron-training & assisted-generation (#8) * add megatron-training * add megatron-training * add megatron-training * add megatron-training * add assisted-generation * add assisted-generation * add assisted-generation * Update: proofreading zh/assisted-generation * Update: proofread zh/megatron-training.md * rwkv model blog translation completed (#12) * rwkv model blog translation completed * add 3 additional parts in the blog tail * Update: proofread zh/rwkv.md * Fix: missing subtitle/notes for image references. 
* encoder-decoder cn done (#14) Signed-off-by: Yao, Matrix <matrix.yao@intel.com> Co-authored-by: Zhongdong Yang <zhongdong_y@outlook.com> * Update: proofread zh/encoder-decoder.md * constrained-beam-search cn done (#15) Signed-off-by: Yao, Matrix <matrix.yao@intel.com> Update: proofread zh/constrained-beam-search.md * Update: zh/unity-api.md + zh/unity-asr.md * unity ai speech recognition blog translation completed * add (GameObject) to attach its Chinese translation * finish unity-api translation * add unity series entry to zh/_blog.yml * Update: proofread zh/unity-{api,asr}.md * Update zh/falcon.md Signed-off-by: Yao, Matrix <matrix.yao@intel.com> Update: zh/falcon.md * instruction-tuning-sd cn done (#21) Signed-off-by: Yao, Matrix <matrix.yao@intel.com> * Update: zh/instruction-tuning-sd.md * fine-tune-whisper cn done (#23) Signed-off-by: Yao, Matrix <matrix.yao@intel.com> * Update: zh/fine-tune-whisper.md * add mms_adapters and policy (#22) Update: zh/policy-ntia-rfc.md * Update: refine zh/mms_adapters.md Update: remove incompleted file * Update: zh/llm-leaderboard.md, zh/autoformer.md * add llm-leaderboard CN translation * add CN translation for autoformer * Update: proofreading zh/autoformer.md * BridgeTower blog post (#1118) * Update BridgeTower blog post (#1277) * LLM Eval: minor typos and nits (#1263) * Fix anchor link to custom pipeline section. 
(#485) * Update: zh/llm-leaderboard.md, zh/autoformer.md * add llm-leaderboard CN translation * add CN translation for autoformer Update: proofreading zh/autoformer.md Update: proofreading zh/llm-leaderboard.md * Update: proofreading zh/ethics-soc-4.md * Update "How to deploy LLM" blog post to use `huggingface_hub` in example (#1290) * Use InferenceClient from huggingface_hub * Update inference-endpoints-llm.md Co-authored-by: Pedro Cuenca <pedro@huggingface.co> --------- Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Update BridgeTower blog post (#1295) * Removed duplicate numbering (#1171) * Update: zh/evaluating-mmlu-leaderboard.md Signed-off-by: Yao, Matrix <matrix.yao@intel.com> Co-authored-by: Zhongdong Yang <zhongdong_y@outlook.com> Update: proofreading zh/evaluating-mmlu-leaderboard.md * Translate train-optimize-sd-intel.md to zh (#16) * Translate "stackllama" into Chinese * Create train-optimize-sd-intel.md Add new Update: zh/train-optimize-sd-intel.md * Update: zh/dedup.md & zh/stable-diffusion-finetuning-intel.md * dedup cn done Signed-off-by: Yao, Matrix <matrix.yao@intel.com> * stable-diffusion-finetuning-intel cn done Signed-off-by: Yao, Matrix <matrix.yao@intel.com> --------- Signed-off-by: Yao, Matrix <matrix.yao@intel.com> Update: proofread zh/stable-diffusion-finetuning-intel.md * Update: proofread zh/dedup.md * Update: zh/inference-endpoints-llm.md Co-authored-by: Zhongdong Yang <zhongdong_y@outlook.com> Update: proofread zh/inference-endpoints-llm.md * Update: zh/llama2.md Signed-off-by: Yao, Matrix <matrix.yao@intel.com> Proofread: zh/llama2.md * Update: zh/diffusers-turns-1.md Proofread: zh/diffusers-turns-1.md * Fix: zh/diffusers-turns-1.md wrong meta data format Policy blog: Open ML Considerations in the EU AI Act (#1342) * Create .gitignore * Add files via upload * Create eu-ai-act-oss.md * Delete .gitignore * Update eu-ai-act-oss.md * Update eu-ai-act-oss.md * Update eu-ai-act-oss.md * Update _blog.yml * Update eu-ai-act-oss.md * 
Update: zh/game-jam-first-edition-results.md Update: zh/game-jam-first-edition-results.md * Add: zh/bridgetower.md, zh/getting-started-habana.md, zh/habana-gaudi-2-benchmark.md * 3 Gaudi posts cn done: - bridgetower.md - getting-started-habana.md - habana-gaudi-2-benchmark.md Signed-off-by: Yao, Matrix <matrix.yao@intel.com> --------- Signed-off-by: Yao, Matrix <matrix.yao@intel.com> Co-authored-by: Zhongdong Yang <zhongdong_y@outlook.com> * Update: zh/bridgetower.md, zh/getting-started-habana.md, zh/habana-gaudi-2-benchmark.md * Add: zh/transformers-design-philosophy.md Signed-off-by: Yao, Matrix <matrix.yao@intel.com> Update: proofread zh/transformers-design-philosophy.md * Add: zh/os-llms.md * Translate os-llms.md * Update _blog.yml Update: proofread zh/os-llms.md * Add: zh/dpo-trl.md * update soc3-zn * Update _blog.yml Try to resolve conflicts * Update: proofreading zh/ethics-soc-3.md * add how-to-generate cn version Signed-off-by: Yao, Matrix <matrix.yao@intel.com> * unity game in hf space translation completed * Update: punctuations of how-to-generate.md * hf-bitsandbytes-integration cn done Signed-off-by: Yao, Matrix <matrix.yao@intel.com> * Proofread hf-bitsandbytes-integration.md * Proofread: red-teaming.md * Update: add red-teaming to zh/_blog.yml * Update _blog.yml * Update: add red-teaming to zh/_blog.yml Fix: red-teaming title in zh/_blog.yml * Fix: red-teaming PPLM translation * deep-learning-with-proteins cn done Signed-off-by: Yao, Matrix <matrix.yao@intel.com> * Add: stackllama.md * if blog translation completed * Update unity-in-spaces.md Add a link for AI game * Update if.md Fix “普罗大众” to “普惠大众” * deep-learning-with-proteins cn done Signed-off-by: Yao, Matrix <matrix.yao@intel.com> * add starcoder cn Signed-off-by: Yao, Matrix <matrix.yao@intel.com> Update: formatting and punctuations of starcoder.md * add starcoder cn Signed-off-by: Yao, Matrix <matrix.yao@intel.com> * Update: proofreading zh/unity-in-spaces.md * fix(annotated-diffusion.md): fix 
image shape desc in PIL and Tensor (#1080) modifiy the comment after ToTensor with the correct image shape CHW * Add text-to-video blog (#1058) Adds an overview of text-to-video generative models, task specific challenges, datasets, and more. Co-authored-by: Omar Sanseviero <osanseviero@gmail.com> Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Fix broken link in text-to-video.md (#1083) * Update: proofreading zh/unity-in-spaces.md Fix: incorrect _blog.yml format * Update: proofreading zh/deep-learning-with-proteins.md * update ethics-diffusers-cn (#6) * update ethics-diffusers * update ethics-diffusers --------- Co-authored-by: Zhongdong Yang <zhongdong_y@outlook.com> * Update: proofreading zh/ethics-diffusers.md * 1. introducing-csearch done (#11) 2. text-to-video done Signed-off-by: Yao, Matrix <matrix.yao@intel.com> * Update: proofread zh/text-to-video.md * Update: proofreading zh/introducing-csearch.md * generative-ai-models-on-intel-cpu cn done (#13) Signed-off-by: Yao, Matrix <matrix.yao@intel.com> Update: proofread zh/generative-ai-models-on-intel-cpu.md Signed-off-by: Yang, Zhongdong <zhongdong_y@outlook.com> * add starchat-alpha zh translation (#10) * Preparing blogpost annoucing `safetensors` security audit + official support. (#1096) * Preparing blogpost annoucing `safetensors` security audit + official support. * Taking into account comments + Grammarly. * Update safetensors-official.md * Apply suggestions from code review Co-authored-by: Omar Sanseviero <osanseviero@gmail.com> * Update safetensors-official.md * Apply suggestions from code review Co-authored-by: Luc Georges <McPatate@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Luc Georges <McPatate@users.noreply.github.com> * Apply suggestions from code review * Update safetensors-official.md Co-authored-by: Luc Georges <McPatate@users.noreply.github.com> * Apply suggestions from code review * Adding thumbnail. 
* Include changes from Stella. * Update safetensors-official.md * Update with Stella's comments. * Remove problematic sentence. * Rename + some rephrasing. * Apply suggestions from code review Co-authored-by: DeltaPenrose <128761972+DeltaPenrose@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: DeltaPenrose <128761972+DeltaPenrose@users.noreply.github.com> * Update safetensors-security-audit.md Co-authored-by: DeltaPenrose <128761972+DeltaPenrose@users.noreply.github.com> * Last fixes. * Apply suggestions from code review Co-authored-by: DeltaPenrose <128761972+DeltaPenrose@users.noreply.github.com> --------- Co-authored-by: Omar Sanseviero <osanseviero@gmail.com> Co-authored-by: Luc Georges <McPatate@users.noreply.github.com> Co-authored-by: DeltaPenrose <128761972+DeltaPenrose@users.noreply.github.com> * Hotfixing safetensors. (#1131) * Removing the checklist formatting is busted. (#1132) * Update safetensors-security-audit.md (#1134) * [time series transformers] update dataloader API (#1135) * update dataloader API * revert comment * add back Cached transform * New post: Hugging Face and IBM (#1130) * Initial version * Minor fixes * Update huggingface-and-ibm.md Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Update huggingface-and-ibm.md Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Resize image * Update blog index --------- Co-authored-by: Julien Simon <julsimon@huggingface.co> Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Show authors of safetensors blog post (#1137) Update: proofread zh/starchat-alpha.md * add megatron-training & assisted-generation (#8) * add megatron-training * add megatron-training * add megatron-training * add megatron-training * add assisted-generation * add assisted-generation * add assisted-generation * Update: proofreading zh/assisted-generation * Update: proofread zh/megatron-training.md * rwkv model blog translation completed (#12) * rwkv model blog translation completed * add 3 
additional parts in the blog tail * Update: proofread zh/rwkv.md * Fix: missing subtitle/notes for image references. * encoder-decoder cn done (#14) Signed-off-by: Yao, Matrix <matrix.yao@intel.com> Co-authored-by: Zhongdong Yang <zhongdong_y@outlook.com> * Update: proofread zh/encoder-decoder.md * constrained-beam-search cn done (#15) Signed-off-by: Yao, Matrix <matrix.yao@intel.com> Update: proofread zh/constrained-beam-search.md * Update: zh/unity-api.md + zh/unity-asr.md * unity ai speech recognition blog translation completed * add (GameObject) to attach its Chinese translation * finish unity-api translation * add unity series entry to zh/_blog.yml * Update: proofread zh/unity-{api,asr}.md * Update zh/falcon.md Signed-off-by: Yao, Matrix <matrix.yao@intel.com> Update: zh/falcon.md * instruction-tuning-sd cn done (#21) Signed-off-by: Yao, Matrix <matrix.yao@intel.com> * Update: zh/instruction-tuning-sd.md * fine-tune-whisper cn done (#23) Signed-off-by: Yao, Matrix <matrix.yao@intel.com> * Update: zh/fine-tune-whisper.md * add mms_adapters and policy (#22) Update: zh/policy-ntia-rfc.md * Update: refine zh/mms_adapters.md Update: remove incompleted file * Update: zh/llm-leaderboard.md, zh/autoformer.md * add llm-leaderboard CN translation * add CN translation for autoformer * Update: proofreading zh/autoformer.md * BridgeTower blog post (#1118) * Update BridgeTower blog post (#1277) * LLM Eval: minor typos and nits (#1263) * Fix anchor link to custom pipeline section. 
(#485) * Update: zh/llm-leaderboard.md, zh/autoformer.md * add llm-leaderboard CN translation * add CN translation for autoformer Update: proofreading zh/autoformer.md Update: proofreading zh/llm-leaderboard.md * Update: proofreading zh/ethics-soc-4.md * Update "How to deploy LLM" blog post to use `huggingface_hub` in example (#1290) * Use InferenceClient from huggingface_hub * Update inference-endpoints-llm.md Co-authored-by: Pedro Cuenca <pedro@huggingface.co> --------- Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Update BridgeTower blog post (#1295) * Removed duplicate numbering (#1171) * Update: zh/evaluating-mmlu-leaderboard.md Signed-off-by: Yao, Matrix <matrix.yao@intel.com> Co-authored-by: Zhongdong Yang <zhongdong_y@outlook.com> Update: proofreading zh/evaluating-mmlu-leaderboard.md * Translate train-optimize-sd-intel.md to zh (#16) * Translate "stackllama" into Chinese * Create train-optimize-sd-intel.md Add new Update: zh/train-optimize-sd-intel.md * Update: zh/dedup.md & zh/stable-diffusion-finetuning-intel.md * dedup cn done Signed-off-by: Yao, Matrix <matrix.yao@intel.com> * stable-diffusion-finetuning-intel cn done Signed-off-by: Yao, Matrix <matrix.yao@intel.com> --------- Signed-off-by: Yao, Matrix <matrix.yao@intel.com> Update: proofread zh/stable-diffusion-finetuning-intel.md * Update: proofread zh/dedup.md * Update: zh/inference-endpoints-llm.md Co-authored-by: Zhongdong Yang <zhongdong_y@outlook.com> Update: proofread zh/inference-endpoints-llm.md * Update: zh/llama2.md Signed-off-by: Yao, Matrix <matrix.yao@intel.com> Proofread: zh/llama2.md * Update: zh/diffusers-turns-1.md Proofread: zh/diffusers-turns-1.md * Fix: zh/diffusers-turns-1.md wrong meta data format Policy blog: Open ML Considerations in the EU AI Act (#1342) * Create .gitignore * Add files via upload * Create eu-ai-act-oss.md * Delete .gitignore * Update eu-ai-act-oss.md * Update eu-ai-act-oss.md * Update eu-ai-act-oss.md * Update _blog.yml * Update eu-ai-act-oss.md * 
Update: zh/game-jam-first-edition-results.md Update: zh/game-jam-first-edition-results.md * Add: zh/bridgetower.md, zh/getting-started-habana.md, zh/habana-gaudi-2-benchmark.md * 3 Gaudi posts cn done: - bridgetower.md - getting-started-habana.md - habana-gaudi-2-benchmark.md Signed-off-by: Yao, Matrix <matrix.yao@intel.com> --------- Signed-off-by: Yao, Matrix <matrix.yao@intel.com> Co-authored-by: Zhongdong Yang <zhongdong_y@outlook.com> * Update: zh/bridgetower.md, zh/getting-started-habana.md, zh/habana-gaudi-2-benchmark.md * Add: zh/transformers-design-philosophy.md Signed-off-by: Yao, Matrix <matrix.yao@intel.com> Update: proofread zh/transformers-design-philosophy.md * dpo-trl cn done Signed-off-by: Yao, Matrix <matrix.yao@intel.com> --------- Signed-off-by: Yao, Matrix <matrix.yao@intel.com> Signed-off-by: Yang, Zhongdong <zhongdong_y@outlook.com> Co-authored-by: innovation64 <liyang19991126@126.com> Co-authored-by: Zhongdong Yang <zhongdong_y@outlook.com> Co-authored-by: SuSung-boy <872414318@qq.com> Co-authored-by: Zhongdong Yang <yangzd1996@outlook.com> Co-authored-by: Luke Cheng <2258420+chenglu@users.noreply.github.com> Co-authored-by: yaoqih <40328311+yaoqih@users.noreply.github.com> Co-authored-by: Shiliang Chen <36809537+csl122@users.noreply.github.com> Co-authored-by: Alara Dirik <8944735+alaradirik@users.noreply.github.com> Co-authored-by: Omar Sanseviero <osanseviero@gmail.com> Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> Co-authored-by: 李洋 <45715979+innovation64@users.noreply.github.com> Co-authored-by: Hoi2022 <120370631+Hoi2022@users.noreply.github.com> Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com> Co-authored-by: Luc Georges <McPatate@users.noreply.github.com> Co-authored-by: DeltaPenrose <128761972+DeltaPenrose@users.noreply.github.com> Co-authored-by: Victor Muštar <victor.mustar@gmail.com> Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com> Co-authored-by: Julien Simon 
<3436143+juliensimon@users.noreply.github.com> Co-authored-by: Julien Simon <julsimon@huggingface.co> Co-authored-by: Pedro Cuenca <pedro@huggingface.co> Co-authored-by: gxy-gxy <57594446+gxy-gxy@users.noreply.github.com> Co-authored-by: regisss <15324346+regisss@users.noreply.github.com> Co-authored-by: Lucain <lucainp@gmail.com> Co-authored-by: Eswar Divi <76403422+EswarDivi@users.noreply.github.com> Co-authored-by: Qi Zhang <82949744+Vermillion-de@users.noreply.github.com> * Update: proofread zh/dpo-trl.md --------- Signed-off-by: Yao, Matrix <matrix.yao@intel.com> Signed-off-by: Yang, Zhongdong <zhongdong_y@outlook.com> Co-authored-by: innovation64 <liyang19991126@126.com> Co-authored-by: Yao, Matrix <matrix.yao@intel.com> Co-authored-by: SuSung-boy <872414318@qq.com> Co-authored-by: Luke Cheng <2258420+chenglu@users.noreply.github.com> Co-authored-by: yaoqih <40328311+yaoqih@users.noreply.github.com> Co-authored-by: Shiliang Chen <36809537+csl122@users.noreply.github.com> Co-authored-by: Alara Dirik <8944735+alaradirik@users.noreply.github.com> Co-authored-by: Omar Sanseviero <osanseviero@gmail.com> Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> Co-authored-by: 李洋 <45715979+innovation64@users.noreply.github.com> Co-authored-by: Yao Matrix <yaoweifeng0301@126.com> Co-authored-by: Hoi2022 <120370631+Hoi2022@users.noreply.github.com> Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com> Co-authored-by: Luc Georges <McPatate@users.noreply.github.com> Co-authored-by: DeltaPenrose <128761972+DeltaPenrose@users.noreply.github.com> Co-authored-by: Victor Muštar <victor.mustar@gmail.com> Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com> Co-authored-by: Julien Simon <3436143+juliensimon@users.noreply.github.com> Co-authored-by: Julien Simon <julsimon@huggingface.co> Co-authored-by: Pedro Cuenca <pedro@huggingface.co> Co-authored-by: gxy-gxy <57594446+gxy-gxy@users.noreply.github.com> Co-authored-by: regisss 
<15324346+regisss@users.noreply.github.com> Co-authored-by: Lucain <lucainp@gmail.com> Co-authored-by: Eswar Divi <76403422+EswarDivi@users.noreply.github.com> Co-authored-by: Qi Zhang <82949744+Vermillion-de@users.noreply.github.com> * introducing idefics (#1386) * introducing idefics * some wording * some wording * license * add images * Update introducing-idefics.md Co-authored-by: Omar Sanseviero <osanseviero@gmail.com> * Update introducing-idefics.md Co-authored-by: Omar Sanseviero <osanseviero@gmail.com> * Update introducing-idefics.md Co-authored-by: Omar Sanseviero <osanseviero@gmail.com> * Update introducing-idefics.md Co-authored-by: Omar Sanseviero <osanseviero@gmail.com> * Update introducing-idefics.md Co-authored-by: Omar Sanseviero <osanseviero@gmail.com> * Update introducing-idefics.md Co-authored-by: Omar Sanseviero <osanseviero@gmail.com> * include comments * from `introducing-idefics` to just `idefics` * additonal information in the ethical evaluations * Update idefics.md Co-authored-by: Hugo Laurençon <44556846+HugoLaurencon@users.noreply.github.com> * Update idefics.md Co-authored-by: Hugo Laurençon <44556846+HugoLaurencon@users.noreply.github.com> * clickable link nomic map * more specific wording * fix * adding in link to DMT eval on OBELICS hub space * Update idefics.md Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * small wording * remove dmt link until it is up again --------- Co-authored-by: Omar Sanseviero <osanseviero@gmail.com> Co-authored-by: Ezi Ozoani <ozoaniezi@gmail.com> Co-authored-by: Hugo Laurençon <44556846+HugoLaurencon@users.noreply.github.com> Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * forgot someone (#1394) * last fix i promise (#1396) * Fix thumbnail (#1397) * typo (#1398) * Update _blog.yml Added SafeCoder post * Update safecoder.md Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Update safecoder.md Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Update safecoder.md Co-authored-by: Pedro 
Cuenca <pedro@huggingface.co> --------- Signed-off-by: Yao, Matrix <matrix.yao@intel.com> Signed-off-by: Yang, Zhongdong <zhongdong_y@outlook.com> Co-authored-by: Philipp Schmid <32632186+philschmid@users.noreply.github.com> Co-authored-by: Zhongdong Yang <zhongdong_y@outlook.com> Co-authored-by: innovation64 <liyang19991126@126.com> Co-authored-by: Yao, Matrix <matrix.yao@intel.com> Co-authored-by: SuSung-boy <872414318@qq.com> Co-authored-by: Luke Cheng <2258420+chenglu@users.noreply.github.com> Co-authored-by: yaoqih <40328311+yaoqih@users.noreply.github.com> Co-authored-by: Shiliang Chen <36809537+csl122@users.noreply.github.com> Co-authored-by: Alara Dirik <8944735+alaradirik@users.noreply.github.com> Co-authored-by: Omar Sanseviero <osanseviero@gmail.com> Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> Co-authored-by: 李洋 <45715979+innovation64@users.noreply.github.com> Co-authored-by: Yao Matrix <yaoweifeng0301@126.com> Co-authored-by: Hoi2022 <120370631+Hoi2022@users.noreply.github.com> Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com> Co-authored-by: Luc Georges <McPatate@users.noreply.github.com> Co-authored-by: DeltaPenrose <128761972+DeltaPenrose@users.noreply.github.com> Co-authored-by: Victor Muštar <victor.mustar@gmail.com> Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com> Co-authored-by: Julien Simon <3436143+juliensimon@users.noreply.github.com> Co-authored-by: Julien Simon <julsimon@huggingface.co> Co-authored-by: Pedro Cuenca <pedro@huggingface.co> Co-authored-by: gxy-gxy <57594446+gxy-gxy@users.noreply.github.com> Co-authored-by: regisss <15324346+regisss@users.noreply.github.com> Co-authored-by: Lucain <lucainp@gmail.com> Co-authored-by: Eswar Divi <76403422+EswarDivi@users.noreply.github.com> Co-authored-by: Qi Zhang <82949744+Vermillion-de@users.noreply.github.com> Co-authored-by: Victor SANH <victorsanh@gmail.com> Co-authored-by: Ezi Ozoani <ozoaniezi@gmail.com> Co-authored-by: Hugo Laurençon 
<44556846+HugoLaurencon@users.noreply.github.com>
huggingface · Aug 22, 2023 · 308fb8f
1 parent 5fb8ce5
commit 308fb8f
Showing 4 changed files with 106 additions and 0 deletions.
diff --git a/_blog.yml b/_blog.yml
@@ -2668,3 +2668,15 @@
     - research
     - nlp
     - cv
+
+- local: safecoder
+  title: "Introducing SafeCoder"
+  author: jeffboudier
+  thumbnail: /blog/assets/159_safecoder/thumbnail.jpg
+  date: August 22, 2023
+  tags:
+    - announcement
+    - partnerships
+    - vmware
+    - bigcode
+
diff --git a/assets/159_safecoder/coding-example.gif b/assets/159_safecoder/coding-example.gif
diff --git a/assets/159_safecoder/thumbnail.jpg b/assets/159_safecoder/thumbnail.jpg
diff --git a/safecoder.md b/safecoder.md
@@ -0,0 +1,94 @@
+---
+title: "Introducing SafeCoder" 
+thumbnail: /blog/assets/159_safecoder/thumbnail.jpg
+authors:
+- user: jeffboudier
+- user: philschmid
+---
+
+# Introducing SafeCoder
+
+<!-- {blog_metadata} -->
+<!-- {authors} -->
+
+Today we are excited to announce SafeCoder - a code assistant solution built for the enterprise.
+
+The goal of SafeCoder is to unlock software development productivity for the enterprise, with a fully compliant and self-hosted pair programmer. In marketing speak: “your own on-prem GitHub Copilot”.
+
+Before we dive deeper, here’s what you need to know:
+
+- SafeCoder is not a model, but a complete end-to-end commercial solution
+- SafeCoder is built with security and privacy as core principles - code never leaves the VPC during training or inference
+- SafeCoder is designed for self-hosting by the customer on their own infrastructure
+- SafeCoder is designed for customers to own their own Code Large Language Model
+
+![example](/blog/assets/159_safecoder/coding-example.gif)
+
+
+## Why SafeCoder?
+
+Code assistant solutions built upon LLMs, such as GitHub Copilot, are delivering strong [productivity boosts](https://github.blog/2022-09-07-research-quantifying-github-copilots-impact-on-developer-productivity-and-happiness/). For the enterprise, tuning Code LLMs on the company code base to create proprietary Code LLMs improves the reliability and relevance of completions, which adds a further productivity boost. For instance, Google's internal LLM code assistant, which is trained on an internal code base, reports a completion [acceptance rate of 25-34%](https://ai.googleblog.com/2022/07/ml-enhanced-code-completion-improves.html).
+
+However, relying on closed-source Code LLMs to create internal code assistants exposes companies to compliance and security issues. The first arises during training, as fine-tuning a closed-source Code LLM on an internal codebase requires exposing this codebase to a third party. The second arises during inference, as fine-tuned Code LLMs are likely to “leak” code from their training dataset. To meet compliance requirements, enterprises need to deploy fine-tuned Code LLMs within their own infrastructure - which is not possible with closed-source LLMs.
+
+With SafeCoder, Hugging Face will help customers build their own Code LLMs, fine-tuned on their proprietary codebase, using state-of-the-art open models and libraries, without sharing their code with Hugging Face or any other third party. With SafeCoder, Hugging Face delivers a containerized, hardware-accelerated Code LLM inference solution, to be deployed by the customer directly within the customer's secure infrastructure, without code inputs and completions leaving their secure IT environment.
+
+## From StarCoder to SafeCoder
+
+At the core of the SafeCoder solution is the [StarCoder](https://huggingface.co/bigcode/starcoder) family of Code LLMs, created by the [BigCode](https://huggingface.co/bigcode) project, a collaboration between Hugging Face, ServiceNow and the open source community.
+
+The StarCoder models offer unique characteristics ideally suited to an enterprise self-hosted solution:
+
+- State-of-the-art code completion results - see benchmarks in the [paper](https://huggingface.co/papers/2305.06161) and [multilingual code evaluation leaderboard](https://huggingface.co/spaces/bigcode/multilingual-code-evals)
+- Designed for inference performance: a 15B-parameter model with code optimizations, Multi-Query Attention for a reduced memory footprint, and Flash Attention to scale to a context of 8,192 tokens.
+- Trained on [The Stack](https://huggingface.co/datasets/bigcode/the-stack), an ethically sourced, open source code dataset containing only commercially permissible licensed code, with a developer opt-out mechanism from the get-go, refined through intensive PII removal and deduplication efforts.
+
+Note: While StarCoder is the inspiration and model powering the initial version of SafeCoder, an important benefit of building an LLM solution upon open source models is that it can adapt to the latest and greatest open source models available. In the future, SafeCoder may offer other similarly commercially permissible open source models built upon ethically sourced and transparent datasets as the base LLM available for fine-tuning.
+
+## Privacy and Security as a Core Principle
+
+For any company, the internal codebase is some of its most important and valuable intellectual property. A core principle of SafeCoder is that the customer's internal codebase will never be accessible to any third party (including Hugging Face) during training or inference.
+
+In the initial setup phase of SafeCoder, the Hugging Face team provides containers, scripts and examples, and works hand in hand with the customer to select, extract, prepare, deduplicate and deidentify internal codebase data into a training dataset. This dataset is then used in a training container provided by Hugging Face and configured for the hardware infrastructure available to the customer.
+
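As an illustration of what the deduplication and deidentification steps might look like, here is a minimal sketch. It is not the SafeCoder tooling, which this post does not describe: it uses exact hashing and a single email regex as simple stand-ins, and every function name and file path in it is hypothetical.

```python
import hashlib
import re

# Simplified stand-in for PII detection: matches email addresses only.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def deidentify(source: str) -> str:
    """Replace email addresses with a placeholder token."""
    return EMAIL_RE.sub("<EMAIL>", source)


def deduplicate(files: dict[str, str]) -> dict[str, str]:
    """Keep the first file for each distinct whitespace-normalized content."""
    seen = set()
    kept = {}
    for path, source in files.items():
        # Ignore trailing whitespace and surrounding blank lines when comparing.
        normalized = "\n".join(line.rstrip() for line in source.strip().splitlines())
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            kept[path] = source
    return kept


def build_dataset(files: dict[str, str]) -> dict[str, str]:
    """Deduplicate first, then deidentify the files that remain."""
    return {path: deidentify(src) for path, src in deduplicate(files).items()}


# Toy repository contents (hypothetical paths and code).
repo = {
    "a/util.py": "def add(a, b):\n    return a + b\n",
    "b/util_copy.py": "def add(a, b):   \n    return a + b\n\n",
    "c/contact.py": "# maintainer: jane.doe@example.com\nOWNER = 'jane'\n",
}
dataset = build_dataset(repo)
print(sorted(dataset))
```

A production pipeline would typically go further, for example with near-duplicate detection and a trained PII detection model, but the overall shape of the steps stays the same and everything runs on the customer's own infrastructure.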
+In the deployment phase of SafeCoder, the customer deploys containers provided by Hugging Face on their own infrastructure to expose internal private endpoints within their VPC. These containers are configured to the exact hardware configuration available to the customer, including NVIDIA GPUs, AMD Instinct GPUs, Intel Xeon CPUs, AWS Inferentia2 or Habana Gaudi accelerators.
+
+## Compliance as a Core Principle
+
+As the regulation framework around machine learning models and datasets is still being written across the world, global companies need to make sure the solutions they use minimize legal risks.
+
+Data sources, data governance, and management of copyrighted data are just a few of the most important compliance areas to consider. BigScience, the older cousin and inspiration for BigCode, addressed these areas in working groups before they were broadly recognized by the draft EU AI Act, and as a result was [graded as most compliant among Foundational Model Providers in a Stanford CRFM study](https://crfm.stanford.edu/2023/06/15/eu-ai-act.html).
+
+BigCode expanded upon this work by implementing novel techniques for the code domain and building The Stack with compliance as a core principle. These include commercially permissible license filtering, consent mechanisms (developers can [easily find out if their code is present and request to be opted out](https://huggingface.co/spaces/bigcode/in-the-stack) of the dataset), extensive documentation and tools to inspect the [source data](https://huggingface.co/datasets/bigcode/the-stack-metadata), and dataset improvements (such as [deduplication](https://huggingface.co/blog/dedup) and [PII removal](https://huggingface.co/bigcode/starpii)).
+
+All these efforts translate into legal risk minimization for users of the StarCoder models and customers of SafeCoder. For SafeCoder users, they also translate into compliance features: when software developers get code completions, these suggestions are checked against The Stack, so users know if the suggested code matches existing code in the source dataset, and what the license is. Customers can specify which licenses are preferred and surface those preferences to their users.
+
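The check described above can be pictured with a small sketch. How SafeCoder actually matches suggestions against The Stack is not described in this post, so the exact-hash index, the `MatchReport` type and all names below are hypothetical and only illustrate the idea of reporting a match, its license, and whether that license is on the customer's preferred list.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class MatchReport:
    matched: bool              # does the suggestion match indexed code?
    license: Optional[str]     # license of the matching code, if any
    preferred: Optional[bool]  # is that license on the customer's list?


def fingerprint(code: str) -> str:
    """Hash the code after collapsing all runs of whitespace."""
    return hashlib.sha256(" ".join(code.split()).encode("utf-8")).hexdigest()


def check_suggestion(suggestion, index, preferred_licenses):
    """Look the suggestion up in a fingerprint -> license index."""
    license_name = index.get(fingerprint(suggestion))
    if license_name is None:
        return MatchReport(matched=False, license=None, preferred=None)
    return MatchReport(True, license_name, license_name in preferred_licenses)


# Toy index standing in for the source dataset (hypothetical contents).
index = {
    fingerprint("def add(a, b):\n    return a + b"): "MIT",
    fingerprint("def sub(a, b):\n    return a - b"): "Apache-2.0",
}
preferred = {"MIT"}

print(check_suggestion("def add(a, b):  return a + b", index, preferred))
print(check_suggestion("def sub(a, b):\n    return a - b", index, preferred))
print(check_suggestion("def mul(a, b):\n    return a * b", index, preferred))
```

An IDE plugin could use such a report to annotate a completion with its license and to highlight suggestions whose license falls outside the customer's preferences.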
+## How does it work?
+
+SafeCoder is a complete commercial solution, including service, software and support.
+
+### Training your own SafeCoder model
+
+StarCoder was trained on more than 80 programming languages and offers state of the art performance on [multiple benchmarks](https://huggingface.co/spaces/bigcode/multilingual-code-evals). To offer better code suggestions specifically for a SafeCoder customer, we start the engagement with an optional training phase, where the Hugging Face team works directly with the customer team to guide them through the steps to prepare and build a training code dataset, and to create their own code generation model through fine-tuning, without ever exposing their codebase to third parties or the internet.
+
+The end result is a model that is adapted to the programming languages, standards and practices of the customer. Along the way, SafeCoder customers learn the process and build a pipeline for creating and updating their own models, which avoids vendor lock-in and keeps them in control of their AI capabilities.
+
+### Deploying SafeCoder
+
+During the setup phase, SafeCoder customers and Hugging Face design and provision the optimal infrastructure to support the required concurrency to offer a great developer experience. Hugging Face then builds SafeCoder inference containers that are hardware-accelerated and optimized for throughput, to be deployed by the customer on their own infrastructure.
+
+SafeCoder inference supports various hardware to give customers a wide range of options: NVIDIA Ampere GPUs, AMD Instinct GPUs, Habana Gaudi2, AWS Inferentia 2, Intel Xeon Sapphire Rapids CPUs and more.
+
+### Using SafeCoder
+
+Once SafeCoder is deployed and its endpoints are live within the customer VPC, developers can install compatible SafeCoder IDE plugins to get code suggestions as they work. Today, SafeCoder supports popular IDEs, including [VSCode](https://marketplace.visualstudio.com/items?itemName=HuggingFace.huggingface-vscode) and IntelliJ, with more plugins coming from our partners.
+
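For readers curious how an editor might turn a cursor position into a completion request, the sketch below builds a fill-in-the-middle prompt using the special tokens that the StarCoder models were trained with. Whether the SafeCoder plugins use exactly this format is not stated in this post, and the helper names are hypothetical.

```python
# Fill-in-the-middle special tokens used by the StarCoder tokenizer.
FIM_PREFIX = "<fim_prefix>"
FIM_SUFFIX = "<fim_suffix>"
FIM_MIDDLE = "<fim_middle>"


def split_at_cursor(text: str, cursor: int) -> tuple[str, str]:
    """Split the editor buffer into the code before and after the cursor."""
    return text[:cursor], text[cursor:]


def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Ask the model to generate the code that belongs between prefix and suffix."""
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"


# Toy buffer: the cursor sits where the function body should go.
buffer = "def mean(values):\n    \n\nprint(mean([1, 2, 3]))\n"
cursor = buffer.index("    \n") + len("    ")
prefix, suffix = split_at_cursor(buffer, cursor)
prompt = build_fim_prompt(prefix, suffix)
print(prompt)
```

The resulting prompt would then be sent to the private inference endpoint, and the generated text inserted at the cursor.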
+## How can I get SafeCoder?
+
+Today, we are announcing SafeCoder in collaboration with VMware at the VMware Explore conference and making SafeCoder available to VMware enterprise customers. Working with VMware helps ensure the deployment of SafeCoder on customers’ VMware Cloud infrastructure is successful – whichever cloud, on-premises or hybrid infrastructure scenario is preferred by the customer. In addition to utilizing SafeCoder, VMware has published a [reference architecture](https://www.vmware.com/content/dam/digitalmarketing/vmware/en/pdf/docs/vmware-baseline-reference-architecture-for-generative-ai.pdf) with code samples to enable the fastest possible time-to-value when deploying and operating SafeCoder on VMware infrastructure. VMware’s Private AI Reference Architecture makes it easy for organizations to quickly leverage popular open source projects such as Ray and Kubeflow to deploy AI services adjacent to their private datasets, while working with Hugging Face to ensure that organizations maintain the flexibility to take advantage of the latest and greatest in open-source models. This is all without tradeoffs in total cost of ownership or performance.
+
+“Our collaboration with Hugging Face around SafeCoder fully aligns to VMware’s goal of enabling customer choice of solutions while maintaining privacy and control of their business data. In fact, we have been running SafeCoder internally for months and have seen excellent results. Best of all, our collaboration with Hugging Face is just getting started, and I’m excited to take our solution to our hundreds of thousands of customers worldwide,” says Chris Wolf, Vice President of VMware AI Labs. Learn more about private AI and VMware’s differentiation in this emerging space [here](https://octo.vmware.com/vmware-private-ai-foundation/).
+
+---
+
+If you’re interested in SafeCoder for your company, please contact us [here](mailto:api-enterprise@huggingface.co?subject=SafeCoder) - our team will contact you to discuss your requirements!