CANN/cann-recipes-infer Kimi-K2-Thinking Configuration Guide

张建站

2026/5/9 22:26:09

10-minute read

YAML Parameter Description

[Free download link] cann-recipes-infer: this project provides CANN-based optimization samples for typical models and acceleration algorithms used in LLM and multimodal model inference. Project address: https://gitcode.com/cann/cann-recipes-infer

Basic Config

model_name: kimi_k2_thinking  # string type
model_path: /data/models/Kimi-K2-Thinking/  # string type
exe_mode: ge_graph  # string type. Only supports [ge_graph, eager]
world_size: 128  # int type

Model Config

pa_block_size: 128  # PA block size. Supports [128, 256]
enable_weight_nz: True  # whether to use the NZ weight format for better performance. Supports [False, True]
with_ckpt: True  # whether to load the checkpoint. Supports [False, True]
enable_multi_streams: True  # whether to enable multi-stream execution for better performance. Supports [False, True]
enable_profiler: True  # whether to enable profiling. Supports [False, True]
enable_cache_compile: False  # whether to enable cache compile for better performance. Supports [False, True]
prefill_mini_batch_size: 0  # mini batch size for the prefill stage
perfect_eplb: False  # whether to test the uniform scenario of MoE experts. Supports [False, True]
enable_auto_split_weight: True  # whether to enable automatic weight splitting. Supports [False, True]
next_n: 0  # number of multi-token prediction steps. Supports [0, 1, 2, 3]

Data Config

dataset: default  # supports [default, InfiniteBench, LongBench]
input_max_len: 8192  # maximum input length
max_new_tokens: 100  # maximum number of new tokens
batch_size: 128  # global batch size

Parallel Config

cp_size: 128  # Context Parallel number. If cp_size > 0, cp_size should equal world_size. Only active in the prefill stage
attn_tp_size: 1  # Attention TP number
oproj_tp_size: 8  # Oproj TP number. Only supported when attn_tp_size is 1
dense_tp_size: 1  # Dense MLP TP number
moe_tp_size: 1  # MoE TP number
embed_tp_size: 16  # Embed TP number
lmhead_tp_size: 16  # LMHead TP number

Authoring statement: parts of this article were generated with AI assistance (AIGC) and are provided for reference only.
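The constraints stated in the YAML comments can be checked before launching a run. The following is a minimal sketch, not part of cann-recipes-infer: the helper name check_config is hypothetical, the config is passed as a plain dict rather than parsed from a file, and the two parallel-config rules reflect an assumed reading of the comments (cp_size greater than 0 must equal world_size; oproj_tp_size above 1 requires attn_tp_size equal to 1).

```python
# Hypothetical helper (not part of cann-recipes-infer): checks a config dict
# against the constraints written in the YAML comments.

def check_config(cfg: dict) -> list:
    """Return a list of constraint violations; an empty list means none found."""
    errors = []

    # Enumerated options copied from the "support [...]" comments.
    allowed = {
        "exe_mode": {"ge_graph", "eager"},
        "pa_block_size": {128, 256},
        "next_n": {0, 1, 2, 3},
        "dataset": {"default", "InfiniteBench", "LongBench"},
    }
    for key, options in allowed.items():
        if key in cfg and cfg[key] not in options:
            errors.append(f"{key}={cfg[key]!r} is not one of {sorted(options, key=str)}")

    # Assumed reading: when context parallelism is on (cp_size > 0),
    # cp_size must equal world_size.
    cp_size = cfg.get("cp_size", 0)
    if cp_size > 0 and cp_size != cfg.get("world_size"):
        errors.append(f"cp_size={cp_size} must equal world_size={cfg.get('world_size')}")

    # Assumed reading: Oproj TP is only usable when attn_tp_size is 1.
    if cfg.get("oproj_tp_size", 1) > 1 and cfg.get("attn_tp_size", 1) != 1:
        errors.append("oproj_tp_size > 1 requires attn_tp_size == 1")

    return errors


# Values taken from the configuration listed above.
example = {
    "exe_mode": "ge_graph",
    "world_size": 128,
    "pa_block_size": 128,
    "next_n": 0,
    "dataset": "default",
    "cp_size": 128,
    "attn_tp_size": 1,
    "oproj_tp_size": 8,
}

print(check_config(example))  # prints [] for the values above
```

Returning a list of messages rather than raising on the first problem lets all violations be reported in one pass, which is convenient when several parallel sizes are changed together.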
