CANN PTO Collective Communication Instructions Explained
Collective communication instructions in detail: TGATHER / TSCATTER / TBROADCAST / TREDUCE

pto-isa: Parallel Tile Operation (PTO) is a virtual instruction set architecture designed by Ascend CANN, focusing on tile-level operations. This repository offers high-performance, cross-platform tile operations across Ascend platforms. Project address: https://gitcode.com/cann/pto-isa

All collective communication instructions share the following characteristics:

- Only the root executes the call. Non-root ranks must not call it; doing so is undefined behavior.
- The participants are specified through a ParallelGroup.
- Both single buffering and ping-pong double buffering are supported.
- When the data exceeds one UB Tile, it is automatically split into chunks by a 2D sliding window.

TGATHER: gather from multiple ranks

The root collects data from all ranks and concatenates it along DIM_3.

    // Single buffer
    template <typename ParallelGroupType, typename GlobalDstData, typename TileData, typename... WaitEvents>
    RecordEvent TGATHER(ParallelGroupType group, GlobalDstData dst, TileData stagingTile, WaitEvents... events);

    // Ping-pong
    template <typename ParallelGroupType, typename GlobalDstData, typename TileData, typename... WaitEvents>
    RecordEvent TGATHER(ParallelGroupType group, GlobalDstData dst, TileData pingTile, TileData pongTile, WaitEvents... events);

Constraints:

- dstGlobalData points to local memory, and its GetShape(DIM_3) must be ≥ N × H (N and H are not defined in the text; presumably N is the number of ranks and H is each rank's extent along DIM_3).
- parallelGroup.tensors[r] points to the remote source buffer of rank r.
- All source tensors must have the same shape and strides.
- Tile chunking constraint: the static ValidRow/ValidCol must evenly divide the corresponding dimensions.

Example:

    GPerRank tensors[NRANKS];
    for (int i = 0; i < NRANKS; ++i) tensors[i] = GPerRank(group_addrs[i]);
    comm::ParallelGroup<GPerRank> group(tensors, NRANKS, my_rank);
    GResult dstG(result);
    TileT stagingTile(TILE_ROWS, TILE_COLS);
    comm::TGATHER(group, dstG, stagingTile);

TSCATTER: distribute from the root

The root splits its data along DIM_3 and distributes the pieces to the ranks. This is the inverse of TGATHER.

    // Single buffer
    template <typename ParallelGroupType, typename GlobalSrcData, typename TileData, typename... WaitEvents>
    RecordEvent TSCATTER(ParallelGroupType group, GlobalSrcData src, TileData stagingTile, WaitEvents... events);

    // Ping-pong
    template <typename ParallelGroupType, typename GlobalSrcData, typename TileData, typename... WaitEvents>
    RecordEvent TSCATTER(ParallelGroupType group, GlobalSrcData src, TileData pingTile, TileData pongTile, WaitEvents... events);

Constraints:

- srcGlobalData points to local memory, and its GetShape(DIM_3) must be ≥ N × H.
- parallelGroup.tensors[r] points to the remote destination buffer of rank r.

TBROADCAST: broadcast

The root broadcasts its local data to all ranks.

    // Single buffer
    template <typename ParallelGroupType, typename GlobalSrcData, typename TileData, typename... WaitEvents>
    RecordEvent TBROADCAST(ParallelGroupType group, GlobalSrcData src, TileData stagingTile, WaitEvents... events);

    // Ping-pong
    template <typename ParallelGroupType, typename GlobalSrcData, typename TileData, typename... WaitEvents>
    RecordEvent TBROADCAST(ParallelGroupType group, GlobalSrcData src, TileData pingTile, TileData pongTile, WaitEvents... events);

Constraints:

- srcGlobalData points to local memory.
- parallelGroup.tensors[k] points to the remote destination buffer of rank k.

TREDUCE: reduce across multiple ranks

The root collects data from all ranks and performs an element-wise reduction.

    // Basic reduce: accumulator tile + receive tile
    template <typename ParallelGroupType, typename GlobalDstData, typename TileData, typename... WaitEvents>
    RecordEvent TREDUCE(ParallelGroupType group, GlobalDstData dst, TileData accTile, TileData recvTile, ReduceOp op, WaitEvents... events);

    // Ping-pong reduce
    template <typename ParallelGroupType, typename GlobalDstData, typename TileData, typename... WaitEvents>
    RecordEvent TREDUCE(ParallelGroupType group, GlobalDstData dst, TileData accTile, TileData pingTile, TileData pongTile, ReduceOp op, WaitEvents... events);

Constraints:

- dstGlobalData points to local memory.
- accTileData and recvTileData (or accTile, pingTile, and pongTile in the ping-pong form) must be preallocated UB Tiles.
- parallelGroup.tensors[r] points to the remote source buffer of rank r.
- The chunking constraints are the same as for TGATHER.

Example:

    comm::ParallelGroup<GTensor> group(tensors, NRANKS, my_rank);
    GTensor dstG(result);
    TileT accTile, recvTile;
    comm::TREDUCE(group, dstG, accTile, recvTile, comm::ReduceOp::Sum);

Disclosure: parts of this article were generated with AI assistance (AIGC) and are provided for reference only.
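The data movement that the four instructions perform can be modeled on the host with ordinary containers, which may help when checking results produced on the device. The following is a minimal sketch, not the PTO implementation: it does not call any PTO API, the names `Buffer`, `gather_ref`, `scatter_ref`, `broadcast_ref`, and `reduce_ref` are hypothetical, and it assumes each rank holds one contiguous row-major block so that concatenation along DIM_3 becomes simple appending.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical host-side stand-in for one rank's data (row-major, contiguous).
using Buffer = std::vector<float>;

// Gather model: the root appends every rank's block in rank order.
Buffer gather_ref(const std::vector<Buffer>& perRank) {
    Buffer dst;
    for (const Buffer& src : perRank) {
        dst.insert(dst.end(), src.begin(), src.end());
    }
    return dst;
}

// Scatter model: the root splits its buffer into equal blocks, one per rank.
// This is the inverse of gather_ref when all blocks have the same size.
std::vector<Buffer> scatter_ref(const Buffer& src, std::size_t nranks) {
    assert(nranks > 0 && src.size() % nranks == 0);
    const std::size_t block = src.size() / nranks;
    std::vector<Buffer> out;
    for (std::size_t r = 0; r < nranks; ++r) {
        auto first = src.begin() + static_cast<std::ptrdiff_t>(r * block);
        auto last = first + static_cast<std::ptrdiff_t>(block);
        out.emplace_back(first, last);
    }
    return out;
}

// Broadcast model: every rank receives a full copy of the root's buffer.
std::vector<Buffer> broadcast_ref(const Buffer& src, std::size_t nranks) {
    return std::vector<Buffer>(nranks, src);
}

// Reduce model: element-wise fold of all ranks' buffers into an accumulator.
Buffer reduce_ref(const std::vector<Buffer>& perRank,
                  const std::function<float(float, float)>& op) {
    assert(!perRank.empty());
    Buffer acc = perRank.front();
    for (std::size_t r = 1; r < perRank.size(); ++r) {
        assert(perRank[r].size() == acc.size());  // same shape on every rank
        for (std::size_t i = 0; i < acc.size(); ++i) {
            acc[i] = op(acc[i], perRank[r][i]);
        }
    }
    return acc;
}
```

For instance, calling `scatter_ref(gather_ref(blocks), blocks.size())` on equally sized blocks returns the original blocks, which mirrors the statement that TSCATTER is the inverse of TGATHER.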
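The article states that data larger than one UB Tile is chunked with a 2D sliding window, that the static tile shape must evenly divide the corresponding dimensions, and that ping-pong double buffering is available. One way to picture how these fit together is the sketch below. It is an illustration under assumptions, not the scheduling PTO actually uses: the `TileStep` struct and `plan_tiles` function are hypothetical, the row-major traversal order is assumed, and alternating buffers on every tile is just one plausible ping-pong policy.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical description of one chunk transfer.
struct TileStep {
    std::size_t rowOffset;  // top-left row of the chunk inside the region
    std::size_t colOffset;  // top-left column of the chunk inside the region
    int buffer;             // 0 = ping, 1 = pong (always 0 without ping-pong)
};

// Walk a rows x cols region with a tileRows x tileCols window.
// Throws if the tile shape does not evenly divide the region, which models
// the divisibility constraint on ValidRow/ValidCol.
std::vector<TileStep> plan_tiles(std::size_t rows, std::size_t cols,
                                 std::size_t tileRows, std::size_t tileCols,
                                 bool pingPong) {
    if (tileRows == 0 || tileCols == 0 ||
        rows % tileRows != 0 || cols % tileCols != 0) {
        throw std::invalid_argument("tile shape must evenly divide the region");
    }
    std::vector<TileStep> steps;
    int buffer = 0;
    for (std::size_t r = 0; r < rows; r += tileRows) {
        for (std::size_t c = 0; c < cols; c += tileCols) {
            steps.push_back({r, c, buffer});
            if (pingPong) {
                buffer ^= 1;  // alternate so one tile can load while the other drains
            }
        }
    }
    return steps;
}
```

With a 4 x 6 region and a 2 x 3 tile, this plan contains four steps; in ping-pong mode the buffer index alternates 0, 1, 0, 1, whereas in single-buffer mode every step reuses buffer 0.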