tile_broadcast_one_blk
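catlass is the operator template library of CANN. It provides template samples for high-performance matrix multiplication and related fused operators on NPUs. Project address: https://gitcode.com/cann/catlass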
Code location
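Overview
The tile_broadcast_one_blk module implements the one-block broadcast operation of the epilogue stage. It takes single elements residing in UB and broadcasts each one across a whole block (32 B). A typical use is broadcasting a scalar scale or zero point so that it can take part in vector computation.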
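To make the semantics concrete, the following is a minimal host-side reference model, not the device implementation. The helper name `BroadcastOneBlkReference` and the constant `kBytesPerBlock` are hypothetical and exist only for this sketch; the only fact taken from the description above is the 32-byte block size. `std::uint16_t` can stand in for `half`, since both are 2 bytes wide.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Block size in UB as stated in the overview (32 bytes).
constexpr std::size_t kBytesPerBlock = 32;

// Hypothetical host-side reference, not part of catlass:
// every input element is replicated until it fills one 32-byte block.
template <typename T>
std::vector<T> BroadcastOneBlkReference(const std::vector<T>& in) {
    static_assert(kBytesPerBlock % sizeof(T) == 0,
                  "element size must divide the block size");
    constexpr std::size_t elemsPerBlock = kBytesPerBlock / sizeof(T);

    std::vector<T> out(in.size() * elemsPerBlock);
    for (std::size_t i = 0; i < in.size(); ++i) {
        // Block i of the output holds elemsPerBlock copies of in[i].
        for (std::size_t j = 0; j < elemsPerBlock; ++j) {
            out[i * elemsPerBlock + j] = in[i];
        }
    }
    assert(out.size() * sizeof(T) == in.size() * kBytesPerBlock);
    return out;
}
```

With 2-byte elements each block holds 16 copies, so an input of three elements produces 48 output elements. Such a model can serve as a golden reference when checking kernel output on the host.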
API list
| API | Style | Description |
|---|---|---|
| TileBroadcastOneBlk | Non-TLA | Built on AscendC::Brcb with BrcbRepeatParams |
| TileBroadcastOneBlkTla | TLA | TLA version; offsets are computed via tensor.layout()(tensor.coord()) |
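The expression `tensor.layout()(tensor.coord())` in the table can be read as "apply the layout, which is a callable, to the coordinate of the tile origin to obtain a linear element offset". The sketch below illustrates that idea with stand-in types. `Coord2D`, `RowMajorLayout`, `TensorView` and `TileOrigin` are all hypothetical and are not the real tla types, whose layouts may carry strides and nested shapes.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 2-D coordinate (row, column).
struct Coord2D {
    std::uint32_t row;
    std::uint32_t col;
};

// Hypothetical dense row-major layout. Calling it with a coordinate
// returns the linear element offset of that coordinate.
struct RowMajorLayout {
    std::uint32_t rows;
    std::uint32_t cols;
    std::uint32_t operator()(Coord2D c) const { return c.row * cols + c.col; }
};

// Hypothetical tensor view: base pointer, layout, and tile-origin coordinate.
template <typename T>
struct TensorView {
    T* data;
    RowMajorLayout layoutValue;
    Coord2D coordValue;
    RowMajorLayout layout() const { return layoutValue; }
    Coord2D coord() const { return coordValue; }
};

// Resolve the address at which the tile starts.
template <typename T>
T* TileOrigin(const TensorView<T>& tensor) {
    const std::uint32_t offset = tensor.layout()(tensor.coord());
    assert(offset < tensor.layout().rows * tensor.layout().cols);
    return tensor.data + offset;
}
```

For a 4 x 16 row-major view whose origin is at (2, 0), the offset is 2 * 16 + 0 = 32 elements. Keeping the offset inside the tensor object is what lets a TLA-style operator accept sub-tiles without extra pointer arguments.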
Usage examples
TileBroadcastOneBlk (non-TLA)
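
    #include "catlass/epilogue/tile/tile_broadcast_one_blk.hpp"

    // Assumption: the parent namespace is opened as well, so that the
    // unqualified Gemm::, Arch:: and layout:: names below resolve.
    using namespace Catlass;
    using namespace Catlass::Epilogue::Tile;

    // Element type and layout of the data being broadcast.
    using ComputeType = Gemm::GemmType<half, layout::RowMajor>;
    // Number of source elements to broadcast.
    constexpr uint32_t COMPUTE_LENGTH = 256;
    using BroadcastOp = TileBroadcastOneBlk<Arch::AtlasA2, ComputeType, COMPUTE_LENGTH>;

    // UB tensors; binding them to UB buffers is omitted in this fragment.
    // ubIn holds the source elements, ubOut receives one block per element.
    AscendC::LocalTensor<half> ubOut, ubIn;

    BroadcastOp broadcastOp;
    broadcastOp(ubOut, ubIn);

TileBroadcastOneBlkTla (TLA)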

    // Number of source elements to broadcast.
    constexpr uint32_t COMPUTE_LENGTH = 256;

    // Layouts of the output and input tensors.
    auto layoutOut = tla::MakeLayout<half, layout::RowMajor>(COMPUTE_LENGTH, 32);
    auto layoutIn = tla::MakeLayout<half, layout::VectorLayout>(COMPUTE_LENGTH, 1);

    // UB buffers; binding them to UB memory is omitted in this fragment.
    AscendC::LocalTensor<half> ubOutData, ubInData;

    // Wrap the raw buffers into TLA tensors located in UB.
    auto ubOut = tla::MakeTensor(ubOutData, layoutOut, Arch::PositionUB{});
    auto ubIn = tla::MakeTensor(ubInData, layoutIn, Arch::PositionUB{});

    TileBroadcastOneBlkTla<Arch::AtlasA2, half, COMPUTE_LENGTH> op;
    op(ubOut, ubIn);

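Both examples use COMPUTE_LENGTH = 256. As a rough sizing aid, the sketch below estimates how many Brcb repeats such a length needs and how much UB the broadcast result occupies. It rests on two assumptions that should be checked against the AscendC documentation: that one Brcb repeat expands 8 source elements into 8 blocks, and that a single call accepts at most 255 repeats. `BrcbPlan`, `PlanBrcb` and `OutputBytes` are hypothetical names and do not describe how catlass actually splits the work.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint32_t kBytesPerBlock = 32;   // block size stated in the overview
constexpr std::uint32_t kElemsPerRepeat = 8;   // assumption: 8 elements per Brcb repeat
constexpr std::uint32_t kMaxRepeat = 255;      // assumption: per-call repeat limit

// Hypothetical plan: how many maximal calls and how many leftover repeats.
struct BrcbPlan {
    std::uint32_t totalRepeats;  // repeats needed overall
    std::uint32_t fullCalls;     // calls issued with kMaxRepeat repeats
    std::uint32_t tailRepeats;   // repeats in the final, shorter call
};

inline BrcbPlan PlanBrcb(std::uint32_t computeLength) {
    assert(computeLength > 0);
    // Round up so a partial group of elements still gets its own repeat.
    const std::uint32_t total =
        (computeLength + kElemsPerRepeat - 1) / kElemsPerRepeat;
    return BrcbPlan{total, total / kMaxRepeat, total % kMaxRepeat};
}

// One 32-byte block is written per source element.
inline std::uint32_t OutputBytes(std::uint32_t computeLength) {
    return computeLength * kBytesPerBlock;
}
```

Under these assumptions, 256 elements fit into a single call of 32 repeats and the result takes 256 * 32 = 8192 bytes of UB, which is worth budgeting for when choosing COMPUTE_LENGTH.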
Authoring statement: parts of this article were generated with AI assistance (AIGC) and are provided for reference only.