Deep Note:agent/example/kernels/a5/flash_attn_full_fp8_causal.py
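cannbot-skills: CANNBot is a family of agents for CANN development, aimed at improving development efficiency; this repository provides the reusable Skills modules for it. Project address: https://gitcode.com/cann/cannbot-skills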
Open this file only after the short catalog entry confirmed the kernel is relevant. Its job is to capture the extra rationale that would otherwise bloat the catalog entry.
What this kernel is really for
- the multi-row full-sequence a5 attention path, not the simpler `L=1` decode-style `mha_ifa*` family
- a normalized online-softmax pipeline where delayed `p @ v` stays on chip
- a causal contract where only the diagonal tile needs mixed valid/invalid score handling
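The online-softmax idea in the list above can be sketched in plain Python for a single query row. This is the generic streaming recurrence (running row maximum, running row sum, rescaled accumulator, one final normalization), not the kernel's actual code: the function names `online_softmax_row` and `reference_row` are hypothetical, and the sketch does not model fp8 storage, the delayed `p @ v` scheduling, or any on-chip buffer.

```python
import math


def online_softmax_row(score_tiles, value_tiles):
    """Streaming softmax(scores) @ values for one query row.

    score_tiles: one list of scores per key tile (-inf marks an invalid score)
    value_tiles: one list of value vectors per key tile, aligned with score_tiles
    """
    row_max = -math.inf                       # running maximum over tiles seen so far
    row_sum = 0.0                             # running sum of exp(score - row_max)
    acc = [0.0] * len(value_tiles[0][0])      # running unnormalized output

    for scores, values in zip(score_tiles, value_tiles):
        new_max = max(row_max, max(scores))
        if new_max == -math.inf:
            continue                          # nothing valid yet, nothing to accumulate
        # Rescale earlier contributions so they are expressed against new_max.
        scale = 0.0 if row_max == -math.inf else math.exp(row_max - new_max)
        probs = [math.exp(s - new_max) for s in scores]
        row_sum = row_sum * scale + sum(probs)
        for d in range(len(acc)):
            pv = sum(p * v[d] for p, v in zip(probs, values))
            acc[d] = acc[d] * scale + pv
        row_max = new_max

    if row_sum == 0.0:
        raise ValueError("row has no valid scores")
    return [a / row_sum for a in acc]         # single normalization at the end


def reference_row(scores, values):
    """Non-streaming softmax(scores) @ values, used only as a cross-check."""
    m = max(scores)
    probs = [math.exp(s - m) for s in scores]
    total = sum(probs)
    dim = len(values[0])
    return [sum(p * v[d] for p, v in zip(probs, values)) / total for d in range(dim)]
```

Feeding the same scores tile by tile to `online_softmax_row` and all at once to `reference_row` should agree up to floating-point rounding, which is the property that lets the score tiles be consumed one at a time.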
Decisions worth copying
- treat both causal masking and `S2` tail invalidation in score space before `rowmax`
- keep future fully-invalid tiles out of the loop with `active_tiles_n = Min(tiles_n, tile_m + 1)`
- publish vec-produced `e5m2` probability tiles into ND `l1p` for the delayed cube consumer
- keep separate `l0c_qk` / `l0c_pv` and `ub_score` / `ub_pv` families; do not collapse them into one scratch lineage
- compress row-state scratch into narrow `[1,64]` UB tensors so the larger full-sequence path still fits local memory
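The first two decisions above can be illustrated with a small host-side sketch. It assumes square tiles (query tile height equals key tile width, called `tile` here) and that query position `i` may attend to key positions `0..i`; under those assumptions every key tile with index greater than `tile_m` is fully in the future, which is what the `tile_m + 1` bound expresses. The helper names `mask_score_tile` and `masked_rowmax` are hypothetical and are not taken from the kernel.

```python
import math

NEG_INF = -math.inf


def active_tiles_n(tiles_n, tile_m):
    """Number of key tiles a query tile must visit under the causal contract."""
    return min(tiles_n, tile_m + 1)


def mask_score_tile(scores, tile_m, tile_n, tile, s2):
    """Invalidate causal-future and S2-tail entries of one score tile.

    scores: rows of raw scores for query tile `tile_m` against key tile `tile_n`
    tile:   assumed square tile edge length
    s2:     number of valid key positions (columns at or past s2 are tail padding)
    """
    masked = []
    for r, row in enumerate(scores):
        q_pos = tile_m * tile + r
        out_row = []
        for c, s in enumerate(row):
            k_pos = tile_n * tile + c
            valid = k_pos <= q_pos and k_pos < s2
            out_row.append(s if valid else NEG_INF)
        masked.append(out_row)
    return masked


def masked_rowmax(scores, tile_m, tile_n, tile, s2):
    """Row maximum taken only after invalid entries were pushed to -inf."""
    masked = mask_score_tile(scores, tile_m, tile_n, tile, s2)
    return [max(row) for row in masked]
```

Masking first matters because an invalid column can hold an arbitrarily large raw score. If `rowmax` saw it, every valid probability in that row would be scaled against the wrong maximum. Only the diagonal tile (`tile_n == tile_m`) and the tile containing the `S2` tail produce a mix of valid and invalid entries in this sketch.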
Prefer another kernel when
- the query side is still row-specialized (`L=1`) and `mha_ifa*` already matches
- stage 2 truly wants NZ-published probability tiles
- the contract is half-domain or non-fp8 rather than `e5m2` q/k/v
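When weighing the last point, it can help to see how coarse `e5m2` is. The sketch below assumes `e5m2` means the common 1-sign, 5-exponent, 2-mantissa layout, whose bit pattern matches the top byte of an IEEE binary16 value; the note itself does not spell the format out. The helper `decode_e5m2` is hypothetical and exists only for illustration.

```python
import struct


def decode_e5m2(byte):
    """Decode one e5m2 byte by treating it as the high byte of a binary16 value."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected a single byte")
    # Little-endian half float: low byte (zeroed mantissa bits) first, e5m2 byte second.
    return struct.unpack("<e", bytes([0, byte]))[0]


def representable_between(lo_byte, hi_byte):
    """List the values encoded by consecutive e5m2 bytes in [lo_byte, hi_byte]."""
    return [decode_e5m2(b) for b in range(lo_byte, hi_byte + 1)]
```

Under this assumption, `representable_between(0x38, 0x3C)` walks the encodings from 0.5 up to 1.0 in steps of 0.125, so a probability tile stored this way carries only two mantissa bits per value. A contract that needs half-precision inputs is a different kernel's job.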
Authoring statement: parts of this article were generated with AI assistance (AIGC) and are for reference only.