This document is relevant for: Inf2, Trn1, Trn2
nki.compiler.psum.mod_alloc

nki.compiler.psum.mod_alloc(*, base_bank, base_addr=0, base_partition=0, num_bank_tiles=(), num_par_tiles=(), num_free_tiles=())
Allocate PSUM memory space for each logical block in a tensor through modulo allocation.
This is one of the NKI direct allocation APIs. We recommend reading the NKI Direct Allocation Developer Guide before using these APIs.
This API is equivalent to calling nki.compiler.psum.alloc() with the callable psum_modulo_alloc_func defined below.

```python
from typing import Optional, Tuple
from functools import reduce
from operator import mul

def num_elems(shape):
    return reduce(mul, shape, 1)

def linearize(shape, indices):
    return sum(i * num_elems(shape[dim+1:]) for dim, i in enumerate(indices))

def modulo_allocate_func(base, allocate_shape, scale):
    def func(indices):
        if not allocate_shape:
            # default shape is always (1, 1, ...)
            allocate_shape_ = (1, ) * len(indices)
        else:
            allocate_shape_ = allocate_shape
        mod_idx = tuple(i % s for i, s in zip(indices, allocate_shape_))
        return linearize(shape=allocate_shape_, indices=mod_idx) * scale + base
    return func

def mod_alloc(base_addr: int, *,
              base_bank: Optional[int] = 0,
              num_bank_tiles: Optional[Tuple[int]] = (),
              base_partition: Optional[int] = 0,
              num_par_tiles: Optional[Tuple[int]] = (),
              num_free_tiles: Optional[Tuple[int]] = ()):
    def psum_modulo_alloc_func(idx, pdim_size, fdim_size):
        # partial bank allocation is not allowed
        return (modulo_allocate_func(base_bank, num_bank_tiles, 1)(idx),
                modulo_allocate_func(base_partition, num_par_tiles, pdim_size)(idx),
                modulo_allocate_func(base_addr, num_free_tiles, fdim_size)(idx))
    return psum_modulo_alloc_func
```
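The reference functions above also accept multi-dimensional block indices. As an illustration, the sketch below restates only the bank-ID part of psum_modulo_alloc_func for a two-dimensional block grid. The 4 x 4 grid and num_bank_tiles=(2, 2) are assumed values chosen for the example, not taken from this page, and bank_for_block is a hypothetical helper rather than part of the NKI API.

```python
from functools import reduce
from operator import mul


def num_elems(shape):
    # Product of all extents in `shape` (1 for an empty shape).
    return reduce(mul, shape, 1)


def linearize(shape, indices):
    # Row-major linear offset of `indices` within `shape`.
    return sum(i * num_elems(shape[dim + 1:]) for dim, i in enumerate(indices))


def bank_for_block(idx, base_bank, num_bank_tiles):
    # Hypothetical helper mirroring modulo_allocate_func(base_bank, num_bank_tiles, 1)(idx):
    # wrap each block index by its tile count, then linearize.
    shape = num_bank_tiles or (1,) * len(idx)
    mod_idx = tuple(i % s for i, s in zip(idx, shape))
    return linearize(shape, mod_idx) + base_bank


# Assumed example: 4 x 4 logical blocks sharing 2 x 2 = 4 physical banks.
banks = {(i, j): bank_for_block((i, j), base_bank=0, num_bank_tiles=(2, 2))
         for i in range(4) for j in range(4)}
for i in range(4):
    print([banks[(i, j)] for j in range(4)])
```

Each block index is wrapped per dimension before being linearized, so the 16 logical blocks cycle through banks 0 to 3 in a repeating 2 x 2 pattern.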
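Here’s an example usage of this API:

```python
import neuronxcc.nki.language as nl
import neuronxcc.nki.isa as nisa
import neuronxcc.nki.compiler as ncc

# Four logical (128 x 512) fp32 blocks backed by two physical PSUM banks.
psum_tensor = nl.ndarray((4, nl.par_dim(128), 512), dtype=nl.float32,
                         buffer=ncc.psum.mod_alloc(base_bank=0, base_addr=0,
                                                   num_bank_tiles=(2,)))

for i_block in nl.affine_range(4):
    psum_tensor[i_block, :, :] = nisa.nc_matmul(...)
    ... = nl.exp(psum_tensor[i_block, :, :])
```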
This produces the following allocation:

| Logical Tile Index | Physical Tile bank_id | Physical Tile start_partition | Physical Tile byte_addr |
|---|---|---|---|
| (0, ) | 0 | 0 | 0 |
| (1, ) | 1 | 0 | 0 |
| (2, ) | 0 | 0 | 0 |
| (3, ) | 1 | 0 | 0 |
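For a one-dimensional block index, the modulo scheme reduces to bank_id = base_bank + i_block % num_bank_tiles[0], while start_partition and byte_addr stay at 0 because num_par_tiles and num_free_tiles default to empty tuples and base_addr is 0. The short check below is a restatement of that 1-D special case only; one_d_tile is a hypothetical helper, not part of the NKI API. It regenerates the rows of the table above and confirms that consecutive logical tiles never share a bank.

```python
def one_d_tile(i_block, base_bank=0, num_bank_tiles=2):
    # Hypothetical 1-D special case of psum_modulo_alloc_func: only the bank wraps around.
    bank_id = base_bank + i_block % num_bank_tiles
    start_partition = 0  # num_par_tiles=() -> partition tile shape (1,)
    byte_addr = 0        # num_free_tiles=() -> free tile shape (1,), base_addr=0
    return bank_id, start_partition, byte_addr


rows = [((i,), *one_d_tile(i)) for i in range(4)]
for row in rows:
    print(row)

# Neighboring logical tiles land in different banks, so the matmul writing
# block i can overlap with the exp reading block i - 1.
assert all(rows[i][1] != rows[i - 1][1] for i in range(1, len(rows)))
```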
With the above scheme, we can implement double buffering in psum_tensor: nisa.nc_matmul in one iteration writes to one physical tile while nl.exp of the previous iteration reads from the other physical tile simultaneously.

Note

In the current release, programmers cannot mix NKI tensor declarations that use automatic allocation (ncc.psum.auto_alloc() or the SBUF variant) with the direct allocation APIs (ncc.psum.alloc(), ncc.psum.mod_alloc(), or the SBUF variants).

Parameters:
- base_addr – the base address in bytes along the free (F) dimension of the PSUM bank. Must be 0 in the current version.
- base_bank – the bank ID that the physical tiles start from.
- num_bank_tiles – the number of PSUM banks allocated for the tensor.
- base_partition – the partition ID that the physical tiles start from. Must be 0 in the current version.
- num_par_tiles – the number of physical tiles along the partition dimension allocated for the tensor. The tuple must be empty or have the same length as the tensor's block dimensions. Currently it must be an empty tuple or (1, 1, …).
- num_free_tiles – the number of physical tiles along the free dimension, per PSUM bank, allocated for the tensor. The tuple must be empty or have the same length as the tensor's block dimensions. Currently it must be an empty tuple or (1, 1, …).
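The current-release restrictions listed above can be summarized as a pre-flight check. The sketch below is a hypothetical helper (check_mod_alloc_args is not part of NKI, and the real compiler may report these errors differently); it encodes only the constraints stated in this parameter list, and block_ndim is an assumed argument standing for the number of block dimensions of the tensor.

```python
from typing import Tuple


def check_mod_alloc_args(block_ndim: int, *, base_addr: int = 0,
                         base_partition: int = 0,
                         num_par_tiles: Tuple[int, ...] = (),
                         num_free_tiles: Tuple[int, ...] = ()) -> None:
    """Hypothetical helper: raise ValueError if the documented
    current-release restrictions on mod_alloc arguments are violated."""
    if base_addr != 0:
        raise ValueError("base_addr must be 0 in the current version")
    if base_partition != 0:
        raise ValueError("base_partition must be 0 in the current version")
    for name, tiles in (("num_par_tiles", num_par_tiles),
                        ("num_free_tiles", num_free_tiles)):
        if not tiles:
            continue  # an empty tuple is always accepted
        if len(tiles) != block_ndim:
            raise ValueError(f"{name} must be empty or have {block_ndim} entries")
        if any(t != 1 for t in tiles):
            raise ValueError(f"{name} must currently contain only 1s")


# The example tensor above has one block dimension and default tile tuples.
check_mod_alloc_args(1)
```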