This document is relevant for: Inf2, Trn1, Trn2
nki.compiler.sbuf.alloc
nki.compiler.sbuf.alloc(func)
Allocate SBUF memory space for each logical block in a tensor using a customized allocation method.
This is one of the NKI direct allocation APIs. We recommend reading the NKI Direct Allocation Developer Guide before using these APIs.
In NKI, an SBUF tensor (declared using NKI tensor creation APIs) can have three kinds of dimensions, in order: logical block (B), partition (P), and free (F). The partition and free dimensions map directly to the SBUF dimensions. Both B and F can be multi-dimensional, while P must be one-dimensional per Neuron ISA constraints. The block dimension describes how many (P, F) logical tiles the tensor has, but does not reflect the number of physical tiles allocated.
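To make the B/P/F split concrete, the following pure-Python sketch splits a shape tuple into its block, partition, and free parts. It does not use the NKI API; `split_dims` and its `p_axis` argument (the position of the partition dimension within the shape) are hypothetical names introduced for illustration only.

```python
from math import prod
from typing import Tuple


def split_dims(shape: Tuple[int, ...], p_axis: int):
    """Split a tensor shape into (block dims, partition size, free dims).

    p_axis is the index of the partition dimension within shape.
    It is a hypothetical argument used only in this sketch.
    """
    block = shape[:p_axis]       # B: dimensions before the partition dimension
    partition = shape[p_axis]    # P: exactly one dimension
    free = shape[p_axis + 1:]    # F: dimensions after the partition dimension
    return block, partition, free


block, partition, free = split_dims((4, 8, 128, 4, 32), p_axis=2)
num_logical_tiles = prod(block)  # number of (P, F) logical tiles
print(block, partition, free, num_logical_tiles)
```

For a shape of (4, 8, 128, 4, 32) with the partition dimension at index 2, the block dimensions are (4, 8), so the tensor has 32 logical tiles. How many physical tiles back them is decided separately by the allocation.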
ncc.sbuf.alloc should be assigned to the buffer field of an NKI tensor declaration API. For example:
nki_tensor = nl.ndarray((4, 8, nl.par_dim(128), 4, 32), dtype=nl.bfloat16, buffer=ncc.sbuf.alloc(...))
ncc.sbuf.alloc allows programmers to specify the physical location of each logical tile in the tensor. The API accepts a single input parameter, func, which is a callable object that takes in:
- a tuple of integers idx representing a logical block index,
- an integer pdim_size for the number of partitions the logical tile has, and
- an integer fdim_size for the number of bytes the logical tile has per partition.
The number of integers in idx must match the number of B dimensions the SBUF tensor has. For example, for the above nki_tensor, we expect the idx tuple to have two integers for a 2D block index. pdim_size should match the partition dimension size of the NKI tensor exactly. fdim_size should be the product of the F dimension sizes of each logical tile in the tensor, multiplied by the data type size in bytes. For the above nki_tensor, pdim_size should be 128, and fdim_size should be 4*32*sizeof(nl.bfloat16) = 256 bytes.
The func callable must return a tuple of two integers (start_partition, byte_addr) indicating the physical tile location for the input logical block index. start_partition indicates the lowest partition the physical tile allocation starts from and must follow these ISA rules:
- If 64 < pdim_size <= 128, start_partition must be 0
- If 32 < pdim_size <= 64, start_partition must be 0 or 64
- If 0 < pdim_size <= 32, start_partition must be one of 0/32/64/96
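The rules above can be sketched as a small allocation callable plus a check on its result. This is an illustrative sketch in plain Python, not the compiler's own validation logic: `allowed_start_partitions`, `contiguous_alloc`, and the `BLOCK_SHAPE` constant are hypothetical names, and the layout (one physical tile per logical block, packed back to back within each partition) is only one possible choice.

```python
from typing import Tuple

BLOCK_SHAPE = (4, 8)  # B dimensions of the example tensor (assumed for this sketch)


def allowed_start_partitions(pdim_size: int) -> Tuple[int, ...]:
    """Return the start partitions permitted for a tile with pdim_size partitions."""
    if 64 < pdim_size <= 128:
        return (0,)
    if 32 < pdim_size <= 64:
        return (0, 64)
    if 0 < pdim_size <= 32:
        return (0, 32, 64, 96)
    raise ValueError(f"pdim_size must be in (0, 128], got {pdim_size}")


def contiguous_alloc(idx: Tuple[int, ...], pdim_size: int, fdim_size: int) -> Tuple[int, int]:
    """Hypothetical allocation callable: one physical tile per logical block."""
    # Flatten the multi-dimensional block index in row-major order.
    linear = 0
    for i, extent in zip(idx, BLOCK_SHAPE):
        linear = linear * extent + i
    start_partition = 0             # permitted for every pdim_size up to 128
    byte_addr = linear * fdim_size  # tiles packed back to back in each partition
    return start_partition, byte_addr


# Arguments expected for a (4, 8, par_dim(128), 4, 32) bfloat16 tensor.
pdim_size = 128
fdim_size = 4 * 32 * 2              # F elements times 2 bytes per bfloat16
start_partition, byte_addr = contiguous_alloc((1, 2), pdim_size, fdim_size)
assert start_partition in allowed_start_partitions(pdim_size)
print(start_partition, byte_addr)
```

In a kernel, a callable of this shape would be passed as `buffer=ncc.sbuf.alloc(contiguous_alloc)`. The sketch checks only the start_partition rules; it does not validate the range or alignment of byte_addr.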
The byte_addr indicates the byte offset into each partition at which the physical tile starts. On NeuronCore-v2, a valid byte_addr can be any integer value from 0 (inclusive) to 192KiB-16KiB=(192-16)*1024 (exclusive). 192KiB is the physical size of an SBUF partition (defined in the architecture guide) and 16KiB is reserved for compiler internal usage. In addition, byte_addr must be aligned to nki.language.constants.sbuf_min_align.
Note
In the current release, programmers cannot mix NKI tensor declarations using automatic allocation (ncc.sbuf.auto_alloc() or the PSUM variant) and direct allocation APIs (ncc.sbuf.alloc(), ncc.sbuf.mod_alloc() or the PSUM variants) in the same kernel.
Parameters:
func – a callable object that specifies how to place each logical block in SBUF memory.
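The byte_addr constraints described above can be checked with a few lines of plain Python. This is a sketch under stated assumptions: `is_valid_byte_addr` is a hypothetical helper, the usable range is the NeuronCore-v2 figure quoted above, and `min_align` stands in for `nki.language.constants.sbuf_min_align`, whose actual value should be read from the NKI package rather than taken from this example.

```python
PARTITION_BYTES = 192 * 1024   # physical size of one SBUF partition (NeuronCore-v2)
RESERVED_BYTES = 16 * 1024     # reserved for compiler-internal use
USABLE_BYTES = PARTITION_BYTES - RESERVED_BYTES  # exclusive upper bound for byte_addr


def is_valid_byte_addr(byte_addr: int, min_align: int) -> bool:
    """Check that byte_addr lies in [0, USABLE_BYTES) and is a multiple of min_align."""
    in_range = 0 <= byte_addr < USABLE_BYTES
    aligned = byte_addr % min_align == 0
    return in_range and aligned


# min_align=32 is a placeholder for illustration, not the documented constant.
print(is_valid_byte_addr(2560, min_align=32))
print(is_valid_byte_addr(USABLE_BYTES, min_align=32))
```

The first call passes both checks; the second fails because the upper bound is exclusive.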