nki.compiler.sbuf.alloc
- nki.compiler.sbuf.alloc(func)
Allocate SBUF memory space for each logical block in a tensor using a customized allocation method.
This is one of the NKI direct allocation APIs. We recommend reading the NKI Direct Allocation Developer Guide before using these APIs.
In NKI, a SBUF tensor (declared using NKI tensor creation APIs) can have three kinds of dimensions, in order: logical block (B), partition (P), and free (F). The partition and free dimensions directly map to the SBUF dimensions. Both B and F can be multi-dimensional, while P must be one-dimensional per Neuron ISA constraints. The block dimension describes how many (P, F) logical tiles this tensor has, but does not reflect the number of physical tiles being allocated.
ncc.sbuf.alloc should be assigned to the buffer field of a NKI tensor declaration API. For example:
nki_tensor = nl.ndarray((4, 8, nl.par_dim(128), 4, 32), dtype=nl.bfloat16, buffer=ncc.sbuf.alloc(...))
ncc.sbuf.alloc allows programmers to specify the physical location of each logical tile in the tensor. The API accepts a single input func parameter, which is a callable object that takes in:
- a tuple of integers idx representing a logical block index,
- an integer pdim_size for the number of partitions the logical tile has, and
- an integer fdim_size for the number of bytes the logical tile has per partition.
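The callable described above can be sketched in plain Python. This is only an illustration, not NKI library code: the name simple_2d_alloc, the BLOCK_SHAPE constant (chosen to match the (4, 8) block dimensions of the example tensor), and the row-major back-to-back packing scheme are all assumptions. The function returns the (start_partition, byte_addr) pair that func is required to produce, and it does not check address range or alignment.

```python
from typing import Tuple

# Hypothetical block grid, matching the (4, 8) block dimensions of the example tensor.
BLOCK_SHAPE = (4, 8)


def simple_2d_alloc(idx: Tuple[int, int], pdim_size: int, fdim_size: int) -> Tuple[int, int]:
    """Give every logical tile its own byte range within the partitions (illustrative only)."""
    block_row, block_col = idx
    # Flatten the 2D block index in row-major order.
    linear_index = block_row * BLOCK_SHAPE[1] + block_col
    # Always start at partition 0 in this sketch; pdim_size is not consulted here.
    start_partition = 0
    # Pack tiles back to back: each tile occupies fdim_size bytes per partition.
    byte_addr = linear_index * fdim_size
    return start_partition, byte_addr
```

With this scheme every logical block gets a distinct physical tile. A callable that maps several block indices to the same address would instead make those logical tiles share one physical tile, which is why the block dimension does not by itself determine how much memory is allocated.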
The number of integers in idx must match the number of B dimensions the SBUF tensor has. For example, for the above nki_tensor, we expect the idx tuple to have two integers for a 2D block index. pdim_size should match the partition dimension size of the NKI tensor exactly. fdim_size should be the product of the F dimension sizes of each logical tile in the tensor, multiplied by the data type size in bytes. For the above nki_tensor, pdim_size should be 128, and fdim_size should be 4*32*sizeof(nl.bfloat16) = 256 bytes.
The func callable must return a tuple of two integers (start_partition, byte_addr) indicating the physical tile location for the input logical block index. start_partition indicates the lowest partition the physical tile allocation starts from and must follow these ISA rules:
- If 64 < pdim_size <= 128, start_partition must be 0
- If 32 < pdim_size <= 64, start_partition must be 0 or 64
- If 0 < pdim_size <= 32, start_partition must be one of 0/32/64/96
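The fdim_size arithmetic and the start_partition rules above can be expressed as two small helpers. This is a sketch in plain Python under stated assumptions: the helper names expected_fdim_size and allowed_start_partitions are hypothetical and are not part of NKI, and the data type size is passed in explicitly as a byte count (2 for bfloat16).

```python
from typing import Sequence, Tuple


def expected_fdim_size(free_shape: Sequence[int], dtype_size_bytes: int) -> int:
    """Bytes per partition of one logical tile: product of F dims times element size."""
    size = dtype_size_bytes
    for dim in free_shape:
        size *= dim
    return size


def allowed_start_partitions(pdim_size: int) -> Tuple[int, ...]:
    """Return the start_partition values permitted by the ISA rules listed above."""
    if not 0 < pdim_size <= 128:
        raise ValueError(f"pdim_size must be in (0, 128], got {pdim_size}")
    if pdim_size > 64:
        return (0,)
    if pdim_size > 32:
        return (0, 64)
    return (0, 32, 64, 96)
```

For the example tensor, expected_fdim_size((4, 32), 2) gives 256 and allowed_start_partitions(128) gives (0,), so the only legal start partition is 0.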
The byte_addr indicates the byte offset into each partition the physical tile starts from. On NeuronCore-v2, a valid byte_addr can be any integer value from 0 (inclusive) to 192KiB-16KiB=(192-16)*1024 (exclusive). 192KiB is the physical size of a SBUF partition (defined in the architecture guide) and 16KiB is reserved for compiler internal usage. In addition, the byte_addr must be aligned to nki.language.constants.sbuf_min_align.
Note
In the current release, programmers cannot mix NKI tensor declarations using automatic allocation (ncc.sbuf.auto_alloc() or the PSUM variant) and direct allocation APIs (ncc.sbuf.alloc(), ncc.sbuf.mod_alloc() or the PSUM variants) in the same kernel.
- Parameters:
func – a callable object to specify how to place the logical block in SBUF memory.
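As a rough sanity check, the NeuronCore-v2 byte_addr constraints described above can be written as a small predicate. This is a sketch in plain Python, not NKI code: the constant names and is_valid_byte_addr are hypothetical, and the alignment is taken as a caller-supplied min_align argument because the value of nki.language.constants.sbuf_min_align is not stated here. It checks only the starting address, as the text specifies.

```python
# Sizes stated for NeuronCore-v2 in the text above.
SBUF_PARTITION_BYTES = 192 * 1024     # physical size of one SBUF partition
COMPILER_RESERVED_BYTES = 16 * 1024   # reserved for compiler internal usage
USABLE_BYTES = SBUF_PARTITION_BYTES - COMPILER_RESERVED_BYTES  # (192-16)*1024


def is_valid_byte_addr(byte_addr: int, min_align: int) -> bool:
    """Check the range [0, USABLE_BYTES) and alignment to min_align (caller-supplied)."""
    in_range = 0 <= byte_addr < USABLE_BYTES
    aligned = byte_addr % min_align == 0
    return in_range and aligned
```

A callable passed as func could run its return value through a check like this during development, substituting the real alignment constant for min_align.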