This document is relevant for: Inf2, Trn1, Trn2

nki.compiler.sbuf.alloc

nki.compiler.sbuf.alloc(func)

Allocate SBUF memory space for each logical block in a tensor using a customized allocation method.

This is one of the NKI direct allocation APIs. We recommend reading the NKI Direct Allocation Developer Guide before using these APIs.

In NKI, an SBUF tensor (declared using NKI tensor creation APIs) can have three kinds of dimensions, in order: logical block (B), partition (P), and free (F). The partition and free dimensions map directly to the SBUF dimensions. Both B and F can be multi-dimensional, while P must be one-dimensional per Neuron ISA constraints. The block dimension describes how many (P, F) logical tiles the tensor has, but does not reflect the number of physical tiles being allocated.

ncc.sbuf.alloc (ncc being shorthand for nki.compiler) should be assigned to the buffer field of an NKI tensor declaration API. For example,

nki_tensor = nl.ndarray((4, 8, nl.par_dim(128), 4, 32), dtype=nl.bfloat16, buffer=ncc.sbuf.alloc(...))

ncc.sbuf.alloc allows programmers to specify the physical location of each logical tile in the tensor. The API accepts a single input func parameter, which is a callable object that takes in:

  1. a tuple of integers idx representing a logical block index,

  2. an integer pdim_size for the number of partitions the logical tile has, and

  3. an integer fdim_size for the number of bytes the logical tile has per partition.

The number of integers in idx must match the number of B dimensions the SBUF tensor has. For example, for the above nki_tensor, we expect the idx tuple to have two integers for a 2D block index.

pdim_size should match the partition dimension size of the NKI tensor exactly. fdim_size should be the product of the F dimension sizes of each logical tile in the tensor, multiplied by the data type size in bytes. For the above nki_tensor, pdim_size should be 128, and fdim_size should be 4*32*sizeof(nl.bfloat16) = 256 bytes.
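The arithmetic above can be reproduced in plain Python. This is only a sketch of the size bookkeeping: the helper name `tile_sizes` and its parameters are hypothetical and not part of NKI, and the 2-byte element size is the standard width of bfloat16.

```python
from math import prod

def tile_sizes(shape, partition_axis, bytes_per_element):
    """Split a (B..., P, F...) shape into the values the callable receives.

    Hypothetical helper for illustration only; NKI computes these itself.
    """
    block_shape = shape[:partition_axis]          # B dimensions
    pdim_size = shape[partition_axis]             # P dimension
    free_shape = shape[partition_axis + 1:]       # F dimensions
    fdim_size = prod(free_shape) * bytes_per_element  # bytes per partition
    return block_shape, pdim_size, fdim_size

# Shape of the example nki_tensor; the partition dimension sits at axis 2.
block_shape, pdim_size, fdim_size = tile_sizes(
    (4, 8, 128, 4, 32), partition_axis=2, bytes_per_element=2
)

print(len(block_shape))   # number of integers expected in idx: 2
print(prod(block_shape))  # number of logical tiles: 32
print(pdim_size)          # 128
print(fdim_size)          # 256
```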

The func callable must return a tuple of two integers (start_partition, byte_addr) indicating the physical tile location for the input logical block index. start_partition indicates the lowest partition the physical tile allocation starts from and must follow these ISA rules:

  • If 64 < pdim_size <= 128, start_partition must be 0

  • If 32 < pdim_size <= 64, start_partition must be 0 or 64

  • If 0  < pdim_size <= 32, start_partition must be one of 0/32/64/96
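The three rules above can be written as a small lookup. The function name `allowed_start_partitions` is hypothetical and serves only to restate the rules; it is not an NKI API.

```python
def allowed_start_partitions(pdim_size):
    """Return the start_partition values permitted for a given pdim_size.

    Hypothetical helper that restates the ISA rules listed above.
    """
    if not 0 < pdim_size <= 128:
        raise ValueError(f"pdim_size must be in (0, 128], got {pdim_size}")
    if pdim_size > 64:
        return (0,)
    if pdim_size > 32:
        return (0, 64)
    return (0, 32, 64, 96)

print(allowed_start_partitions(128))  # (0,)
print(allowed_start_partitions(64))   # (0, 64)
print(allowed_start_partitions(32))   # (0, 32, 64, 96)
```

A custom allocation callable could use such a check to reject an illegal start_partition before returning it.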

The byte_addr indicates the byte offset into each partition the physical tile starts from. On NeuronCore-v2, a valid byte_addr can be any integer value from 0 (inclusive) to 192KiB-16KiB=(192-16)*1024 (exclusive). 192KiB is the physical size of an SBUF partition (defined in the architecture guide) and 16KiB is reserved for compiler internal usage. In addition, the byte_addr must be aligned to nki.language.constants.sbuf_min_align.
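As a rough illustration of what a func callable might look like, the sketch below gives every logical block its own physical tile, laid out back to back starting at partition 0. The factory name `make_linear_alloc`, the `base_addr` and `min_align` parameters, and the check that the whole tile ends inside the usable range are assumptions of this sketch, not part of the API. The alignment value passed in the example is a placeholder; the real value should be read from nki.language.constants.sbuf_min_align.

```python
# Usable bytes per partition on NeuronCore-v2, per the text above.
USABLE_BYTES_PER_PARTITION = (192 - 16) * 1024

def make_linear_alloc(block_shape, min_align, base_addr=0):
    """Build a callable with the (idx, pdim_size, fdim_size) signature.

    Hypothetical factory: each logical block gets a distinct byte range.
    """
    def alloc(idx, pdim_size, fdim_size):
        if len(idx) != len(block_shape):
            raise ValueError("idx must have one integer per block dimension")
        # Row-major flattening of the block index.
        linear = 0
        for i, n in zip(idx, block_shape):
            linear = linear * n + i
        byte_addr = base_addr + linear * fdim_size
        if byte_addr % min_align != 0:
            raise ValueError(f"byte_addr {byte_addr} is not aligned to {min_align}")
        # Extra sanity check of this sketch: the tile must end in the usable range.
        if byte_addr + fdim_size > USABLE_BYTES_PER_PARTITION:
            raise ValueError("tile does not fit in the usable SBUF range")
        # start_partition 0 is legal for every pdim_size in (0, 128].
        return (0, byte_addr)
    return alloc

# Placeholder alignment of 32 bytes, used only to exercise the sketch.
alloc = make_linear_alloc(block_shape=(4, 8), min_align=32)
print(alloc((0, 0), 128, 256))  # (0, 0)
print(alloc((1, 2), 128, 256))  # (0, 2560)
print(alloc((3, 7), 128, 256))  # (0, 7936)
```

For the example nki_tensor, such a callable would presumably be passed as buffer=ncc.sbuf.alloc(alloc). Because this layout never reuses an address, the 32 logical blocks occupy 32 * 256 = 8192 bytes per partition; a callable that maps several block indices to the same address would trade that footprint for reuse.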

Note

In the current release, programmers cannot mix NKI tensor declarations using automatic allocation (ncc.sbuf.auto_alloc() or the PSUM variant) and direct allocation APIs (ncc.sbuf.alloc(), ncc.sbuf.mod_alloc() or the PSUM variants) in the same kernel.

Parameters:

func – a callable object to specify how to place the logical block in SBUF memory.
