This document is relevant for: Inf2, Trn1, Trn1n

nki.baremetal

nki.baremetal(kernel=None, **kwargs)

Compile and run an NKI kernel on a NeuronDevice without involving an ML framework such as PyTorch or JAX. If you decorate your NKI kernel function with @nki.baremetal(...), you can call it directly like any other Python function. You must run this API on a Trn/Inf instance with NeuronDevices (v2 or later) attached.

Note

A function decorated with nki.baremetal expects numpy.ndarray objects as input and output tensors, not ML framework tensor objects.

This decorator compiles the NKI kernel into an executable for NeuronDevices (NEFF) and also collects an execution trace (NTFF) by running the NEFF on the local NeuronDevice. See Profiling NKI kernels with Neuron Profile for details on how to visualize the execution trace for profiling.

Since nki.baremetal runs the compiled NEFF without invoking any ML framework, it is the fastest way to compile and run an NKI kernel standalone on a NeuronDevice. This makes the decorator useful for quickly iterating on an early implementation of an NKI kernel until it is functionally correct, before porting it to an ML framework and injecting it into the full ML model. To iterate quickly on NKI kernel performance, NKI also provides the nki.benchmark decorator, which uses the same underlying mechanism as nki.baremetal but additionally collects latency statistics at different percentiles.
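The correctness loop described above can be sketched as a small NumPy harness. This is only an illustration under stated assumptions: check_kernel is a hypothetical helper, not part of NKI, and it assumes the kernel follows the calling convention of the listing on this page (inputs first, output array last, result written in place). A plain NumPy function stands in for the decorated kernel so that the sketch runs without a NeuronDevice.

```python
import numpy as np

def check_kernel(kernel_fn, reference_fn, inputs, out_shape,
                 dtype=np.float32, rtol=1e-5, atol=1e-6):
    """Hypothetical helper: run kernel_fn(*inputs, out) and compare `out`
    against the NumPy reference reference_fn(*inputs)."""
    out = np.zeros(out_shape, dtype=dtype)   # output buffer the kernel writes into
    kernel_fn(*inputs, out)                  # assumed convention: output is the last argument
    expected = reference_fn(*inputs)
    return bool(np.allclose(out, expected, rtol=rtol, atol=atol))

# Stand-in for a function decorated with nki.baremetal (assumption: on a
# Trn/Inf instance you would pass the decorated kernel here instead).
def fake_add_kernel(a, b, c):
    np.add(a, b, out=c)

a = np.ones((128, 1024), dtype=np.float32)
b = np.random.random_sample((128, 1024)).astype(np.float32)

ok = check_kernel(fake_add_kernel, lambda x, y: x + y, (a, b), a.shape)
print("kernel matches reference:", ok)
```

Keeping the reference in NumPy fits this workflow because the decorated function already takes numpy.ndarray inputs and outputs, so no framework tensors are needed at this stage.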

Parameters:
  • save_neff_name – A file path to save your NEFF file. By default, this is unspecified, and the NEFF file will be deleted automatically after execution.

  • save_trace_name – A file path to save your NTFF file. By default, this is unspecified, and the NTFF file will be deleted automatically after execution. Known issue: if save_trace_name is specified, save_neff_name must be set to “file.neff”.

  • additional_compile_opt – Additional Neuron compiler flags to pass in when compiling the kernel.

  • artifacts_dir – A directory path to save Neuron compiler artifacts. The directory must be empty before the kernel runs; a non-empty directory leads to a compilation error.
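Two constraints in the parameter list are easy to trip over: save_trace_name currently requires save_neff_name to be "file.neff", and artifacts_dir must be empty. A minimal sketch of one way to honor both is shown here. make_baremetal_kwargs and its keep_artifacts flag are hypothetical names invented for this illustration; only the keyword names in the returned dictionary come from the parameter list above.

```python
import os
import tempfile

def make_baremetal_kwargs(save_trace_name=None, save_neff_name=None,
                          keep_artifacts=False):
    """Hypothetical helper: build keyword arguments for nki.baremetal."""
    kwargs = {}
    if save_trace_name is not None:
        # Known issue from the parameter list: saving a trace requires
        # save_neff_name to be exactly "file.neff".
        if save_neff_name not in (None, "file.neff"):
            raise ValueError(
                'save_neff_name must be "file.neff" when save_trace_name is set')
        kwargs["save_trace_name"] = save_trace_name
        kwargs["save_neff_name"] = "file.neff"
    elif save_neff_name is not None:
        kwargs["save_neff_name"] = save_neff_name
    if keep_artifacts:
        # mkdtemp always creates a new directory, so it is guaranteed empty.
        kwargs["artifacts_dir"] = tempfile.mkdtemp(prefix="nki_artifacts_")
    return kwargs

kwargs = make_baremetal_kwargs(save_trace_name="profile.ntff", keep_artifacts=True)
print(kwargs)
# On a Trn/Inf instance the result could then be applied as
# baremetal(**kwargs)(my_kernel), where my_kernel is your own kernel function.
```

Creating a fresh temporary directory for each run avoids the compilation error caused by leftovers from a previous run, at the cost of having to clean up old directories yourself.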

Returns:

None

Listing 14 An Example
from neuronxcc.nki import baremetal
import neuronxcc.nki.language as nl
import numpy as np

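# Save both the NEFF and the NTFF trace; the NEFF must be named 'file.neff'
# whenever save_trace_name is set (see the known issue above).
@baremetal(save_neff_name='file.neff', save_trace_name='profile.ntff')
def nki_tensor_tensor_add(a_tensor, b_tensor, c_tensor):
  # Load both inputs from device memory
  a = nl.load(a_tensor)
  b = nl.load(b_tensor)

  # Element-wise addition
  c = a + b

  # Write the result into the output tensor
  nl.store(c_tensor, c)

# Inputs and output are plain numpy.ndarray objects
a = np.zeros([128, 1024], dtype=np.float32)
b = np.random.random_sample([128, 1024]).astype(np.float32)
c = np.empty([128, 1024], dtype=np.float32)  # uninitialized output buffer
nki_tensor_tensor_add(a, b, c)

# a is all zeros, so the sum must equal b
assert np.allclose(c, b)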
