This document is relevant for: Inf2, Trn1, Trn2

nki.profile

nki.profile(func=None, **kwargs)

Profile a NKI kernel on a NeuronDevice by using nki.profile as a decorator.

Note

Similar to nki.baremetal, the function decorated with nki.profile expects numpy.ndarray objects as input and output tensors, not ML framework tensor objects.

Parameters:
  • working_directory – Path to the working directory where profile artifacts are saved. This must be specified and must be an absolute path.

  • save_neff_name – Name of the saved NEFF file (file.neff by default).

  • save_trace_name – Name of the saved trace (profile) file (profile.ntff by default).

  • additional_compile_opt – Additional Neuron compiler flags to pass in when compiling the kernel.

  • overwrite – Overwrite existing profile artifacts if set to True. Default is False.

  • profile_nth – Index of the kernel execution to profile: the profile_nth execution is the one profiled. Default is 1.
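The constraints on working_directory and overwrite can be checked before the kernel is compiled. The sketch below is a hypothetical pre-flight helper, not part of the nki API: the name check_profile_args and its error types are assumptions made for illustration, and this page does not describe how nki.profile itself reacts to a relative path or to artifacts that already exist.

```python
import os


def check_profile_args(working_directory, save_neff_name="file.neff",
                       save_trace_name="profile.ntff", overwrite=False):
    """Hypothetical pre-flight check; not part of the nki API.

    Returns the artifact names that already exist in working_directory.
    """
    # The parameter list requires an absolute path.
    if not os.path.isabs(working_directory):
        raise ValueError(
            f"working_directory must be an absolute path, got {working_directory!r}"
        )

    # Collect artifacts left behind by an earlier run, if any.
    existing = [
        name
        for name in (save_neff_name, save_trace_name)
        if os.path.exists(os.path.join(working_directory, name))
    ]

    # Assumption: treat existing artifacts as an error unless overwrite=True.
    if existing and not overwrite:
        raise FileExistsError(
            f"{existing} already present in {working_directory}; "
            "pass overwrite=True to replace them"
        )
    return existing
```

Calling check_profile_args with the same values that will be passed to the decorator surfaces a relative path or stale artifacts early, before any compile time is spent.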

Returns:

None

Listing 13 An Example
from neuronxcc import nki
import neuronxcc.nki.language as nl

@nki.profile(working_directory="/home/ubuntu/profiles", save_neff_name='file.neff', save_trace_name='profile.ntff')
def nki_tensor_tensor_add(a_tensor, b_tensor):
  c_tensor = nl.ndarray(a_tensor.shape, dtype=a_tensor.dtype, buffer=nl.shared_hbm)

  a = nl.load(a_tensor)
  b = nl.load(b_tensor)

  c = a + b

  nl.store(c_tensor, c)

  return c_tensor

nki.profile saves file.neff and profile.ntff, along with JSON files containing a profile summary, inside the working_directory.
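After a profiled run, it can be useful to confirm which artifacts actually landed in the working directory. The following is a minimal sketch using only the standard library. The helper name list_profile_artifacts is hypothetical, and because this page does not give the names of the JSON summary files, the sketch simply groups top-level files by extension (an assumption about where the files sit).

```python
from pathlib import Path


def list_profile_artifacts(working_directory):
    """Hypothetical helper: group top-level files in working_directory by type."""
    root = Path(working_directory)
    return {
        # Compiled kernel, e.g. file.neff
        "neff": sorted(p.name for p in root.glob("*.neff")),
        # Execution trace, e.g. profile.ntff
        "ntff": sorted(p.name for p in root.glob("*.ntff")),
        # Profile summaries; exact file names are not documented here
        "json": sorted(p.name for p in root.glob("*.json")),
    }
```

For the example above, list_profile_artifacts("/home/ubuntu/profiles") would be expected to report file.neff under "neff" and profile.ntff under "ntff" once the decorated kernel has been run.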

See Profiling NKI kernels with Neuron Profile for more information on how to visualize the execution trace for profiling purposes.

In addition, more information about neuron-profile can be found in its documentation.

Note

nki.profile does not use the actual inputs passed into the profiled function when running the NEFF file. For instance, in the example above, the contents of the output c_tensor are undefined and should not be used for numerical accuracy checks. The input tensors serve mainly to specify the shapes of the inputs.
