Source code for nki
from dataclasses import dataclass
from enum import Enum
from typing import *
[docs]def jit(fn=None, **kwargs):
r"""Just-in-time compile a top-level NKI function to run on NeuronDevices.
The returned callable detects the current framework and compiles the function as a
custom operator. It detects the current framework by inspecting its arguments:
- ``torch.Tensor``: uses PyTorch integration.
- ``jax.Array``: uses JAX integration.
- ``np.ndarray``: compiles and executes standalone kernel, without a framework.
You might need to explicitly set the target platform using the
``NEURON_PLATFORM_TARGET_OVERRIDE`` environment variable. Supported values:
- ``trn1|inf2|gen2``
- ``trn2|gen3``
- ``trn3|gen4``
The LNC (Logical NeuronCore) degree can be set at the callsite using bracket syntax:
``kernel[lnc](args)``. The default is LNC=1. The LNC value must match the
``NEURON_LOGICAL_NC_CONFIG`` environment variable set for the Neuron Runtime.
Mismatching the two will cause a runtime error. For example, if
``NEURON_LOGICAL_NC_CONFIG=1``, the kernel must be launched with ``kernel[1](...)``
or ``kernel(...)``.
Returns a :class:`Kernel` instance wrapping the decorated function."""
...
[docs]def simulate(kernel):
r"""Create a CPU-simulated version of an NKI kernel.
.. warning::
**This API is experimental and may change in future releases**. It has not been tested or confirmed to work on all hardware platforms and operating systems.
Currently, Neuron confirms support for ``nki.simulate`` on these 2 operating systems:
* Ubuntu 22.04
* Amazon Linux 2023
See :ref:`nki-simulator` for full documentation including target platform
selection, precise floating-point mode, debugging, and known limitations.
Example::
@nki.jit
def my_kernel(a, b): ...
# Explicit simulation
result = nki.simulate(my_kernel)(a_np, b_np)
# With LNC2
result = nki.simulate(my_kernel[2])(a_np, b_np)
Args:
kernel: NKI kernel function, typically decorated with ``@nki.jit``.
If a plain function is passed, it is automatically wrapped.
Returns:
A callable that, when invoked with NumPy arrays or torch Tensors,
executes the kernel on CPU and returns results in the same format
(NumPy arrays or torch Tensors respectively)."""
...