This document is relevant for: Inf2, Trn1, Trn2

nki.isa.reciprocal

nki.isa.reciprocal(data, *, dtype=None, mask=None, engine=0, **kwargs)

Compute reciprocal of each element in the input data tile using Scalar Engine or Vector Engine.

The target compute engine can be specified with the engine parameter. Vector Engine computes reciprocal at higher precision than Scalar Engine, but its throughput is about 8x lower than Scalar Engine for large input tiles. For input tiles with fewer than 64 elements per partition, we suggest using Vector Engine: it gives better precision, and the two engines perform comparably at that size because of instruction initiation intervals.

Estimated instruction cost:

Cost (Engine Cycles)    Condition
max(MIN_II, N)          engine set to nki.isa.scalar_engine
max(MIN_II, 8*N)        engine set to nki.isa.vector_engine

where,

  • N is the number of elements per partition in data.

  • MIN_II is the minimum instruction initiation interval for small input tiles. MIN_II is roughly 64 engine cycles.
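The cost estimates above can be worked through with a small plain-Python sketch. The helper name estimate_reciprocal_cycles and the string engine labels are hypothetical and not part of the NKI API, and MIN_II is fixed at 64 here even though the text only gives it as a rough figure.

```python
# Hypothetical helper mirroring the estimated cost table; not part of the NKI API.
MIN_II = 64  # assumed value: the text says MIN_II is roughly 64 engine cycles


def estimate_reciprocal_cycles(n_per_partition: int, engine: str) -> int:
    """Estimate engine cycles for reciprocal on a tile with N elements per partition."""
    if engine == "scalar":
        return max(MIN_II, n_per_partition)
    if engine == "vector":
        return max(MIN_II, 8 * n_per_partition)
    raise ValueError(f"unknown engine label: {engine!r}")


# Compare both engines for a few per-partition sizes.
for n in (8, 64, 512):
    scalar = estimate_reciprocal_cycles(n, "scalar")
    vector = estimate_reciprocal_cycles(n, "vector")
    print(f"N={n}: scalar={scalar} cycles, vector={vector} cycles")
```

Under these assumptions, a tile with 512 elements per partition is estimated at 512 cycles on Scalar Engine and 4096 cycles on Vector Engine, while at 8 elements per partition both estimates are bounded by MIN_II.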

Parameters:
  • data – the input tile

  • dtype – (optional) data type to cast the output to (see Supported Data Types for more information); if not specified, it defaults to the data type of the input tile.

  • mask – (optional) a compile-time constant predicate that controls whether/how this instruction is executed (see NKI API Masking for details)

  • engine – (optional) the engine to use for the operation: nki.isa.vector_engine, nki.isa.scalar_engine, or nki.isa.unknown_engine (default, compiler selects best engine based on the input tile shape).

Returns:

an output tile containing the reciprocal of each element of data

Example:

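import neuronxcc.nki as nki
import neuronxcc.nki.isa as nisa
import neuronxcc.nki.language as nl
...

# Load a 128x512 tile from in_tensor (128 partitions, 512 elements per partition).
x = nl.load(in_tensor[nl.mgrid[0:128, 0:512]])

# Compute the element-wise reciprocal; engine is left at its default,
# so the compiler selects the engine based on the input tile shape.
y = nisa.reciprocal(x)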
