This document is relevant for: Inf2, Trn1, Trn2, Trn3
nrt_async.h#
Neuron Runtime Asynchronous Execution API - Non-blocking operations for tensor I/O and model execution.
Source: src/libnrt/include/nrt/nrt_async.h
Note
The Neuron Runtime Async APIs are currently in early release and may change across Neuron versions.
Enumerations#
nrta_xu_t#
typedef enum {
NRTA_XU_TENSOR_OP = 0,
NRTA_XU_COMPUTE,
NRTA_XU_COLLECTIVES,
NRTA_XU_TYPE_NUM
} nrta_xu_t;
Execution unit types.
Source: nrt_async.h:18
Typedefs#
nrta_seq_t#
typedef uint64_t nrta_seq_t;
Monotonically increasing IDs of executions. The upper 16 bits encode an Execution Unit ID, while the lower 48 bits hold a strictly ordered Sequence Number.
Source: nrt_async.h:31
nrta_xu_id_t#
typedef uint16_t nrta_xu_id_t;
Execution unit ID type.
Source: nrt_async.h:32
Constants#
NRTA_SEQ_NUM_MAX#
#define NRTA_SEQ_NUM_MAX ((1ull << 48) - 1)
Maximum sequence number value.
Source: nrt_async.h:34
Functions#
nrta_tensor_write#
NRT_STATUS nrta_tensor_write(nrt_tensor_t *tensor, const void *buf, uint64_t offset,
uint64_t size, int queue, NRT_STATUS *ret,
nrta_seq_t *req_sequence);
Enqueues a tensor write request. Copies the data from a host buffer to a tensor allocated on a Neuron device.
Parameters:
tensor[in] - Destination tensor
buf[in] - Host buffer containing source data
offset[in] - Offset into the tensor
size[in] - Number of bytes to write
queue[in] - XU queue to use
ret[in] - Pointer to store the return value of the async request upon completion
req_sequence[out] - Sequence number of the scheduled request
Returns: NRT_SUCCESS on success
Source: nrt_async.h:59
nrta_tensor_read#
NRT_STATUS nrta_tensor_read(void *buf, nrt_tensor_t *tensor, uint64_t offset,
uint64_t size, int queue, NRT_STATUS *ret,
nrta_seq_t *req_sequence);
Enqueues a tensor read request. Copies the data from a tensor allocated on a Neuron device to a host buffer.
Parameters:
buf[in] - Destination host buffer
tensor[in] - Source tensor
offset[in] - Offset into the tensor
size[in] - Number of bytes to read
queue[in] - XU queue to use
ret[in] - Pointer to store the return value of the async request upon completion
req_sequence[out] - Sequence number of the scheduled request
Returns: NRT_SUCCESS on success
Source: nrt_async.h:77
nrta_tensor_copy#
NRT_STATUS nrta_tensor_copy(nrt_tensor_t *src, uint64_t src_offset, nrt_tensor_t *dst,
uint64_t dst_offset, uint64_t size, int queue,
NRT_STATUS *ret, nrta_seq_t *req_sequence);
Enqueues a tensor copy request. Copies data between two tensors allocated on the same Logical Neuron Core.
Parameters:
src[in] - Source tensor
src_offset[in] - Offset into the source tensor
dst[in] - Destination tensor
dst_offset[in] - Offset into the destination tensor
size[in] - Number of bytes to copy
queue[in] - XU queue to use
ret[in] - Pointer to store the return value of the async request upon completion
req_sequence[out] - Sequence number of the scheduled request
Returns: NRT_SUCCESS on success
Source: nrt_async.h:98
nrta_execute_schedule#
NRT_STATUS nrta_execute_schedule(nrt_model_t *model, const nrt_tensor_set_t *input,
nrt_tensor_set_t *output, int queue,
NRT_STATUS *ret, nrta_seq_t *req_sequence);
Schedules an asynchronous request to execute a model with specified inputs and outputs.
Parameters:
model[in] - The model to schedule for execution
input[in] - Set of input tensors for the model
output[in] - Set of tensors to receive the outputs
queue[in] - XU queue to use; must be 0
ret[in] - Pointer to store the return value of the async request upon completion
req_sequence[out] - Sequence number of the scheduled request
Returns: NRT_SUCCESS on successful preparation, appropriate error code otherwise
Source: nrt_async.h:118
nrta_cc_prepare#
NOTE: The nrta_cc_prepare and nrta_cc_schedule APIs are work-in-progress and subject to change.
NRT_STATUS nrta_cc_prepare(nrt_cc_comm_t *comm, nrt_tensor_list_t *input,
nrt_tensor_list_t *output, nrt_dtype_t dtype,
nrt_op_type_t op, nrt_cc_op_type_t cc_op,
nrt_cc_context_t **cc_ctx);
Prepares the collective context and hardware configuration needed for a collectives operation. Allocates a collective context handle, returned to the caller, which is freed in the schedule thread after the CC op executes.
Parameters:
comm[in] - Communicator containing the replica group
input[in] - Input tensor list
output[out] - Output tensor list
dtype[in] - Data type of elements
op[in] - Reduction operation (e.g., SUM, MAX), if applicable
cc_op[in] - Collective operation (e.g., ALLREDUCE, ALLGATHER)
cc_ctx[out] - Collective context
Returns: NRT_SUCCESS on successful preparation, appropriate error code otherwise
Source: nrt_async.h:155
nrta_cc_schedule#
NOTE: The nrta_cc_prepare and nrta_cc_schedule APIs are work-in-progress and subject to change.
NRT_STATUS nrta_cc_schedule(nrt_cc_context_t **cc_ctx, int queue,
NRT_STATUS *ret, nrta_seq_t *req_sequence);
Schedules an asynchronous request to execute a collective operation.
Parameters:
cc_ctx[in] - Collective context
queue[in] - XU queue to use; must be 0
ret[in] - Pointer to store the return value of the async request upon completion
req_sequence[out] - Sequence number of the scheduled request
Returns: NRT_SUCCESS on successful preparation, appropriate error code otherwise
Source: nrt_async.h:172
nrta_is_completed#
NRT_STATUS nrta_is_completed(nrta_seq_t seq, bool *is_completed);
Checks completion status of a scheduled request.
Parameters:
seq[in] - Sequence ID of the scheduled request
is_completed[out] - Set to true if the request has completed, false otherwise
Returns: NRT_SUCCESS on success, NRT_INVALID if seq is not a valid sequence ID
Source: nrt_async.h:159
nrta_get_sequence#
NRT_STATUS nrta_get_sequence(uint32_t lnc, nrta_xu_t xu, int queue, nrta_seq_t *seq);
Returns the sequence number of the last completed request on the given queue.
Parameters:
lnc[in] - Logical NeuronCore (LNC) index
xu[in] - Execution unit type
queue[in] - Queue within the XU
seq[out] - Sequence number of the last completed request
Returns: NRT_SUCCESS on success
Source: nrt_async.h:185