This document is relevant for: Inf1, Inf2, Trn1, Trn2, Trn3
Neuron Debug Stream API Documentation#
Overview#
The ndebug_stream APIs give applications a way to consume debug events from the runtime. The runtime emits these events per Logical Neuron Core, and applications can use them to learn what happened on the device (for example, device prints and breakpoints).
Debug events are streamed through a connection interface, allowing applications to monitor and display information from Neuron Cores during execution.
Connecting, Polling, and Consuming#
Connection Process#
Applications that want to consume debug events must follow these steps:
1. Connect to a Logical Neuron Core’s debug stream via nrt_debug_client_connect
2. Poll for events using Linux kernel polling APIs on the returned file descriptor
3. Consume events using the nrt_debug_client_read_one_event API
4. Close the connection when finished using nrt_debug_client_connect_close
Once a client is connected to a core’s debug stream, the runtime will push debug events emitted by the Logical Neuron Core to the stream for clients to consume.
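The connect, consume, and close steps can be sketched as a single function. This is a minimal sketch under stated assumptions, not the runtime's own code: the status enum `sk_status_t`, the function-pointer table `debug_client_ops_t`, and the `fake_*` backend are hypothetical names introduced only so the example compiles and runs without the Neuron runtime. In a real application the table would point at nrt_debug_client_connect, nrt_debug_client_read_one_event, and nrt_debug_client_connect_close, and the status type would be NRT_STATUS. Releasing the payload with `free()` is also an assumption.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Placeholder status codes so the sketch builds on its own. Real code should
 * use NRT_STATUS, NRT_SUCCESS, NRT_QUEUE_EMPTY and NRT_INVALID from the
 * runtime headers; the numeric values here are NOT taken from the runtime. */
typedef enum { SK_SUCCESS = 0, SK_QUEUE_EMPTY = 1, SK_INVALID = 2 } sk_status_t;

/* Event header, copied from the Data Structures section of this document. */
typedef struct ndebug_stream_event_header {
    uint64_t data_size;
    uint32_t type;
    char reserved[52];
} ndebug_stream_event_header_t;

/* Hypothetical table of the three client calls, shaped like the prototypes
 * in the API reference. */
typedef struct {
    sk_status_t (*connect)(int logical_nc_idx, int *stream_fd);
    sk_status_t (*read_one_event)(int stream_fd,
                                  ndebug_stream_event_header_t *header,
                                  void **payload);
    void (*close)(int stream_fd);
} debug_client_ops_t;

/* Connects, consumes every event that is currently queued, then closes.
 * Returns the number of events consumed, or -1 if connecting or reading fails.
 * A long-running client would wait on the stream descriptor between drains. */
long consume_pending_events(const debug_client_ops_t *ops, int logical_nc_idx)
{
    assert(ops && ops->connect && ops->read_one_event && ops->close);

    int fd = -1;
    if (ops->connect(logical_nc_idx, &fd) != SK_SUCCESS)
        return -1;

    long consumed = 0;
    for (;;) {
        ndebug_stream_event_header_t header;
        void *payload = NULL;
        sk_status_t st = ops->read_one_event(fd, &header, &payload);
        if (st == SK_QUEUE_EMPTY)
            break;                      /* nothing left to consume */
        if (st != SK_SUCCESS) {
            consumed = -1;              /* report the read failure */
            break;
        }
        /* ... inspect header.type and payload here ... */
        free(payload);                  /* assumption: released with free() */
        consumed++;
    }

    ops->close(fd);                     /* always release the connection */
    return consumed;
}

/* ---- Stand-in backend, for running the sketch without Neuron hardware ---- */
static int fake_pending = 0;            /* events the stand-in will hand out */
static int fake_open = 0;               /* 1 while a client is connected */

static sk_status_t fake_connect(int logical_nc_idx, int *stream_fd)
{
    (void)logical_nc_idx;
    if (fake_open)
        return SK_INVALID;              /* mirrors the one-client-per-stream rule */
    fake_open = 1;
    *stream_fd = 3;                     /* arbitrary handle value */
    return SK_SUCCESS;
}

static sk_status_t fake_read(int stream_fd,
                             ndebug_stream_event_header_t *header,
                             void **payload)
{
    (void)stream_fd;
    if (fake_pending == 0)
        return SK_QUEUE_EMPTY;
    fake_pending--;
    header->data_size = 8;
    header->type = 1;
    *payload = calloc(1, 8);            /* caller owns and frees this buffer */
    return *payload ? SK_SUCCESS : SK_INVALID;
}

static void fake_close(int stream_fd)
{
    (void)stream_fd;
    fake_open = 0;
}

static const debug_client_ops_t fake_ops = { fake_connect, fake_read, fake_close };
```

Passing the calls in through a table keeps the loop itself identical whether it runs against the stand-in or the real runtime; only the table changes.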
Polling for Events#
The stream file descriptor obtained from nrt_debug_client_connect is a standard Linux file descriptor and can be passed into any Linux polling API (such as epoll, poll, or select). This allows applications to efficiently wait for debug events without busy waiting.
Important
While the stream_fd is pollable, all other non-polling functionality must go through the provided nrt_debug_client* APIs. The stream contents can only be accessed using the nrt_debug_client_read* API(s).
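Waiting on the stream descriptor with epoll might look like the sketch below. The helper names `debug_watch_create`, `debug_watch_wait`, and `debug_watch_close` are hypothetical and not part of the runtime; only standard Linux epoll calls are used. The code only waits for readiness and never reads from the descriptor itself, in line with the rule that stream contents go through the nrt_debug_client_read* API(s).

```c
#include <assert.h>
#include <errno.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Creates an epoll instance that watches stream_fd for readability.
 * Returns the epoll descriptor, or -1 on failure. */
int debug_watch_create(int stream_fd)
{
    assert(stream_fd >= 0);

    int epfd = epoll_create1(EPOLL_CLOEXEC);
    if (epfd < 0)
        return -1;

    struct epoll_event ev = { .events = EPOLLIN, .data.fd = stream_fd };
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, stream_fd, &ev) < 0) {
        close(epfd);
        return -1;
    }
    return epfd;
}

/* Blocks until the watched descriptor is readable or timeout_ms elapses
 * (-1 waits indefinitely). Returns 1 when readable, 0 on timeout, -1 on
 * error. An interrupted wait is retried with the full timeout. */
int debug_watch_wait(int epfd, int timeout_ms)
{
    struct epoll_event ev;
    int n;
    do {
        n = epoll_wait(epfd, &ev, 1, timeout_ms);
    } while (n < 0 && errno == EINTR);
    return n;
}

/* Releases the epoll instance. The stream itself is still closed through
 * nrt_debug_client_connect_close. */
void debug_watch_close(int epfd)
{
    close(epfd);
}
```

Because the registration is level-triggered (the epoll default), `debug_watch_wait` keeps reporting readiness until the pending events have been consumed, so a client can read events one at a time without missing a wake-up.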
Events#
Events consist of two parts:
A header describing the payload type
A payload representing the contents of the event
Each event sent to the application is wrapped as a datagram. The header is a fixed-sized struct that describes the contents of the payload, including the size and how to interpret it.
Event Types#
Currently, the system supports these event types:
| Event Type | Description |
|---|---|
| NDEBUG_STREAM_EVENT_TYPE_DEBUG_TENSOR_READ | Debug tensor read events from the core |
API Reference#
nrt_debug_client_connect#
NRT_STATUS nrt_debug_client_connect(int logical_nc_idx, int *stream_fd);
Establishes a connection to a specified Logical Neuron Core’s debug stream.
Parameters:
logical_nc_idx [in] - Core’s debug stream to connect to
stream_fd [out] - Connection handle to reference and interact with the stream
Returns:
NRT_SUCCESS on success
Note
Only one client can connect to a Logical Neuron Core’s stream at any given time. An attempt to connect to a stream that already has a client returns NRT_INVALID.
nrt_debug_client_connect_close#
void nrt_debug_client_connect_close(int stream_fd);
Closes a connection created by nrt_debug_client_connect.
Parameters:
stream_fd [in] - Connection handle to close
nrt_debug_client_read_one_event#
NRT_STATUS nrt_debug_client_read_one_event(int stream_fd, ndebug_stream_event_header_t *header, void **payload);
Consumes a single event from the stream.
Parameters:
stream_fd [in] - Stream to consume an event from
header [out] - Consumed event’s header
payload [out] - Consumed event’s payload
Returns:
NRT_SUCCESS on success
NRT_QUEUE_EMPTY if no events are available
Important
It is the user’s responsibility to free the payload pointer.
Note
This function must be called from the same process that owns the Logical Neuron Core. Calling this function from any other process results in undefined behavior.
Data Structures#
ndebug_stream_event_type#
typedef enum ndebug_stream_event_type {
    NDEBUG_STREAM_EVENT_TYPE_INVALID = 0,
    NDEBUG_STREAM_EVENT_TYPE_DEBUG_TENSOR_READ = 1,
} ndebug_stream_event_type_t;
Enumeration of the different types of debug events that can be emitted.
ndebug_stream_event_header#
typedef struct ndebug_stream_event_header {
    uint64_t data_size;
    uint32_t type;
    char reserved[52];
} ndebug_stream_event_header_t;
Header structure for debug stream events.
Fields:
data_size - Size of the payload data in bytes
type - Type of event (see ndebug_stream_event_type_t)
reserved - Reserved bytes for future use
ndebug_stream_payload_debug_tensor_read#
typedef struct ndebug_stream_payload_debug_tensor_read {
    char prefix[512];
    uint32_t logical_nc_id;
    uint32_t pipe;
    char tensor_dtype[16];
    uint64_t tensor_shape[8];
    uint64_t tensor_data_size;
    char reserved0[416];
    char tensor_data[];
} ndebug_stream_payload_debug_tensor_read_t;
Payload structure for debug tensor read events.
Fields:
prefix - The prefix string to print
logical_nc_id - The logical core the print event originated from
pipe - The pipe to write the printed string to
tensor_dtype - Tensor data type
tensor_shape - Tensor shape dimensions (up to 8 dimensions)
tensor_data_size - Size in bytes of the tensor content
reserved0 - Reserved bytes for future use
tensor_data - The contents of the tensor to display (flexible array member)
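Interpreting a debug tensor read payload could be sketched as follows. The struct is copied from this document; the helpers `tensor_rank`, `tensor_element_count`, `tensor_payload_fits`, and `format_tensor_summary` are hypothetical. Two assumptions are made that the text above does not confirm: unused trailing entries of tensor_shape are zero, and the header's data_size covers the fixed part of the payload plus the tensor bytes. The string fields are printed with a bounded precision in case they are not NUL-terminated.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Payload layout, copied from this document. */
typedef struct ndebug_stream_payload_debug_tensor_read {
    char prefix[512];
    uint32_t logical_nc_id;
    uint32_t pipe;
    char tensor_dtype[16];
    uint64_t tensor_shape[8];
    uint64_t tensor_data_size;
    char reserved0[416];
    char tensor_data[];
} ndebug_stream_payload_debug_tensor_read_t;

/* Number of leading non-zero dimensions.
 * ASSUMPTION: unused trailing shape entries are 0. */
size_t tensor_rank(const ndebug_stream_payload_debug_tensor_read_t *p)
{
    assert(p != NULL);
    size_t rank = 0;
    while (rank < 8 && p->tensor_shape[rank] != 0)
        rank++;
    return rank;
}

/* Product of the used dimensions; 0 for an empty shape or on overflow. */
uint64_t tensor_element_count(const ndebug_stream_payload_debug_tensor_read_t *p)
{
    size_t rank = tensor_rank(p);
    if (rank == 0)
        return 0;
    uint64_t count = 1;
    for (size_t i = 0; i < rank; i++) {
        uint64_t dim = p->tensor_shape[i];   /* non-zero by construction */
        if (count > UINT64_MAX / dim)
            return 0;
        count *= dim;
    }
    return count;
}

/* Defensive bounds check before reading tensor_data.
 * ASSUMPTION: data_size counts the fixed struct plus the tensor bytes. */
int tensor_payload_fits(uint64_t data_size,
                        const ndebug_stream_payload_debug_tensor_read_t *p)
{
    assert(p != NULL);
    if (data_size < sizeof *p)
        return 0;
    return p->tensor_data_size <= data_size - sizeof *p;
}

/* Writes a one-line summary into out; returns snprintf's result. */
int format_tensor_summary(const ndebug_stream_payload_debug_tensor_read_t *p,
                          char *out, size_t out_len)
{
    assert(p != NULL && out != NULL);
    return snprintf(out, out_len,
                    "%.*s [nc %" PRIu32 ", dtype %.*s, %" PRIu64
                    " elements, %" PRIu64 " bytes]",
                    (int)sizeof p->prefix, p->prefix,
                    p->logical_nc_id,
                    (int)sizeof p->tensor_dtype, p->tensor_dtype,
                    tensor_element_count(p), p->tensor_data_size);
}
```

With the fields as listed, the fixed part of the payload occupies 1024 bytes, so tensor_data starts at offset 1024 on common ABIs.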
Notes and Important Considerations#
These APIs do not allow for interprocess communication. Debug events are only pushed to the process that owns the Logical Neuron Core.
These APIs do not provide thread safety for multiple threads accessing the SAME stream (thread safety for different streams is guaranteed).
There can only be one outstanding connection per stream. Any attempts to initialize multiple connections will result in an error.
Events are only emitted AFTER a client connects to a Logical Neuron Core’s stream. Any event that would have been emitted before connecting to the stream is dropped.
Events will be dropped if the number of unconsumed events in a stream exceeds the stream’s buffer size. Clients must consume events fast enough to prevent dropped events.
Clients can configure the stream’s buffer size via the NEURON_RT_DEBUG_STREAM_BUFFER_SIZE environment variable. The buffer size currently defaults to 64K debug events.
The payload buffer returned by nrt_debug_client_read_one_event must be freed by the caller.