This document is relevant for: Inf1, Inf2, Trn1, Trn2, Trn3
Get started with Neuron Explorer#
Overview#
In this guide, you’ll capture a profile of your Neuron workload, launch Neuron Explorer, and upload the profile for interactive analysis.
By the end you will have:
Captured a system or device profile
Launched Neuron Explorer (browser or VS Code)
Uploaded and viewed your profile in the interactive timeline
Prerequisites#
A Trainium or Inferentia EC2 instance (e.g., trn2.48xlarge, inf2.xlarge) with the AWS Neuron DLAMI
SSH key pair (.pem file) for connecting to your instance
Local machine with SSH client and a web browser (or VS Code)
Step 1: Connect and verify installation#
Launch an EC2 instance with a Neuron DLAMI. See the setup guide for details. Then, SSH into your EC2 instance and verify Neuron Explorer is installed:
neuron-explorer --version
If not installed:
sudo apt install aws-neuronx-tools
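If you script your instance setup, the same check can be done programmatically. The sketch below only looks for the neuron-explorer executable on PATH using the Python standard library; the helper name find_tool is hypothetical and not part of the Neuron tooling.

```python
import shutil
from typing import Optional


def find_tool(name: str = "neuron-explorer") -> Optional[str]:
    """Return the absolute path of an executable on PATH, or None if missing."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_tool()
    if path is None:
        print("neuron-explorer not found; install aws-neuronx-tools")
    else:
        print(f"neuron-explorer found at {path}")
```

A None result corresponds to the "If not installed" case above.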
Step 2: Set up SSH tunneling#
Neuron Explorer serves a web UI on port 3001 and an API backend on port 3002. You access both from your local machine through SSH tunnels.
From your local machine, open the tunnels:
ssh -i ~/path/to/your-key.pem \
-L 3001:localhost:3001 \
-L 3002:localhost:3002 \
ubuntu@<instance-ip> -fN
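| Argument | Purpose |
|---|---|
| -i ~/path/to/your-key.pem | Path to your EC2 key pair |
| -L 3001:localhost:3001 | Forwards the UI port |
| -L 3002:localhost:3002 | Forwards the API port |
| ubuntu@<instance-ip> | Instance login (use |
| -fN | Runs tunnel in background (no shell) |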
Important
You must forward both ports. The UI on 3001 calls the API on 3002. If you only forward one, the page loads but shows no data. See Troubleshooting if you run into issues.
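A quick way to confirm that both forwarded ports accept connections on your local machine is a small TCP probe. This is a minimal sketch using only the standard library; the function names port_is_open and check_tunnels are hypothetical, and the ports are the ones named in this step.

```python
import socket


def port_is_open(port: int, host: str = "localhost", timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_tunnels(ports=(3001, 3002)) -> dict:
    """Map each forwarded port to whether something accepts connections on it."""
    return {port: port_is_open(port) for port in ports}


if __name__ == "__main__":
    for port, is_open in check_tunnels().items():
        print(f"port {port}: {'open' if is_open else 'closed'}")
```

Note that an open local port only shows that the SSH tunnel is listening. It does not prove that the UI or API server is running on the instance.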
Step 3: Capture your profile#
Profile types at a glance#
| Type | What it captures | When to use | Output files |
|---|---|---|---|
| System | Runtime events, API calls, model loads, CPU/memory | End-to-end execution flow | |
| Device | Hardware-level NeuronCore instruction traces | On-device compute bottlenecks | Matched |
| Both | Combined system + device view | Full optimization picture | All of the above |
Note
Device profiles require a matched pair: the .neff and .ntff share a numeric hash in their filename (e.g., neff_395760075800974.neff pairs with 395760075800974_instid_0_vnc_0.ntff).
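The pairing rule can be checked before uploading. The sketch below groups filenames by the numeric hash they share; the two filename patterns are inferred from the single example in the note and are an assumption, and the function names are hypothetical.

```python
import re
from pathlib import Path

# Patterns inferred from the example filenames in the note (assumption).
NEFF_RE = re.compile(r"^neff_(\d+)\.neff$")
NTFF_RE = re.compile(r"^(\d+)_.*\.ntff$")


def match_device_files(filenames):
    """Group .neff and .ntff filenames by their shared numeric hash.

    Returns {hash: {"neff": name or None, "ntff": [names, ...]}}.
    """
    pairs = {}
    for name in filenames:
        base = Path(name).name
        neff = NEFF_RE.match(base)
        ntff = NTFF_RE.match(base)
        if neff:
            entry = pairs.setdefault(neff.group(1), {"neff": None, "ntff": []})
            entry["neff"] = base
        elif ntff:
            entry = pairs.setdefault(ntff.group(1), {"neff": None, "ntff": []})
            entry["ntff"].append(base)
    return pairs


def unmatched(pairs):
    """Return hashes that are missing either the .neff or every .ntff."""
    return sorted(h for h, p in pairs.items() if p["neff"] is None or not p["ntff"])
```

Any hash returned by unmatched points to a .neff without a trace or a trace without its .neff.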
For instructions on how to capture a profile, see Capture Profiles in Neuron Explorer or Profile a NKI Kernel.
After profiling, ./profile_output will contain trace artifacts organized per process. Verify the output matches the Expected Output section.
Step 4: Launch Neuron Explorer#
On the EC2 instance, run:
neuron-explorer view
This starts the UI server on port 3001 (web interface) and the API server on port 3002 (data backend).
In your local browser, navigate to http://localhost:3001.
Expected output: the Profile Manager page is displayed.
Using VS Code instead of the browser#
Install: Search for AWS Neuron Explorer (publisher: Amazon Web Services) in VS Code Extensions (Ctrl+Shift+X), or install from the VS Code Marketplace.
Ensure SSH tunnels are active (see Step 2: Set up SSH tunneling).
Configure the endpoint: click the extension in the left activity bar, select Endpoint on the bottom bar, choose Custom endpoint, and enter localhost:3002.
Access Profile Manager from the extension sidebar.
Note
The VS Code extension uses the same API server. All upload methods (CLI, web UI) work interchangeably — once a profile is uploaded, it’s visible in both interfaces.
Step 5: Upload your profile#
Choose the method that fits your workflow:
Option A: CLI upload#
If you’re already SSH’d into the instance, this is the quickest path:
neuron-explorer view \
-d ./profile_output \
--ingest-only \
--display-name "my-profile-run"
Expected outcome: After processing, the CLI outputs a direct link to your profile. Open it in your browser (via the tunnel) to view.
Useful flags:
| Flag | Description |
|---|---|
| | Skip device-level traces (faster processing) |
| | Reduce tracks for a cleaner view |
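When ingesting several runs, it can help to build the CLI upload command in a script rather than typing it each time. The sketch below assembles only the arguments shown in the Option A command; the helper name build_ingest_command is hypothetical, and no additional flags are assumed.

```python
import shlex


def build_ingest_command(profile_dir: str, display_name: str) -> list:
    """Build the argv list for the CLI upload command shown in Option A."""
    return [
        "neuron-explorer", "view",
        "-d", profile_dir,
        "--ingest-only",
        "--display-name", display_name,
    ]


if __name__ == "__main__":
    cmd = build_ingest_command("./profile_output", "my-profile-run")
    # Print a shell-safe line; on the instance you could instead pass the
    # list to subprocess.run(cmd, check=True) to execute it.
    print(shlex.join(cmd))
```

Passing the arguments as a list avoids quoting problems when a display name contains spaces.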
Option B: Web UI upload#
If your profile output is on the EC2 instance and you want to use the browser uploader locally, transfer the files first:
# Compress for faster transfer (recommended for large profiles)
# Run on EC2:
tar -czf profile_output.tar.gz ./profile_output
# Transfer to local machine (run locally):
scp -i ~/your-key.pem ubuntu@<instance-ip>:./profile_output.tar.gz .
# Extract on local machine:
tar -xzf profile_output.tar.gz
Use your local browser or VS Code to open http://localhost:3001. The Profile Manager page is displayed.
Click the Upload Profile button on the top right to open the upload dialog:
Enter a Profile Name (required).
Choose your upload method based on profile type:
For system profiles (or system + device): click Upload Profile and select Directory Upload For System Profile. Then select the directory containing your .pb files (must include trace_info.pb).
For device-only profiles: click Upload Profile and select Individual Files. Upload your .neff and .ntff files in the designated boxes.
Expected outcome: The profile appears in the User Uploaded table. Click Refresh to check processing status. Once complete, click the profile name to open the interactive timeline.
Note
Why two upload methods? Directory Upload requires a system profile (ntrace.pb + trace_info.pb). It does not work with device-only profiles. For device-only profiles (just .neff + .ntff), use Individual Files. If you have both system + device files, use Directory Upload as it picks up everything.
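The decision described in this note can be expressed as a short check over the files in a profile directory. This is a sketch under the file-name requirements stated above (ntrace.pb + trace_info.pb for system profiles, .neff + .ntff for device-only); the function name choose_upload_method is hypothetical, and the check does not verify that the .neff and .ntff hashes match.

```python
from pathlib import Path


def choose_upload_method(profile_dir: str) -> str:
    """Suggest a web UI upload method from the files found under profile_dir."""
    files = [p for p in Path(profile_dir).rglob("*") if p.is_file()]
    names = {p.name for p in files}
    suffixes = {p.suffix for p in files}

    has_system = {"ntrace.pb", "trace_info.pb"} <= names
    has_device = {".neff", ".ntff"} <= suffixes

    if has_system:
        # Directory upload also picks up any device files that are present.
        return "Directory Upload For System Profile"
    if has_device:
        return "Individual Files"
    return "incomplete: need ntrace.pb + trace_info.pb, or .neff + .ntff"
```

Running this before uploading can catch the missing-file cases listed under Troubleshooting.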
Option C: Export to JSON#
For programmatic analysis (custom scripts, coding agents), export to JSON. This generates system_profile.json and device_profile_model_<model_id>.json per compiled model.
neuron-explorer view \
--session-dir ./profile_output \
--output-format json \
--output-file ./integrated_trace.json
Quick text summary (no UI needed):
neuron-explorer view -d ./profile_output --output-format summary-text
JSON schema (system_profile.json):
The file contains event objects. It also includes mem_usage (sampled host memory) and cpu_util (CPU utilization per core).
{
"Neuron_Runtime_API_Event": {
"duration": 27094,
"group": "nrt-nc-000",
"id": 1,
"instance_id": "i-0f207fb2a99bd2d08",
"name": "nrt_tensor_write",
"timestamp": 1729888371056597613,
"type": 11
},
"Framework_Event": {
"duration": 3758079,
"group": "framework-80375131",
"instance_id": "i-0f207fb2a99bd2d08",
"name": "PjitFunction(matmul_allgather)",
"timestamp": 1729888382798557372
}
}
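Once events are loaded, a per-name duration total is a common first summary. The sketch below assumes you have already extracted the events into an iterable of dicts shaped like the samples above. The top-level layout of system_profile.json and the unit of duration are not specified here, so treat both as things to confirm against your own export. The function name is hypothetical.

```python
from collections import defaultdict


def total_duration_by_name(events):
    """Sum the `duration` field per event `name`.

    `events` is any iterable of dicts shaped like the samples above.
    Events without a duration contribute 0.
    """
    totals = defaultdict(int)
    for event in events:
        totals[event["name"]] += event.get("duration", 0)
    return dict(totals)


if __name__ == "__main__":
    sample = [
        {"name": "nrt_tensor_write", "duration": 27094, "group": "nrt-nc-000"},
        {"name": "PjitFunction(matmul_allgather)", "duration": 3758079},
    ]
    ranked = sorted(total_duration_by_name(sample).items(), key=lambda kv: -kv[1])
    for name, total in ranked:
        print(f"{total:>12}  {name}")
```

Sorting by total duration puts the most expensive event names first, which is usually where to look next in the timeline.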
Troubleshooting#
Connection issues#
| # | Symptom | Cause | Fix |
|---|---|---|---|
| 1 | UI does not load | SSH tunnel misconfigured | Verify both ports: |
| 2 | “Connection refused” on 3001/3002 | Servers not running | Run |
| 3 | UI loads but shows no data | Only port 3001 forwarded | Add |
| 4 | neuron-explorer not found | Tools not installed | |
Upload and viewing issues#
| # | Symptom | Cause | Fix |
|---|---|---|---|
| 1 | Upload “successful” but Profile Manager shows “error process incomplete” | Missing required files | System profiles need ntrace.pb + trace_info.pb. Device profiles need matched .neff + .ntff. |
| 2 | Profile hangs in “Uploaded” state indefinitely | Processing failed silently | Try uploading without source code. If that works, check source is .tar.gz format. |
| 3 | Directory upload returns 500 | Directory upload requires a system profile | For device-only profiles, use Individual Files instead. |
| 4 | “No profiling data” | Wrong directory | Use |
Profiling results issues#
| # | Symptom | Cause | Fix |
|---|---|---|---|
| 1 | “DMA results may not be accurate” | | |
| 2 | Out-of-memory during profiling | | Remove it from your modes list if you don’t need instruction-level device traces. |
| 3 | No CPU Neuron traffic in timeline | Framework trace not in correct subdirectory | This is likely because framework trace JSONs are not in per-process directories. Move |
| 4 | Profile shows compilation, not execution | Didn’t warm up | Run 3+ forwards before starting profiler. |
| 5 | Compiled model shows 0.2 ms (impossibly fast) | Async timing | Async dispatch: torch.compile queues work and returns immediately. Add explicit synchronization before timing (snippet below the table). |
| 6 | Dropped events in system profile | | |
| 7 | Incomplete JAX profiles | If your JAX profile has fewer events than expected, check: | |
Timing snippet for row 5:
import time

torch.neuron.synchronize()  # drain queue before timing
t0 = time.time()
for _ in range(50):
    compiled_model(x)
torch.neuron.synchronize()  # wait for all work to complete
avg_ms = (time.time() - t0) / 50 * 1000
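The synchronization pattern from row 5 can be wrapped in a reusable helper so every benchmark drains the queue the same way. This is a minimal sketch: average_ms is a hypothetical name, and the synchronize argument stands for whatever blocking call your framework provides (the table uses torch.neuron.synchronize).

```python
import time


def average_ms(run, synchronize, iterations: int = 50) -> float:
    """Average wall-clock milliseconds per call of run(), with explicit syncs.

    `synchronize` must block until all queued device work has finished.
    """
    if iterations <= 0:
        raise ValueError("iterations must be positive")
    synchronize()  # drain anything queued before the timed region
    start = time.perf_counter()
    for _ in range(iterations):
        run()
    synchronize()  # wait for the timed work to complete
    return (time.perf_counter() - start) / iterations * 1000
```

Following the table's example, a call would look like average_ms(lambda: compiled_model(x), torch.neuron.synchronize). time.perf_counter is used here instead of time.time because it is monotonic and intended for interval timing.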
Next steps#
Capture Profiles in Neuron Explorer — Full capturing and profiling reference (PyTorch, JAX, environment variables, CLI, filtering)
Neuron Explorer Full Documentation — Complete viewer and feature reference