Inf2 Performance
This document is relevant for: Inf1, Inf2, Trn1, Trn1n
Last update: Apr 12th, 2023
Inference Performance
Throughput-optimized (Batch) configurations:

| Model | Scripts | Framework | Inst. Type | Throughput (/sec) | Latency P50 (ms) | Latency P99 (ms) | Cost per 1M inferences | Application Type | Neuron Version | Run Mode | Batch Size | Model Data Type | Compilation Autocast Data Type |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| albert-base-v2 | | PyTorch 1.13.0 | Inf2.xlarge | 2438 | 3.15 | 5.25 | $0.086 | Batch | 2.9.0 | Data Parallel | 4 | FP32 | Matmult-BF16 |
| bert-base-cased | | PyTorch 1.13.0 | Inf2.xlarge | 2599 | 6.13 | 6.45 | $0.081 | Batch | 2.9.0 | Data Parallel | 8 | FP32 | Matmult-BF16 |
| bert-base-cased-finetuned-mrpc | | PyTorch 1.13.0 | Inf2.xlarge | 2978 | 5.33 | 5.7 | $0.071 | Batch | 2.9.0 | Data Parallel | 8 | FP32 | Matmult-BF16 |
| bert-large-cased | | PyTorch 1.13.0 | Inf2.xlarge | 866 | 18.13 | 21.47 | $0.243 | Batch | 2.9.0 | Data Parallel | 8 | FP32 | Matmult-BF16 |
| distilbert-base-cased | | PyTorch 1.13.0 | Inf2.xlarge | 3721 | 8.96 | 11.61 | $0.057 | Batch | 2.9.0 | Data Parallel | 4 | FP32 | Matmult-BF16 |
| opt-13b | | PyTorch 1.13.0 | Inf2.48xlarge | 1355 | 141.6 | 151.9 | $2.661 | Batch | 2.9.0 | Tensor Parallel | 5 | | |
| opt-30b | | PyTorch 1.13.0 | Inf2.48xlarge | 627 | 82.6 | 106.9 | $5.752 | Batch | 2.9.0 | Tensor Parallel | 64 | | |
| opt-66b | | PyTorch 1.13.0 | Inf2.48xlarge | 733 | 248.6 | 257.8 | $4.917 | Batch | 2.9.0 | Tensor Parallel | 256 | | |
| roberta-base | | PyTorch 1.13.0 | Inf2.xlarge | 2379 | 3.26 | 4.43 | $0.089 | Batch | 2.9.0 | Data Parallel | 4 | FP32 | Matmult-BF16 |
| roberta-large | | PyTorch 1.13.0 | Inf2.xlarge | 886 | 8.86 | 10.61 | $0.238 | Batch | 2.9.0 | Data Parallel | 4 | FP32 | Matmult-BF16 |
Latency-optimized (Real Time) configurations:

| Model | Scripts | Framework | Inst. Type | Throughput (/sec) | Latency P50 (ms) | Latency P99 (ms) | Cost per 1M inferences | Application Type | Neuron Version | Run Mode | Batch Size | Model Data Type | Compilation Autocast Data Type |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| albert-base-v2 | | PyTorch 1.13.0 | Inf2.xlarge | 1649.38 | 1.19 | 1.53 | $0.128 | Real Time | 2.9.0 | Data Parallel | 1 | FP32 | Matmult-BF16 |
| bert-base-cased | | PyTorch 1.13.0 | Inf2.xlarge | 1730.86 | 1.14 | 1.37 | $0.122 | Real Time | 2.9.0 | Data Parallel | 1 | FP32 | Matmult-BF16 |
| bert-base-cased-finetuned-mrpc | | PyTorch 1.13.0 | Inf2.xlarge | 1885.25 | 1.05 | 1.17 | $0.112 | Real Time | 2.9.0 | Data Parallel | 1 | FP32 | Matmult-BF16 |
| bert-large-cased | | PyTorch 1.13.0 | Inf2.xlarge | 647.61 | 3.07 | 3.43 | $0.325 | Real Time | 2.9.0 | Data Parallel | 1 | FP32 | Matmult-BF16 |
| distilbert-base-cased | | PyTorch 1.13.0 | Inf2.xlarge | 2612.46 | 0.72 | 1.15 | $0.081 | Real Time | 2.9.0 | Data Parallel | 1 | FP32 | Matmult-BF16 |
| opt-13b | | PyTorch 1.13.0 | Inf2.48xlarge | 36.1 | 28.0 | 28.3 | $99.885 | Real Time | 2.9.0 | Tensor Parallel | 1 | | |
| opt-30b | | PyTorch 1.13.0 | Inf2.48xlarge | 20.7 | 48.4 | 50.1 | $174.195 | Real Time | 2.9.0 | Tensor Parallel | 1 | | |
| opt-66b | | PyTorch 1.13.0 | Inf2.48xlarge | 14.9 | 65.7 | 74.2 | $242.002 | Real Time | 2.9.0 | Tensor Parallel | 1 | | |
| roberta-base | | PyTorch 1.13.0 | Inf2.xlarge | 1726.45 | 1.14 | 1.42 | $0.122 | Real Time | 2.9.0 | Data Parallel | 1 | FP32 | Matmult-BF16 |
| roberta-large | | PyTorch 1.13.0 | Inf2.xlarge | 628.55 | 3.17 | 3.52 | $0.335 | Real Time | 2.9.0 | Data Parallel | 1 | FP32 | Matmult-BF16 |
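The "Cost per 1M inferences" column follows directly from the instance's hourly price and the measured throughput. A minimal sketch of that arithmetic (the on-demand price below is an assumption for illustration, not a figure quoted in this document; actual pricing varies by region):

```python
def cost_per_million(hourly_price_usd: float, throughput_per_sec: float) -> float:
    """Cost of one million inferences: price per second divided by
    inferences per second, scaled to 1M requests."""
    return (hourly_price_usd / 3600.0) / throughput_per_sec * 1_000_000

# Assumed Inf2.xlarge on-demand hourly price (illustrative assumption).
INF2_XLARGE_PRICE = 0.7576

# albert-base-v2 in batch mode runs at 2438 inferences/sec per the table above.
print(round(cost_per_million(INF2_XLARGE_PRICE, 2438), 3))
```

Under this assumed price, the result is about $0.086, consistent with the albert-base-v2 batch row; the same formula reproduces the real-time rows once the lower single-request throughput is plugged in.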
Note: See the Neuron Glossary for abbreviations and terms.