{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Evaluate YOLO v4 on Inferentia\n", "## Note: this tutorial runs on tensorflow-neuron 1.x only" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Introduction\n", "This tutorial walks through compiling and evaluating the YOLO v4 model on Inferentia using the AWS Neuron SDK 09/2020 release. We recommend running this tutorial on an EC2 `inf1.2xlarge` instance, which contains one Inferentia chip, 8 vCPUs, and 16 GB of memory. Verify that this Jupyter notebook is running the Python kernel environment that was set up according to the [Tensorflow Installation Guide](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/tensorflow/tensorflow-neuron/setup/tensorflow-install.html#install-neuron-tensorflow). You can select the kernel from the “Kernel -> Change Kernel” option at the top of this Jupyter notebook page." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Prerequisites" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This demo requires the following pip packages:\n", "\n", "`neuron-cc tensorflow-neuron<2 requests pillow matplotlib pycocotools torch`\n", "\n", "and the debian/rpm package `aws-neuron-runtime`.\n", "\n", "On DLAMI, `aws-neuron-runtime` comes pre-installed." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!pip install tensorflow_neuron==1.15.5.2.8.9.0 neuron_cc==1.13.5.0 requests pillow matplotlib pycocotools==2.0.1 numpy==1.18.2 torch~=1.5.0 --force \\\n", " --extra-index-url=https://pip.repos.neuron.amazonaws.com" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Part 1: Download Dataset and Generate Pretrained SavedModel\n", "### Download COCO 2017 validation dataset\n", "We start by downloading the COCO 2017 validation dataset, which we will use to evaluate the model. The COCO 2017 dataset is widely used for object detection, segmentation, and image captioning." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!curl -LO http://images.cocodataset.org/zips/val2017.zip\n", "!curl -LO http://images.cocodataset.org/annotations/annotations_trainval2017.zip\n", "!unzip -q val2017.zip\n", "!unzip -q annotations_trainval2017.zip" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!ls" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Check required package versions\n", "Here are the minimum required versions of the AWS Neuron packages. The following cell verifies that they are satisfied." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pkg_resources\n", "from distutils.version import LooseVersion\n", "\n", "assert LooseVersion(pkg_resources.get_distribution('neuron-cc').version) > LooseVersion('1.0.20000')\n", "assert LooseVersion(pkg_resources.get_distribution('tensorflow-neuron').version) > LooseVersion('1.15.3.1.0.2000')\n", "print('passed package version checks')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Generate YOLO v4 TensorFlow SavedModel (pretrained on COCO 2017 dataset)\n", "The script `yolo_v4_coco_saved_model.py` generates a TensorFlow SavedModel using pretrained weights from https://github.com/Tianxiaomo/pytorch-YOLOv4."
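] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Before generating the model, we can optionally sanity-check the downloaded dataset. The following cell is a minimal sketch; it assumes the `val2017/` directory and `annotations/instances_val2017.json` produced by the unzip commands above are present in the current working directory." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import os\n", "import json\n", "\n", "# optional sanity check: confirm the COCO validation images and annotations are in place\n", "num_images = len([name for name in os.listdir('./val2017') if name.endswith('.jpg')])\n", "print('validation images found:', num_images)  # expect 5000 for COCO val2017\n", "\n", "with open('./annotations/instances_val2017.json') as f:\n", "    annotations = json.load(f)\n", "print('annotated images:', len(annotations['images']))\n", "print('categories:', len(annotations['categories']))  # expect the 80 COCO object categories" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now run the script to generate the SavedModel:"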
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!python3 yolo_v4_coco_saved_model.py" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This TensorFlow SavedModel can be loaded as a TensorFlow predictor. Given a JPEG image as input, the predictor's output contains the information needed to draw bounding boxes and classification labels." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import json\n", "import tensorflow as tf\n", "from PIL import Image\n", "import matplotlib.pyplot as plt\n", "import matplotlib.patches as patches\n", "\n", "# launch predictor and run inference on an arbitrary image in the validation dataset\n", "yolo_pred_cpu = tf.contrib.predictor.from_saved_model('./yolo_v4_coco_saved_model')\n", "image_path = './val2017/000000581781.jpg'\n", "with open(image_path, 'rb') as f:\n", "    feeds = {'image': [f.read()]}\n", "results = yolo_pred_cpu(feeds)\n", "\n", "# load annotations to decode classification result\n", "with open('./annotations/instances_val2017.json') as f:\n", "    annotate_json = json.load(f)\n", "label_info = {idx+1: cat['name'] for idx, cat in enumerate(annotate_json['categories'])}\n", "\n", "# draw picture and bounding boxes\n", "fig, ax = plt.subplots(figsize=(10, 10))\n", "ax.imshow(Image.open(image_path).convert('RGB'))\n", "wanted = results['scores'][0] > 0.1\n", "for xyxy, label_no_bg in zip(results['boxes'][0][wanted], results['classes'][0][wanted]):\n", "    xywh = xyxy[0], xyxy[1], xyxy[2] - xyxy[0], xyxy[3] - xyxy[1]\n", "    rect = patches.Rectangle((xywh[0], xywh[1]), xywh[2], xywh[3], linewidth=1, edgecolor='g', facecolor='none')\n", "    ax.add_patch(rect)\n", "    rx, ry = rect.get_xy()\n", "    rx = rx + rect.get_width() / 2.0\n", "    # model class ids are 0-based without background, while label_info keys start at 1\n", "    ax.annotate(label_info[label_no_bg + 1], (rx, ry), color='w', backgroundcolor='g', fontsize=10,\n", "                ha='center', va='center', bbox=dict(boxstyle='square,pad=0.01', fc='g', ec='none', alpha=0.5))\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Part 2: Compile the Pretrained SavedModel for Inferentia\n", "We make use of the Python compilation API `tfn.saved_model.compile` that is available in `tensorflow-neuron<2`. To reduce Neuron runtime overhead, we pass the `no_fuse_ops` and `minimum_segment_size` arguments."
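] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To see which operation names can be passed to `no_fuse_ops`, we can first list the operations in the SavedModel's graph. This optional inspection cell is a minimal sketch; it assumes the SavedModel generated in Part 1 is at `./yolo_v4_coco_saved_model`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import tensorflow as tf\n", "\n", "# optional: peek at the graph's operation names to inform the no_fuse_ops choice below\n", "with tf.Session(graph=tf.Graph()) as sess:\n", "    tf.saved_model.loader.load(sess, ['serve'], './yolo_v4_coco_saved_model')\n", "    op_names = [op.name for op in sess.graph.get_operations()]\n", "print('total operations:', len(op_names))\n", "# show a few of the cast/reshape ops that the compilation step will keep on CPU\n", "prefixes = ('reshape', 'lambda_1/Cast', 'lambda_2/Cast', 'lambda_3/Cast')\n", "print([name for name in op_names if name.startswith(prefixes)][:10])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The compilation cell below collects these operation names via `no_fuse_condition` and passes them to `tfn.saved_model.compile`:"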
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import shutil\n", "import tensorflow as tf\n", "import tensorflow.neuron as tfn\n", "\n", "\n", "def no_fuse_condition(op):\n", "    return any(op.name.startswith(pat) for pat in ['reshape', 'lambda_1/Cast', 'lambda_2/Cast', 'lambda_3/Cast'])\n", "\n", "with tf.Session(graph=tf.Graph()) as sess:\n", "    tf.saved_model.loader.load(sess, ['serve'], './yolo_v4_coco_saved_model')\n", "    no_fuse_ops = [op.name for op in sess.graph.get_operations() if no_fuse_condition(op)]\n", "shutil.rmtree('./yolo_v4_coco_saved_model_neuron', ignore_errors=True)\n", "result = tfn.saved_model.compile(\n", "    './yolo_v4_coco_saved_model', './yolo_v4_coco_saved_model_neuron',\n", "    # we partition the graph before casting from float16 to float32, to help reduce the output tensor size by half\n", "    no_fuse_ops=no_fuse_ops,\n", "    # force trivially compilable subgraphs to run on CPU\n", "    minimum_segment_size=100,\n", "    batch_size=1,\n", "    dynamic_batch_size=True,\n", ")\n", "print(result)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Part 3: Evaluate Model Quality after Compilation\n", "### Define evaluation functions\n", "We first define some handy helper functions for running evaluation on the COCO 2017 dataset." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import os\n", "import json\n", "import time\n", "import numpy as np\n", "import tensorflow as tf\n", "from pycocotools.coco import COCO\n", "from pycocotools.cocoeval import COCOeval\n", "\n", "\n", "def cocoapi_eval(jsonfile,\n", "                 style,\n", "                 coco_gt=None,\n", "                 anno_file=None,\n", "                 max_dets=(100, 300, 1000)):\n", "    \"\"\"\n", "    Args:\n", "        jsonfile: Evaluation json file, eg: bbox.json, mask.json.\n", "        style: COCOeval style, can be `bbox`, `segm`, or `proposal`.\n", "        coco_gt: An initialized COCO ground-truth object; if None, it is\n", "            loaded from anno_file, e.g. coco_gt = COCO(anno_file).\n", "        anno_file: COCO annotations file.\n", "        max_dets: COCO evaluation maxDets.\n", "    \"\"\"\n", "    assert coco_gt is not None or anno_file is not None\n", "\n", "    if coco_gt is None:\n", "        coco_gt = COCO(anno_file)\n", "    print(\"Start evaluating...\")\n", "    coco_dt = coco_gt.loadRes(jsonfile)\n", "    if style == 'proposal':\n", "        coco_eval = COCOeval(coco_gt, coco_dt, 'bbox')\n", "        coco_eval.params.useCats = 0\n", "        coco_eval.params.maxDets = list(max_dets)\n", "    else:\n", "        coco_eval = COCOeval(coco_gt, coco_dt, style)\n", "    coco_eval.evaluate()\n", "    coco_eval.accumulate()\n", "    coco_eval.summarize()\n", "    return coco_eval.stats\n", "\n", "\n", "def bbox_eval(anno_file, bbox_list):\n", "    coco_gt = COCO(anno_file)\n", "\n", "    outfile = 'bbox_detections.json'\n", "    print('Generating json file...')\n", "    with open(outfile, 'w') as f:\n", "        json.dump(bbox_list, f)\n", "\n", "    map_stats = cocoapi_eval(outfile, 'bbox', coco_gt=coco_gt)\n", "    return map_stats\n", "\n", "\n", "def get_image_as_bytes(images, eval_pre_path, eval_batch_size):\n", "    batch_im_id_list = []\n", "    batch_im_name_list = []\n", "    batch_img_bytes_list = []\n", "    batch_im_id = []\n", "    batch_im_name = []\n", "    batch_img_bytes = []\n", "    for i, im in enumerate(images):\n", "        im_id = im['id']\n", "        file_name = im['file_name']\n", "        if i % eval_batch_size == 0 and i != 0:\n", "            batch_im_id_list.append(batch_im_id)\n", "            batch_im_name_list.append(batch_im_name)\n", "            batch_img_bytes_list.append(batch_img_bytes)\n", "            batch_im_id = []\n", "            batch_im_name = []\n", "            batch_img_bytes = []\n",
"        batch_im_id.append(im_id)\n", "        batch_im_name.append(file_name)\n", "\n", "        with open(os.path.join(eval_pre_path, file_name), 'rb') as f:\n", "            batch_img_bytes.append(f.read())\n", "    # flush the final batch so the last images are not dropped\n", "    if batch_im_id:\n", "        batch_im_id_list.append(batch_im_id)\n", "        batch_im_name_list.append(batch_im_name)\n", "        batch_img_bytes_list.append(batch_img_bytes)\n", "    return batch_im_id_list, batch_im_name_list, batch_img_bytes_list\n", "\n", "\n", "def analyze_bbox(results, batch_im_id, _clsid2catid):\n", "    bbox_list = []\n", "    k = 0\n", "    for boxes, scores, classes in zip(results['boxes'], results['scores'], results['classes']):\n", "        if boxes is not None:\n", "            im_id = batch_im_id[k]\n", "            n = len(boxes)\n", "            for p in range(n):\n", "                clsid = classes[p]\n", "                score = scores[p]\n", "                xmin, ymin, xmax, ymax = boxes[p]\n", "                catid = _clsid2catid[int(clsid)]\n", "                w = xmax - xmin + 1\n", "                h = ymax - ymin + 1\n", "\n", "                bbox = [xmin, ymin, w, h]\n", "                # round to the nearest tenth to avoid huge file sizes, as COCO suggests\n", "                bbox = [round(float(x) * 10) / 10 for x in bbox]\n", "                bbox_res = {\n", "                    'image_id': im_id,\n", "                    'category_id': catid,\n", "                    'bbox': bbox,\n", "                    'score': float(score),\n", "                }\n", "                bbox_list.append(bbox_res)\n", "        k += 1\n", "    return bbox_list" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here is the actual evaluation loop. To fully utilize all four NeuronCores on one Inferentia, the optimal setup is to run multi-threaded inference using a `ThreadPoolExecutor`. The following cell is a multi-threaded adaptation of the evaluation routine at https://github.com/miemie2013/Keras-YOLOv4/blob/910c4c6f7265f5828fceed0f784496a0b46516bf/tools/cocotools.py#L97." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from concurrent import futures\n", "\n", "NUM_THREADS = 4\n", "\n", "def evaluate(yolo_predictor, images, eval_pre_path, anno_file, eval_batch_size, _clsid2catid):\n", "    batch_im_id_list, batch_im_name_list, batch_img_bytes_list = get_image_as_bytes(images, eval_pre_path, eval_batch_size)\n", "\n", "    # warm up\n", "    yolo_predictor({'image': np.array(batch_img_bytes_list[0], dtype=object)})\n", "\n", "    def yolo_predictor_timer(yolo_pred, image):\n", "        begin = time.time()\n", "        result = yolo_pred(image)\n", "        delta = time.time() - begin\n", "        return result, delta\n", "\n", "    latency = []\n", "    with futures.ThreadPoolExecutor(NUM_THREADS) as exe:\n", "        fut_im_list = []\n", "        fut_list = []\n", "\n", "        for batch_im_id, batch_im_name, batch_img_bytes in zip(batch_im_id_list, batch_im_name_list, batch_img_bytes_list):\n", "            if len(batch_img_bytes) != eval_batch_size:\n", "                continue\n", "            fut = exe.submit(yolo_predictor_timer, yolo_predictor, {'image': np.array(batch_img_bytes, dtype=object)})\n", "            fut_im_list.append((batch_im_id, batch_im_name))\n", "            fut_list.append(fut)\n", "        bbox_list = []\n", "        sum_time = 0.0\n", "        count = 0\n", "        for (batch_im_id, batch_im_name), fut in zip(fut_im_list, fut_list):\n", "            results, times = fut.result()\n", "            # convert the measured batch latency to a per-image latency\n", "            latency.append(times / eval_batch_size)\n", "            sum_time += times\n", "            bbox_list.extend(analyze_bbox(results, batch_im_id, _clsid2catid))\n", "            for _ in batch_im_id:\n", "                count += 1\n", "                if count % 1000 == 0:\n", "                    print('Test iter {}'.format(count))\n", "\n", "    throughput = len(images) / (sum_time / NUM_THREADS)\n", "\n", "    print('Average Images Per Second:', throughput)\n", "    print(\"Latency P50: {:.1f} ms\".format(np.percentile(latency, 50)*1000.0))\n", "    print(\"Latency P90: {:.1f} ms\".format(np.percentile(latency, 90)*1000.0))\n", "    print(\"Latency P95: {:.1f} ms\".format(np.percentile(latency, 95)*1000.0))\n",
"    print(\"Latency P99: {:.1f} ms\".format(np.percentile(latency, 99)*1000.0))\n", "\n", "    # start evaluation\n", "    box_ap_stats = bbox_eval(anno_file, bbox_list)\n", "    return box_ap_stats" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Evaluate mean average precision (mAP) score\n", "Here is the code to calculate the mAP score of the YOLO v4 model. The expected mAP score is around 0.487 with the pretrained weights." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "yolo_pred = tf.contrib.predictor.from_saved_model('./yolo_v4_coco_saved_model_neuron')\n", "\n", "val_coco_root = './val2017'\n", "val_annotate = './annotations/instances_val2017.json'\n", "# map the model's contiguous class ids (0-79) to the original COCO category ids\n", "clsid2catid = {0: 1, 1: 2, 2: 3, 3: 4, 4: 5, 5: 6, 6: 7, 7: 8, 8: 9, 9: 10, 10: 11, 11: 13, 12: 14, 13: 15, 14: 16,\n", "               15: 17, 16: 18, 17: 19, 18: 20, 19: 21, 20: 22, 21: 23, 22: 24, 23: 25, 24: 27, 25: 28, 26: 31,\n", "               27: 32, 28: 33, 29: 34, 30: 35, 31: 36, 32: 37, 33: 38, 34: 39, 35: 40, 36: 41, 37: 42, 38: 43,\n", "               39: 44, 40: 46, 41: 47, 42: 48, 43: 49, 44: 50, 45: 51, 46: 52, 47: 53, 48: 54, 49: 55, 50: 56,\n", "               51: 57, 52: 58, 53: 59, 54: 60, 55: 61, 56: 62, 57: 63, 58: 64, 59: 65, 60: 67, 61: 70, 62: 72,\n", "               63: 73, 64: 74, 65: 75, 66: 76, 67: 77, 68: 78, 69: 79, 70: 80, 71: 81, 72: 82, 73: 84, 74: 85,\n", "               75: 86, 76: 87, 77: 88, 78: 89, 79: 90}\n", "eval_batch_size = 8\n", "with open(val_annotate, 'r', encoding='utf-8') as f2:\n", "    dataset = json.load(f2)\n", "images = dataset['images']\n", "box_ap = evaluate(yolo_pred, images, val_coco_root, val_annotate, eval_batch_size, clsid2catid)" ] } ], "metadata": { "kernelspec": { "display_name": "Environment (conda_aws_neuron_tensorflow_p36)", "language": "python", "name": "conda_aws_neuron_tensorflow_p36" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.13" } }, "nbformat": 4, "nbformat_minor": 4 }