Training Darknet YOLOv3-tiny on Custom Data

Posted: 2024/1/6 13:57

Updated: 2024/3/2 14:25

Overview

This is a memo of the steps for training YOLOv3-tiny with Darknet.

Training uses custom data.
The goal this time was to detect human heads, so I trained on head datasets.

Prerequisites

WSL2 + Ubuntu 20.04
Python 3.9.12
CUDA 12.3.0
cuDNN 8.9.6
OpenCV 4.2.0
AlexeyAB版 Darknet
Hardware
- i9-13900K
- RTX 2080 Ti x 1
- 32GB RAM

Procedure

Environment setup

1. Installing CUDA

Install it by following NVIDIA's official instructions.


# Remove the old public key, just in case
$ sudo apt-key del 7fa2af80

$ wget https://developer.download.nvidia.com/compute/cuda/repos/wsl-ubuntu/x86_64/cuda-wsl-ubuntu.pin
$ sudo mv cuda-wsl-ubuntu.pin /etc/apt/preferences.d/cuda-repository-pin-600
$ wget https://developer.download.nvidia.com/compute/cuda/12.3.0/local_installers/cuda-repo-wsl-ubuntu-12-3-local_12.3.0-1_amd64.deb
$ sudo dpkg -i cuda-repo-wsl-ubuntu-12-3-local_12.3.0-1_amd64.deb  # Run once first to obtain the GPG key
$ sudo cp /var/cuda-repo-wsl-ubuntu-12-3-local/cuda-*-keyring.gpg /usr/share/keyrings/
$ sudo dpkg -i cuda-repo-wsl-ubuntu-12-3-local_12.3.0-1_amd64.deb
$ sudo apt-get update
$ sudo apt-get -y install cuda

# Add CUDA to the paths
$ export PATH="$PATH":/usr/local/cuda/bin
$ export LD_LIBRARY_PATH="$LD_LIBRARY_PATH":/usr/local/cuda/lib64

2. Installing cuDNN

Download the cuDNN deb package (x86_64) from the cuDNN Archive and put it anywhere you like.

Once it is downloaded, run the following commands to install it.


$ sudo dpkg -i /mnt/c/Users/user/Downloads/cudnn-local-repo-ubuntu2004-8.9.6.50_1.0-1_amd64.deb
$ sudo cp /var/cudnn-local-repo-ubuntu2004-8.9.6.50/cudnn-local-5E60450C-keyring.gpg /usr/share/keyrings/
$ sudo apt update
$ sudo apt install libcudnn8=8.9.6.50-1+cuda12.2
$ sudo apt install libcudnn8-dev=8.9.6.50-1+cuda12.2

3. Installing other libraries

Install the libraries needed to build Darknet.


$ sudo apt install nvidia-cuda-toolkit
$ sudo apt install libopencv-dev

4. Preparing the Python environment

I used pyenv this time.


$ mkdir darknet_headcount && cd darknet_headcount
$ pyenv local 3.9.12

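Building Darknet

There are two versions of Darknet: the original pjreddie version by Joseph Redmon, and the AlexeyAB version maintained by Alexey, a community contributor. (I am not aware of any others.)

To be honest, I do not know the detailed differences between the two, but my understanding is that the AlexeyAB version extends the original in various ways to make it easier to use and more capable.
(It seems it was originally provided so that Darknet could run on Windows: see reference.)

So this time I use the AlexeyAB version of Darknet.

Cloning the repository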


$ git clone https://github.com/AlexeyAB/darknet

Changing the build settings
Training runs on the GPU this time, so enable GPU support.

Darknet can also plot the loss and mAP on a chart that you can watch in real time. That feature requires OpenCV, so enable it as well.


$ cd darknet
$ vi Makefile

darknet/Makefile
-GPU=0
+GPU=1
-CUDNN=0
+CUDNN=1
 CUDNN_HALF=0
-OPENCV=0
+OPENCV=1

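[Environment-specific] I am not sure whether I made a mistake while installing CUDA, but the CUDA library that is normally found in /usr/local/cuda/lib64 was in /usr/lib/wsl/lib on my machine, so I changed the linker flags as follows.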

darknet/Makefile
-LDFLAGS+= -L/usr/local/cuda/lib64 -lcuda -lcudart -lcublas -lcurand
+#LDFLAGS+= -L/usr/local/cuda/lib64 -lcuda -lcudart -lcublas -lcurand
+LDFLAGS+= -L/usr/lib/wsl/lib -lcuda -L/usr/local/cuda/lib64 -lcudart -lcublas -lcurand

Note: while writing this article it occurred to me that creating a symbolic link would probably be the better fix...

Build
```
$ make
```

Preparation for training

0. Directory layout

The final directory layout looks like this.
Note: files not used in this article are omitted.


<parent_dir>
├── custom-data/
│   ├── _dataset/
│   │   ├── HollywoodHeads/
│   │   │   ├── Annotations/
│   │   │   └── JPEGImages/
│   │   ├── RGBD_Indoor_Dataset/
│   │   │   ├── test/
│   │   │   │   └── color/
│   │   │   └── train/
│   │   │       └── color/
│   │   └── brainwash/
│   │       ├── brainwash_10_27_2014_images/
│   │       ├── brainwash_11_13_2014_images/
│   │       └── brainwash_11_24_2014_images/
│   ├── eval/
│   │   ├── AAA.png
│   │   ├── AAA.txt
│   │   └── ...
│   ├── test/
│   │   ├── BBB.png
│   │   ├── BBB.txt
│   │   └── ...
│   ├── train/
│   │   ├── CCC.png
│   │   ├── CCC.txt
│   │   └── ...
│   ├── eval.txt
│   ├── test.txt
│   ├── train.txt
│   ├── head_eval.data
│   ├── head.data
│   ├── head.names
│   └── yolov3-tiny_head.cfg
├── output/
│   ├── log.txt
│   ├── result.txt
│   ├── yolov3-tiny_head_xxx.weights
│   └── ...
├── tools/
│   ├── create_labels_brainwash.py
│   ├── create_labels_hollywoodheads.py
│   ├── create_labels_indoor.py
│   ├── random_pick_images.py
│   └── rename_file.py
└── darknet/
    ├── cfg/
    │   └── yolov3-tiny_obj.cfg
    ├── darknet
    ├── yolov3-tiny.conv.15
    ├── yolov3-tiny.weights
    ├── chart.png
    └── predictions.jpg

1. Preparing the datasets

Since the task is head detection, I used the following three datasets.

HollywoodHeads dataset
- The dataset used in "Context-aware CNNs for person head detection"
- Number of images: 224,740
- Annotations are provided as XML files in PascalVOC format
brainwash
- Details: https://exposing.ai/brainwash/
- Number of images: 11,918
- Annotations are provided as text files in what appears to be a custom format
RGBD_Indoor_Dataset
- The dataset used in "RZV2L AI Library - Head detection from top application"
- Number of images: 1,611

Download each dataset and extract it.

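2. Data preprocessing

The annotations that ship with the datasets cannot be used as they are, so the annotation files have to be rebuilt in the format Darknet expects.

Darknet requires annotations to be written in the following format.


<object-class> <x_center> <y_center> <width> <height>

object-class: object class ID (corresponds to the index in the .names file described later)
x_center: x coordinate of the bbox center
y_center: y coordinate of the bbox center
width: bbox width
height: bbox height

x_center, y_center, width, and height must also be normalized to 0.0-1.0 relative to the image width and height.

The normalization can be done as follows.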


def normalize(
    xmin: float, ymin: float, xmax: float, ymax: float, width: float, height: float
) -> tuple[float, float, float, float]:
    bbox_xc: float = (xmin + xmax) / 2.0 / width
    bbox_yc: float = (ymin + ymax) / 2.0 / height
    bbox_w: float = (xmax - xmin) / width
    bbox_h: float = (ymax - ymin) / height
    return bbox_xc, bbox_yc, bbox_w, bbox_h
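
To make the conversion concrete, here is a minimal sketch that turns one PascalVOC-style XML annotation into Darknet label lines using the same normalization. It is not the author's create_labels_hollywoodheads.py; the function name `voc_to_darknet_lines` and the sample XML are made up for illustration, and it assumes the usual `size` / `object` / `bndbox` layout of PascalVOC files.

```python
import xml.etree.ElementTree as ET


def voc_to_darknet_lines(xml_text: str, class_id: int = 0) -> list[str]:
    """Convert one PascalVOC annotation (as XML text) into Darknet label lines."""
    root = ET.fromstring(xml_text)
    size = root.find("size")
    img_w = float(size.findtext("width"))
    img_h = float(size.findtext("height"))

    lines = []
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        if box is None:  # skip objects without a bounding box
            continue
        xmin = float(box.findtext("xmin"))
        ymin = float(box.findtext("ymin"))
        xmax = float(box.findtext("xmax"))
        ymax = float(box.findtext("ymax"))
        # Same normalization as normalize(): center and size relative to the image
        xc = (xmin + xmax) / 2.0 / img_w
        yc = (ymin + ymax) / 2.0 / img_h
        w = (xmax - xmin) / img_w
        h = (ymax - ymin) / img_h
        lines.append(f"{class_id} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}")
    return lines


# Hypothetical sample annotation (not taken from any of the datasets)
SAMPLE_XML = """
<annotation>
  <size><width>640</width><height>480</height></size>
  <object>
    <name>head</name>
    <bndbox><xmin>100</xmin><ymin>120</ymin><xmax>300</xmax><ymax>360</ymax></bndbox>
  </object>
</annotation>
"""

print(voc_to_darknet_lines(SAMPLE_XML))
# ['0 0.312500 0.500000 0.312500 0.500000']
```

Each returned line goes into a .txt file that shares its base name with the image.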

I wrote a script for each dataset, and use them to create annotation files in the format Darknet can read.

Scripts - https://github.com/himazin331/preprocessing_tools_for_darknet/tree/main/tools

create_labels_hollywoodheads.py
- Creates annotation files for the HollywoodHeads dataset
create_labels_brainwash.py
- Creates annotation files for the brainwash dataset
create_labels_indoor.py
- Creates annotation files for RGBD_Indoor_Dataset

Only heads are detected, i.e. there is a single object class, so object-class is fixed to 0 this time.

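3. Creating train, test, and eval

Create the training (train), validation (test), and evaluation (eval) sets from the datasets.

The datasets used here already come split into train and test, but I mixed everything together, sampled at random, and split it again.

The number of images taken from each dataset is shown below, with train : test = 8 : 2.

Source dataset	Total	train	test	eval	Notes
HollywoodHeads	224,740	179,784	44,946	10
brainwash	11,769	9,407	2,352	10	There are actually 11,918 images, but 149 of them had no annotation for some reason and were excluded. (They are listed in Appendix 1. brainwash images without annotations.)
RGBD_Indoor_Dataset	1,611	1,281	320	10
Total	238,120	190,472	47,618	30

Note: the detailed breakdown is in Appendix 2. Breakdown of the dataset split.

brainwash contains identical file names across its directories, so the files have to be renamed before splitting.

Rename script - https://github.com/himazin331/preprocessing_tools_for_darknet/blob/main/tools/rename_file.py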
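
As a rough illustration of the renaming idea, the sketch below prefixes each file name with its directory name so that names stay unique once everything is mixed together. The naming scheme is only inferred from the file name that shows up in the inference log of this article (brainwash_11_13_2014_images_00358000_640x480.png). It is an assumption, not a description of what rename_file.py actually does, and `flattened_name` is a hypothetical helper.

```python
from pathlib import PurePosixPath


def flattened_name(relative_path: str) -> str:
    """Build a unique file name by prefixing the parent directory name."""
    path = PurePosixPath(relative_path)
    return f"{path.parent.name}_{path.name}"


print(flattened_name("brainwash_11_13_2014_images/00358000_640x480.png"))
# brainwash_11_13_2014_images_00358000_640x480.png
```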

Once brainwash has been renamed, do the split.
I wrote a script for this as well.

random_pick_images.py - https://github.com/himazin331/preprocessing_tools_for_darknet/blob/main/tools/random_pick_images.py

The script above also creates train.txt and test.txt while splitting.
train.txt and test.txt are text files listing the file paths of the training and validation data; they have to be specified when training with Darknet.
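
For reference, a random 8 : 2 split and the contents of the list files can be sketched as below. This is not random_pick_images.py. `split_paths`, `to_list_file_text`, and the image paths are hypothetical, and it assumes that a list file is simply one image path per line.

```python
import random


def split_paths(paths, test_ratio=0.2, seed=0):
    """Shuffle the paths reproducibly and split them into (train, test)."""
    shuffled = sorted(paths)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


def to_list_file_text(paths):
    """Text for train.txt / test.txt: one image path per line."""
    return "".join(f"{path}\n" for path in paths)


# Hypothetical image paths, for illustration only
images = [f"images/img_{i:04d}.png" for i in range(10)]
train, test = split_paths(images)
print(len(train), len(test))  # 8 2
print(to_list_file_text(test), end="")
```

Fixing the seed makes the split reproducible, which helps when comparing training runs.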

4. Creating .data and .names

Create a .data file that specifies train.txt, test.txt, the output directory for weights files (backup), the number of object classes, and so on, and a .names file that lists the object names.

head.data
```
classes= 1
train  = custom-data/train.txt
valid  = custom-data/test.txt
names = custom-data/head.names
backup = output/
```
classes: number of object classes
train: path to the text file listing the training data file paths
valid: path to the text file listing the validation data file paths
names: path to the text file listing the object names
backup: output directory for weights files
head.names
Write one object name per line.
The line number minus 1 becomes the object class ID.
```
Head
```
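
The relationship between the two files can be checked with a small sketch like the one below. It only mimics the key = value layout shown above and is not Darknet's own parser; `parse_data_file` and `class_ids` are hypothetical helpers.

```python
def parse_data_file(text: str) -> dict[str, str]:
    """Parse 'key = value' lines such as those in head.data."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        options[key.strip()] = value.strip()
    return options


def class_ids(names_text: str) -> dict[str, int]:
    """Map each object name to its class ID (line number minus 1)."""
    names = [line.strip() for line in names_text.splitlines() if line.strip()]
    return {name: index for index, name in enumerate(names)}


HEAD_DATA = """\
classes= 1
train  = custom-data/train.txt
valid  = custom-data/test.txt
names = custom-data/head.names
backup = output/
"""

options = parse_data_file(HEAD_DATA)
ids = class_ids("Head\n")
print(options["classes"], ids)  # 1 {'Head': 0}
# classes in .data should equal the number of names in .names
assert int(options["classes"]) == len(ids)
```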

5. Creating pretrained weights for the feature-extraction layers

Download the pretrained YOLOv3-tiny weights and create yolov3-tiny.conv.15, which contains only the feature-extraction layers taken from them.


$ cd darknet
$ wget https://pjreddie.com/media/files/yolov3-tiny.weights

Use darknet partial to extract specific layers from a weights file.

This time I extracted only the first 15 layers from the pretrained YOLOv3-tiny weights.


$ ./darknet partial cfg/yolov3-tiny.cfg yolov3-tiny.weights yolov3-tiny.conv.15 15

Arguments of darknet partial


./darknet partial {cfg file} {weights file} {output file} {number of layers to extract}

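6. Creating the model config

Copy cfg/yolov3-tiny_obj.cfg from the darknet directory and create a model config that fits the custom data.


$ cp darknet/cfg/yolov3-tiny_obj.cfg custom-data/yolov3-tiny_head.cfg
$ vi custom-data/yolov3-tiny_head.cfg

Training parameters
Change the batch size, number of iterations, learning rate, and so on.

This time I changed only the number of iterations, max_batches, as follows.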

yolov3-tiny_head.cfg
 [net]
 ...
 batch=64        # batch size
 subdivisions=2  # number of batch subdivisions
 ...
-max_batches = 500200
+max_batches = 3000   # number of iterations
 ...

If you are short on VRAM, lower batch and raise subdivisions.
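
The reason this helps is that, as far as I understand Darknet's behavior, the number of images processed on the GPU at once is batch divided by subdivisions. That matches the "mini_batch = 32, batch = 64" line in the training log of this article. A tiny sketch of the arithmetic, with `mini_batch_size` as a hypothetical helper:

```python
def mini_batch_size(batch: int, subdivisions: int) -> int:
    """Images per forward/backward pass, assuming mini_batch = batch / subdivisions."""
    return batch // subdivisions


print(mini_batch_size(64, 2))   # 32 (the setting used in this article)
print(mini_batch_size(64, 16))  # 4  (far fewer images held in VRAM per pass)
```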

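Output layer definition
Define the output layers to match the number of object classes.

There is one object class this time, so I changed classes to 1 and filters to 18.
There are two output layers, so remember to change both.

Note: filters is calculated as follows.


filters = (classes + coords + 1) * number_of_mask

For YOLOv3-tiny, coords is 4 and number_of_mask is 3.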
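
Plugging numbers into the formula above gives the values used in this article. The helper name `yolo_filters` is made up for illustration.

```python
def yolo_filters(classes: int, coords: int = 4, number_of_mask: int = 3) -> int:
    """filters = (classes + coords + 1) * number_of_mask"""
    return (classes + coords + 1) * number_of_mask


print(yolo_filters(1))   # 18  (one class: Head)
print(yolo_filters(80))  # 255 (the original 80-class setting)
```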

yolov3-tiny_head.cfg
 [convolutional]
 size=1
 stride=1
 pad=1
-filters=255
+# filters=255
+filters=18
 activation=linear

 [yolo]
 mask = 3,4,5
 anchors = 10,14, 23,27, 37,58, 81,82, 135,169, 344,319
-classes=80
+# classes=80
+classes=1
 num=6
 ...

 [convolutional]
 size=1
 stride=1
 pad=1
-filters=255
+# filters=255
+filters=18
 activation=linear

 [yolo]
 mask = 0,1,2
 anchors = 10,14,  23,27,  37,58,  81,82,  135,169,  344,319
-classes=80
+# classes=80
+classes=1
 num=6
 ...

Training

Run the training.

Output weights files are saved every 1000 iterations to the directory specified by backup in the .data file.
Note: if max_batches >= 10000, they are saved every 10000 iterations.


$ cd darknet
$ ./darknet detector train ../custom-data/head.data ../custom-data/yolov3-tiny_head.cfg yolov3-tiny.conv.15 -dont_show -mjpeg_port 8090 -map | tee ../output/log.txt

Arguments of darknet detector train (minimal)


./darknet detector train {data file} {cfg file} {weights file}

Arguments of darknet detector train (my recommendation)
```
./darknet detector train {data file} {cfg file} {weights file} -dont_show -mjpeg_port 8090 -map | tee {log file}
```
-dont_show: disables GUI output of the loss chart.
-mjpeg_port: shows the loss chart on a web page; the value is the port number.
-map: calculates mAP.

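Supplement
Recent WSL2 supports WSLg out of the box, so the loss chart GUI can be displayed even in a WSL environment.
However, WSL hangs with high probability when doing so (at least on my machine), so it seems better to disable GUI output and check the chart on the web page instead.

Addendum 1/16
WSL hung even when checking via the web page... It may be better to give up on real-time monitoring and just look at the loss chart image that gets written out.

Console output example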


CUDA-version: 12030 (12030), cuDNN: 8.9.6, GPU count: 1
OpenCV version: 4.2.0
0 : compute_capability = 750, cudnn_half = 0, GPU: NVIDIA GeForce RTX 2080 Ti
  layer   filters  size/strd(dil)      input                output
  0 conv     16       3 x 3/ 1    416 x 416 x   3 ->  416 x 416 x  16 0.150 BF
  1 max                2x 2/ 2    416 x 416 x  16 ->  208 x 208 x  16 0.003 BF
  2 conv     32       3 x 3/ 1    208 x 208 x  16 ->  208 x 208 x  32 0.399 BF
  3 max                2x 2/ 2    208 x 208 x  32 ->  104 x 104 x  32 0.001 BF
  4 conv     64       3 x 3/ 1    104 x 104 x  32 ->  104 x 104 x  64 0.399 BF
  5 max                2x 2/ 2    104 x 104 x  64 ->   52 x  52 x  64 0.001 BF
  6 conv    128       3 x 3/ 1     52 x  52 x  64 ->   52 x  52 x 128 0.399 BF
  7 max                2x 2/ 2     52 x  52 x 128 ->   26 x  26 x 128 0.000 BF
  8 conv    256       3 x 3/ 1     26 x  26 x 128 ->   26 x  26 x 256 0.399 BF
  9 max                2x 2/ 2     26 x  26 x 256 ->   13 x  13 x 256 0.000 BF
  10 conv    512       3 x 3/ 1     13 x  13 x 256 ->   13 x  13 x 512 0.399 BF
  11 max                2x 2/ 1     13 x  13 x 512 ->   13 x  13 x 512 0.000 BF
  12 conv   1024       3 x 3/ 1     13 x  13 x 512 ->   13 x  13 x1024 1.595 BF
  13 conv    256       1 x 1/ 1     13 x  13 x1024 ->   13 x  13 x 256 0.089 BF
  14 conv    512       3 x 3/ 1     13 x  13 x 256 ->   13 x  13 x 512 0.399 BF
  15 conv     18       1 x 1/ 1     13 x  13 x 512 ->   13 x  13 x  18 0.003 BF
  16 yolo
[yolo] params: iou loss: mse (2), iou_norm: 0.75, obj_norm: 1.00, cls_norm: 1.00, delta_norm: 1.00, scale_x_y: 1.00
  17 route  13                                     ->   13 x  13 x 256
  18 conv    128       1 x 1/ 1     13 x  13 x 256 ->   13 x  13 x 128 0.011 BF
  19 upsample                 2x    13 x  13 x 128 ->   26 x  26 x 128
  20 route  19 8                                   ->   26 x  26 x 384
  21 conv    256       3 x 3/ 1     26 x  26 x 384 ->   26 x  26 x 256 1.196 BF
  22 conv     18       1 x 1/ 1     26 x  26 x 256 ->   26 x  26 x  18 0.006 BF
  23 yolo
[yolo] params: iou loss: mse (2), iou_norm: 0.75, obj_norm: 1.00, cls_norm: 1.00, delta_norm: 1.00, scale_x_y: 1.00
Total BFLOPS 5.448
avg_outputs = 324846
Allocate additional workspace_size = 988.81 MB
Loading weights from yolov3-tiny.conv.15...Done! Loaded 15 layers from weights-file
saveweights: Using default '1000'
savelast: Using default '100'
Create 6 permanent cpu-threads
v3 (mse loss, Normalizer: (iou: 0.75, obj: 1.00, cls: 1.00) Region 16 Avg (IOU: 0.260548), count: 38, class_loss = 369.989166, iou_loss = 5.684174, total_loss = 375.673340
v3 (mse loss, Normalizer: (iou: 0.75, obj: 1.00, cls: 1.00) Region 23 Avg (IOU: 0.222834), count: 13, class_loss = 1096.972046, iou_loss = 1.948853, total_loss = 1098.920898
total_bbox = 51, rewritten_bbox = 0.000000 %
v3 (mse loss, Normalizer: (iou: 0.75, obj: 1.00, cls: 1.00) Region 16 Avg (IOU: 0.241082), count: 38, class_loss = 370.126129, iou_loss = 5.840210, total_loss = 375.966339
v3 (mse loss, Normalizer: (iou: 0.75, obj: 1.00, cls: 1.00) Region 23 Avg (IOU: 0.287195), count: 9, class_loss = 1093.864014, iou_loss = 1.172974, total_loss = 1095.036987
total_bbox = 98, rewritten_bbox = 0.000000 %
v3 (mse loss, Normalizer: (iou: 0.75, obj: 1.00, cls: 1.00) Region 16 Avg (IOU: 0.259580), count: 35, class_loss = 370.456665, iou_loss = 5.190857, total_loss = 375.647522
v3 (mse loss, Normalizer: (iou: 0.75, obj: 1.00, cls: 1.00) Region 23 Avg (IOU: 0.295190), count: 19, class_loss = 1094.776367, iou_loss = 1.485718, total_loss = 1096.262085
total_bbox = 152, rewritten_bbox = 0.000000 %
v3 (mse loss, Normalizer: (iou: 0.75, obj: 1.00, cls: 1.00) Region 16 Avg (IOU: 0.268365), count: 41, class_loss = 369.521393, iou_loss = 5.977814, total_loss = 375.499207
v3 (mse loss, Normalizer: (iou: 0.75, obj: 1.00, cls: 1.00) Region 23 Avg (IOU: 0.245747), count: 8, class_loss = 1094.285278, iou_loss = 1.216553, total_loss = 1095.501831
total_bbox = 201, rewritten_bbox = 0.000000 %
v3 (mse loss, Normalizer: (iou: 0.75, obj: 1.00, cls: 1.00) Region 16 Avg (IOU: 0.275648), count: 29, class_loss = 369.688843, iou_loss = 3.864532, total_loss = 373.553375
v3 (mse loss, Normalizer: (iou: 0.75, obj: 1.00, cls: 1.00) Region 23 Avg (IOU: 0.278903), count: 23, class_loss = 1096.085083, iou_loss = 3.523682, total_loss = 1099.608765
total_bbox = 253, rewritten_bbox = 0.000000 %
...

Log file output example


yolov3-tiny_head
net.optimized_memory = 0 
mini_batch = 32, batch = 64, time_steps = 1, train = 1 
Create CUDA-stream - 0 
Create cudnn-handle 0 

seen 64, trained: 0 K-images (0 Kilo-batches_64) 
Weights are saved after: 1000 iterations. Last weights (*_last.weight) are stored every 100 iterations. 
Learning Rate: 0.001, Momentum: 0.9, Decay: 0.0005
Detection layer: 16 - type = 28 
Detection layer: 23 - type = 28 
If error occurs - run training with flag: -dont_show 
Resizing, random_coef = 1.40 

608 x 608 
try to allocate additional workspace_size = 639.07 MB 
CUDA allocate done! 
Loaded: 0.000031 seconds
[H[J1/10000: loss=736.4 hours left=-1.0
1: 736.399414, 736.399414 avg loss, 0.000000 rate, 0.496868 seconds, 64 images, -1.000000 hours left
Loaded: 0.000049 seconds
[H[J2/10000: loss=735.7 hours left=1.4
2: 735.727661, 736.332214 avg loss, 0.000000 rate, 0.378894 seconds, 128 images, 1.380973 hours left
Loaded: 0.000045 seconds
[H[J3/10000: loss=737.0 hours left=1.4
3: 736.953186, 736.394287 avg loss, 0.000000 rate, 0.372903 seconds, 192 images, 1.377692 hours left
Loaded: 0.000050 seconds
[H[J4/10000: loss=736.8 hours left=1.4
4: 736.827026, 736.437561 avg loss, 0.000000 rate, 0.372881 seconds, 256 images, 1.374280 hours left
Loaded: 0.000047 seconds
[H[J5/10000: loss=736.5 hours left=1.4
5: 736.451782, 736.438965 avg loss, 0.000000 rate, 0.381815 seconds, 320 images, 1.370900 hours left
Loaded: 0.000055 seconds
[H[J6/10000: loss=737.2 hours left=1.4
6: 737.203125, 736.515381 avg loss, 0.000000 rate, 0.373631 seconds, 384 images, 1.367798 hours left
Loaded: 0.000039 seconds
[H[J7/10000: loss=735.8 hours left=1.4
7: 735.777405, 736.441589 avg loss, 0.000000 rate, 0.389335 seconds, 448 images, 1.364500 hours left
Loaded: 0.000058 seconds
[H[J8/10000: loss=736.5 hours left=1.4
8: 736.536255, 736.451050 avg loss, 0.000000 rate, 0.393008 seconds, 512 images, 1.361670 hours left
...
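
If real-time charts are not an option, the per-iteration lines in a log like the one above can be parsed afterwards. The sketch below pulls out the iteration number, loss, and average loss. The regular expression is written against the lines shown in this article only, so other Darknet versions may need adjustments, and `parse_losses` is a hypothetical helper.

```python
import re

# Matches lines such as:
# "1: 736.399414, 736.399414 avg loss, 0.000000 rate, ..."
ITERATION_LINE = re.compile(
    r"^\s*(?P<iteration>\d+): (?P<loss>[\d.]+), (?P<avg_loss>[\d.]+) avg loss, "
)


def parse_losses(log_text: str) -> list[tuple[int, float, float]]:
    """Return (iteration, loss, avg_loss) for every per-iteration log line."""
    rows = []
    for line in log_text.splitlines():
        match = ITERATION_LINE.match(line)
        if match:
            rows.append(
                (
                    int(match["iteration"]),
                    float(match["loss"]),
                    float(match["avg_loss"]),
                )
            )
    return rows


SAMPLE_LOG = """\
Loaded: 0.000031 seconds
[H[J1/10000: loss=736.4 hours left=-1.0
1: 736.399414, 736.399414 avg loss, 0.000000 rate, 0.496868 seconds, 64 images, -1.000000 hours left
Loaded: 0.000049 seconds
[H[J2/10000: loss=735.7 hours left=1.4
2: 735.727661, 736.332214 avg loss, 0.000000 rate, 0.378894 seconds, 128 images, 1.380973 hours left
"""

for iteration, loss, avg_loss in parse_losses(SAMPLE_LOG):
    print(iteration, loss, avg_loss)
```

The resulting rows can be fed into any plotting tool to redraw the loss curve offline.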

Loss chart output example

The loss chart (chart.png) is saved under the darknet directory.

Inference

Run inference with the weights file obtained from training.


$ ./darknet detector test ../custom-data/head.data ../custom-data/yolov3-tiny_head.cfg ../output/yolov3-tiny_head_final.weights -thresh 0.1 -ext_output < ../custom-data/eval.txt > ../output/result.txt

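Arguments of darknet detector test (minimal)


./darknet detector test {data file} {cfg file} {weights file} {image file to run inference on}

Arguments of darknet detector test (my recommendation)


./darknet detector test {data file} {cfg file} {weights file} -thresh {confidence threshold} -ext_output < {inference image list} > {log file}

{inference image list}: path to a text file listing the paths of the image files to run inference on (eval.txt)
-dont_show: do not display the inference results in a GUI window. (optional)

Inference output example

Log file output example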


net.optimized_memory = 0 
mini_batch = 1, batch = 2, time_steps = 1, train = 0 
Create CUDA-stream - 0 
Create cudnn-handle 0 

seen 64, trained: 192 K-images (3 Kilo-batches_64) 
Enter Image Path:  Detection layer: 16 - type = 28 
Detection layer: 23 - type = 28 
../custom-data/eval/brainwash_11_13_2014_images_00358000_640x480.png: Predicted in 62.739000 milli-seconds.
Head: 28%	(left_x:   76   top_y:  234   width:   27   height:   29)
Head: 43%	(left_x:  100   top_y:  217   width:   27   height:   28)
Head: 66%	(left_x:  122   top_y:  231   width:   30   height:   33)
Head: 24%	(left_x:  140   top_y:  232   width:   29   height:   29)
Head: 98%	(left_x:  183   top_y:  125   width:   14   height:   22)
Head: 60%	(left_x:  195   top_y:  164   width:   26   height:   26)
Head: 89%	(left_x:  252   top_y:  151   width:   19   height:   18)
Head: 81%	(left_x:  310   top_y:  138   width:   18   height:   14)
Head: 60%	(left_x:  374   top_y:  163   width:   25   height:   31)
Head: 91%	(left_x:  461   top_y:  183   width:   32   height:   29)
...
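
The -ext_output lines above are easy to turn into structured data. Below is a minimal sketch that groups detections by image. The regular expressions follow the log lines shown in this article. Stripping an "Enter Image Path: " prefix from later image lines is my assumption about how the prompt interleaves with the output, and `parse_detections` is a hypothetical helper.

```python
import re

IMAGE_LINE = re.compile(r"^(?P<path>.+): Predicted in [\d.]+ milli-seconds\.")
DETECTION_LINE = re.compile(
    r"^(?P<name>.+?): (?P<conf>\d+)%\s+\(left_x:\s*(?P<x>-?\d+)\s+top_y:\s*(?P<y>-?\d+)"
    r"\s+width:\s*(?P<w>-?\d+)\s+height:\s*(?P<h>-?\d+)\)"
)


def parse_detections(log_text: str) -> dict[str, list[tuple]]:
    """Map image path -> list of (name, confidence %, left_x, top_y, width, height)."""
    results: dict[str, list[tuple]] = {}
    current = None
    for raw in log_text.splitlines():
        # Assumption: the stdin prompt may be glued to the front of a line
        line = raw.removeprefix("Enter Image Path: ").strip()
        image = IMAGE_LINE.match(line)
        if image:
            current = image["path"]
            results[current] = []
            continue
        det = DETECTION_LINE.match(line)
        if det and current is not None:
            results[current].append(
                (
                    det["name"],
                    int(det["conf"]),
                    int(det["x"]),
                    int(det["y"]),
                    int(det["w"]),
                    int(det["h"]),
                )
            )
    return results


SAMPLE = (
    "Enter Image Path:  Detection layer: 16 - type = 28 \n"
    "../custom-data/eval/brainwash_11_13_2014_images_00358000_640x480.png: "
    "Predicted in 62.739000 milli-seconds.\n"
    "Head: 28%\t(left_x:   76   top_y:  234   width:   27   height:   29)\n"
    "Head: 43%\t(left_x:  100   top_y:  217   width:   27   height:   28)\n"
)

for path, boxes in parse_detections(SAMPLE).items():
    print(path, len(boxes), boxes[0])
```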

Calculating mAP

Calculate mAP with the weights file obtained from training.


$ ./darknet detector map ../custom-data/head_eval.data ../custom-data/yolov3-tiny_head.cfg ../output/yolov3-tiny_head_final.weights -iou_thresh 0.5

Arguments of darknet detector map


./darknet detector map {data file} {cfg file} {weights file} -iou_thresh {IoU threshold}

Console output example


CUDA-version: 12030 (12030), cuDNN: 8.9.6, GPU count: 1
 OpenCV version: 4.2.0
 0 : compute_capability = 890, cudnn_half = 0, GPU: NVIDIA GeForce RTX 4090
net.optimized_memory = 0
mini_batch = 1, batch = 2, time_steps = 1, train = 0
   layer   filters  size/strd(dil)      input                output
   0 Create CUDA-stream - 0
 Create cudnn-handle 0
conv     16       3 x 3/ 1    416 x 416 x   3 ->  416 x 416 x  16 0.150 BF
   1 max                2x 2/ 2    416 x 416 x  16 ->  208 x 208 x  16 0.003 BF
   2 conv     32       3 x 3/ 1    208 x 208 x  16 ->  208 x 208 x  32 0.399 BF
   3 max                2x 2/ 2    208 x 208 x  32 ->  104 x 104 x  32 0.001 BF
   4 conv     64       3 x 3/ 1    104 x 104 x  32 ->  104 x 104 x  64 0.399 BF
   5 max                2x 2/ 2    104 x 104 x  64 ->   52 x  52 x  64 0.001 BF
   6 conv    128       3 x 3/ 1     52 x  52 x  64 ->   52 x  52 x 128 0.399 BF
   7 max                2x 2/ 2     52 x  52 x 128 ->   26 x  26 x 128 0.000 BF
   8 conv    256       3 x 3/ 1     26 x  26 x 128 ->   26 x  26 x 256 0.399 BF
   9 max                2x 2/ 2     26 x  26 x 256 ->   13 x  13 x 256 0.000 BF
  10 conv    512       3 x 3/ 1     13 x  13 x 256 ->   13 x  13 x 512 0.399 BF
  11 max                2x 2/ 1     13 x  13 x 512 ->   13 x  13 x 512 0.000 BF
  12 conv   1024       3 x 3/ 1     13 x  13 x 512 ->   13 x  13 x1024 1.595 BF
  13 conv    256       1 x 1/ 1     13 x  13 x1024 ->   13 x  13 x 256 0.089 BF
  14 conv    512       3 x 3/ 1     13 x  13 x 256 ->   13 x  13 x 512 0.399 BF
  15 conv     18       1 x 1/ 1     13 x  13 x 512 ->   13 x  13 x  18 0.003 BF
  16 yolo
[yolo] params: iou loss: mse (2), iou_norm: 0.75, obj_norm: 1.00, cls_norm: 1.00, delta_norm: 1.00, scale_x_y: 1.00
  17 route  13                                     ->   13 x  13 x 256
  18 conv    128       1 x 1/ 1     13 x  13 x 256 ->   13 x  13 x 128 0.011 BF
  19 upsample                 2x    13 x  13 x 128 ->   26 x  26 x 128
  20 route  19 8                                   ->   26 x  26 x 384
  21 conv    256       3 x 3/ 1     26 x  26 x 384 ->   26 x  26 x 256 1.196 BF
  22 conv     18       1 x 1/ 1     26 x  26 x 256 ->   26 x  26 x  18 0.006 BF
  23 yolo
[yolo] params: iou loss: mse (2), iou_norm: 0.75, obj_norm: 1.00, cls_norm: 1.00, delta_norm: 1.00, scale_x_y: 1.00
Total BFLOPS 5.448
avg_outputs = 324846
 Allocate additional workspace_size = 19.91 MB
Loading weights from ../output/01/yolov3-tiny_head_final.weights...
 seen 64, trained: 192 K-images (3 Kilo-batches_64)
Done! Loaded 24 layers from weights-file

 calculation mAP (mean average precision)...
 Detection layer: 16 - type = 28
 Detection layer: 23 - type = 28
32
 detections_count = 878, unique_truth_count = 134
class_id = 0, name = Head, ap = 59.38%           (TP = 81, FP = 54)

 for conf_thresh = 0.25, precision = 0.60, recall = 0.60, F1-score = 0.60
 for conf_thresh = 0.25, TP = 81, FP = 54, FN = 53, average IoU = 41.13 %

 IoU threshold = 50 %, used Area-Under-Curve for each unique Recall
 mean average precision (mAP@0.50) = 0.593834, or 59.38 %
Total Detection Time: 0 Seconds
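
As a sanity check, the precision, recall, and F1-score in the output above can be reproduced from the reported TP, FP, and FN using the standard definitions. `detection_metrics` is a hypothetical helper, and the sketch does not guard against zero counts.

```python
def detection_metrics(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Standard precision, recall, and F1 from detection counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Counts taken from the output above (conf_thresh = 0.25)
precision, recall, f1 = detection_metrics(tp=81, fp=54, fn=53)
print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
# precision=0.60 recall=0.60 F1=0.60
```

Note that TP + FN = 134, which equals unique_truth_count in the same output.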

Going further

Changing the weights file output interval

With the following change and a rebuild, weights files can be written every N iterations of your choice.

See: darknet/src/detector.c

darknet/src/detector.c
// int save_after_iterations = option_find_int(options, "saveweights", (net.max_batches < 10000) ? 1000 : 10000 );  // configure when to write weights. Very useful for smaller datasets!
int save_after_iterations = option_find_int(options, "saveweights", 100);

Changing the chart.png output interval

See: "YOLOで学習中のlossのグラフchart.png を100iteretion毎に保存する" (saving the loss chart chart.png every 100 iterations during YOLO training)

Saving predictions.jpg every time

See: "【Darknet】複数枚のpredictionsをまとめて取得する" (getting predictions for multiple images at once)

Appendix

Appendix 1. brainwash images without annotations


brainwash_11_13_2014_images/00380000_640x480.png
brainwash_11_13_2014_images/00380500_640x480.png
brainwash_11_13_2014_images/00381000_640x480.png
brainwash_11_13_2014_images/00381500_640x480.png
brainwash_11_13_2014_images/00382000_640x480.png
brainwash_11_13_2014_images/00382500_640x480.png
brainwash_11_13_2014_images/00383000_640x480.png
brainwash_11_13_2014_images/00383500_640x480.png
brainwash_11_13_2014_images/00384000_640x480.png
brainwash_11_13_2014_images/00384500_640x480.png
brainwash_11_13_2014_images/00385000_640x480.png
brainwash_11_13_2014_images/00385500_640x480.png
brainwash_11_13_2014_images/00386000_640x480.png
brainwash_11_13_2014_images/00386500_640x480.png
brainwash_11_13_2014_images/00387000_640x480.png
brainwash_11_13_2014_images/00387500_640x480.png
brainwash_11_13_2014_images/00388000_640x480.png
brainwash_11_13_2014_images/00388500_640x480.png
brainwash_11_13_2014_images/00389000_640x480.png
brainwash_11_13_2014_images/00389500_640x480.png
brainwash_11_13_2014_images/00390000_640x480.png
brainwash_11_13_2014_images/00390500_640x480.png
brainwash_11_13_2014_images/00391000_640x480.png
brainwash_11_13_2014_images/00391500_640x480.png
brainwash_11_13_2014_images/00392000_640x480.png
brainwash_11_13_2014_images/00392500_640x480.png
brainwash_11_13_2014_images/00393000_640x480.png
brainwash_11_13_2014_images/00393500_640x480.png
brainwash_11_13_2014_images/00394000_640x480.png
brainwash_11_13_2014_images/00394500_640x480.png
brainwash_11_13_2014_images/00395000_640x480.png
brainwash_11_13_2014_images/00395500_640x480.png
brainwash_11_13_2014_images/00396000_640x480.png
brainwash_11_13_2014_images/00396500_640x480.png
brainwash_11_13_2014_images/00397000_640x480.png
brainwash_11_13_2014_images/00397500_640x480.png
brainwash_11_13_2014_images/00398000_640x480.png
brainwash_11_13_2014_images/00398500_640x480.png
brainwash_11_13_2014_images/00399000_640x480.png
brainwash_11_13_2014_images/00399500_640x480.png
brainwash_11_13_2014_images/00400000_640x480.png
brainwash_11_13_2014_images/00400500_640x480.png
brainwash_11_13_2014_images/00401000_640x480.png
brainwash_11_13_2014_images/00401500_640x480.png
brainwash_11_13_2014_images/00402000_640x480.png
brainwash_11_13_2014_images/00402500_640x480.png
brainwash_11_13_2014_images/00403000_640x480.png
brainwash_11_13_2014_images/00403500_640x480.png
brainwash_11_13_2014_images/00404000_640x480.png
brainwash_11_13_2014_images/00404500_640x480.png
brainwash_11_13_2014_images/00405000_640x480.png
brainwash_11_13_2014_images/00405500_640x480.png
brainwash_11_13_2014_images/00406000_640x480.png
brainwash_11_13_2014_images/00406500_640x480.png
brainwash_11_13_2014_images/00407000_640x480.png
brainwash_11_13_2014_images/00407500_640x480.png
brainwash_11_13_2014_images/00408000_640x480.png
brainwash_11_13_2014_images/00408500_640x480.png
brainwash_11_13_2014_images/00409000_640x480.png
brainwash_11_13_2014_images/00409500_640x480.png
brainwash_11_13_2014_images/00410000_640x480.png
brainwash_11_13_2014_images/00410500_640x480.png
brainwash_11_13_2014_images/00411000_640x480.png
brainwash_11_13_2014_images/00411500_640x480.png
brainwash_11_13_2014_images/00412000_640x480.png
brainwash_11_13_2014_images/00412500_640x480.png
brainwash_11_13_2014_images/00413000_640x480.png
brainwash_11_13_2014_images/00413500_640x480.png
brainwash_11_13_2014_images/00414000_640x480.png
brainwash_11_13_2014_images/00414500_640x480.png
brainwash_11_13_2014_images/00415000_640x480.png
brainwash_11_13_2014_images/00415500_640x480.png
brainwash_11_13_2014_images/00416000_640x480.png
brainwash_11_13_2014_images/00416500_640x480.png
brainwash_11_13_2014_images/00417000_640x480.png
brainwash_11_13_2014_images/00417500_640x480.png
brainwash_11_13_2014_images/00418000_640x480.png
brainwash_11_13_2014_images/00418500_640x480.png
brainwash_11_13_2014_images/00419000_640x480.png
brainwash_11_13_2014_images/00419500_640x480.png
brainwash_11_13_2014_images/00420000_640x480.png
brainwash_11_13_2014_images/00420500_640x480.png
brainwash_11_13_2014_images/00421000_640x480.png
brainwash_11_13_2014_images/00421500_640x480.png
brainwash_11_13_2014_images/00422000_640x480.png
brainwash_11_13_2014_images/00422500_640x480.png
brainwash_11_13_2014_images/00423000_640x480.png
brainwash_11_13_2014_images/00423500_640x480.png
brainwash_11_13_2014_images/00424000_640x480.png
brainwash_11_13_2014_images/00424500_640x480.png
brainwash_11_13_2014_images/00425000_640x480.png
brainwash_11_13_2014_images/00425500_640x480.png
brainwash_11_13_2014_images/00426000_640x480.png
brainwash_11_13_2014_images/00426500_640x480.png
brainwash_11_13_2014_images/00427000_640x480.png
brainwash_11_13_2014_images/00427500_640x480.png
brainwash_11_13_2014_images/00428000_640x480.png
brainwash_11_13_2014_images/00428500_640x480.png
brainwash_11_13_2014_images/00429000_640x480.png
brainwash_11_13_2014_images/00429500_640x480.png
brainwash_11_13_2014_images/00430000_640x480.png
brainwash_11_13_2014_images/00430500_640x480.png
brainwash_11_13_2014_images/00431000_640x480.png
brainwash_11_13_2014_images/00431500_640x480.png
brainwash_11_13_2014_images/00432000_640x480.png
brainwash_11_13_2014_images/00432500_640x480.png
brainwash_11_13_2014_images/00433000_640x480.png
brainwash_11_13_2014_images/00433500_640x480.png
brainwash_11_13_2014_images/00434000_640x480.png
brainwash_11_13_2014_images/00434500_640x480.png
brainwash_11_13_2014_images/00435000_640x480.png
brainwash_11_13_2014_images/00435500_640x480.png
brainwash_11_13_2014_images/00436000_640x480.png
brainwash_11_13_2014_images/00436500_640x480.png
brainwash_11_13_2014_images/00437000_640x480.png
brainwash_11_13_2014_images/00437500_640x480.png
brainwash_11_13_2014_images/00438000_640x480.png
brainwash_11_13_2014_images/00438500_640x480.png
brainwash_11_13_2014_images/00439000_640x480.png
brainwash_11_13_2014_images/00439500_640x480.png
brainwash_11_13_2014_images/00440000_640x480.png
brainwash_11_13_2014_images/00440500_640x480.png
brainwash_11_13_2014_images/00441000_640x480.png
brainwash_11_13_2014_images/00441500_640x480.png
brainwash_11_13_2014_images/00442000_640x480.png
brainwash_11_13_2014_images/00442500_640x480.png
brainwash_11_13_2014_images/00443000_640x480.png
brainwash_11_13_2014_images/00443500_640x480.png
brainwash_11_13_2014_images/00444000_640x480.png
brainwash_11_13_2014_images/00444500_640x480.png
brainwash_11_13_2014_images/00445000_640x480.png
brainwash_11_13_2014_images/00445500_640x480.png
brainwash_11_13_2014_images/00446000_640x480.png
brainwash_11_13_2014_images/00446500_640x480.png
brainwash_11_13_2014_images/00447000_640x480.png
brainwash_11_13_2014_images/00447500_640x480.png
brainwash_11_13_2014_images/00448000_640x480.png
brainwash_11_13_2014_images/00448500_640x480.png
brainwash_11_13_2014_images/00449000_640x480.png
brainwash_11_13_2014_images/00449500_640x480.png
brainwash_11_13_2014_images/00450000_640x480.png
brainwash_11_13_2014_images/00450500_640x480.png
brainwash_11_13_2014_images/00451000_640x480.png
brainwash_11_13_2014_images/00451500_640x480.png
brainwash_11_13_2014_images/00452000_640x480.png
brainwash_11_13_2014_images/00452500_640x480.png
brainwash_11_13_2014_images/00453000_640x480.png
brainwash_11_13_2014_images/00453500_640x480.png
brainwash_11_24_2014_images/00785500_640x480.jpg

Appendix 2. Breakdown of the dataset split

HollywoodHeads

Directory	train	test	eval	Invalid
JPEGImages	179,784	44,946	10	0

brainwash

Directory	train	test	eval	Invalid
brainwash_10_27_2014_images	281	70	2	0
brainwash_11_13_2014_images	3,116	779	4	148
brainwash_11_24_2014_images	6,010	1,503	4	1

RGBD_Indoor_Dataset

Directory	train	test	eval	Invalid
train	1,155	289	5	0
test	126	31	5	0

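Appendix 3. Handling of COCO and PascalVOC layouts

Normally, the annotation files for Darknet are placed in the same directory as the image data. However, if the directory containing the images has a name used by the COCO or PascalVOC layout, the annotation files apparently have to be placed in a predetermined directory instead, so take care.

See: darknet/src/utils.c

darknet/src/utils.c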
void replace_image_to_label(const char* input_path, char* output_path)
{
    find_replace(input_path, "/images/train2017/", "/labels/train2017/", output_path);    // COCO
    find_replace(output_path, "/images/val2017/", "/labels/val2017/", output_path);        // COCO
    find_replace(output_path, "/JPEGImages/", "/labels/", output_path);    // PascalVOC
    find_replace(output_path, "\\images\\train2017\\", "\\labels\\train2017\\", output_path);    // COCO
    find_replace(output_path, "\\images\\val2017\\", "\\labels\\val2017\\", output_path);        // COCO

    find_replace(output_path, "\\images\\train2014\\", "\\labels\\train2014\\", output_path);    // COCO
    find_replace(output_path, "\\images\\val2014\\", "\\labels\\val2014\\", output_path);        // COCO
    find_replace(output_path, "/images/train2014/", "/labels/train2014/", output_path);    // COCO
    find_replace(output_path, "/images/val2014/", "/labels/val2014/", output_path);        // COCO

    find_replace(output_path, "\\JPEGImages\\", "\\labels\\", output_path);    // PascalVOC
    //find_replace(output_path, "/images/", "/labels/", output_path);    // COCO
    //find_replace(output_path, "/VOC2007/JPEGImages/", "/VOC2007/labels/", output_path);        // PascalVOC
    //find_replace(output_path, "/VOC2012/JPEGImages/", "/VOC2012/labels/", output_path);        // PascalVOC

    //find_replace(output_path, "/raw/", "/labels/", output_path);
    trim(output_path);
    ...
}
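
To see what this means for a given image path, the Unix-style replacements above can be mirrored in a few lines of Python. This is only a sketch. The part of the C function elided with "..." is not shown in this article, so swapping the extension for .txt is my assumption. `str.replace` also replaces every occurrence, which may differ from `find_replace`. `expected_label_path` and the sample file names are hypothetical.

```python
import posixpath

# Unix-style substitutions copied from replace_image_to_label() above
REPLACEMENTS = [
    ("/images/train2017/", "/labels/train2017/"),
    ("/images/val2017/", "/labels/val2017/"),
    ("/JPEGImages/", "/labels/"),
    ("/images/train2014/", "/labels/train2014/"),
    ("/images/val2014/", "/labels/val2014/"),
]


def expected_label_path(image_path: str) -> str:
    """Guess where Darknet would look for the label file of an image."""
    out = image_path
    for old, new in REPLACEMENTS:
        out = out.replace(old, new)
    # Assumption: the elided part of the C code swaps the extension for .txt
    root, _ext = posixpath.splitext(out)
    return root + ".txt"


print(expected_label_path("custom-data/train/CCC.png"))
# custom-data/train/CCC.txt
print(expected_label_path("custom-data/_dataset/HollywoodHeads/JPEGImages/sample.jpeg"))
# custom-data/_dataset/HollywoodHeads/labels/sample.txt
```

In other words, images that stay under a JPEGImages directory would need their label files under a sibling labels directory, while images copied into train/ or test/ keep their label files next to them.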