Mask R-CNN Programming Practice

Author: Kuludu
Published: April 27, 2020
Categories: Technical Articles, Intelligent Science

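Mask R-CNN(https://arxiv.org/abs/1703.06870v3/) is an object detection and instance segmentation algorithm that builds on Faster R-CNN. This article does not focus on the algorithm itself; instead it looks at programming practice with one Python implementation of Mask R-CNN(https://github.com/matterport/Mask_RCNN/). The implementation's author provides a sample for training on your own dataset at (https://github.com/matterport/Mask_RCNN/blob/master/samples/shapes/train_shapes.ipynb/).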

Config

config.py mainly implements the Config class, which holds the training parameters.

Base configuration class. For custom configurations, create a
sub-class that inherits from this one and override properties
that need to be changed.

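The author recommends subclassing Config directly and overriding only the parameters that need to change. Several important parameters are described here.

GPU_COUNT

Number of GPUs. The default is 1.

Note: when training on CPU only, this value should be set to 1.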

IMAGES_PER_GPU

Number of images processed on each GPU. The default is 2.

Number of images to train with on each GPU. A 12GB GPU can typically handle 2 images of 1024x1024px.

As a guideline from the author, a GPU with 12 GB of memory can handle two 1024x1024 px images.
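As a small illustration of how the two settings above interact: in this implementation the effective batch size is, to my understanding, IMAGES_PER_GPU multiplied by GPU_COUNT. The helper name below is made up for this sketch and is not part of the library.

```python
def effective_batch_size(gpu_count: int, images_per_gpu: int) -> int:
    """Images processed per training step across all GPUs."""
    if gpu_count < 1 or images_per_gpu < 1:
        raise ValueError("both values must be at least 1")
    return gpu_count * images_per_gpu


# Defaults described above: one GPU, two images per GPU.
default_batch = effective_batch_size(gpu_count=1, images_per_gpu=2)
# CPU-only run: GPU_COUNT stays at 1; one image at a time keeps memory use low.
cpu_batch = effective_batch_size(gpu_count=1, images_per_gpu=1)
print(default_batch, cpu_batch)
```

If training runs out of memory, lowering IMAGES_PER_GPU is usually the first setting to try.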

STEPS_PER_EPOCH

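Number of training steps per epoch. The default is 1000.

TensorBoard is updated and the validation set is evaluated at the end of every epoch, so this value should not be set too small; otherwise a large share of the training time goes to that per-epoch overhead.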

NUM_CLASSES

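Number of object classes to recognize, including the background. The default is 1.

In practice, set this to 1 + the number of object classes you want to recognize.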

BACKBONE

Backbone network architecture. The default is resnet101.

This implementation also supports resnet50, and a custom backbone can be written if needed.

RPN_ANCHOR_SCALES

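Anchor scales for the RPN (Region Proposal Network). The default is (32, 64, 128, 256, 512).

Consider the anchor figure from the Faster R-CNN paper(https://arxiv.org/abs/1506.01497v3): this parameter corresponds to the sizes of the candidate regions shown on the right side of that figure. The default gives five sizes, and they should be tuned to the application (the values are side lengths in image pixels).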
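To make the scales more concrete, here is a minimal sketch of how a scale and an aspect ratio can be turned into an anchor's height and width. It assumes the ratio means width / height and that the ratios (0.5, 1, 2) are used; the function name is hypothetical and the library's own anchor generation does more (strides, positions, pyramid levels).

```python
import math


def anchor_shapes(scales, ratios):
    """Return (height, width) in pixels for every scale/ratio pair.

    ratio is width / height; each anchor keeps an area of about scale ** 2.
    """
    shapes = []
    for scale in scales:
        for ratio in ratios:
            height = scale / math.sqrt(ratio)
            width = scale * math.sqrt(ratio)
            shapes.append((height, width))
    return shapes


RPN_ANCHOR_SCALES = (32, 64, 128, 256, 512)
RPN_ANCHOR_RATIOS = (0.5, 1, 2)  # assumption: not discussed in the text above

for height, width in anchor_shapes(RPN_ANCHOR_SCALES, RPN_ANCHOR_RATIOS):
    print(f"{height:7.1f} x {width:7.1f}")
```

If the objects in your images are mostly small, the printed sizes make it easy to see whether the smallest anchors are still larger than the objects.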

IMAGE_RESIZE_MODE & IMAGE_MIN_DIM & IMAGE_MAX_DIM

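To allow multiple images to be trained in the same batch, this implementation resizes the images. The default mode (IMAGE_RESIZE_MODE) is square, which resizes and pads each image into a square. The resizing uses bilinear interpolation; see utils.py:388.

IMAGE_MIN_DIM and IMAGE_MAX_DIM are the minimum and maximum image dimensions. An image smaller than the minimum is scaled up, and one larger than the maximum is scaled down.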
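The rule above can be sketched in a few lines. This is a simplified reading of the square mode, not the library's code: the function name is hypothetical, the padding is reported as a total rather than split between the two sides, and the values 800 and 1024 are used only as example settings.

```python
def square_resize_plan(height, width, min_dim, max_dim):
    """Return (scale, (new_h, new_w), (pad_h, pad_w)) for a square resize."""
    # Scale up so the shorter side reaches min_dim, but never scale down here.
    scale = max(1.0, min_dim / min(height, width))
    # If the longer side would now exceed max_dim, shrink to fit instead.
    if round(max(height, width) * scale) > max_dim:
        scale = max_dim / max(height, width)
    new_h, new_w = round(height * scale), round(width * scale)
    # The remaining space is zero-padded to reach max_dim x max_dim.
    return scale, (new_h, new_w), (max_dim - new_h, max_dim - new_w)


scale, new_size, padding = square_resize_plan(600, 800, min_dim=800, max_dim=1024)
print(scale, new_size, padding)
```

For a 600x800 image the shorter side would first be scaled towards 800, which pushes the longer side past 1024, so the scale is capped and the image ends up 768x1024 with 256 rows of padding.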

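Set these parameters according to the available hardware: dimensions that are too large slow training down, while dimensions that are too small degrade the results.

Note: IMAGE_MAX_DIM must be divisible by 2 at least 6 times (that is, a multiple of 64, not merely a multiple of 2) so that no fractional sizes appear during downscaling and upscaling.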
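A quick way to check a candidate value before training is sketched below. Both helper names are made up for illustration.

```python
def is_valid_image_dim(dim: int, times: int = 6) -> bool:
    """True if dim can be halved `times` times without a remainder."""
    return dim > 0 and dim % (2 ** times) == 0


def nearest_valid_dims(dim: int, times: int = 6):
    """Return the closest valid sizes at or below and above dim."""
    step = 2 ** times
    lower = (dim // step) * step
    return lower, lower + step


for candidate in (1024, 1000, 500):
    print(candidate, is_valid_image_dim(candidate), nearest_valid_dims(candidate))
```

A value such as 1000 is even but fails the check; the nearest usable sizes are 960 and 1024.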

TRAIN_ROIS_PER_IMAGE

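Number of RoIs (Regions of Interest) per image. The default is 200.

Put simply, this is the number of regions of interest used per image during training; see the papers above for details. Because the quality of the proposals generated by the RPN varies, the value can be adjusted to fit the situation.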

LEARNING_RATE

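Learning rate. The default is 0.001.

The author notes that the Mask R-CNN paper uses a learning rate of 0.02, but in the TensorFlow implementation that value causes the weights to explode, so a lower value is set here.

After overriding the configuration and instantiating it, call the display method to inspect the training configuration.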
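Putting the parameters together, a configuration subclass might look like the sketch below. The class name ELConfig and every value in it are illustrative assumptions, not recommendations. The fallback Config class is a tiny stand-in that exists only so the sketch also runs where the library is not installed; the real class lives in mrcnn/config.py and has many more settings.

```python
try:
    from mrcnn.config import Config  # matterport/Mask_RCNN, if installed
except ImportError:
    class Config:
        """Stand-in for illustration only; not the library's class."""
        NAME = None
        GPU_COUNT = 1
        IMAGES_PER_GPU = 2
        NUM_CLASSES = 1

        def display(self):
            # Print every upper-case setting, one per line.
            for key in dir(self):
                if key.isupper():
                    print(f"{key:30} {getattr(self, key)}")


class ELConfig(Config):
    """Hypothetical configuration for a dataset with one object class."""
    NAME = "el"
    GPU_COUNT = 1
    IMAGES_PER_GPU = 1
    NUM_CLASSES = 1 + 1            # background + one object class
    IMAGE_MIN_DIM = 256
    IMAGE_MAX_DIM = 256            # a multiple of 64
    RPN_ANCHOR_SCALES = (16, 32, 64, 128, 256)
    TRAIN_ROIS_PER_IMAGE = 64
    STEPS_PER_EPOCH = 100


config = ELConfig()
config.display()
```

Only the overridden attributes differ from the defaults, so reading the subclass shows exactly what was changed for this dataset.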

Utils

utils.py mainly implements utilities, including IoU computation and the dataset class (Dataset). This section explains how to subclass the dataset class.
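For readers unfamiliar with IoU (intersection over union), here is a minimal pure-Python sketch for two boxes. It assumes boxes are given as (y1, x1, y2, x2); the function name is hypothetical, and the library's own routines are vectorized with NumPy rather than written like this.

```python
def box_iou(box_a, box_b):
    """IoU of two boxes given as (y1, x1, y2, x2)."""
    # Corners of the overlapping rectangle, if there is one.
    y1 = max(box_a[0], box_b[0])
    x1 = max(box_a[1], box_b[1])
    y2 = min(box_a[2], box_b[2])
    x2 = min(box_a[3], box_b[3])
    intersection = max(y2 - y1, 0) * max(x2 - x1, 0)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


print(box_iou((0, 0, 10, 10), (5, 5, 15, 15)))
```

Two 10x10 boxes offset by 5 pixels in each direction share a 5x5 region, so the result is 25 / 175.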

First, subclass Dataset and implement at least the following:

Adding the object classes to recognize
Adding the images
load_mask()

A concrete example follows:

import os

import numpy as np
import PIL.Image
from mrcnn import utils


class ELDataset(utils.Dataset):
    def load_el(self, count, img_folder, mask_folder, img_list, dataset_root_path):
        # Register the object class to recognize (class ID 0 is the background)
        self.add_class("shapes", 1, "el")

        for i in range(count):
            # The mask shares the image's base name and uses a .png extension
            file_name = os.path.splitext(img_list[i])[0]
            mask_path = mask_folder + file_name + ".png"
            mask_img = np.array(PIL.Image.open(mask_path).convert('L'))

            # Register the image together with its mask path and size
            self.add_image("shapes", image_id=i, path=img_folder + img_list[i],
                           mask_path=mask_path, width=mask_img.shape[1],
                           height=mask_img.shape[0])

    def load_mask(self, image_id):
        info = self.image_info[image_id]
        # A mask can have several layers; this example uses only one
        mask = np.zeros([info['height'], info['width'], 1], dtype=np.uint8)
        mask[:, :, 0] = np.array(PIL.Image.open(info['mask_path']).convert('L'))

        # np.bool was removed in NumPy 1.24, so the built-in bool is used
        return mask.astype(bool), np.array([1]).astype(np.int32)


# Paths
train_dataset_root_path = "data_train/"
train_img_folder = train_dataset_root_path + "img/"
train_mask_folder = train_dataset_root_path + "mask/"
train_img_list = os.listdir(train_img_folder)
train_count = len(train_img_list)

val_dataset_root_path = "data_val/"
val_img_folder = val_dataset_root_path + "img/"
val_mask_folder = val_dataset_root_path + "mask/"
val_img_list = os.listdir(val_img_folder)
val_count = len(val_img_list)

# Training dataset
dataset_train = ELDataset()
dataset_train.load_el(train_count, train_img_folder, train_mask_folder, train_img_list,
                      train_dataset_root_path)
dataset_train.prepare()

# Validation dataset (only the first image is loaded; pass val_count to use all)
dataset_val = ELDataset()
dataset_val.load_el(1, val_img_folder, val_mask_folder, val_img_list,
                    val_dataset_root_path)
dataset_val.prepare()
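The example above relies on a naming convention: every image in img/ has a mask in mask/ with the same base name and a .png extension. The sketch below isolates that pairing step so it can be checked without loading any images. The function name and the sample file names are hypothetical.

```python
import os


def pair_images_with_masks(img_list, img_folder, mask_folder):
    """Return (image_path, mask_path) pairs; a mask shares its image's base name."""
    pairs = []
    for name in sorted(img_list):
        # splitext keeps dots inside the base name, e.g. "a.v2.jpg" -> "a.v2"
        stem, _extension = os.path.splitext(name)
        pairs.append((os.path.join(img_folder, name),
                      os.path.join(mask_folder, stem + ".png")))
    return pairs


for image_path, mask_path in pair_images_with_masks(
        ["b.jpg", "a.v2.jpg"], "data_train/img", "data_train/mask"):
    print(image_path, "->", mask_path)
```

Sorting the file list also makes image IDs reproducible, since os.listdir returns names in arbitrary order.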

Model

model.py mainly implements the model itself.

A model in training mode can be created with the following statement:

model = modellib.MaskRCNN(mode="training", config=config, model_dir=MODEL_DIR)

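In training mode the model can be trained. There are three options for the initial weights:

imagenet: use weights pre-trained on ImageNet.
coco: use weights pre-trained on COCO.
last: use the weights from the most recent training run.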
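The three options differ mainly in where the weights come from and which layers must be skipped. The sketch below summarizes this as plain data, following my reading of the train_shapes.ipynb sample; the helper name and the dictionary layout are made up for illustration, and the strings only name what the notebook passes to model.load_weights(..., by_name=True).

```python
def weight_loading_plan(init_with):
    """Describe the weight source and the layers to exclude for one option."""
    # Head layers whose shapes depend on the number of classes.
    head_layers = ["mrcnn_class_logits", "mrcnn_bbox_fc",
                   "mrcnn_bbox", "mrcnn_mask"]
    if init_with == "imagenet":
        return {"source": "model.get_imagenet_weights()", "exclude": []}
    if init_with == "coco":
        # COCO heads were trained for a different class count, so skip them.
        return {"source": "COCO_MODEL_PATH", "exclude": head_layers}
    if init_with == "last":
        return {"source": "model.find_last()", "exclude": []}
    raise ValueError(f"unknown option: {init_with!r}")


for option in ("imagenet", "coco", "last"):
    print(option, weight_loading_plan(option))
```

The excluded head layers are the ones that end up randomly initialized, which is why the first training stage concentrates on them.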

Finally, training is started with the following statement:

model.train(dataset_train, dataset_val, learning_rate=config.LEARNING_RATE, 
            epochs=1, layers='heads')

Train in two stages:
Only the heads. Here we're freezing all the backbone layers and training only the randomly initialized layers (i.e. the ones that we didn't use pre-trained weights from MS COCO). To train only the head layers, pass layers='heads' to the train() function.
Fine-tune all layers. For this simple example it's not necessary, but we're including it to show the process. Simply pass layers="all" to train all layers.

The layers parameter can be set to heads or all to choose which layers to train, and the epochs parameter sets the number of training epochs.
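The two stages can be written down as a small schedule. This sketch assumes, as in the sample notebook, that the second stage uses a learning rate ten times lower, and that epochs is the running total rather than the number of additional epochs. The function name is hypothetical; the commented line shows where a real model would be trained.

```python
def two_stage_schedule(base_lr, head_epochs, finetune_epochs):
    """Return train() keyword arguments for a heads-then-all schedule."""
    return [
        {"layers": "heads", "learning_rate": base_lr, "epochs": head_epochs},
        # epochs is cumulative, so stage two ends at head + fine-tune epochs.
        {"layers": "all", "learning_rate": base_lr / 10,
         "epochs": head_epochs + finetune_epochs},
    ]


for stage in two_stage_schedule(base_lr=0.001, head_epochs=1, finetune_epochs=1):
    print(stage)
    # model.train(dataset_train, dataset_val, **stage)  # with a real model
```

Passing the same epochs value to a second train() call would therefore train for zero additional epochs, which is an easy mistake to make.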

Visualize

visualize.py mainly implements methods for visualizing data and results.

For example, the following call visualizes the detection results:

visualize.display_instances(original_image, r['rois'], r['masks'], r['class_ids'], 
                            dataset_val.class_names, r['scores'], ax=get_ax())

References

Mask R-CNN(https://arxiv.org/abs/1703.06870v3)
Faster R-CNN(https://arxiv.org/abs/1506.01497v3)
RPN explained(https://blog.csdn.net/lanran2/article/details/54376126)
Mask R-CNN for object detection and instance segmentation on Keras and TensorFlow(https://github.com/matterport/Mask_RCNN)

This article was actually completed on 2020.4.27. ~~The monthly-update blogger did not skip this month!~~

Last modification: April 27, 2020


