(Paper attached) ICCV 2021 | GUPNet: A Monocular 3D Detection Network Based on Geometry Uncertainty Projection

目标检测与深度学习


2021-09-05 10:21

Author | 柒柒 @ Zhihu
Link | https://zhuanlan.zhihu.com/p/397105796


Paper title: Geometry Uncertainty Projection Network for Monocular 3D Object Detection
Affiliations: The University of Sydney, SenseTime Computer Vision Group, et al.
Paper:
https://arxiv.org/pdf/2107.13774.pdf

The paper in one sentence:

Use geometric relations to quantify the uncertainty of depth estimation.


1. The authors' observation:




Existing methods with the projection model usually estimate the heights of the 2D and 3D bounding boxes first and then infer the depth via the projection formula.


The authors also provide an illustration, shown in the figure below.
As the figure indicates, even a height-estimation error of only 0.1 m can cause a deviation of up to 4 m in the projected depth.


We can find that a slight bias (0.1m) of 3D heights could cause a significant shift (even 4m) in the projected depth.


2. The first problem the authors discuss is inference reliability. Why does this matter? The reason was already given in point 1: a slight height bias leads to a significant depth shift. In other words, the uncertainty of the height prediction propagates into the uncertainty of the estimated depth.


The first problem is inference reliability. A small quality change in the 3D height estimation would cause a large change in the depth estimation quality. This makes it hard for the model to predict reliable uncertainty or confidence, leading to uncontrollable outputs.


3. The second problem the authors discuss is the stability of model training. Why does this matter? Again, because of inaccurate height prediction. At the beginning of training, the predicted object heights usually have large errors, which in turn produce large depth-estimation errors; such large errors make the network hard to train and degrade the overall performance.


Another problem is the instability of model training. In particular, at the beginning of the training phase, the estimation of 2D/3D height tends to be noisy, and the errors will be amplified and cause outrageous depth estimation. Consequently, the training process of the network will be misled, which will lead to the degradation of the final performance.


Therefore, the overall network design aims to solve exactly these two problems: inference reliability and training stability. The Geometry Uncertainty Projection (GUP) module handles inference reliability, while Hierarchical Task Learning (HTL) handles training stability. Concretely, the pipeline can be read as:
input 2D image → predict 2D + 3D boxes → the GUP module refines the depth → final detection results, as in the figure below.
Network framework diagram
The backbone is the same as in common monocular 3D detectors, so I will not dwell on it; the notes below focus on how the two key modules, GUP and HTL, work.
First, the Geometry Uncertainty Projection (GUP) module. How does it differ from a conventional localization head? In short, the most notable difference is that previous methods output only a single depth value, whereas the GUP module outputs a depth value plus an uncertainty. This uncertainty characterizes how reliable the current depth estimate is, which addresses the inference-reliability problem described above.


The overall module builds the projection process in the probability framework rather than single values so that the model can compute the theoretical uncertainty for the inferred depth, which can indicate the depth inference reliability and also be helpful for the depth learning.


The concrete procedure is:
a) predict the object's 3D height → b) project it to a depth value → c) predict a depth bias → combine the projected depth and the bias to obtain the final depth together with its uncertainty.


To achieve this goal, we first assume the prediction of the 3D height for each object is a Laplace distribution. The distribution parameters are predicted by the 3D size stream in an end-to-end way. The average denotes the regression target output and the variation is the uncertainty of the inference.


Next: how do we get the network to learn this? That is the job of the loss function, so the authors design a dedicated loss for the 3D height prediction:
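A plausible form of this loss, assuming the standard Laplace negative log-likelihood used for aleatoric uncertainty (with predicted mean \mu_h and scale \sigma_h for the 3D height; the symbols are mine, not necessarily the paper's), is:

L_{h3d} = \frac{\sqrt{2}}{\sigma_h} \left| \mu_h - h_{3d}^{gt} \right| + \log \sigma_h

For a fixed prediction error, the \sigma_h that minimizes this term is proportional to that error, so a small predicted scale is only rewarded when the mean is actually accurate.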
From this loss it is fairly clear that it is minimized when the predicted mean equals the ground-truth height and the predicted variance shrinks toward zero.
b) Projecting to a depth value. The conversion from these geometric quantities to depth has been discussed many times elsewhere, so I will not rederive it. Substituting the predicted 3D height into the projection relation yields the depth value, and since the 3D height follows a Laplace distribution and the projection is just a scaling, the computed depth also follows a Laplace distribution.
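In standard pinhole-camera notation, with f the focal length and h_{2d} the projected 2D height (again, the symbols are mine), the relation is:

d_p = \frac{f \cdot h_{3d}}{h_{2d}}

and since h_{3d} \sim \mathrm{Laplace}(\mu_h, \sigma_h), scaling by f / h_{2d} gives d_p \sim \mathrm{Laplace}\!\left(\frac{f \mu_h}{h_{2d}}, \frac{f \sigma_h}{h_{2d}}\right).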


Based on the learned height distribution, the depth distribution of the projection output can be approximated as above, where X is the standard Laplace distribution.


c) Predicting the depth bias. There is nothing special to say here: it simply adds one more layer of uncertainty on top of the projected depth.


We also assume that the learned bias is a Laplace distribution and independent of the projection one.




The two are then simply added: the means and variances combine according to the usual rules for summing independent distributions. What do we want the estimated depth to satisfy? Just as with the predicted 3D height, we want its mean to be as close as possible to the ground truth and its variance to be as small as possible, which gives the loss function below:
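Written out under the independence assumption (my paraphrase, denoting the bias distribution Laplace(\mu_b, \sigma_b)), the final depth distribution and its loss are plausibly:

\mu_d = \frac{f \mu_h}{h_{2d}} + \mu_b, \qquad \sigma_d = \sqrt{\left( \frac{f \sigma_h}{h_{2d}} \right)^2 + \sigma_b^2}

L_{depth} = \frac{\sqrt{2}}{\sigma_d} \left| \mu_d - d^{gt} \right| + \log \sigma_d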


The overall loss would push the projection results close to the ground truth and the gradient would affect the depth bias, the 2D height and the 3D height simultaneously. Besides, the uncertainty of 3D height and depth bias is also trained in the optimization process.
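To make the whole GUP depth computation concrete, here is a minimal PyTorch-style sketch. It is my own illustration under the assumptions above, not the authors' code; the function name and arguments are hypothetical.

import math
import torch

def gup_depth_and_loss(mu_h3d, sigma_h3d, mu_bias, sigma_bias, h2d, focal, depth_gt):
    # Project the height distribution to a depth distribution:
    # depth = focal * h3d / h2d is a linear scaling, so the Laplace
    # mean and scale are scaled by the same factor.
    mu_proj = focal * mu_h3d / h2d
    sigma_proj = focal * sigma_h3d / h2d

    # Add the learned depth bias (assumed independent of the projection):
    # the means add, the scales combine in quadrature.
    mu_d = mu_proj + mu_bias
    sigma_d = torch.sqrt(sigma_proj ** 2 + sigma_bias ** 2)

    # Laplace-style uncertainty loss on the final depth, matching the
    # form used for the 3D-height loss above.
    loss = (math.sqrt(2.0) / sigma_d) * torch.abs(mu_d - depth_gt) + torch.log(sigma_d)
    return mu_d, sigma_d, loss.mean()

At inference time, sigma_d can additionally be mapped to a depth confidence (e.g. exp(-sigma_d)) and folded into the detection score, so that less certain depths lower the final confidence.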




That completes the first module, GUP.
Second, the Hierarchical Task Learning (HTL) module. As mentioned above, this module is designed to address the instability of training. The authors' approach is fairly simple: since training all tasks jointly from the start is unstable, train them in stages, assigning each task its own weight at each epoch to control how much it contributes to the overall training.
The GUP module mainly addresses the error amplification effect in the inference stage. Yet, this effect also damages the training procedure. Specifically, at the beginning of the training, the prediction of both h2d and h3d are far from accurate, which will mislead the overall training and damage the performance. To tackle this problem, we design a Hierarchical Task Learning (HTL) to control weights for each task at each epoch.
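As a rough illustration of the idea (this is my own toy scheduler, not the paper's exact indicator; the warm-up ramp and the 0-1 progress scores are assumptions), a per-task weight could be computed as:

def htl_task_weight(epoch, warmup_epochs, pre_task_progress):
    # pre_task_progress: list of 0-1 scores for the prerequisite tasks
    # (e.g. derived from how much each prerequisite's loss has flattened).
    if not pre_task_progress:
        return 1.0  # root tasks such as 2D detection are always fully weighted
    readiness = min(pre_task_progress)              # least-converged prerequisite gates the task
    ramp = min(1.0, epoch / max(1, warmup_epochs))  # global warm-up term
    return ramp * readiness

Under such a scheme the depth loss, whose prerequisites are the 2D/3D height heads, contributes little in the first epochs and only reaches full weight once those heads have stabilized.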
Experimental results:
KITTI test set
Not much to add here: as usual, the proposed method improves over prior work.

END


