Sweet or Savory! An Advanced Look at detectron2

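An earlier post, "Object Detection and Segmentation Made Easy: Starting with detectron2 (Combined Edition)", introduced detectron2, the object detection library open-sourced by Facebook AI. Recently, after v0.4, detectron2 added a very handy feature: LazyConfig. Previously, detectron2 was configured through yaml and yacs: the code defines a single global cfg object that holds every parameter, and any of them can be conveniently overridden from a yaml config file. This approach has drawbacks, though. The parameters of every model live in one place, which makes cfg bloated, and adding a new model means adding new entries to cfg. LazyConfig avoids these problems. It also makes it easy to extend detectron2 to other tasks such as classification, or even to train mmdetection models with detectron2.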

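At the core of LazyConfig are LazyCall and instantiate. LazyCall makes defining parameters look just like instantiating a class directly, but it only records the arguments of the call together with the target being called (stored in the _target_ field). The result is a DictConfig, a type that comes from the omegaconf library. Below is the implementation of LazyCall, which is very simple. The example in its docstring configures an nn.Conv2d: calling LazyCall returns the config layer_cfg, whose fields can be edited directly. At this point nn.Conv2d has not been instantiated; that only happens when the config is passed to instantiate.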

from collections import abc

from omegaconf import DictConfig


class LazyCall:
    """
    Wrap a callable so that when it's called, the call will not be executed,
    but returns a dict that describes the call.
    LazyCall object has to be called with only keyword arguments. Positional
    arguments are not yet supported.
    Examples:
    ::
        from detectron2.config import instantiate, LazyCall
        layer_cfg = LazyCall(nn.Conv2d)(in_channels=32, out_channels=32)
        layer_cfg.out_channels = 64   # can edit it afterwards
        layer = instantiate(layer_cfg)
    """

    def __init__(self, target):
        if not (callable(target) or isinstance(target, (str, abc.Mapping))):
            # f-string so that the offending target actually appears in the message
            raise TypeError(
                f"target of LazyCall must be a callable or defines a callable! Got {target}"
            )
        self._target = target

    def __call__(self, **kwargs):
        kwargs["_target_"] = self._target
        return DictConfig(content=kwargs, flags={"allow_objects": True})
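
To make the division of labor between LazyCall and instantiate more concrete, here is a minimal standard-library sketch. It is not detectron2's implementation: `lazy_call` and `build` are hypothetical stand-ins that use a plain dict instead of omegaconf's DictConfig, and `Point` and `Segment` are toy classes made up for illustration. The point it tries to show is that nested descriptions are only turned into objects, recursively, at the very end.

```python
from dataclasses import dataclass

TARGET_KEY = "_target_"


def lazy_call(target):
    """Return a function that records a call to `target` instead of making it."""
    if not callable(target):
        raise TypeError(f"target must be callable, got {target!r}")

    def record(**kwargs):
        return {TARGET_KEY: target, **kwargs}

    return record


def build(cfg):
    """Recursively turn recorded calls into real objects."""
    if isinstance(cfg, list):
        return [build(item) for item in cfg]
    if isinstance(cfg, dict):
        kwargs = {k: build(v) for k, v in cfg.items() if k != TARGET_KEY}
        if TARGET_KEY in cfg:
            return cfg[TARGET_KEY](**kwargs)
        return kwargs  # a plain dict without a target stays a dict
    return cfg  # plain value: leave untouched


@dataclass
class Point:
    x: int
    y: int


@dataclass
class Segment:
    start: Point
    end: Point


segment_cfg = lazy_call(Segment)(
    start=lazy_call(Point)(x=0, y=0),
    end=lazy_call(Point)(x=3, y=4),
)
# Nothing has been constructed yet, so the description can still be edited.
segment_cfg["end"]["x"] = 6
segment = build(segment_cfg)
print(segment)
```

Because the inner `Point` descriptions are built before the outer `Segment`, the edit to `end.x` is picked up without touching any constructor code.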

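In detectron2, the three main modules model, optimizer and dataloader together make up a training job. LazyCall can be used to configure all three, and that is what LazyConfig is. For example, the optimizer for training can be defined as follows: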

import torch

from detectron2.config import LazyCall as L
from detectron2.solver.build import get_default_optimizer_params

SGD = L(torch.optim.SGD)(
    params=L(get_default_optimizer_params)(
        # params.model is meant to be set to the model object, before instantiating
        # the optimizer.
        weight_decay_norm=0.0
    ),
    lr=0.02,
    momentum=0.9,
    weight_decay=1e-4,
)

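This looks as if an SGD optimizer were instantiated directly, but the SGD obtained here is only the optimizer's config, and its parameters can still be modified before it is actually instantiated. The equivalent yaml config is shown below; it is less readable than the Python version above.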

optimizer:
  _target_: torch.optim.SGD
  lr: 0.02
  momentum: 0.9 
  params: {_target_: detectron2.solver.get_default_optimizer_params, weight_decay_norm: 0.0}
  weight_decay: 0.0001
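
In the yaml form, `_target_` is a dotted string rather than a Python object, so something has to map the string back to a callable before the call can be made. The sketch below shows one way this could be done with `importlib`. `resolve_target` is a hypothetical helper written for this post, not detectron2's own lookup code, and it only handles module-level names. A standard-library target is used so the example runs without detectron2 installed.

```python
import importlib


def resolve_target(dotted_path):
    """Import `pkg.module.name` and return the object called `name`."""
    module_name, _, attr_name = dotted_path.rpartition(".")
    if not module_name:
        raise ValueError(f"expected 'module.name', got {dotted_path!r}")
    module = importlib.import_module(module_name)
    return getattr(module, attr_name)


# A dict shaped like the yaml above, with a stdlib class as the target.
cfg = {"_target_": "datetime.timedelta", "hours": 1, "minutes": 30}

target = resolve_target(cfg["_target_"])
kwargs = {k: v for k, v in cfg.items() if k != "_target_"}
value = target(**kwargs)
print(value.total_seconds())
```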

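The model, the dataloader and the other parameters needed for training are configured in much the same way. detectron2's configs/common directory contains concrete examples, so defining a LazyConfig config file only requires combining these modules (you can of course define your own). For example, the retinanet_R_50_FPN_1x.py config file looks like this: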

from ..common.optim import SGD as optimizer
from ..common.coco_schedule import lr_multiplier_1x as lr_multiplier
from ..common.data.coco import dataloader
from ..common.models.retinanet import model
from ..common.train import train

# Override a few parameters of the shared components
dataloader.train.mapper.use_instance_mask = False
model.backbone.bottom_up.freeze_at = 2
optimizer.lr = 0.01

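This also shows the elements detectron2 needs for training: optimizer, lr_multiplier, dataloader, model and train. Of these, train is just a plain dict (the others are all classes wrapped with LazyCall to produce their LazyConfig) that holds auxiliary training parameters such as the output directory: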

train = dict(
    output_dir="./output",
    init_checkpoint="detectron2://ImageNetPretrained/MSRA/R-50.pkl",
    max_iter=90000,
    amp=dict(enabled=False),  # options for Automatic Mixed Precision
    ddp=dict(  # options for DistributedDataParallel
        broadcast_buffers=False,
        find_unused_parameters=False,
        fp16_compression=False,
    ),
    checkpointer=dict(period=5000, max_to_keep=100),  # options for PeriodicCheckpointer
    eval_period=5000,
    log_period=20,
    device="cuda"
    # ...
)

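The old yaml config is just a text file, whereas a LazyConfig is a Python file. It might seem that train_net.py could simply import it and instantiate everything, but that would defeat the purpose: switching models would require editing train_net.py, and LazyCall would be superfluous. detectron2's solution is in tools/lazyconfig_train_net.py, which lets a LazyConfig Python file be passed as a command-line argument just like a yaml config file. The implementation is as follows: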

def main(args):
    # Load the Python config file into a DictConfig; this also dynamically
    # loads the Python modules the config file depends on
    cfg = LazyConfig.load(args.config_file)
    # Command-line arguments can override config values, as with yaml
    cfg = LazyConfig.apply_overrides(cfg, args.opts)
    default_setup(cfg, args)

    if args.eval_only:
        # Instantiate the model
        model = instantiate(cfg.model)
        model.to(cfg.train.device)
        model = create_ddp_model(model)
        DetectionCheckpointer(model).load(cfg.train.init_checkpoint)
        print(do_test(cfg, model))
    else:
        # Instantiation happens inside do_train
        do_train(args, cfg)

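The most important piece here is LazyConfig.load. It loads a Python config file and returns the DictConfig, loading along the way every Python module the file depends on (much like import; the implementation is fairly involved, see the source). The other piece is overriding config values from the command line, which is done by LazyConfig.apply_overrides; with that, the script works much like the original train_net.py. In fact, without this override step there would be no need for LazyCall at all. What remains is to actually instantiate the modules, such as model. The invocation is essentially the same as before: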

python tools/lazyconfig_train_net.py --config-file=configs/COCO-Detection/retinanet_R_50_FPN_1x.py \
    --num-gpus=8 train.output_dir=/path/to/output
# The override syntax differs slightly from the yaml-based one, but is largely the same
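
The load-then-override flow can be imitated with the standard library alone. The sketch below is a deliberately simplified stand-in, not detectron2's code: `load_py_config` and `apply_dotted_overrides` are hypothetical helpers, the config is a nested dict rather than a DictConfig, and the toy config file is written to a temporary directory. The real LazyConfig.load does considerably more, as noted above.

```python
import ast
import runpy
import tempfile
from pathlib import Path


def load_py_config(path):
    """Execute a Python file and keep its public top-level names as the config."""
    namespace = runpy.run_path(str(path))
    return {k: v for k, v in namespace.items() if not k.startswith("_")}


def apply_dotted_overrides(cfg, overrides):
    """Apply `a.b.c=value` strings to a nested dict, in place."""
    for item in overrides:
        dotted_key, sep, raw_value = item.partition("=")
        if not sep:
            raise ValueError(f"override must look like key=value, got {item!r}")
        try:
            value = ast.literal_eval(raw_value)  # numbers, booleans, lists, ...
        except (ValueError, SyntaxError):
            value = raw_value  # fall back to a plain string, e.g. a path
        *parents, leaf = dotted_key.split(".")
        node = cfg
        for name in parents:
            node = node[name]  # a KeyError here means the key path does not exist
        node[leaf] = value
    return cfg


CONFIG_SOURCE = '''
train = dict(output_dir="./output", max_iter=90000)
optimizer = dict(lr=0.02, momentum=0.9)
'''

with tempfile.TemporaryDirectory() as tmp:
    config_path = Path(tmp) / "toy_config.py"
    config_path.write_text(CONFIG_SOURCE)
    cfg = load_py_config(config_path)

cfg = apply_dotted_overrides(
    cfg, ["train.output_dir=/path/to/output", "optimizer.lr=0.01"]
)
print(cfg["train"]["output_dir"], cfg["optimizer"]["lr"])
```

Since the overrides are applied to the description and not to live objects, they take effect no matter which class the description eventually builds.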

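Compared with the old yaml config files, LazyConfig config files are more readable, and they are more concise because they contain only the parameters of the current model. Note also that LazyConfig no longer depends on the registry mechanism: importing the relevant module is enough (the registry mechanism cannot skip the import step either, since a newly added module still has to be imported).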

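Overall, LazyConfig is more flexible, and detectron2 is itself a very flexible framework. With LazyConfig it is also very convenient to use detectron2 for other tasks (the emphasis is on convenience; yaml can do it too, with a bit more effort). The official repository provides an example that trains a ResNet using the model from torchvision, which really just means defining the corresponding model, dataloader and optimizer. For the model, the main requirement is to implement the same interface as d2: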

import torch.nn.functional as F
from torch import nn
from torchvision.models.resnet import ResNet, Bottleneck

from detectron2.config import LazyCall as L

# Classification network: wraps the model to expose the same interface as d2
class ClassificationNet(nn.Module):
    def __init__(self, model: nn.Module):
        super().__init__()
        self.model = model

    @property
    def device(self):
        return list(self.model.parameters())[0].device

    # In d2 the loss is computed inside forward; in training mode forward
    # returns a dict of losses or a single Tensor
    def forward(self, inputs):
        image, label = inputs
        pred = self.model(image.to(self.device))
        if self.training:
            label = label.to(self.device)
            return F.cross_entropy(pred, label)
        else:
            return pred

# LazyCall model (the inner ResNet is wrapped with L too, so it stays lazy)
model = L(ClassificationNet)(
    model=L(ResNet)(block=Bottleneck, layers=[3, 4, 6, 3], zero_init_residual=True)
)

For the dataloader, the dataset and transforms from torchvision can be used directly, wrapped in a standard PyTorch DataLoader:

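import torch
import torchvision
from omegaconf import OmegaConf
from torchvision.transforms import transforms as T

from detectron2.data.samplers import TrainingSampler, InferenceSampler

def build_data_loader(dataset, batch_size, num_workers, training=True):
    return torch.utils.data.DataLoader(
        dataset,
        sampler=(TrainingSampler if training else InferenceSampler)(len(dataset)),
        batch_size=batch_size,
        num_workers=num_workers,
        pin_memory=True,
    )

# The container has to exist before attributes can be assigned to it
dataloader = OmegaConf.create()
dataloader.train = L(build_data_loader)(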
    dataset=L(torchvision.datasets.ImageNet)(
        root="/path/to/imagenet",
        split="train",
        transform=L(T.Compose)(
            transforms=[
                L(T.RandomResizedCrop)(size=224),
                L(T.RandomHorizontalFlip)(),
                T.ToTensor(),
                L(T.Normalize)(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
            ]
        ),
    ),
    batch_size=256 // 8,
    num_workers=4,
    training=True,
)

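In addition, the dataloader config needs to include an evaluator for measuring model quality. detectron2's built-in evaluators all target detection models, so for a classification model we have to implement our own. The main classification metric is accuracy: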

# This evaluator is rather bare-bones and only reports top-1 accuracy.
# It must follow the same interface as d2 evaluators: process and evaluate
class ClassificationAcc(DatasetEvaluator):
    def reset(self):
        self.corr = self.total = 0

    def process(self, inputs, outputs):
        image, label = inputs
        self.corr += (outputs.argmax(dim=1).cpu() == label.cpu()).sum().item()
        self.total += len(label)

    def evaluate(self):
        all_corr_total = comm.all_gather([self.corr, self.total])
        corr = sum(x[0] for x in all_corr_total)
        total = sum(x[1] for x in all_corr_total)
        return {"accuracy": corr / total}
 
# LazyCall
dataloader.evaluator = L(ClassificationAcc)()
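
One detail of the evaluator above is worth spelling out: it gathers the raw counts from every worker and divides once, instead of averaging per-worker accuracies. The pure-Python sketch below illustrates why that matters when shards have different sizes. `ToyAccuracy` and `merge_counts` are hypothetical names, predictions are plain lists of class indices, and the gather step is simulated with an ordinary list, so none of this is detectron2 API.

```python
class ToyAccuracy:
    """Pure-Python stand-in that mirrors the reset/process flow of an evaluator."""

    def reset(self):
        self.corr = self.total = 0

    def process(self, predictions, labels):
        self.corr += sum(int(p == t) for p, t in zip(predictions, labels))
        self.total += len(labels)

    def counts(self):
        return [self.corr, self.total]


def merge_counts(all_counts):
    """Combine per-worker [correct, total] pairs, as the gather step would."""
    corr = sum(c for c, _ in all_counts)
    total = sum(t for _, t in all_counts)
    return {"accuracy": corr / total}


# Two simulated workers with shards of different sizes.
shards = [
    ([1, 0, 2, 2], [1, 0, 2, 1]),  # 3 of 4 correct
    ([0, 1], [1, 0]),              # 0 of 2 correct
]
gathered = []
for predictions, labels in shards:
    evaluator = ToyAccuracy()
    evaluator.reset()
    evaluator.process(predictions, labels)
    gathered.append(evaluator.counts())

result = merge_counts(gathered)
# Averaging the two per-worker accuracies would weight the small shard too heavily.
naive = sum(c / t for c, t in gathered) / len(gathered)
print(result["accuracy"], naive)
```

Summing counts gives 3 correct out of 6 samples, while the naive average of 0.75 and 0.0 gives a different number, which is why the counts are gathered before dividing.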

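The remaining parts are straightforward and are not covered here. With them in place, lazyconfig_train_net can train a classification model. As for semantic segmentation, detectron2 already supports it: projects/DeepLab, for example, provides implementations of DeepLabv3 and DeepLabv3+, although they are yaml-based. Using LazyConfig would be simpler and more direct.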

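detectron2 can also train mmdetection models. The official code provides two wrappers, MMDetBackbone and MMDetDetector, which make an mmdetection backbone and detector, respectively, conform to detectron2's interface (the conversion happens inside the wrapper). With these wrappers, mmdetection models can be trained in detectron2, and there is an official example that trains Mask R-CNN. Most importantly, the wrappers are fully compatible with mmdetection config files. In my tests, however, training was much slower, probably because the wrapper converts between data formats on every step; perhaps that conversion could be moved into the dataloader. I suspect detectron2 includes this implementation mainly to demonstrate the framework's compatibility and flexibility.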