Mathematica Numerica Sinica
Mathematica Numerica Sinica, 2020, Vol. 42, Issue 3: 349-369    DOI:
STOCHASTIC TRAINING OF RESIDUAL NETWORKS IN DEEP LEARNING
Sun Qi1, Tao Yunzhe2, Du Qiang3
1 Beijing International Center for Mathematical Research, Peking University, Beijing 100871, China;
2 Amazon Web Services Artificial Intelligence, Seattle, WA 98121, USA;
3 Department of Applied Physics and Applied Mathematics, Columbia University, New York, NY 10027, USA
Abstract: Stochastic training of artificial neural networks, used to improve the generalization capability of deep learning models, has attracted much attention in recent years. With the model parameters of the network architecture prescribed, the method of modified equations can be adopted to characterize how data features propagate under the stochastic training approach. It shows that residual networks and their variants with noise injection, as used in deep learning, can be regarded as weak approximations of stochastic differential equations. Such connections allow us to relate the stochastic training process to an optimal control problem for backward Kolmogorov equations, hence offering a new understanding of the regularization effects on the loss landscape from the perspective of differential equations and their optimal control, and further suggesting reliable, efficient, and explainable stochastic training strategies. A binary classification task based on Bernoulli dropout within the residual network architecture is used here as an example to illustrate and substantiate these theoretical claims.
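The central correspondence stated in the abstract, that a residual block whose branch is masked by Bernoulli dropout behaves like a weak discretization of a stochastic differential equation, can be sketched numerically. The drift `f`, step size `h`, and keep probability `p` below are illustrative assumptions, not the paper's exact construction: the point is only that the dropout update and an Euler-Maruyama step with matched diffusion share the same one-step mean and variance.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # Illustrative residual-branch drift; stands in for a trained subnetwork.
    return -x + np.tanh(x)

def resnet_dropout_step(x, h=0.1, p=0.8):
    # Residual update x_{k+1} = x_k + h * (m / p) * f(x_k), where m is a
    # Bernoulli(p) mask on the residual branch (inverted-dropout scaling).
    m = rng.binomial(1, p, size=x.shape)
    return x + h * (m / p) * f(x)

def euler_maruyama_step(x, h=0.1, p=0.8):
    # Weak Euler-Maruyama step of dX = f(X) dt + sigma(X) dW, with sigma
    # chosen so the Gaussian increment matches the dropout variance:
    # Var[h * (m/p) * f(x)] = h^2 * f(x)^2 * (1-p)/p  per coordinate.
    dw = rng.normal(0.0, np.sqrt(h), size=x.shape)
    sigma = np.sqrt(h * (1 - p) / p) * np.abs(f(x))
    return x + h * f(x) + sigma * dw
```

Averaging many one-step samples from the same state shows the two schemes agree in mean and variance of the increment, which is the sense in which the dropout-trained residual network weakly approximates the SDE.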
Key words: deep learning; residual network; dropout; stochastic differential equations; optimal control of partial differential equations; explainability