Tie-Yan Liu (刘铁岩)
- Deputy Director of Microsoft Research Asia, Principal Researcher
Key Technical Areas
- Computer Vision
ImageNet: deep learning breakthroughs in 2011-2012
ResNet (Residual Network)
- Speech
Speech recognition
In late 2016, Microsoft brought the speech recognition word error rate down to 5.1% (the "magic number": roughly the human error rate)
- Natural Language
Machine translation is still below human level, but not far from it
How to quantify translation accuracy? N-gram matching gives a rough measure
The industry expects that within a year machines could surpass expert simultaneous interpreters
- Games
AlphaGo
Key Industries
Security
Public security and traffic domains
Techniques: person analysis, vehicle analysis, behavior analysis
Industry Trend
Capital is flowing toward technology
Boao Forum for Asia: security provided by Face++
Autonomous Driving
Google, Baidu, Mobileye, Tesla, Mercedes-Benz, BMW
Main challenges: complex road conditions, ethics, legal provisions
Who is liable when a self-driving car hits someone?
Industry Trend
Baidu: Apollo program
Google: 2 million miles of road-test data
Mobileye: 30 million km of road-test data
Tesla: ended its partnership with Mobileye in 2016
Healthcare
The most digitized field (computer-assisted techniques are long established: routine blood tests, CT, ...)
- Computer-aided diagnosis systems based on big data (CT, MRI)
- Medical knowledge graphs
- Intelligent medical advisors
- Genetic engineering
- Drug development and immunology
Deep Learning
An end-to-end learning approach that uses a highly complex model (nonlinear, multi-layer) to fit the training data from scratch.
One can work on genomics without first spending years studying biology.
LightGBM
Faster than XGBoost
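A minimal sketch of training LightGBM through its scikit-learn-style interface (the dataset is synthetic and the hyperparameter values are arbitrary illustrations, not recommendations):

```python
import numpy as np
import lightgbm as lgb

# Synthetic data for illustration only: 1000 samples, 20 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Gradient-boosted trees; LightGBM's histogram-based tree growth is
# one reason it is typically faster than XGBoost on large datasets.
model = lgb.LGBMClassifier(n_estimators=100, learning_rate=0.1)
model.fit(X, y)
print(model.predict(X[:5]))
```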
Basic Machine Learning Concepts
- The goal: to learn a model from experience/data
Training data → model
- Test/inference/prediction
- Validation sets for hyperparameter tuning
- Training: empirical loss minimization
Loss Function L
1. Linear regression
2. SVM
3. Maximum likelihood
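A minimal numpy sketch of the three losses named above (the function names are my own):

```python
import numpy as np

def squared_loss(y_true, y_pred):
    # Linear regression: mean of (y - y_hat)^2
    return np.mean((y_true - y_pred) ** 2)

def hinge_loss(y_true, score):
    # SVM: labels in {-1, +1}; mean of max(0, 1 - y * f(x))
    return np.mean(np.maximum(0.0, 1.0 - y_true * score))

def negative_log_likelihood(y_true, prob):
    # Maximum likelihood for binary labels in {0, 1}:
    # -mean of [y log p + (1 - y) log(1 - p)]
    prob = np.clip(prob, 1e-12, 1 - 1e-12)  # avoid log(0)
    return -np.mean(y_true * np.log(prob) + (1 - y_true) * np.log(1 - prob))
```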
Biological Motivation and Connections
Dendrite: receives input signals
Synapse: connection where signals pass between neurons
Axon: carries the output signal
Perceptron
Feedforward Neural Networks
Universal Approximation Theorem: any bounded continuous function can be approximated arbitrarily well by a neural network with at least one hidden layer
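One common formulation of the theorem, in LaTeX (the notation is a standard choice, not from the talk):

```latex
% For any continuous f on a compact set K, sigmoidal activation sigma,
% and any eps > 0, there exist N, v_i, w_i, b_i such that a
% one-hidden-layer network is eps-close to f everywhere on K:
\forall x \in K:\quad
\Bigl| f(x) - \sum_{i=1}^{N} v_i \,\sigma\!\left(w_i^{\top} x + b_i\right) \Bigr| < \varepsilon
```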
Hidden Units: Sigmoid and Tanh
Sigmoid: f(x) = 1 / (1 + e^(-x))
Rectified Linear Units (ReLU): f(x) = max(0, x)
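A small numpy sketch of these hidden-unit activations:

```python
import numpy as np

def sigmoid(x):
    # f(x) = 1 / (1 + e^(-x)); squashes inputs into (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # squashes inputs into (-1, 1); numpy provides it directly
    return np.tanh(x)

def relu(x):
    # f(x) = max(0, x); does not saturate for positive inputs
    return np.maximum(0.0, x)
```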
Loss Function
Cross-entropy
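A sketch of softmax cross-entropy for a single example (the variable names are my own):

```python
import numpy as np

def softmax_cross_entropy(logits, label):
    # logits: raw class scores; label: index of the true class.
    shifted = logits - np.max(logits)  # subtract max for numerical stability
    log_probs = shifted - np.log(np.sum(np.exp(shifted)))
    return -log_probs[label]           # -log p(true class)
```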
Gradient Descent
GD is guaranteed to converge (under standard conditions), but each step is computationally expensive
SGD (stochastic gradient descent) is much faster per step, and the stochastic gradient is an unbiased estimate of the full gradient
SGD has its own problem: the variance can be very large, the noise can swamp the small steps of convergence, and convergence is not guaranteed
Fix: use a learning-rate schedule whose terms sum to infinity but whose squares have a finite sum, which restores convergence
In practice a compromise is used: minibatch SGD
The above are the basic methods
Many tricks and improvements are used today
e.g., Momentum SGD, Nesterov momentum
AdaGrad
Adam
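A numpy sketch of these update rules (the parameter names are my own; Adam is shown with its usual default constants):

```python
import numpy as np

def sgd_step(w, grad, lr=0.1):
    # Vanilla (minibatch) SGD: step against the gradient.
    return w - lr * grad

def momentum_step(w, v, grad, lr=0.1, beta=0.9):
    # Momentum SGD: accumulate a velocity to damp oscillations.
    v = beta * v + grad
    return w - lr * v, v

def adam_step(w, m, v, grad, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    # Adam: per-coordinate step sizes from first/second moment estimates.
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)  # bias correction; t starts at 1
    v_hat = v / (1 - b2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v
```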
Regularization for deep learning
Overfitting
Generalization gap
Dropout: prevents units from co-adapting too much
Batch Normalization: the distribution of each layer's inputs changes during training; normalization with learnable parameters
Weight decay (or L2 parameter norm penalty)
Early Stopping
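A sketch of inverted dropout, the common implementation of the idea above (keep_prob is my own parameter name):

```python
import numpy as np

def dropout(x, keep_prob=0.5, training=True):
    # Randomly zero units during training and rescale by 1/keep_prob,
    # so activations keep the same expected value and no change is
    # needed at test time.
    if not training:
        return x
    mask = (np.random.rand(*x.shape) < keep_prob) / keep_prob
    return x * mask
```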
Convolutional neural networks
Local connectivity
Mimics the human pattern-recognition process
Convolution kernels: learned via SGD
Pooling: reduces dimensionality
An example: VGG
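A naive numpy sketch of a single-channel convolution and max pooling (loop-based for clarity, not efficiency):

```python
import numpy as np

def conv2d(image, kernel):
    # Slide one learned kernel over the image ("valid" convolution):
    # each output pixel sees only a local patch (local connectivity).
    H, W = image.shape
    kH, kW = kernel.shape
    out = np.zeros((H - kH + 1, W - kW + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kH, j:j + kW] * kernel)
    return out

def max_pool(x, size=2):
    # Non-overlapping max pooling: reduces spatial dimensions,
    # keeping the strongest response in each window.
    H, W = x.shape
    x = x[:H - H % size, :W - W % size]
    return x.reshape(H // size, size, W // size, size).max(axis=(1, 3))
```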
- Gradient Vanishing
In deep networks, the gradient effectively vanishes and cannot drive learning
The sigmoid's derivative is at most 0.25; multiplying many such factors through deep layers makes the gradient tiny
Solution: Residual Networks (ResNet); see the sketch below
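A minimal sketch of one residual block (the weights and two-layer residual path are my own simplification):

```python
import numpy as np

def residual_block(x, w1, w2):
    # y = x + F(x): the identity shortcut gives gradients a direct path
    # around the nonlinear layers, mitigating vanishing gradients.
    h = np.maximum(0.0, x @ w1)  # ReLU(x @ W1)
    return x + h @ w2            # add the shortcut connection
```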
- What's Missing?
Feedforward networks and CNNs
However, many applications involve sequences with variable lengths
Recurrent Neural Networks (RNN)
We can process a sequence of vectors x by applying a recurrence formula at every time step
The hidden state carries memory of inputs from previous time steps
- Many to One: input a sequence, output a single value
- One to Many: input a single vector, output a sequence (e.g., image captioning)
- Many to Many: language modeling (predicting the next word); encoder-decoder for sequence generation
Same problem: the unrolled network becomes too long (gradients vanish over long sequences)
Solution: Long Short-Term Memory (LSTM); the vanilla recurrence is sketched below
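A sketch of the vanilla recurrence described above (the weight names are my own; LSTM replaces this single update with gated cell-state updates):

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    # h_t = tanh(x_t @ W_xh + h_{t-1} @ W_hh + b): the same formula
    # is applied at every time step.
    return np.tanh(x_t @ W_xh + h_prev @ W_hh + b_h)

def run_rnn(xs, h0, W_xh, W_hh, b_h):
    # Because the recurrence is reused, sequences of any length work.
    h = h0
    for x_t in xs:
        h = rnn_step(x_t, h, W_xh, W_hh, b_h)
    return h  # final hidden state, e.g., for a many-to-one task
```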
Deep learning toolkits
- TensorFlow (Google)
- Caffe (UC Berkeley)
- CNTK (Microsoft)
- MXNet (Amazon)
- Torch7 (NYU/Facebook)
- Theano (U Montreal)
Image classification: Caffe, Torch
Text: Theano
Large scale: CNTK
Feature richness: TensorFlow
Advanced topics in deep learning
Challenges of deep learning
- Relying on Big Training Data
- Relying on Big Computation
- Modify Coefficients
- Lack of interpretability
Black box or white box?
- Lack of Diverse Tech Roadmaps
More and more NIPS/ICML papers are about deep learning
- Overlooking Differences between Animals and Humans
Deep learning solves function-fitting problems; it is still far from true intelligence
Dual learning
- A New View: The Beauty of Symmetry
Dual learning using only 10% of the bilingual data (NIPS 2016)
Lightweight deep learning
LightRNN
Distributed deep learning
Convex Problems
The Universal Approximation Theorem is only an existence statement