Deep learning is a new area of machine-learning research, introduced with the objective of moving machine learning closer to one of its original goals: artificial intelligence. Deep learning is about learning multiple levels of representation and abstraction that help to make sense of data such as images, sound and text.

Conventional machine-learning techniques were limited in their ability to process natural data in their raw form. For decades, constructing a pattern-recognition or machine-learning system required careful engineering and considerable domain expertise to design a feature extractor that transformed the raw data (such as the pixel values of an image) into a suitable internal representation or feature vector from which the learning subsystem, often a classifier, could detect or classify patterns in the input.
Deep learning methods are representation-learning methods with multiple levels of representation, obtained by composing simple but non-linear modules that each transform the representation at one level into a representation at a higher, slightly more abstract level. With the composition of enough such transformations, very complex functions can be learned. For classification tasks, higher layers of representation amplify aspects of the input that are important for discrimination and suppress irrelevant variations. An image, for example, comes in the form of an array of pixel values, and the learned features in the first layer of representation typically represent the presence or absence of edges at particular orientations and locations in the image. The second layer typically detects motifs by spotting particular arrangements of edges, regardless of small variations in the edge positions. The third layer may assemble motifs into larger combinations that correspond to parts of familiar objects, and subsequent layers would detect objects as combinations of these parts. The key aspect of deep learning is that these layers of features are not designed by human engineers: they are learned from data using a general-purpose learning procedure.
A deep-learning architecture is a multilayer stack of simple modules, all of which are subject to learning, and many of which compute non-linear input-output mappings. Each module in the stack transforms its input to increase both the selectivity and the invariance of the representation. With multiple non-linear layers, a system can implement extremely intricate functions of its inputs that are simultaneously sensitive to minute details (distinguishing Samoyeds from white wolves, for example) and insensitive to large irrelevant variations such as the background, pose, lighting and surrounding objects.
Backpropagation to train multilayer architectures
From the earliest days of pattern recognition, the aim of researchers has been to replace hand-engineered features with trainable multilayer networks, but despite its simplicity, the solution was not widely understood until the mid-1980s. As it turns out, multilayer architectures can be trained by simple stochastic gradient descent (SGD). As long as the modules are relatively smooth functions of their inputs and of their internal weights, one can compute gradients using the backpropagation procedure. The idea that this could be done, and that it worked, was discovered independently by several different groups during the 1970s and 1980s.
The backpropagation procedure to compute the gradient of an objective function with respect to the weights of a multilayer stack of modules is nothing more than a practical application of the chain rule for derivatives. The key insight is that the derivative of the objective with respect to the input of a module can be computed by working backwards from the gradient with respect to the output of that module.
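To make the chain-rule computation concrete, here is a minimal NumPy sketch (not code from the original text) of backpropagation and a gradient-descent update for a tiny two-layer network. The layer sizes, learning rate and names such as W1, W2 and lr are illustrative choices for this example only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 4 examples, 3 input features, scalar regression target.
x = rng.normal(size=(4, 3))
y = rng.normal(size=(4, 1))

# Illustrative two-layer network: 3 -> 5 -> 1.
W1 = rng.normal(scale=0.1, size=(3, 5))
W2 = rng.normal(scale=0.1, size=(5, 1))
lr = 0.1  # learning rate for the gradient step

for step in range(100):
    # Forward pass: weighted sums followed by a ReLU non-linearity.
    z1 = x @ W1             # pre-activations of the hidden layer
    h1 = np.maximum(z1, 0)  # ReLU
    y_hat = h1 @ W2         # linear output layer

    # Objective: mean squared error.
    err = y_hat - y
    loss = np.mean(err ** 2)

    # Backward pass: the chain rule, working backwards from the output.
    d_yhat = 2 * err / len(x)   # dL/d(y_hat)
    dW2 = h1.T @ d_yhat         # dL/dW2
    d_h1 = d_yhat @ W2.T        # gradient w.r.t. hidden activations
    d_z1 = d_h1 * (z1 > 0)      # ReLU derivative gates the gradient
    dW1 = x.T @ d_z1            # dL/dW1

    # Gradient-descent update (the full toy set acts as one batch here;
    # SGD would use a small random subset of examples per step).
    W1 -= lr * dW1
    W2 -= lr * dW2
```

Each backward line reuses a gradient already computed for the module above it, which is precisely why the procedure is a practical application of the chain rule rather than anything more exotic.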
[Figure caption, partially recovered: the example network shown has a small number of units and one output unit, but the networks used for object recognition or natural language processing contain tens or hundreds of thousands of units.]
Many applications of deep learning use feedforward neural-network architectures, which learn to map a fixed-size input to a fixed-size output. To go from one layer to the next, a set of units compute a weighted sum of their inputs from the previous layer and pass the result through a non-linear function. At present, the most popular non-linear function is the rectified linear unit (ReLU), which is simply the half-wave rectifier f(z) = max(z, 0). In past decades, neural nets used smoother non-linearities, such as tanh(z) or the logistic function 1/(1 + exp(-z)), but the ReLU typically learns much faster in networks with many layers, allowing training of a deep supervised network without unsupervised pre-training. Units that are not in the input or output layer are conventionally called hidden units. The hidden layers can be seen as distorting the input in a non-linear way so that categories become linearly separable by the last layer.
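As an illustrative sketch (assuming NumPy; the function names such as forward and the layer widths are invented for this example), the forward pass described above is nothing more than a stack of weighted sums and non-linearities:

```python
import numpy as np

def relu(z):
    # Half-wave rectification: f(z) = max(z, 0).
    return np.maximum(z, 0)

def logistic(z):
    # A smoother non-linearity used in earlier decades: 1 / (1 + exp(-z)).
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, weights, nonlinearity=relu):
    # Each layer computes a weighted sum of the previous layer's outputs
    # and passes it through a non-linearity; the final layer is left linear.
    h = x
    for W in weights[:-1]:
        h = nonlinearity(h @ W)  # hidden layers distort the input non-linearly
    return h @ weights[-1]

rng = np.random.default_rng(1)
sizes = [4, 8, 8, 2]  # illustrative widths: input, two hidden layers, output
weights = [rng.normal(scale=0.3, size=(m, n)) for m, n in zip(sizes, sizes[1:])]
x = rng.normal(size=(5, 4))       # a batch of five fixed-size inputs
print(forward(x, weights).shape)  # (5, 2)
print(forward(x, weights, nonlinearity=logistic).shape)  # same, older non-linearity
```

Swapping relu for logistic (or np.tanh) changes only the non-linearity, which is exactly the design choice the paragraph above contrasts.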
Convolutional neural networks
ConvNets are designed to process data that come in the form of multiple arrays, for example a colour image composed of three 2D arrays containing pixel intensities in the three colour channels. Many data modalities are in the form of multiple arrays: 1D for signals and sequences, including language; 2D for images or audio spectrograms; and 3D for video or volumetric images. There are four key ideas behind ConvNets that take advantage of the properties of natural signals: local connections, shared weights, pooling and the use of many layers.
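A rough sketch of the first three of these ideas, in plain NumPy rather than any particular deep-learning library: one small kernel is slid over the whole image (local connections with shared weights), and the resulting feature map is max-pooled. The kernel values and array sizes are illustrative.

```python
import numpy as np

def conv2d(image, kernel):
    # "Valid" 2D convolution (implemented as cross-correlation): the same
    # small kernel is applied at every location, which is exactly the
    # local-connections-plus-shared-weights idea.
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(fmap, size=2):
    # Non-overlapping max pooling: keep the strongest response in each
    # patch, making the representation robust to small shifts.
    H = fmap.shape[0] - fmap.shape[0] % size
    W = fmap.shape[1] - fmap.shape[1] % size
    fm = fmap[:H, :W]
    return fm.reshape(H // size, size, W // size, size).max(axis=(1, 3))

rng = np.random.default_rng(2)
image = rng.normal(size=(8, 8))   # one channel of a tiny image
kernel = np.array([[1.0, -1.0]])  # illustrative edge-like detector
features = max_pool(np.maximum(conv2d(image, kernel), 0))  # conv -> ReLU -> pool
print(features.shape)             # (4, 3)
```

Stacking several such convolution-plus-pooling stages, with learned rather than hand-set kernels, gives the fourth idea: the use of many layers.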