
Deep Learning

by Ian Goodfellow, Yoshua Bengio, and Aaron Courville

25 popular highlights from this book

Key Insights & Memorable Quotes

Below are the most popular and impactful highlights and quotes from Deep Learning:

A representation learning algorithm can discover a good set of features for a simple task in minutes, or for a complex task in hours to months.
Hochreiter and Schmidhuber (1997) introduced the long short-term memory (LSTM) network to resolve some of these difficulties. Today, the LSTM is widely used for many sequence modeling tasks, including many natural language processing tasks at Google.
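
To make the quote concrete, here is a minimal sketch of a single LSTM step in NumPy. The stacked parameter layout (W, U, b holding all four gates at once) and the shapes are illustrative assumptions, not code from the book:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One step of a standard LSTM cell on input x.

    W (4n x d), U (4n x n), and b (4n,) stack the parameters of the
    input, forget, and output gates and the candidate update; this
    stacked layout is an assumption made for brevity.
    """
    n = h_prev.shape[0]
    z = W @ x + U @ h_prev + b   # all four gate pre-activations at once
    i = sigmoid(z[:n])           # input gate: how much new content to write
    f = sigmoid(z[n:2*n])        # forget gate: how much old cell state to keep
    o = sigmoid(z[2*n:3*n])      # output gate: how much cell state to expose
    g = np.tanh(z[3*n:])         # candidate cell update
    c = f * c_prev + i * g       # cell state carries information across time
    h = o * np.tanh(c)           # hidden state passed to the next time step
    return h, c
```

The gated cell state is what resolves the difficulties the quote alludes to: gradients can flow through c largely unimpeded across many time steps.
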
∑_{x∈X} P(x) = 1. We refer to this property as being normalized. Without this property, we could obtain probabilities greater than one by computing the probability of one of many events occurring.
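
A quick worked check of the normalization property, using a fair six-sided die as a toy PMF (the example is ours, not the book's):

```python
from fractions import Fraction

# A toy PMF over the faces of a fair die: each outcome has probability 1/6.
pmf = {face: Fraction(1, 6) for face in range(1, 7)}

assert sum(pmf.values()) == 1  # normalized: the probabilities sum to exactly 1

# P(roll is even) = P(2) + P(4) + P(6); normalization guarantees that the
# probability of "one of many events occurring" can never exceed 1.
p_even = sum(p for face, p in pmf.items() if face % 2 == 0)
print(p_even)  # 1/2
```
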
Probability mass functions can act on many variables at the same time. Such a probability distribution over many variables is known as a joint probability distribution. P(X = x, Y = y) denotes the probability that X = x and Y = y simultaneously. We may also write P(x, y) for brevity.
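
A small sketch of a joint PMF stored as a table; the probabilities below are invented for illustration, and summing out one variable recovers the marginal distribution of the other:

```python
import numpy as np

# Joint PMF P(x, y) over two binary variables, as a 2x2 table.
P = np.array([[0.1, 0.2],   # P(x=0, y=0), P(x=0, y=1)
              [0.3, 0.4]])  # P(x=1, y=0), P(x=1, y=1)

assert np.isclose(P.sum(), 1.0)  # the joint distribution is also normalized

P_x = P.sum(axis=1)  # marginal P(x): sum over y of P(x, y)
P_y = P.sum(axis=0)  # marginal P(y): sum over x of P(x, y)
print(P_x, P_y)      # [0.3 0.7] [0.4 0.6]
```
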
A probability distribution over discrete variables may be described using a probability mass function (PMF).
Logic provides a set of formal rules for determining what propositions are implied to be true or false given the assumption that some other set of propositions is true or false. Probability theory provides a set of formal rules for determining the likelihood of a proposition being true given the likelihood of other propositions.
The former kind of probability, related directly to the rates at which events occur, is known as frequentist probability, while the latter, related to qualitative levels of certainty, is known as Bayesian probability.
Software libraries such as Theano (Bergstra et al., 2010; Bastien et al., 2012), PyLearn2 (Goodfellow et al., 2013c), Torch (Collobert et al., 2011b), DistBelief (Dean et al., 2012), Caffe (Jia, 2013), MXNet (Chen et al., 2015), and TensorFlow (Abadi et al., 2015) have all supported important research projects or commercial products.
Another crowning achievement of deep learning is its extension to the domain of reinforcement learning. In the context of reinforcement learning, an autonomous agent must learn to perform a task by trial and error, without any guidance from the human operator. DeepMind demonstrated that a reinforcement learning system based on deep learning is capable of learning to play Atari video games, reaching human-level performance on many tasks.
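
DeepMind's Atari system (DQN) replaces a lookup table with a deep convolutional network, but its trial-and-error core is ordinary Q-learning. Here is a sketch of that tabular core; `env` is a hypothetical environment with a Gym-style reset()/step() interface, assumed for illustration rather than taken from any real library:

```python
import random

def q_learning(env, n_states, n_actions, episodes=500,
               alpha=0.1, gamma=0.99, epsilon=0.1):
    """Tabular Q-learning: learn action values purely by trial and error."""
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            # Epsilon-greedy: mostly exploit what was learned, sometimes explore.
            if random.random() < epsilon:
                a = random.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda i: Q[s][i])
            s_next, r, done = env.step(a)  # assumed (state, reward, done) return
            # Temporal-difference update toward the one-step bootstrapped target.
            target = r + (0.0 if done else gamma * max(Q[s_next]))
            Q[s][a] += alpha * (target - Q[s][a])
            s = s_next
    return Q
```
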
Even today’s networks, which we consider quite large from a computational systems point of view, are smaller than the nervous system of even relatively primitive vertebrate animals like frogs.
Working successfully with datasets smaller than this is an important research area, focusing in particular on how we can take advantage of large quantities of unlabeled examples, with unsupervised or semi-supervised learning.
As of 2016, a rough rule of thumb is that a supervised deep learning algorithm will generally achieve acceptable performance with around 5,000 labeled examples per category and will match or exceed human performance when trained with a dataset containing at least 10 million labeled examples.
The third wave of neural networks research began with a breakthrough in 2006. Geoffrey Hinton showed that a kind of neural network called a deep belief network could be efficiently trained using a strategy called greedy layer-wise pretraining (Hinton et al., 2006).
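
Hinton's original procedure stacks restricted Boltzmann machines; the sketch below substitutes stacked autoencoders, a closely related variant, to show the greedy one-layer-at-a-time idea in PyTorch. All sizes and hyperparameters are illustrative:

```python
import torch
import torch.nn as nn

def pretrain(layer_sizes, data, epochs=10, lr=1e-3):
    """Greedily pretrain a stack of layers, one layer at a time."""
    layers, h = [], data
    for d_in, d_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        enc = nn.Sequential(nn.Linear(d_in, d_out), nn.Sigmoid())
        dec = nn.Linear(d_out, d_in)  # throwaway decoder for this layer only
        opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=lr)
        for _ in range(epochs):
            opt.zero_grad()
            loss = nn.functional.mse_loss(dec(enc(h)), h)  # reconstruct this layer's input
            loss.backward()
            opt.step()
        layers.append(enc)
        h = enc(h).detach()  # greedy: freeze, feed activations to the next layer
    return nn.Sequential(*layers)  # the stack can then be fine-tuned end to end
```
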
At this point, deep networks were generally believed to be very difficult to train. We now know that algorithms that have existed since the 1980s work quite well, but this was not apparent circa 2006. The issue is perhaps simply that these algorithms were too computationally costly to allow much experimentation with the hardware available at the time.
Another major accomplishment of the connectionist movement was the successful use of back-propagation to train deep neural networks with internal representations and the popularization of the back-propagation algorithm (Rumelhart et al., 1986a; LeCun, 1987). This algorithm has waxed and waned in popularity but, as of this writing, is the dominant approach to training deep models.
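
To make the back-propagation quote concrete, here is a minimal two-layer NumPy example on a toy regression task, with the chain rule applied by hand from the output back to the input; the architecture and learning rate are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 3))            # 64 examples, 3 inputs
y = X @ np.array([1.0, -2.0, 0.5])      # toy regression target
W1 = rng.normal(size=(3, 8))            # weights into the hidden layer
W2 = rng.normal(size=(8,))              # weights out of the hidden layer
lr = 0.01

for _ in range(200):
    # Forward pass, keeping intermediates for the backward pass.
    h = np.tanh(X @ W1)                 # internal (hidden) representation
    y_hat = h @ W2
    err = y_hat - y
    # Backward pass: chain rule, layer by layer from output to input.
    grad_W2 = h.T @ err / len(X)
    grad_h = np.outer(err, W2) * (1 - h**2)   # back through the tanh
    grad_W1 = X.T @ grad_h / len(X)
    W1 -= lr * grad_W1
    W2 -= lr * grad_W2
```
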
Several key concepts arose during the connectionism movement of the 1980s that remain central to today’s deep learning. One of these concepts is that of distributed representation (Hinton et al., 1986). This is the idea that each input to a system should be represented by many features, and each feature should be involved in the representation of many possible inputs.
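
A toy contrast between a local (one-hot) code and a distributed code; the concepts and feature values here are invented purely to illustrate the idea:

```python
# Local (one-hot) code: one unit per concept, so inputs never share features.
one_hot = {"cat": [1, 0, 0], "dog": [0, 1, 0], "truck": [0, 0, 1]}

# Distributed code over assumed features [is_animal, has_fur, has_wheels, is_large]:
# each input activates many features, and each feature serves many inputs.
distributed = {"cat":   [1, 1, 0, 0],
               "dog":   [1, 1, 0, 1],
               "truck": [0, 0, 1, 1]}

def overlap(a, b):
    return sum(u * v for u, v in zip(a, b))

print(overlap(one_hot["cat"], one_hot["dog"]))          # 0: nothing shared
print(overlap(distributed["cat"], distributed["dog"]))  # 2: share is_animal, has_fur
```
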
The central idea in connectionism is that a large number of simple computational units can achieve intelligent behavior when networked together.
Cognitive science is an interdisciplinary approach to understanding the mind, combining multiple different levels of analysis.
The field of deep learning is primarily concerned with how to build computer systems that are able to successfully solve tasks requiring intelligence, while the field of computational neuroscience is primarily concerned with building more accurate models of how the brain actually works.
It is worth noting that the effort to understand how the brain works on an algorithmic level is alive and well. This endeavor is primarily known as “computational neuroscience” and is a separate field of study from deep learning. It is common for researchers to move back and forth between both fields.
We do not yet know enough about biological learning for neuroscience to offer much guidance for the learning algorithms we use to train these architectures.
While neuroscience is an important source of inspiration, it need not be taken as a rigid guide.
The training algorithm used to adapt the weights of the ADALINE was a special case of an algorithm called stochastic gradient descent. Slightly modified versions of the stochastic gradient descent algorithm remain the dominant training algorithms for deep learning models today.
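
A minimal ADALINE trained with stochastic gradient descent, sketched in NumPy; the learning rate and epoch count are illustrative, and the linear output is trained directly, as in the original model:

```python
import numpy as np

def train_adaline(X, y, lr=0.01, epochs=20):
    """Fit a linear unit by stochastic gradient descent: one example per update."""
    rng = np.random.default_rng(0)
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for i in rng.permutation(len(X)):  # visit examples in random order
            y_hat = X[i] @ w + b           # linear output, no threshold while learning
            err = y[i] - y_hat
            w += lr * err * X[i]           # gradient step on 0.5 * err**2
            b += lr * err
    return w, b
```

Modern deep learning changes the model and adds refinements such as minibatches and momentum, but this per-example update is the same stochastic gradient descent the quote describes.
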
One of the names that deep learning has gone by is artificial neural networks (ANNs).
We assume familiarity with programming, a basic understanding of computational performance issues, complexity theory, introductory level calculus and some of the terminology of graph theory.
