As we strive to create machines that can handle complex tasks, deep learning is increasingly coming into the spotlight. Because many of the tasks we want our machines to perform are intuitive for humans and animals, we are developing processes for machines that mirror the way humans learn. In this article, we will explore one such process, known as transfer learning.
What Is Transfer Learning?
Transfer learning is an approach used in machine learning where a model that was created and trained for one task is reused as the starting point for a second task. This differs from traditional machine learning, in which each model is built from scratch and trained only on data for its own task; in transfer learning, a pre-trained model serves as a springboard for the new task.
This approach mimics the way humans apply knowledge learned for one task to a new task. For example, John Doe learned how to read in first grade. In tenth grade, John used his reading abilities in chemistry class. The knowledge he had gained from the primary task (learning how to read) became the foundation on which he started the secondary task (learning chemistry).
The Benefits of Using Transfer Learning and Deep Learning
Taking a model that has already been trained in one field and reapplying it to another has many advantages. Some of the main ones are listed below.
1. Less training data: training a model from scratch is a lot of work and requires a lot of data. For example, if we want to create a new algorithm that can detect a frown, we need a lot of training data, because our model first needs to learn how to detect faces, and only then can it learn to detect expressions such as frowns. If instead we take a model that has already learned how to detect faces and retrain it to detect frowns, we can achieve the same result with far less data (see the sketch after this list).
2. Models generalize better: a model trained with transfer learning tends to perform well on data it was not trained on, which is known as generalizing. Such models generalize better from one task to another because they have learned to identify features that carry over to new contexts.
3. Makes deep learning more accessible: transfer learning lowers the barrier to using deep learning. By taking a model built by a deep learning specialist and applying it to a new problem, it is possible to obtain good results without being an expert yourself.
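To illustrate the first benefit above, here is a minimal sketch in Python (PyTorch, assuming a recent version of torchvision) of how the frown-detection example might reuse a pretrained image model. The backbone choice and the two-class head are illustrative assumptions, not a reference implementation.

    import torch
    import torch.nn as nn
    from torchvision import models

    # Load a network pretrained on a large, generic image dataset (ImageNet).
    model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

    # Freeze the pretrained layers so their learned features are reused as-is.
    for param in model.parameters():
        param.requires_grad = False

    # Replace the final classification layer with a new two-class head
    # (frown / no frown). Only this small layer is trained.
    model.fc = nn.Linear(model.fc.in_features, 2)

    # Because only the new head's parameters are optimized, far less labeled
    # data is needed than when training the whole network from scratch.
    optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)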
What Are the Types of Transfer Learning?
Domain adaptation
In this approach, the dataset on which the model was trained is different from (but still related to) the target dataset. A good example is a spam email filtering model. Say this model was trained to identify spam for user A. When the model is then applied to user B, domain adaptation is needed, because even though the task is the same (filtering emails), user B receives different types of emails than user A does.
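A minimal sketch of what this adaptation could look like, assuming a simple linear spam classifier over bag-of-words features; the vocabulary size, checkpoint name, and data below are placeholders for illustration:

    import torch
    import torch.nn as nn

    VOCAB_SIZE = 5000  # assumed bag-of-words feature dimension

    # A simple linear spam classifier, standing in for the model already
    # trained on user A's emails (loading real weights is assumed):
    model = nn.Linear(VOCAB_SIZE, 2)
    # model.load_state_dict(torch.load("spam_model_user_a.pt"))

    # Adapt to user B: continue training on a small labeled sample of user
    # B's emails, with a low learning rate so that what was learned from
    # user A is adjusted rather than overwritten.
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)
    loss_fn = nn.CrossEntropyLoss()

    emails_b = torch.randn(20, VOCAB_SIZE)   # placeholder features for user B
    labels_b = torch.randint(0, 2, (20,))    # placeholder spam/not-spam labels

    for _ in range(5):  # a few passes over the small sample
        optimizer.zero_grad()
        loss = loss_fn(model(emails_b), labels_b)
        loss.backward()
        optimizer.step()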
Multitask learning
This method involves solving two or more tasks simultaneously so that their similarities and differences can be leveraged. It is based on the idea that training on a related task can give a model skills that improve its performance on the new task.
Going back to our spam email filtering model, let's say it is learning which features to look for when identifying spam for user A and user B. Because the users are very different, the model needs to look for different features to identify each user's spam. For example, user A is an Italian speaker, so an email in Italian should not be a red flag; user B, however, is a Chinese speaker, so an email in Italian might well be a spam signal. While simultaneously learning to identify spam features for users A and B, the model learns that, regardless of language, emails requesting credit card details are more likely to be spam.
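A minimal sketch of this setup, assuming a shared encoder with one classification head per user; the layer sizes and training data are placeholders:

    import torch
    import torch.nn as nn

    class MultitaskSpamModel(nn.Module):
        """One shared encoder learns features common to all users (e.g.
        requests for credit card details), while per-user heads learn
        user-specific cues such as which languages are normal."""

        def __init__(self, vocab_size=5000, hidden=64):
            super().__init__()
            self.shared = nn.Sequential(nn.Linear(vocab_size, hidden), nn.ReLU())
            self.head_a = nn.Linear(hidden, 2)  # spam / not spam for user A
            self.head_b = nn.Linear(hidden, 2)  # spam / not spam for user B

        def forward(self, x, user):
            features = self.shared(x)
            return self.head_a(features) if user == "a" else self.head_b(features)

    # Training on both users' data at once: the combined loss pushes the
    # shared encoder toward features that help with both tasks.
    model = MultitaskSpamModel()
    loss_fn = nn.CrossEntropyLoss()
    x_a, y_a = torch.randn(8, 5000), torch.randint(0, 2, (8,))  # dummy user A batch
    x_b, y_b = torch.randn(8, 5000), torch.randint(0, 2, (8,))  # dummy user B batch
    loss = loss_fn(model(x_a, "a"), y_a) + loss_fn(model(x_b, "b"), y_b)
    loss.backward()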
Zero-shot learning
This technique involves a model trying to solve a task to which it was not exposed during training. For example, let's say we are training a model to identify animals in pictures. To identify the animals, the model is taught to recognize two attributes: the color yellow and spots. The model is then trained on multiple pictures of chicks, which it learns to identify because they are yellow but have no spots, and dalmatians, which it learns have spots but are not yellow.
To expand on this example, we may not have any pictures of giraffes on which to train the model, but we can tell the model that giraffes are both yellow and spotted. When the model encounters an image of a giraffe, it will be able to identify it, even though it never saw a giraffe during training.
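A toy sketch of this attribute-based reasoning; the predicted probabilities here are made up for illustration:

    import torch

    # Each class is described by attributes: (is_yellow, has_spots).
    class_signatures = {
        "chick":     torch.tensor([1.0, 0.0]),
        "dalmatian": torch.tensor([0.0, 1.0]),
        "giraffe":   torch.tensor([1.0, 1.0]),  # never seen in training
    }

    def classify(attribute_probs):
        """Pick the class whose attribute signature best matches the
        model's predicted attribute probabilities."""
        return min(class_signatures,
                   key=lambda c: torch.dist(class_signatures[c], attribute_probs))

    # Suppose the attribute model (trained only on chicks and dalmatians)
    # predicts that a new image is both yellow and spotted:
    predicted = torch.tensor([0.9, 0.8])
    print(classify(predicted))  # -> "giraffe", a class never seen in training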
One-shot learning
This approach requires a model to learn how to categorize an object after being exposed to it only once, or just a few times. To do this, the model leverages information it already has about known categories. For example, our animal-classifying model knows how to identify a horse. The model is then exposed to a single photo of a zebra, which looks much like a horse but has black and white stripes. The model will then be able to classify zebras without seeing additional pictures, because it transfers knowledge it already had about horses.
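One common way to sketch this is with embeddings from a pretrained network: the single zebra photo is stored as a reference embedding, and new images are assigned to whichever reference they most resemble. The backbone choice and the placeholder image tensors below are assumptions:

    import torch
    from torchvision import models

    # Use a pretrained network (minus its classification layer) as a
    # generic feature extractor.
    backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
    backbone.fc = torch.nn.Identity()  # output embeddings, not class scores
    backbone.eval()

    def embed(image_tensor):
        with torch.no_grad():
            return backbone(image_tensor.unsqueeze(0)).squeeze(0)

    # One reference embedding per class; "zebra" comes from a single photo.
    horse_photo = torch.randn(3, 224, 224)  # placeholder for a real image
    zebra_photo = torch.randn(3, 224, 224)  # placeholder for a real image
    references = {"horse": embed(horse_photo), "zebra": embed(zebra_photo)}

    def classify(image_tensor):
        query = embed(image_tensor)
        return max(references,
                   key=lambda c: torch.cosine_similarity(references[c], query, dim=0))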
Applications of Deep Transfer Learning
Transfer learning has been used extensively in deep learning. Some of the main applications of deep transfer learning are natural language processing (NLP), computer vision, and speech recognition. In NLP, transfer learning is used to facilitate document classification and other tasks involving textual data. This is particularly challenging because, unlike image data, text is highly diverse and unstructured.
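For instance, here is a short sketch assuming the open-source Hugging Face transformers library is installed; it downloads a model pretrained on large text corpora and reuses it for sentiment classification in a few lines:

    from transformers import pipeline

    # Reuse a pretrained, already fine-tuned text classifier.
    classifier = pipeline("sentiment-analysis")
    print(classifier("Transfer learning makes NLP far more accessible."))
    # -> a label such as 'POSITIVE' with a confidence score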
Deep transfer learning has also been used in computer vision tasks such as image classification. Much like in the human visual system, the first layers of a neural network detect low-level features such as edges and shapes, while the later layers combine them into higher-level, task-specific concepts; transfer learning reuses those early layers.
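A short sketch of how those early and middle layers can be reused directly as a feature extractor, again assuming a recent torchvision; the class count is illustrative:

    import torch.nn as nn
    from torchvision import models

    model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

    # Keep everything except the final classification layer: the early and
    # middle layers, which capture edges, textures, and shapes, are reused
    # unchanged as a general-purpose feature extractor.
    feature_extractor = nn.Sequential(*list(model.children())[:-1], nn.Flatten())

    # A new task-specific layer is then trained on top of these features.
    new_head = nn.Linear(512, 10)  # 10 is an assumed number of classes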
Speech recognition is another application of deep transfer learning. Models that were developed for English speech recognition have been used to improve models developed for German speech recognition.