Week 12 - Machine Learning - Semi-supervised Learning, Reinforcement Learning from Fundamentals of A.I. & M.L. by LinkedIn Learning

Liaw Bei Le · June 12, 2021

Working with data in M.L.
Semi-supervised Learning
Reinforcement Learning
- Q-learning

These are my notes from Mastering the Fundamentals of A.I. & Machine Learning by LinkedIn Learning.

What I learnt:

Working with data in machine learning

We have to think about the way the machine learns. That way, we can start to collect the data.

Think about our data.
Is it of high quality?
Do we have enough data to learn something new?

Applying machine learning

Traditional programming with straightforward calculations
Much of computer science is still about working with explicit instructions.

In traditional programming, you configure the machine to accept your input and produce an output based on the algorithm.

The input is the command
The output is a predetermined response.
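
As a small illustration of the point above, here is a minimal sketch of a traditional program in Python. The command names and the `run` helper are made up for this example; the idea is only that every input maps to a response a programmer wrote in advance.

```python
# Hypothetical command table: each command has a fixed, hand-written response.
RESPONSES = {
    "add": lambda a, b: a + b,
    "multiply": lambda a, b: a * b,
}


def run(command, a, b):
    """Look up the explicit instruction for a command and apply it."""
    if command not in RESPONSES:
        # The program cannot handle anything it was not explicitly told about.
        raise ValueError(f"no instruction for {command!r}")
    return RESPONSES[command](a, b)


print(run("add", 2, 3))
print(run("multiply", 4, 5))
```

Anything outside the table raises an error, which is the limitation that machine learning tries to get around.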

It gets a little trickier when humans can’t explicitly instruct the computer on what to do.

Machine learning
In these cases, we need a programming model that allows the machine to learn. We also have to give the machine some ability to respond to feedback.

Example: Email spam program
Imagine you’re creating a program that needs to detect spam messages.

Instead of inputting instructions, you’ll input data.
Instead of a predefined response, you’ll be working with machine learning algorithms to help the machine learn how to respond.

To start, you’d want to split your data into test data and training data.

Training data is a smaller chunk that you’ll use to find patterns.
Sometimes a model will help your machine make sense of the data with statistical algorithms. These algorithms help the machine make accurate predictions or see patterns between different parts of your data.

How machine learning might work with our spam program:

Let’s take 10,000 email messages as our training dataset.
We’ll use it to build and refine our model before we try it on our test data of over a million messages.
You can use your training data to show the machine different examples of spam.
Then you could use a classifier machine learning algorithm to help split the email into two groups: spam and regular messages.
This is often called binary classification.
- The machine finds groups of words that are more likely to be found in spam messages. Then, it comes up with a score to show the likelihood that it’s spam.
- You’ll need to decide the best classifier algorithm to use.
- Then you’ll tweak the hyper-parameters of the algorithm until the machine does a pretty good job predicting whether or not an email message is spam.
Once you’re satisfied, that’ll be your initial data model.
It is the result of choosing a machine learning algorithm and tuning its hyper-parameters until it makes accurate predictions.
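
The steps above can be sketched in a few lines of Python. This is only a rough illustration, assuming a naive Bayes word-count classifier (the course does not say which classifier to use) and a tiny made-up dataset in place of the 10,000 training messages. The names `TRAIN`, `TEST`, `spam_score` and the `alpha` and `threshold` values are my own assumptions.

```python
import math
from collections import Counter

# Tiny hand-made dataset (hypothetical). Label 1 = spam, 0 = regular message.
TRAIN = [
    ("win a free prize now", 1),
    ("free money claim your prize", 1),
    ("cheap loans win cash now", 1),
    ("meeting moved to monday morning", 0),
    ("please review the attached report", 0),
    ("lunch on monday with the team", 0),
]
TEST = [
    ("claim your free cash prize", 1),
    ("monday report for the team meeting", 0),
]


def train(examples):
    """Count how often each word appears in spam and in regular mail."""
    word_counts = {0: Counter(), 1: Counter()}
    class_counts = Counter()
    for text, label in examples:
        class_counts[label] += 1
        word_counts[label].update(text.split())
    vocab = set(word_counts[0]) | set(word_counts[1])
    return word_counts, class_counts, vocab


def spam_score(text, model, alpha=1.0):
    """Return an estimated probability (0 to 1) that the message is spam.

    alpha is a smoothing hyper-parameter: it stops a word that was never
    seen in one class from forcing that class's probability to zero.
    """
    word_counts, class_counts, vocab = model
    total = sum(class_counts.values())
    log_prob = {}
    for label in (0, 1):
        lp = math.log(class_counts[label] / total)
        denom = sum(word_counts[label].values()) + alpha * len(vocab)
        for word in text.split():
            if word in vocab:  # skip words never seen during training
                lp += math.log((word_counts[label][word] + alpha) / denom)
        log_prob[label] = lp
    # Turn the two log scores into a probability for the spam class.
    m = max(log_prob.values())
    exp0 = math.exp(log_prob[0] - m)
    exp1 = math.exp(log_prob[1] - m)
    return exp1 / (exp0 + exp1)


def classify(text, model, threshold=0.5):
    """Binary classification: 1 = spam, 0 = regular message."""
    return 1 if spam_score(text, model) >= threshold else 0


model = train(TRAIN)
accuracy = sum(classify(t, model) == y for t, y in TEST) / len(TEST)
print(f"test accuracy: {accuracy:.2f}")
```

Here `alpha` and `threshold` play the role of the hyper-parameters you would tweak, and the held-out `TEST` messages are only used to check the model after training.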

Even though the programmer inputs the data, selects the algorithms, and makes corrections, it’s ultimately the machine that makes the decision about whether a message is spam.

In some cases the programmer might not even know how the machine learned it was spam.

Semi-supervised learning

Semi-supervised learning takes advantage of both supervised and unsupervised learning.

Example: A program to translate voicemail into text

Imagine you wanted to create a program that could translate voicemail into text.

Challenges of supervised and unsupervised learning
It would be difficult to use unsupervised learning. Here, your machine would cluster words together but wouldn’t necessarily know what those words mean.

Maybe your machine could identify each time someone said the word photo. The problem is that, since the data is clustered but unlabeled, your program doesn’t know that this sound corresponds to the written word photo.

On the flip side, you wouldn’t want to only use supervised learning. It would be far too labor-intensive to label all of the data for words and their sounds. Plus, you could never create a training set that was large enough to translate all your possible words and phrases.

Semi-supervised learning: Inductive reasoning
So, instead, you might try semi-supervised learning.

Here, you’ll start with a smaller training set of very common words and phrases.
The machine could use this to perform some basic classification.
This would be like the training set in supervised learning.
But then you would feed that training set into the machine and allow it to study and observe the rest of the data.
This is similar to unsupervised learning. That would allow the machine to expand its vocabulary based on what it already classified.
With the labeled data, the program could use a form of inductive reasoning to try and expand its own vocabulary. It might take the training data that connects the written word photo to the sound of photo and come up with new translations.
Maybe it can induce the words photograph, photographer, photocopy, or even photon torpedo.
Because it already has the pairing of word and sound for photo, the machine only needs inductive reasoning to propose these translations.
Each time the machine induces a new connection, it does its best to try and improve the model. So if a translation is correct, the machine adds the new word to what it knows.
You usually get this feedback from human observers. That’s why, when you get a translation, you’ll often see a link that asks, “How is this translation?”
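
One common way to implement this "start small, then expand" idea is self-training. The sketch below is a simplified illustration, not the course’s method: it assumes each item is a single number (a stand-in for a sound feature), uses a nearest-centroid classifier, and only accepts guesses whose confidence margin is at least `min_margin`. All names and values are hypothetical.

```python
def centroids(labeled):
    """Average the feature value of each class (a nearest-centroid model)."""
    sums, counts = {}, {}
    for x, label in labeled:
        sums[label] = sums.get(label, 0.0) + x
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def predict(x, cents):
    """Return (label, margin); a larger margin means a more confident guess."""
    ranked = sorted(cents, key=lambda label: abs(x - cents[label]))
    best, runner_up = ranked[0], ranked[1]
    margin = abs(x - cents[runner_up]) - abs(x - cents[best])
    return best, margin


def self_train(labeled, unlabeled, min_margin=2.0, max_rounds=10):
    """Repeatedly label the confident points and retrain on the larger set."""
    labeled, unlabeled = list(labeled), list(unlabeled)
    for _ in range(max_rounds):
        cents = centroids(labeled)
        confident, remaining = [], []
        for x in unlabeled:
            label, margin = predict(x, cents)
            if margin >= min_margin:
                confident.append((x, label))
            else:
                remaining.append(x)
        if not confident:
            break  # nothing new was learned, so stop
        labeled.extend(confident)
        unlabeled = remaining
    return labeled, unlabeled


# Two labeled examples (the small training set) and six unlabeled ones.
seed_labels = [(1.0, "a"), (9.0, "b")]
pool = [2.0, 3.0, 4.5, 5.2, 7.0, 8.0]
final_labeled, still_unlabeled = self_train(seed_labels, pool)
print(final_labeled)
print(still_unlabeled)
```

The points the model is unsure about stay unlabeled. Those are the cases where feedback from a human observer is most useful.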

Semi-supervised learning: Transductive reasoning
Transductive reasoning allows us to narrow down the unlabeled data by thinking about what we already know about the collection.

So if you were using transductive reasoning to transcribe the voicemail, you’d have to think about the data in a larger context.

Think about some of the things that people say in a voicemail message. People probably put many more nouns in their voicemail messages. They’re giving you information about people, places, or things. They’re more likely to talk about times and dates. This is much different from a random sampling of audio that you might get from listening to people in a restaurant.

So transductive reasoning tries to improve the model by making better guesses about what’s in the unlabeled data.
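
One way to picture "making better guesses" from context is to weight each candidate word by how likely it is in this kind of collection. The sketch below only illustrates that idea and is not a standard transductive algorithm. The function `rerank`, the candidate words and every number in it are made up.

```python
def rerank(acoustic_scores, context_prior, default_prior=0.01):
    """Combine how well each candidate matches the sound with how likely
    that word is in this collection, then normalise so the scores sum to 1."""
    weighted = {
        word: score * context_prior.get(word, default_prior)
        for word, score in acoustic_scores.items()
    }
    total = sum(weighted.values())
    return {word: value / total for word, value in weighted.items()}


# Hypothetical scores for one ambiguous sound in "call me back at ...".
acoustic_scores = {"two": 0.30, "too": 0.36, "to": 0.34}

# Hypothetical prior: voicemail messages mention times and dates a lot.
voicemail_prior = {"two": 0.6, "too": 0.1, "to": 0.3}

sound_only_guess = max(acoustic_scores, key=acoustic_scores.get)
posterior = rerank(acoustic_scores, voicemail_prior)
context_guess = max(posterior, key=posterior.get)
print(sound_only_guess, context_guess)
```

With the sound alone the guess would be "too". Once the knowledge about voicemail is added, the guess changes to "two".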

When to use semi-supervised learning
Semi-supervised learning is not that common, but it works well in certain areas.
It often works better when it’s not difficult for the machine to create useful groups but the data is so large that it’s not practical to label it all for supervised learning.

Semi-supervised learning is used for tasks such as classifying webpages or grouping together pictures of something like insects.

Semi-supervised learning really only makes sense when the other two approaches have difficulty solving our challenge. Because the machine builds on its own guesses, inductive and transductive reasoning might lead to greater errors and mislabeled data.

Reinforcement learning

With supervised, unsupervised, and semi-supervised learning, we’re trying to find the best model. You want a model that can most accurately classify different data sets or find meaningful clusters. Once you have the model, you can then let the machine work with the rest of the data.

Reinforcement learning is different.

Reinforcement learning
Reinforcement learning has the machine iterate to continuously improve the outcome. Over time, the machine should zero in and get closer and closer to high quality output.

Reinforcement learning is open ended in how the machine gets there, but not in what it is aiming for. We’re reinforcing certain ways that we want the machine to behave. Instead of leaving everything open to study and observation, we’re giving the machine a very clear goal.

Reinforcement learning, and specifically Q-learning, allows machines to quickly grow beyond our understanding.

It can help you skip the steps required in unsupervised learning. You don’t need hours of observing and studying massive amounts of data.

Example: Pong
The game Pong had two separate bars on the opposite sides of the screen. They would slide up and down as you tried to hit the ball to each side.

In 2013, DeepMind (which Google acquired the following year) experimented with Pong to see if they could teach a computer how to play. They set up a series of rewards for the computer. Every time the computer hit the ball with its paddle, it got a reward. Then every time that the opponent missed the ball, it got another reward. Then it played against itself and tried to gather as many rewards as possible. It only took a short while for the computer to start to master the game and consistently beat human players.

Q-learning
The DeepMind team used something called Q-learning. This Q-learning helped with some of the more complicated games that needed more sophisticated rewards.

In Q-learning, there are set environments or states. There are also possible actions that can respond to these states.
In Q-learning, you want the machine to improve the quality of the outcome. This is represented with the letter Q.

In this case, we would start out with a quality of zero. Then we would have the machine learn which actions improve the conditions. Each time an action improved the state, the Q would go up from zero. The Q would go up based on the states and actions.

This type of reinforcement learning allows the machine to go through endless simulations of actions and states until it finds the best strategy.
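
To make the idea of states, actions and a Q that starts at zero more concrete, here is a minimal sketch of tabular Q-learning in Python. The environment is my own assumption, not something from the course: a corridor of five cells where the machine starts in cell 0 and gets a reward of 1 for reaching cell 4. The values of `alpha` (learning rate), `gamma` (discount) and `epsilon` (exploration rate) are also just illustrative.

```python
import random

N_STATES = 5            # positions 0..4 in a corridor; 4 is the goal
GOAL = N_STATES - 1
ACTIONS = ("left", "right")


def step(state, action):
    """Move one cell; the reward is 1.0 only when the goal is reached."""
    if action == "left":
        nxt = max(state - 1, 0)
    else:
        nxt = min(state + 1, GOAL)
    reward = 1.0 if nxt == GOAL else 0.0
    return nxt, reward


def choose_action(q, state, epsilon, rng):
    """Epsilon-greedy: mostly use the best known action, sometimes explore."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(q[(state, a)] for a in ACTIONS)
    # Break ties at random so the untrained machine does not favour one side.
    return rng.choice([a for a in ACTIONS if q[(state, a)] == best])


def train(episodes=300, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    # Every state/action pair starts with a quality of zero.
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state = 0
        while state != GOAL:
            action = choose_action(q, state, epsilon, rng)
            nxt, reward = step(state, action)
            best_next = max(q[(nxt, a)] for a in ACTIONS)
            # Nudge Q towards the reward plus the discounted future value.
            target = reward + gamma * best_next
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = nxt
    return q


q_table = train()
policy = {s: max(ACTIONS, key=lambda a: q_table[(s, a)]) for s in range(GOAL)}
print(policy)
```

After training, the action with the highest Q in each cell is the strategy the machine has found. In this corridor that means moving right towards the goal.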

Example: Go
In 2015, Google’s DeepMind made the news when their AlphaGo program first beat a professional player in the game called Go. AlphaGo combined supervised learning with reinforcement learning. It first learned from records of games played by human experts, then improved by playing against itself.

Their newer program, AlphaGo Zero, relies entirely on reinforcement learning through self-play, a close relative of Q-learning rather than Q-learning in the strict sense. It didn’t have to watch the experts play the game. AlphaGo Zero simply went through the game and tried different actions as a way to change the state and win. Within days, this self-taught machine played Go at a level beyond the best human players. In fact, after roughly 70 hours of training, AlphaGo Zero beat the earlier version of AlphaGo 100 games to 0.
