Supervised Learning:
- Regression Problem
- Classification Problem
I am learning about Machine Learning via the course offered by Coursera named: Machine Learning by Stanford University. This post covers Week 1 materials.
What I learnt:
Supervised learning
- Most common problem type in machine learning
Regression Problem
E.g. 1: Predict housing prices
Collect data regarding housing prices and how they relate to size in feet
This is a regression problem:
- Predict continuous valued output (price)
- No real discrete delineation
“Given this data, a friend has a house 750 square feet - how much can they be expected to get?”
What approaches can we use to solve this?
-
Straight line through data: Maybe 150 000 dollars.
-
Second order polynomial: Maybe 200 000 dollars
Each of these approaches represent a way of doing supervised learning. One thing we discuss later - how to chose straight or curved line?
What does this mean?
- We gave the algorithm a data set where a “right answer” was provided
- So we know actual prices for houses
- The idea is we can learn what makes the price a certain value from the training data
- The algorithm should then produce more right answers based on new training data where we don’t know the price already
- i.e. predict the price
Classification Problem
E.g. 2: Define breast cancer as malignant or benign based on tumour size
“Can you estimate prognosis based on tumor size?”
This is a classification problem:
- Classify data into one of two discrete classes - no in between, either malignant or not
- In classification problems, can have a discrete number of possible values for the output
- e.g. maybe have four values
- 0 - benign
- 1 - type 1
- 2 - type 2
- 3 - type 4
- e.g. maybe have four values
Use only one attribute (size) In other problems may have multiple attributes We may also, for example, know age and tumor size
Based on that data, you can try and define separate classes by
- Drawing a straight line between the two groups
- Using a more complex function to define the two groups (which we’ll discuss later)
Then, when you have an individual with a specific tumor size and who is a specific age, you can hopefully use that information to place them into one of your classes.
You might have many features to consider:
- Clump thickness
- Uniformity of cell size
- Uniformity of cell shape
The most exciting algorithms can deal with an infinite number of features.
- How do you deal with an infinite number of features?
- Neat mathematical trick in support vector machine (which we discuss later)
- If you have an infinitely long list - we can develop an algorithm to deal with that
Recap
- Supervised learning lets you get the “right” data
- Regression problem
- Classification problem