Week 13 - Machine Learning - Supervised Learning from M.L. course by Stanford Uni (Week 1 Part II)

Liaw Bei Le · June 14, 2021

Supervised Learning:

  • Regression Problem
  • Classification Problem

I am learning about Machine Learning via the course offered by Coursera named: Machine Learning by Stanford University. This post covers Week 1 materials.

What I learnt:

Supervised learning

  • Most common problem type in machine learning

Regression Problem

E.g. 1: Predict housing prices
Collect data regarding housing prices and how they relate to size in feet

This is a regression problem:

  • Predict continuous valued output (price)
  • No real discrete delineation

“Given this data, a friend has a house 750 square feet - how much can they be expected to get?”

Housing prediction w straight line

What approaches can we use to solve this?

  • Straight line through data: Maybe 150 000 dollars.

  • Second order polynomial: Maybe 200 000 dollars

Each of these approaches represent a way of doing supervised learning. One thing we discuss later - how to chose straight or curved line?

Housing prediction with supervised learning

What does this mean?

  • We gave the algorithm a data set where a “right answer” was provided
  • So we know actual prices for houses
    • The idea is we can learn what makes the price a certain value from the training data
    • The algorithm should then produce more right answers based on new training data where we don’t know the price already
      • i.e. predict the price

Classification Problem

E.g. 2: Define breast cancer as malignant or benign based on tumour size

“Can you estimate prognosis based on tumor size?”

This is a classification problem:

  • Classify data into one of two discrete classes - no in between, either malignant or not
  • In classification problems, can have a discrete number of possible values for the output
    • e.g. maybe have four values
      • 0 - benign
      • 1 - type 1
      • 2 - type 2
      • 3 - type 4

Use only one attribute (size) In other problems may have multiple attributes We may also, for example, know age and tumor size

Based on that data, you can try and define separate classes by

  • Drawing a straight line between the two groups
  • Using a more complex function to define the two groups (which we’ll discuss later)

Tumour classification graph

Then, when you have an individual with a specific tumor size and who is a specific age, you can hopefully use that information to place them into one of your classes.

Tumour classification graph - age and tumour size

You might have many features to consider:

  • Clump thickness
  • Uniformity of cell size
  • Uniformity of cell shape

The most exciting algorithms can deal with an infinite number of features.

  • How do you deal with an infinite number of features?
  • Neat mathematical trick in support vector machine (which we discuss later)
    • If you have an infinitely long list - we can develop an algorithm to deal with that

Recap

  • Supervised learning lets you get the “right” data
  • Regression problem
  • Classification problem

Twitter, Facebook