Week 15 - Machine Learning - (Supervised Learning) Introduction to Regression from Principles of M.L. Python by Microsoft Learning

Liaw Bei Le · June 28, 2021

Linear regression using scikit-learn
Overview of linear regression
Data preparation
- Split the dataset
  - create independently sampled training dataset and test data set
  - .train_test_split( ) to perform Bernoulli sampling
- Scale numeric features
  - Min-Max normalization
  - Z-Score normalization
    - .StandardScaler( ) then .transform( )
Train the regression model
- Instantiate and fit the model
  - .LinearRegression( ) & .fit( )
  - print the model coefficients
- Plot the predicted values computed from the training features
  - The predict method is applied to the model with the training data
Evaluate model performance
- Using the test dataset
- Choose & compute performance metric
  - Mean squared error or MSE
  - Root mean squred error or RMSE
  - Mean absolute error or MAE
  - Median absolute error
  - R squared or 𝑅2 , also known as the coefficient of determination
  - Adjusted R squared
- Residuals (errors) of the model should be Normally distributed
  - plot a kernel density plot and histogram of the residuals of the regression model
  - Quantile-Quantile Normal plot, or Q-Q Normal plot
    - If the residuals have a distribution which is approximately Normal, points will nearly fall along the straight line
  - Residual plot

Model complexity and Generalisation Performance (Regression):

Plot a scatterplot of data points in the training and test sets
Write a function that fits a polynomial LinearRegression model on training data for varying degrees
Finding degree levels corresponding generalization performance on the dataset (underfitting, overfitting, good) based on R squared scores

These are my notes taken from Microsoft Learning’s Principles of Machine Learning in Python - Module 4.

What I learnt:

Introduction to Regression

Method: Train the model to minimize the difference between our predicted y^ and the known label values y.

Goal of regression: To produce a model that represents the ‘best fit’ to some observed data. Find a function of some features X which predicts the label value y.

I have reviewed and went through Introduction to Regression in this link.

Share: Twitter, Facebook