- Linear regression using scikit-learn
- Overview of linear regression
- Data preparation
- Split the dataset
- create independently sampled training dataset and test data set
- .train_test_split( ) to perform Bernoulli sampling
- Scale numeric features
- Min-Max normalization
- Z-Score normalization
- .StandardScaler( ) then .transform( )
- Split the dataset
- Train the regression model
- Instantiate and fit the model
- .LinearRegression( ) & .fit( )
- print the model coefficients
- Plot the predicted values computed from the training features
- The predict method is applied to the model with the training data
- Instantiate and fit the model
- Evaluate model performance
- Using the test dataset
- Choose & compute performance metric
- Mean squared error or MSE
- Root mean squred error or RMSE
- Mean absolute error or MAE
- Median absolute error
- R squared or 𝑅2 , also known as the coefficient of determination
- Adjusted R squared
- Residuals (errors) of the model should be Normally distributed
- plot a kernel density plot and histogram of the residuals of the regression model
- Quantile-Quantile Normal plot, or Q-Q Normal plot
- If the residuals have a distribution which is approximately Normal, points will nearly fall along the straight line
- Residual plot
Model complexity and Generalisation Performance (Regression):
- Plot a scatterplot of data points in the training and test sets
- Write a function that fits a polynomial LinearRegression model on training data for varying degrees
- Finding degree levels corresponding generalization performance on the dataset (underfitting, overfitting, good) based on R squared scores
These are my notes taken from Microsoft Learning’s Principles of Machine Learning in Python - Module 4.
What I learnt:
Introduction to Regression
Method: Train the model to minimize the difference between our predicted y^ and the known label values y.
Goal of regression: To produce a model that represents the ‘best fit’ to some observed data. Find a function of some features X which predicts the label value y.
I have reviewed and went through Introduction to Regression in this link.