- Linear regression using scikit-learn
- Use categorical data
- Apply transformations to features and labels to improve model performance
- Compare regression models to improve model performance
- Prepare the model matrix
- Create dummy variables from categorical features
- Concatenate the numeric features
- Data preparation
- Split the dataset
- create independently sampled training dataset and test data set
- .train_test_split( ) to perform Bernoulli sampling
- Scale numeric features
- Z-Score normalization
- .StandardScaler( ) then .transform( )
- Z-Score normalization
- Split the dataset
- Construct the regression model
- Instantiate and fit the model
- .LinearRegression( ) & .fit( )
- print the intercept term and model coefficients
- Instantiate and fit the model
- Evaluate model performance
- Plot the predicted values computed from the training features - The predict method is applied to the model with the training data
- Using the test dataset
- Choose & compute performance metric
- Mean squared error or MSE
- Root mean squred error or RMSE
- Mean absolute error or MAE
- Median absolute error
- R squared or π 2 , also known as the coefficient of determination
- Adjusted R squared
- Residuals (errors) of the model should be Normally distributed
- plot a kernel density plot and histogram of the residuals of the regression model
- Quantile-Quantile Normal plot, or Q-Q Normal plot
- If the residuals have a distribution which is approximately Normal, points will nearly fall along the straight line
- Residual plot
These are my notes taken from Microsoft Learningβs Principles of Machine Learning in Python - Module 4.
What I learnt:
Applying Linear Regression
Goal: To construct a linear regression model to predict the price of automobiles from their characteristics
I have reviewed and went through Applying Linear Regression for automobile prices in this link.