Sunday, February 12, 2023

Hands-on 5 Regression Algorithms Using Scikit-Learn

Download Code and Data
What is Regression?

When the targets are real numbers and we are trying to establish a relationship between a target and a predictor, the problem is called a “regression problem”.

Example 1: Salary vs Years of Experience

Example 2: Weight vs Height
Regression: Predicting Bengaluru Housing Prices

1. Linear Regression (Ordinary Least Squares algorithm)
2. Polynomial Regression
3. Linear Regression using Stochastic Gradient Descent
4. Regression using Support Vector Machines
5. Regression using Decision Trees

1. Linear Regression (Ordinary Least Squares algorithm)

In Linear Regression, you try to fit a line to the data.
Basic Idea Behind the Ordinary Least Squares Algorithm: How much do the predictions deviate from the actual data? Mapping the errors on the graph:
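To make this concrete, here is a minimal sketch (not from the original post; the toy salary numbers are made up) that measures how far a candidate line's predictions deviate from the data using the mean squared error, which is the quantity Ordinary Least Squares minimizes:

import numpy as np

# Toy data: years of experience vs. salary in thousands (hypothetical numbers)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([30.0, 35.0, 42.0, 48.0, 55.0])

# A candidate line: y_hat = slope * x + intercept
slope, intercept = 6.0, 24.0
y_hat = slope * x + intercept

# The errors (residuals) are the vertical gaps between predictions and actual data
residuals = y - y_hat

# OLS picks the slope and intercept that minimize the mean of the squared residuals
mse = np.mean(residuals ** 2)
print(mse)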
>>> import numpy as np
>>> from sklearn.linear_model import LinearRegression
>>> X = np.array([[1, 1], [1, 2], [2, 2], [2, 3]])
>>> # y = 1 * x_0 + 2 * x_1 + 3
>>> y = np.dot(X, np.array([1, 2])) + 3
>>> reg = LinearRegression().fit(X, y)
>>> reg.score(X, y)
1.0
>>> reg.coef_
array([1., 2.])
>>> reg.intercept_
3.0...
>>> reg.predict(np.array([[3, 5]]))
array([16.])

Ref: scikit-learn.org

Which attributes to transform during EDA?

1. Check if your model requires numerical features and whether you can make the attributes numerical. For example, for the problem of predicting housing prices, we can convert the BHK column to floating-point numbers:

2 BHK -> 2
2 BHK + Study -> 2.5
3 BHK -> 3
3 BHK + Servant -> 3.5

2. What if the 'bhk' attribute is not given for some rows? We can drop those records:

>>> pandas_df.dropna(subset=['bhk'])

If we have engineered all the features, can we drop null records from all the features?

>>> pandas_df.dropna(inplace=True)
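As a rough illustration of the BHK mapping described above (the DataFrame and column names here are assumptions, not the actual Bengaluru dataset), the conversion could be done in pandas like this:

import pandas as pd

# Hypothetical raw data; the real dataset's column names may differ
pandas_df = pd.DataFrame({"size": ["2 BHK", "2 BHK + Study", "3 BHK", "3 BHK + Servant"]})

bhk_map = {
    "2 BHK": 2.0,
    "2 BHK + Study": 2.5,
    "3 BHK": 3.0,
    "3 BHK + Servant": 3.5,
}

# Unmapped strings become NaN, which can later be removed with dropna(subset=['bhk'])
pandas_df["bhk"] = pandas_df["size"].map(bhk_map)
print(pandas_df)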
2. Polynomial Regression

What if your data is actually more complex than a simple straight line?

Generating Polynomial Features

Generate a new feature matrix consisting of all polynomial combinations of the features with degree less than or equal to the specified degree. For example, if an input sample is two-dimensional and of the form [a, b], the degree-2 polynomial features are [1, a, b, a^2, ab, b^2].

include_bias: bool, default=True ::: If True (default), then include a bias column, the feature in which all polynomial powers are zero (i.e. a column of ones - acts as an intercept term in a linear model).

>>> import numpy as np
>>> from sklearn.preprocessing import PolynomialFeatures
>>> X = np.arange(6).reshape(3, 2)
>>> X
array([[0, 1],
       [2, 3],
       [4, 5]])
>>> poly = PolynomialFeatures(2)
>>> poly.fit_transform(X)
array([[ 1.,  0.,  1.,  0.,  0.,  1.],
       [ 1.,  2.,  3.,  4.,  6.,  9.],
       [ 1.,  4.,  5., 16., 20., 25.]])

Building the Polynomial Regression Model

>>> # X, y here are noisy quadratic data (e.g. y = 0.5 * x^2 + x + 2 + Gaussian noise)
>>> from sklearn.preprocessing import PolynomialFeatures
>>> poly_features = PolynomialFeatures(degree=2, include_bias=False)
>>> X_poly = poly_features.fit_transform(X)
>>> X[0]
array([-0.75275929])
>>> X_poly[0]
array([-0.75275929,  0.56664654])

X_poly now contains the original feature of X plus the square of this feature. Now you can fit a LinearRegression model to this extended training data:

>>> lin_reg = LinearRegression()
>>> lin_reg.fit(X_poly, y)
>>> lin_reg.intercept_, lin_reg.coef_
(array([ 1.78134581]), array([[ 0.93366893,  0.56456263]]))
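A slightly more idiomatic way to chain these two steps is a scikit-learn Pipeline. The sketch below is illustrative only; the noisy quadratic data it generates is an assumption, not the post's dataset:

import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Illustrative noisy quadratic data (assumed for this sketch)
rng = np.random.RandomState(42)
X = 6 * rng.rand(100, 1) - 3
y = 0.5 * X[:, 0] ** 2 + X[:, 0] + 2 + rng.randn(100)

# PolynomialFeatures expands X to [x, x^2]; LinearRegression fits the expanded features
poly_reg = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False),
    LinearRegression(),
)
poly_reg.fit(X, y)
print(poly_reg.predict([[1.5]]))  # prediction for x = 1.5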

3. Linear Regression using Stochastic Gradient Descent

What’s Gradient Descent?

Gradient Descent is a very generic optimization algorithm capable of finding optimal solutions to a wide range of problems. The general idea of Gradient Descent is to tweak parameters iteratively in order to minimize a cost function. Suppose you are lost in the mountains in a dense fog; you can only feel the slope of the ground below your feet. A good strategy to get to the bottom of the valley quickly is to go downhill in the direction of the steepest slope. This is exactly what Gradient Descent does: it measures the local gradient of the error function with regard to the parameter vector θ, and it goes in the direction of descending gradient. Once the gradient is zero, you have reached a minimum. Concretely, you start by filling θ with random values (this is called random initialization) and then improve it gradually, taking one baby step at a time, each step attempting to decrease the cost function (e.g., the MSE), until the algorithm converges to a minimum (see the figure below).
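To ground the idea, here is a minimal NumPy sketch (an illustration, not the post's code) of gradient descent applied to simple linear regression: it repeatedly nudges the slope and intercept in the direction that reduces the MSE.

import numpy as np

# Illustrative data roughly following y = 3 + 2x plus noise
rng = np.random.RandomState(0)
x = 2 * rng.rand(100)
y = 3 + 2 * x + rng.randn(100)

slope, intercept = 0.0, 0.0   # initialization
eta = 0.1                     # learning rate (step size)

for _ in range(1000):
    y_hat = slope * x + intercept
    error = y_hat - y
    # Gradients of the MSE cost with respect to slope and intercept
    d_slope = 2 * np.mean(error * x)
    d_intercept = 2 * np.mean(error)
    # Step in the direction of descending gradient
    slope -= eta * d_slope
    intercept -= eta * d_intercept

print(slope, intercept)  # should end up close to 2 and 3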
Solving the problem of Linear Regression (Using SGD)
Here are the high-level steps we take to implement a simple and naive Linear Regression model using SGD:

1. Random Initialization: Initialize the model with a line along the x-axis.
2. Calculate the error function for this line.
3. By making small changes d(slope) and d(intercept) to the slope and intercept, adjust the linear model to reduce the error function.
4. Repeat steps (2) and (3) until convergence.

Code

from sklearn.linear_model import SGDRegressor
sgd_reg = SGDRegressor(max_iter=50, penalty=None, eta0=0.1)
sgd_reg.fit(X, y.ravel())

>>> sgd_reg.intercept_, sgd_reg.coef_
(array([ 4.18380366]), array([ 2.74205299]))
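One practical note to add (a general recommendation, not something stated in the original post): gradient-descent-based models such as SGDRegressor are sensitive to feature scales, so it usually helps to standardize the features first, for example:

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import SGDRegressor

# Scale features to zero mean / unit variance before running SGD
sgd_pipeline = make_pipeline(
    StandardScaler(),
    SGDRegressor(max_iter=1000, penalty=None, eta0=0.1, random_state=42),
)
# sgd_pipeline.fit(X_train, y_train)  # X_train, y_train: your own training data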

4. Regression using Support Vector Machines

We start by explaining what an SVM is and then move on to using it for regression.

The fundamental idea behind SVMs is best explained with some pictures. The figure below shows part of the iris dataset. The two classes can clearly be separated easily with a straight line (they are linearly separable). The left plot shows the decision boundaries of three possible linear classifiers. The model whose decision boundary is represented by the dashed line is so bad that it does not even separate the classes properly. The other two models work perfectly on this training set, but their decision boundaries come so close to the instances that these models will probably not perform as well on new instances. In contrast, the solid line in the plot on the right represents the decision boundary of an SVM classifier; this line not only separates the two classes but also stays as far away from the closest training instances as possible. You can think of an SVM classifier as fitting the widest possible street (represented by the parallel dashed lines) between the classes. This is called large margin classification. The circled points are your ‘support vectors’.
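As a quick, hands-on illustration of large margin classification (a sketch, not code from the original post; the choice of petal features and the large C value are assumptions), a linear SVM classifier can be trained on two iris classes like this:

from sklearn import datasets
from sklearn.svm import SVC

# Petal length and petal width of setosa vs. versicolor (linearly separable classes)
iris = datasets.load_iris()
X = iris.data[:, 2:4]          # petal length, petal width
y = iris.target
mask = (y == 0) | (y == 1)     # keep only setosa and versicolor
X, y = X[mask], y[mask]

# A linear-kernel SVM; a large C keeps margin violations to a minimum
svm_clf = SVC(kernel="linear", C=1e6)
svm_clf.fit(X, y)

print(svm_clf.support_vectors_)  # the circled points: the support vectors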
SVM Regression

As we mentioned earlier, the SVM algorithm is quite versatile: not only does it support linear and nonlinear classification, but it also supports linear and nonlinear regression. The trick is to reverse the objective: instead of trying to fit the largest possible street between two classes while limiting margin violations, SVM Regression tries to fit as many instances as possible on the street while limiting margin violations (i.e., instances off the street). The width of the street is controlled by a hyperparameter ϵ. The figure below shows two linear SVM Regression models trained on some random linear data, one with a large margin (ϵ = 1.5) and the other with a small margin (ϵ = 0.5).
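A corresponding scikit-learn sketch is shown below. The random linear data is illustrative; only the two epsilon values come from the description above:

import numpy as np
from sklearn.svm import LinearSVR

# Illustrative random linear data
rng = np.random.RandomState(42)
X = 2 * rng.rand(50, 1)
y = 4 + 3 * X[:, 0] + rng.randn(50)

# Two linear SVM regressors: a wide street (epsilon=1.5) and a narrow one (epsilon=0.5)
svm_reg_wide = LinearSVR(epsilon=1.5, random_state=42)
svm_reg_narrow = LinearSVR(epsilon=0.5, random_state=42)
svm_reg_wide.fit(X, y)
svm_reg_narrow.fit(X, y)

print(svm_reg_wide.predict([[1.0]]), svm_reg_narrow.predict([[1.0]]))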

5. Regression using Decision Trees

First we explain what Decision Trees are and how they work.

Binary decision trees operate by subjecting attributes to a series of binary (yes / no) decisions. Each decision leads to one of two possibilities: another decision or a prediction.

How Does a Binary Decision Tree Generate Predictions?

When an observation or row is passed to a nonterminal node, the row answers the node’s question. If the answer is yes, the row of attributes is passed to the node below and to the left of the current node. If the answer is no, the row of attributes is passed to the node below and to the right of the current node. The process continues recursively until the row arrives at a terminal (that is, leaf) node, where a prediction value is assigned to the row. The value assigned by the leaf node is the mean of the outcomes of all the training observations that wound up in that leaf node.

Below is the Decision Tree for the Iris Dataset.
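A tree like the one in that figure can be reproduced with something like the following sketch (using max_depth=2 is my assumption, simply to keep the diagram small):

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, plot_tree
import matplotlib.pyplot as plt

iris = load_iris()
tree_clf = DecisionTreeClassifier(max_depth=2, random_state=42)
tree_clf.fit(iris.data, iris.target)

# Draw the fitted tree: each node shows its split, sample count, and class counts
plot_tree(tree_clf, feature_names=iris.feature_names, class_names=list(iris.target_names), filled=True)
plt.show()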
Simple Pseudo-Code for ‘Regression Using a Decision Tree’, Only for the Purpose of Demonstration:

Step 1: Find the average values of x and y over the interval. Let us call these values XA and YA.
Step 2: Split the curve into two parts by drawing a vertical line at x = XA.
Step 3: For x < XA, take the average value of y on the left side and draw a horizontal line at that value over the left side.
Step 4: For x > XA, take the average value of y on the right side and draw a horizontal line at that value over the right side.
Repeat steps (1) to (4) (n - 1) times, where n is the depth you want in your decision tree.

Moving on to Regression. Below is our sample data:
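A rough NumPy sketch of the depth-1 version of these steps (with illustrative numbers, not the post's sample data) could look like this:

import numpy as np

# Illustrative 1-D sample data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.2, 1.9, 3.1, 7.8, 8.2, 9.1])

# Step 1: average x over the interval (the split point XA)
xa = x.mean()

# Steps 2-4: split at XA and predict the mean of y on each side
left_pred = y[x < xa].mean()    # horizontal line over the left side
right_pred = y[x >= xa].mean()  # horizontal line over the right side

def predict(x_new):
    return left_pred if x_new < xa else right_pred

print(xa, left_pred, right_pred, predict(2.5))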
Block diagram of depth 1 tree for simple problem
Comparison of predictions and actual values versus the attribute for the simple example.

Notice how the predicted value for each region is always the average target value of the instances in that region. The algorithm splits each region in a way that makes most training instances as close as possible to that predicted value.
DecisionTreeRegressor using sklearn

from sklearn.tree import DecisionTreeRegressor
tree_reg = DecisionTreeRegressor(max_depth=2)
tree_reg.fit(X, y)
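For completeness, here is a small self-contained usage sketch; the noisy quadratic data below is an assumption for illustration, while the post fits the regressor to its own X and y:

import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Illustrative noisy quadratic data
rng = np.random.RandomState(42)
X = rng.rand(200, 1)
y = 4 * (X[:, 0] - 0.5) ** 2 + rng.randn(200) / 10

tree_reg = DecisionTreeRegressor(max_depth=2)
tree_reg.fit(X, y)

# Each prediction is the mean target value of the training instances in the matching leaf
print(tree_reg.predict([[0.1], [0.5], [0.9]]))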

References

1. Linear Regression (Ordinary Least Squares algorithm):
1.1. linear-regression-theory
1.2. penalized linear regression
2, 3, 4: Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems, by Aurélien Géron
5: Machine Learning in Python (Essential Techniques for Predictive Analysis), by Michael Bowles
Tags: Machine Learning, Technology
