Wednesday, May 22, 2024

Four Practice Problems on Linear Regression (Taken from Interviews for a Data Scientist Role)

To see all interview preparation articles: Index For Interviews Preparation
To watch our related video: YouTube


Question (1): Asked At Ericsson

  • You are given data generated from the following equation:
  • y = 3·x^9
  • Can you apply linear regression to learn from this data?

Solution (1)

Equation of a line: y = mx + c

The given equation is of the form y = c·x^m (here c = 3 and m = 9).

Taking the log of both sides:

log(y) = log(c·x^m)

Applying the product rule of logarithms:

log(y) = log(x^m) + log(c)

Applying the power rule of logarithms:

log(y) = m·log(x) + log(c)

Substituting Y = log(y), X = log(x), and C = log(c):

Y = mX + C

So the answer is 'yes': after taking logs of x and y, the relationship becomes linear, and linear regression can recover m and C (and hence c = e^C).
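
A minimal sketch of this idea in Python, assuming scikit-learn is available (the data here is synthetic, generated from y = 3·x^9):

import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data generated from y = 3 * x^9 (x > 0 so the logs are defined)
rng = np.random.default_rng(0)
x = rng.uniform(0.5, 5.0, size=200)
y = 3 * x**9

# Log-transform both sides: log(y) = m*log(x) + log(c)
X_log = np.log(x).reshape(-1, 1)
y_log = np.log(y)

model = LinearRegression().fit(X_log, y_log)
m_hat = model.coef_[0]            # should be close to 9
c_hat = np.exp(model.intercept_)  # should be close to 3
print(m_hat, c_hat)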

Question (2): Infosys – Digital Solution Specialist

  • If you do linear regression in 3D, what do you get?

Solution (2)

When you perform linear regression on 3D data, you are essentially fitting a plane to a set of data points in three-dimensional space. The general form of the equation for a plane in three dimensions is:

z = ax + by + c

Here:

z is the dependent variable you are trying to predict.

x and y are the independent variables.

a and b are the coefficients that determine the orientation of the plane.

c is the intercept.


Suppose you have data points (1, 2, 3), (2, 3, 5), (3, 4, 7), and you fit a linear regression model to this data. These points happen to satisfy z = x + y exactly, so the fitted plane can pass through all of them. The plane's equation tells you how z changes as x and y change.

In summary, performing linear regression on 3D data gives you a plane in three-dimensional space that best fits your data points in the least squares sense. This plane can then be used to predict new z values given new x and y values.

Generalizing a bit further

  • If you do linear regression in N dimensions (one response and N − 1 predictors), you get an (N − 1)-dimensional hyperplane fitted in N-dimensional space.
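
To make Solution (2) concrete, here is a minimal sketch in Python, assuming scikit-learn (the data is a small synthetic set generated from a known plane, not the toy points above):

import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic 3D data: z = 2x + 3y + 1 plus a little noise
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 2))   # columns are x and y
z = 2 * X[:, 0] + 3 * X[:, 1] + 1 + rng.normal(0, 0.1, size=100)

model = LinearRegression().fit(X, z)
a, b = model.coef_        # orientation of the fitted plane
c = model.intercept_      # intercept
print(a, b, c)            # roughly 2, 3, 1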

Question (3): Infosys – Digital Solution Specialist

  • How do you tell if there is linearity between two variables?

Solution (3)

Determining if there is linearity between two variables involves several steps, including visual inspection, statistical tests, and fitting a linear model to evaluate the relationship. Here are the main methods you can use:

1. Scatter Plot

Create a scatter plot of the two variables. This is the most straightforward way to visually inspect the relationship.

Linearity: If the points roughly form a straight line (either increasing or decreasing), there is likely a linear relationship.

Non-linearity: If the points form a curve, cluster in a non-linear pattern, or are randomly scattered without any apparent trend, there is likely no linear relationship.
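
A minimal sketch of this step, assuming matplotlib and synthetic data:

import numpy as np
import matplotlib.pyplot as plt

# Synthetic data with a roughly linear relationship
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 2 + 3 * x + rng.normal(0, 2, size=100)

plt.scatter(x, y)
plt.xlabel("x")
plt.ylabel("y")
plt.title("Check visually for a straight-line pattern")
plt.show()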

2. Correlation Coefficient

Calculate the Pearson correlation coefficient, which measures the strength and direction of the linear relationship between two variables.

Pearson Correlation Coefficient (r): Ranges from -1 to 1.

r≈1 or r≈−1: Strong linear relationship (positive or negative).

r≈0: Weak or no linear relationship.
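
A minimal sketch using scipy (the same kind of synthetic data as in the scatter-plot sketch above):

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 2 + 3 * x + rng.normal(0, 2, size=100)

r, p_value = stats.pearsonr(x, y)
print(r)   # close to 1 here, indicating a strong positive linear relationship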

3. Fitting a Linear Model

Fit a simple linear regression model to the data.

Model Equation: y = β0 + β1·x + ϵ

y: Dependent variable.
x: Independent variable.
β0: Intercept.
β1: Slope.
ϵ: Error term.
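
A minimal sketch of fitting such a model, assuming statsmodels and synthetic data:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 2 + 3 * x + rng.normal(0, 2, size=100)

X = sm.add_constant(x)        # adds the column of ones for the intercept β0
results = sm.OLS(y, X).fit()
print(results.params)         # [β0, β1], roughly [2, 3] for this data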

4. Residual Analysis

Examine the residuals (differences between observed and predicted values) from the fitted linear model.

Residual Plot: Plot residuals against the independent variable or the predicted values.

Linearity: Residuals are randomly scattered around zero.

Non-linearity: Residuals show a systematic pattern (e.g., curve, trend).
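
A minimal sketch of a residual plot, again with synthetic data and statsmodels/matplotlib as assumptions:

import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 2 + 3 * x + rng.normal(0, 2, size=100)

results = sm.OLS(y, sm.add_constant(x)).fit()
residuals = results.resid     # observed minus predicted values

plt.scatter(results.fittedvalues, residuals)
plt.axhline(0, color="red")
plt.xlabel("Predicted values")
plt.ylabel("Residuals")
plt.title("Residuals should scatter randomly around zero if the relationship is linear")
plt.show()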

5. Statistical Tests

Perform statistical tests to evaluate the significance of the linear relationship.

t-test for Slope: Test if the slope (β1) is significantly different from zero.

Null Hypothesis (H0): β1=0 (no linear relationship).

Alternative Hypothesis (H1): β1≠0 (linear relationship exists).

p-value: If the p-value is less than the chosen significance level (e.g., 0.05), reject H0 and conclude that a significant linear relationship exists.

6. Coefficient of Determination (R²)

Calculate the R² value, which indicates the proportion of variance in the dependent variable explained by the independent variable.

R² Value: Ranges from 0 to 1.

Closer to 1: Indicates a strong linear relationship.

Closer to 0: Indicates a weak or no linear relationship.
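
Both the slope t-test (step 5) and R² (step 6) come directly out of a fitted statsmodels model; a minimal sketch with synthetic data:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 2 + 3 * x + rng.normal(0, 2, size=100)

results = sm.OLS(y, sm.add_constant(x)).fit()
print(results.pvalues[1])   # p-value of the t-test for the slope β1
print(results.rsquared)     # R²: proportion of variance in y explained by x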

Example:

Suppose you have two variables, x and y.

Scatter Plot: You plot x vs. y and observe a straight-line pattern.

Correlation Coefficient: You calculate the Pearson correlation coefficient and find r = 0.85, indicating a strong positive linear relationship.

Fitting a Linear Model: You fit a linear regression model y = 2 + 3x.

Residual Analysis: You plot the residuals and observe they are randomly scattered around zero, indicating no pattern.

Statistical Tests: The t-test for the slope gives a p-value of 0.001, indicating the slope is significantly different from zero.

R² Value: You calculate R² = 0.72, meaning 72% of the variance in y is explained by x.

Based on these steps, you would conclude there is a strong linear relationship between x and y.

Question (4): TCS and Infosys (DSS)

  • What is the difference between Lasso regression and Ridge regression?

Solution (4)

Lasso and Ridge regression are both techniques used to improve the performance of linear regression models, especially when dealing with multicollinearity or when the number of predictors is large compared to the number of observations. They achieve this by adding a regularization term to the loss function, which penalizes large coefficients. However, they differ in the type of penalty applied:

Ridge Regression:

  • Penalty Type: L2 norm (squared magnitude of coefficients)
  • Objective Function: Minimizes the sum of squared residuals plus the sum of squared coefficients multiplied by a penalty term λ:
    min Σi (yi − ŷi)² + λ Σj βj²
    Here, λ is the regularization parameter, yi are the observed values, ŷi are the predicted values, βj are the coefficients, and the sums run over the n observations (i = 1, …, n) and the p coefficients (j = 1, …, p).
  • Effect on Coefficients: Shrinks coefficients towards zero but does not set any of them exactly to zero. As a result, all predictors are retained in the model.
  • Use Cases: Useful when you have many predictors that are all potentially relevant to the model, and you want to keep all of them but shrink their influence.

Lasso Regression:

  • Penalty Type: L1 norm (absolute magnitude of coefficients)
  • Objective Function: Minimizes the sum of squared residuals plus the sum of absolute values of coefficients multiplied by a penalty term λ:
    min Σi (yi − ŷi)² + λ Σj |βj|
    Here, λ is the regularization parameter, yi are the observed values, ŷi are the predicted values, and βj are the coefficients.
  • Effect on Coefficients: Can shrink some coefficients exactly to zero, effectively performing variable selection. This means that it can produce a sparse model where some predictors are excluded.
  • Use Cases: Useful when you have many predictors but you suspect that only a subset of them are actually important for the model. Lasso helps in feature selection by removing irrelevant predictors.

Key Differences:

  1. Type of Regularization:

    • Ridge: L2 regularization (squared magnitude of coefficients)
    • Lasso: L1 regularization (absolute magnitude of coefficients)
  2. Effect on Coefficients:

    • Ridge: Tends to shrink coefficients uniformly, but none are set exactly to zero.
    • Lasso: Can shrink some coefficients to exactly zero, leading to a sparse model.
  3. Use Cases:

    • Ridge: Better when you want to retain all predictors and control their magnitude.
    • Lasso: Better when you want to perform feature selection and eliminate some predictors.
  4. Computational Complexity:

    • Ridge: Generally simpler to compute because the penalty term is differentiable everywhere.
    • Lasso: Can be more computationally intensive because the penalty term is not differentiable at zero, requiring more sophisticated optimization techniques.

Elastic Net:

As a side note, there is also the Elastic Net method, which combines both L1 and L2 penalties. It is useful when you want the benefits of both Ridge and Lasso regression:

Objective Function: min Σi (yi − ŷi)² + λ1 Σj |βj| + λ2 Σj βj²

Here, λ1 and λ2 control the L1 and L2 penalties, respectively. This method can select variables like Lasso and shrink coefficients like Ridge.

In summary, Ridge regression is ideal when you want to shrink coefficients without eliminating any, while Lasso regression is useful for creating simpler, more interpretable models by removing some predictors entirely.
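
A minimal sketch in Python comparing the three penalties, assuming scikit-learn (the data, the regularization strengths, and the l1_ratio value are illustrative choices, not prescriptions):

import numpy as np
from sklearn.linear_model import Ridge, Lasso, ElasticNet

# Synthetic data: 10 predictors, but only the first 3 actually matter
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = 3 * X[:, 0] - 2 * X[:, 1] + 1.5 * X[:, 2] + rng.normal(0, 0.5, size=200)

ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=0.1).fit(X, y)
enet = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)

print(np.round(ridge.coef_, 2))  # all 10 coefficients shrunk, none exactly zero
print(np.round(lasso.coef_, 2))  # irrelevant coefficients driven exactly to zero
print(np.round(enet.coef_, 2))   # a compromise between the two behaviours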

Tags: Interview Preparation, Machine Learning, Regression
