THE 1ST LAW - Make It Obvious
The Man Who Didn’t Look Right
THE PSYCHOLOGIST GARY Klein once told me a story about a woman who attended a family gathering. She had spent years working as a paramedic and, upon arriving at the event, took one look at her father-in-law and got very concerned.

“I don’t like the way you look,” she said.

Her father-in-law, who was feeling perfectly fine, jokingly replied, “Well, I don’t like your looks, either.”

“No,” she insisted. “You need to go to the hospital now.”

A few hours later, the man was undergoing lifesaving surgery after an examination had revealed that he had a blockage to a major artery and was at immediate risk of a heart attack. Without his daughter-in-law’s intuition, he could have died.

What did the paramedic see? How did she predict his impending heart attack? When major arteries are obstructed, the body focuses on sending blood to critical organs and away from peripheral locations near the surface of the skin. The result is a change in the pattern of distribution of blood in the face. After many years of working with people with heart failure, the woman had unknowingly developed the ability to recognize this pattern on sight. She couldn’t explain what it was that she noticed in her father-in-law’s face, but she knew something was wrong.
The human brain is a prediction machine. It is continuously taking in your surroundings and analyzing the information it comes across. Whenever you experience something repeatedly—like a paramedic seeing the face of a heart attack patient or a military analyst seeing a missile on a radar screen—your brain begins noticing what is important, sorting through the details and highlighting the relevant cues, and cataloging that information for future use. With enough practice, you can pick up on the cues that predict certain outcomes without consciously thinking about it. Automatically, your brain encodes the lessons learned through experience. We can’t always explain what it is we are learning, but learning is happening all along the way, and your ability to notice the relevant cues in a given situation is the foundation for every habit you have.

We underestimate how much our brains and bodies can do without thinking. You do not tell your hair to grow, your heart to pump, your lungs to breathe, or your stomach to digest. And yet your body handles all this and more on autopilot. You are much more than your conscious self.

Consider hunger. How do you know when you’re hungry? You don’t necessarily have to see a cookie on the counter to realize that it is time to eat. Appetite and hunger are governed nonconsciously. Your body has a variety of feedback loops that gradually alert you when it is time to eat again and that track what is going on around you and within you. Cravings can arise thanks to hormones and chemicals circulating through your body. Suddenly, you’re hungry even though you’re not quite sure what tipped you off.

This is one of the most surprising insights about our habits: you don’t need to be aware of the cue for a habit to begin. You can notice an opportunity and take action without dedicating conscious attention to it. This is what makes habits useful. It’s also what makes them dangerous.
As habits form, your actions come under the direction of your automatic and nonconscious mind. You fall into old patterns before you realize what’s happening. Unless someone points it out, you may not notice that you cover your mouth with your hand whenever you laugh, that you apologize before asking a question, or that you have a habit of finishing other people’s sentences. And the more you repeat these patterns, the less likely you become to question what you’re doing and why you’re doing it.

Over time, the cues that spark our habits become so common that they are essentially invisible: the treats on the kitchen counter, the remote control next to the couch, the phone in our pocket. Our responses to these cues are so deeply encoded that it may feel like the urge to act comes from nowhere. For this reason, we must begin the process of behavior change with awareness.

THE HABITS SCORECARD
The Japanese railway system
The Japanese railway system is regarded as one of the best in the world. If you ever find yourself riding a train in Tokyo, you’ll notice that the conductors have a peculiar habit. As each operator runs the train, they proceed through a ritual of pointing at different objects and calling out commands. When the train approaches a signal, the operator will point at it and say, “Signal is green.” As the train pulls into and out of each station, the operator will point at the speedometer and call out the exact speed. When it’s time to leave, the operator will point at the timetable and state the time. Out on the platform, other employees are performing similar actions. Before each train departs, staff members will point along the edge of the platform and declare, “All clear!” Every detail is identified, pointed at, and named aloud.*

This process, known as Pointing-and-Calling, is a safety system designed to reduce mistakes. It seems silly, but it works incredibly well. Pointing-and-Calling reduces errors by up to 85 percent and cuts accidents by 30 percent. The MTA subway system in New York City adopted a modified version that is “point-only,” and “within two years of implementation, incidents of incorrectly berthed subways fell 57 percent.”

Pointing-and-Calling is so effective because it raises the level of awareness from a nonconscious habit to a more conscious level. Because the train operators must use their eyes, hands, mouth, and ears, they are more likely to notice problems before something goes wrong.

The more automatic a behavior becomes, the less likely we are to consciously think about it. And when we’ve done something a thousand times before, we begin to overlook things. We assume that the next time will be just like the last. We’re so used to doing what we’ve always done that we don’t stop to question whether it’s the right thing to do at all. Many of our failures in performance are largely attributable to a lack of self-awareness.
One of our greatest challenges in changing habits is maintaining awareness of what we are actually doing. This helps explain why the consequences of bad habits can sneak up on us. We need a “point-and-call” system for our personal lives. That’s the origin of the Habits Scorecard, which is a simple exercise you can use to become more aware of your behavior. To create your own, make a list of your daily habits.

Here’s a sample of where your list might start:
- Wake up
- Turn off alarm
- Check my phone
- Go to the bathroom
- Weigh myself
- Take a shower
- Brush my teeth
- Floss my teeth
- Put on deodorant
- Hang up towel to dry
- Get dressed
- Make a cup of tea
. . . and so on.

Once you have a full list, look at each behavior, and ask yourself, “Is this a good habit, a bad habit, or a neutral habit?” If it is a good habit, write “+” next to it. If it is a bad habit, write “–”. If it is a neutral habit, write “=”. For example, the list above might look like this:
- Wake up =
- Turn off alarm =
- Check my phone –
- Go to the bathroom =
- Weigh myself +
- Take a shower +
- Brush my teeth +
- Floss my teeth +
- Put on deodorant +
- Hang up towel to dry =
- Get dressed =
- Make a cup of tea +

The marks you give to a particular habit will depend on your situation and your goals. For someone who is trying to lose weight, eating a bagel with peanut butter every morning might be a bad habit. For someone who is trying to bulk up and add muscle, the same behavior might be a good habit. It all depends on what you’re working toward.*

Scoring your habits can be a bit more complex for another reason as well. The labels “good habit” and “bad habit” are slightly inaccurate. There are no good habits or bad habits. There are only effective habits. That is, effective at solving problems. All habits serve you in some way—even the bad ones—which is why you repeat them. For this exercise, categorize your habits by how they will benefit you in the long run. Generally speaking, good habits will have net positive outcomes.
Bad habits have net negative outcomes. Smoking a cigarette may reduce stress right now (that’s how it’s serving you), but it’s not a healthy long-term behavior. If you’re still having trouble determining how to rate a particular habit, here is a question I like to use: “Does this behavior help me become the type of person I wish to be? Does this habit cast a vote for or against my desired identity?” Habits that reinforce your desired identity are usually good. Habits that conflict with your desired identity are usually bad.

As you create your Habits Scorecard, there is no need to change anything at first. The goal is to simply notice what is actually going on. Observe your thoughts and actions without judgment or internal criticism. Don’t blame yourself for your faults. Don’t praise yourself for your successes. If you eat a chocolate bar every morning, acknowledge it, almost as if you were watching someone else. Oh, how interesting that they would do such a thing. If you binge-eat, simply notice that you are eating more calories than you should. If you waste time online, notice that you are spending your life in a way that you do not want to. The first step to changing bad habits is to be on the lookout for them.

If you feel like you need extra help, then you can try Pointing-and-Calling in your own life. Say out loud the action that you are thinking of taking and what the outcome will be. If you want to cut back on your junk food habit but notice yourself grabbing another cookie, say out loud, “I’m about to eat this cookie, but I don’t need it. Eating it will cause me to gain weight and hurt my health.” Hearing your bad habits spoken aloud makes the consequences seem more real. It adds weight to the action rather than letting yourself mindlessly slip into an old routine. This approach is useful even if you’re simply trying to remember a task on your to-do list.
Just saying out loud, “Tomorrow, I need to go to the post office after lunch,” increases the odds that you’ll actually do it. You’re getting yourself to acknowledge the need for action—and that can make all the difference. The process of behavior change always starts with awareness. Strategies like Pointing-and-Calling and the Habits Scorecard are focused on getting you to recognize your habits and acknowledge the cues that trigger them, which makes it possible to respond in a way that benefits you.

Key Points
- With enough practice, your brain will pick up on the cues that predict certain outcomes without consciously thinking about it.
- Once our habits become automatic, we stop paying attention to what we are doing.
- The process of behavior change always starts with awareness. You need to be aware of your habits before you can change them.
- Pointing-and-Calling raises your level of awareness from a nonconscious habit to a more conscious level by verbalizing your actions.
- The Habits Scorecard is a simple exercise you can use to become more aware of your behavior.
Friday, May 31, 2024
The Habits Scorecard (From CH-4 of the book Atomic Habits)
Monday, May 27, 2024
Estimating the Contamination Factor For Unsupervised Anomaly Detection
For this article we went through the following research paper: "Estimating the Contamination Factor's Distribution in Unsupervised Anomaly Detection" by Lorenzo Perini, Paul-Christian Bürkner, and Arto Klami.

All of the code and data is available to download from this link: Download Code and Data

Here are some highlights from the paper:

1. Introduction
... Therefore, we are the first to study the estimation of the contamination factor from a Bayesian perspective. We propose γGMM, the first algorithm for estimating the contamination factor's (posterior) distribution in unlabeled anomaly detection setups. First, we use a set of unsupervised anomaly detectors to assign anomaly scores for all samples and use these scores as a new representation of the data. Second, we fit a Bayesian Gaussian Mixture model with a Dirichlet Process prior (DPGMM) (Ferguson, 1973; Rasmussen, 1999) in this new space. If we knew which components contain the anomalies, we could derive the contamination factor's posterior distribution as the distribution of the sum of such components' weights. Because we do not know this, as a third step γGMM estimates the probability that the k most extreme components are jointly anomalous, and uses this information to construct the desired posterior. The method is explained in detail in Section 3. ...

3. Methodology
We tackle the problem: Given an unlabeled dataset D and a set of M unsupervised anomaly detectors, estimate a (posterior) distribution of the contamination factor γ.

Learning from an unlabeled dataset has three key challenges. First, the absence of labels forces us to make relatively strong assumptions. Second, the anomaly detectors rely on different heuristics that may or may not hold, and their performance can hence vary significantly across datasets. Third, we need to be careful in introducing user-specified hyperparameters, because setting them properly may be as hard as directly specifying the contamination factor.

In this paper, we propose γGMM, a novel Bayesian approach that estimates the contamination factor's posterior distribution in four steps, which are illustrated in Figure 1:

Step 1. Because anomalies may not follow any particular pattern in covariate space, γGMM maps the covariates X ∈ R^d into an M-dimensional anomaly space, where the dimensions correspond to the anomaly scores assigned by the M unsupervised anomaly detectors. Within each dimension of such a space, the evident pattern is that "the higher the more anomalous".

Step 2. We model the data points in the new space R^M using a Dirichlet Process Gaussian Mixture Model (DPGMM) (Neal, 1992; Rasmussen, 1999). We assume that each of the (potentially many) mixture components contains either only normals or only anomalies. If we knew which components contained anomalies, we could then easily derive γ's posterior as the sum of the mixing proportions π of the anomalous components. However, such information is not available in our setting.

Step 3. Thus, we order the components in decreasing order, and we estimate the probability of the largest k components being anomalous.
This poses three challenges: (a) how to represent each M-dimensional component by a single value to sort them from the most to the least anomalous, (b) how to compute the probability that the kth component is anomalous given that the (k − 1)th is such, and (c) how to derive the target probability that k components are jointly anomalous.

Step 4. γGMM estimates the contamination factor's posterior by exploiting such a joint probability and the components' mixing proportions posterior.

A Simplified Implementation of The Above Algorithm
1. We have our dataset consisting of page views for our blog on Blogger. We load this dataset using Pandas.

2. We initialize two unsupervised anomaly detection models, both available in scikit-learn:
- IsolationForest
- LocalOutlierFactor

3. To begin with, we initialize them with the default values for the hyperparameters, as in the code below (note that we keep the two fitted models in separate variables so the first is not overwritten):

clf_if = IsolationForest(random_state=0).fit(X)
clf_lof = LocalOutlierFactor().fit(X)

That means at this point each model's contamination factor is set to 'auto'.

4. Since we have two models here, M = 2 for us. If there were three models, then M would be 3.

5. We get the anomaly scores. IsolationForest exposes them through decision_function, while LocalOutlierFactor stores them in the negative_outlier_factor_ attribute:

anomalyscores_if = clf_if.decision_function(X)
anomalyscores_lof = clf_lof.negative_outlier_factor_

6. For a simplified view, we plot this 2D data in a scatter plot:

import matplotlib.pyplot as plt
plt.scatter(anomalyscores_if, anomalyscores_lof)
plt.show()

7. Next, we use a Bayesian Gaussian Mixture model to cluster the anomaly scores into two groups (one anomalous, the other normal).

8. Next, we find the percentage of anomalous points (class 1). This percentage is our estimate of the contamination factor.

9. Refitting the IsolationForest model with the above contamination factor, we find the anomalies, shown below in red:
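The steps above can be stitched together into one runnable sketch. The Blogger page-view data is not included with this post, so the snippet below substitutes synthetic daily view counts; the variable names, the two-component BayesianGaussianMixture, and the "lower mean score = anomalous cluster" rule are simplifying illustrative choices, not the paper's full γGMM procedure.

```python
# Simplified contamination-factor estimation, assuming synthetic data
# in place of the blog's page-view dataset.
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(0)
# Hypothetical daily page views: mostly ~100 views, plus a few traffic spikes.
views = np.concatenate([rng.normal(100, 10, 190), rng.normal(400, 50, 10)])
X = views.reshape(-1, 1)

# Step 1: two unsupervised detectors with default hyperparameters (M = 2).
clf_if = IsolationForest(random_state=0).fit(X)
clf_lof = LocalOutlierFactor().fit(X)

# The anomaly scores form a new 2-D representation of the data.
# For both detectors, lower score = more anomalous.
scores = np.column_stack([clf_if.decision_function(X),
                          clf_lof.negative_outlier_factor_])

# Step 2: cluster the scores into two groups with a Bayesian GMM
# (its default weight prior is a Dirichlet process).
gmm = BayesianGaussianMixture(n_components=2, random_state=0)
labels = gmm.fit_predict(scores)

# Step 3: take the cluster with the lower mean score as anomalous;
# its share of the points is the estimated contamination factor.
anomalous = int(np.argmin(gmm.means_[:, 0]))
contamination = float(np.mean(labels == anomalous))
print(f"estimated contamination factor: {contamination:.3f}")

# Step 4: refit IsolationForest with the estimated contamination factor.
clf_final = IsolationForest(contamination=contamination,
                            random_state=0).fit(X)
pred = clf_final.predict(X)  # -1 = anomaly, 1 = normal
print("points flagged as anomalous:", int((pred == -1).sum()))
```

The estimate here is a single point value; the paper's γGMM goes further and produces a full posterior distribution over the contamination factor.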
References
- Perini, L., Bürkner, P.-C., and Klami, A. "Estimating the Contamination Factor's Distribution in Unsupervised Anomaly Detection" (arxiv.org / paperswithcode)
- Code: /Lorenzo-Perini/GammaGMM
Wednesday, May 22, 2024
Four Practice Problems on Linear Regression (Taken From Interviews For Data Scientist Role)
Watch our related video on: YouTube
Previous Videos
- Linear Regression Theory (2022-02-15)
- https://www.youtube.com/watch?v=qS3HhMV8YG0
- Interview Question 1: What is linear regression and what is its primary purpose?
- https://www.youtube.com/watch?v=9S2FM9EGcdc
- Use Khan Academy to get started with the basic concepts of linear regression (Motivational video)
- https://www.youtube.com/watch?v=glXMN1VIttA
- Unit 5: Exploring bivariate numerical data
- https://www.khanacademy.org/math/statistics-probability/describing-relationships-quantitative-data
Question (1): Asked At Ericsson
- You are given the data generated for the following equation:
- y = (x^9)*3
- Can you apply linear regression to learn from this data?
Solution (1)
Equation of a line: y = mx + c

The equation we are given is of the form y = (x^m)·c, with m = 9 and c = 3.

Taking log on both sides:

log(y) = log((x^m)·c)

Applying the multiplication rule of logarithms:

log(y) = log(x^m) + log(c)

Applying the power rule of logarithms:

log(y) = m·log(x) + log(c)

Substituting Y = log(y), X = log(x), and C = log(c):

Y = mX + C

So the answer is 'yes': after a log-log transformation the relationship is linear, and ordinary linear regression on (log x, log y) can recover m and C.
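The derivation can be checked numerically: generate data from y = 3·x^9, fit ordinary linear regression on the log-transformed values, and the slope and intercept recover m = 9 and c = 3. A minimal sketch with scikit-learn (the data is synthetic, generated from the given equation):

```python
# Verify the log-log trick: y = 3 * x**9 becomes linear after taking logs,
# with slope m = 9 and intercept log(3).
import numpy as np
from sklearn.linear_model import LinearRegression

x = np.linspace(1, 10, 100)
y = 3 * x**9

# Fit on (log x, log y) instead of (x, y).
model = LinearRegression().fit(np.log(x).reshape(-1, 1), np.log(y))

m_hat = model.coef_[0]            # recovered exponent, ~9
c_hat = np.exp(model.intercept_)  # recovered multiplier, ~3
print(m_hat, c_hat)
```

Because the data is noise-free, the fit is essentially exact; with noisy data the recovered values would only approximate 9 and 3.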
Question (2): Infosys – Digital Solution Specialist
- If you do linear regression in 3D, what do you get?
Solution (2)
When you perform linear regression on 3D data, you are essentially fitting a plane to a set of data points in three-dimensional space. The general form of the equation for a plane in three dimensions is:
z = ax + by + c
Here:
z is the dependent variable you are trying to predict.
x and y are the independent variables.
a and b are the coefficients that determine the orientation of the plane.
c is the intercept.
Suppose you have data points (1,2,3), (2,3,5), (3,4,7), and you fit a linear regression model to this data. A plane that fits these points exactly is z = x + y (check: 1+2=3, 2+3=5, 3+4=7). The fitted equation tells you how z changes as x and y change.
In summary, performing linear regression on 3D data gives you a plane in three-dimensional space that best fits your data points in the least squares sense. This plane can then be used to predict new z values given new x and y values.
Generalizing a bit further
- If you do linear regression in N dimensions (one response variable and N−1 predictors), you get a hyperplane of dimension N−1.
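As a concrete sketch of the 3D case, the snippet below fits a plane to hypothetical points generated from z = 2x + 3y + 1 (illustrative coefficients of my own choosing, not from the interview question) and recovers them with scikit-learn:

```python
# Fitting a plane z = a*x + b*y + c to 3D data with linear regression.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
XY = rng.uniform(0, 10, size=(50, 2))  # columns: x and y
z = 2 * XY[:, 0] + 3 * XY[:, 1] + 1 + rng.normal(0, 0.1, 50)  # small noise

model = LinearRegression().fit(XY, z)
a_hat, b_hat = model.coef_
c_hat = model.intercept_
print(a_hat, b_hat, c_hat)  # close to 2, 3, 1
```

The fitted (a, b, c) define the plane that best fits the points in the least squares sense, and model.predict gives z for new (x, y) pairs.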
Question (3): Infosys – Digital Solution Specialist
- How do you tell if there is linearity between two variables?
Solution (3)
Determining if there is linearity between two variables involves several steps, including visual inspection, statistical tests, and fitting a linear model to evaluate the relationship. Here are the main methods you can use:
1. Scatter Plot
Create a scatter plot of the two variables. This is the most straightforward way to visually inspect the relationship.
Linearity: If the points roughly form a straight line (either increasing or decreasing), there is likely a linear relationship.
Non-linearity: If the points form a curve, cluster in a non-linear pattern, or are randomly scattered without any apparent trend, there is likely no linear relationship.
2. Correlation Coefficient
Calculate the Pearson correlation coefficient, which measures the strength and direction of the linear relationship between two variables.
Pearson Correlation Coefficient (r): Ranges from -1 to 1.
r≈1 or r≈−1: Strong linear relationship (positive or negative).
r≈0: Weak or no linear relationship.
3. Fitting a Linear Model
Fit a simple linear regression model to the data.
Model Equation: y = β0 + β1·x + ϵ
- y: dependent variable
- x: independent variable
- β0: intercept
- β1: slope
- ϵ: error term
4. Residual Analysis
Examine the residuals (differences between observed and predicted values) from the fitted linear model.
Residual Plot: Plot residuals against the independent variable or the predicted values.
Linearity: Residuals are randomly scattered around zero.
Non-linearity: Residuals show a systematic pattern (e.g., curve, trend).
5. Statistical Tests
Perform statistical tests to evaluate the significance of the linear relationship.
t-test for Slope: Test if the slope (β1) is significantly different from zero.
Null Hypothesis (H0): β1=0 (no linear relationship).
Alternative Hypothesis (H1): β1≠0 (linear relationship exists).
p-value: If the p-value is less than the chosen significance level (e.g., 0.05), reject H0 and conclude that a significant linear relationship exists.
6. Coefficient of Determination (R²)
Calculate the R² value, which indicates the proportion of variance in the dependent variable explained by the independent variable.
R² Value: Ranges from 0 to 1.
Closer to 1: Indicates a strong linear relationship.
Closer to 0: Indicates a weak or no linear relationship.
Example:
Suppose you have two variables, x and y.
Scatter Plot: You plot x vs. y and observe a straight-line pattern.
Correlation Coefficient: You calculate the Pearson correlation coefficient and find r=0.85, indicating a strong positive linear relationship.
Fitting a Linear Model: You fit a linear regression model y=2+3x.
Residual Analysis: You plot the residuals and observe they are randomly scattered around zero, indicating no pattern.
Statistical Tests: The t-test for the slope gives a p-value of 0.001, indicating the slope is significantly different from zero.
R² Value: You calculate R^2=0.72, meaning 72% of the variance in y is explained by x.
Based on these steps, you would conclude there is a strong linear relationship between x and y.
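The checks above can be run end to end with SciPy. The sketch below uses synthetic data generated from the same model as the worked example, y = 2 + 3x plus noise; the exact numbers will differ from the figures quoted above:

```python
# Checking linearity between x and y: correlation, model fit,
# residuals, slope t-test, and R^2, on synthetic data from y = 2 + 3x.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 200)
y = 2 + 3 * x + rng.normal(0, 1, 200)  # true linear model plus noise

# Check 2: Pearson correlation coefficient.
r, _ = stats.pearsonr(x, y)

# Checks 3, 5, 6: fit the line; linregress also reports the
# p-value of the slope t-test and the correlation rvalue.
res = stats.linregress(x, y)
r_squared = res.rvalue**2

# Check 4: residuals for a residual plot (should scatter around zero).
residuals = y - (res.intercept + res.slope * x)

print(f"r = {r:.3f}, slope = {res.slope:.2f}, "
      f"p = {res.pvalue:.1e}, R^2 = {r_squared:.3f}")
```

With this data the correlation is strong, the slope p-value is far below 0.05, and R^2 is high, so every check points to a linear relationship. A scatter plot of x vs. y and of the residuals (check 1 and the residual plot) would complete the visual side of the analysis.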
Question (4): TCS and Infosys (DSS)
- What is the difference between Lasso regression and Ridge regression?
Solution (4)
Lasso and Ridge regression are both techniques used to improve the performance of linear regression models, especially when dealing with multicollinearity or when the number of predictors is large compared to the number of observations. They achieve this by adding a regularization term to the loss function, which penalizes large coefficients. However, they differ in the type of penalty applied:
Ridge Regression:
- Penalty Type: L2 norm (squared magnitude of coefficients)
- Objective Function: Minimizes the sum of squared residuals plus the sum of squared coefficients multiplied by a penalty term λ:

min( Σ_{i=1}^{n} (y_i − ŷ_i)² + λ Σ_{j=1}^{p} β_j² )

Here, λ is the regularization parameter, y_i are the observed values, ŷ_i are the predicted values, and β_j are the coefficients.
- Effect on Coefficients: Shrinks coefficients towards zero but does not set any of them exactly to zero. As a result, all predictors are retained in the model.
- Use Cases: Useful when you have many predictors that are all potentially relevant to the model, and you want to keep all of them but shrink their influence.
Lasso Regression:
- Penalty Type: L1 norm (absolute magnitude of coefficients)
- Objective Function: Minimizes the sum of squared residuals plus the sum of absolute values of coefficients multiplied by a penalty term λ:

min( Σ_{i=1}^{n} (y_i − ŷ_i)² + λ Σ_{j=1}^{p} |β_j| )

Here, λ is the regularization parameter, y_i are the observed values, ŷ_i are the predicted values, and β_j are the coefficients.
- Effect on Coefficients: Can shrink some coefficients exactly to zero, effectively performing variable selection. This means that it can produce a sparse model where some predictors are excluded.
- Use Cases: Useful when you have many predictors but you suspect that only a subset of them are actually important for the model. Lasso helps in feature selection by removing irrelevant predictors.
Key Differences:
Type of Regularization:
- Ridge: L2 regularization (squared magnitude of coefficients)
- Lasso: L1 regularization (absolute magnitude of coefficients)
Effect on Coefficients:
- Ridge: Tends to shrink coefficients uniformly, but none are set exactly to zero.
- Lasso: Can shrink some coefficients to exactly zero, leading to a sparse model.
Use Cases:
- Ridge: Better when you want to retain all predictors and control their magnitude.
- Lasso: Better when you want to perform feature selection and eliminate some predictors.
Computational Complexity:
- Ridge: Generally simpler to compute because the penalty term is differentiable everywhere.
- Lasso: Can be more computationally intensive because the penalty term is not differentiable at zero, requiring more sophisticated optimization techniques.
Elastic Net:
As a side note, there is also the Elastic Net method, which combines both L1 and L2 penalties. It is useful when you want the benefits of both Ridge and Lasso regression:

min( Σ_{i=1}^{n} (y_i − ŷ_i)² + λ1 Σ_{j=1}^{p} |β_j| + λ2 Σ_{j=1}^{p} β_j² )

Here, λ1 and λ2 control the L1 and L2 penalties, respectively. This method can select variables like Lasso and shrink coefficients like Ridge.
In summary, Ridge regression is ideal when you want to shrink coefficients without eliminating any, while Lasso regression is useful for creating simpler, more interpretable models by removing some predictors entirely.
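The contrast between the two penalties is easy to demonstrate. The sketch below uses synthetic data where only 3 of 10 features actually matter; the alpha values (scikit-learn's name for λ) are arbitrary illustrative choices:

```python
# Ridge shrinks all coefficients but keeps them nonzero;
# Lasso can zero out the irrelevant ones entirely.
import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
true_coef = np.array([5.0, -3.0, 2.0] + [0.0] * 7)  # only 3 features matter
y = X @ true_coef + rng.normal(0, 0.5, 100)

ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=0.5).fit(X, y)

ridge_nonzero = int(np.sum(ridge.coef_ != 0))
lasso_nonzero = int(np.sum(lasso.coef_ != 0))
print("ridge nonzero coefficients:", ridge_nonzero)  # all 10 retained
print("lasso nonzero coefficients:", lasso_nonzero)  # a sparse subset
```

Ridge keeps every predictor (its L2 penalty never drives a coefficient exactly to zero), while Lasso produces a sparse model, which is exactly the feature-selection behavior described above.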