Saturday, April 13, 2024

How to Build Better Habits in 4 Simple Steps (Ch 3 from the book Atomic Habits)

THE SCIENCE OF HOW HABITS WORK

The process of building a habit can be divided into four simple steps: cue, craving, response, and reward.* Breaking it down into these fundamental parts can help us understand what a habit is, how it works, and how to improve it.
FIGURE 5: All habits proceed through four stages in the same order: cue, craving, response, and reward. This four-step pattern is the backbone of every habit, and your brain runs through these steps in the same order each time. First, there is the cue. The cue triggers your brain to initiate a behavior. It is a bit of information that predicts a reward. Our prehistoric ancestors were paying attention to cues that signaled the location of primary rewards like food, water, and sex. Today, we spend most of our time learning cues that predict secondary rewards like money and fame, power and status, praise and approval, love and friendship, or a sense of personal satisfaction. (Of course, these pursuits also indirectly improve our odds of survival and reproduction, which is the deeper motive behind everything we do.) Your mind is continuously analyzing your internal and external environment for hints of where rewards are located. Because the cue is the first indication that we're close to a reward, it naturally leads to a craving. Cravings are the second step, and they are the motivational force behind every habit. Without some level of motivation or desire— without craving a change—we have no reason to act. What you crave is not the habit itself but the change in state it delivers. You do not crave smoking a cigarette, you crave the feeling of relief it provides. You are not motivated by brushing your teeth but rather by the feeling of a clean mouth. You do not want to turn on the television, you want to be entertained. Every craving is linked to a desire to change your internal state. This is an important point that we will discuss in detail later. Cravings differ from person to person. In theory, any piece of information could trigger a craving, but in practice, people are not motivated by the same cues. For a gambler, the sound of slot machines can be a potent trigger that sparks an intense wave of desire. For someone who rarely gambles, the jingles and chimes of the casino are just background noise. Cues are meaningless until they are interpreted. The thoughts, feelings, and emotions of the observer are what transform a cue into a craving. The third step is the response. The response is the actual habit you perform, which can take the form of a thought or an action. Whether a response occurs depends on how motivated you are and how much friction is associated with the behavior. If a particular action requires more physical or mental effort than you are willing to expend, then you won't do it. Your response also depends on your ability. It sounds simple, but a habit can occur only if you are capable of doing it. If you want to dunk a basketball but can't jump high enough to reach the hoop, well, you're out of luck. Finally, the response delivers a reward. Rewards are the end goal of every habit. The cue is about noticing the reward. The craving is about wanting the reward. The response is about obtaining the reward. We chase rewards because they serve two purposes: (1) they satisfy us and (2) they teach us. The first purpose of rewards is to satisfy your craving. Yes, rewards provide benefits on their own. Food and water deliver the energy you need to survive. Getting a promotion brings more money and respect. Getting in shape improves your health and your dating prospects. But the more immediate benefit is that rewards satisfy your craving to eat or to gain status or to win approval. At least for a moment, rewards deliver contentment and relief from craving. Second, rewards teach us which actions are worth remembering in the future. Your brain is a reward detector. As you go about your life, your sensory nervous system is continuously monitoring which actions satisfy your desires and deliver pleasure. Feelings of pleasure and disappointment are part of the feedback mechanism that helps your brain distinguish useful actions from useless ones. Rewards close the feedback loop and complete the habit cycle. If a behavior is insufficient in any of the four stages, it will not become a habit. Eliminate the cue and your habit will never start. Reduce the craving and you won't experience enough motivation to act. Make the behavior difficult and you won't be able to do it. And if the reward fails to satisfy your desire, then you'll have no reason to do it again in the future. Without the first three steps, a behavior will not occur. Without all four, a behavior will not be repeated.

THE HABIT LOOP

FIGURE 6: The four stages of habit are best described as a feedback loop. They form an endless cycle that is running every moment you are alive. This “habit loop” is continually scanning the environment, predicting what will happen next, trying out different responses, and learning from the results.* In summary, the cue triggers a craving, which motivates a response, which provides a reward, which satisfies the craving and, ultimately, becomes associated with the cue. Together, these four steps form a neurological feedback loop—cue, craving, response, reward; cue, craving, response, reward—that ultimately allows you to create automatic habits. This cycle is known as the habit loop. This four-step process is not something that happens occasionally, but rather it is an endless feedback loop that is running and active during every moment you are alive—even now. The brain is continually scanning the environment, predicting what will happen next, trying out different responses, and learning from the results. The entire process is completed in a split second, and we use it again and again without realizing everything that has been packed into the previous moment. We can split these four steps into two phases: the problem phase and the solution phase. The problem phase includes the cue and the craving, and it is when you realize that something needs to change. The solution phase includes the response and the reward, and it is when you take action and achieve the change you desire. Problem phase 1. Cue 2. Craving Solution phase 3. Response 4. Reward All behavior is driven by the desire to solve a problem. Sometimes the problem is that you notice something good and you want to obtain it. Sometimes the problem is that you are experiencing pain and you want to relieve it. Either way, the purpose of every habit is to solve the problems you face. In the table on the following page, you can see a few examples of what this looks like in real life. Imagine walking into a dark room and flipping on the light switch. You have performed this simple habit so many times that it occurs without thinking. You proceed through all four stages in the fraction of a second. The urge to act strikes you without thinking. Problem phase 1. Cue: Your phone buzzes with a new text message. 2. Craving: You want to learn the contents of the message. Solution phase 3. Response: You grab your phone and read the text. 4. Reward: You satisfy your craving to read the message. Grabbing your phone becomes associated with your phone buzzing. ~~~ Problem phase 1. Cue: You are answering emails. 2. Craving: You begin to feel stressed and overwhelmed by work. You want to feel in control. Solution phase 3. Response: You bite your nails. 4. Reward: You satisfy your craving to reduce stress. Biting your nails becomes associated with answering email. ~~~ Problem phase 1. Cue: You wake up. 2. Craving: You want to feel alert. Solution phase 3. Response: You drink a cup of coffee. 4. Reward: You satisfy your craving to feel alert. Drinking coffee becomes associated with waking up. ~~~ Problem phase 1. Cue: You smell a doughnut shop as you walk down the street near your office. 2. Craving: You begin to crave a doughnut. Solution phase 3. Response: You buy a doughnut and eat it. 4. Reward: You satisfy your craving to eat a doughnut. Buying a doughnut becomes associated with walking down the street near your office. ~~~ Problem phase 1. Cue: You hit a stumbling block on a project at work. 2. Craving: You feel stuck and want to relieve your frustration. Solution phase 3. Response: You pull out your phone and check social media. 4. Reward: You satisfy your craving to feel relieved. Checking social media becomes associated with feeling stalled at work. ~~~ Problem phase 1. Cue: You walk into a dark room. 2. Craving: You want to be able to see. Solution phase 3. Response: You flip the light switch. 4. Reward: You satisfy your craving to see. Turning on the light switch becomes associated with being in a dark room. By the time we become adults, we rarely notice the habits that are running our lives. Most of us never give a second thought to the fact that we tie the same shoe first each morning, or unplug the toaster after each use, or always change into comfortable clothes after getting home from work. After decades of mental programming, we automatically slip into these patterns of thinking and acting.

THE FOUR LAWS OF BEHAVIOR CHANGE

In the following chapters, we will see time and again how the four stages of cue, craving, response, and reward influence nearly everything we do each day. But before we do that, we need to transform these four steps into a practical framework that we can use to design good habits and eliminate bad ones. I refer to this framework as the Four Laws of Behavior Change, and it provides a simple set of rules for creating good habits and breaking bad ones. You can think of each law as a lever that influences human behavior. When the levers are in the right positions, creating good habits is effortless. When they are in the wrong positions, it is nearly impossible. How to Create a Good Habit The 1st law (Cue): Make it obvious. The 2nd law (Craving): Make it attractive. The 3rd law (Response): Make it easy. The 4th law (Reward): Make it satisfying. We can invert these laws to learn how to break a bad habit. How to Break a Bad Habit Inversion of the 1st law (Cue): Make it invisible. Inversion of the 2nd law (Craving): Make it unattractive. Inversion of the 3rd law (Response): Make it difficult. Inversion of the 4th law (Reward): Make it unsatisfying. It would be irresponsible for me to claim that these four laws are an exhaustive framework for changing any human behavior, but I think they're close. As you will soon see, the Four Laws of Behavior Change apply to nearly every field, from sports to politics, art to medicine, comedy to management. These laws can be used no matter what challenge you are facing. There is no need for completely different strategies for each habit. Whenever you want to change your behavior, you can simply ask yourself: 1. How can I make it obvious? 2. How can I make it attractive? 3. How can I make it easy? 4. How can I make it satisfying? If you have ever wondered, “Why don't I do what I say I'm going to do? Why don't I lose the weight or stop smoking or save for retirement or start that side business? Why do I say something is important but never seem to make time for it?” The answers to those questions can be found somewhere in these four laws. The key to creating good habits and breaking bad ones is to understand these fundamental laws and how to alter them to your specifications. Every goal is doomed to fail if it goes against the grain of human nature. Your habits are shaped by the systems in your life. In the chapters that follow, we will discuss these laws one by one and show how you can use them to create a system in which good habits emerge naturally and bad habits wither away.

Key Points

# A habit is a behavior that has been repeated enough times to become automatic. # The ultimate purpose of habits is to solve the problems of life with as little energy and effort as possible. # Any habit can be broken down into a feedback loop that involves four steps: cue, craving, response, and reward. # The Four Laws of Behavior Change are a simple set of rules we can use to build better habits. They are: (1) make it obvious, (2) make it attractive, (3) make it easy, and (4) make it satisfying.

Friday, April 12, 2024

When not to use Poisson Distribution for prediction?

Previously we had seen how we were trying to fit Poisson Distribution to the traffic data on our blog. 

Link to the Post

Today, we study why our analysis could be wrong.

By looking at this plot, we suspect three things: 1. The data has outliers. A bunch of points around May 2021. 2. The data has some local trend. Upward trend till Jan 2022 and then downward trend till Mar 2023. 3. Because of the above two points, we feel: there is constant rate (average) for the events occurring within a specific time interval.

Can we use poisson distribution when we suspect there are outliers in the data?

The Poisson distribution has its limitations when dealing with outliers. Here's why:

  • Underlying Assumption: The Poisson distribution assumes a constant rate of events across the observed interval. Outliers, by definition, deviate significantly from this expected pattern.

  • Impact on Results: If outliers are present, the Poisson model might not accurately reflect the actual probability of observing certain counts. This can lead to misleading interpretations of the data.

However, there are still ways to approach this situation:

  • Outlier Detection: You can use statistical methods to identify potential outliers in your data. Techniques like Grubbs' test or examining standardized residuals can help [3].

  • Alternative Models: Depending on the nature of the outliers, alternative models like the negative binomial distribution might be a better fit. This model accounts for overdispersion, which is a higher variance than expected by the Poisson distribution, often caused by outliers [1].

  • Data Transformation: In some cases, transforming the data (like using the Anscombe transform) before applying the Poisson model can help reduce the influence of outliers [3].

Ultimately, the decision of using the Poisson distribution with outliers depends on the severity of the outliers and the purpose of your analysis.

Here are some additional resources you might find helpful:

And can we use poisson distribution when we suspect there is a trend in the data?

No, the standard Poisson distribution is not ideal for data with a suspected trend. Here's why:

  • Constant Rate Assumption: The Poisson distribution assumes a constant rate (average) for the events occurring within a specific time interval. A trend implies a non-constant rate, where the frequency of events increases or decreases over time.

  • Misleading Results: Using the Poisson model with a trend can lead to inaccurate estimates of event probabilities and unreliable hypothesis tests.

However, there are ways to analyze count data with trends:

  • Poisson Regression: This statistical technique incorporates a trend term into the Poisson model. You can use a linear trend or even more complex functions to capture the non-constant rate [2].

  • Segmented Poisson Regression: This approach divides the data into segments with potentially different rates. It's useful when the trend has breaks or shifts at specific points [2].

  • Change-Point Detection: Methods exist to statistically identify the time points where the event rate changes, allowing for further analysis of segments with constant rates [1].

In conclusion, the standard Poisson distribution is not suitable for data with trends. Consider using Poisson regression or other techniques that account for the non-constant event rate.

Here are some additional resources you might find helpful:

Tags: Mathematical Foundations for Data Science,Technology,

Sunday, April 7, 2024

What is zero shot? Single shot? Few Shots in terms of LLMs?

In the context of Language Model (LM) training and fine-tuning, terms like "zero-shot," "single-shot," and "few-shot" refer to different approaches for adapting a pre-trained LM to a specific task without extensive task-specific training data. These terms are commonly associated with transfer learning, where a model pre-trained on a large corpus of text is fine-tuned on a smaller dataset for a specific downstream task.

  1. Zero-shot Learning: In zero-shot learning, the model is directly applied to a task without any task-specific training examples. Instead, the model leverages its pre-trained knowledge to make predictions on the task. For example, if a pre-trained LM has been trained on a diverse range of text data, you could directly use it for tasks like text classification or text generation without fine-tuning on task-specific data.

  2. Single-shot Learning: Single-shot learning involves providing only a single example or a very small amount of labeled data for the task. The model then learns from this limited supervision to make predictions on similar examples. This approach is useful when labeled data is scarce or expensive to obtain. The model may adapt its parameters slightly to accommodate the provided example during training.

  3. Few-shot Learning: Few-shot learning is similar to single-shot learning but involves providing a few examples (more than one) for the task. These examples are typically used to fine-tune the pre-trained model on the specific task. Few-shot learning enables the model to generalize better compared to single-shot learning as it has more training instances to learn from. Techniques like meta-learning or transfer learning from related tasks can enhance few-shot learning performance.

In summary, zero-shot learning does not involve any task-specific training examples, while single-shot learning involves a single example or very limited labeled data, and few-shot learning involves a small number of examples for the task. These approaches allow pre-trained LMs to adapt to various downstream tasks efficiently, leveraging their learned representations and capabilities.

Saturday, April 6, 2024

How Your Habits Shape Your Identity (and Vice Versa) - Chapter 2 From The Book Atomic Habits

WHY IS IT so easy to repeat bad habits and so hard to form good ones? Few things can have a more powerful impact on your life than improving your daily habits. And yet it is likely that this time next year you'll be doing the same thing rather than something better.
It often feels difficult to keep good habits going for more than a few days, even with sincere effort and the occasional burst of motivation.
Habits like exercise, meditation, journaling, and cooking are reasonable for a day or two and then become a hassle.
However, once your habits are established, they seem to stick around forever—especially the unwanted ones. Despite our best intentions, unhealthy habits like eating junk food, watching too much television, procrastinating, and smoking can feel impossible to break.
Changing our habits is challenging for two reasons: 
(1) we try to change the wrong thing and 
(2) we try to change our habits in the wrong way. 

In this chapter, I'll address the first point. In the chapters that follow, I'll answer the second.
Our first mistake is that we try to change the wrong thing. To understand what I mean, consider that there are three levels at which change can occur. You can imagine them like the layers of an onion.

THREE LAYERS OF BEHAVIOR CHANGE

FIGURE 3: There are three layers of behavior change: a change in your outcomes, a change in your processes, or a change in your identity. The first layer is changing your outcomes. This level is concerned with changing your results: losing weight, publishing a book, winning a championship. Most of the goals you set are associated with this level of change. The second layer is changing your process. This level is concerned with changing your habits and systems: implementing a new routine at the gym, decluttering your desk for better workflow, developing a meditation practice. Most of the habits you build are associated with this level. The third and deepest layer is changing your identity. This level is concerned with changing your beliefs: your worldview, your self-image, your judgments about yourself and others. Most of the beliefs, assumptions, and biases you hold are associated with this level. Outcomes are about what you get. Processes are about what you do. Identity is about what you believe. When it comes to building habits that last—when it comes to building a system of 1 percent improvements—the problem is not that one level is “better” or “worse” than another. All levels of change are useful in their own way. The problem is the direction of change. Many people begin the process of changing their habits by focusing on what they want to achieve. This leads us to outcome-based habits. The alternative is to build identity-based habits. With this approach, we start by focusing on who we wish to become.
FIGURE 4: With outcome-based habits, the focus is on what you want to achieve. With identity-based habits, the focus is on who you wish to become. Imagine two people resisting a cigarette. When offered a smoke, the first person says, “No thanks. I'm trying to quit.” It sounds like a reasonable response, but this person still believes they are a smoker who is trying to be something else. They are hoping their behavior will change while carrying around the same beliefs. The second person declines by saying, “No thanks. I'm not a smoker.” It's a small difference, but this statement signals a shift in identity. Smoking was part of their former life, not their current one. They no longer identify as someone who smokes. Most people don't even consider identity change when they set out to improve. They just think, “I want to be skinny (outcome) and if I stick to this diet, then I'll be skinny (process).” They set goals and determine the actions they should take to achieve those goals without considering the beliefs that drive their actions. They never shift the way they look at themselves, and they don't realize that their old identity can sabotage their new plans for change. Behind every system of actions are a system of beliefs. The system of a democracy is founded on beliefs like freedom, majority rule, and social equality. The system of a dictatorship has a very different set of beliefs like absolute authority and strict obedience. You can imagine many ways to try to get more people to vote in a democracy, but such behavior change would never get off the ground in a dictatorship. That's not the identity of the system. Voting is a behavior that is impossible under a certain set of beliefs. A similar pattern exists whether we are discussing individuals, organizations, or societies. There are a set of beliefs and assumptions that shape the system, an identity behind the habits. Behavior that is incongruent with the self will not last. You may want more money, but if your identity is someone who consumes rather than creates, then you'll continue to be pulled toward spending rather than earning. You may want better health, but if you continue to prioritize comfort over accomplishment, you'll be drawn to relaxing rather than training. It's hard to change your habits if you never change the underlying beliefs that led to your past behavior. You have a new goal and a new plan, but you haven't changed who you are. ~~~ True behavior change is identity change. You might start a habit because of motivation, but the only reason you'll stick with one is that it becomes part of your identity. Anyone can convince themselves to visit the gym or eat healthy once or twice, but if you don't shift the belief behind the behavior, then it is hard to stick with long-term changes. Improvements are only temporary until they become part of who you are. The goal is not to read a book, the goal is to become a reader. The goal is not to run a marathon, the goal is to become a runner. The goal is not to learn an instrument, the goal is to become a musician.

THE TWO-STEP PROCESS TO CHANGING YOUR IDENTITY

Your identity emerges out of your habits. You are not born with preset beliefs. Every belief, including those about yourself, is learned and conditioned through experience.* More precisely, your habits are how you embody your identity. When you make your bed each day, you embody the identity of an organized person. When you write each day, you embody the identity of a creative person. When you train each day, you embody the identity of an athletic person. The more you repeat a behavior, the more you reinforce the identity associated with that behavior. In fact, the word identity was originally derived from the Latin words essentitas, which means being, and identidem, which means repeatedly. Your identity is literally your “repeated beingness.” Whatever your identity is right now, you only believe it because you have proof of it. If you go to church every Sunday for twenty years, you have evidence that you are religious. If you study biology for one hour every night, you have evidence that you are studious. If you go to the gym even when it's snowing, you have evidence that you are committed to fitness. The more evidence you have for a belief, the more strongly you will believe it. For most of my early life, I didn't consider myself a writer. If you were to ask any of my high school teachers or college professors, they would tell you I was an average writer at best: certainly not a standout. When I began my writing career, I published a new article every Monday and Thursday for the first few years. As the evidence grew, so did my identity as a writer. I didn't start out as a writer. I became one through my habits.Of course, your habits are not the only actions that influence your identity, but by virtue of their frequency they are usually the most important ones. Each experience in life modifies your self-image, but it's unlikely you would consider yourself a soccer player because you kicked a ball once or an artist because you scribbled a picture. As you repeat these actions, however, the evidence accumulates and your self- image begins to change. The effect of one-off experiences tends to fade away while the effect of habits gets reinforced with time, which means your habits contribute most of the evidence that shapes your identity. In this way, the process of building habits is actually the process of becoming yourself. This is a gradual evolution. We do not change by snapping our fingers and deciding to be someone entirely new. We change bit by bit, day by day, habit by habit. We are continually undergoing microevolutions of the self. Each habit is like a suggestion: “Hey, maybe this is who I am.” If you finish a book, then perhaps you are the type of person who likes reading. If you go to the gym, then perhaps you are the type of person who likes exercise. If you practice playing the guitar, perhaps you are the type of person who likes music. Every action you take is a vote for the type of person you wish to become. No single instance will transform your beliefs, but as the votes build up, so does the evidence of your new identity. This is one reason why meaningful change does not require radical change. Small habits can make a meaningful difference by providing evidence of a new identity. And if a change is meaningful, it actually is big. That's the paradox of making small improvements. Putting this all together, you can see that habits are the path to changing your identity. The most practical way to change who you are is to change what you do. Each time you write a page, you are a writer. Each time you practice the violin, you are a musician. Each time you start a workout, you are an athlete. Each time you encourage your employees, you are a leader. Each habit not only gets results but also teaches you something far more important: to trust yourself. You start to believe you can actually accomplish these things. When the votes mount up and the evidence begins to change, the story you tell yourself begins to change as well. Of course, it works the opposite way, too. Every time you choose to perform a bad habit, it's a vote for that identity. The good news is that you don't need to be perfect. In any election, there are going to be votes for both sides. You don't need a unanimous vote to win an election; you just need a majority. It doesn't matter if you cast a few votes for a bad behavior or an unproductive habit. Your goal is simply to win the majority of the time. New identities require new evidence. If you keep casting the same votes you've always cast, you're going to get the same results you've always had. If nothing changes, nothing is going to change. It is a simple two-step process: 1. Decide the type of person you want to be. 2. Prove it to yourself with small wins. First, decide who you want to be. This holds at any level—as an individual, as a team, as a community, as a nation. What do you want to stand for? What are your principles and values? Who do you wish to become? These are big questions, and many people aren't sure where to begin —but they do know what kind of results they want: to get six-pack abs or to feel less anxious or to double their salary. That's fine. Start there and work backward from the results you want to the type of person who could get those results. Ask yourself, “Who is the type of person that could get the outcome I want?” Who is the type of person that could lose forty pounds? Who is the type of person that could learn a new language? Who is the type of person that could run a successful start-up? For example, “Who is the type of person who could write a book?” It's probably someone who is consistent and reliable. Now your focus shifts from writing a book (outcome-based) to being the type of person who is consistent and reliable (identity-based). This process can lead to beliefs like: “I'm the kind of teacher who stands up for her students.” “I'm the kind of doctor who gives each patient the time and empathy they need.” “I'm the kind of manager who advocates for her employees.” Once you have a handle on the type of person you want to be, you can begin taking small steps to reinforce your desired identity. I have a friend who lost over 100 pounds by asking herself, “What would a healthy person do?” All day long, she would use this question as a guide. Would a healthy person walk or take a cab? Would a healthy person order a burrito or a salad? She figured if she acted like a healthy person long enough, eventually she would become that person. She was right. The concept of identity-based habits is our first introduction to another key theme in this book: feedback loops. Your habits shape your identity, and your identity shapes your habits. It's a two-way street. The formation of all habits is a feedback loop (a concept we will explore in depth in the next chapter), but it's important to let your values, principles, and identity drive the loop rather than your results. The focus should always be on becoming that type of person, not getting a particular outcome.

THE REAL REASON HABITS MATTER

Identity change is the North Star of habit change. The remainder of this book will provide you with step-by-step instructions on how to build better habits in yourself, your family, your team, your company, and anywhere else you wish. But the true question is: “Are you becoming the type of person you want to become?” The first step is not what or how, but who. You need to know who you want to be. Otherwise, your quest for change is like a boat without a rudder. And that's why we are starting here. You have the power to change your beliefs about yourself. Your identity is not set in stone. You have a choice in every moment. You can choose the identity you want to reinforce today with the habits you choose today. And this brings us to the deeper purpose of this book and the real reason habits matter. Building better habits isn't about littering your day with life hacks. It's not about flossing one tooth each night or taking a cold shower each morning or wearing the same outfit each day. It's not about achieving external measures of success like earning more money, losing weight, or reducing stress. Habits can help you achieve all of these things, but fundamentally they are not about having something. They are about becoming someone. Ultimately, your habits matter because they help you become the type of person you wish to be. They are the channel through which you develop your deepest beliefs about yourself. Quite literally, you become your habits.

Chapter Summary

# There are three levels of change: outcome change, process change, and identity change. # The most effective way to change your habits is to focus not on what you want to achieve, but on who you wish to become. # Your identity emerges out of your habits. Every action is a vote for the type of person you wish to become. # Becoming the best version of yourself requires you to continuously edit your beliefs, and to upgrade and expand your identity. # The real reason habits matter is not because they can get you better results (although they can do that), but because they can change your beliefs about yourself.
Tags: Behavioral Science,Book Summary,

Friday, April 5, 2024

Building Zero Shot Classifiers For Text Using Large Language Models

The software tools we need for the activity covered in this post are:

  1. Google Colab
  2. GitHub
  3. And last: ChatGPT

Why I needed these three items?

I needed Google Colab to write code. Google Colab allowed to me not care about creating a local environment and setting it up with the required packages such as 'transformers' from Hugging Face.

I needed GitHub to put my code in a place that I can make available to myself anywhere and also to you (my readers).

I needed ChatGPT to get boilerplate code for our particular task. I learnt about the prompts I needed for this activity from the book by Sinan Ozdemir titled: 

Quick Start Guide to Large Language Models. Strategies and Best Practices for using ChatGPT and Other LLMs - Addison-Wesley Professional (2023)

Download Book

What you would need would (I think) is:

Google Colab that is connected for code to my public repository hosted on GitHub.

 
How to connect Google Colab with GitHub?

1. Once you would open the Google Colab, you get a screen as shown below.

Note that: I am logged in to my Google Account for this task.


2. Next: you would click on "File" at the top left.

You would click on "Open Notebook".

And you would select "GitHub" as shown below:


3. You would fill in the username. It is: ashishjain1547

Once you fill in the username, the "Repository" dropdown would auto populate with public repositories available for that user.

The repository you would select is: "generative_ai_workspace_2024_04_05"

4. Once repo is selected, it's notebooks start appearing below:

Code for zero shot Spam vs. Not Spam classifier using Facebook's BART


from transformers import pipeline
def classify_text (email):
    """
    Use Facebook's BART model to classify an email into "spam" or "not spam"
    Args:
    email (str): The email to classify
    Returns:
    str: The classification of the email
    """

    classifier = pipeline('zero-shot-classification', model='facebook/bart-large-mnli')
    labels = ['spam', 'not spam']
    hypothesis_template = 'This email is {}.'
    results = classifier(email, labels, hypothesis_template=hypothesis_template)
    return results['labels'][0]

 

Usage:


 

How we used ChatGPT?

Provide code to do zero shot sentiment analysis using Google BERT.
 
Answer:

To perform zero-shot sentiment analysis using Google's BERT model, you can utilize the Hugging Face Transformers library along with the pre-trained BERT model. Here's how you can do it in Python:
 
from transformers import pipeline

# Load the zero-shot classification pipeline with BERT
classifier = pipeline("zero-shot-classification", model="bert-base-uncased")

# Define a list of texts for sentiment analysis
texts = [
    "I love this product, it's amazing!",
    "This movie was terrible, I hated it.",
    "The weather today is fantastic.",
    "I feel neutral about this situation."
]

# Define the possible labels/categories for sentiment
candidate_labels = ["positive", "negative", "neutral"]

# Perform zero-shot sentiment analysis
results = classifier(texts, candidate_labels)

# Print the results
for i, text in enumerate(texts):
    print(f"Text: {text}")
    print("Sentiment probabilities:")
    for label, score in zip(results[i]['labels'], results[i]['scores']):
        print(f"{label}: {score:.4f}")
    print()
 
 
Output

Text: I love this product, it's amazing!
Sentiment probabilities:
positive: 0.4718
negative: 0.2679
neutral: 0.2603

Text: This movie was terrible, I hated it.
Sentiment probabilities:
negative: 0.3644
positive: 0.3179
neutral: 0.3177

Text: The weather today is fantastic.
Sentiment probabilities:
positive: 0.4026
negative: 0.3039
neutral: 0.2935

Text: I feel neutral about this situation.
Sentiment probabilities:
neutral: 0.3587
negative: 0.3248
positive: 0.3166


Additional Note

Question:

How does the BERT know that the problem is of sentiment analysis and not spam-vs-not_spam?

Answer: 

In the code provided, the BERT model doesn't inherently "know" that the problem is specifically about sentiment analysis. Instead, it is designed for zero-shot classification, which means it can classify texts into predefined categories without being explicitly trained on those categories.

The key lies in the input data and the labels provided. In the example, we've defined a list of texts that contain statements or reviews, and we've also provided a list of candidate labels (positive, negative, neutral) that represent sentiment categories. When you pass these texts and labels to the BERT model through the zero-shot classification pipeline, it infers the sentiment of each text based on its understanding of the language and the provided labels.

If you were to perform a different classification task, such as spam vs. not spam, you would simply replace the list of candidate labels with the appropriate labels for that task (e.g., spam, not_spam). The model doesn't need to be explicitly told the nature of the classification problem because it learns to make predictions based on the provided labels during training.

 

Thursday, April 4, 2024

EquiLeader (A problem in the concept of Leader in DSA using Boyer-Moore Majority Vote Algo)

EquiLeader: Find the index S such that the leaders of the sequences A[0], A[1], ..., A[S] and A[S + 1], A[S + 2], ..., A[N - 1] are the same.

A non-empty array A consisting of N integers is given. The leader of this array is the value that occurs in more than half of the elements of A. An equi leader is an index S such that 0 ≤ S < N - 1 and two sequences A[0], A[1], ..., A[S] and A[S + 1], A[S + 2], ..., A[N - 1] have leaders of the same value. For example, given array A such that: A[0] = 4 A[1] = 3 A[2] = 4 A[3] = 4 A[4] = 4 A[5] = 2 we can find two equi leaders: 0, because sequences: (4) and (3, 4, 4, 4, 2) have the same leader, whose value is 4. 2, because sequences: (4, 3, 4) and (4, 4, 2) have the same leader, whose value is 4. The goal is to count the number of equi leaders. Write a function: def solution(A) that, given a non-empty array A consisting of N integers, returns the number of equi leaders. For example, given: A[0] = 4 A[1] = 3 A[2] = 4 A[3] = 4 A[4] = 4 A[5] = 2 the function should return 2, as explained above. Write an efficient algorithm for the following assumptions: N is an integer within the range [1..100,000]; each element of array A is an integer within the range [-1,000,000,000..1,000,000,000].

Taking help from ChatGPT

Our prompt looks like this: You are a programming guide. Here is my problem... ...Please guide how to solve this problem step by step.

To solve this problem efficiently, we can use a two-step approach:

  1. Find the dominator (leader) of the array.
  2. Count the number of equi leaders.

Here's a step-by-step guide on how to approach this problem:

Step 1: Find the Dominator

  1. Initialize two variables, candidate and count, where candidate will store the current candidate for the dominator and count will store the count of occurrences of the current candidate.

  2. Iterate through the array A:

    • If count is 0, set the current element as the candidate and set count to 1.
    • If the current element is equal to the candidate, increment count by 1.
    • If the current element is not equal to the candidate, decrement count by 1.
  3. After iterating through the array, candidate will hold a potential dominator candidate. We need to verify if it's indeed the dominator by counting its occurrences in the array.

  4. Count occurrences of candidate.

    • If the count of occurrences of candidate is not more than half the length of the array, there is no dominator.

Step 2: Count the Equi Leaders

  1. Initialize leader_count to count the total occurrences of the dominator.
  2. Initialize equi_leader_count to count the number of equi leaders.
  3. Iterate through the array again:
    • If the current element equals the dominator, increment leader_count.
    • Calculate the remaining count of the dominator from the end of the array.
    • If the count of the dominator up to the current index is more than half of the elements seen so far, and the count of the dominator from the current index to the end is more than half of the remaining elements, increment equi_leader_count.
def solution(A): candidate = None count = 0 # Find candidate for dominator for num in A: if count == 0: candidate = num count = 1 elif num == candidate: count += 1 else: count -= 1 # Count occurrences of candidate count_candidate = A.count(candidate) # If candidate is not a dominator, return 0 if count_candidate <= len(A) // 2: return 0 leader_count = 0 equi_leader_count = 0 for i, num in enumerate(A): if num == candidate: leader_count += 1 # Check if the leader appears more than half of the elements up to the current index if leader_count > (i + 1) // 2: # Check if the leader appears more than half of the remaining elements remaining_count = count_candidate - leader_count if remaining_count > (len(A) - i - 1) // 2: equi_leader_count += 1 return equi_leader_count # Example usage A = [4, 3, 4, 4, 4, 2] print(solution(A)) # Output should be 2

What is the core idea behind this algo?

The core idea behind this algorithm is to first find the dominator (leader) of the array efficiently, and then utilize the properties of the dominator to count the number of equi leaders.

  1. Finding the Dominator:

    • The algorithm efficiently finds the dominator using the Boyer-Moore Majority Vote algorithm. This algorithm allows us to find a majority element (dominator) in linear time complexity with constant space complexity.
    • By iterating through the array, the algorithm identifies a potential dominator candidate and verifies whether it's indeed the dominator by counting its occurrences in the array. If the count of occurrences is more than half of the length of the array, it confirms the presence of a dominator.
  2. Counting the Equi Leaders:

    • Once the dominator is found, the algorithm iterates through the array again to count both the total occurrences of the dominator and the number of equi leaders.
    • It keeps track of the leader count as it iterates through the array. For each index, it checks if the count of the dominator up to that index is more than half of the elements seen so far. If it is, it also checks if the count of the dominator from the current index to the end is more than half of the remaining elements. If both conditions are met, it increments the equi leader count.
    • This approach leverages the fact that an equi leader can only occur if the dominator occurs in both subsequences with counts greater than half of the respective lengths. Therefore, by efficiently tracking these counts, the algorithm can accurately count the equi leaders.

By following this two-step approach, the algorithm effectively solves the problem of finding the number of equi leaders in an array, utilizing the properties of the dominator to optimize the solution.

Is it correct the equi-leaders and dominator have the same face value?

Yes, you are correct. In the context of this problem, the equi-leaders and the dominator have the same face value. In the problem statement, an equi-leader is defined as an index S such that two sequences separated at index S have leaders of the same value. The leader of a sequence is defined as the value that occurs in more than half of the elements of that sequence. Since the equi-leader is a subset of the original array, and it's defined based on having the same leader value on both sides of the split, it follows that the dominator of the equi-leader subsequence is the same as the dominator of the original array. Therefore, in the context of this problem, the equi-leaders and the dominator indeed have the same face value.
Tags: Algorithms,Technology,

Dominator (a problem on the concept of Leader in Algorithms using Boyer-Moore Majority Vote Algo)

Dominator: Find an index of an array such that its value occurs at more than half of indices in the array.

An array A consisting of N integers is given. The dominator of array A is the value that occurs in more than half of the elements of A. For example, consider array A such that A[0] = 3 A[1] = 4 A[2] = 3 A[3] = 2 A[4] = 3 A[5] = -1 A[6] = 3 A[7] = 3 The dominator of A is 3 because it occurs in 5 out of 8 elements of A (namely in those with indices 0, 2, 4, 6 and 7) and 5 is more than a half of 8. Write a function def solution(A) that, given an array A consisting of N integers, returns index of any element of array A in which the dominator of A occurs. The function should return -1 if array A does not have a dominator. For example, given array A such that A[0] = 3 A[1] = 4 A[2] = 3 A[3] = 2 A[4] = 3 A[5] = -1 A[6] = 3 A[7] = 3 the function may return 0, 2, 4, 6 or 7, as explained above. Write an efficient algorithm for the following assumptions: N is an integer within the range [0..100,000]; each element of array A is an integer within the range [−2,147,483,648..2,147,483,647].

Taking help from ChatGPT...

Our prompt looks like this: You are a programming guide. An array A consisting of N integers is given... ... within the range [-2,147,483,648..2,147,483,647]. Please guide how to solve this problem step by step.

Algo

To solve this problem, we can use the Boyer-Moore Majority Vote algorithm. The algorithm allows us to find a majority element (dominator in this case) in linear time complexity with constant space complexity. Here's how we can solve it step by step: 1. Initialize two variables, candidate and count, where candidate will store the current candidate for the dominator and count will store the count of occurrences of the current candidate. 2. Iterate through the array A: If count is 0, set the current element as the candidate and set count to 1. If the current element is equal to the candidate, increment count by 1. If the current element is not equal to the candidate, decrement count by 1. 3. After iterating through the array, candidate will hold a potential dominator candidate. We need to verify if it's indeed the dominator by counting its occurrences in the array. 4. Iterate through the array again to count the occurrences of the candidate. If the count of occurrences of candidate is greater than half the length of the array, return the index of any occurrence of candidate. If no dominator is found, return -1. Let's implement this in code:

Code

def solution(A): candidate = None count = 0 # Find candidate for dominator for num in A: if count == 0: candidate = num count = 1 elif num == candidate: count += 1 else: count -= 1 # Count occurrences of candidate count_candidate = A.count(candidate) # Check if candidate is the dominator if count_candidate > len(A) // 2: return A.index(candidate) else: return -1 # Example usage A = [3, 4, 3, 2, 3, -1, 3, 3] print(solution(A)) # Output should be 0, 2, 4, 6, or 7
Tags: Algorithms,Technology,

Manhattan stonewall problem on the concept of Stacks

StoneWall: Cover "Manhattan skyline" using the minimum number of rectangles.

You are going to build a stone wall. The wall should be straight and N meters long, and its thickness should be constant; however, it should have different heights in different places. The height of the wall is specified by an array H of N positive integers. H[I] is the height of the wall from I to I+1 meters to the right of its left end. In particular, H[0] is the height of the wall's left end and H[N−1] is the height of the wall's right end. The wall should be built of cuboid stone blocks (that is, all sides of such blocks are rectangular). Your task is to compute the minimum number of blocks needed to build the wall. Write a function: def solution(H) that, given an array H of N positive integers specifying the height of the wall, returns the minimum number of blocks needed to build it. For example, given array H containing N = 9 integers: H[0] = 8 H[1] = 8 H[2] = 5 H[3] = 7 H[4] = 9 H[5] = 8 H[6] = 7 H[7] = 4 H[8] = 8 the function should return 7. The figure shows one possible arrangement of seven blocks.
Write an efficient algorithm for the following assumptions: N is an integer within the range [1..100,000]; each element of array H is an integer within the range [1..1,000,000,000].

Taking help from ChatGPT...

Our prompt looks like: You are a programming guide. You are going to build a stone wall... ... the function should return 7. Please guide how to solve this problem step by step.

Algo

To solve this problem, we can use a stack-based approach. The idea is to traverse through the array of heights and maintain a stack that keeps track of the heights encountered so far. We will also keep track of the number of blocks used. Here's how we can solve it step by step: 1. Initialize an empty stack and a variable to count the number of blocks used. 2. Iterate through the array of heights. 3. For each height, check whether it is greater than the top of the stack. If it is, push the height onto the stack. 4. If the height is less than the top of the stack, pop heights from the stack until the top of the stack is less than or equal to the current height. 5. For each height popped from the stack, increment the block count by 1, as each height represents a block. 6. If after popping from the stack, the stack is empty or the current height is greater than the top of the stack, push the current height onto the stack. 7. After iterating through all heights, return the block count. Let's implement this in code:

Code

def solution(H): stack = [] # Stack to keep track of heights blocks_used = 0 # Counter for blocks used for height in H: # Pop heights from stack until top is less than or equal to current height while stack and stack[-1] > height: stack.pop() blocks_used += 1 # If stack is empty or current height is greater than top of stack, push height if not stack or height > stack[-1]: stack.append(height) # Add remaining heights in stack to block count blocks_used += len(stack) return blocks_used # Example usage H = [8, 8, 5, 7, 9, 8, 7, 4, 8] print(solution(H)) # Output should be 7 This algorithm has a time complexity of O(N), where N is the number of elements in the array H, as it iterates through the array once.
Tags: Algorithms,Technology,

Friday, March 29, 2024

The Surprising Power of Atomic Habits (Ch 1)

WHY SMALL HABITS MAKE A BIG DIFFERENCE

Here’s how the math works out: if you can get 1 percent better each day for one year, you’ll end up thirty-seven times better by the time you’re done. Conversely, if you get 1 percent worse each day for one year, you’ll decline nearly down to zero. 1% BETTER EVERY DAY 1% worse every day for one year. (0.99)^365 = 00.03 1% better every day for one year. (1.01)^365 = 37.78 ~~~
FIGURE 1: The effects of small habits compound over time. For example, if you can get just 1 percent better each day, you’ll end up with results that are nearly 37 times better after one year. ~~~ The impact created by a change in your habits is similar to the effect of shifting the route of an airplane by just a few degrees. Imagine you are flying from Los Angeles to New York City. If a pilot leaving from LAX adjusts the heading just 3.5 degrees south, you will land in Washington, D.C., instead of New York. Such a small change is barely noticeable at takeoff—the nose of the airplane moves just a few feet— but when magnified across the entire United States, you end up hundreds of miles apart.

WHAT PROGRESS IS REALLY LIKE

Imagine that you have an ice cube sitting on the table in front of you. The room is cold and you can see your breath. It is currently twenty-five degrees. Ever so slowly, the room begins to heat up. Twenty-six degrees. Twenty-seven. Twenty-eight. The ice cube is still sitting on the table in front of you. Twenty-nine degrees. Thirty. Thirty-one. Still, nothing has happened. Then, thirty-two degrees. The ice begins to melt. A one-degree shift, seemingly no different from the temperature increases before it, has unlocked a huge change. Breakthrough moments are often the result of many previous actions, which build up the potential required to unleash a major change. This pattern shows up everywhere. Cancer spends 80 percent of its life undetectable, then takes over the body in months. Bamboo can barely be seen for the first five years as it builds extensive root systems underground before exploding ninety feet into the air within six weeks. Similarly, habits often appear to make no difference until you cross a critical threshold and unlock a new level of performance. In the early and middle stages of any quest, there is often a Valley of Disappointment. You expect to make progress in a linear fashion and it’s frustrating how ineffective changes can seem during the first days, weeks, and even months. It doesn’t feel like you are going anywhere. It’s a hallmark of any compounding process: the most powerful outcomes are delayed.

THE PLATEAU OF LATENT POTENTIAL

FIGURE 2: We often expect progress to be linear. At the very least, we hope it will come quickly. In reality, the results of our efforts are often delayed. It is not until months or years later that we realize the true value of the previous work we have done. This can result in a “valley of disappointment” where people feel discouraged after putting in weeks or months of hard work without experiencing any results. However, this work was not wasted. It was simply being stored. It is not until much later that the full value of previous efforts is revealed.

FORGET ABOUT GOALS, FOCUS ON SYSTEMS INSTEAD

What’s the difference between systems and goals? It’s a distinction I first learned from Scott Adams, the cartoonist behind the Dilbert comic. Goals are about the results you want to achieve. Systems are about the processes that lead to those results. If you’re a coach, your goal might be to win a championship. Your system is the way you recruit players, manage your assistant coaches, and conduct practice. If you’re an entrepreneur, your goal might be to build a million-dollar business. Your system is how you test product ideas, hire employees, and run marketing campaigns. If you’re a musician, your goal might be to play a new piece. Your system is how often you practice, how you break down and tackle difficult measures, and your method for receiving feedback from your instructor. ~~~ A handful of problems arise when you spend too much time thinking about your goals and not enough time designing your systems. Problem #1: Winners and losers have the same goals. Problem #2: Achieving a goal is only a momentary change. Problem #3: Goals restrict your happiness. ...Because the implicit assumption behind any goal is this: “Once I reach my goal, then I’ll be happy.” Problem #4: Goals are at odds with long-term progress The purpose of setting goals is to win the game. The purpose of building systems is to continue playing the game. True long-term thinking is goal-less thinking. It’s not about any single accomplishment. It is about the cycle of endless refinement and continuous improvement. Ultimately, it is your commitment to the process that will determine your progress. ~~~

A SYSTEM OF ATOMIC HABITS

If you’re having trouble changing your habits, the problem isn’t you. The problem is your system. Bad habits repeat themselves again and again not because you don’t want to change, but because you have the wrong system for change. You do not rise to the level of your goals. You fall to the level of your systems. Focusing on the overall system, rather than a single goal, is one of the core themes of this book. It is also one of the deeper meanings behind the word atomic. By now, you’ve probably realized that an atomic habit refers to a tiny change, a marginal gain, a 1 percent improvement. But atomic habits are not just any old habits, however small. They are little habits that are part of a larger system. Just as atoms are the building blocks of molecules, atomic habits are the building blocks of remarkable results. Habits are like the atoms of our lives. Each one is a fundamental unit that contributes to your overall improvement. At first, these tiny routines seem insignificant, but soon they build on each other and fuel bigger wins that multiply to a degree that far outweighs the cost of their initial investment. They are both small and mighty. This is the meaning of the phrase atomic habits—a regular practice or routine that is not only small and easy to do, but also the source of incredible power; a component of the system of compound growth.

KEY POINTS... AGAIN

#1 Habits are the compound interest of self-improvement. Getting 1 percent better every day counts for a lot in the long-run. #2 Habits are a double-edged sword. They can work for you or against you, which is why understanding the details is essential. #3 Small changes often appear to make no difference until you cross a critical threshold. The most powerful outcomes of any compounding process are delayed. You need to be patient. #4 An atomic habit is a little habit that is part of a larger system. Just as atoms are the building blocks of molecules, atomic habits are the building blocks of remarkable results. #5 If you want better results, then forget about setting goals. Focus on your system instead. #6 You do not rise to the level of your goals. You fall to the level of your systems.
Tags: Book Summary,Behavioral Science,

Tuesday, March 19, 2024

Show All Interview Questions

User Registration

First time users, please register...

User Login

If you already have an account, please login...

Create a plain HTML form and Link it to a simple Flask API to display the contents of the form

HTML: index page

<div class="container"> <h2>User Registration</h2> <form action="http://127.0.0.1:5000/submit" method="post"> <div class="form-group"> <label for="username">Username:</label> <input type="text" id="username" name="username" required> </div> <div class="form-group"> <label for="password">Password:</label> <input type="password" id="password" name="password" required> </div> <div class="form-group"> <label for="confirm_password">Confirm Password:</label> <input type="password" id="confirm_password" name="confirm_password" required> </div> <button type="submit" class="btn" data-bind="click: customRegister">Register</button> </form> </div>

Python Code For Flask API

from flask import Flask, redirect, render_template, request, url_for import json, requests from flask_cors import CORS, cross_origin from flask import Response, jsonify app=Flask(__name__) @app.route('/') def welcome(): return render_template("index.html") @app.route("/submit", methods=["POST"]) def submit(): print(request.data) username = "" password = "" retypepassword = "" if request.method=="POST": username = request.form["username"] password = request.form["password"] retypepassword = request.form["confirm_password"] print(username, password, retypepassword) # return render_template("result.html", username=username, password=password, retypepassword=retypepassword) return "Hi, " + username if __name__=='__main__': app.run(debug=True)

Monday, March 18, 2024

Books on Large Language Models (Mar 2024)

Download Books
1.
Quick Start Guide to Large Language Models: Strategies and Best Practices for Using ChatGPT and Other LLMs
Sinan Ozdemir, 2023

2.
GPT-3: Building Innovative NLP Products Using Large Language Models
Sandra Kublik, 2022

3.
Understanding Large Language Models: Learning Their Underlying Concepts and Technologies
Thimira Amaratunga, 2023

4.
Introduction to Large Language Models for Business Leaders: Responsible AI Strategy Beyond Fear and Hype
I. Almeida, 2023

5.
Pretrain Vision and Large Language Models in Python: End-to-end Techniques for Building and Deploying Foundation Models on AWS
Emily Webber, 2023

6.
Modern Generative AI with ChatGPT and OpenAI Models: Leverage the Capabilities of OpenAI's LLM for Productivity and Innovation with GPT3 and GPT4
Valentina Alto, 2023

7.
Generative AI with LangChain: Build Large Language Model (LLM) Apps with Python, ChatGPT, and Other LLMs
Ben Auffarth, 2023

8.
Natural Language Processing with Transformers
Lewis Tunstall, 2022

9.
Generative AI on AWS
Chris Fregly, 2023

10.
Decoding GPT: An Intuitive Understanding of Large Language Models Generative AI Machine Learning and Neural Networks
Devesh Rajadhyax, 2024

11.
Retrieval-Augmented Generation (RAG): Empowering Large Language Models (LLMs)
Ray Islam (Mohammad Rubyet Islam), 2023

12.
Learn Python Generative AI: Journey from Autoencoders to Transformers to Large Language Models (English Edition)
Indrajit Kar, 2024

13.
Natural Language Understanding with Python: Combine Natural Language Technology, Deep Learning, and Large Language Models to Create Human-like Language Comprehension in Computer Systems
Deborah A. Dahl, 2023

14.
Developing Apps with GPT-4 and ChatGPT
Olivier Caelen, 2023

15.
Generative Deep Learning
David Foster, 2022

16.
Foundation Models for Natural Language Processing: Pre-trained Language Models Integrating Media
Gerhard Paass, 2023

17.
What is ChatGPT Doing ... and why Does it Work?
Stephen Wolfram, 2023

18.
Artificial Intelligence and Large Language Models: An Introduction to the Technological Future
Al-Sakib Khan Pathan, 2024

19.
Large Language Model-Based Solutions: How to Deliver Value with Cost-Effective Generative AI Applications
Shreyas Subramanian, 2024

20.
Introduction to Transformers for NLP: With the Hugging Face Library and Models to Solve Problems
Shashank Mohan Jain, 2022

21.
Generative AI for Leaders
Amir Husain, 2023

22.
Machine Learning Engineering with Python: Manage the Production Life Cycle of Machine Learning Models Using MLOps with Practical Examples
Andrew P. McMahon, 2021

23.
Artificial Intelligence Fundamentals for Business Leaders: Up to Date With Generative AI
I. Almeida, 2023

24.
Transformers For Natural Language Processing: Build, Train, and Fine-tune Deep Neural Network Architectures for NLP with Python, Hugging Face, and OpenAI's GPT-3, ChatGPT, and GPT-4
Denis Rothman, 2022

Saturday, March 9, 2024

What is an RDD in PySpark?

RDD, which stands for Resilient Distributed Dataset, is a fundamental data structure in Apache Spark, a distributed computing framework for big data processing. RDDs are immutable, partitioned collections of objects that can be processed in parallel across a cluster of machines. The term "resilient" in RDD refers to the fault-tolerance feature, meaning that RDDs can recover lost data due to node failures.

Here are some key characteristics and properties of RDDs in PySpark:

# Immutable: Once created, RDDs cannot be modified. However, you can transform them into new RDDs by applying various operations.

# Distributed: RDDs are distributed across multiple nodes in a cluster, allowing for parallel processing.

# Partitioned: RDDs are divided into partitions, which are the basic units of parallelism. Each partition can be processed independently on different nodes.

# Lazy Evaluation: Transformations on RDDs are lazily evaluated, meaning that the execution is deferred until an action is triggered. This helps optimize the execution plan and avoid unnecessary computations.

# Fault-Tolerant: RDDs track the lineage information to recover lost data in case of node failures. This is achieved through the ability to recompute lost partitions based on the transformations applied to the original data.

In PySpark, you can create RDDs from existing data in memory or by loading data from external sources such as HDFS, HBase, or other storage systems. Once created, you can perform various transformations (e.g., map, filter, reduce) and actions (e.g., count, collect, save) on RDDs.

However, it's worth noting that while RDDs were the primary abstraction in earlier versions of Spark, newer versions have introduced higher-level abstractions like DataFrames and Datasets, which provide a more structured and optimized API for data manipulation and analysis. These abstractions are built on top of RDDs and offer better performance and ease of use in many scenarios.