Friday, March 29, 2024

The Surprising Power of Atomic Habits (Ch 1)

WHY SMALL HABITS MAKE A BIG DIFFERENCE

Here’s how the math works out: if you can get 1 percent better each day for one year, you’ll end up thirty-seven times better by the time you’re done. Conversely, if you get 1 percent worse each day for one year, you’ll decline nearly down to zero. One percent worse every day for one year: (0.99)^365 ≈ 0.03. One percent better every day for one year: (1.01)^365 ≈ 37.78.

FIGURE 1: The effects of small habits compound over time. For example, if you can get just 1 percent better each day, you’ll end up with results that are nearly 37 times better after one year.

The impact created by a change in your habits is similar to the effect of shifting the route of an airplane by just a few degrees. Imagine you are flying from Los Angeles to New York City. If a pilot leaving from LAX adjusts the heading just 3.5 degrees south, you will land in Washington, D.C., instead of New York. Such a small change is barely noticeable at takeoff—the nose of the airplane moves just a few feet—but when magnified across the entire United States, you end up hundreds of miles apart.
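To sanity-check the compounding figures above, here is a minimal Python sketch (nothing book-specific is assumed, just the two exponentials):

# 1% better vs. 1% worse, compounded daily for a year
better = 1.01 ** 365   # ~37.78
worse = 0.99 ** 365    # ~0.03
print(f"1% better every day for a year: {better:.2f}x")
print(f"1% worse every day for a year:  {worse:.2f}x")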

WHAT PROGRESS IS REALLY LIKE

Imagine that you have an ice cube sitting on the table in front of you. The room is cold and you can see your breath. It is currently twenty-five degrees. Ever so slowly, the room begins to heat up. Twenty-six degrees. Twenty-seven. Twenty-eight. The ice cube is still sitting on the table in front of you. Twenty-nine degrees. Thirty. Thirty-one. Still, nothing has happened. Then, thirty-two degrees. The ice begins to melt. A one-degree shift, seemingly no different from the temperature increases before it, has unlocked a huge change.

Breakthrough moments are often the result of many previous actions, which build up the potential required to unleash a major change. This pattern shows up everywhere. Cancer spends 80 percent of its life undetectable, then takes over the body in months. Bamboo can barely be seen for the first five years as it builds extensive root systems underground before exploding ninety feet into the air within six weeks.

Similarly, habits often appear to make no difference until you cross a critical threshold and unlock a new level of performance. In the early and middle stages of any quest, there is often a Valley of Disappointment. You expect to make progress in a linear fashion, and it’s frustrating how ineffective changes can seem during the first days, weeks, and even months. It doesn’t feel like you are going anywhere. It’s a hallmark of any compounding process: the most powerful outcomes are delayed.

THE PLATEAU OF LATENT POTENTIAL

FIGURE 2: We often expect progress to be linear. At the very least, we hope it will come quickly. In reality, the results of our efforts are often delayed. It is not until months or years later that we realize the true value of the previous work we have done. This can result in a “valley of disappointment” where people feel discouraged after putting in weeks or months of hard work without experiencing any results. However, this work was not wasted. It was simply being stored. It is not until much later that the full value of previous efforts is revealed.

FORGET ABOUT GOALS, FOCUS ON SYSTEMS INSTEAD

What’s the difference between systems and goals? It’s a distinction I first learned from Scott Adams, the cartoonist behind the Dilbert comic. Goals are about the results you want to achieve. Systems are about the processes that lead to those results.

If you’re a coach, your goal might be to win a championship. Your system is the way you recruit players, manage your assistant coaches, and conduct practice. If you’re an entrepreneur, your goal might be to build a million-dollar business. Your system is how you test product ideas, hire employees, and run marketing campaigns. If you’re a musician, your goal might be to play a new piece. Your system is how often you practice, how you break down and tackle difficult measures, and your method for receiving feedback from your instructor.

A handful of problems arise when you spend too much time thinking about your goals and not enough time designing your systems.

Problem #1: Winners and losers have the same goals.
Problem #2: Achieving a goal is only a momentary change.
Problem #3: Goals restrict your happiness. The implicit assumption behind any goal is this: “Once I reach my goal, then I’ll be happy.”
Problem #4: Goals are at odds with long-term progress. The purpose of setting goals is to win the game. The purpose of building systems is to continue playing the game.

True long-term thinking is goal-less thinking. It’s not about any single accomplishment. It is about the cycle of endless refinement and continuous improvement. Ultimately, it is your commitment to the process that will determine your progress.

A SYSTEM OF ATOMIC HABITS

If you’re having trouble changing your habits, the problem isn’t you. The problem is your system. Bad habits repeat themselves again and again not because you don’t want to change, but because you have the wrong system for change. You do not rise to the level of your goals. You fall to the level of your systems. Focusing on the overall system, rather than a single goal, is one of the core themes of this book. It is also one of the deeper meanings behind the word atomic. By now, you’ve probably realized that an atomic habit refers to a tiny change, a marginal gain, a 1 percent improvement. But atomic habits are not just any old habits, however small. They are little habits that are part of a larger system. Just as atoms are the building blocks of molecules, atomic habits are the building blocks of remarkable results. Habits are like the atoms of our lives. Each one is a fundamental unit that contributes to your overall improvement. At first, these tiny routines seem insignificant, but soon they build on each other and fuel bigger wins that multiply to a degree that far outweighs the cost of their initial investment. They are both small and mighty. This is the meaning of the phrase atomic habits—a regular practice or routine that is not only small and easy to do, but also the source of incredible power; a component of the system of compound growth.

KEY POINTS... AGAIN

#1 Habits are the compound interest of self-improvement. Getting 1 percent better every day counts for a lot in the long run.
#2 Habits are a double-edged sword. They can work for you or against you, which is why understanding the details is essential.
#3 Small changes often appear to make no difference until you cross a critical threshold. The most powerful outcomes of any compounding process are delayed. You need to be patient.
#4 An atomic habit is a little habit that is part of a larger system. Just as atoms are the building blocks of molecules, atomic habits are the building blocks of remarkable results.
#5 If you want better results, then forget about setting goals. Focus on your system instead.
#6 You do not rise to the level of your goals. You fall to the level of your systems.
Tags: Book Summary, Behavioral Science

Tuesday, March 19, 2024

Show All Interview Questions

User Registration

First time users, please register...

User Login

If you already have an account, please login...

Create a plain HTML form and link it to a simple Flask API to display the contents of the form

HTML: index page

<div class="container"> <h2>User Registration</h2> <form action="http://127.0.0.1:5000/submit" method="post"> <div class="form-group"> <label for="username">Username:</label> <input type="text" id="username" name="username" required> </div> <div class="form-group"> <label for="password">Password:</label> <input type="password" id="password" name="password" required> </div> <div class="form-group"> <label for="confirm_password">Confirm Password:</label> <input type="password" id="confirm_password" name="confirm_password" required> </div> <button type="submit" class="btn" data-bind="click: customRegister">Register</button> </form> </div>

Python Code For Flask API

from flask import Flask, render_template, request

app = Flask(__name__)

@app.route('/')
def welcome():
    # Serve the registration form
    return render_template("index.html")

@app.route("/submit", methods=["POST"])
def submit():
    # Read the fields submitted by the HTML form
    username = request.form["username"]
    password = request.form["password"]
    retypepassword = request.form["confirm_password"]
    print(username, password, retypepassword)
    # Alternatively, render a result template:
    # return render_template("result.html", username=username, password=password, retypepassword=retypepassword)
    return "Hi, " + username

if __name__ == '__main__':
    app.run(debug=True)
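To try the endpoint without opening the form in a browser, a quick check along these lines should work (assuming the Flask app is running locally on port 5000; the field values here are made up):

import requests

# Post the same fields the HTML form would submit
resp = requests.post(
    "http://127.0.0.1:5000/submit",
    data={"username": "alice", "password": "secret", "confirm_password": "secret"},
)
print(resp.status_code, resp.text)  # expected: 200 Hi, alice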

Monday, March 18, 2024

Books on Large Language Models (Mar 2024)

Download Books
1.
Quick Start Guide to Large Language Models: Strategies and Best Practices for Using ChatGPT and Other LLMs
Sinan Ozdemir, 2023

2.
GPT-3: Building Innovative NLP Products Using Large Language Models
Sandra Kublik, 2022

3.
Understanding Large Language Models: Learning Their Underlying Concepts and Technologies
Thimira Amaratunga, 2023

4.
Introduction to Large Language Models for Business Leaders: Responsible AI Strategy Beyond Fear and Hype
I. Almeida, 2023

5.
Pretrain Vision and Large Language Models in Python: End-to-end Techniques for Building and Deploying Foundation Models on AWS
Emily Webber, 2023

6.
Modern Generative AI with ChatGPT and OpenAI Models: Leverage the Capabilities of OpenAI's LLM for Productivity and Innovation with GPT3 and GPT4
Valentina Alto, 2023

7.
Generative AI with LangChain: Build Large Language Model (LLM) Apps with Python, ChatGPT, and Other LLMs
Ben Auffarth, 2023

8.
Natural Language Processing with Transformers
Lewis Tunstall, 2022

9.
Generative AI on AWS
Chris Fregly, 2023

10.
Decoding GPT: An Intuitive Understanding of Large Language Models Generative AI Machine Learning and Neural Networks
Devesh Rajadhyax, 2024

11.
Retrieval-Augmented Generation (RAG): Empowering Large Language Models (LLMs)
Ray Islam (Mohammad Rubyet Islam), 2023

12.
Learn Python Generative AI: Journey from Autoencoders to Transformers to Large Language Models (English Edition)
Indrajit Kar, 2024

13.
Natural Language Understanding with Python: Combine Natural Language Technology, Deep Learning, and Large Language Models to Create Human-like Language Comprehension in Computer Systems
Deborah A. Dahl, 2023

14.
Developing Apps with GPT-4 and ChatGPT
Olivier Caelen, 2023

15.
Generative Deep Learning
David Foster, 2022

16.
Foundation Models for Natural Language Processing: Pre-trained Language Models Integrating Media
Gerhard Paass, 2023

17.
What is ChatGPT Doing ... and why Does it Work?
Stephen Wolfram, 2023

18.
Artificial Intelligence and Large Language Models: An Introduction to the Technological Future
Al-Sakib Khan Pathan, 2024

19.
Large Language Model-Based Solutions: How to Deliver Value with Cost-Effective Generative AI Applications
Shreyas Subramanian, 2024

20.
Introduction to Transformers for NLP: With the Hugging Face Library and Models to Solve Problems
Shashank Mohan Jain, 2022

21.
Generative AI for Leaders
Amir Husain, 2023

22.
Machine Learning Engineering with Python: Manage the Production Life Cycle of Machine Learning Models Using MLOps with Practical Examples
Andrew P. McMahon, 2021

23.
Artificial Intelligence Fundamentals for Business Leaders: Up to Date With Generative AI
I. Almeida, 2023

24.
Transformers For Natural Language Processing: Build, Train, and Fine-tune Deep Neural Network Architectures for NLP with Python, Hugging Face, and OpenAI's GPT-3, ChatGPT, and GPT-4
Denis Rothman, 2022

Saturday, March 9, 2024

What is an RDD in PySpark?

RDD, which stands for Resilient Distributed Dataset, is a fundamental data structure in Apache Spark, a distributed computing framework for big data processing. RDDs are immutable, partitioned collections of objects that can be processed in parallel across a cluster of machines. The term "resilient" in RDD refers to the fault-tolerance feature, meaning that RDDs can recover lost data due to node failures.

Here are some key characteristics and properties of RDDs in PySpark:

# Immutable: Once created, RDDs cannot be modified. However, you can transform them into new RDDs by applying various operations.

# Distributed: RDDs are distributed across multiple nodes in a cluster, allowing for parallel processing.

# Partitioned: RDDs are divided into partitions, which are the basic units of parallelism. Each partition can be processed independently on different nodes.

# Lazy Evaluation: Transformations on RDDs are lazily evaluated, meaning that the execution is deferred until an action is triggered. This helps optimize the execution plan and avoid unnecessary computations.

# Fault-Tolerant: RDDs track the lineage information to recover lost data in case of node failures. This is achieved through the ability to recompute lost partitions based on the transformations applied to the original data.

In PySpark, you can create RDDs from existing data in memory or by loading data from external sources such as HDFS, HBase, or other storage systems. Once created, you can perform various transformations (e.g., map, filter, reduce) and actions (e.g., count, collect, save) on RDDs.
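As a small illustration of this workflow, here is a sketch (assuming a local PySpark installation; the file path in the last line is only an example):

from pyspark import SparkContext

sc = SparkContext("local[*]", "rdd-demo")

# Create an RDD from in-memory data
numbers = sc.parallelize([1, 2, 3, 4, 5])

# Transformations are lazy: nothing executes yet
squares = numbers.map(lambda x: x * x)
evens = squares.filter(lambda x: x % 2 == 0)

# Actions trigger the actual computation
print(evens.collect())   # [4, 16]
print(squares.count())   # 5

# Creating an RDD from an external source (illustrative HDFS path)
# lines = sc.textFile("hdfs:///data/retail.csv")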

However, it's worth noting that while RDDs were the primary abstraction in earlier versions of Spark, newer versions have introduced higher-level abstractions like DataFrames and Datasets, which provide a more structured and optimized API for data manipulation and analysis. These abstractions are built on top of RDDs and offer better performance and ease of use in many scenarios. 
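For comparison, a minimal sketch of the DataFrame API mentioned above (column names and rows are made up):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("df-demo").getOrCreate()

# The same kind of data as a DataFrame with named columns
df = spark.createDataFrame([(1, "pen", 10), (2, "book", 250)], ["id", "product", "price"])
df.filter(df.price > 100).show()

# A DataFrame is still backed by RDDs underneath
print(df.rdd.take(1))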

5 Questions on PySpark Technology

Q1 of 5 

Which of the below Spark Core APIs is used to load the retail.csv file and create an RDD? 

retailRDD = sc.readFile("/HDFSPATH/retail.csv") 

retailRDD = sc.parallelize("/HDFSPATH/retail.csv") 

retailRDD = sc.textFile("/HDFSPATH/retail.csv") *** 

retailRDD = sc.createFile("/HDFSPATH/retail.csv") 

Q2 of 5 

Shane works on a data analytics project and needs to process user event data (UserLogs.csv file). Which of the below code snippets can be used to split the fields with a comma as a delimiter and fetch only the first two fields? 

logsRDD = sc.textFile("/HDFSPATH/UserLogs.csv"); 
FieldsRDD = logsRDD.map(lambda r : r.split(",")).map(lambda r: (r[0],r[1])) *** 

logsRDD = sc.parallelize("/HDFSPATH/UserLogs.csv"); 
FieldsRDD = logsRDD.map(lambda r : r.split(",")).map(lambda r: (r[0],r[1])) 

logsRDD = sc.parallelize("/HDFSPATH/UserLogs.csv"); 
FieldsRDD = logsRDD.filter(lambda r : r.split(",")).map(lambda r: (r[0],r[1])) 

logsRDD = sc.textFile("/HDFSPATH/UserLogs.csv"); 
FieldsRDD = logsRDD.filter(lambda r : r.split(",")).map(lambda r: (r[0],r[1])) 

Q3 of 5

Consider a retail scenario where a paired RDD exists with data (ProductName, Price). The Price value must be reduced by 500 as a customer discount. Which paired RDD function in Spark can be used for this requirement? 

mapValues() 

keys() 

values() 

map() 

--- mapValues applies the function logic to the value part of the paired RDD without changing the key 
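As a small illustration of the answer above, a PySpark sketch (assuming an existing SparkContext sc, as in the quiz snippets; the product names and prices are made up):

# Paired RDD of (ProductName, Price)
prices = sc.parallelize([("TV", 30000), ("Laptop", 55000), ("Phone", 20000)])

# mapValues applies the discount to the value only; the ProductName key is untouched
discounted = prices.mapValues(lambda price: price - 500)
print(discounted.collect())  # [('TV', 29500), ('Laptop', 54500), ('Phone', 19500)]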

Q4 of 5 
Consider a banking scenario where credit card transaction logs need to be processed. The log contains CustomerID, CustomerName, CreditCard Number, and TransactionAmount fields. Which code snippet below creates a paired RDD? 

logsRDD = sc.textFile("/HDFSPath/Logs.txt"); 

logsRDD = sc.textFile("/HDFSPath/Logs.txt"); 
LogsPairedRDD = logsRDD.map(lambda r : r.split(",")).map(lambda r: (r[0],int(r[3]))) *** 

logsRDD = sc.textFile("/HDFSPath/Logs.txt"); 
LogsPairedRDD = logsRDD.map(lambda r : r.split(",")).map(lambda r: (r[0],int(r[2]))) 

logsRDD = sc.textFile("/HDFSPath/Logs.txt").map(lambda r: (r[0],int(r[3]))) 

Q5 of 5 

Consider a Spark scenario where an array must be used as a broadcast variable. Which of the below code snippets is used to access the broadcast variable's value? 

bv = sc.broadcast(Array(100,200,300)) 
bv.getValue 

bv = sc.broadcast(Array(100,200,300)) 
bv.value *** 

bv = sc.broadcast(Array(100,200,300)) 
bv.find 

bv = sc.broadcast(Array(100,200,300)) 
bv.fetchValue 
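For reference, the same idea in PySpark looks roughly like this (a sketch assuming an existing SparkContext sc; the data and indexing are illustrative). The broadcast data is read through the .value attribute inside tasks:

# Broadcast a small read-only list to all executors
bv = sc.broadcast([100, 200, 300])

rdd = sc.parallelize([0, 1, 2])
# Each task reads the broadcast data via bv.value
result = rdd.map(lambda i: bv.value[i]).collect()
print(result)  # [100, 200, 300]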

Spark Core Challenges

Business Scenario

Arisconn Cars provides rental car service across the globe. To improve their customer service, the client wants to analyze each car's sensor data periodically to repair faults and problems in the car. Sensor data from cars are streamed through an events hub (data ingestion tool) into Hadoop's HDFS (distributed file system) and analyzed using Spark Core programming to find out which cars generate the most errors. This analysis would help Arisconn send the service team to repair the cars even before they fail.

Below is the schema of the Arisconn Cars dataset, which holds approximately 10 million records:

[sensorID, carID, latitude, longitude, engine_speed, accelerator_pedal_position, vehicle_speed, torque_at_transmission, fuel_level, typeOfMessage, timestamp]

typeOfMessage: INFO, WARN, ERR, DEBUG

Arisconn has the below set of requirements to be performed against the dataset:

1. Filter fields - Sensor id, Car id, Latitude, Longitude, Vehicle Speed, TypeOfMessage
2. Filter valid records, i.e., discard records containing '?'
3. Filter records holding only error messages (ignore warnings and info messages)
4. Apply aggregation to count the number of error messages produced by each car

Below is the Python code to implement the first three requirements.

# Loading a text file into an RDD
Car_Info = sc.textFile("/HDFSPath/ArisconnDataset.txt")

# Referring to the header of the file
header = Car_Info.first()

# Removing the header, splitting records with ',' as delimiter, and fetching the relevant fields
# (sensorID, carID, latitude, longitude, vehicle_speed, typeOfMessage)
Car_temp = Car_Info.filter(lambda record: record != header) \
                   .map(lambda r: r.split(",")) \
                   .map(lambda c: (c[0], c[1], float(c[2]), float(c[3]), int(c[6]), c[9]))

# Filtering only valid records (discarding records starting with '?');
# f[0] refers to the first field (sensorID)
Car_Eng_Specs = Car_temp.filter(lambda f: not str(f[0]).startswith("?"))

# Filtering records holding only error messages;
# f[5] refers to the last kept field (typeOfMessage)
Car_Error_logs = Car_Eng_Specs.filter(lambda f: str(f[5]).startswith("ERR"))

In the above code:

1. Arisconn's dataset is loaded into an RDD (Car_Info).
2. The header of the dataset is removed and only the fields (sensorID, carID, latitude, longitude, vehicle_speed, typeOfMessage) are kept. Refer to the RDD Car_temp.
3. Records starting with '?' are discarded. Refer to the RDD Car_Eng_Specs.
4. Records containing typeOfMessage = "ERR" are filtered into Car_Error_logs.

There are a few challenges in the above code, and even the fourth requirement is too complex to implement in Spark Core. We shall discuss this next.

Friday, March 8, 2024

Voracious Fish (A problem on the concept of Stacks)

Fish: N voracious fish are moving along a river. Calculate how many fish are alive.

You are given two non-empty arrays A and B consisting of N integers. Arrays A and B represent N voracious fish in a river, ordered downstream along the flow of the river.

The fish are numbered from 0 to N − 1. If P and Q are two fish and P < Q, then fish P is initially upstream of fish Q. Initially, each fish has a unique position.

Fish number P is represented by A[P] and B[P]. Array A contains the sizes of the fish. All its elements are unique. Array B contains the directions of the fish. It contains only 0s and/or 1s, where:

0 represents a fish flowing upstream,
1 represents a fish flowing downstream.
If two fish move in opposite directions and there are no other (living) fish between them, they will eventually meet each other. Then only one fish can stay alive − the larger fish eats the smaller one. More precisely, we say that two fish P and Q meet each other when P < Q, B[P] = 1 and B[Q] = 0, and there are no living fish between them. After they meet:

If A[P] > A[Q] then P eats Q, and P will still be flowing downstream,
If A[Q] > A[P] then Q eats P, and Q will still be flowing upstream.
We assume that all the fish are flowing at the same speed. That is, fish moving in the same direction never meet. The goal is to calculate the number of fish that will stay alive.

For example, consider arrays A and B such that:

    A[0] = 4    B[0] = 0
    A[1] = 3    B[1] = 1
    A[2] = 2    B[2] = 0
    A[3] = 1    B[3] = 0
    A[4] = 5    B[4] = 0
Initially all the fish are alive and all except fish number 1 are moving upstream. Fish number 1 meets fish number 2 and eats it, then it meets fish number 3 and eats it too. Finally, it meets fish number 4 and is eaten by it. The remaining two fish, number 0 and 4, never meet and therefore stay alive.

Write a function:

def solution(A, B)

that, given two non-empty arrays A and B consisting of N integers, returns the number of fish that will stay alive.

For example, given the arrays shown above, the function should return 2, as explained above.

Write an efficient algorithm for the following assumptions:

N is an integer within the range [1..100,000];
each element of array A is an integer within the range [0..1,000,000,000];
each element of array B is an integer that can have one of the following values: 0, 1;
the elements of A are all distinct.

 
class Fish():
    def __init__(self, size, direction):
        self.size = size
        self.direction = direction

def solution(A, B):

    # Stack holds the sizes of downstream-moving fish (B[i] == 1)
    # that are still alive and may meet an upstream-moving fish later
    stack = []
    survivors = 0

    for i in range(len(A)):

        if B[i] == 1:
            # Downstream fish: it may meet upstream fish that come later
            stack.append(A[i])

        else:
            # Upstream fish: it meets the downstream fish on the stack,
            # eating every smaller one
            weightdown = stack.pop() if stack else -1
            while weightdown != -1 and weightdown < A[i]:
                weightdown = stack.pop() if stack else -1

            if weightdown == -1:
                # No downstream fish left to stop it: the upstream fish survives
                survivors += 1
            else:
                # A larger downstream fish ate it; put that fish back on the stack
                stack.append(weightdown)

    # Upstream survivors plus all downstream fish still on the stack
    return survivors + len(stack)
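A quick sanity check with the example from the problem statement:

A = [4, 3, 2, 1, 5]
B = [0, 1, 0, 0, 0]
print(solution(A, B))  # 2 -- fish 0 and fish 4 stay alive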


Correctness tests

▶ extreme_small
1 or 2 fishes

▶ simple1
simple test

▶ simple2
simple test

▶ small_random
small random test, N = ~100

Performance tests

▶ medium_random
small medium test, N = ~5,000

▶ large_random
large random test, N = ~100,000

▶ extreme_range1
all except one fish flowing in the same direction

▶ extreme_range2
all fish flowing in the same direction

Tuesday, March 5, 2024

Brackets (A problem related to Stacks)

Problem

Brackets: Determine whether a given string of parentheses (multiple types) is properly nested.

A string S consisting of N characters is considered to be properly nested if any of the following conditions is true:

S is empty;
S has the form "(U)" or "[U]" or "{U}" where U is a properly nested string;
S has the form "VW" where V and W are properly nested strings.

For example, the string "{[()()]}" is properly nested but "([)()]" is not.

Write a function:

class Solution { public int solution(String S); }

that, given a string S consisting of N characters, returns 1 if S is properly nested and 0 otherwise. For example, given S = "{[()()]}", the function should return 1 and given S = "([)()]", the function should return 0, as explained above.

Write an efficient algorithm for the following assumptions:

N is an integer within the range [0..200,000];
string S is made only of the following characters: '(', '{', '[', ']', '}' and/or ')'.

Solution

Task Score: 87% Correctness: 100% Performance: 80%

class Node():
    def __init__(self, x):
        self.x = x
        self.next = None

class Stack():
    # head is None for an empty stack
    def __init__(self):
        self.head = None

    # Checks if the stack is empty
    def isempty(self):
        return self.head is None

    # Pushes a new node on top of the stack
    def push(self, x):
        newnode = Node(x)
        newnode.next = self.head
        self.head = newnode

    # Removes and returns the top element, or None if the stack is empty
    def pop(self):
        if self.head is None:
            return None
        popped_node = self.head
        self.head = self.head.next
        popped_node.next = None
        return popped_node.x

    # Returns the top element without removing it
    def peek(self):
        if self.isempty():
            return None
        return self.head.x

def solution(S):
    s = Stack()
    for i in range(len(S)):
        if S[i] in ['(', '{', '[']:
            s.push(S[i])
        elif (S[i] == ')' and s.peek() == '(') or \
             (S[i] == '}' and s.peek() == '{') or \
             (S[i] == ']' and s.peek() == '['):
            s.pop()
        else:
            # A closing bracket that does not match the top of the stack
            # (or arrives when the stack is empty) can never be fixed later
            return 0
    if s.isempty():
        return 1
    else:
        return 0
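A quick check against the examples from the problem statement:

print(solution("{[()()]}"))  # 1
print(solution("([)()]"))    # 0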

Number of disc intersections (Problem in sorting and searching)

Problem

Number Of Disc Intersections: Compute the number of intersections in a sequence of discs.

We draw N discs on a plane. The discs are numbered from 0 to N − 1. An array A of N non-negative integers, specifying the radiuses of the discs, is given. The J-th disc is drawn with its center at (J, 0) and radius A[J]. We say that the J-th disc and K-th disc intersect if J ≠ K and the J-th and K-th discs have at least one common point (assuming that the discs contain their borders).

The figure below shows discs drawn for N = 6 and A as follows:

    A[0] = 1
    A[1] = 5
    A[2] = 2
    A[3] = 1
    A[4] = 4
    A[5] = 0

There are eleven (unordered) pairs of discs that intersect, namely: discs 1 and 4 intersect, and both intersect with all the other discs; disc 2 also intersects with discs 0 and 3.

Write a function in Java with the following signature:

class Solution { public int solution(int[] A); }

And in Python with the following signature:

def solution(A):

that, given an array A describing N discs as explained above, returns the number of (unordered) pairs of intersecting discs. The function should return −1 if the number of intersecting pairs exceeds 10,000,000. Given array A shown above, the function should return 11, as explained above.

Write an efficient algorithm for the following assumptions:

N is an integer within the range [0..100,000];
each element of array A is an integer within the range [0..2,147,483,647].

Solution

WITHOUT BINARY SEARCH

Task Score: 62% Correctness: 100% Performance: 25%

def solution(A):
    # Represent each disc as the interval [start, end] it covers on the x-axis,
    # then sort the intervals by their start point
    start_and_end = []
    for i in range(len(A)):
        temp = {
            'start': i - A[i],
            'end': i + A[i]
        }
        start_and_end.append(temp)
    start_and_end.sort(key=lambda d: d['start'])

    # For each disc, count how many of the following discs start before it ends
    intersections = []
    for i in range(len(start_and_end)):
        end = start_and_end[i]['end']
        cnt_intersections = 0
        for j in range(i + 1, len(start_and_end)):
            if start_and_end[j]['start'] <= end:
                cnt_intersections += 1
            else:
                # The list is sorted by start, so no later disc can intersect either
                break
        intersections.append(cnt_intersections)

    total = sum(intersections)
    # Per the problem statement, return -1 if the count exceeds 10,000,000
    return -1 if total > 10000000 else total