Showing posts with label Python. Show all posts
Showing posts with label Python. Show all posts

Tuesday, April 23, 2024

What does np.meshgrid() do?

The np.meshgrid() function in NumPy is used to generate a two-dimensional grid of coordinates from one-dimensional arrays representing coordinates along each axis. Here's a more detailed explanation:

Purpose:

  • It takes two (or more in higher dimensions) one-dimensional arrays and creates two (or more) new two-dimensional arrays (meshes).
  • Each element in the output meshes corresponds to a combination of elements from the original input arrays.

How it Works:

  1. Input Arrays:

    • It requires two one-dimensional arrays (x and y) representing the coordinates along two axes (usually X and Y).
    • These arrays define the range of values for each dimension in the grid.
  2. Output Meshes:

    • It returns two new two-dimensional arrays (XX and YY) with the same shape.
    • Each element in XX represents the X-coordinate for a specific point in the grid.
    • Each element in YY represents the Y-coordinate for the corresponding point in the grid.
  3. Repeating Values:

    • To create all possible combinations, np.meshgrid() repeats the values from one array along rows or columns of the other array.
    • Imagine placing a copy of the X-axis values across every row of the Y-axis values to get all X, Y combinations.

Example:

Python
import numpy as np

# Define one-dimensional arrays
x = np.linspace(-2, 2, 7)  # 7 values between -2 and 2
y = np.linspace(-1, 1, 5)  # 5 values between -1 and 1

# Create mesh grids
X, Y = np.meshgrid(x, y)

# Print the shapes of the grids
print("X:", X.shape)
print("Y:", Y.shape)

# X will have the shape (5, 7) and Y will also have the shape (5, 7)
# Each element in X will be a value from the x array
# Each element in Y will be a value from the y array

Applications:

  • np.meshgrid() is commonly used to evaluate functions over grids in two or more dimensions.
  • It's a helpful tool for creating data for visualization tasks, especially when working with surfaces or 3D plots.
  • It can be used in various scientific computing applications where evaluating a function over a grid is necessary.
Tags: Technology,Python,Interview Preparation,NumPy,

Interview Problem on NumPy meshgrid() function and vstack() function - Create this array of numbers using Python code

""" 
Can you create an array like the two shown below using Python code:

[[-2. -1.]
 [-1. -1.]
 [ 0. -1.]
 [ 1. -1.]
 [ 2. -1.]
 [-2.  0.]
 [-1.  0.]
 [ 0.  0.]
 [ 1.  0.]
 [ 2.  0.]
 [-2.  1.]
 [-1.  1.]
 [ 0.  1.]
 [ 1.  1.]
 [ 2.  1.]]

OR 

May be this array:

[[-2. -1.]
 [ 0. -1.]
 [ 2. -1.]
 [-2.  0.]
 [ 0.  0.]
 [ 2.  0.]
 [-2.  1.]
 [ 0.  1.]
 [ 2.  1.]]

"""


import numpy as np

# Define one-dimensional arrays
x = np.linspace(-2, 2, 5)  # 5 values between -2 and 2
y = np.linspace(-1, 1, 3)  # 3 values between -1 and 1

# Create mesh grids
X, Y = np.meshgrid(x, y) 

# Print the shapes of the grids
print("X:", X.shape)
print("Y:", Y.shape)

# X will have the shape (5, 3) and Y will also have the shape (5, 3)
# Each element in X will be a value from the x array. X values change along the columns.
# Each element in Y will be a value from the y array. Y values change along the rows.

print(X)
print(Y)

Xgrid = np.vstack((X.flatten(), Y.flatten())).T
print(Xgrid)


# In order to get the second output, we need to change the input:

# Define one-dimensional arrays
x = np.linspace(-2, 2, 3)  # 3 values between -2 and 2
y = np.linspace(-1, 1, 3)  # 3 values between -1 and 1

# Rest of the code would remain the same.

Side Note

You may be wondering where this might be used. The answer is that it can be used in creating dummy data that is 2 Dimensional (excluding the target values) in Machine Learning projects.
Tags: Technology,Python,NumPy,Interview Preparation,

Monday, April 15, 2024

Maximum Slice Problem (using Kadane's Algo and Dynamic Programming)

Maximum slice problem

Let’s define a problem relating to maximum slices. You are given a sequence of n integers a0 , a1 , . . . , an−1 and the task is to find the slice with the largest sum. More precisely, we are looking for two indices p, q such that the total ap + ap+1 + . . . + aq is maximal. We assume that the slice can be empty and its sum equals 0.
In the picture, the slice with the largest sum is highlighted in gray. The sum of this slice equals 10 and there is no slice with a larger sum. Notice that the slice we are looking for may contain negative integers, as shown above.

In other words...

A non-empty array A consisting of N integers is given. A pair of integers (P, Q), such that 0 ≤ P ≤ Q < N, is called a slice of array A. The sum of a slice (P, Q) is the total of A[P] + A[P+1] + ... + A[Q].

Write a function:

    def solution(A)

that, given an array A consisting of N integers, returns the maximum sum of any slice of A.

For example, given array A such that:
A[0] = 3  A[1] = 2  A[2] = -6
A[3] = 4  A[4] = 0

the function should return 5 because:

        (3, 4) is a slice of A that has sum 4,
        (2, 2) is a slice of A that has sum −6,
        (0, 1) is a slice of A that has sum 5,
        no other slice of A has sum greater than (0, 1).

Write an efficient algorithm for the following assumptions:

        N is an integer within the range [1..1,000,000];
        each element of array A is an integer within the range [−1,000,000..1,000,000];
        the result will be an integer within the range [−2,147,483,648..2,147,483,647].

Solution

Please solve this coding puzzle

The image you sent describes the Maximum Slice Problem. It is a coding challenge where you’re given an array of integers and you need to find the subarray (slice) with the largest sum. The slice can be empty, in which case the sum is zero.

Here’s how to solve this problem:

  1. Kadane’s Algorithm: There is an efficient algorithm to solve the maximum slice problem called Kadane’s Algorithm. It uses dynamic programming to find the maximum sum subarray in linear time (O(n)).

The idea is to keep track of two variables:

  • current_sum: This variable stores the sum of the current subarray.
  • max_sum: This variable stores the maximum sum of any subarray seen so far.

We iterate through the array and at each element, we do the following: * If the current_sum is less than zero, we restart the current subarray by setting it to the current element. * Otherwise, we add the current element to the current_sum. * We then compare the current_sum with the max_sum and update the max_sum if the current_sum is greater.

  1. Implementation: Here’s how you can implement Kadane’s Algorithm in Python:
Python
def max_slice_sum(arr):
      """
      Finds the maximum sum of a contiguous subarray in a given array.
    
      Args:
        arr: A list of integers.
    
      Returns:
        The maximum sum of a contiguous subarray in the array.
      """
      current_sum = max_sum = arr[0]
      for num in arr[1:]:
        current_sum = max(num, current_sum + num)
        max_sum = max(max_sum, current_sum)
      return max_sum
    

This function iterates over the array ‘arr’ and calculates the maximum sum of a slice.

You can call the function like this:

arr = [5, -7, 8, -4, 1]
    print(max_slice_sum(arr))  # Output: 8
    

In this example, the maximum sum of a slice is 8 (8).

I hope this helps!

Can you explain this code to a layman?

Imagine you're walking a trail and collecting rocks. Each rock has a weight, positive if it's nice and heavy, negative if it's a drag. You want to find the biggest haul you can carry, but you can't go back and drop rocks you've already picked up.

This code is like figuring out the heaviest load you can carry on that trail.

Here's how it works:

  1. Two piles:

    • Current pile: This is the pile of rocks you're carrying right now. It starts with just the first rock on the trail (the first number in the list).
    • Best pile ever: This keeps track of the heaviest pile you've found so far on the trail.
  2. Walking the trail:

    • You go through each rock on the trail (except the first one, since that's already in your current pile).
    • At each rock, you check your current pile:
      • If the current pile is too heavy (negative weight), you ditch it and start fresh with just this new rock.
      • If the current pile is okay (positive or zero weight), you add this new rock to it.
    • No matter what you did with the current pile, you then compare its weight to the "best pile ever." If the current pile is heavier, then the current pile becomes the new "best pile ever."
  3. The answer:

    • Once you've finished going through all the rocks, the "best pile ever" holds the heaviest load you could have carried.

The code uses fancy names for these piles ("current_sum" and "max_sum"), but that's the basic idea. It keeps track of the best possible haul you can make as you walk the trail (process the list).

Please listen carefully. 
What is the difference between following two functions:

def golden_max_slice(A):
    max_ending = max_slice = 0
    for a in A:
        max_ending = max(0, max_ending + a)
        max_slice = max(max_slice, max_ending)
    return max_slice

def max_slice_sum(arr):
    """
    Finds the maximum sum of a contiguous subarray in a given array.

    Args:
    arr: A list of integers.

    Returns:
    The maximum sum of a contiguous subarray in the array.
    """
    current_sum = max_sum = arr[0]
    for num in arr[1:]:
    current_sum = max(num, current_sum + num)
    max_sum = max(max_sum, current_sum)
    return max_sum

The two functions you provided, golden_max_slice and max_slice_sum, both solve the same problem: finding the maximum sum of a contiguous subarray in a given list of integers. However, they achieve this in slightly different ways:

1. Variable Naming:

  • max_slice_sum: This function uses more descriptive names.
    • current_sum: Clearly indicates the current sum of the subarray being considered.
    • max_sum: Represents the maximum sum found so far.
  • golden_max_slice: Uses shorter names.
    • max_ending: Less descriptive but conveys the idea of the maximum sum ending at the current element.
    • max_slice: Same as in max_slice_sum.

2. Handling Negative Sums:

  • max_slice_sum: This function explicitly restarts the current_sum to be the current element if it becomes negative. This ensures that the current subarray only considers elements that contribute positively to the sum.
  • golden_max_slice: This function implicitly handles negative sums. It only adds the current element to max_ending if the resulting sum would be positive. If the sum becomes negative, max_ending effectively restarts at 0 (since max(0, a + b) will be 0 for negative values of a).

In essence:

  • Both functions implement Kadane's Algorithm.
  • max_slice_sum might be easier to understand for beginners due to its clear variable names and explicit handling of negative sums.
  • golden_max_slice is a more concise approach that achieves the same result.

Efficiency:

Both functions have a time complexity of O(n), which is optimal for this problem as they iterate through the list once.

Tags: Technology,Algorithms,Python,

Tuesday, March 19, 2024

Create a plain HTML form and Link it to a simple Flask API to display the contents of the form

HTML: index page

<div class="container"> <h2>User Registration</h2> <form action="http://127.0.0.1:5000/submit" method="post"> <div class="form-group"> <label for="username">Username:</label> <input type="text" id="username" name="username" required> </div> <div class="form-group"> <label for="password">Password:</label> <input type="password" id="password" name="password" required> </div> <div class="form-group"> <label for="confirm_password">Confirm Password:</label> <input type="password" id="confirm_password" name="confirm_password" required> </div> <button type="submit" class="btn" data-bind="click: customRegister">Register</button> </form> </div>

Python Code For Flask API

from flask import Flask, redirect, render_template, request, url_for import json, requests from flask_cors import CORS, cross_origin from flask import Response, jsonify app=Flask(__name__) @app.route('/') def welcome(): return render_template("index.html") @app.route("/submit", methods=["POST"]) def submit(): print(request.data) username = "" password = "" retypepassword = "" if request.method=="POST": username = request.form["username"] password = request.form["password"] retypepassword = request.form["confirm_password"] print(username, password, retypepassword) # return render_template("result.html", username=username, password=password, retypepassword=retypepassword) return "Hi, " + username if __name__=='__main__': app.run(debug=True)

Friday, March 8, 2024

Voracious Fish (A problem on the concept of Stacks)

Fish: N voracious fish are moving along a river. Calculate how many fish are alive.

You are given two non-empty arrays A and B consisting of N integers. Arrays A and B represent N voracious fish in a river, ordered downstream along the flow of the river.

The fish are numbered from 0 to N − 1. If P and Q are two fish and P < Q, then fish P is initially upstream of fish Q. Initially, each fish has a unique position.

Fish number P is represented by A[P] and B[P]. Array A contains the sizes of the fish. All its elements are unique. Array B contains the directions of the fish. It contains only 0s and/or 1s, where:

0 represents a fish flowing upstream,
1 represents a fish flowing downstream.
If two fish move in opposite directions and there are no other (living) fish between them, they will eventually meet each other. Then only one fish can stay alive − the larger fish eats the smaller one. More precisely, we say that two fish P and Q meet each other when P < Q, B[P] = 1 and B[Q] = 0, and there are no living fish between them. After they meet:

If A[P] > A[Q] then P eats Q, and P will still be flowing downstream,
If A[Q] > A[P] then Q eats P, and Q will still be flowing upstream.
We assume that all the fish are flowing at the same speed. That is, fish moving in the same direction never meet. The goal is to calculate the number of fish that will stay alive.

For example, consider arrays A and B such that:

    A[0] = 4    B[0] = 0
    A[1] = 3    B[1] = 1
    A[2] = 2    B[2] = 0
    A[3] = 1    B[3] = 0
    A[4] = 5    B[4] = 0
Initially all the fish are alive and all except fish number 1 are moving upstream. Fish number 1 meets fish number 2 and eats it, then it meets fish number 3 and eats it too. Finally, it meets fish number 4 and is eaten by it. The remaining two fish, number 0 and 4, never meet and therefore stay alive.

Write a function:

def solution(A, B)

that, given two non-empty arrays A and B consisting of N integers, returns the number of fish that will stay alive.

For example, given the arrays shown above, the function should return 2, as explained above.

Write an efficient algorithm for the following assumptions:

N is an integer within the range [1..100,000];
each element of array A is an integer within the range [0..1,000,000,000];
each element of array B is an integer that can have one of the following values: 0, 1;
the elements of A are all distinct.

 
class Fish():
    def __init__(self, size, direction):
        self.size = size
        self.direction = direction

def solution(A, B):

    stack = []
    survivors = 0

    for i in range(len(A)):

        if B[i] == 1:
            stack.append(A[i])
            
        else:
            weightdown = stack.pop() if stack else -1
            while weightdown != -1 and weightdown < A[i]: 
                weightdown = stack.pop() if stack else -1             

            if weightdown == -1:
                survivors += 1
            else:
                stack.append(weightdown)

    return survivors + len(stack)


Correctness tests

▶ extreme_small
1 or 2 fishes

▶ simple1
simple test

▶ simple2
simple test

▶ small_random
small random test, N = ~100

Performance tests

▶ medium_random
small medium test, N = ~5,000

▶ large_random
large random test, N = ~100,000

▶ extreme_range1
all except one fish flowing in the same direction

▶ extreme_range2
all fish flowing in the same direction

Tuesday, March 5, 2024

Brackets (A problem related to Stacks)

Problem

Brackets: Determine whether a given string of parentheses (multiple types) is properly nested. A string S consisting of N characters is considered to be properly nested if any of the following conditions is true: S is empty; S has the form "(U)" or "[U]" or "{U}" where U is a properly nested string; S has the form "VW" where V and W are properly nested strings. For example, the string "{[()()]}" is properly nested but "([)()]" is not. Write a function: class Solution { public int solution(String S); } that, given a string S consisting of N characters, returns 1 if S is properly nested and 0 otherwise. For example, given S = "{[()()]}", the function should return 1 and given S = "([)()]", the function should return 0, as explained above. Write an efficient algorithm for the following assumptions: N is an integer within the range [0..200,000]; string S is made only of the following characters: '(', '{', '[', ']', '}' and/or ')'.

Solution

Task Score: 87% Correctness: 100% Performance: 80% class Node(): def __init__(self, x): self.x = x self.next = None class Stack(): # head is default NULL def __init__(self): self.head = None # Checks if stack is empty def isempty(self): if self.head == None: return True else: return False def push(self, x): if self.head == None: self.head = Node(x) else: newnode = Node(x) newnode.next = self.head self.head = newnode def pop(self): if self.head == None: return None else: popped_node = self.head self.head = self.head.next popped_node.next = None return popped_node.x # Returns the head node data def peek(self): if self.isempty(): return None else: return self.head.x def solution(S): s = Stack() for i in range(len(S)): if S[i] in ['(', '{', '[']: s.push(S[i]) elif (S[i] == ')' and s.peek() == '(') or (S[i] == '}' and s.peek() == '{') or (S[i] == ']' and s.peek() == '['): s.pop() if s.isempty(): return 1 else: return 0

Number of disc intersections (Problem in sorting and searching)

Problem

Number Of Disc Intersections: Compute the number of intersections in a sequence of discs. We draw N discs on a plane. The discs are numbered from 0 to N − 1. An array A of N non-negative integers, specifying the radiuses of the discs, is given. The J-th disc is drawn with its center at (J, 0) and radius A[J]. We say that the J-th disc and K-th disc intersect if J ≠ K and the J-th and K-th discs have at least one common point (assuming that the discs contain their borders). The figure below shows discs drawn for N = 6 and A as follows: A[0] = 1 A[1] = 5 A[2] = 2 A[3] = 1 A[4] = 4 A[5] = 0 There are eleven (unordered) pairs of discs that intersect, namely: discs 1 and 4 intersect, and both intersect with all the other discs; disc 2 also intersects with discs 0 and 3. Write a function in Java with following signature: class Solution { public int solution(int[] A); } And in Python with following signature: def solution(A): that, given an array A describing N discs as explained above, returns the number of (unordered) pairs of intersecting discs. The function should return −1 if the number of intersecting pairs exceeds 10,000,000. Given array A shown above, the function should return 11, as explained above. Write an efficient algorithm for the following assumptions: N is an integer within the range [0..100,000]; each element of array A is an integer within the range [0..2,147,483,647].

Solution

WITHOUT BINARY SEARCH

Task Score: 62% Correctness: 100% Performance: 25% ~~~ def solution(A): # Implement your solution here # Get the start and end indices of the disks and sort them based on the start indices start_and_end = [] for i in range(len(A)): temp = { 'start': i - A[i], 'end': i + A[i] } start_and_end.append(temp) start_and_end.sort(key = lambda d: d['start']) # Search for the starting index bigger than the current end index intersections = [] for i in range(len(start_and_end)): end = start_and_end[i]['end'] cnt_intersections = 0 for j in range(i+1, len(start_and_end)): if start_and_end[j]['start'] <= end: cnt_intersections += 1 else: break intersections.append(cnt_intersections) return sum(intersections) Ref

Thursday, February 8, 2024

Hindi to English Learning (Version 2)

Username: 

Password: 



Try: ashish/monkey

अध्याय चुनें:

इन वाक्यों को हिंदी से अंग्रेजी में बदलें:

Sentence:




Thursday, January 25, 2024

Triangle formation from three sides (A problem on sorting technique)

Triangle

Determine whether a triangle can be built from a given set of edges.

Complexity: Easy

Problem

An array A consisting of N integers is given. A triplet (P, Q, R) is triangular if 0 ≤ P < Q < R < N and:

A[P] + A[Q] > A[R],

A[Q] + A[R] > A[P],

A[R] + A[P] > A[Q].

Write a function:

def solution(A)

that, given an array A consisting of N integers, returns 1 if there exists a triangular triplet for this array and returns 0 otherwise.

Write an efficient algorithm for the following assumptions:

N is an integer within the range [0..100,000];

each element of array A is an integer within the range [−2,147,483,648..2,147,483,647].

Example

For example, consider array A such that:

A[0] = 10 A[1] = 2 A[2] = 5

A[3] = 1 A[4] = 8 A[5] = 20

Triplet (0, 2, 4) is triangular.

For example, given array A such that:

A[0] = 10 A[1] = 2 A[2] = 5

A[3] = 1 A[4] = 8 A[5] = 20

the function should return 1, as explained above.

Given array A such that:

A[0] = 10 A[1] = 50 A[2] = 5, A[3] = 1

the function should return 0.

Code

def solution(A):
    N = len(A)
    # If the array has less than 3 elements, no triangular triplet is possible
    if N < 3:
        return 0
    # Sort the array in ascending order
    A.sort()
    # Iterate through the sorted array
    for i in range(N - 2):
        # Check if the triplet conditions are satisfied
        if A[i] + A[i + 1] > A[i + 2]:
            return 1
    # No triangular triplet found
    return 0

Detected time complexity:

O(N*log(N))

Tests

Performance tests

Large1: chaotic sequence with values from [0..100K], length=10K

Large2: 1 followed by an ascending sequence of ~50K elements from [0..100K], length=~50K

large_random: chaotic sequence of values from [0..1M], length=100K

large_negative: chaotic sequence of negative values from [-1M..-1], length=100K

large_negative2: chaotic sequence of negative values from [-10..-1], length=100K

large_negative3: sequence of -1 value, length=100K

Correctness tests

extreme_empty: empty sequence

extreme_single: 1-element sequence

extreme_two_elems: 2-element sequence

extreme_negative1: three equal negative numbers

extreme_arith_overflow1: overflow test, 3 MAXINTs

extreme_arith_overflow2: overflow test, 10 and 2 MININTs

extreme_arith_overflow3: overflow test, 0 and 2 MAXINTs

Medium1: chaotic sequence of values from [0..100K], length=30

Medium2: chaotic sequence of values from [0..1K], length=50

Medium3: chaotic sequence of values from [0..1K], length=100

Why this solution works?

# Check if the triplet conditions are satisfied. Given that A is sorted.

if A[i] + A[i + 1] > A[i + 2]:
    return 1

If you look at this condition, given that A is sorted:

The other two conditions, viz:

A[i+2] + A[i] > A[i+1]

And A[i+2] + A[i+1] > A[i]

Would be satisfied automatically since A[i+2] > A[i+1] and then for second one: A[i+1] (or A[i+2]) > A[i]

Now, what if A[x] and A[y] are supposed to be far apart. Even then:

If let’s say we have this condition met A[3] + A[10] > A[11], then this condition would be met by all the numbers larger than 3.

Thinking Traps

Read what is being asked in the question: Existence of three numbers that can form sides of a triangle.

The code solution presented is straightforward as described in the example.

But if you just twist your thinking a little bit:

Trap 1:

Instead of looking for three numbers that are close, you start looking at the extremes where numbers are on the extremes, you end up with border cases of the solution.

As in: Two large numbers (roughly equal) and one small : Case of triangle

This number that is small could take values large enough to be just smaller than the other two large numbers.

And two very small numbers and one large number: Not a case of triangle

Trap 2:

Instead of starting with the condition P+Q > R, P+R > Q and R+Q > P, you start looking at modifications of these conditions as in P > abs(Q-R).

What if a triplet exists but numbers are far apart?

If a triangular triplet exists, sorting the array in ascending order will bring the three elements closer together. This is because sorting the array ensures that the values are in non-decreasing order, and if a triplet exists, the smallest and middle values of the triplet will be adjacent after sorting.

Let's consider an example:

Original array: [10, 2, 5, 1, 8, 20]

After sorting: [1, 2, 5, 8, 10, 20]

If there exists a triangular triplet, say (P, Q, R), where P < Q < R, after sorting, the values A[P], A[Q], and A[R] will be adjacent in the sorted array. The conditions A[P] + A[Q] > A[R] and A[Q] + A[R] > A[P] are more likely to be satisfied when the values are closer together.

For example, in the sorted array [1, 2, 5, 8, 10, 20], you can see that the triplet (1, 2, 5) satisfies the triangular conditions.

Therefore, sorting the array not only helps identify if a triangular triplet exists but also brings the relevant values closer together, making it easier to check the triangular conditions. If a triplet exists but the values are far apart in the original array, sorting will still reveal the existence of a triplet.

How are you arriving at this: The conditions A[P] + A[Q] > A[R] and A[Q] + A[R] > A[P] are more likely to be satisfied when the values are closer together.

Let me clarify:

The conditions A[P] + A[Q] > A[R] and A[Q] + A[R] > A[P] are requirements for any valid triangle, and they are not affected by the proximity of values in a sorted array. These conditions are fundamental to the triangle inequality theorem.

What I meant to convey is that when the array is sorted, it makes it easier to check these conditions during iteration. Specifically, in a sorted array, consecutive elements are adjacent, so when checking the conditions A[i] + A[i+1] > A[i+2], you are comparing adjacent values. This simplifies the process of checking the triangular conditions as you move through the sorted array.

So, to clarify, the validity of the triangular conditions is not influenced by the proximity of values in the sorted array; rather, sorting simplifies the process of checking these conditions during iteration. I appreciate your understanding, and I hope this clarifies the point.

Tags: Technology,Algorithms,Python

Max Product of Three (A problem on sorting technique)

MaxProductOfThree

Maximize A[P] * A[Q] * A[R] for any triplet (P, Q, R).

Complexity: Easy

Problem

A non-empty array A consisting of N integers is given. The product of triplet (P, Q, R) equates to A[P] * A[Q] * A[R] (0 ≤ P < Q < R < N).

Your goal is to find the maximal product of any triplet.

Write a function:

def solution(A)

that, given a non-empty array A, returns the value of the maximal product of any triplet.

Write an efficient algorithm for the following assumptions:

N is an integer within the range [3..100,000];

each element of array A is an integer within the range [−1,000..1,000].

Example

For example, array A such that:

A[0] = -3

A[1] = 1

A[2] = 2

A[3] = -2

A[4] = 5

A[5] = 6

contains the following example triplets:

(0, 1, 2), product is −3 * 1 * 2 = −6

(1, 2, 4), product is 1 * 2 * 5 = 10

(2, 4, 5), product is 2 * 5 * 6 = 60

The function should return 60, as the product of triplet (2, 4, 5) is maximal.

Code

def solution(A):
    A.sort()
    # multiplication two large negative numbers with a positive number
    p = A[0] * A[1] * A[-1]
    
    # multiplication of three positive numbers 
    q = A[-1] * A[-2] * A[-3]
    return max(p, q)

Detected time complexity:
O(N * log(N))

Tests

Correctness tests

one_triple

three elements

simple1

simple tests

simple2

simple tests

small_random

random small, length = 100

Performance tests

medium_range

-1000, -999, ... 1000, length = ~1,000

medium_random

random medium, length = ~10,000

large_random

random large, length = ~100,000

large_range

2000 * (-10..10) + [-1000, 500, -1]

extreme_large

(-2, .., -2, 1, .., 1) and (MAX_INT)..(MAX_INT), length = ~100,000

Tags: Technology,Algorithms,Python

Count distinct elements in an array (A problem on sorting technique)

Problem

Write a function

def solution(A)

that, given an array A consisting of N integers, returns the number of distinct values in array A.

For example, given array A consisting of six elements such that:

A[0] = 2 A[1] = 1 A[2] = 1

A[3] = 2 A[4] = 3 A[5] = 1

the function should return 3, because there are 3 distinct values appearing in array A, namely 1, 2 and 3.

Write an efficient algorithm for the following assumptions:

N is an integer within the range [0..100,000];

each element of array A is an integer within the range [−1,000,000..1,000,000].

Solution (1) using set()

def solution(A):

    s = set(A)
    return len(s)

Detected time complexity:

O(N*log(N)) or O(N)

Test Cases

Correctness tests

extreme_empty: empty sequence

extreme_single: sequence of one element

extreme_two_elems: sequence of three distinct elements

extreme_one_value: sequence of 10 equal elements

extreme_negative: sequence of negative elements, length=5

extreme_big_values: sequence with big values, length=5

Medium1: chaotic sequence of value sfrom [0..1K], length=100

Medium2: chaotic sequence of value sfrom [0..1K], length=200

Medium3: chaotic sequence of values from [0..10], length=200

Performance tests

chaotic sequence of values from [0..100K], length=10K

large_random1

chaotic sequence of values from [-1M..1M], length=100K

large_random2

another chaotic sequence of values from [-1M..1M], length=100K

Sol (2) using dict() and then keys().len(): Straightforward to implement
Sol (3) without using set() or dict()

In this slide we discuss “Sol (3) without using set() or dict()”:

In this solution, we would use the sort() method of the array object.

def solution(A):
    A.sort()

    rtn = 1

    if len(A) == 0:
        return 0
    else:
        for i in range(1, len(A)):
            if A[i] != A[i-1]:
                rtn += 1

        return rtn
Tags: Technology,Algorithms,Python

Monday, January 15, 2024

Min Avg Two Slice (Problem on Prefix Sums With Medium Complexity)

Min Avg Two Slice: Find the minimal average of any slice containing at least two elements. (Complexity: Medium)

Problem

A non-empty array A consisting of N integers is given. A pair of integers (P, Q), such that 0 ≤ P < Q < N, is called a slice of array A (notice that the slice contains at least two elements). The average of a slice (P, Q) is the sum of A[P] + A[P + 1] + ... + A[Q] divided by the length of the slice. To be precise, the average equals (A[P] + A[P + 1] + ... + A[Q]) / (Q - P + 1).

The goal is to find the starting position of a slice whose average is minimal.

Write a function:

def solution(A)

that, given a non-empty array A consisting of N integers, returns the starting position of the slice with the minimal average. If there is more than one slice with a minimal average, you should return the smallest starting position of such a slice.

Write an efficient algorithm for the following assumptions:

N is an integer within the range [2..100,000];

each element of array A is an integer within the range [-10,000..10,000].

Example

For example, array A such that:

A[0] = 4

A[1] = 2

A[2] = 2

A[3] = 5

A[4] = 1

A[5] = 5

A[6] = 8

contains the following example slices:

slice (1, 2), whose average is (2 + 2) / 2 = 2;

slice (3, 4), whose average is (5 + 1) / 2 = 3;

slice (1, 4), whose average is (2 + 2 + 5 + 1) / 4 = 2.5.

So, the function should return the index of 1 in this case.

Hint (1)

To solve this problem efficiently, you can use the fact that the minimum average slice must be of length 2 or 3. Any larger slice can be split into smaller slices, and if the larger slice has a lower average, one of its smaller slices will also have a lower average.

Code (1)

def solution(A):
    N = len(A)

    min_avg = (A[0] + A[1]) / 2.0
    min_start = 0

    for i in range(2, N):
        # Calculate the average of a slice of length 2
        avg_2 = (A[i - 1] + A[i]) / 2.0
        if avg_2 < min_avg:
            min_avg = avg_2
            min_start = i - 1

        # Calculate the average of a slice of length 3
        avg_3 = (A[i - 2] + A[i - 1] + A[i]) / 3.0
        if avg_3 < min_avg:
            min_avg = avg_3
            min_start = i - 2

    return min_start

A = [4, 2, 2, 5, 1, 5, 8] # Example run
result = solution(A)
print(result)

This algorithm has a time complexity of O(N) because it iterates through the array only once. The key observation is that the minimal average slice of length 2 or 3 will cover the minimal average slice of any larger length.

Can we redo this code to use the concept of prefix sums?

Certainly! You can use the concept of prefix sums to optimize the calculation of slice sums. The idea is to precompute the prefix sums of the array, and then use them to calculate slice sums in constant time. Here's the modified code:

Code (2)

def solution(A):
    N = len(A)

    # Calculate prefix sums
    prefix_sums = [0] * (N + 1)
    for i in range(1, N + 1):
        prefix_sums[i] = prefix_sums[i - 1] + A[i - 1]

    # Initialize the starting position to 0 and the minimum average to the average of the first two elements
    min_avg = (A[0] + A[1]) / 2.0
    min_start = 0

    for i in range(2, N):
        # Calculate the average of a slice of length 2 using prefix sums
        avg_2 = (prefix_sums[i] - prefix_sums[i - 2]) / 2.0
        if avg_2 < min_avg:
            min_avg = avg_2
            min_start = i - 1

        # Calculate the average of a slice of length 3 using prefix sums
        avg_3 = (prefix_sums[i] - prefix_sums[i - 3]) / 3.0
        if avg_3 < min_avg:
            min_avg = avg_3
            min_start = i - 2

    return min_start

Line of reasoning

Can you explain this reasoning:

To solve this problem efficiently, you can use the fact that the minimum average slice must be of length 2 or 3. Any larger slice can be split into smaller slices, and if the larger slice has a lower average, one of its smaller slices will also have a lower average.

The reasoning behind this observation is based on the idea that if you have a slice of length greater than 3 with a lower average, you can always find a sub-slice within it that has an even lower average. Let's break it down:

Slice of length 2:
    There are only two elements, so the average is simply the sum of those two elements divided by 2.

Slice of length 3:
    If you have a slice of length 3, there are only two sub-slices within it: one of length 2 and another of length 1.
    If the average of the slice of length 3 is lower than the average of the slice of length 2, then the average of the slice of length 1 will be even lower.

Slice of length greater than 3:
    For a slice of length greater than 3, you can always find a sub-slice of length 2 or 3 within it.
    If the average of the larger slice is lower, then the average of one of its sub-slices (either of length 2 or 3) will be even lower.
    Therefore, it is sufficient to consider only slices of length 2 and 3 when searching for the minimum average slice.

This observation simplifies the problem, as you don't need to consider all possible slice lengths. By focusing on slices of length 2 and 3, you can reduce the computational complexity of the algorithm. 

Detected time complexity:O(N)

Tests

Correctness tests

double_quadruple

two or four elements

simple1

simple test, the best slice has length 3

simple2

simple test, the best slice has length 3

small_random

random, length = 100

medium_range

increasing, decreasing (legth = ~100) and small functional

Performance Tests

medium_random

random, N = ~700

large_ones

numbers from -1 to 1, N = ~100,000

large_random

random, N = ~100,000

extreme_values

all maximal values, N = ~100,000

large_sequence

many sequences, N = ~100,000

Tuesday, January 9, 2024

Genomic Range Query: Find the minimal nucleotide from a range of sequence DNA (Medium Complexity)

Problem

A DNA sequence can be represented as a string consisting of the letters A, C, G and T, which correspond to the types of successive nucleotides in the sequence. Each nucleotide has an impact factor, which is an integer. Nucleotides of types A, C, G and T have impact factors of 1, 2, 3 and 4, respectively. You are going to answer several queries of the form: What is the minimal impact factor of nucleotides contained in a particular part of the given DNA sequence?

The DNA sequence is given as a non-empty string S = S[0]S[1]...S[N-1] consisting of N characters. There are M queries, which are given in non-empty arrays P and Q, each consisting of M integers. The K-th query (0 ≤ K < M) requires you to find the minimal impact factor of nucleotides contained in the DNA sequence between positions P[K] and Q[K] (inclusive).

Write a function:

def solution(S, P, Q)

that, given a non-empty string S consisting of N characters and two non-empty arrays P and Q consisting of M integers, returns an array consisting of M integers specifying the consecutive answers to all queries.

Result array should be returned as an array of integers.

Write an efficient algorithm for the following assumptions:

N is an integer within the range [1..100,000];

M is an integer within the range [1..50,000];

each element of arrays P and Q is an integer within the range [0..N - 1];

P[K] ≤ Q[K], where 0 ≤ K < M;

string S consists only of upper-case English letters A, C, G, T.

Example

For example, consider string S = CAGCCTA and arrays P, Q such that:

P[0] = 2 Q[0] = 4

P[1] = 5 Q[1] = 5

P[2] = 0 Q[2] = 6

The answers to these M = 3 queries are as follows:

The part of the DNA between positions 2 and 4 contains nucleotides G and C (twice), whose impact factors are 3 and 2 respectively, so the answer is 2.

The part between positions 5 and 5 contains a single nucleotide T, whose impact factor is 4, so the answer is 4.

The part between positions 0 and 6 (the whole string) contains all nucleotides, in particular nucleotide A whose impact factor is 1, so the answer is 1.

The function should return the values [2, 4, 1], as explained above.

Brute Force Solution

def solution(S, P, Q):
    impact_factors = {
        'A': 1,
        'C': 2,
        'G': 3,
        'T': 4
    }

    X = [impact_factors[i] for i in S]

    # Implement your solution here
    rtn = []
    for i in range(len(P)):
        start = P[i]
        end = Q[i] + 1

        s = X[start:end]

        rtn.append(min(s))

    return rtn 

Detected time complexity:

O(N * M)

Smart Solution Using Prefix Sums (Part 1)

def solution(S, P, Q):
    N = len(S)
    M = len(P)

    impact_factors = {'A': 1, 'C': 2, 'G': 3, 'T': 4}

    # Calculate prefix sums for each nucleotide type
    prefix_sums = [[0] * (N + 1) for _ in range(4)]
    for i in range(1, N + 1):
        for j in range(4):
            prefix_sums[j][i] = prefix_sums[j][i - 1]
        prefix_sums[impact_factors[S[i - 1]] - 1][i] += 1
        print(i, S[i-1])
        print(prefix_sums)

    print(':1 ~ ~ ~ ~ ~ ~ ~ ~ ~')
    S = 'CAGCCTA'
    P = [2, 5, 0]
    Q = [4, 5, 6]
    print(solution(S, P, Q))  # Output: [2, 4, 1]

    print(':2 ~ ~ ~ ~ ~ ~ ~ ~ ~')
    print(solution('AGCT', [], []))

    print(':3 ~ ~ ~ ~ ~ ~ ~ ~ ~')
    print(solution('AACT', [], []))

This is a two part solution.

In the first part, we build prefix sum sequences for each nucleotide to track it’s occurrence.

Output (Part 1)

S = 'CAGCCTA'

:1 ~ ~ ~ ~ ~ ~ ~ ~ ~

1 C

[[0, 0, 0, 0, 0, 0, 0, 0], [0, 1, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0]]

2 A

[[0, 0, 1, 0, 0, 0, 0, 0], [0, 1, 1, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0]]

3 G

[[0, 0, 1, 1, 0, 0, 0, 0], [0, 1, 1, 1, 0, 0, 0, 0], [0, 0, 0, 1, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0]]

4 C

[[0, 0, 1, 1, 1, 0, 0, 0], [0, 1, 1, 1, 2, 0, 0, 0], [0, 0, 0, 1, 1, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0]]

5 C

[[0, 0, 1, 1, 1, 1, 0, 0], [0, 1, 1, 1, 2, 3, 0, 0], [0, 0, 0, 1, 1, 1, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0]]

6 T

[[0, 0, 1, 1, 1, 1, 1, 0], [0, 1, 1, 1, 2, 3, 3, 0], [0, 0, 0, 1, 1, 1, 1, 0], [0, 0, 0, 0, 0, 0, 1, 0]]

7 A

[[0, 0, 1, 1, 1, 1, 1, 2], [0, 1, 1, 1, 2, 3, 3, 3], [0, 0, 0, 1, 1, 1, 1, 1], [0, 0, 0, 0, 0, 0, 1, 1]]

Output (Part 2)

For 'AGCT'

:2 ~ ~ ~ ~ ~ ~ ~ ~ ~

1 A

[[0, 1, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0]]

2 G

[[0, 1, 1, 0, 0], [0, 0, 0, 0, 0], [0, 0, 1, 0, 0], [0, 0, 0, 0, 0]]

3 C

[[0, 1, 1, 1, 0], [0, 0, 0, 1, 0], [0, 0, 1, 1, 0], [0, 0, 0, 0, 0]]

4 T

[[0, 1, 1, 1, 1], [0, 0, 0, 1, 1], [0, 0, 1, 1, 1], [0, 0, 0, 0, 1]]

For 'AACT'

:3 ~ ~ ~ ~ ~ ~ ~ ~ ~

1 A

[[0, 1, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0]]

2 A

[[0, 1, 2, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0]]

3 C

[[0, 1, 2, 2, 0], [0, 0, 0, 1, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0]]

4 T

[[0, 1, 2, 2, 2], [0, 0, 0, 1, 1], [0, 0, 0, 0, 0], [0, 0, 0, 0, 1]]

Smart Solution Using Prefix Sums (Part 2)

def solution(S, P, Q):
    N = len(S)
    M = len(P)

    impact_factors = {'A': 1, 'C': 2, 'G': 3, 'T': 4}

    # Calculate prefix sums for each nucleotide type
    prefix_sums = [[0] * (N + 1) for _ in range(4)]
    for i in range(1, N + 1):
        for j in range(4):
            prefix_sums[j][i] = prefix_sums[j][i - 1]
        prefix_sums[impact_factors[S[i - 1]] - 1][i] += 1

    # Process queries
    result = []
    for k in range(M):
        start_position = P[k]
        end_position = Q[k] + 1

        for nucleotide_type in range(4):
            if prefix_sums[nucleotide_type][end_position] - prefix_sums[nucleotide_type][start_position] > 0:
                result.append(nucleotide_type + 1)
                break

    return result

    # Example usage:
    S = 'CAGCCTA'
    P = [2, 5, 0]
    Q = [4, 5, 6]
    print(solution(S, P, Q))  # Output: [2, 4, 1]

In the second part of the solution we determine from start to end if a low factor nucleotide is present or not.

Here, let seq_X be sequence from as in prefix_sums[nucleotide_type]

It takes the form:

seq_A: [0, 0, 1, 1, 1, 1, 1, 2],

seq_C: [0, 1, 1, 1, 2, 3, 3, 3],

seq_G: [0, 0, 0, 1, 1, 1, 1, 1],

seq_T: [0, 0, 0, 0, 0, 0, 1, 1]

For input sequence: CAGCCTA

Then from any of this seq, we determine whether if A is appearing between start and end using seq_A[end] - seq_A[start]

Or if C is appearing between start and end using:

seq_C[end] – seq_C[start]

Or if G is appearing between start and end using:

seq_G[end] – seq_G[start]

Or if T is appearing between start and end using:

seq_T[end] – seq_T[start]

If a nucleotide is found in this second part of the code, we record this nucleotide and break out.

Time Complexity

Detected time complexity:

O(N + M)

Tests

Correctness tests

extreme_sinlge

single character string

extreme_double

double character string

simple

simple tests

small_length_string

small length simple string

small_random

small random string, length = ~300

Performance tests

almost_all_same_letters

GGGGGG..??..GGGGGG..??..GGGGGG

large_random

large random string, length

extreme_large

all max ranges

Explanation of Code Using Bard

I'd be glad to explain the code you provided:

Here's a breakdown of the code's key steps:

  1. Initialization:

    • N stores the length of the DNA sequence S.
    • M stores the number of queries (the length of arrays P and Q).
    • impact_factors maps each nucleotide to its corresponding impact factor.
  2. Prefix Sums Calculation:

    • prefix_sums is a 2D array with 4 rows (one for each nucleotide type) and N+1 columns. It's used to store the counts of each nucleotide type up to each position in the DNA sequence.
    • The nested loops iterate through the DNA sequence and update prefix_sums accordingly:
      • The first loop iterates through each position in the sequence.
      • The second loop iterates through each nucleotide type.
      • For each position and nucleotide type, the code copies the count from the previous position and then increments the count if the current nucleotide matches the type.
  3. Query Processing:

    • result is an empty list to store the answers to the queries.
    • The outer loop iterates through each query, accessing its start and end positions from P and Q.
    • The inner loop iterates through each nucleotide type.
    • It checks if the count of the current nucleotide type within the query range (end_position - start_position) is greater than 0. If so, it means that nucleotide type is present within the range, and its impact factor is added to the result list. The loop then breaks because we've found the minimum impact factor for that query.
  4. Returning Results:

    • The function returns the result list containing the answers to all queries.

Example Usage:

  • The provided example demonstrates how to use the function with a sample DNA sequence S and query arrays P and Q.
  • The output [2, 4, 1] correctly represents the minimum impact factors for the specified query ranges in S.

Key Points:

  • The code efficiently calculates prefix sums to count the occurrences of each nucleotide type up to each position.
  • This precomputation allows for quick calculation of minimum impact factors within any query range, using simple subtractions of prefix sums.
  • The time complexity of the code is O(N + M), making it efficient for large DNA sequences and multiple queries.