Showing posts with label Algorithms. Show all posts
Showing posts with label Algorithms. Show all posts

Monday, April 15, 2024

Maximum Slice Problem (using Kadane's Algo and Dynamic Programming)

Maximum slice problem

Let’s define a problem relating to maximum slices. You are given a sequence of n integers a0 , a1 , . . . , an−1 and the task is to find the slice with the largest sum. More precisely, we are looking for two indices p, q such that the total ap + ap+1 + . . . + aq is maximal. We assume that the slice can be empty and its sum equals 0.
In the picture, the slice with the largest sum is highlighted in gray. The sum of this slice equals 10 and there is no slice with a larger sum. Notice that the slice we are looking for may contain negative integers, as shown above.

In other words...

A non-empty array A consisting of N integers is given. A pair of integers (P, Q), such that 0 ≤ P ≤ Q < N, is called a slice of array A. The sum of a slice (P, Q) is the total of A[P] + A[P+1] + ... + A[Q].

Write a function:

    def solution(A)

that, given an array A consisting of N integers, returns the maximum sum of any slice of A.

For example, given array A such that:
A[0] = 3  A[1] = 2  A[2] = -6
A[3] = 4  A[4] = 0

the function should return 5 because:

        (3, 4) is a slice of A that has sum 4,
        (2, 2) is a slice of A that has sum −6,
        (0, 1) is a slice of A that has sum 5,
        no other slice of A has sum greater than (0, 1).

Write an efficient algorithm for the following assumptions:

        N is an integer within the range [1..1,000,000];
        each element of array A is an integer within the range [−1,000,000..1,000,000];
        the result will be an integer within the range [−2,147,483,648..2,147,483,647].

Solution

Please solve this coding puzzle

The image you sent describes the Maximum Slice Problem. It is a coding challenge where you’re given an array of integers and you need to find the subarray (slice) with the largest sum. The slice can be empty, in which case the sum is zero.

Here’s how to solve this problem:

  1. Kadane’s Algorithm: There is an efficient algorithm to solve the maximum slice problem called Kadane’s Algorithm. It uses dynamic programming to find the maximum sum subarray in linear time (O(n)).

The idea is to keep track of two variables:

  • current_sum: This variable stores the sum of the current subarray.
  • max_sum: This variable stores the maximum sum of any subarray seen so far.

We iterate through the array and at each element, we do the following: * If the current_sum is less than zero, we restart the current subarray by setting it to the current element. * Otherwise, we add the current element to the current_sum. * We then compare the current_sum with the max_sum and update the max_sum if the current_sum is greater.

  1. Implementation: Here’s how you can implement Kadane’s Algorithm in Python:
Python
def max_slice_sum(arr):
      """
      Finds the maximum sum of a contiguous subarray in a given array.
    
      Args:
        arr: A list of integers.
    
      Returns:
        The maximum sum of a contiguous subarray in the array.
      """
      current_sum = max_sum = arr[0]
      for num in arr[1:]:
        current_sum = max(num, current_sum + num)
        max_sum = max(max_sum, current_sum)
      return max_sum
    

This function iterates over the array ‘arr’ and calculates the maximum sum of a slice.

You can call the function like this:

arr = [5, -7, 8, -4, 1]
    print(max_slice_sum(arr))  # Output: 8
    

In this example, the maximum sum of a slice is 8 (8).

I hope this helps!

Can you explain this code to a layman?

Imagine you're walking a trail and collecting rocks. Each rock has a weight, positive if it's nice and heavy, negative if it's a drag. You want to find the biggest haul you can carry, but you can't go back and drop rocks you've already picked up.

This code is like figuring out the heaviest load you can carry on that trail.

Here's how it works:

  1. Two piles:

    • Current pile: This is the pile of rocks you're carrying right now. It starts with just the first rock on the trail (the first number in the list).
    • Best pile ever: This keeps track of the heaviest pile you've found so far on the trail.
  2. Walking the trail:

    • You go through each rock on the trail (except the first one, since that's already in your current pile).
    • At each rock, you check your current pile:
      • If the current pile is too heavy (negative weight), you ditch it and start fresh with just this new rock.
      • If the current pile is okay (positive or zero weight), you add this new rock to it.
    • No matter what you did with the current pile, you then compare its weight to the "best pile ever." If the current pile is heavier, then the current pile becomes the new "best pile ever."
  3. The answer:

    • Once you've finished going through all the rocks, the "best pile ever" holds the heaviest load you could have carried.

The code uses fancy names for these piles ("current_sum" and "max_sum"), but that's the basic idea. It keeps track of the best possible haul you can make as you walk the trail (process the list).

Please listen carefully. 
What is the difference between following two functions:

def golden_max_slice(A):
    max_ending = max_slice = 0
    for a in A:
        max_ending = max(0, max_ending + a)
        max_slice = max(max_slice, max_ending)
    return max_slice

def max_slice_sum(arr):
    """
    Finds the maximum sum of a contiguous subarray in a given array.

    Args:
    arr: A list of integers.

    Returns:
    The maximum sum of a contiguous subarray in the array.
    """
    current_sum = max_sum = arr[0]
    for num in arr[1:]:
    current_sum = max(num, current_sum + num)
    max_sum = max(max_sum, current_sum)
    return max_sum

The two functions you provided, golden_max_slice and max_slice_sum, both solve the same problem: finding the maximum sum of a contiguous subarray in a given list of integers. However, they achieve this in slightly different ways:

1. Variable Naming:

  • max_slice_sum: This function uses more descriptive names.
    • current_sum: Clearly indicates the current sum of the subarray being considered.
    • max_sum: Represents the maximum sum found so far.
  • golden_max_slice: Uses shorter names.
    • max_ending: Less descriptive but conveys the idea of the maximum sum ending at the current element.
    • max_slice: Same as in max_slice_sum.

2. Handling Negative Sums:

  • max_slice_sum: This function explicitly restarts the current_sum to be the current element if it becomes negative. This ensures that the current subarray only considers elements that contribute positively to the sum.
  • golden_max_slice: This function implicitly handles negative sums. It only adds the current element to max_ending if the resulting sum would be positive. If the sum becomes negative, max_ending effectively restarts at 0 (since max(0, a + b) will be 0 for negative values of a).

In essence:

  • Both functions implement Kadane's Algorithm.
  • max_slice_sum might be easier to understand for beginners due to its clear variable names and explicit handling of negative sums.
  • golden_max_slice is a more concise approach that achieves the same result.

Efficiency:

Both functions have a time complexity of O(n), which is optimal for this problem as they iterate through the list once.

Tags: Technology,Algorithms,Python,

Thursday, April 4, 2024

EquiLeader (A problem in the concept of Leader in DSA using Boyer-Moore Majority Vote Algo)

EquiLeader: Find the index S such that the leaders of the sequences A[0], A[1], ..., A[S] and A[S + 1], A[S + 2], ..., A[N - 1] are the same.

A non-empty array A consisting of N integers is given. The leader of this array is the value that occurs in more than half of the elements of A. An equi leader is an index S such that 0 ≤ S < N - 1 and two sequences A[0], A[1], ..., A[S] and A[S + 1], A[S + 2], ..., A[N - 1] have leaders of the same value. For example, given array A such that: A[0] = 4 A[1] = 3 A[2] = 4 A[3] = 4 A[4] = 4 A[5] = 2 we can find two equi leaders: 0, because sequences: (4) and (3, 4, 4, 4, 2) have the same leader, whose value is 4. 2, because sequences: (4, 3, 4) and (4, 4, 2) have the same leader, whose value is 4. The goal is to count the number of equi leaders. Write a function: def solution(A) that, given a non-empty array A consisting of N integers, returns the number of equi leaders. For example, given: A[0] = 4 A[1] = 3 A[2] = 4 A[3] = 4 A[4] = 4 A[5] = 2 the function should return 2, as explained above. Write an efficient algorithm for the following assumptions: N is an integer within the range [1..100,000]; each element of array A is an integer within the range [-1,000,000,000..1,000,000,000].

Taking help from ChatGPT

Our prompt looks like this: You are a programming guide. Here is my problem... ...Please guide how to solve this problem step by step.

To solve this problem efficiently, we can use a two-step approach:

  1. Find the dominator (leader) of the array.
  2. Count the number of equi leaders.

Here's a step-by-step guide on how to approach this problem:

Step 1: Find the Dominator

  1. Initialize two variables, candidate and count, where candidate will store the current candidate for the dominator and count will store the count of occurrences of the current candidate.

  2. Iterate through the array A:

    • If count is 0, set the current element as the candidate and set count to 1.
    • If the current element is equal to the candidate, increment count by 1.
    • If the current element is not equal to the candidate, decrement count by 1.
  3. After iterating through the array, candidate will hold a potential dominator candidate. We need to verify if it's indeed the dominator by counting its occurrences in the array.

  4. Count occurrences of candidate.

    • If the count of occurrences of candidate is not more than half the length of the array, there is no dominator.

Step 2: Count the Equi Leaders

  1. Initialize leader_count to count the total occurrences of the dominator.
  2. Initialize equi_leader_count to count the number of equi leaders.
  3. Iterate through the array again:
    • If the current element equals the dominator, increment leader_count.
    • Calculate the remaining count of the dominator from the end of the array.
    • If the count of the dominator up to the current index is more than half of the elements seen so far, and the count of the dominator from the current index to the end is more than half of the remaining elements, increment equi_leader_count.
def solution(A): candidate = None count = 0 # Find candidate for dominator for num in A: if count == 0: candidate = num count = 1 elif num == candidate: count += 1 else: count -= 1 # Count occurrences of candidate count_candidate = A.count(candidate) # If candidate is not a dominator, return 0 if count_candidate <= len(A) // 2: return 0 leader_count = 0 equi_leader_count = 0 for i, num in enumerate(A): if num == candidate: leader_count += 1 # Check if the leader appears more than half of the elements up to the current index if leader_count > (i + 1) // 2: # Check if the leader appears more than half of the remaining elements remaining_count = count_candidate - leader_count if remaining_count > (len(A) - i - 1) // 2: equi_leader_count += 1 return equi_leader_count # Example usage A = [4, 3, 4, 4, 4, 2] print(solution(A)) # Output should be 2

What is the core idea behind this algo?

The core idea behind this algorithm is to first find the dominator (leader) of the array efficiently, and then utilize the properties of the dominator to count the number of equi leaders.

  1. Finding the Dominator:

    • The algorithm efficiently finds the dominator using the Boyer-Moore Majority Vote algorithm. This algorithm allows us to find a majority element (dominator) in linear time complexity with constant space complexity.
    • By iterating through the array, the algorithm identifies a potential dominator candidate and verifies whether it's indeed the dominator by counting its occurrences in the array. If the count of occurrences is more than half of the length of the array, it confirms the presence of a dominator.
  2. Counting the Equi Leaders:

    • Once the dominator is found, the algorithm iterates through the array again to count both the total occurrences of the dominator and the number of equi leaders.
    • It keeps track of the leader count as it iterates through the array. For each index, it checks if the count of the dominator up to that index is more than half of the elements seen so far. If it is, it also checks if the count of the dominator from the current index to the end is more than half of the remaining elements. If both conditions are met, it increments the equi leader count.
    • This approach leverages the fact that an equi leader can only occur if the dominator occurs in both subsequences with counts greater than half of the respective lengths. Therefore, by efficiently tracking these counts, the algorithm can accurately count the equi leaders.

By following this two-step approach, the algorithm effectively solves the problem of finding the number of equi leaders in an array, utilizing the properties of the dominator to optimize the solution.

Is it correct the equi-leaders and dominator have the same face value?

Yes, you are correct. In the context of this problem, the equi-leaders and the dominator have the same face value. In the problem statement, an equi-leader is defined as an index S such that two sequences separated at index S have leaders of the same value. The leader of a sequence is defined as the value that occurs in more than half of the elements of that sequence. Since the equi-leader is a subset of the original array, and it's defined based on having the same leader value on both sides of the split, it follows that the dominator of the equi-leader subsequence is the same as the dominator of the original array. Therefore, in the context of this problem, the equi-leaders and the dominator indeed have the same face value.
Tags: Algorithms,Technology,

Dominator (a problem on the concept of Leader in Algorithms using Boyer-Moore Majority Vote Algo)

Dominator: Find an index of an array such that its value occurs at more than half of indices in the array.

An array A consisting of N integers is given. The dominator of array A is the value that occurs in more than half of the elements of A. For example, consider array A such that A[0] = 3 A[1] = 4 A[2] = 3 A[3] = 2 A[4] = 3 A[5] = -1 A[6] = 3 A[7] = 3 The dominator of A is 3 because it occurs in 5 out of 8 elements of A (namely in those with indices 0, 2, 4, 6 and 7) and 5 is more than a half of 8. Write a function def solution(A) that, given an array A consisting of N integers, returns index of any element of array A in which the dominator of A occurs. The function should return -1 if array A does not have a dominator. For example, given array A such that A[0] = 3 A[1] = 4 A[2] = 3 A[3] = 2 A[4] = 3 A[5] = -1 A[6] = 3 A[7] = 3 the function may return 0, 2, 4, 6 or 7, as explained above. Write an efficient algorithm for the following assumptions: N is an integer within the range [0..100,000]; each element of array A is an integer within the range [−2,147,483,648..2,147,483,647].

Taking help from ChatGPT...

Our prompt looks like this: You are a programming guide. An array A consisting of N integers is given... ... within the range [-2,147,483,648..2,147,483,647]. Please guide how to solve this problem step by step.

Algo

To solve this problem, we can use the Boyer-Moore Majority Vote algorithm. The algorithm allows us to find a majority element (dominator in this case) in linear time complexity with constant space complexity. Here's how we can solve it step by step: 1. Initialize two variables, candidate and count, where candidate will store the current candidate for the dominator and count will store the count of occurrences of the current candidate. 2. Iterate through the array A: If count is 0, set the current element as the candidate and set count to 1. If the current element is equal to the candidate, increment count by 1. If the current element is not equal to the candidate, decrement count by 1. 3. After iterating through the array, candidate will hold a potential dominator candidate. We need to verify if it's indeed the dominator by counting its occurrences in the array. 4. Iterate through the array again to count the occurrences of the candidate. If the count of occurrences of candidate is greater than half the length of the array, return the index of any occurrence of candidate. If no dominator is found, return -1. Let's implement this in code:

Code

def solution(A): candidate = None count = 0 # Find candidate for dominator for num in A: if count == 0: candidate = num count = 1 elif num == candidate: count += 1 else: count -= 1 # Count occurrences of candidate count_candidate = A.count(candidate) # Check if candidate is the dominator if count_candidate > len(A) // 2: return A.index(candidate) else: return -1 # Example usage A = [3, 4, 3, 2, 3, -1, 3, 3] print(solution(A)) # Output should be 0, 2, 4, 6, or 7
Tags: Algorithms,Technology,

Manhattan stonewall problem on the concept of Stacks

StoneWall: Cover "Manhattan skyline" using the minimum number of rectangles.

You are going to build a stone wall. The wall should be straight and N meters long, and its thickness should be constant; however, it should have different heights in different places. The height of the wall is specified by an array H of N positive integers. H[I] is the height of the wall from I to I+1 meters to the right of its left end. In particular, H[0] is the height of the wall's left end and H[N−1] is the height of the wall's right end. The wall should be built of cuboid stone blocks (that is, all sides of such blocks are rectangular). Your task is to compute the minimum number of blocks needed to build the wall. Write a function: def solution(H) that, given an array H of N positive integers specifying the height of the wall, returns the minimum number of blocks needed to build it. For example, given array H containing N = 9 integers: H[0] = 8 H[1] = 8 H[2] = 5 H[3] = 7 H[4] = 9 H[5] = 8 H[6] = 7 H[7] = 4 H[8] = 8 the function should return 7. The figure shows one possible arrangement of seven blocks.
Write an efficient algorithm for the following assumptions: N is an integer within the range [1..100,000]; each element of array H is an integer within the range [1..1,000,000,000].

Taking help from ChatGPT...

Our prompt looks like: You are a programming guide. You are going to build a stone wall... ... the function should return 7. Please guide how to solve this problem step by step.

Algo

To solve this problem, we can use a stack-based approach. The idea is to traverse through the array of heights and maintain a stack that keeps track of the heights encountered so far. We will also keep track of the number of blocks used. Here's how we can solve it step by step: 1. Initialize an empty stack and a variable to count the number of blocks used. 2. Iterate through the array of heights. 3. For each height, check whether it is greater than the top of the stack. If it is, push the height onto the stack. 4. If the height is less than the top of the stack, pop heights from the stack until the top of the stack is less than or equal to the current height. 5. For each height popped from the stack, increment the block count by 1, as each height represents a block. 6. If after popping from the stack, the stack is empty or the current height is greater than the top of the stack, push the current height onto the stack. 7. After iterating through all heights, return the block count. Let's implement this in code:

Code

def solution(H): stack = [] # Stack to keep track of heights blocks_used = 0 # Counter for blocks used for height in H: # Pop heights from stack until top is less than or equal to current height while stack and stack[-1] > height: stack.pop() blocks_used += 1 # If stack is empty or current height is greater than top of stack, push height if not stack or height > stack[-1]: stack.append(height) # Add remaining heights in stack to block count blocks_used += len(stack) return blocks_used # Example usage H = [8, 8, 5, 7, 9, 8, 7, 4, 8] print(solution(H)) # Output should be 7 This algorithm has a time complexity of O(N), where N is the number of elements in the array H, as it iterates through the array once.
Tags: Algorithms,Technology,

Friday, March 8, 2024

Voracious Fish (A problem on the concept of Stacks)

Fish: N voracious fish are moving along a river. Calculate how many fish are alive.

You are given two non-empty arrays A and B consisting of N integers. Arrays A and B represent N voracious fish in a river, ordered downstream along the flow of the river.

The fish are numbered from 0 to N − 1. If P and Q are two fish and P < Q, then fish P is initially upstream of fish Q. Initially, each fish has a unique position.

Fish number P is represented by A[P] and B[P]. Array A contains the sizes of the fish. All its elements are unique. Array B contains the directions of the fish. It contains only 0s and/or 1s, where:

0 represents a fish flowing upstream,
1 represents a fish flowing downstream.
If two fish move in opposite directions and there are no other (living) fish between them, they will eventually meet each other. Then only one fish can stay alive − the larger fish eats the smaller one. More precisely, we say that two fish P and Q meet each other when P < Q, B[P] = 1 and B[Q] = 0, and there are no living fish between them. After they meet:

If A[P] > A[Q] then P eats Q, and P will still be flowing downstream,
If A[Q] > A[P] then Q eats P, and Q will still be flowing upstream.
We assume that all the fish are flowing at the same speed. That is, fish moving in the same direction never meet. The goal is to calculate the number of fish that will stay alive.

For example, consider arrays A and B such that:

    A[0] = 4    B[0] = 0
    A[1] = 3    B[1] = 1
    A[2] = 2    B[2] = 0
    A[3] = 1    B[3] = 0
    A[4] = 5    B[4] = 0
Initially all the fish are alive and all except fish number 1 are moving upstream. Fish number 1 meets fish number 2 and eats it, then it meets fish number 3 and eats it too. Finally, it meets fish number 4 and is eaten by it. The remaining two fish, number 0 and 4, never meet and therefore stay alive.

Write a function:

def solution(A, B)

that, given two non-empty arrays A and B consisting of N integers, returns the number of fish that will stay alive.

For example, given the arrays shown above, the function should return 2, as explained above.

Write an efficient algorithm for the following assumptions:

N is an integer within the range [1..100,000];
each element of array A is an integer within the range [0..1,000,000,000];
each element of array B is an integer that can have one of the following values: 0, 1;
the elements of A are all distinct.

 
class Fish():
    def __init__(self, size, direction):
        self.size = size
        self.direction = direction

def solution(A, B):

    stack = []
    survivors = 0

    for i in range(len(A)):

        if B[i] == 1:
            stack.append(A[i])
            
        else:
            weightdown = stack.pop() if stack else -1
            while weightdown != -1 and weightdown < A[i]: 
                weightdown = stack.pop() if stack else -1             

            if weightdown == -1:
                survivors += 1
            else:
                stack.append(weightdown)

    return survivors + len(stack)


Correctness tests

▶ extreme_small
1 or 2 fishes

▶ simple1
simple test

▶ simple2
simple test

▶ small_random
small random test, N = ~100

Performance tests

▶ medium_random
small medium test, N = ~5,000

▶ large_random
large random test, N = ~100,000

▶ extreme_range1
all except one fish flowing in the same direction

▶ extreme_range2
all fish flowing in the same direction

Tuesday, March 5, 2024

Brackets (A problem related to Stacks)

Problem

Brackets: Determine whether a given string of parentheses (multiple types) is properly nested. A string S consisting of N characters is considered to be properly nested if any of the following conditions is true: S is empty; S has the form "(U)" or "[U]" or "{U}" where U is a properly nested string; S has the form "VW" where V and W are properly nested strings. For example, the string "{[()()]}" is properly nested but "([)()]" is not. Write a function: class Solution { public int solution(String S); } that, given a string S consisting of N characters, returns 1 if S is properly nested and 0 otherwise. For example, given S = "{[()()]}", the function should return 1 and given S = "([)()]", the function should return 0, as explained above. Write an efficient algorithm for the following assumptions: N is an integer within the range [0..200,000]; string S is made only of the following characters: '(', '{', '[', ']', '}' and/or ')'.

Solution

Task Score: 87% Correctness: 100% Performance: 80% class Node(): def __init__(self, x): self.x = x self.next = None class Stack(): # head is default NULL def __init__(self): self.head = None # Checks if stack is empty def isempty(self): if self.head == None: return True else: return False def push(self, x): if self.head == None: self.head = Node(x) else: newnode = Node(x) newnode.next = self.head self.head = newnode def pop(self): if self.head == None: return None else: popped_node = self.head self.head = self.head.next popped_node.next = None return popped_node.x # Returns the head node data def peek(self): if self.isempty(): return None else: return self.head.x def solution(S): s = Stack() for i in range(len(S)): if S[i] in ['(', '{', '[']: s.push(S[i]) elif (S[i] == ')' and s.peek() == '(') or (S[i] == '}' and s.peek() == '{') or (S[i] == ']' and s.peek() == '['): s.pop() if s.isempty(): return 1 else: return 0

Number of disc intersections (Problem in sorting and searching)

Problem

Number Of Disc Intersections: Compute the number of intersections in a sequence of discs. We draw N discs on a plane. The discs are numbered from 0 to N − 1. An array A of N non-negative integers, specifying the radiuses of the discs, is given. The J-th disc is drawn with its center at (J, 0) and radius A[J]. We say that the J-th disc and K-th disc intersect if J ≠ K and the J-th and K-th discs have at least one common point (assuming that the discs contain their borders). The figure below shows discs drawn for N = 6 and A as follows: A[0] = 1 A[1] = 5 A[2] = 2 A[3] = 1 A[4] = 4 A[5] = 0 There are eleven (unordered) pairs of discs that intersect, namely: discs 1 and 4 intersect, and both intersect with all the other discs; disc 2 also intersects with discs 0 and 3. Write a function in Java with following signature: class Solution { public int solution(int[] A); } And in Python with following signature: def solution(A): that, given an array A describing N discs as explained above, returns the number of (unordered) pairs of intersecting discs. The function should return −1 if the number of intersecting pairs exceeds 10,000,000. Given array A shown above, the function should return 11, as explained above. Write an efficient algorithm for the following assumptions: N is an integer within the range [0..100,000]; each element of array A is an integer within the range [0..2,147,483,647].

Solution

WITHOUT BINARY SEARCH

Task Score: 62% Correctness: 100% Performance: 25% ~~~ def solution(A): # Implement your solution here # Get the start and end indices of the disks and sort them based on the start indices start_and_end = [] for i in range(len(A)): temp = { 'start': i - A[i], 'end': i + A[i] } start_and_end.append(temp) start_and_end.sort(key = lambda d: d['start']) # Search for the starting index bigger than the current end index intersections = [] for i in range(len(start_and_end)): end = start_and_end[i]['end'] cnt_intersections = 0 for j in range(i+1, len(start_and_end)): if start_and_end[j]['start'] <= end: cnt_intersections += 1 else: break intersections.append(cnt_intersections) return sum(intersections) Ref

Thursday, January 25, 2024

Triangle formation from three sides (A problem on sorting technique)

Triangle

Determine whether a triangle can be built from a given set of edges.

Complexity: Easy

Problem

An array A consisting of N integers is given. A triplet (P, Q, R) is triangular if 0 ≤ P < Q < R < N and:

A[P] + A[Q] > A[R],

A[Q] + A[R] > A[P],

A[R] + A[P] > A[Q].

Write a function:

def solution(A)

that, given an array A consisting of N integers, returns 1 if there exists a triangular triplet for this array and returns 0 otherwise.

Write an efficient algorithm for the following assumptions:

N is an integer within the range [0..100,000];

each element of array A is an integer within the range [−2,147,483,648..2,147,483,647].

Example

For example, consider array A such that:

A[0] = 10 A[1] = 2 A[2] = 5

A[3] = 1 A[4] = 8 A[5] = 20

Triplet (0, 2, 4) is triangular.

For example, given array A such that:

A[0] = 10 A[1] = 2 A[2] = 5

A[3] = 1 A[4] = 8 A[5] = 20

the function should return 1, as explained above.

Given array A such that:

A[0] = 10 A[1] = 50 A[2] = 5, A[3] = 1

the function should return 0.

Code

def solution(A):
    N = len(A)
    # If the array has less than 3 elements, no triangular triplet is possible
    if N < 3:
        return 0
    # Sort the array in ascending order
    A.sort()
    # Iterate through the sorted array
    for i in range(N - 2):
        # Check if the triplet conditions are satisfied
        if A[i] + A[i + 1] > A[i + 2]:
            return 1
    # No triangular triplet found
    return 0

Detected time complexity:

O(N*log(N))

Tests

Performance tests

Large1: chaotic sequence with values from [0..100K], length=10K

Large2: 1 followed by an ascending sequence of ~50K elements from [0..100K], length=~50K

large_random: chaotic sequence of values from [0..1M], length=100K

large_negative: chaotic sequence of negative values from [-1M..-1], length=100K

large_negative2: chaotic sequence of negative values from [-10..-1], length=100K

large_negative3: sequence of -1 value, length=100K

Correctness tests

extreme_empty: empty sequence

extreme_single: 1-element sequence

extreme_two_elems: 2-element sequence

extreme_negative1: three equal negative numbers

extreme_arith_overflow1: overflow test, 3 MAXINTs

extreme_arith_overflow2: overflow test, 10 and 2 MININTs

extreme_arith_overflow3: overflow test, 0 and 2 MAXINTs

Medium1: chaotic sequence of values from [0..100K], length=30

Medium2: chaotic sequence of values from [0..1K], length=50

Medium3: chaotic sequence of values from [0..1K], length=100

Why this solution works?

# Check if the triplet conditions are satisfied. Given that A is sorted.

if A[i] + A[i + 1] > A[i + 2]:
    return 1

If you look at this condition, given that A is sorted:

The other two conditions, viz:

A[i+2] + A[i] > A[i+1]

And A[i+2] + A[i+1] > A[i]

Would be satisfied automatically since A[i+2] > A[i+1] and then for second one: A[i+1] (or A[i+2]) > A[i]

Now, what if A[x] and A[y] are supposed to be far apart. Even then:

If let’s say we have this condition met A[3] + A[10] > A[11], then this condition would be met by all the numbers larger than 3.

Thinking Traps

Read what is being asked in the question: Existence of three numbers that can form sides of a triangle.

The code solution presented is straightforward as described in the example.

But if you just twist your thinking a little bit:

Trap 1:

Instead of looking for three numbers that are close, you start looking at the extremes where numbers are on the extremes, you end up with border cases of the solution.

As in: Two large numbers (roughly equal) and one small : Case of triangle

This number that is small could take values large enough to be just smaller than the other two large numbers.

And two very small numbers and one large number: Not a case of triangle

Trap 2:

Instead of starting with the condition P+Q > R, P+R > Q and R+Q > P, you start looking at modifications of these conditions as in P > abs(Q-R).

What if a triplet exists but numbers are far apart?

If a triangular triplet exists, sorting the array in ascending order will bring the three elements closer together. This is because sorting the array ensures that the values are in non-decreasing order, and if a triplet exists, the smallest and middle values of the triplet will be adjacent after sorting.

Let's consider an example:

Original array: [10, 2, 5, 1, 8, 20]

After sorting: [1, 2, 5, 8, 10, 20]

If there exists a triangular triplet, say (P, Q, R), where P < Q < R, after sorting, the values A[P], A[Q], and A[R] will be adjacent in the sorted array. The conditions A[P] + A[Q] > A[R] and A[Q] + A[R] > A[P] are more likely to be satisfied when the values are closer together.

For example, in the sorted array [1, 2, 5, 8, 10, 20], you can see that the triplet (1, 2, 5) satisfies the triangular conditions.

Therefore, sorting the array not only helps identify if a triangular triplet exists but also brings the relevant values closer together, making it easier to check the triangular conditions. If a triplet exists but the values are far apart in the original array, sorting will still reveal the existence of a triplet.

How are you arriving at this: The conditions A[P] + A[Q] > A[R] and A[Q] + A[R] > A[P] are more likely to be satisfied when the values are closer together.

Let me clarify:

The conditions A[P] + A[Q] > A[R] and A[Q] + A[R] > A[P] are requirements for any valid triangle, and they are not affected by the proximity of values in a sorted array. These conditions are fundamental to the triangle inequality theorem.

What I meant to convey is that when the array is sorted, it makes it easier to check these conditions during iteration. Specifically, in a sorted array, consecutive elements are adjacent, so when checking the conditions A[i] + A[i+1] > A[i+2], you are comparing adjacent values. This simplifies the process of checking the triangular conditions as you move through the sorted array.

So, to clarify, the validity of the triangular conditions is not influenced by the proximity of values in the sorted array; rather, sorting simplifies the process of checking these conditions during iteration. I appreciate your understanding, and I hope this clarifies the point.

Tags: Technology,Algorithms,Python

Max Product of Three (A problem on sorting technique)

MaxProductOfThree

Maximize A[P] * A[Q] * A[R] for any triplet (P, Q, R).

Complexity: Easy

Problem

A non-empty array A consisting of N integers is given. The product of triplet (P, Q, R) equates to A[P] * A[Q] * A[R] (0 ≤ P < Q < R < N).

Your goal is to find the maximal product of any triplet.

Write a function:

def solution(A)

that, given a non-empty array A, returns the value of the maximal product of any triplet.

Write an efficient algorithm for the following assumptions:

N is an integer within the range [3..100,000];

each element of array A is an integer within the range [−1,000..1,000].

Example

For example, array A such that:

A[0] = -3

A[1] = 1

A[2] = 2

A[3] = -2

A[4] = 5

A[5] = 6

contains the following example triplets:

(0, 1, 2), product is −3 * 1 * 2 = −6

(1, 2, 4), product is 1 * 2 * 5 = 10

(2, 4, 5), product is 2 * 5 * 6 = 60

The function should return 60, as the product of triplet (2, 4, 5) is maximal.

Code

def solution(A):
    A.sort()
    # multiplication two large negative numbers with a positive number
    p = A[0] * A[1] * A[-1]
    
    # multiplication of three positive numbers 
    q = A[-1] * A[-2] * A[-3]
    return max(p, q)

Detected time complexity:
O(N * log(N))

Tests

Correctness tests

one_triple

three elements

simple1

simple tests

simple2

simple tests

small_random

random small, length = 100

Performance tests

medium_range

-1000, -999, ... 1000, length = ~1,000

medium_random

random medium, length = ~10,000

large_random

random large, length = ~100,000

large_range

2000 * (-10..10) + [-1000, 500, -1]

extreme_large

(-2, .., -2, 1, .., 1) and (MAX_INT)..(MAX_INT), length = ~100,000

Tags: Technology,Algorithms,Python

Count distinct elements in an array (A problem on sorting technique)

Problem

Write a function

def solution(A)

that, given an array A consisting of N integers, returns the number of distinct values in array A.

For example, given array A consisting of six elements such that:

A[0] = 2 A[1] = 1 A[2] = 1

A[3] = 2 A[4] = 3 A[5] = 1

the function should return 3, because there are 3 distinct values appearing in array A, namely 1, 2 and 3.

Write an efficient algorithm for the following assumptions:

N is an integer within the range [0..100,000];

each element of array A is an integer within the range [−1,000,000..1,000,000].

Solution (1) using set()

def solution(A):

    s = set(A)
    return len(s)

Detected time complexity:

O(N*log(N)) or O(N)

Test Cases

Correctness tests

extreme_empty: empty sequence

extreme_single: sequence of one element

extreme_two_elems: sequence of three distinct elements

extreme_one_value: sequence of 10 equal elements

extreme_negative: sequence of negative elements, length=5

extreme_big_values: sequence with big values, length=5

Medium1: chaotic sequence of value sfrom [0..1K], length=100

Medium2: chaotic sequence of value sfrom [0..1K], length=200

Medium3: chaotic sequence of values from [0..10], length=200

Performance tests

chaotic sequence of values from [0..100K], length=10K

large_random1

chaotic sequence of values from [-1M..1M], length=100K

large_random2

another chaotic sequence of values from [-1M..1M], length=100K

Sol (2) using dict() and then keys().len(): Straightforward to implement
Sol (3) without using set() or dict()

In this slide we discuss “Sol (3) without using set() or dict()”:

In this solution, we would use the sort() method of the array object.

def solution(A):
    A.sort()

    rtn = 1

    if len(A) == 0:
        return 0
    else:
        for i in range(1, len(A)):
            if A[i] != A[i-1]:
                rtn += 1

        return rtn
Tags: Technology,Algorithms,Python

Monday, January 15, 2024

Min Avg Two Slice (Problem on Prefix Sums With Medium Complexity)

Min Avg Two Slice: Find the minimal average of any slice containing at least two elements. (Complexity: Medium)

Problem

A non-empty array A consisting of N integers is given. A pair of integers (P, Q), such that 0 ≤ P < Q < N, is called a slice of array A (notice that the slice contains at least two elements). The average of a slice (P, Q) is the sum of A[P] + A[P + 1] + ... + A[Q] divided by the length of the slice. To be precise, the average equals (A[P] + A[P + 1] + ... + A[Q]) / (Q - P + 1).

The goal is to find the starting position of a slice whose average is minimal.

Write a function:

def solution(A)

that, given a non-empty array A consisting of N integers, returns the starting position of the slice with the minimal average. If there is more than one slice with a minimal average, you should return the smallest starting position of such a slice.

Write an efficient algorithm for the following assumptions:

N is an integer within the range [2..100,000];

each element of array A is an integer within the range [-10,000..10,000].

Example

For example, array A such that:

A[0] = 4

A[1] = 2

A[2] = 2

A[3] = 5

A[4] = 1

A[5] = 5

A[6] = 8

contains the following example slices:

slice (1, 2), whose average is (2 + 2) / 2 = 2;

slice (3, 4), whose average is (5 + 1) / 2 = 3;

slice (1, 4), whose average is (2 + 2 + 5 + 1) / 4 = 2.5.

So, the function should return the index of 1 in this case.

Hint (1)

To solve this problem efficiently, you can use the fact that the minimum average slice must be of length 2 or 3. Any larger slice can be split into smaller slices, and if the larger slice has a lower average, one of its smaller slices will also have a lower average.

Code (1)

def solution(A):
    N = len(A)

    min_avg = (A[0] + A[1]) / 2.0
    min_start = 0

    for i in range(2, N):
        # Calculate the average of a slice of length 2
        avg_2 = (A[i - 1] + A[i]) / 2.0
        if avg_2 < min_avg:
            min_avg = avg_2
            min_start = i - 1

        # Calculate the average of a slice of length 3
        avg_3 = (A[i - 2] + A[i - 1] + A[i]) / 3.0
        if avg_3 < min_avg:
            min_avg = avg_3
            min_start = i - 2

    return min_start

A = [4, 2, 2, 5, 1, 5, 8] # Example run
result = solution(A)
print(result)

This algorithm has a time complexity of O(N) because it iterates through the array only once. The key observation is that the minimal average slice of length 2 or 3 will cover the minimal average slice of any larger length.

Can we redo this code to use the concept of prefix sums?

Certainly! You can use the concept of prefix sums to optimize the calculation of slice sums. The idea is to precompute the prefix sums of the array, and then use them to calculate slice sums in constant time. Here's the modified code:

Code (2)

def solution(A):
    N = len(A)

    # Calculate prefix sums
    prefix_sums = [0] * (N + 1)
    for i in range(1, N + 1):
        prefix_sums[i] = prefix_sums[i - 1] + A[i - 1]

    # Initialize the starting position to 0 and the minimum average to the average of the first two elements
    min_avg = (A[0] + A[1]) / 2.0
    min_start = 0

    for i in range(2, N):
        # Calculate the average of a slice of length 2 using prefix sums
        avg_2 = (prefix_sums[i] - prefix_sums[i - 2]) / 2.0
        if avg_2 < min_avg:
            min_avg = avg_2
            min_start = i - 1

        # Calculate the average of a slice of length 3 using prefix sums
        avg_3 = (prefix_sums[i] - prefix_sums[i - 3]) / 3.0
        if avg_3 < min_avg:
            min_avg = avg_3
            min_start = i - 2

    return min_start

Line of reasoning

Can you explain this reasoning:

To solve this problem efficiently, you can use the fact that the minimum average slice must be of length 2 or 3. Any larger slice can be split into smaller slices, and if the larger slice has a lower average, one of its smaller slices will also have a lower average.

The reasoning behind this observation is based on the idea that if you have a slice of length greater than 3 with a lower average, you can always find a sub-slice within it that has an even lower average. Let's break it down:

Slice of length 2:
    There are only two elements, so the average is simply the sum of those two elements divided by 2.

Slice of length 3:
    If you have a slice of length 3, there are only two sub-slices within it: one of length 2 and another of length 1.
    If the average of the slice of length 3 is lower than the average of the slice of length 2, then the average of the slice of length 1 will be even lower.

Slice of length greater than 3:
    For a slice of length greater than 3, you can always find a sub-slice of length 2 or 3 within it.
    If the average of the larger slice is lower, then the average of one of its sub-slices (either of length 2 or 3) will be even lower.
    Therefore, it is sufficient to consider only slices of length 2 and 3 when searching for the minimum average slice.

This observation simplifies the problem, as you don't need to consider all possible slice lengths. By focusing on slices of length 2 and 3, you can reduce the computational complexity of the algorithm. 

Detected time complexity:O(N)

Tests

Correctness tests

double_quadruple

two or four elements

simple1

simple test, the best slice has length 3

simple2

simple test, the best slice has length 3

small_random

random, length = 100

medium_range

increasing, decreasing (legth = ~100) and small functional

Performance Tests

medium_random

random, N = ~700

large_ones

numbers from -1 to 1, N = ~100,000

large_random

random, N = ~100,000

extreme_values

all maximal values, N = ~100,000

large_sequence

many sequences, N = ~100,000

Tuesday, January 9, 2024

Genomic Range Query: Find the minimal nucleotide from a range of sequence DNA (Medium Complexity)

Problem

A DNA sequence can be represented as a string consisting of the letters A, C, G and T, which correspond to the types of successive nucleotides in the sequence. Each nucleotide has an impact factor, which is an integer. Nucleotides of types A, C, G and T have impact factors of 1, 2, 3 and 4, respectively. You are going to answer several queries of the form: What is the minimal impact factor of nucleotides contained in a particular part of the given DNA sequence?

The DNA sequence is given as a non-empty string S = S[0]S[1]...S[N-1] consisting of N characters. There are M queries, which are given in non-empty arrays P and Q, each consisting of M integers. The K-th query (0 ≤ K < M) requires you to find the minimal impact factor of nucleotides contained in the DNA sequence between positions P[K] and Q[K] (inclusive).

Write a function:

def solution(S, P, Q)

that, given a non-empty string S consisting of N characters and two non-empty arrays P and Q consisting of M integers, returns an array consisting of M integers specifying the consecutive answers to all queries.

Result array should be returned as an array of integers.

Write an efficient algorithm for the following assumptions:

N is an integer within the range [1..100,000];

M is an integer within the range [1..50,000];

each element of arrays P and Q is an integer within the range [0..N - 1];

P[K] ≤ Q[K], where 0 ≤ K < M;

string S consists only of upper-case English letters A, C, G, T.

Example

For example, consider string S = CAGCCTA and arrays P, Q such that:

P[0] = 2 Q[0] = 4

P[1] = 5 Q[1] = 5

P[2] = 0 Q[2] = 6

The answers to these M = 3 queries are as follows:

The part of the DNA between positions 2 and 4 contains nucleotides G and C (twice), whose impact factors are 3 and 2 respectively, so the answer is 2.

The part between positions 5 and 5 contains a single nucleotide T, whose impact factor is 4, so the answer is 4.

The part between positions 0 and 6 (the whole string) contains all nucleotides, in particular nucleotide A whose impact factor is 1, so the answer is 1.

The function should return the values [2, 4, 1], as explained above.

Brute Force Solution

def solution(S, P, Q):
    impact_factors = {
        'A': 1,
        'C': 2,
        'G': 3,
        'T': 4
    }

    X = [impact_factors[i] for i in S]

    # Implement your solution here
    rtn = []
    for i in range(len(P)):
        start = P[i]
        end = Q[i] + 1

        s = X[start:end]

        rtn.append(min(s))

    return rtn 

Detected time complexity:

O(N * M)

Smart Solution Using Prefix Sums (Part 1)

def solution(S, P, Q):
    N = len(S)
    M = len(P)

    impact_factors = {'A': 1, 'C': 2, 'G': 3, 'T': 4}

    # Calculate prefix sums for each nucleotide type
    prefix_sums = [[0] * (N + 1) for _ in range(4)]
    for i in range(1, N + 1):
        for j in range(4):
            prefix_sums[j][i] = prefix_sums[j][i - 1]
        prefix_sums[impact_factors[S[i - 1]] - 1][i] += 1
        print(i, S[i-1])
        print(prefix_sums)

    print(':1 ~ ~ ~ ~ ~ ~ ~ ~ ~')
    S = 'CAGCCTA'
    P = [2, 5, 0]
    Q = [4, 5, 6]
    print(solution(S, P, Q))  # Output: [2, 4, 1]

    print(':2 ~ ~ ~ ~ ~ ~ ~ ~ ~')
    print(solution('AGCT', [], []))

    print(':3 ~ ~ ~ ~ ~ ~ ~ ~ ~')
    print(solution('AACT', [], []))

This is a two part solution.

In the first part, we build prefix sum sequences for each nucleotide to track it’s occurrence.

Output (Part 1)

S = 'CAGCCTA'

:1 ~ ~ ~ ~ ~ ~ ~ ~ ~

1 C

[[0, 0, 0, 0, 0, 0, 0, 0], [0, 1, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0]]

2 A

[[0, 0, 1, 0, 0, 0, 0, 0], [0, 1, 1, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0]]

3 G

[[0, 0, 1, 1, 0, 0, 0, 0], [0, 1, 1, 1, 0, 0, 0, 0], [0, 0, 0, 1, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0]]

4 C

[[0, 0, 1, 1, 1, 0, 0, 0], [0, 1, 1, 1, 2, 0, 0, 0], [0, 0, 0, 1, 1, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0]]

5 C

[[0, 0, 1, 1, 1, 1, 0, 0], [0, 1, 1, 1, 2, 3, 0, 0], [0, 0, 0, 1, 1, 1, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0]]

6 T

[[0, 0, 1, 1, 1, 1, 1, 0], [0, 1, 1, 1, 2, 3, 3, 0], [0, 0, 0, 1, 1, 1, 1, 0], [0, 0, 0, 0, 0, 0, 1, 0]]

7 A

[[0, 0, 1, 1, 1, 1, 1, 2], [0, 1, 1, 1, 2, 3, 3, 3], [0, 0, 0, 1, 1, 1, 1, 1], [0, 0, 0, 0, 0, 0, 1, 1]]

Output (Part 2)

For 'AGCT'

:2 ~ ~ ~ ~ ~ ~ ~ ~ ~

1 A

[[0, 1, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0]]

2 G

[[0, 1, 1, 0, 0], [0, 0, 0, 0, 0], [0, 0, 1, 0, 0], [0, 0, 0, 0, 0]]

3 C

[[0, 1, 1, 1, 0], [0, 0, 0, 1, 0], [0, 0, 1, 1, 0], [0, 0, 0, 0, 0]]

4 T

[[0, 1, 1, 1, 1], [0, 0, 0, 1, 1], [0, 0, 1, 1, 1], [0, 0, 0, 0, 1]]

For 'AACT'

:3 ~ ~ ~ ~ ~ ~ ~ ~ ~

1 A

[[0, 1, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0]]

2 A

[[0, 1, 2, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0]]

3 C

[[0, 1, 2, 2, 0], [0, 0, 0, 1, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0]]

4 T

[[0, 1, 2, 2, 2], [0, 0, 0, 1, 1], [0, 0, 0, 0, 0], [0, 0, 0, 0, 1]]

Smart Solution Using Prefix Sums (Part 2)

def solution(S, P, Q):
    N = len(S)
    M = len(P)

    impact_factors = {'A': 1, 'C': 2, 'G': 3, 'T': 4}

    # Calculate prefix sums for each nucleotide type
    prefix_sums = [[0] * (N + 1) for _ in range(4)]
    for i in range(1, N + 1):
        for j in range(4):
            prefix_sums[j][i] = prefix_sums[j][i - 1]
        prefix_sums[impact_factors[S[i - 1]] - 1][i] += 1

    # Process queries
    result = []
    for k in range(M):
        start_position = P[k]
        end_position = Q[k] + 1

        for nucleotide_type in range(4):
            if prefix_sums[nucleotide_type][end_position] - prefix_sums[nucleotide_type][start_position] > 0:
                result.append(nucleotide_type + 1)
                break

    return result

    # Example usage:
    S = 'CAGCCTA'
    P = [2, 5, 0]
    Q = [4, 5, 6]
    print(solution(S, P, Q))  # Output: [2, 4, 1]

In the second part of the solution we determine from start to end if a low factor nucleotide is present or not.

Here, let seq_X be sequence from as in prefix_sums[nucleotide_type]

It takes the form:

seq_A: [0, 0, 1, 1, 1, 1, 1, 2],

seq_C: [0, 1, 1, 1, 2, 3, 3, 3],

seq_G: [0, 0, 0, 1, 1, 1, 1, 1],

seq_T: [0, 0, 0, 0, 0, 0, 1, 1]

For input sequence: CAGCCTA

Then from any of this seq, we determine whether if A is appearing between start and end using seq_A[end] - seq_A[start]

Or if C is appearing between start and end using:

seq_C[end] – seq_C[start]

Or if G is appearing between start and end using:

seq_G[end] – seq_G[start]

Or if T is appearing between start and end using:

seq_T[end] – seq_T[start]

If a nucleotide is found in this second part of the code, we record this nucleotide and break out.

Time Complexity

Detected time complexity:

O(N + M)

Tests

Correctness tests

extreme_sinlge

single character string

extreme_double

double character string

simple

simple tests

small_length_string

small length simple string

small_random

small random string, length = ~300

Performance tests

almost_all_same_letters

GGGGGG..??..GGGGGG..??..GGGGGG

large_random

large random string, length

extreme_large

all max ranges

Explanation of Code Using Bard

I'd be glad to explain the code you provided:

Here's a breakdown of the code's key steps:

  1. Initialization:

    • N stores the length of the DNA sequence S.
    • M stores the number of queries (the length of arrays P and Q).
    • impact_factors maps each nucleotide to its corresponding impact factor.
  2. Prefix Sums Calculation:

    • prefix_sums is a 2D array with 4 rows (one for each nucleotide type) and N+1 columns. It's used to store the counts of each nucleotide type up to each position in the DNA sequence.
    • The nested loops iterate through the DNA sequence and update prefix_sums accordingly:
      • The first loop iterates through each position in the sequence.
      • The second loop iterates through each nucleotide type.
      • For each position and nucleotide type, the code copies the count from the previous position and then increments the count if the current nucleotide matches the type.
  3. Query Processing:

    • result is an empty list to store the answers to the queries.
    • The outer loop iterates through each query, accessing its start and end positions from P and Q.
    • The inner loop iterates through each nucleotide type.
    • It checks if the count of the current nucleotide type within the query range (end_position - start_position) is greater than 0. If so, it means that nucleotide type is present within the range, and its impact factor is added to the result list. The loop then breaks because we've found the minimum impact factor for that query.
  4. Returning Results:

    • The function returns the result list containing the answers to all queries.

Example Usage:

  • The provided example demonstrates how to use the function with a sample DNA sequence S and query arrays P and Q.
  • The output [2, 4, 1] correctly represents the minimum impact factors for the specified query ranges in S.

Key Points:

  • The code efficiently calculates prefix sums to count the occurrences of each nucleotide type up to each position.
  • This precomputation allows for quick calculation of minimum impact factors within any query range, using simple subtractions of prefix sums.
  • The time complexity of the code is O(N + M), making it efficient for large DNA sequences and multiple queries.

Monday, January 8, 2024

Div Count (Problem on Prefix Sums)

Div Count Problem
Compute number of integers divisible by k in range [a..b].

Problem

Write a function:

def solution(A, B, K)

that, given three integers A, B and K, returns the number of integers within the range [A..B] that are divisible by K, i.e.:

{ i : A ≤ i ≤ B, i mod K = 0 }

For example, for A = 6, B = 11 and K = 2, your function should return 3, because there are three numbers divisible by 2 within the range [6..11], namely 6, 8 and 10.

Write an efficient algorithm for the following assumptions:

# A and B are integers within the range [0..2,000,000,000];

# K is an integer within the range [1..2,000,000,000];

# A ≤ B

Code

from math import ceil, floor 

def solution(A, B, K):
    
    start = ceil(A/K)
    end = floor(B/K)
    return end - start + 1

Complexity

Detected time complexity:

O(1)

Tests

Correctness tests

simple

A = 11, B = 345, K = 17

minimal

A = B in {0,1}, K = 11

extreme_ifempty

A = 10, B = 10, K in {5,7,20}

extreme_endpoints

verify handling of range endpoints, multiple runs

Performance tests

big_values

A = 100, B=123M+, K=2

big_values2

A = 101, B = 123M+, K = 10K

big_values3

A = 0, B = MAXINT, K in {1,MAXINT}

big_values4

A, B, K in {1,MAXINT}

Passing Cars (Count the number of passing cars on the road) - Problem on Prefix Sums

Passing Cars
Count the number of passing cars on the road.
Problem on Prefix Sums

Problem

A non-empty array A consisting of N integers is given. The consecutive elements of array A represent consecutive cars on a road.

Array A contains only 0s and/or 1s:

0 represents a car traveling east,

1 represents a car traveling west.

The goal is to count passing cars. We say that a pair of cars (P, Q), where 0 ≤ P < Q < N, is passing when P is traveling to the east and Q is traveling to the west.

Write a function:

def solution(A)

that, given a non-empty array A of N integers, returns the number of pairs of passing cars.

The function should return −1 if the number of pairs of passing cars exceeds 1,000,000,000.

Write an efficient algorithm for the following assumptions:

N is an integer within the range [1..100,000];

each element of array A is an integer that can have one of the following values: 0, 1.

Example

For example, consider array A such that:

A[0] = 0

A[1] = 1

A[2] = 0

A[3] = 1

A[4] = 1

We have five pairs of passing cars: (0, 1), (0, 3), (0, 4), (2, 3), (2, 4).

For example, given:

the function should return 5, as explained above.

Hint

Code (I)

def solution(A):
    # In the question, it is given that 0 are travelling east. We assume it to be the direction of prefix sums.
    # So if we implement using prefix counts, we would have to count zeroes.

    # '1' is going to the west, that is having the direction of suffix sums.

    prefix_counts = [0] * (len(A) + 1)

    for i in range(len(A)):
        if A[i] == 0:
            prefix_counts[i+1] = prefix_counts[i] + 1
        else:
            prefix_counts[i+1] = prefix_counts[i] + 0

    passing_cars = 0 
    for i in range(len(A)):
        if A[i] == 1:
            passing_cars += prefix_counts[i+1]

    if passing_cars > 1000000000:
        passing_cars = -1

    return passing_cars

Code (II): Now Using Suffix Sums

def solution(A):
    # In the question, it is given that 0 are traveling east. We assume it to be the direction of prefix sums.

    # '1' is going to the west, that is having the direction of suffix sums.
    # So if we implement using suffix counts, we would have to count ones.

    suffix_counts = [0] * (len(A) + 1)

    for i in range(len(A) - 1, -1, -1):
        if A[i] == 1:
            suffix_counts[i] = suffix_counts[i+1] + 1
        else:
            suffix_counts[i] = suffix_counts[i+1] + 0

    passing_cars = 0 
    for i in range(len(A)):
        if A[i] == 0:
            passing_cars += suffix_counts[i]

    if passing_cars > 1000000000:
        passing_cars = -1

    return passing_cars

Time Complexity

Detected time complexity:

O(N)

Tests

Correctness tests

single element

double

two elements

simple

simple test

small_random

random, length = 100

small_random2

random, length = 1000

Performance tests

medium_random

random, length = ~10,000

large_random

random, length = ~100,000

large_big_answer

0..01..1, length = ~100,000

large_alternate

0101..01, length = ~100,000

large_extreme

large test with all 1s/0s, length = ~100,000