A: Simple hypothesis testing
Question A.1 Ans: Question A.2 Ans: Question A.3 Ans: Question A.4 Ans:B: Writing null hypothesis
Question B.1 Ans: Question B.2 Ans:C: Estimating P-values from simulations
Question C.1 Ans: Question C.2: Ans: Question C.3: Question C.4:D: Using P-values to make conclusions
Q: A:
Tuesday, January 16, 2024
Statistics. Unit 12: Hypothesis Testing. Ch 1: The idea of significance tests
Monday, January 15, 2024
Statistics. Unit 12: Significance tests (hypothesis testing). Ch 2: Error probabilities and power
Tags: Mathematical Foundations for Data Science,Ex 12.2.A - Type I vs Type II error
A.1 A.2 A.3 A.4Ex 12.2.B - Error probabilities and power
B.1 B.2 B.3 B.4Ex 12.2.C - Consequences of errors and significance
C.1 C.2 C.3 C.4 C.5 C.6 C.7
Min Avg Two Slice (Problem on Prefix Sums With Medium Complexity)
Min Avg Two Slice: Find the minimal average of any slice containing at least two elements. (Complexity: Medium)
Problem
A non-empty array A consisting of N integers is given. A pair of integers (P, Q), such that 0 ≤ P < Q < N, is called a slice of array A (notice that the slice contains at least two elements). The average of a slice (P, Q) is the sum of A[P] + A[P + 1] + ... + A[Q] divided by the length of the slice. To be precise, the average equals (A[P] + A[P + 1] + ... + A[Q]) / (Q - P + 1).
The goal is to find the starting position of a slice whose average is minimal.
Write a function:
def solution(A)
that, given a non-empty array A consisting of N integers, returns the starting position of the slice with the minimal average. If there is more than one slice with a minimal average, you should return the smallest starting position of such a slice.
Write an efficient algorithm for the following assumptions:
N is an integer within the range [2..100,000];
each element of array A is an integer within the range [-10,000..10,000].
Example
For example, array A such that:
A[0] = 4
A[1] = 2
A[2] = 2
A[3] = 5
A[4] = 1
A[5] = 5
A[6] = 8
contains the following example slices:
slice (1, 2), whose average is (2 + 2) / 2 = 2;
slice (3, 4), whose average is (5 + 1) / 2 = 3;
slice (1, 4), whose average is (2 + 2 + 5 + 1) / 4 = 2.5.
So, the function should return the index of 1 in this case.
Hint (1)
To solve this problem efficiently, you can use the fact that the minimum average slice must be of length 2 or 3. Any larger slice can be split into smaller slices, and if the larger slice has a lower average, one of its smaller slices will also have a lower average.
Code (1)
def solution(A): N = len(A) min_avg = (A[0] + A[1]) / 2.0 min_start = 0 for i in range(2, N): # Calculate the average of a slice of length 2 avg_2 = (A[i - 1] + A[i]) / 2.0 if avg_2 < min_avg: min_avg = avg_2 min_start = i - 1 # Calculate the average of a slice of length 3 avg_3 = (A[i - 2] + A[i - 1] + A[i]) / 3.0 if avg_3 < min_avg: min_avg = avg_3 min_start = i - 2 return min_start A = [4, 2, 2, 5, 1, 5, 8] # Example run result = solution(A) print(result)
This algorithm has a time complexity of O(N) because it iterates through the array only once. The key observation is that the minimal average slice of length 2 or 3 will cover the minimal average slice of any larger length.
Can we redo this code to use the concept of prefix sums?
Certainly! You can use the concept of prefix sums to optimize the calculation of slice sums. The idea is to precompute the prefix sums of the array, and then use them to calculate slice sums in constant time. Here's the modified code:
Code (2)
def solution(A): N = len(A) # Calculate prefix sums prefix_sums = [0] * (N + 1) for i in range(1, N + 1): prefix_sums[i] = prefix_sums[i - 1] + A[i - 1] # Initialize the starting position to 0 and the minimum average to the average of the first two elements min_avg = (A[0] + A[1]) / 2.0 min_start = 0 for i in range(2, N): # Calculate the average of a slice of length 2 using prefix sums avg_2 = (prefix_sums[i] - prefix_sums[i - 2]) / 2.0 if avg_2 < min_avg: min_avg = avg_2 min_start = i - 1 # Calculate the average of a slice of length 3 using prefix sums avg_3 = (prefix_sums[i] - prefix_sums[i - 3]) / 3.0 if avg_3 < min_avg: min_avg = avg_3 min_start = i - 2 return min_start
Line of reasoning
Can you explain this reasoning:
To solve this problem efficiently, you can use the fact that the minimum average slice must be of length 2 or 3. Any larger slice can be split into smaller slices, and if the larger slice has a lower average, one of its smaller slices will also have a lower average.
The reasoning behind this observation is based on the idea that if you have a slice of length greater than 3 with a lower average, you can always find a sub-slice within it that has an even lower average. Let's break it down: Slice of length 2: There are only two elements, so the average is simply the sum of those two elements divided by 2. Slice of length 3: If you have a slice of length 3, there are only two sub-slices within it: one of length 2 and another of length 1. If the average of the slice of length 3 is lower than the average of the slice of length 2, then the average of the slice of length 1 will be even lower. Slice of length greater than 3: For a slice of length greater than 3, you can always find a sub-slice of length 2 or 3 within it. If the average of the larger slice is lower, then the average of one of its sub-slices (either of length 2 or 3) will be even lower. Therefore, it is sufficient to consider only slices of length 2 and 3 when searching for the minimum average slice. This observation simplifies the problem, as you don't need to consider all possible slice lengths. By focusing on slices of length 2 and 3, you can reduce the computational complexity of the algorithm.
Detected time complexity:O(N)
Tests
Correctness tests
double_quadruple
two or four elements
simple1
simple test, the best slice has length 3
simple2
simple test, the best slice has length 3
small_random
random, length = 100
medium_range
increasing, decreasing (legth = ~100) and small functional
Performance Tests
medium_random
random, N = ~700
large_ones
numbers from -1 to 1, N = ~100,000
large_random
random, N = ~100,000
extreme_values
all maximal values, N = ~100,000
large_sequence
many sequences, N = ~100,000
Saturday, January 13, 2024
Can you show the derivative X-transpose.X wrt x?
If you want, you can take the 2 out from the column vector in the final result.Certainly! Let's assume that X is a column vector (a matrix with only one column). In that case, XT is the transpose of X, and X⋅XT results in a square matrix.
Let X=[x1,x2,…,xn]T, where xi represents the elements of the column vector X.
Now, let's find the derivative of XT⋅X with respect to X. The result will be a matrix of the same size as X.
∂X∂(XT⋅X)
∂X∂(XT⋅X)=∂X∂⎝⎛⎣⎡x1x2⋮xn⎦⎤⋅[x1,x2,…,xn]⎠⎞
=∂X∂(x12+x22+…+xn2)
=⎣⎡2x12x2⋮2xn⎦⎤
So, the derivative of XT⋅X with respect to X is a column vector with each element being twice the corresponding element of X:
∂X∂(XT⋅X)=⎣⎡2x12x2⋮2xn⎦⎤