Showing posts with label Mathematical Foundations for Data Science. Show all posts

Sunday, June 14, 2026

Quiz on "Modeling data distributions" (Unit 4, Jun 14th 2026)


See All: Questions For Statistics From Khan Academy
1:

Code:
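import statistics

mean = 170.4
sd = 10

# z-score of the lower bound
l = 145
lz = (l - mean) / sd
print(lz)

# Area to the left of the lower bound under the standard normal curve
lz_area = statistics.NormalDist(mu=0, sigma=1).cdf(lz)
print(lz_area)

# z-score of the upper bound
h = 171
hz = (h - mean) / sd
print(hz)

# Area to the left of the upper bound
hz_area = statistics.NormalDist().cdf(hz)
print(hz_area)

# Area between the two bounds
area_req = round(hz_area - lz_area, 4)
print(area_req)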
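
As a cross-check, the same area can be computed without converting to z-scores by hand, since statistics.NormalDist accepts the mean and standard deviation directly. This is a minimal sketch reusing the values above; the names dist, area_direct and area_via_z are illustrative and not part of the original solution.

```python
from statistics import NormalDist

mean = 170.4
sd = 10
low = 145
high = 171

# Normal distribution parameterised with the original mean and sd,
# so no manual z-score conversion is needed.
dist = NormalDist(mu=mean, sigma=sd)
area_direct = dist.cdf(high) - dist.cdf(low)

# Same area via explicit z-scores on the standard normal, for comparison.
std_normal = NormalDist()
area_via_z = std_normal.cdf((high - mean) / sd) - std_normal.cdf((low - mean) / sd)

print(round(area_direct, 4))
print(round(area_via_z, 4))
```

Both routes should agree up to floating-point rounding, which is a quick way to catch a sign error in a z-score.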



2:


mean = 80
sd = 9

proportion = 0.4

import statistics
z = statistics.NormalDist().inv_cdf(proportion)

print(z)

x = z * sd + mean

print(x)
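
The two steps above (find z, then rescale) can also be collapsed into one call, because inv_cdf works on a NormalDist built with the actual mean and standard deviation. A small sketch under that assumption; x_direct and x_via_z are names introduced here for comparison only.

```python
from statistics import NormalDist

mean = 80
sd = 9
proportion = 0.4

# inv_cdf on the non-standard distribution returns the cutoff directly.
x_direct = NormalDist(mu=mean, sigma=sd).inv_cdf(proportion)

# Two-step route: standard-normal z-score first, then rescale.
z = NormalDist().inv_cdf(proportion)
x_via_z = z * sd + mean

print(round(x_direct, 4), round(x_via_z, 4))
```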



3:

Code:
mean = 13.1
sd = 1.5

sd1 = (mean - sd, mean + sd)
print(sd1)

sd2 = (mean - 2 * sd, mean + 2 * sd)
print(sd2)

sd3 = (mean - 3 * sd, mean + 3 * sd)
print(sd3)

sd2_area = 0.95
sd3_area = 0.997

area_req = (sd3_area - sd2_area) / 2

print(area_req)

percentage_wise = round(area_req * 100, 4)
print(percentage_wise)

out = """
(11.6, 14.6)
(10.1, 16.1)
(8.6, 17.6)
0.02350000000000002
2.35
"""

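The 2.35% figure comes from the empirical rule, which rounds the true normal areas. For comparison only, the sketch below computes the exact area between 2 and 3 standard deviations on one side of the mean. The quiz presumably expects the empirical-rule answer, so treat the exact value as a sanity check rather than a replacement.

```python
from statistics import NormalDist

std_normal = NormalDist()

# Empirical-rule estimate for the band between 2 and 3 sds on one side.
empirical_band = (0.997 - 0.95) / 2

# Exact area between z = 2 and z = 3 under the standard normal curve.
exact_band = std_normal.cdf(3) - std_normal.cdf(2)

print(round(empirical_band, 4))
print(round(exact_band, 4))
```

The two values differ in the third decimal place, which is the price of the rule's rounded percentages.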


4:

Code:
b = 2
h = 0.6

area = 0.5 * b * h

percentage_of_area = area * 100

print(percentage_of_area)



5:


Code:
mean_sales = 8000
sd_sales = 1500

mean_salary = 2000 + 0.3 * mean_sales

sd_salary = sd_sales * 0.3

print("mean_salary, sd_salary")
print(mean_salary, sd_salary)
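
The rule used above (a constant shifts only the mean, a multiplier scales both the mean and the standard deviation) holds for any data set, not just normal ones. The sketch below checks it on a small list of sales figures that are made up purely for illustration.

```python
from statistics import mean, pstdev

# Hypothetical monthly sales figures, invented for this check only.
sales = [6500, 8000, 9500, 7000, 9000]

# Salary rule from the code above: 2000 base plus 30% of sales.
salaries = [2000 + 0.3 * s for s in sales]

# Adding a constant shifts the mean only; multiplying scales mean and sd.
predicted_mean = 2000 + 0.3 * mean(sales)
predicted_sd = 0.3 * pstdev(sales)

print(mean(salaries), predicted_mean)
print(pstdev(salaries), predicted_sd)
```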



6:




7:


area = 1
b = 6
h = area * 2 / b
print(h)



8:

def area_of_trapezium(b1, b2, h):
    return 0.5 * (b1 + b2) * h

b1 = 0.5
b2 = 0.75
h = 1

a = area_of_trapezium(b1, b2, h)
print(a)

print(round(a*100, 4))


b1 = 0.25
b2 = 0.5
h = 1

a = area_of_trapezium(b1, b2, h)
print(a)

print(round(a*100, 4))
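
The trapezium formula can be sanity-checked numerically. The sketch below assumes the density is a straight line rising from b1 to b2 across a width of h (an assumption, since the graph is not reproduced here) and sums thin slices under that line; area_by_midpoint_rule is a helper introduced only for this check.

```python
def area_of_trapezium(b1, b2, h):
    return 0.5 * (b1 + b2) * h

def area_by_midpoint_rule(y_start, y_end, width, steps=1000):
    """Approximate the area under a straight line from y_start to y_end."""
    dx = width / steps
    total = 0.0
    for i in range(steps):
        # Height of the line at the midpoint of the i-th slice.
        t = (i + 0.5) / steps
        total += (y_start + (y_end - y_start) * t) * dx
    return total

exact = area_of_trapezium(0.5, 0.75, 1)
approx = area_by_midpoint_rule(0.5, 0.75, 1)
print(exact, round(approx, 6))
```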



9:




10:


mean = 1497
sd = 322

proportion = 0.85

import statistics
z = statistics.NormalDist().inv_cdf(proportion)

x = z * sd + mean

print(round(x, 4))
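
A quick way to verify a percentile answer like this one is a round trip: feed the cutoff back into the CDF and confirm the original proportion comes out. A minimal sketch with the same mean and standard deviation; dist, cutoff and share_below are illustrative names.

```python
from statistics import NormalDist

dist = NormalDist(mu=1497, sigma=322)

# Cutoff with 85% of the distribution below it.
cutoff = dist.inv_cdf(0.85)

# Round trip: the share below the cutoff should come back as about 0.85.
share_below = dist.cdf(cutoff)

print(round(cutoff, 4))
print(round(share_below, 4))
```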

Tags: Python, Mathematical Foundations for Data Science, Data Analytics

Wednesday, June 10, 2026

Quiz on "Modeling data distributions" (Unit 4, Jun 10th 2026)



1:

Solution:

Code:

"""
Revenue:
Mean: 500
Stdev: 125

Fixed Monthly Costs: 225

Profit = Revenue - Fixed Monthly Costs
So MeanProfit = MeanRevenue - Fixed Monthly Costs = 500 - 225 = 275
StdevProfit = StdevRevenue = 125

"""

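The reasoning above can be reproduced with NormalDist arithmetic, which supports subtracting a constant from a distribution. Treating revenue as normally distributed is an assumption made here only to use that feature; the shift rule itself does not depend on the shape of the distribution.

```python
from statistics import NormalDist

# Revenue modelled as a normal distribution (assumption for illustration).
revenue = NormalDist(mu=500, sigma=125)

# Subtracting the fixed monthly costs shifts the mean and leaves the sd alone.
profit = revenue - 225

print(profit.mean, profit.stdev)
```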
2:

Code:

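b_small = 1  # Base of the small triangle (for x less than 2)
h_small = 0.5
area_small = b_small * h_small / 2
print(area_small)  # 0.25

b_full = 4  # Base of the full triangle
h_full = 0.5
area_full = b_full * h_full / 2
print(area_full)  # 1.0

# Proportion of the total area that lies in the small triangle
print(area_small / area_full)  # 0.25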

3:

Solution:

4:


5:

Solution:

Code:

"""
** BRAINSTORMING **

mean = 21.02
sd = 2

mean_1sd_less = mean - sd = 21.02 - 2 = 19.02
mean_1sd_more = mean + sd = 21.02 + 2 = 23.02

mean_2sd_less = mean - 2*sd = 21.02 - 4 = 17.02
mean_2sd_more = mean + 2*sd = 21.02 + 4 = 25.02


Empirical Rule: 
68% of the data is between 19.02 and 23.02 (within 1 sd of the mean)
95% of the data is between 17.02 and 25.02 (within 2 sds of the mean)
99.7% of the data is between 15.02 and 27.02 (within 3 sds of the mean)

*** BUT 25 != 25.02 ***

"""

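The mismatch between 25 and 25.02 can be quantified: the z-score of 25 is 1.99 rather than exactly 2, so the empirical rule is only an approximation at that cutoff. The sketch below uses the mean and sd from the notes and compares the cumulative areas at both z-scores. Which region the question actually asks for is not shown here, so only the areas below each point are compared.

```python
from statistics import NormalDist

mean = 21.02
sd = 2
value = 25

# z-score of 25: slightly less than 2 standard deviations above the mean.
z = (value - mean) / sd
print(round(z, 4))

std_normal = NormalDist()
area_below_value = std_normal.cdf(z)
area_below_2sd = std_normal.cdf(2)

print(round(area_below_value, 4))
print(round(area_below_2sd, 4))
```

The two areas differ by well under a tenth of a percentage point, so the empirical rule remains a reasonable estimate here.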
6:


7:

Solution:

Code:

mean = 66000
sd = 22000

# How do we determine the z-score with an area of 0.05 above it using Python?

from scipy.stats import norm

# Method 1: Use the percent point function (ppf)
# The ppf takes the cumulative probability to the LEFT.
# Since the area above is 0.05, the area below is 1 - 0.05 = 0.95.
z_score = norm.ppf(0.95)

# Method 2: Use the inverse survival function (isf)
# The isf directly takes the upper tail probability.
z_score_alt = norm.isf(0.05)

print(f"z-score (ppf): {z_score}")     # 1.6448536269514729
print(f"z-score (isf): {z_score_alt}") # 1.6448536269514729

print("--- Using standard \"statistics\" Package ---")

from statistics import NormalDist

# Standard normal distribution (mu=0, sigma=1)
z = NormalDist().inv_cdf(0.95)
print(z)  # 1.6448536269514722

x = mean + z * sd
print(f"Value corresponding to z-score: {x}")  # approximately 102186.78
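
The upper-tail lookup can be wrapped in a small helper so the "1 minus the tail" step is not forgotten. This is a sketch, not part of the original solution; value_with_upper_tail is a hypothetical name, and only the standard library's NormalDist is used.

```python
from statistics import NormalDist

def value_with_upper_tail(mean, sd, tail_area):
    """Return x such that the area above x under Normal(mean, sd) is tail_area."""
    return NormalDist(mu=mean, sigma=sd).inv_cdf(1 - tail_area)

x = value_with_upper_tail(66000, 22000, 0.05)
print(round(x, 2))

# Round-trip check: the area above x should come back as about 0.05.
area_above = 1 - NormalDist(mu=66000, sigma=22000).cdf(x)
print(round(area_above, 4))
```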



8:

Code:

mean = 87
sd = 8

l = 104.6
h = 108.2

# Convert each value to a z-score, then look up the area below it using
# NormalDist from Python's standard statistics package.

from statistics import NormalDist

# z-score: how many standard deviations the value lies above the mean
z_l = (l - mean) / sd

print(f"z-score for {l}: {z_l:.4f}")

# The cumulative distribution function (CDF) gives the probability that a
# value from the distribution is less than or equal to a given point.
# NormalDist() with no arguments is the standard normal (mu=0, sigma=1).
cdf_l = NormalDist().cdf(z_l)
print(f"Percentage of data below {l}: {cdf_l * 100:.4f}%")


z_h = (h - mean) / sd
cdf_h = NormalDist().cdf(z_h)
print(f"z-score for {h}: {z_h:.4f}")
print(f"Percentage of data below {h}: {cdf_h * 100:.4f}%")

answer = cdf_h - cdf_l
print(f"Percentage of data between {l} and {h}: {answer * 100:.4f}%")
print(f"Proportion of data between {l} and {h}: {answer:.4f}")

output = """
z-score for 104.6: 2.2000
Percentage of data below 104.6: 98.6097%
z-score for 108.2: 2.6500
Percentage of data below 108.2: 99.5975%
Percentage of data between 104.6 and 108.2: 0.9879%
Proportion of data between 104.6 and 108.2: 0.0099
"""
print()
print("--- CORRECT OUTPUT ---")
print(output)
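
The proportion above can also be checked by simulation: draw many values from the same normal distribution and count how many fall between the two bounds. A rough sketch; the sample size and seed are arbitrary choices, and the simulated figure will only match the exact one to a few decimal places.

```python
from statistics import NormalDist

dist = NormalDist(mu=87, sigma=8)
low, high = 104.6, 108.2

# Exact proportion from the CDF, as in the code above.
exact = dist.cdf(high) - dist.cdf(low)

# Draw reproducible random samples and count how many land in the interval.
n = 200_000
samples = dist.samples(n, seed=42)
simulated = sum(low <= s <= high for s in samples) / n

print(round(exact, 4))
print(round(simulated, 4))
```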


9:

10:


Tags: Mathematical Foundations for Data Science, Data Analytics