Wednesday, June 10, 2026

Quiz on "Modeling data distributions" (Unit 4, Jun 2026)

1:

Solution:

Code:

"""
Revenue:
Mean: 500
Stdev: 125

Fixed Monthly Costs: 225

Profit = Revenue - Fixed Monthly Costs
So MeanProfit = MeanRevenue - Fixed Monthly Costs = 500 - 225 = 275
StdevProfit = StdevRevenue = 125 (subtracting a constant shifts the mean but leaves the spread unchanged)

"""
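
The rule above (a constant shift moves the mean but not the standard deviation) can be checked numerically. This is a minimal sketch: the five revenue values are made up for illustration and are not part of the quiz; only the fixed cost of 225 comes from the problem.

```python
import math
from statistics import mean, pstdev

# Hypothetical monthly revenue figures, chosen only for illustration.
revenue = [350, 425, 500, 575, 650]
fixed_costs = 225  # from the problem statement

# Profit is revenue shifted down by a constant.
profit = [r - fixed_costs for r in revenue]

mean_shift = mean(profit) - mean(revenue)  # how far the mean moved
sd_revenue = pstdev(revenue)
sd_profit = pstdev(profit)

print(f"Mean shift: {mean_shift}")
print(f"Same standard deviation: {math.isclose(sd_revenue, sd_profit)}")
```

The mean moves by exactly the fixed cost, while the standard deviation is the same for both lists.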

2:

Code:

b_small = 1  # For x less than 2
h_small = 0.5
area_small = b_small * h_small / 2  # 0.25

b_full = 4
h_full = 0.5
area_full = b_full * h_full / 2  # 1.0

proportion = area_small / area_full  # 0.25 / 1 = 0.25
print(proportion)

3:

Solution:

4:


5:

Solution:

Code:

"""
** BRAINSTORMING **

mean = 21.02
sd = 2

mean_1sd_less = mean - sd = 21.02 - 2 = 19.02
mean_1sd_more = mean + sd = 21.02 + 2 = 23.02

mean_2sd_less = mean - 2*sd = 21.02 - 4 = 17.02
mean_2sd_more = mean + 2*sd = 21.02 + 4 = 25.02


Empirical Rule: 
68% of the data is between 19.02 and 23.02 (within 1 sd of the mean)
95% of the data is between 17.02 and 25.02 (within 2 sds of the mean)
99.7% of the data is between 15.02 and 27.02 (within 3 sds of the mean)

*** BUT 25 != 25.02 ***

"""
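
The empirical-rule percentages are rounded values, and 25 is not exactly 2 standard deviations above the mean. The sketch below uses `statistics.NormalDist` to compute the exact areas, assuming the data are normally distributed with the mean and sd from the notes. The cutoff of 25 is taken from the note above; the quiz question itself is not reproduced here.

```python
from statistics import NormalDist

mean = 21.02
sd = 2
dist = NormalDist(mu=mean, sigma=sd)

# Exact share of the distribution within k standard deviations of the mean.
shares = {}
for k in (1, 2, 3):
    low, high = mean - k * sd, mean + k * sd
    shares[k] = dist.cdf(high) - dist.cdf(low)
    print(f"within {k} sd: {low:.2f} to {high:.2f} -> {shares[k]:.4f}")

# Area below 25 compared with the area below the 2-sd boundary (25.02).
below_25 = dist.cdf(25)
below_2sd = dist.cdf(mean + 2 * sd)
print(f"below 25: {below_25:.4f}, below 25.02: {below_2sd:.4f}")
```

The two areas differ only slightly, which shows how much error is introduced by treating 25 as the 2-sd boundary.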

6:


7:

Solution:

Code:

mean = 66000
sd = 22000

# How do we determine the z-score with an area of 0.05 above it using Python?

from scipy.stats import norm

# Method 1: Use the percent point function (ppf)
# The ppf takes the cumulative probability to the LEFT.
# Since the area above is 0.05, the area below is 1 - 0.05 = 0.95.
z_score = norm.ppf(0.95)

# Method 2: Use the inverse survival function (isf)
# The isf directly takes the upper tail probability.
z_score_alt = norm.isf(0.05)

print(f"z-score (ppf): {z_score}")     # 1.6448536269514729
print(f"z-score (isf): {z_score_alt}") # 1.6448536269514729

print("--- Using standard \"statistics\" Package ---")

from statistics import NormalDist

# Standard normal distribution (mu=0, sigma=1)
z = NormalDist().inv_cdf(0.95)
print(z)  # 1.6448536269514722

x = mean + z * sd
print(f"Value corresponding to z-score: {x}")  # approximately 102186.78
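
As a cross-check on the z-score route, `NormalDist` can also be built with the problem's own mean and sd, so that `inv_cdf` returns the cutoff value directly. This is a small sketch of that alternative; the variable names `x_direct` and `x_via_z` are introduced here for illustration.

```python
from statistics import NormalDist

mean = 66000
sd = 22000

# Value with 5% of the distribution above it (95% below), computed directly.
x_direct = NormalDist(mu=mean, sigma=sd).inv_cdf(0.95)

# Same value via the standard normal z-score, then unstandardized.
x_via_z = mean + NormalDist().inv_cdf(0.95) * sd

print(f"Direct: {x_direct:.2f}")
print(f"Via z-score: {x_via_z:.2f}")
```

Both routes should agree up to floating-point rounding.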



8:

Code:

mean = 87
sd = 8

l = 104.6
h = 108.2

# How do we determine the z-scores corresponding to these values using Python's standard statistics package?

from statistics import NormalDist

# Standardize the lower bound: z = (x - mean) / sd
z_l = (l - mean) / sd

print(f"z-score for {l}: {z_l:.4f}")
# How do we determine the percentage of data below l using Python's standard statistics package?

# We can use the cumulative distribution function (CDF) of the normal distribution.
# The CDF gives us the probability that a random variable from the distribution is less than or equal to a certain value.

# NormalDist() with no arguments is the standard normal distribution (mu=0, sigma=1),
# so its CDF can be evaluated directly at the z-score.
cdf_l = NormalDist().cdf(z_l)
print(f"Percentage of data below {l}: {cdf_l * 100:.4f}%")


z_h = (h - mean) / sd
cdf_h = NormalDist().cdf(z_h)
print(f"z-score for {h}: {z_h:.4f}")
print(f"Percentage of data below {h}: {cdf_h * 100:.4f}%")

answer = cdf_h - cdf_l
print(f"Percentage of data between {l} and {h}: {answer * 100:.4f}%")
print(f"Proportion of data between {l} and {h}: {answer:.4f}")

output = """
z-score for 104.6: 2.2000
Percentage of data below 104.6: 98.6097%
z-score for 108.2: 2.6500
Percentage of data below 108.2: 99.5975%
Percentage of data between 104.6 and 108.2: 0.9879%
Proportion of data between 104.6 and 108.2: 0.0099
"""
print()
print("--- CORRECT OUTPUT ---")
print(output)
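
The same proportion can be obtained without computing z-scores by hand, by giving `NormalDist` the mean and sd from the problem. This is a minimal sketch of that shortcut; the names `scores`, `low`, and `high` are chosen here for illustration.

```python
from statistics import NormalDist

# Normal distribution with the mean and sd given in the problem.
scores = NormalDist(mu=87, sigma=8)
low, high = 104.6, 108.2

# Area between the two values is the difference of the two CDF values.
proportion = scores.cdf(high) - scores.cdf(low)
print(f"Proportion of data between {low} and {high}: {proportion:.4f}")
```

It should match the proportion found through the z-scores, since standardizing does not change the area.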


9:

10:

Tags: Mathematical Foundations for Data Science, Data Analytics
