Showing posts with label Data Analytics. Show all posts

Tuesday, June 30, 2026

Unit 5 - Exploring bivariate numerical data (2026 Jun 21)


See All: Questions For Statistics From Khan Academy
« Previously    Next »
#1

xmean = 24.1
ymean = 12.9
sx = 12
sy = 16.2
r = 0.9

# Least-squares line from summary statistics: slope m, intercept c
m = r * (sy/sx)
c = ymean - m*xmean
print("m, c:", round(m, 3), round(c, 3))
# KA's Answers: 1.22, -16.38

#2
#3
#4
#5
#6
#7
#8
#9
#10
#11
#12
#13


Sunday, June 14, 2026

Quiz on "Modeling data distributions" (Unit 4, Jun 14th 2026)


1:

Code:
mean = 170.4
sd = 10

l = 145
lz = (l - mean) / sd
print(lz)

import statistics
lz_area = statistics.NormalDist(mu=0, sigma=1).cdf(lz)
print(lz_area)

h = 171
hz = (h - mean) / sd
print(hz)

hz_area = statistics.NormalDist().cdf(hz)

area_req = round(hz_area - lz_area,4)
print(area_req)
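
The same area can also be read off without converting to z-scores by hand. A minimal sketch, assuming the question asks for the proportion between 145 and 171; the helper name `area_between` is made up for illustration:

```python
from statistics import NormalDist

def area_between(low, high, mean, sd):
    """Proportion of a normal(mean, sd) distribution lying between low and high."""
    dist = NormalDist(mu=mean, sigma=sd)
    return dist.cdf(high) - dist.cdf(low)

# Same numbers as in the code above
area = area_between(145, 171, mean=170.4, sd=10)
print(round(area, 4))
```

Passing mu and sigma to NormalDist does the standardising internally, so the result should match area_req above.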



2:


mean = 80
sd = 9

proportion = 0.4

import statistics
z = statistics.NormalDist().inv_cdf(proportion)

print(z)

x = z * sd + mean

print(x)
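
As a cross-check, the z-score step can be skipped by giving NormalDist the mean and sd directly. A small sketch using the same numbers; the variable names are only illustrative:

```python
from statistics import NormalDist

mean = 80
sd = 9
proportion = 0.4

# Route 1: standard normal z-score, then un-standardise
x_via_z = NormalDist().inv_cdf(proportion) * sd + mean

# Route 2: let NormalDist handle mean and sd itself
x_direct = NormalDist(mu=mean, sigma=sd).inv_cdf(proportion)

print(round(x_via_z, 4), round(x_direct, 4))
```

Both routes should print the same value, a little below the mean because 0.4 is less than 0.5.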



3:

Code:
mean = 13.1
sd = 1.5

sd1 = (mean - sd, mean + sd)
print(sd1)

sd2 = (mean - 2 * sd, mean + 2 * sd)
print(sd2)

sd3 = (mean - 3 * sd, mean + 3 * sd)
print(sd3)

sd2_area = 0.95
sd3_area = 0.997

area_req = (sd3_area - sd2_area) / 2

print(area_req)

percentage_wise = round(area_req * 100, 4)
print(percentage_wise)

out = """
(11.6, 14.6)
(10.1, 16.1)
(8.6, 17.6)
0.02350000000000002
2.35
"""
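
The 2.35% above comes from the rounded empirical rule (95% and 99.7%). For comparison, here is a sketch of the same band (between 2 and 3 sds above the mean) computed from the exact normal CDF. This is only a sanity check; the quiz most likely expects the empirical-rule figure.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mu=0, sigma=1

# Empirical-rule estimate, as in the code above
empirical = (0.997 - 0.95) / 2

# Exact area between z = 2 and z = 3
exact = std_normal.cdf(3) - std_normal.cdf(2)

print(round(empirical, 4), round(exact, 4))
```

The exact area comes out near 0.0214, slightly below the empirical-rule 0.0235, because 95% and 99.7% are themselves rounded.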



4:

Code:
b = 2
h = 0.6

area = 0.5 * b * h

percentage_of_area = area * 100

print(percentage_of_area)



5:


Code:
mean_sales = 8000
sd_sales = 1500

mean_salary = 2000 + 0.3 * mean_sales

sd_salary = sd_sales * 0.3

print("mean_salary, sd_salary")
print(mean_salary, sd_salary)
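
The rule used here is that for Y = a + b*X, the mean becomes a + b*mean(X) and the sd becomes |b|*sd(X); the added constant does not affect the spread. A minimal sketch of that rule as a function (the function name is made up for illustration):

```python
def linear_transform_stats(mean, sd, intercept, slope):
    """Return (mean, sd) of intercept + slope * X, given the mean and sd of X."""
    new_mean = intercept + slope * mean
    new_sd = abs(slope) * sd  # spread scales by |slope|; the intercept drops out
    return new_mean, new_sd

salary_mean, salary_sd = linear_transform_stats(8000, 1500, intercept=2000, slope=0.3)
print(salary_mean, salary_sd)
```

The abs() matters only for a negative slope, where the sd must still be non-negative.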



6:




7:


area = 1
b = 6
h = area * 2 / b
print(h)



8:

def area_of_trapezium(b1, b2, h):
    return 0.5 * (b1 + b2) * h

b1 = 0.5
b2 = 0.75
h = 1

a = area_of_trapezium(b1, b2, h)
print(a)

print(round(a*100, 4))


b1 = 0.25
b2 = 0.5
h = 1

a = area_of_trapezium(b1, b2, h)
print(a)

print(round(a*100, 4))



9:




10:


mean = 1497
sd = 322

proportion = 0.85

import statistics
z = statistics.NormalDist().inv_cdf(proportion)

x = z * sd + mean

print(round(x, 4))

Tags: Python, Mathematical Foundations for Data Science, Data Analytics

Wednesday, June 10, 2026

Quiz on "Modeling data distributions" (Unit 4, Jun 10th 2026)



1:

Solution:

Code:

"""
Revenue:
Mean: 500
Stdev: 125

Fixed Monthly Costs: 225

Profit = Revenue - Fixed Monthly Costs
So MeanProfit = MeanRevenue - Fixed Monthly Costs = 500 - 225 = 275
StdevProfit = StdevRevenue = 125

"""
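
A quick numeric check of the reasoning above: subtracting a constant shifts the mean but leaves the standard deviation unchanged. The revenue figures below are made up purely for illustration and are not from the question.

```python
from statistics import mean, pstdev

# Hypothetical monthly revenues (illustrative only)
revenues = [350, 425, 500, 575, 650]
fixed_costs = 225

profits = [r - fixed_costs for r in revenues]

print("revenue:", mean(revenues), pstdev(revenues))
print("profit: ", mean(profits), pstdev(profits))
```

The two means differ by exactly the fixed cost, while the two standard deviations are the same.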

2:

Code:

b_small = 1  # For x less than 2

h_small = 0.5

area_small = b_small * h_small / 2  # 0.25

b_full = 4
h_full = 0.5
area_full = b_full * h_full / 2  # 1.0

# Fraction of the total area: 0.25 / 1 = 0.25
print(area_small / area_full)

3:

Solution:

4:


5:

Solution:

Code:

"""
** BRAINSTORMING **

mean = 21.02
sd = 2

mean_1sd_less = mean - sd = 21.02 - 2 = 19.02
mean_1sd_more = mean + sd = 21.02 + 2 = 23.02

mean_2sd_less = mean - 2*sd = 21.02 - 4 = 17.02
mean_2sd_more = mean + 2*sd = 21.02 + 4 = 25.02


Empirical Rule: 
68% of the data is between 19.02 and 23.02 (within 1 sd of the mean)
95% of the data is between 17.02 and 25.02 (within 2 sds of the mean)
99.7% of the data is between 15.02 and 27.02 (within 3 sds of the mean)

*** BUT 25 != 25.02 ***

"""
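
Since 25 is not exactly mean + 2*sd, one way around the mismatch is to compute the area from the normal CDF instead of the empirical rule. A sketch, assuming the question is about the proportion above 25 (the question text is not reproduced here, so treat that as a guess):

```python
from statistics import NormalDist

mean = 21.02
sd = 2

dist = NormalDist(mu=mean, sigma=sd)

z_25 = (25 - mean) / sd          # about 1.99, not exactly 2
above_exact = 1 - dist.cdf(25)   # upper-tail area beyond 25
above_rule = (1 - 0.95) / 2      # empirical-rule tail beyond mean + 2*sd

print(round(z_25, 2), round(above_exact, 4), round(above_rule, 4))
```

The two tail areas are close, so rounding 25.02 to 25 and applying the empirical rule gives roughly the same answer.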

6:


7:

Solution:

Code:

mean = 66000
sd = 22000

# How do we determine the z-score with an area of 0.05 above it using Python?

from scipy.stats import norm

# Method 1: Use the percent point function (ppf)
# The ppf takes the cumulative probability to the LEFT.
# Since the area above is 0.05, the area below is 1 - 0.05 = 0.95.
z_score = norm.ppf(0.95)

# Method 2: Use the inverse survival function (isf)
# The isf directly takes the upper tail probability.
z_score_alt = norm.isf(0.05)

print(f"z-score (ppf): {z_score}")     # 1.6448536269514729
print(f"z-score (isf): {z_score_alt}") # 1.6448536269514729

print("--- Using standard \"statistics\" Package ---")

from statistics import NormalDist

# Standard normal distribution (mu=0, sigma=1)
z = NormalDist().inv_cdf(0.95)
print(z)  # 1.6448536269514722

x = mean + z * sd
print(f"Value corresponding to z-score: {x}")  # approximately 102186.78 (66000 + 1.6449 * 22000)



8:

Code:

mean = 87
sd = 8

l = 104.6
h = 108.2

# How do we determine the z-scores corresponding to these values using Python's standard statistics package?

from statistics import NormalDist
# Standard normal distribution (mu=0, sigma=1)
z_l = (l - mean) / sd

print(f"z-score for {l}: {z_l:.4f}")
# How do we determine the percentage of data below l using Python's standard statistics package?

# We can use the cumulative distribution function (CDF) of the normal distribution.
# The CDF gives us the probability that a random variable from the distribution is less than or equal to a certain value.

# Method 1: Use the cumulative distribution function (CDF)
cdf_l = NormalDist().cdf(z_l)
print(f"Percentage of data below {l}: {cdf_l * 100:.4f}%")


z_h = (h - mean) / sd
cdf_h = NormalDist().cdf(z_h)
print(f"z-score for {h}: {z_h:.4f}")
print(f"Percentage of data below {h}: {cdf_h * 100:.4f}%")

answer = cdf_h - cdf_l
print(f"Percentage of data between {l} and {h}: {answer * 100:.4f}%")
print(f"Proportion of data between {l} and {h}: {answer:.4f}")

output = """
z-score for 104.6: 2.2000
Percentage of data below 104.6: 98.6097%
z-score for 108.2: 2.6500
Percentage of data below 108.2: 99.5975%
Percentage of data between 104.6 and 108.2: 0.9879%
Proportion of data between 104.6 and 108.2: 0.0099
"""
print()
print("--- CORRECT OUTPUT ---")
print(output)


9:

10:


Tags: Mathematical Foundations for Data Science, Data Analytics