Friday, May 12, 2023

Python 'math' Module, 'statistics' Module and Descriptive statistics using Pandas, NumPy, SciPy and StatsModels

Note: In this article we discuss three things:
1. math Module 
2. statistics Module 
3. Descriptive statistics using Pandas, NumPy, SciPy and StatsModels

Python math Module

Python has a built-in module that you can use for mathematical tasks.

The math module provides a set of functions (listed below as methods) and constants.


Math Methods

Method Description
math.acos() Returns the arc cosine of a number
math.acosh() Returns the inverse hyperbolic cosine of a number
math.asin() Returns the arc sine of a number
math.asinh() Returns the inverse hyperbolic sine of a number
math.atan() Returns the arc tangent of a number in radians
math.atan2() Returns the arc tangent of y/x in radians
math.atanh() Returns the inverse hyperbolic tangent of a number
math.ceil() Rounds a number up to the nearest integer
math.comb() Returns the number of ways to choose k items from n items without repetition and without regard to order
math.copysign() Returns a float consisting of the value of the first parameter and the sign of the second parameter
math.cos() Returns the cosine of a number
math.cosh() Returns the hyperbolic cosine of a number
math.degrees() Converts an angle from radians to degrees
math.dist() Returns the Euclidean distance between two points p and q, each given as a sequence of coordinates
math.erf() Returns the error function of a number
math.erfc() Returns the complementary error function of a number
math.exp() Returns e raised to the power of x
math.expm1() Returns e**x - 1
math.fabs() Returns the absolute value of a number
math.factorial() Returns the factorial of a number
math.floor() Rounds a number down to the nearest integer
math.fmod() Returns the remainder of x/y
math.frexp() Returns the mantissa and exponent of a number as the pair (m, e), where x == m * 2**e
math.fsum() Returns the sum of all items in any iterable (tuples, arrays, lists, etc.)
math.gamma() Returns the gamma function at x
math.gcd() Returns the greatest common divisor of two integers
math.hypot() Returns the Euclidean norm (the distance from the origin) of the given coordinates
math.isclose() Checks whether two values are close to each other, or not
math.isfinite() Checks whether a number is finite or not
math.isinf() Checks whether a number is infinite or not
math.isnan() Checks whether a value is NaN (not a number) or not
math.isqrt() Returns the integer square root of a non-negative integer, that is, the floor of its exact square root
math.ldexp() Returns x * (2**i) for the given numbers x and i; this is the inverse of math.frexp()
math.lgamma() Returns the log gamma value of x
math.log() Returns the natural logarithm of a number, or the logarithm of the number to a given base
math.log10() Returns the base-10 logarithm of x
math.log1p() Returns the natural logarithm of 1+x
math.log2() Returns the base-2 logarithm of x
math.perm() Returns the number of ways to choose k items from n items with order and without repetition
math.pow() Returns the value of x to the power of y
math.prod() Returns the product of all the elements in an iterable
math.radians() Converts a degree value into radians
math.remainder() Returns the IEEE 754-style remainder of x with respect to y, that is, x - n*y where n is the integer closest to x/y
math.sin() Returns the sine of a number
math.sinh() Returns the hyperbolic sine of a number
math.sqrt() Returns the square root of a number
math.tan() Returns the tangent of a number
math.tanh() Returns the hyperbolic tangent of a number
math.trunc() Returns a number truncated to its integer part (rounded toward zero)
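
A few of the entries above are easy to mix up, so here is a minimal sketch of how they behave. The input values are made up for illustration and are not from any particular dataset.

```python
import math

# Adding floats one by one accumulates rounding error.
values = [0.1] * 10
running_total = 0.0
for v in values:
    running_total += v              # ends up slightly below 1.0

# fsum tracks the lost low-order bits and returns the correctly rounded sum.
accurate_total = math.fsum(values)  # 1.0

# isclose compares floats with a tolerance instead of exact equality.
close_enough = math.isclose(running_total, 1.0)  # True

# isqrt works on integers and returns the floor of the exact square root.
int_root = math.isqrt(17)           # 4

# fmod keeps the sign of x; remainder uses the integer nearest to x / y.
fmod_result = math.fmod(7, 4)            # 3.0
remainder_result = math.remainder(7, 4)  # -1.0, because 7 - 2*4 = -1

# trunc rounds toward zero; floor rounds toward negative infinity.
truncated = math.trunc(-2.7)        # -2
floored = math.floor(-2.7)          # -3

print(running_total, accurate_total, close_enough)
print(int_root, fmod_result, remainder_result, truncated, floored)
```

The difference between fmod() and remainder(), and between trunc() and floor(), only shows up when the result sits closer to the next multiple or when the input is negative, which is why the sketch uses those cases.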

Math Constants

Constant Description
math.e Returns Euler's number (2.7182...)
math.inf Returns a floating-point positive infinity
math.nan Returns a floating-point NaN (Not a Number) value
math.pi Returns PI (3.1415...)
math.tau Returns tau (6.2831...)

Some of these functions come up especially often in our work:

math.ceil(): Rounds a number up to the nearest integer
math.floor(): Rounds a number down to the nearest integer
math.factorial(): Returns the factorial of a number
math.comb(): Returns the number of ways to choose k items from n items without repetition and without regard to order
math.degrees(): Converts an angle from radians to degrees
math.radians(): Converts a degree value into radians
math.gcd(): Returns the greatest common divisor of two integers
math.dist(): Returns the Euclidean distance between two points p and q, each given as a sequence of coordinates
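
As a quick illustration of the frequently used functions listed above, here is a small sketch. The arguments are arbitrary examples; math.comb() and math.dist() assume Python 3.8 or newer.

```python
import math

rounded_up = math.ceil(4.2)            # 5
rounded_down = math.floor(4.8)         # 4
five_factorial = math.factorial(5)     # 120
pairs_from_five = math.comb(5, 2)      # 10 ways to pick 2 items out of 5

half_turn = math.degrees(math.pi)      # pi radians is 180 degrees
right_angle = math.radians(90)         # 90 degrees is pi/2 radians

common_divisor = math.gcd(12, 18)      # 6
distance = math.dist((0, 0), (3, 4))   # 3-4-5 triangle, so 5.0

print(rounded_up, rounded_down, five_factorial, pairs_from_five)
print(half_turn, right_angle, common_divisor, distance)
```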

Python statistics Module

Averages and measures of central location

These functions calculate an average or typical value from a population or sample.

mean()

Arithmetic mean (“average”) of data.

fmean()

Fast, floating point arithmetic mean, with optional weighting.

geometric_mean()

Geometric mean of data.

harmonic_mean()

Harmonic mean of data.

median()

Median (middle value) of data.

median_low()

Low median of data.

median_high()

High median of data.

median_grouped()

Median, or 50th percentile, of grouped data.

mode()

Single mode (most common value) of discrete or nominal data.

multimode()

List of modes (most common values) of discrete or nominal data.

quantiles()

Divide data into intervals with equal probability.
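
The functions in this group are easiest to tell apart on a tiny example. The sketch below uses two made-up lists (values and votes) plus the list of numbers used in the code later in this article; the variable names are our own choice.

```python
import statistics as st

values = [1, 2, 4, 8]

arithmetic = st.mean(values)           # 3.75
fast_float = st.fmean(values)          # 3.75, always returned as a float
geometric = st.geometric_mean(values)  # fourth root of 1*2*4*8, about 2.83
harmonic = st.harmonic_mean(values)    # 4 / (1 + 1/2 + 1/4 + 1/8), about 2.13

# With an even number of points, median() averages the two middle values,
# while median_low() and median_high() return actual data points.
middle = st.median(values)             # 3.0
middle_low = st.median_low(values)     # 2
middle_high = st.median_high(values)   # 4

# mode() returns one value; multimode() returns every tied value.
votes = ["red", "blue", "red", "blue", "green"]
first_mode = st.mode(votes)            # "red" (first tied value, Python 3.8+)
all_modes = st.multimode(votes)        # ["red", "blue"]

# quantiles() defaults to method="exclusive"; method="inclusive" treats the
# data as including the minimum and maximum of the population.
numbers = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25,
           30, 33, 33, 35, 35, 35, 36, 40, 45, 46, 52, 70]
quartiles_exclusive = st.quantiles(numbers, n=4)
quartiles_inclusive = st.quantiles(numbers, n=4, method="inclusive")

print(quartiles_exclusive)  # [20.0, 25.0, 35.25]
print(quartiles_inclusive)  # [20.25, 25.0, 35.0]
```

The inclusive quartiles are the same 20.25, 25.0 and 35.0 that the pandas and NumPy runs later in this article print, so the choice of method explains why the statistics module reports slightly different cut points by default.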

Measures of spread

These functions calculate a measure of how much the population or sample tends to deviate from the typical or average values.

pstdev()

Population standard deviation of data.

pvariance()

Population variance of data.

stdev()

Sample standard deviation of data.

variance()

Sample variance of data.
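
The only difference between the population and sample versions is the divisor: n for the population functions and n - 1 for the sample functions. A minimal sketch, reusing the list of numbers from the code later in this article (the variable names are our own):

```python
import math
import statistics as st

numbers = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25,
           30, 33, 33, 35, 35, 35, 36, 40, 45, 46, 52, 70]
n = len(numbers)

sample_sd = st.stdev(numbers)          # divides by n - 1, about 13.16
population_sd = st.pstdev(numbers)     # divides by n, about 12.90
sample_var = st.variance(numbers)      # about 173.14
population_var = st.pvariance(numbers) # about 166.49

# The two variances differ only by the factor (n - 1) / n.
rescaled = sample_var * (n - 1) / n

print(sample_sd, population_sd)
print(math.isclose(rescaled, population_var))  # True
```

This is the same gap that appears between the pandas output (13.158...) and the NumPy output (12.902...) in the results below.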

Statistics for relations between two inputs

These functions calculate statistics regarding relations between two inputs.

covariance()

Sample covariance for two variables.

correlation()

Pearson's correlation coefficient for two variables.

linear_regression()

Slope and intercept for simple linear regression.

NormalDist

NormalDist is a tool for creating and manipulating normal distributions of a random variable. It is a class that treats the mean and standard deviation of data measurements as a single entity. Normal distributions arise from the Central Limit Theorem and have a wide range of applications in statistics.
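
A minimal sketch of NormalDist in use. The mean of 100 and standard deviation of 15 are illustrative values only, and the second part fits a normal distribution to the list of numbers used in the code below (whether that data is actually normal is not checked here).

```python
import statistics as st
from statistics import NormalDist

# Build a distribution directly from a mean and a standard deviation.
scores = NormalDist(mu=100, sigma=15)
share_below_115 = scores.cdf(115)      # about 0.841 (one sigma above the mean)
median_score = scores.inv_cdf(0.5)     # the median of a normal equals its mean

# Estimate mu and sigma from sample data.
numbers = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25,
           30, 33, 33, 35, 35, 35, 36, 40, 45, 46, 52, 70]
fitted = NormalDist.from_samples(numbers)

print(share_below_115, median_score)
print(fitted.mean, fitted.stdev)       # sample mean and sample standard deviation
```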

l = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30, 33, 33, 35, 35, 35, 36, 40, 45, 46, 52, 70]

# Sum of all elements

print(sum(l))

# Count of each item

from collections import Counter 
print(Counter(l))

# Mean

import statistics as st

print(st.mean(l))

print("Median:", st.median(l))

# Mode

print(st.mode(l))

# Mid-range 

print(st.mean([max(l), min(l)]))

# Other statistical measures

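print(st.quantiles(l, n=4))  # [20.0, 25.0, 35.25] with the default method="exclusive"
print(st.stdev(l))     # sample standard deviation (divides by n - 1)
print(st.variance(l))  # sample variance (divides by n - 1)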

import pandas
l = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30, 33, 33, 35, 35, 35, 36, 40, 45, 46, 52, 70]
df = pandas.DataFrame(l, columns=['Numbers'])
total = df['Numbers'].sum()  # named total so the built-in sum() is not shadowed
count_val = df['Numbers'].value_counts()
mode = df['Numbers'].mode().values.tolist()
midrange = (df['Numbers'].max() + df['Numbers'].min()) / 2

print("The sum of the given data using pandas ", total)
print("\nThe count of values \n", count_val)
print("\nMean of the given data using pandas ", df['Numbers'].mean())
print("\nMedian of the given data using pandas ", df['Numbers'].median())
print("\nMode of the given data using pandas ", mode[0])
print("\nMidrange of the given data using pandas:", midrange)
print("\nStandard deviation for given data using pandas:", df['Numbers'].std())
print("\nVariance for given data using pandas:", df['Numbers'].var())
print("\nQuantiles\n", df['Numbers'].quantile([0.25,0.50,0.75]))
print("\n\n")

import numpy as np

data=np.array(l)
print("Using NumPy\n")
unique_values, counts = np.unique(data, return_counts=True)
quantiles=np.percentile(data,[25,50,75])
print("Sum ",np.sum(data))
print("\nCount of values \n")
for value, count in zip(unique_values, counts):
    print( value, count)   
print("Mean :",np.mean(data))
print("\nMedian:",np.median(data))
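# bincount only accepts non-negative integers; argmax picks the smallest of the most frequent values
print("\nMode:",np.argmax(np.bincount(data)))    
# np.std and np.var default to ddof=0 (population formulas), so these values
# are smaller than the pandas results, which use ddof=1 (sample formulas)
print("\nStandard deviation",np.std(data))
print("\nVariance :",np.var(data))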
print("\nQuantiles \n")
print(quantiles[0],quantiles[1],quantiles[2])
print("\n\n")


from scipy import stats 

print("Using SciPy\n")
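# keepdims=True (SciPy 1.9+) keeps the result array-shaped so that mode.mode[0] works.
# Calling stats.mode(data) without it triggers the FutureWarning shown in the
# output below on older SciPy and returns a scalar from SciPy 1.11 onward.
mode=stats.mode(data, keepdims=True)
print("Mode: ", mode.mode[0])

# For counts: scipy.stats.itemfreq() has been removed from recent SciPy releases;
# use np.unique(data, return_counts=True) as in the NumPy section above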
# Other statistical measures similar to numpy



$ python statistical_summary.py 
774
Counter({25: 4, 35: 3, 16: 2, 20: 2, 22: 2, 33: 2, 13: 1, 15: 1, 19: 1, 21: 1, 30: 1, 36: 1, 40: 1, 45: 1, 46: 1, 52: 1, 70: 1})
29.76923076923077
Median: 25.0
25
41.5
[20.0, 25.0, 35.25]
13.158442741624686
173.14461538461538
The sum of the given data using pandas  774

The count of values 
25    4
35    3
16    2
20    2
22    2
33    2
13    1
40    1
52    1
46    1
45    1
30    1
36    1
15    1
21    1
19    1
70    1
Name: Numbers, dtype: int64

Mean of the given data using pandas  29.76923076923077

Median of the given data using pandas  25.0

Mode of the given data using pandas  25

Midrange of the given data using pandas: 41.5

Standard deviation for given data using pandas: 13.158442741624686

Variance for given data using pandas: 173.14461538461538

Quantiles
0.25    20.25
0.50    25.00
0.75    35.00
Name: Numbers, dtype: float64



Using NumPy

Sum  774

Count of values 

13 1
15 1
16 2
19 1
20 2
21 1
22 2
25 4
30 1
33 2
35 3
36 1
40 1
45 1
46 1
52 1
70 1
Mean : 29.76923076923077

Median: 25.0

Mode: 25

Standard deviation 12.902914674622618

Variance : 166.4852071005917

Quantiles 

20.25 25.0 35.0



Using SciPy

/home/ashish/Desktop/statistical_summary.py:90: FutureWarning: Unlike other reduction functions (e.g. `skew`, `kurtosis`), the default behavior of `mode` typically preserves the axis it acts along. In SciPy 1.11.0, this behavior will change: the default value of `keepdims` will become False, the `axis` over which the statistic is taken will be eliminated, and the value None will no longer be accepted. Set `keepdims` to True or False to avoid this warning.
    mode=stats.mode(data)
Mode:  25

Tags: Technology, Python, Mathematical Foundations for Data Science
