Equal-frequency partitioning
Suppose a group of 12 sales price records has been sorted as follows:
5, 10, 11, 13, 15, 35, 50, 55, 72, 92, 204, 215
Partition them into three bins by equal-frequency partitioning. Each bin receives 12/3 = 4 records:
bin 1: 5, 10, 11, 13
bin 2: 15, 35, 50, 55
bin 3: 72, 92, 204, 215

Equal-width partitioning
Partition the same 12 records into three bins by equal-width partitioning. The width of each interval is (215 - 5)/3 = 70, so the intervals are [5, 75], (75, 145], and (145, 215]:
bin 1: 5, 10, 11, 13, 15, 35, 50, 55, 72
bin 2: 92
bin 3: 204, 215
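The two hand calculations above can be reproduced in plain Python. This is a minimal sketch, not a library routine: the helper names `equal_frequency_bins` and `equal_width_bins` are hypothetical, the input is assumed to be sorted, and the equal-frequency helper assumes the number of records divides evenly by the number of bins.

```python
import math

prices = [5, 10, 11, 13, 15, 35, 50, 55, 72, 92, 204, 215]


def equal_frequency_bins(sorted_values, k):
    """Split sorted values into k bins holding the same number of records.

    Assumes len(sorted_values) is divisible by k (hypothetical helper).
    """
    size = len(sorted_values) // k
    return [sorted_values[i * size:(i + 1) * size] for i in range(k)]


def equal_width_bins(sorted_values, k):
    """Split sorted values into k intervals of equal width.

    Intervals are right-closed, and the minimum is placed in the first bin.
    """
    lo, hi = sorted_values[0], sorted_values[-1]
    width = (hi - lo) / k
    bins = [[] for _ in range(k)]
    for v in sorted_values:
        # ceil maps a value on a right edge into the interval it closes
        idx = 0 if v == lo else math.ceil((v - lo) / width) - 1
        bins[min(idx, k - 1)].append(v)
    return bins


freq_bins = equal_frequency_bins(prices, 3)
width_bins = equal_width_bins(prices, 3)
print(freq_bins)
print(width_bins)
```

Equal-frequency bins balance the record counts but can span very different value ranges, while equal-width bins keep the ranges uniform and let the counts vary, as the 9/1/2 split shows.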
In [8]:
import pandas as pd
In [1]:
l = [5, 10, 11, 13, 15, 35, 50, 55, 72, 92, 204, 215]
In [3]:
s = pd.Series(l)
In [4]:
s.nunique()
Out[4]:
12
In [7]:
pd.cut(s, 4) # Equal width binning
Out[7]:
0        (4.79, 57.5]
1        (4.79, 57.5]
2        (4.79, 57.5]
3        (4.79, 57.5]
4        (4.79, 57.5]
5        (4.79, 57.5]
6        (4.79, 57.5]
7        (4.79, 57.5]
8       (57.5, 110.0]
9       (57.5, 110.0]
10     (162.5, 215.0]
11     (162.5, 215.0]
dtype: category
Categories (4, interval[float64, right]): [(4.79, 57.5] < (57.5, 110.0] < (110.0, 162.5] < (162.5, 215.0]]
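For example, with 3 bins instead of 4, the bin edges would be 4.79, 75, 145, and 215. pd.cut lowers the first edge slightly below the minimum (here from 5 to 4.79) so that the minimum value falls inside the first right-closed interval. The resulting bin widths are: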
In [6]:
print(75-4.79)
print(145-75)
print(215-145)
70.21
70
70
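The three-bin case can also be checked against pandas directly. The sketch below assumes the same 12 prices and uses `retbins=True`, which makes `pd.cut` return the computed edges alongside the binned values; the variable names `prices`, `binned`, `edges`, and `counts` are illustrative only.

```python
import pandas as pd

prices = pd.Series([5, 10, 11, 13, 15, 35, 50, 55, 72, 92, 204, 215])

# retbins=True returns the array of bin edges as a second value
binned, edges = pd.cut(prices, 3, retbins=True)

# Count the records in each interval, ordered by interval
counts = binned.value_counts().sort_index()

print(edges)
print(counts)
```

The counts should match the hand-computed equal-width bins (9, 1, and 2 records), and the first edge should sit slightly below the minimum of 5.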
In [9]:
pd.qcut(s, 4) # Equal frequency binning
Out[9]:
0     (4.999, 12.5]
1     (4.999, 12.5]
2     (4.999, 12.5]
3      (12.5, 42.5]
4      (12.5, 42.5]
5      (12.5, 42.5]
6      (42.5, 77.0]
7      (42.5, 77.0]
8      (42.5, 77.0]
9     (77.0, 215.0]
10    (77.0, 215.0]
11    (77.0, 215.0]
dtype: category
Categories (4, interval[float64, right]): [(4.999, 12.5] < (12.5, 42.5] < (42.5, 77.0] < (77.0, 215.0]]
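The interior edges in the pd.qcut output (12.5, 42.5, 77.0) are the quartiles of the data. A minimal sketch, assuming the same 12 prices and pandas' default linear interpolation for quantiles; the names `prices`, `quartiles`, and `counts` are illustrative.

```python
import pandas as pd

prices = pd.Series([5, 10, 11, 13, 15, 35, 50, 55, 72, 92, 204, 215])

# Quartiles of the data, using the default linear interpolation
quartiles = prices.quantile([0.25, 0.5, 0.75])

# Each of the four quantile-based bins should hold the same number of records
counts = pd.qcut(prices, 4).value_counts().sort_index()

print(quartiles)
print(counts)
```

Because the edges are interpolated quantiles rather than fixed cut points in the sorted list, the four bins here hold 3 records each, and an edge such as 12.5 falls between two observed values.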