survival8: One Hot Encoding Using Pandas' get_dummies() Method on Titanic Dataset

Friday, October 28, 2022


Download Data and Code


import pandas as pd
df = pd.read_csv('titanic_train.csv')
print(df.head())



print("Number of Unique Values in The Column 'Sex':")
print(df['Sex'].nunique())


# 2
# This is also the width of its one-hot encoding.

print("Number of Unique Values in The Column 'Pclass' (Passenger Class):")
print(df['Pclass'].nunique())

# 3
# This is also the width of one-hot encoding for 'Passenger Class'.
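The claim that the number of unique values equals the width of the one-hot encoding can be checked directly. This is a minimal sketch that assumes a small toy DataFrame holding the 'Sex' and 'Pclass' values of the first five Titanic rows (the same rows whose encoded output is printed further down), standing in for the full titanic_train.csv file.

```python
import pandas as pd

# Toy stand-in (assumption): 'Sex' and 'Pclass' of the first five rows of titanic_train.csv.
df = pd.DataFrame({
    "Sex": ["male", "female", "female", "female", "male"],
    "Pclass": [3, 1, 3, 1, 3],
})

for col in ["Sex", "Pclass"]:
    n_unique = df[col].nunique()
    # get_dummies on a single Series yields one indicator column per distinct value.
    width = pd.get_dummies(df[col]).shape[1]
    print(f"{col}: nunique={n_unique}, one-hot width={width}")
    assert n_unique == width
```

Only classes 1 and 3 occur in these five rows, so 'Pclass' has width 2 here rather than the 3 seen on the full dataset. Both nunique() and get_dummies() skip missing values by default (get_dummies() only adds a NaN column when dummy_na=True), which keeps the two numbers in agreement.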

Let us first see what happens when we do one-hot encoding of column 'Sex'.

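# dtype=int keeps the 0/1 layout shown below; pandas >= 2.0 otherwise returns True/False.
enc_gender_df = pd.get_dummies(df, columns=['Sex'], dtype=int)
print(enc_gender_df.head())


# Before encoding: the 'Sex' column of df.head()
# Sex
# male
# female
# female
# female
# male

# After encoding: get_dummies() drops 'Sex' and adds one column per value
# Sex_female  Sex_male
# 0           1
# 1           0
# 1           0
# 1           0
# 0           1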
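Since pandas 2.0, get_dummies() returns boolean True/False indicator columns by default (older versions returned uint8), so on a recent install the 0/1 table above only appears when dtype=int is passed. The sketch below compares the two, again assuming a toy frame with the five 'Sex' values from df.head() rather than the real CSV.

```python
import pandas as pd

# Toy stand-in (assumption): the 'Sex' values of the first five Titanic rows.
df = pd.DataFrame({"Sex": ["male", "female", "female", "female", "male"]})

# Default dtype: bool in pandas >= 2.0, uint8 in older versions.
default_enc = pd.get_dummies(df, columns=["Sex"])
print(default_enc.dtypes)

# dtype=int forces 0/1 integer indicators regardless of pandas version.
int_enc = pd.get_dummies(df, columns=["Sex"], dtype=int)
print(int_enc)
```

Integer indicators are convenient when the columns are summed or fed to code that expects numbers; boolean columns take less memory and behave the same in most numeric operations.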


enc_pc_df = pd.get_dummies(df, columns=['Pclass'], dtype=int)
print(enc_pc_df.head())

# Pclass_1  Pclass_2  Pclass_3
# 0         0         1
# 1         0         0
# 0         0         1
# 1         0         0
# 0         0         1
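Listing 'Pclass' in columns matters because read_csv() loads it as an integer column. When columns is omitted, get_dummies() only encodes object, string, and category columns, so an integer 'Pclass' passes through unchanged. This sketch shows the difference on a hypothetical toy frame holding the 'Sex' and 'Pclass' values of the first five rows; dtype=int is used only to get 0/1 indicators.

```python
import pandas as pd

# Toy stand-in (assumption): 'Sex' and 'Pclass' of the first five Titanic rows.
df = pd.DataFrame({
    "Sex": ["male", "female", "female", "female", "male"],
    "Pclass": [3, 1, 3, 1, 3],
})

# Without `columns`, only object/string/category columns are encoded; integer Pclass is kept as-is.
auto = pd.get_dummies(df, dtype=int)
print(list(auto.columns))

# Listing both columns explicitly encodes the integer Pclass as well.
both = pd.get_dummies(df, columns=["Sex", "Pclass"], dtype=int)
print(list(both.columns))
```

Because only classes 1 and 3 occur in these five rows, the explicit encoding produces Pclass_1 and Pclass_3 but no Pclass_2: the dummy columns come from the values present in the data, not from the full set of possible classes.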

Fun Facts
1. Scikit-Learn's LabelEncoder assigns integer codes to the labels in ascending alphabetical (sorted) order.
2. Besides ascending alphabetical order, three other orderings are common:
2.1. Descending alphabetical order
2.2. Ascending frequency order (rarest label first)
2.3. Descending frequency order (most frequent label first)
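The four orderings above can be sketched with pandas alone. The labels below are hypothetical: the letters echo the Titanic 'Embarked' codes, but the counts are invented and chosen without ties so that both frequency orderings are unambiguous. The ascending alphabetical mapping matches the codes LabelEncoder would assign, since its classes_ are the sorted unique labels; scikit-learn itself is not imported here.

```python
import pandas as pd

# Hypothetical labels; counts are S=4, C=2, Q=1 (no ties).
labels = pd.Series(["S", "C", "S", "Q", "S", "C", "S"])

orders = {
    # Sorted order: the order in which LabelEncoder assigns codes 0, 1, 2, ...
    "ascending_alphabetical": sorted(labels.unique()),
    "descending_alphabetical": sorted(labels.unique(), reverse=True),
    # value_counts sorts by count; ascending=True puts the rarest label first.
    "ascending_frequency": labels.value_counts(ascending=True).index.tolist(),
    "descending_frequency": labels.value_counts().index.tolist(),
}

for name, order in orders.items():
    # Map each label to its position in the chosen ordering.
    mapping = {label: code for code, label in enumerate(order)}
    print(name, mapping, labels.map(mapping).tolist())
```

With ties in the counts, the frequency-based orderings would need an explicit tie-breaker (for example, alphabetical) to be reproducible.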
