Friday, October 28, 2022

One Hot Encoding Using Pandas' get_dummies() Method on Titanic Dataset


import pandas as pd
df = pd.read_csv('titanic_train.csv')
print(df.head())

print("Number of Unique Values in The Column 'Sex':")
print(df['Sex'].nunique())
# 2
# This is also the width of its one-hot encoding.

print("Number of Unique Values in The Column For 'Passenger Class':")
print(df['Pclass'].nunique())
# 3
# This is also the width of the one-hot encoding for 'Passenger Class'.
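The claim that the number of unique values equals the width of the one-hot encoding can be checked on a small hand-made frame. This is only a sketch: the rows in `sample` are hypothetical stand-ins, not values read from titanic_train.csv.

```python
import pandas as pd

# Hypothetical sample rows standing in for the Titanic data.
sample = pd.DataFrame({
    "Sex": ["male", "female", "female", "female", "male"],
    "Pclass": [3, 1, 3, 1, 2],
})

widths = {}
for col in ["Sex", "Pclass"]:
    n_unique = sample[col].nunique()
    # get_dummies on a single Series yields one column per distinct value.
    width = pd.get_dummies(sample[col]).shape[1]
    widths[col] = (n_unique, width)
    print(col, n_unique, width)
```

For each column the two printed numbers should match, since every distinct value gets its own indicator column.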

Let us first see what happens when we do one-hot encoding of column 'Sex'.

enc_gender_df = pd.get_dummies(df, columns = ['Sex'])
print(enc_gender_df.head())

# Original column:
#       Sex
#      male
#    female
#    female
#    female
#      male
#
# Encoded columns:
#    Sex_female  Sex_male
#             0         1
#             1         0
#             1         0
#             1         0
#             0         1

enc_pc_df = pd.get_dummies(df, columns = ['Pclass'])
print(enc_pc_df.head())

# Encoded columns:
#    Pclass_1  Pclass_2  Pclass_3
#           0         0         1
#           1         0         0
#           0         0         1
#           1         0         0
#           0         0         1
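Both columns can also be encoded in a single call. The sketch below uses a hypothetical five-row frame (the `Name` values are made up) and passes `dtype=int`, because recent pandas releases (2.0 onward, as far as I know) return True/False columns by default rather than the 0/1 values shown above.

```python
import pandas as pd

# Hypothetical rows; only the Sex and Pclass values mirror the output shown above.
sample = pd.DataFrame({
    "Name": ["A", "B", "C", "D", "E"],
    "Sex": ["male", "female", "female", "female", "male"],
    "Pclass": [3, 1, 3, 1, 3],
})

# Encode two columns at once; dtype=int asks for 0/1 instead of booleans.
encoded = pd.get_dummies(sample, columns=["Sex", "Pclass"], dtype=int)

print(encoded.columns.tolist())
print(encoded.head())
```

Columns that are not listed in `columns` (here `Name`) are passed through unchanged, and the original `Sex` and `Pclass` columns are replaced by their indicator columns. Note that this small sample has no second-class passenger, so no `Pclass_2` column is produced.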

Fun Facts

1. Scikit-Learn's LabelEncoder assigns integer codes to the labels in ascending alphabetical order.
2. Besides ascending alphabetical order, three other orderings are common:
   2.1. Descending alphabetical order
   2.2. Ascending frequency-based order
   2.3. Descending frequency-based order
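The orderings listed above can be sketched as follows. Only the first part uses LabelEncoder itself; the other three mappings are hand-built dictionaries (the names `desc_alpha`, `desc_freq` and `asc_freq` are my own, not part of any library), and the label values are a hypothetical sample.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical sample labels.
labels = pd.Series(["male", "female", "female", "female", "male"])

# Ascending alphabetical order: what LabelEncoder does for string labels.
le = LabelEncoder()
codes = le.fit_transform(labels)
print(list(le.classes_))
print(codes.tolist())

# Descending alphabetical order, built by hand.
desc_alpha = {label: i for i, label in enumerate(sorted(labels.unique(), reverse=True))}

# Descending frequency: value_counts() sorts the most frequent label first.
desc_freq = {label: i for i, label in enumerate(labels.value_counts().index)}

# Ascending frequency: the least frequent label gets code 0.
asc_freq = {label: i for i, label in enumerate(labels.value_counts(ascending=True).index)}

print(labels.map(desc_alpha).tolist())
print(labels.map(desc_freq).tolist())
print(labels.map(asc_freq).tolist())
```

With frequency-based orderings, ties between equally common labels need an explicit tie-breaking rule; this sample avoids the issue because the two counts differ.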
