import pandas as pd
df = pd.read_csv('titanic_train.csv')
print(df.head())
print("Number of Unique Values in The Column 'Sex':")
print(df['Sex'].nunique())
# 2
# This is also the width of its one-hot encoding (one column per unique value).
print("Number of Unique Values in The Column 'Pclass' (Passenger Class):")
print(df['Pclass'].nunique())
# 3
# This is also the width of the one-hot encoding of 'Pclass' (Passenger Class).
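The claim that nunique() equals the width of the one-hot encoding can be checked directly. Below is a minimal sketch that uses a hypothetical five-row DataFrame holding the first five 'Sex' and 'Pclass' values of the Titanic training set, so it runs without titanic_train.csv. Note that in these five rows Pclass only takes the values 1 and 3, so both numbers come out as 2 here rather than the 3 seen on the full dataset. The point is that the two numbers always agree.

```python
import pandas as pd

# Hypothetical stand-in for the first five rows of titanic_train.csv
df = pd.DataFrame({
    'Sex': ['male', 'female', 'female', 'female', 'male'],
    'Pclass': [3, 1, 3, 1, 3],
})

for col in ['Sex', 'Pclass']:
    n_unique = df[col].nunique()
    # get_dummies() creates one new column per unique value of `col`
    width = pd.get_dummies(df[col]).shape[1]
    print(col, n_unique, width)
```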
Let us first see what happens when we one-hot encode the column 'Sex'.
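# dtype=int gives 0/1 columns (since pandas 2.0 the default dtype is bool, i.e. True/False).
enc_gender_df = pd.get_dummies(df, columns=['Sex'], dtype=int)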
print(enc_gender_df.head())
# 'Sex' in df.head() and the columns that replace it in enc_gender_df.head():
# Sex      Sex_female  Sex_male
# male     0           1
# female   1           0
# female   1           0
# female   1           0
# male     0           1
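A small self-contained sketch of the same step, using a hypothetical five-row stand-in for the 'Sex' column so it runs without the CSV file. Passing dtype=int pins the output to 0/1 on any recent pandas version. It also shows that get_dummies() drops the original column and names the new ones '<column>_<value>', in alphabetical order of the values.

```python
import pandas as pd

# Hypothetical stand-in: first five 'Sex' values of the Titanic training set
df = pd.DataFrame({'Sex': ['male', 'female', 'female', 'female', 'male']})

# columns=['Sex'] removes 'Sex' and adds one indicator column per category,
# named '<column>_<value>' and ordered alphabetically by value
enc_gender_df = pd.get_dummies(df, columns=['Sex'], dtype=int)
print(enc_gender_df)
```

Each row has exactly one 1, in the column matching that passenger's original value.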
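# dtype=int gives 0/1 columns (since pandas 2.0 the default dtype is bool, i.e. True/False).
enc_pc_df = pd.get_dummies(df, columns=['Pclass'], dtype=int)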
print(enc_pc_df.head())
# Pclass_1 Pclass_2 Pclass_3
# 0 0 1
# 1 0 0
# 0 0 1
# 1 0 0
# 0 0 1
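Naming 'Pclass' in the columns argument matters here. When columns is left as None, get_dummies() only encodes columns with object, string, or category dtype, so an integer column like Pclass passes through unchanged. A sketch on a hypothetical two-column stand-in for the first five rows:

```python
import pandas as pd

# Hypothetical stand-in for the first five rows of titanic_train.csv
df = pd.DataFrame({
    'Sex': ['male', 'female', 'female', 'female', 'male'],
    'Pclass': [3, 1, 3, 1, 3],
})

# Default (columns=None): only the string column 'Sex' is encoded;
# the integer column 'Pclass' is kept as a single numeric column
auto = pd.get_dummies(df, dtype=int)
print(auto.columns.tolist())

# Naming 'Pclass' explicitly encodes it, even though it is numeric
enc_pc_df = pd.get_dummies(df, columns=['Pclass'], dtype=int)
print(enc_pc_df.columns.tolist())
```

No passenger in this five-row sample is in 2nd class, so there is no Pclass_2 column. On the full dataset all three columns appear.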
Fun Facts
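1. Scikit-Learn's LabelEncoder assigns integer codes to the labels in ascending alphabetical (sorted) order, so for 'Sex' it maps female to 0 and male to 1. Pandas' get_dummies() orders its new columns the same way, which is why Sex_female comes before Sex_male.
2. Besides ascending alphabetical order, three other orderings are common: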
2.1. Descending Alphabetical Sequence
2.2. Ascending Frequency Based Sequence
2.3. Descending Frequency Based Sequence
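The four orderings can be sketched in pandas as mappings from each label to an integer code. The toy Series below is hypothetical, loosely modelled on the Titanic 'Embarked' port codes. Its values are chosen so that each label has a different count, which keeps the frequency-based orderings unambiguous. The pd.factorize(..., sort=True) call at the end gives ascending-alphabetical codes for each row, which should match what LabelEncoder produces for string labels.

```python
import pandas as pd

# Hypothetical toy labels; each label has a distinct count (S: 3, C: 2, Q: 1)
s = pd.Series(['S', 'C', 'S', 'Q', 'S', 'C'])

def codes_from_order(order):
    """Map each label to its position in `order` (0, 1, 2, ...)."""
    return {label: i for i, label in enumerate(order)}

counts = s.value_counts()  # index: label, values: frequency

orderings = {
    'ascending_alphabetical': sorted(s.unique()),
    'descending_alphabetical': sorted(s.unique(), reverse=True),
    'ascending_frequency': counts.sort_values(ascending=True).index.tolist(),
    'descending_frequency': counts.sort_values(ascending=False).index.tolist(),
}

for name, order in orderings.items():
    print(name, codes_from_order(order))

# Ascending alphabetical codes computed by pandas, one code per row
codes, uniques = pd.factorize(s, sort=True)
print(list(uniques), codes.tolist())
```

With ties in the counts, the frequency-based orderings need a tie-breaking rule (for example, alphabetical), which this sketch does not define.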
Friday, October 28, 2022
One Hot Encoding Using Pandas' get_dummies() Method on Titanic Dataset
Download Data and Code