import pandas as pd

df = pd.read_csv('titanic_train.csv')
print(df.head())

print("Number of Unique Values in The Column 'Sex':")
print(df['Sex'].nunique())
# 2
# This is also the width of its one-hot encoding.

print("Number of Unique Values in The Column For 'Passenger Class':")
print(df['Pclass'].nunique())
# 3
# This is also the width of the one-hot encoding for 'Passenger Class'.

Let us first see what happens when we one-hot encode the column 'Sex'.
enc_gender_df = pd.get_dummies(df, columns=['Sex'])
print(enc_gender_df.head())

# Original column:
#   Sex
#   male
#   female
#   female
#   female
#   male

# Encoded columns:
#   Sex_female  Sex_male
#   0           1
#   1           0
#   1           0
#   1           0
#   0           1

# Note: pandas 2.0 and later print True/False in these columns
# unless dtype=int is passed to get_dummies().

enc_pc_df = pd.get_dummies(df, columns=['Pclass'])
print(enc_pc_df.head())

#   Pclass_1  Pclass_2  Pclass_3
#   0         0         1
#   1         0         0
#   0         0         1
#   1         0         0
#   0         0         1

Fun Facts
1. Scikit-Learn's LabelEncoder assigns integer codes to labels in ascending alphabetical order, because it sorts the unique values before numbering them.
2. Besides the ascending alphabetical sequence, three other sequences are common:
   2.1. Descending alphabetical sequence
   2.2. Ascending frequency-based sequence
   2.3. Descending frequency-based sequence
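The four sequences listed above can be sketched in plain Python so that the ordering logic is visible. This is only an illustration, not how Scikit-Learn implements LabelEncoder: the sample values and the helper name build_mapping are assumptions made up for this example. The default call reproduces the ascending alphabetical sequence, which is the one LabelEncoder uses.

```python
from collections import Counter

# Hypothetical sample values (not read from titanic_train.csv).
values = ["S", "C", "S", "Q", "S", "C"]


def build_mapping(values, by="alphabetical", descending=False):
    """Return a {label: integer code} mapping for the chosen sequence.

    build_mapping is a hypothetical helper written for this post;
    it is not part of pandas or Scikit-Learn.
    """
    counts = Counter(values)
    if by == "alphabetical":
        ordered = sorted(counts, reverse=descending)
    elif by == "frequency":
        # Sort by count; ties fall back to the label itself so the
        # result is deterministic.
        ordered = sorted(
            counts,
            key=lambda label: (counts[label], label),
            reverse=descending,
        )
    else:
        raise ValueError("by must be 'alphabetical' or 'frequency'")
    # The position in the ordered list becomes the integer code.
    return {label: code for code, label in enumerate(ordered)}


asc_alpha = build_mapping(values)
desc_alpha = build_mapping(values, descending=True)
asc_freq = build_mapping(values, by="frequency")
desc_freq = build_mapping(values, by="frequency", descending=True)

print(asc_alpha)
print(desc_alpha)
print(asc_freq)
print(desc_freq)

# Apply one of the mappings to the original values.
encoded = [desc_freq[v] for v in values]
print(encoded)
```

With a frequency-based sequence the most common (or least common) label gets code 0, which can be convenient when the codes are inspected by eye. None of these orderings carries any real meaning for a nominal column, which is why one-hot encoding is shown in the post above.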
Friday, October 28, 2022
One Hot Encoding Using Pandas' get_dummies() Method on Titanic Dataset
Download Data and Code