import pandas as pd
df = pd.read_csv('in/titanic_train.csv')
df.head()
enc_pc_df = pd.get_dummies(df, columns = ['Pclass'])
enc_pc_df.head()
Hypothetical Question
Q: If I remove one column (it could be the first or the last) from a one-hot feature matrix with 'n' columns, can I reproduce the original matrix from the remaining 'n-1' columns?
OR, rephrasing the question: How do we get back the original matrix, that is, 'put back the dropped column'?
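Answer:
If: There is no '1' among the remaining n-1 values in a row, then the dropped value for that row is 1.
Else: The dropped value is 0.
Assumptions made: the original matrix has 'n' columns, and every row contains exactly one '1' (that is, the encoded column has no missing values).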
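The rule above can be tried out on a small example. The sketch below is only an illustration: it uses a hypothetical toy DataFrame in place of the Titanic file, and the names toy, full, reduced, restored and rebuilt are made up for this example. It drops 'Pclass_1' by hand and then rebuilds it from the other columns.

```python
import pandas as pd

# Hypothetical toy data standing in for the 'Pclass' column of the Titanic file.
toy = pd.DataFrame({"Pclass": [3, 1, 2, 3, 1]})

# Full one-hot matrix with n = 3 columns, cast to 0/1 integers.
full = pd.get_dummies(toy, columns=["Pclass"]).astype(int)

# Remove one column by hand (here the first one).
reduced = full.drop(columns=["Pclass_1"])

# Rule: if a row has no 1 in the remaining n-1 columns, the dropped value was 1, else 0.
restored = (reduced.sum(axis=1) == 0).astype(int)

# Put the dropped column back in its original position.
rebuilt = reduced.copy()
rebuilt.insert(0, "Pclass_1", restored)

# Compare the rebuilt matrix with the original one, cell by cell.
print((rebuilt.values == full.values).all())
```

This only works under the assumption stated above: each row of the full matrix has exactly one '1'.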
Conclusion: Removing one column from the one-hot feature matrix causes no data loss.
The value in any one column is fully determined by the values in the other n-1 columns, so keeping all n columns is redundant.
So, how do we remove this redundancy?
We drop the first column from the one-hot feature matrix.
enc_pc_df = pd.get_dummies(df, columns = ['Pclass'], drop_first = True)
enc_pc_df.head()
The default value of the "drop_first" parameter is False, so get_dummies keeps all the columns unless we ask otherwise.
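To see the effect of "drop_first" without the Titanic file, here is a minimal sketch on a hypothetical toy DataFrame (the names toy, with_all and without_first are made up for this example). It compares the column names produced with and without the parameter.

```python
import pandas as pd

# Hypothetical toy data standing in for the 'Pclass' column of the Titanic file.
toy = pd.DataFrame({"Pclass": [3, 1, 2, 3, 1]})

# Default behaviour (drop_first=False): one column per category.
with_all = pd.get_dummies(toy, columns=["Pclass"])

# drop_first=True: the column for the first category is left out.
without_first = pd.get_dummies(toy, columns=["Pclass"], drop_first=True)

print(list(with_all.columns))       # ['Pclass_1', 'Pclass_2', 'Pclass_3']
print(list(without_first.columns))  # ['Pclass_2', 'Pclass_3']
```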
Sunday, October 30, 2022
Do we need all the one hot features?