1. How comfortable are you in Python?
2. How comfortable are you in PySpark?
3. How comfortable are you in Scala?
4. And shell scripting?

---

1. What is the difference between a list and a tuple?
2. What are the three ways to work on a dataset in PySpark? (RDD, Spark SQL DataFrame, and pandas DataFrame)
3. What is lazy evaluation?
4. What is the opposite of lazy evaluation? (Eager evaluation)
5. What is a regular expression?
6. What does the grep command do?
7. What does the find command do?
8. What is the difference between find and grep?
9. What does the sed command do?
10. What does the awk command do?
11. What is a narrow transformation? (Like map())
12. What is a wide transformation? (Like groupBy() and reduceByKey())
13. What is the difference between a narrow transformation and a wide transformation?
14. How would you rate yourself in Hive?
15. Write a SQL query to get the current date from the Hive SQL interface. (current_date, or current_timestamp for date and time)
16. Extract the year from the date. (year(date_col))
17. How would you split a;b;c into three rows (a, b, and c)?
18. What is a Spark session? (Entry point to Spark; it creates and wraps the Spark context)
19. What is a Spark context?
20. Which of the two has the bigger scope?
21. Is there any other context object we need to know about?
22. There is a CSV file. Load this CSV data into an RDD, a Spark SQL DataFrame, and a pandas DataFrame.
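A few of the questions above can be illustrated in plain Python. The sketch below is only an analogy, not Spark code: it shows list versus tuple mutability (question 1), lazy versus eager evaluation using a generator and a list comprehension (questions 3 and 4), and splitting a;b;c into three rows (question 17). The helper `square` and all variable names are made up for illustration.

```python
# Question 1: lists are mutable, tuples are not.
items_list = ["a", "b"]
items_list.append("c")            # fine: a list can grow in place

items_tuple = ("a", "b")
try:
    items_tuple[0] = "z"          # tuples reject item assignment
    tuple_mutated = True
except TypeError:
    tuple_mutated = False

# Questions 3 and 4: lazy vs eager evaluation (plain-Python analogy).
calls = []                        # records every time real work is done

def square(x):                    # hypothetical helper for illustration
    calls.append(x)
    return x * x

lazy = (square(x) for x in range(3))    # generator: square() not called yet
calls_before = len(calls)

eager = [square(x) for x in range(3)]   # list comprehension: computed now
calls_after_eager = len(calls)

lazy_result = list(lazy)                # consuming the generator runs it
calls_after_lazy = len(calls)

# Question 17: turn "a;b;c" into three rows.
rows = "a;b;c".split(";")
for row in rows:
    print(row)
```

Spark transformations behave like the generator here: they only describe the work, and nothing runs until an action asks for a result.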
Saturday, June 1, 2024
Interview Questions For Big Data Engineer (2 Years of Experience)