Wednesday, February 8, 2023

A Solved Exercise in RDD Filter and Join Operations (Interview Preparation)

Download Code and Data
Problem Statement:

Consider a Universal Identity Number (UIN) scenario with two datasets: UIN card (customer) data and bank account link data.

UIN Card data (UINCardData.csv):
Schema Details: UIN, MobileNumber, Gender, SeniorCitizens, Income

Bank account link data (BankAccountLink.csv):
Schema Details: MobileNumber, LinkedtoBankAccount, BankAccountNumber

Requirement

Join the two datasets on MobileNumber, the only column they share, and find every UIN that is not linked to a bank account. Print the UIN and BankAccountNumber for each such record.
Save the final output to a specified HDFS directory.
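
One way to approach the requirement is sketched below in PySpark. It is only a sketch under several assumptions that the problem statement does not confirm: that LinkedtoBankAccount holds a yes/no style flag, that each file may start with a header row whose first field is the first column name, and that fields contain no embedded commas. The helper names (parse_uin, parse_link, is_unlinked, is_data_row, find_unlinked) are made up for illustration.

```python
# Sketch only: helper names and the flag values are assumptions,
# not part of the original exercise.

def is_data_row(line, first_column_name):
    """Keep non-empty rows; drop a header row if one is present (assumed)."""
    first = line.split(",")[0].strip()
    return bool(first) and first != first_column_name

def parse_uin(line):
    """UINCardData.csv row -> (MobileNumber, UIN)."""
    f = [x.strip() for x in line.split(",")]
    return (f[1], f[0])

def parse_link(line):
    """BankAccountLink.csv row ->
    (MobileNumber, (LinkedtoBankAccount, BankAccountNumber))."""
    f = [x.strip() for x in line.split(",")]
    return (f[0], (f[1], f[2]))

def is_unlinked(record):
    """record is (MobileNumber, (UIN, (flag, BankAccountNumber)))."""
    flag = record[1][1][0]
    # ASSUMPTION: a linked account is marked with one of these values.
    return flag.lower() not in ("yes", "y", "true", "1")

def find_unlinked(sc, uin_path, link_path, out_path):
    """Join both files on MobileNumber, keep unlinked UINs, save the result."""
    uin = (sc.textFile(uin_path)
             .filter(lambda l: is_data_row(l, "UIN"))
             .map(parse_uin))
    link = (sc.textFile(link_path)
              .filter(lambda l: is_data_row(l, "MobileNumber"))
              .map(parse_link))
    # join() yields (MobileNumber, (UIN, (flag, BankAccountNumber)))
    result = (uin.join(link)
                 .filter(is_unlinked)
                 .map(lambda r: (r[1][0], r[1][1][1])))
    # Print (UIN, BankAccountNumber); fine for small data, use take(n) otherwise.
    for row in result.collect():
        print(row)
    result.map(lambda t: ",".join(t)).saveAsTextFile(out_path)
    return result
```

From a pyspark shell, where sc already exists, this could be called as find_unlinked(sc, uin_path, link_path, out_path) with your own HDFS locations for the two CSV files and the output directory. Note that join() is an inner join, so a UIN whose mobile number never appears in BankAccountLink.csv is dropped; if such records should also count as unlinked, leftOuterJoin() is the usual alternative, with is_unlinked adjusted to handle a missing right-hand value.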
