Data Preprocessing Using the Python Package Pandas (a Nifty50-Based Use Case)



Use Case: Creating a Nifty50 SIP Simulator

This post covers data preprocessing with Pandas. The data set is the Nifty50 index data from 1994 to June 13, 2020. The preprocessing steps are:

1. Read the data from multiple CSV files.
2. Convert the string 'Date' column to a 'DateTime' type column.
3. Generate columns such as 'Year', 'Half-Yearly Marker' (H1 or H2), 'Quarter' and 'Month'.
4. Fix the 'Open' column by populating it for the years 1994 and 1995; the CSV files provide 'Open' values only from 1996 onward.
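The four preprocessing steps could be sketched roughly as follows. This is a minimal illustration, not the notebook's code: the function name `preprocess`, the column names 'Open' and 'Close', the date format ('03-Jan-1995') and the 'HalfYear' column name are all assumptions. The post does not say how the missing 'Open' values are filled, so the sketch uses the previous trading day's 'Close' as one plausible choice. 'Year' is kept as a string because the snippet later in the post compares it to '2018'.

```python
import io
import pandas as pd

def preprocess(sources):
    """Hypothetical helper: sources is a list of CSV paths or file-like objects."""
    # 1. Read every CSV and stack the rows into one data frame.
    df = pd.concat([pd.read_csv(src) for src in sources], ignore_index=True)

    # 2. Convert the string dates (format assumed: '03-Jan-1995') to datetime.
    df["Date"] = pd.to_datetime(df["Date"], format="%d-%b-%Y")
    df = df.sort_values("Date").reset_index(drop=True)

    # 3. Derive the calendar columns.
    df["Year"] = df["Date"].dt.strftime("%Y")   # string, e.g. '1995'
    df["Month"] = df["Date"].dt.month
    df["Quarter"] = "Q" + df["Date"].dt.quarter.astype(str)
    df["HalfYear"] = df["Month"].apply(lambda m: "H1" if m <= 6 else "H2")

    # 4. Fill missing 'Open' values (assumption: use the previous day's 'Close';
    #    the very first row falls back to its own 'Close').
    df["Open"] = pd.to_numeric(df["Open"], errors="coerce")
    df["Open"] = df["Open"].fillna(df["Close"].shift(1)).fillna(df["Close"])
    return df

# Tiny made-up samples standing in for the real CSV files.
csv_a = "Date,Open,Close\n03-Jan-1995,,1000.0\n04-Jan-1995,,1010.0\n"
csv_b = "Date,Open,Close\n01-Jan-1996,1020.0,1030.0\n01-Jul-1996,1040.0,1050.0\n"
df = preprocess([io.StringIO(csv_a), io.StringIO(csv_b)])
print(df)
```

With real files, the list passed to `preprocess` would hold the CSV file paths instead of the in-memory samples.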
Next, we generate the SIP data, assuming that the SIP deposit happens on the first of each month. The function "get_sip_df()" takes the transaction-level data frame as input and adds the following columns:

1. CostValue
2. PctChange
3. SipClose
4. SipOpen
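The post does not define these four columns, so the sketch below shows only one possible reading, not the notebook's "get_sip_df()". It assumes a fixed installment is invested at the 'Open' price on the first trading day of each month, that 'CostValue' is the cumulative amount deposited, that 'PctChange' is the daily percentage change of 'Close', and that 'SipOpen' and 'SipClose' are the value of the accumulated units at the day's open and close. The name `sip_sketch` and the `installment` parameter are hypothetical.

```python
import pandas as pd

def sip_sketch(df, installment=1000.0):
    """Hypothetical SIP calculation; every column definition here is an assumption."""
    out = df.sort_values("Date").reset_index(drop=True).copy()

    # Mark the first trading day of each calendar month as the deposit day.
    month_key = out["Date"].dt.year * 12 + out["Date"].dt.month
    is_deposit_day = ~month_key.duplicated()

    # Amount deposited per day and index units bought at that day's open.
    deposit = is_deposit_day.astype(float) * installment
    units_held = (deposit / out["Open"]).cumsum()

    out["CostValue"] = deposit.cumsum()                 # total money put in so far
    out["PctChange"] = out["Close"].pct_change() * 100  # daily % change of Close
    out["SipOpen"] = units_held * out["Open"]           # holdings valued at open
    out["SipClose"] = units_held * out["Close"]         # holdings valued at close
    return out

# Made-up prices for four trading days across two months.
sample = pd.DataFrame({
    "Date": pd.to_datetime(["2019-01-01", "2019-01-02", "2019-02-01", "2019-02-04"]),
    "Open": [100.0, 110.0, 125.0, 100.0],
    "Close": [110.0, 120.0, 100.0, 125.0],
})
sip = sip_sketch(sample)
print(sip[["Date", "CostValue", "PctChange", "SipOpen", "SipClose"]])
```

Comparing 'SipClose' with 'CostValue' on any row then shows whether the simulated investment is above or below the money deposited up to that date.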
Next, we aggregate the data at the 'Month' level and build a monthly report with our function "get_monthly_df()".
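As a rough illustration of month-level aggregation, the sketch below keeps the last available value of a chosen column in each month and lays the result out as a Year by Month table. The layout of the notebook's actual report is not shown in the post, so this shape is an assumption, as are the name `monthly_sketch` and the 'Year' and 'Month' columns it expects.

```python
import pandas as pd

def monthly_sketch(df, col_name="SipClose"):
    """Hypothetical monthly report: last value of col_name per (Year, Month)."""
    ordered = df.sort_values("Date")
    # Last trading day's value within each month.
    monthly = ordered.groupby(["Year", "Month"])[col_name].last()
    # Rows become years, columns become months.
    return monthly.unstack("Month")

# Made-up transaction-level data covering two months of one year.
daily = pd.DataFrame({
    "Date": pd.to_datetime(["2019-01-01", "2019-01-02", "2019-02-01", "2019-02-04"]),
    "Year": ["2019", "2019", "2019", "2019"],
    "Month": [1, 1, 2, 2],
    "SipClose": [1100.0, 1200.0, 1800.0, 2250.0],
})
report = monthly_sketch(daily, col_name="SipClose")
print(report)
```

Taking the last value suits a running total such as a portfolio value; a column like a daily percentage change would more likely call for a different aggregate.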
We can use the "get_sip_df()" function for any range of years, as shown below for 2018 to 2020 (this code snippet is also present in the Jupyter Notebook):

    df_19 = df[(df.Year == '2018') | (df.Year == '2019') | (df.Year == '2020')]
    df_sip_19 = get_sip_df(df_19)
    df_sip_19_monthly = get_monthly_df(df_sip_19, col_name = 'SipClose')
Link to the Jupyter Notebook: Notebook
Link to the data set: Data
