Wednesday, October 26, 2022

Way 4: With respect to DataFrame.replace() Method (Ways in which Pandas API on PySpark differs from Plain Pandas)


Working in Pandas

import pandas as pd

df = pd.DataFrame({
    'dummy_col': ["alpha", "beta", "gamma", "", "-", "0", "N/A", "-_-", "NA",
                  "delta", "epsilon", "zeta", "eta", "theta"]
})
df['cleaned'] = df.replace(to_replace=["", "-", "0", "N/A", "-_-", "NA"],
                           value="Not Applicable")
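Note that in plain pandas the same result is more commonly obtained by calling replace() on the Series rather than on the whole DataFrame; this is a sketch of that alternative (my addition, not from the original post), and it sidesteps the assignment problem shown below because Series.replace() returns a Series:

```python
import pandas as pd

# Same data as above.
df = pd.DataFrame({
    'dummy_col': ["alpha", "beta", "gamma", "", "-", "0", "N/A", "-_-", "NA",
                  "delta", "epsilon", "zeta", "eta", "theta"]
})

# Replace on the Series, not the DataFrame: the result is a Series,
# which column assignment accepts in both plain pandas and pandas-on-Spark.
df['cleaned'] = df['dummy_col'].replace(
    to_replace=["", "-", "0", "N/A", "-_-", "NA"], value="Not Applicable"
)
print(df['cleaned'].tolist())
```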

Not working in Pandas API on PySpark

from pyspark import pandas as ppd

df_ppd = ppd.DataFrame({
    'dummy_col': ["alpha", "beta", "gamma", "", "-", "0", "N/A", "-_-", "NA",
                  "delta", "epsilon", "zeta", "eta", "theta"]
})

Error

df_ppd['cleaned'] = df_ppd.replace(to_replace=["", "-", "0", "N/A", "-_-", "NA"],
                                   value="Not Applicable")

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In [15], line 1
----> 1 df_ppd['cleaned'] = df_ppd.replace(to_replace =["","-","0","N/A","-_-","NA"], value = "Not Applicable")

File ~/anaconda3/envs/mh/lib/python3.9/site-packages/pyspark/pandas/frame.py:12355, in DataFrame.__setitem__(self, key, value)
  12352         psdf = self._assign({k: value[c] for k, c in zip(key, field_names)})
  12353     else:
  12354         # Same Series.
> 12355         psdf = self._assign({key: value})
  12357 self._update_internal_frame(psdf._internal)

File ~/anaconda3/envs/mh/lib/python3.9/site-packages/pyspark/pandas/frame.py:4921, in DataFrame._assign(self, kwargs)
   4917 is_invalid_assignee = (
   4918     not (isinstance(v, (IndexOpsMixin, Column)) or callable(v) or is_scalar(v))
   4919 ) or isinstance(v, MultiIndex)
   4920 if is_invalid_assignee:
-> 4921     raise TypeError(
   4922         "Column assignment doesn't support type " "{0}".format(type(v).__name__)
   4923     )
   4924 if callable(v):
   4925     kwargs[k] = v(self)

TypeError: Column assignment doesn't support type DataFrame
Fix

DataFrame.replace() itself works fine in the pandas API on PySpark; it is only the assignment of the resulting DataFrame into a single column that fails. So bind the result to a new DataFrame instead of a column:

df_ppd_cleaned = df_ppd.replace(to_replace=["", "-", "0", "N/A", "-_-", "NA"],
                                value="Not Applicable")
df_ppd_cleaned.replace(to_replace=['Not Applicable', 'alpha'],
                       value="Still NA", inplace=True)
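The same reassign-then-replace pattern can be checked in plain pandas, where both replace passes behave identically; this is a minimal sketch of that check (my addition, assuming only plain pandas is installed):

```python
import pandas as pd

# Same data as in the pandas-on-Spark example.
df = pd.DataFrame({
    'dummy_col': ["alpha", "beta", "gamma", "", "-", "0", "N/A", "-_-", "NA",
                  "delta", "epsilon", "zeta", "eta", "theta"]
})

# First pass: replace() returns a new DataFrame; bind it to a new name
# instead of assigning it into a single column.
df_cleaned = df.replace(to_replace=["", "-", "0", "N/A", "-_-", "NA"],
                        value="Not Applicable")

# Second pass: replace in place on the new DataFrame, as in the fix above.
df_cleaned.replace(to_replace=['Not Applicable', 'alpha'],
                   value="Still NA", inplace=True)
print(df_cleaned['dummy_col'].tolist())
```

The six placeholder strings and "alpha" all end up as "Still NA", matching what the two-step pandas-on-Spark fix produces.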
Tags: Technology,Spark
