Tuesday, November 1, 2022

Way 5: With respect to Cell Value Replacement Through Assignment (Ways in which Pandas API on PySpark differs from Plain Pandas)

Cell Value Replacement Through Assignment in Pandas

import pandas as pd

df = pd.DataFrame({
    'col1': ["alpha", "beta", "gamma"],
    'col2': ['beta', 'gamma', 'alpha'],
    'col3': ['gamma', 'alpha', 'beta']
})

df[df == 'alpha'] = 'delta'
df
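For reference, the boolean frame df == 'alpha' is True exactly in the cells that hold 'alpha', and the assignment overwrites only those cells. The result should look roughly like this (exact spacing depends on your pandas version):

    col1   col2   col3
0  delta   beta  gamma
1   beta  gamma  delta
2  gamma  delta   beta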

Error in PySpark for the Same Code: TypeError: unhashable type: 'DataFrame'

from pyspark import pandas as ppd

df_ppd = ppd.DataFrame({
    'col1': ["alpha", "beta", "gamma"],
    'col2': ['beta', 'gamma', 'alpha'],
    'col3': ['gamma', 'alpha', 'beta']
})

df_ppd[df_ppd == 'alpha'] = 'delta'

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In [13], line 1
----> 1 df_ppd[df_ppd == 'alpha'] = 'delta'

File ~/anaconda3/envs/mh/lib/python3.9/site-packages/pyspark/pandas/frame.py:12355, in DataFrame.__setitem__(self, key, value)
  12352     psdf = self._assign({k: value[c] for k, c in zip(key, field_names)})
  12353 else:
  12354     # Same Series.
> 12355     psdf = self._assign({key: value})
  12357 self._update_internal_frame(psdf._internal)

TypeError: unhashable type: 'DataFrame'
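The traceback shows why this fails: DataFrame.__setitem__ places the indexer into a dictionary via self._assign({key: value}), and a pandas-on-Spark DataFrame (here, the boolean frame df_ppd == 'alpha') is not hashable, hence the TypeError. Besides the replace()-based approach shown next, a per-column workaround is possible; the sketch below assumes Series.where() is available in your pyspark.pandas version:

from pyspark import pandas as ppd

df_ppd = ppd.DataFrame({
    'col1': ["alpha", "beta", "gamma"],
    'col2': ['beta', 'gamma', 'alpha'],
    'col3': ['gamma', 'alpha', 'beta']
})

# Rewrite one column (Series) at a time: Series.where() keeps each value
# where the condition is True and substitutes 'delta' where it is False,
# so only the cells equal to 'alpha' change.
for col in df_ppd.columns:
    df_ppd[col] = df_ppd[col].where(df_ppd[col] != 'alpha', 'delta')

print(df_ppd)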

Alternate Way: Using DataFrame.replace()

df_ppd = df_ppd.replace(to_replace = ['alpha'], value = "delta")
df_ppd = df_ppd.replace(to_replace = ['beta', 'gamma'], value = "epsilon")
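For completeness, here is the first replacement run end to end as a minimal, self-contained sketch. Note that replace() returns a new pandas-on-Spark DataFrame rather than modifying df_ppd in place, which is why the result is assigned back (the printed layout may differ slightly):

from pyspark import pandas as ppd

df_ppd = ppd.DataFrame({
    'col1': ["alpha", "beta", "gamma"],
    'col2': ['beta', 'gamma', 'alpha'],
    'col3': ['gamma', 'alpha', 'beta']
})

# replace() substitutes the listed values across every column and
# returns a new DataFrame, so the result must be assigned back.
df_ppd = df_ppd.replace(to_replace=['alpha'], value='delta')
print(df_ppd)
#     col1   col2   col3
# 0  delta   beta  gamma
# 1   beta  gamma  delta
# 2  gamma  delta   beta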
Also Check: Way 4: With respect to DataFrame.replace() Method (Ways in which Pandas API on PySpark differs from Plain Pandas)
Tags: Spark, Technology
