survival8: Accessing PostgreSQL from PySpark.Pandas

Tuesday, December 13, 2022

Accessing PostgreSQL from PySpark.Pandas

Installing a couple of packages

$ conda install sqlalchemy -c conda-forge
$ conda install psycopg2 -c conda-forge

Data Ingestion Via PostGRE SQL terminal

PostGRE is case insensitive:

These two statements are same:

1:
CREATE TABLE states_and_capitals( state_name TEXT NOT NULL, capital_name TEXT NOT NULL, PRIMARY KEY( state_name, capital_name ) );

2:
create table states_and_capitals (state_name text not null, capital_name text not null, primary key(state_name, capital_name));


postgres=# CREATE TABLE states_and_capitals( state_name TEXT NOT NULL, capital_name TEXT NOT NULL, PRIMARY KEY( state_name, capital_name ) );
CREATE TABLE
postgres=# create table states_and_capitals (state_name text not null, capital_name text not null, primary key(state_name, capital_name));
ERROR:  relation "states_and_capitals" already exists
postgres=# 

postgres=# select * from states_and_capitals;

 state_name | capital_name 
------------+--------------
(0 rows)

You can also view your table in the output of this command:
select * from information_schema.tables;




INSERT INTO states_and_capitals (state_name, capital_name) VALUES ('haryana', 'chandigarh');

INSERT INTO states_and_capitals (state_name, capital_name) VALUES ('punjab', 'chandigarh');

Let's say you made a spelling mistake in a record. Then you can delete that record using the following statement:
DELETE FROM states_and_capitals WHERE state_name = 'harayana';


postgres=# select * from states_and_capitals;

state_name | capital_name 
------------+--------------
    punjab     | chandigarh
    haryana    | chandigarh
(2 rows) 

postgres=# 

postgres=# \password postgres
Enter new password for user "postgres": postgres
Enter it again: postgres
postgres=# 

Reading data from PostgreSQL into Plain Pandas

from sqlalchemy import create_engine
import pandas as pd

engine = create_engine('postgresql://postgres:postgres@localhost/postgres')
df = pd.read_sql("""SELECT * FROM states_and_capitals""", engine) 

df 



Issue while trying with pyspark.pandas

jdbc_engine = 'jdbc:postgresql://postgres:postgres@localhost/postgres'
query = """SELECT * FROM states_and_capitals"""

import pyspark.pandas as pspd
from pyspark import SparkContext
from pyspark.sql import SQLContext
sc = SparkContext.getOrCreate()
sqlCtx = SQLContext(sc)

df_pspd = pspd.read_sql(query, jdbc_engine) 


Py4JJavaError: An error occurred while calling o48.load.
: java.sql.SQLException: No suitable driver

survival8

Pages

Tuesday, December 13, 2022

Accessing PostgreSQL from PySpark.Pandas

Installing a couple of packages

Data Ingestion Via PostGRE SQL terminal

Reading data from PostgreSQL into Plain Pandas

Issue while trying with pyspark.pandas

No comments:

Post a Comment