How to save a DataFrame to PostgreSQL in pyspark


Recipe Objective: How to save a DataFrame to PostgreSQL in pyspark?

In most big data scenarios, data merging and aggregation are an essential part of the day-to-day activities on big data platforms. In this scenario, we will save a dataframe to a PostgreSQL database table.

System requirements :

  • Install Ubuntu in a virtual machine
  • Set up a single-node Hadoop machine
  • Install PySpark/Spark on Ubuntu
  • The code below can be run in a Jupyter notebook or any Python console.


Step 1: Import the modules

In this step, we import the PySpark and PySpark SQL modules and create a Spark session. Note that spark.jars must point to the PostgreSQL JDBC driver jar on the local filesystem:

import pyspark
from pyspark.sql import SparkSession
from pyspark.sql import Row

spark = SparkSession.builder \
    .config("spark.jars", "/usr/local/postgresql-42.2.5.jar") \
    .master("local").appName("PySpark_Postgres_test").getOrCreate()

The output of the code:

[Screenshot: bigdata_1.jpg]

Step 2: Create Dataframe to store in Postgres

Here we create a dataframe to save to a Postgres table. The Row class lives in the pyspark.sql module, which is why we imported it in Step 1.

studentDf = spark.createDataFrame([
    Row(id=1, name='vijay', marks=67),
    Row(id=2, name='Ajay', marks=88),
    Row(id=3, name='jay', marks=79),
    Row(id=4, name='vinay', marks=67),
])


The output of the code:

[Screenshot: bigdata_2.jpg]

Step 3: View the Data in the Dataframe

Here we view the top 5 rows of the dataframe as shown below (since the dataframe contains only four rows, all of them are displayed).

studentDf.show(5)

The output of the code:

[Screenshot: bigdata_3.jpg]

Step 4: Save the Dataframe to a Postgres Table

Here we save the dataframe to a Postgres table named students; the JDBC writer creates the table if it does not already exist. To save, we chain the write and save methods as shown in the code below.

studentDf.select("id", "name", "marks").write.format("jdbc") \
    .option("url", "jdbc:postgresql://localhost:5432/dezyre_new") \
    .option("driver", "org.postgresql.Driver").option("dbtable", "students") \
    .option("user", "hduser").option("password", "bigdata").save()
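The connection options above quickly get repetitive if you write to more than one table. As a small convenience, a helper function can assemble the JDBC URL and option dictionary in one place. This is a hypothetical helper of our own, not part of the PySpark API; the host, port, and database names below mirror the example above:

```python
def postgres_jdbc_options(database, table, user, password,
                          host="localhost", port=5432):
    """Build the option dict for a PostgreSQL JDBC read/write.

    Hypothetical convenience helper -- the function name and defaults
    are our own, not part of PySpark.
    """
    return {
        "url": f"jdbc:postgresql://{host}:{port}/{database}",
        "driver": "org.postgresql.Driver",
        "dbtable": table,
        "user": user,
        "password": password,
    }
```

The resulting dict can be splatted into the writer, e.g. studentDf.write.format("jdbc").options(**postgres_jdbc_options("dezyre_new", "students", "hduser", "bigdata")).save().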

The output of the code:

[Screenshot: bigdata_4.jpg]

To verify the saved dataframe, log in to the Postgres database and query the students table.
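Alternatively, instead of logging in to Postgres, the table can be read back through the same JDBC connector. A minimal sketch, assuming the same running Spark session, database, and credentials as above:

```python
# Sketch: read the students table back to verify the write.
# Assumes the SparkSession from Step 1 and a running Postgres instance.
studentDf_check = spark.read.format("jdbc") \
    .option("url", "jdbc:postgresql://localhost:5432/dezyre_new") \
    .option("driver", "org.postgresql.Driver").option("dbtable", "students") \
    .option("user", "hduser").option("password", "bigdata").load()

studentDf_check.show()
```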

The output of the saved dataframe:

[Screenshot: bigdata_5.jpg]

As shown in the image above, writing the dataframe has created and populated the students table in Postgres.

Conclusion

In this recipe, we learned how to save a DataFrame to a PostgreSQL table in PySpark using the JDBC connector.

