avatarKalpan Shah

Free AI web copilot to create summaries, insights and extended knowledge, download it at here

3077

Abstract

gital pieces of art as NFTs. If you are not yet familiar with NFTs, <a href="https://readmedium.com/are-socks-the-new-tulips-4825ab13695b">here’s</a> a quick 101 on them. Over one weekend, with pieces selling from 10s to 100s of thousands of dollars each, Beeple — an ‘average’ dad driving an ‘average’ car — netted 69 million.</p><p id="457e">That, is why you should know of the name Beeple. And why NFTs might be a trend you want to learn more about.</p><p id="f997">Why would anyone buy the ‘digital’ artwork, you may ask? Good question. After all, I just snipped a photo of it above, and pasted it right here, for free, where you also enjoyed it, also for free.<b> So why would someone pay up to 777,777 for one digital artwork, when we can all have access to it?</b></p><p id="9f8f">The sense of ownership, proprietary rights to the ‘original’.</p><p id="1545">Still doesn’t make sense? Well, you are not alone. And not wrong. It does not really make sense.</p><p id="2427">The world is going places we do not yet understand.</p><p id="5b1b">Stay with me for the ride!</p><h2 id="d280">Enter Liink.</h2><p id="735e">And JPM Coin, and Onyx, and Format, and Confirm.</p><p id="83cc">JP Morgan is on top of this.</p><p id="14b8">Here are the cole’s notes.</p><p id="a5ce">A couple of years ago [yes, already], JP Morgan created JPM Coin, a digital coin used to speed up transactions and transfers of USD across the network, built on an Ethereum network they called Quorom. JPM Coin is stable coin, pegged 1:1 to the US Dollar [<a href="https://readmedium.com/three-crypto-trends-to-get-into-early-5cf090b07334">here</a> is a simple explanation of stable coins].</p><p id="1047">If you are asking, why would JP Morgan need a blockchain network to simply trade coins where their value 1:1 US Dollar? Well;</p><ul><li>the coin can be exchanged between entities digitally, instantaneously, without the intermediate ‘processing’ period we sometimes have to get our money transferred.</li><li>To transfer a digital coin, you need a blockchain platform. Hence Quorom.</li><li>Because it is cool and they wanted to get into this! So ‘cool’ that some digital coin enthusiasts don’t even recognize this one as a true digital coin — discrimination in digital coins already!</li></ul><p id="7650">Once they set up Quorom, JPM started a payment messaging system, Liink.</p><p id="1759">The “IIN” in the middle stands for interbank information network. Any bank which joins this network can have access to the ledger of transactions done on the network, i.e. can see every payment which went through on Quorom.</p><p id="e47f">Also using Quorom, JP Morgan went on to start new tech-enabled solutions such as Confirm and Format [JPM seems to be into puns].</p><p id="1124"><b>Confirm </b>is used to validate account info before payment is done, this is so costs are not wasted on potentially fraudulent transactions. <b>Format</b>, you guessed it, checks if the format for payments match and conform to the receiving party’s country’s requirements.</p><p id="2f5f">Today, JPM no longer ow

Options

ns Quorom, it spun it off and invested in the new owner — ConsenSys — so it can remain independent as it creates more tech solutions.</p><h2 id="c8e1">Next up: Diem.</h2><p id="000b">Diem is Facebook’s digital currency — because, why not.</p><p id="ebaf">Facebook has been trying to get regulatory approval to issue its own stable coin, on its own payment network, Diem Association, for a while. It finally got approval last month.</p><p id="d413">Why is this one in particular an interesting one to watch, especially since it seems everyone wants to issue their own coin? Exhibit A [or F!]: Amazon made a similar announcement recently.</p><p id="3ed3">A Facebook coin has one major advantage — its potential customer base. Diem in itself is ‘just another stable coin’. Facebook however has almost 3 billion users. If, say, 15% of these users adopt Diem, it will far supersede the use of any digital currency so far, including Bitcoin. It will also make it a lot more ‘mainstream’ given the on-the-ground average-person users of Facebook.</p><p id="aedf">Worth keeping an eye out for. Might change the digital currency story.</p><p id="3be4">So, less awkward intro than you expected, I hope? Willing to go for another date? My guess is ‘crypto’ will keep us interested for a few more.</p><p id="2b4a">Quick note, some stories above are not as recent, e.g. JP Morgan’s plays. I included them because the stories continue to evolve, and these terms are making it into mainstream media; this gives the background of what you might come across more often now.</p><p id="f261">Keep watch for more crypto news, and shoot me a response if you want to learn of anything in particular.</p><p id="d2e8">Feel free to join my newsletter <a href="https://www.getrevue.co/profile/ialmouaswas">here</a>, where I send a summary of my weekly articles, usually including one crypto-based.</p><div id="e0c4" class="link-block"> <a href="https://readmedium.com/are-socks-the-new-tulips-4825ab13695b"> <div> <div> <h2>Are SOCKS the New Tulips?</h2> <div><h3>How socks became digital and memes became coins.</h3></div> <div><p>medium.com</p></div> </div> <div> <div style="background-image: url(https://miro.readmedium.com/v2/resize:fit:320/1*kvKzo_U2ugHXMvGyulkurw.jpeg)"></div> </div> </div> </a> </div><div id="8299" class="link-block"> <a href="https://readmedium.com/three-crypto-trends-to-get-into-early-5cf090b07334"> <div> <div> <h2>Three Crypto Trends to Get into Early</h2> <div><h3>Get ready to pick your favourite content creator.</h3></div> <div><p>medium.com</p></div> </div> <div> <div style="background-image: url(https://miro.readmedium.com/v2/resize:fit:320/1*E7hcmlmlYS2afQMj4q9TpQ.png)"></div> </div> </div> </a> </div></article></body>

Spark ETL Chapter 1 with SQL Databases (MySQL | PostgreSQL)

Previous blog/Context:

In an earlier blog, we discussed Spark ETL with files (CSV, JSON, Text, Parquet and ORC). Please find below blog post for more details.

Introduction:

In this blog post, we will discuss Spark ETL with SQL Database. We will be considering MySQL and PostgreSQL for Spark ETL. All other SQL Databases like MS SQL, RedShift, and Oracle, also follow the same pattern, the only difference is we need to download different Maven packages for different databases. We will also discuss how to select and specify Maven packages for any database.

Image by Author

Today, we will be doing below Spark ETL

Task to do

1. Install required spark libraries (MySQL & PostgreSQL)

2. Create a connection with SQL Database (MySQL & PostgreSQL)

3. Read data from SQL Database (MySQL & PostgreSQL)

4. Transform data

5. write data into SQL Server (MySQL & PostgreSQL)

The first clone is below the GitHub repo, where we have all the required sample files and solutions.

If you don’t have a setup for Spark instance, MySQL, or PostgreSQL in your system follow the earlier blog for setting up Data Engineering tools in your system. (Data Engineering suite will setup Spark, MySQL, PostgreSQL, and MongoDB in your system)

Spark ETL with SQL Database

First, open Jupyter Notebook and copy all the content of Chapter 1 from GitHub to there.

Image by Author

First, we will do all the ETL with MySQL and after that, we will do it with PostgreSQL.

With our Spark Instance, we have a few spark libraries that are already installed but before doing MySQL or PostgreSQL we need to check that do we have libraries that are already installed or not.

From the docker desktop, go to the terminal of the Spark container or use docker exec -it

Image by Author

Now, go to directory /opt/spark/jars

Image by Author

Now, go to the Jars folder and list all the Jar files

Image by Author

Here, we see that with our spark instance, we have all the libraries/packages installed for ADLS/blob, S3, GCP, Delta, Avro, and Snowflake. But we don’t have packages installed for MySQL or PostgreSQL.

Spark ETL with MySQL

First, with starting the Spark session, we will also download and install MySQL Packages from Maven. You can check all the packages from Maven using this link

https://mvnrepository.com/artifact/mysql/mysql-connector-java

For downloading and installing any packages, we need to use the below config

'spark.jars.packages':'mysql:mysql-connector-java:8.0.32'

So, our code for starting the Spark session will be as below

#Start Spark Session
spark = SparkSession.builder.appName("chapter1")\
        .config('spark.jars.packages', 'mysql:mysql-connector-java:8.0.32')\
        .getOrCreate()
sqlContext = SparkSession(spark)
#Dont Show warning only error
spark.sparkContext.setLogLevel("ERROR")

It will first download the package from Maven and then install it

Image by Author

Now, we have the required library installed for MySQL so we can do Spark ETL with MySQL.

Create a connection with MySQL Database & Read Data from Table

We have MySQL Database with the name DATAENG and in that database, we already have a table with the name user is there, we will read that table from Spark. (If you don’t have a table created with data use a CSV file from GitHub and create and load data into MySQL)

Image by Author

Spark code for creating a connection with MySQL and the reading table will be as below

#Load CSV file into DataFrame
mysqldf = spark.read \
    .format("jdbc") \
    .option("driver","com.mysql.cj.jdbc.Driver") \
    .option("url", "jdbc:mysql://192.168.1.104:3306/DATAENG") \
    .option("dbtable", "user") \
    .option("user", "root") \
    .option("password", "mysql") \
    .load()
#Checking dataframe schema
mysqldf.printSchema()

In the URL, we need to pass the MySQL database location. We have a database running on localhost only so we should have used 127.0.0.1 but our spark will connect that from inside docker so it will consider docker’s 127.0.0.1 and that’s why in our scenario (with this setup) we need to pass our local assigned IP address.

Once the connection is done, we will check the schema and data in that data frame

Image by Author
mysqldf.show(n=10)
Image by Author

We will create a Temp table (or HIVE view) so that we can write Spark SQL for transformation.

mysqldf.createOrReplaceTempView("tempMySQL")

Transform data

sqlContext.sql("SELECT * FROM tempMySQL").show(n=5)
Image by Author

For transformation, we will filter users who are having id more than 40 and store them into other data frame.

newdf = sqlContext.sql("SELECT name as fullname FROM tempMySQL WHERE id > 40")
newdf.count()

Write data to MySQL

Using the command below, we will write to MySQL. We will ask to create a table and load data into a table from data frame.

newdf.write \
  .format("jdbc") \
  .option("driver","com.mysql.cj.jdbc.Driver") \
  .option("url", "jdbc:mysql://192.168.1.104:3306/DATAENG") \
  .option("dbtable", "username") \
  .option("user", "root") \
  .option("password", "mysql") \
  .save()

Now, if we go to MySQL and check, we will see a table with the same data.

Spark ETL with PostgreSQL

We don’t have the PostgreSQL package/library installed in our Spark instance. So, we need to install the Spark package for PostgreSQL also.

We will use the Spark PostgreSQL package from Maven.

https://mvnrepository.com/artifact/org.postgresql/postgresql

Code for starting Spark session with specifying Spark Package

#Start Spark Session
spark = SparkSession.builder.appName("chapter1PostgreSQL")\
        .config('spark.jars.packages', 'org.postgresql:postgresql:42.5.4')\
        .getOrCreate()
sqlContext = SparkSession(spark)
#Dont Show warning only error
spark.sparkContext.setLogLevel("ERROR")

This will first download the PostgreSQL package and install it.

Image by Author

We have PostgreSQL packages installed so now we can do all the ETL operations with PostgreSQL.

Create a connection and Read data from PostgreSQL from Spark

We have created an employee (employee salary) table in the PostgreSQL server for solving data engineering problems, we will read that table from Spark. (If you don’t have a table and data in PostgreSQL, use provided CSV file and import it into PostgreSQL)

Image by Author

Code for the same

postgredf = spark.read.format("jdbc") \
    .option("url", "jdbc:postgresql://192.168.1.104:5432/postgres") \
    .option("driver", "org.postgresql.Driver") \
    .option("dbtable", "public.employee_salary") \
    .option("user", "postgres") \
    .option("password", "postgres")\
    .load()
postgredf.printSchema()
postgredf.show(n=10)
Image by Author

We have data from PostgreSQL available in a data frame, we will create a temp table (or HIVE view) so that we can do Spark SQL.

postgredf.createOrReplaceTempView("tempPostgreSQL")

Transform data

We will filter employees with more than 50000 salaries and store them in dataframe.

newdf = sqlContext.sql("SELECT first_name,salary FROM tempPostgreSQL WHERE salary > 50000")

write data into PostgreSQL Server

We will store data from data frame to PostgreSQL.

newdf.write \
  .format("jdbc") \
  .option("url", "jdbc:postgresql://192.168.1.104:5432/postgres") \
  .option("driver", "org.postgresql.Driver") \
  .option("dbtable", "public.employee1") \
  .option("user", "postgres") \
  .option("password", "postgres")\
  .save()

This will create a table in PostgreSQL (if a table is not already there) and load data into the table.

Image by Author

Conclusion:

We have learned how to install and download Spark Packages if it is not already there. We also learned the below topics.

  • How to create a connection with MySQL and PostgreSQL
  • How to read tables (views) from MySQL and PostgreSQL using spark. Read
  • How to create a Hive table (views) and do a transformation
  • How to load data into MySQL and PostgreSQL from Spark using spark. Write
  • Here, we have observed that in both cases it first downloaded JDBC drivers and created a connection using JDBC. (For read and write both)

Video explanation:

Spark
Pyspark
Databricks
Etl
Sql
Recommended from ReadMedium