Christianlauer


Google launches Java and Scala Procedures for BigQuery

Using stored procedures for Apache Spark with Java or Scala

Photo by Claudel Rheault on Unsplash

Some time ago, Google made BigQuery capable of using Apache Spark. Spark is well suited to querying very large data sets from a variety of sources at high speed. To achieve this, the framework relies on a distributed architecture and cluster computing. You could already use stored procedures for Apache Spark written in Python[1].

Once you have created them, you can run them with SQL, just like SQL stored procedures. A stored procedure in BigQuery is a collection of statements that can be called from other queries or other stored procedures. A procedure can accept input arguments and return values as output.
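To make the idea of input arguments and output values concrete, here is a minimal sketch of a plain SQL stored procedure and a call to it. The dataset `mydataset` and the procedure `add_one` are hypothetical names chosen purely for illustration.

```sql
# Hypothetical procedure: takes an input value and hands back that value plus one.
CREATE OR REPLACE PROCEDURE mydataset.add_one(IN x INT64, OUT result INT64)
BEGIN
  SET result = x + 1;
END;

# Call the procedure and read its output argument.
BEGIN
  DECLARE answer INT64;
  CALL mydataset.add_one(41, answer);
  SELECT answer;
END;
```

Note that the value comes back through the `OUT` argument rather than through a return statement.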

Now, Google has announced that you can also create stored procedures for Apache Spark using Java or Scala. In addition, you can use the PySpark editor in the Google Cloud console to add options for Python stored procedures for Apache Spark[2].

To create a procedure in BigQuery, open the query editor and add the sample code for the CREATE PROCEDURE statement. Here is a blueprint that you can use[3]:

# Create procedure with main_file_uri option
# (upper-case names are placeholders to replace with your own values)
CREATE PROCEDURE `PROJECT_ID`.DATASET.PROCEDURE_NAME(PROCEDURE_ARGUMENT)
 WITH CONNECTION `CONNECTION_NAME`
 OPTIONS (
     engine="SPARK", runtime_version="RUNTIME_VERSION",
     main_file_uri=["MAIN_JAR_URI"])
 # Choose one language: JAVA or SCALA
 LANGUAGE JAVA|SCALA;
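
As a hedged illustration, the blueprint could be filled in as follows for a Scala job. All concrete values here (project `my-project`, dataset `my_dataset`, procedure `spark_scala_proc`, connection `my-project.us.my-spark-connection`, runtime version `1.1`, and JAR path `gs://my-bucket/my-spark-job.jar`) are hypothetical placeholders and not taken from Google's announcement. The JAR URI is written as a plain string in this sketch. Whether your setup expects that or the bracketed form from the blueprint is worth checking against the current documentation[3].

```sql
# Hypothetical example: every name and value below is a placeholder.
CREATE OR REPLACE PROCEDURE `my-project`.my_dataset.spark_scala_proc()
 WITH CONNECTION `my-project.us.my-spark-connection`
 OPTIONS (
     engine="SPARK", runtime_version="1.1",
     main_file_uri="gs://my-bucket/my-spark-job.jar")
 LANGUAGE SCALA;

# Run the Spark procedure with SQL, like any other stored procedure.
CALL `my-project`.my_dataset.spark_scala_proc();
```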

Google’s data warehouse is thus opening up further to other tools and programming languages, in this case Spark, Java, and Scala. This makes sense: both programming languages are very popular, so the knowledge required and other barriers to entry for data engineering with BigQuery are significantly reduced.

Sources and Further Readings

[1] Google, BigQuery Release Notes (2022)

[2] Google, BigQuery Release Notes (2023)

[3] Google, Work with stored procedures for Apache Spark (2023)
