Christianlauer

Summary

Google has introduced BigQuery DataFrames, a Python API that integrates data analysis and machine learning capabilities within BigQuery, now available in preview.

Abstract

Google's latest offering, BigQuery DataFrames, is a Python-based API designed to streamline data analysis and machine learning tasks within the BigQuery ecosystem. This feature, which is currently in preview, provides partial compatibility with Pandas and scikit-learn, allowing users to leverage familiar data manipulation and machine learning functionalities directly within BigQuery. The open-source package can be installed via pip and includes a DataFrame API for data analysis and a machine learning API for BigQuery ML tasks. The introduction of BigQuery DataFrames is positioned as beneficial for those using Google Cloud and BigQuery in enterprise environments, facilitating more advanced Python-based data science and machine learning workflows without the need for additional interfaces. Google also provides a quickstart guide for users to begin utilizing this new feature.

Opinions

  • The author views the launch of BigQuery DataFrames as positive news for users working with Google Cloud and BigQuery, particularly for those interested in data science and machine learning.
  • The feature is seen as enhancing the user experience by combining the power of BigQuery with the flexibility of Python, reducing the complexity of interfaces for data analysis and machine learning tasks.
  • The author suggests that BigQuery DataFrames, along with BigQuery ML, represent a significant improvement in Google's data analysis and machine learning offerings.
  • The author takes Google's quickstart guide as a sign that Google is committed to helping users adopt and benefit from the new feature.
  • By mentioning BigQuery Studio alongside BigQuery DataFrames, the author implies that these new features together are a substantial upgrade to the BigQuery platform that simplifies data analysis and data science tasks.

Google launches BigQuery DataFrames

How Google combines the BigQuery API and Python

Photo by David Clode on Unsplash

Google has just announced BigQuery DataFrames, and the feature is now available in preview. BigQuery DataFrames is a Python API that you can use to analyze data and perform machine learning tasks in BigQuery[1].

BigQuery DataFrames combines data analysis and data science capabilities through two modules[1]:

  • bigframes.pandas implements a DataFrame API (with partial Pandas compatibility) on top of BigQuery.
  • bigframes.ml implements a Python API for BigQuery ML (with partial scikit-learn compatibility).
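To illustrate the scikit-learn-style interface of bigframes.ml, here is a minimal sketch that trains a linear regression on the public penguins table used in Google's quickstart. The table `bigquery-public-data.ml_datasets.penguins`, the chosen feature and label columns, and the helper names `split_columns` and `train_penguin_model` are assumptions for illustration. The bigframes imports sit inside the function, so the pure helper can be tried without a Google Cloud project; actually training requires the bigframes package and Google Cloud credentials.

```python
from typing import List, Tuple

# Hypothetical feature/label choice for the public penguins table
FEATURES: List[str] = ["culmen_length_mm", "culmen_depth_mm", "flipper_length_mm"]
LABEL: str = "body_mass_g"


def split_columns(columns: List[str], label: str) -> Tuple[List[str], List[str]]:
    """Split column names into (feature columns, [label column])."""
    if label not in columns:
        raise ValueError(f"label column {label!r} not found")
    features = [c for c in columns if c != label]
    return features, [label]


def train_penguin_model(project_id: str):
    """Fit a BigQuery ML linear regression through bigframes.ml.

    Needs the bigframes package and Google Cloud credentials; the
    training itself is executed inside BigQuery, not locally.
    """
    import bigframes.pandas as bpd
    from bigframes.ml.linear_model import LinearRegression

    bpd.options.bigquery.project = project_id
    df = bpd.read_gbq("bigquery-public-data.ml_datasets.penguins")

    # pandas-style column selection and cleanup on a BigQuery-backed DataFrame
    df = df[FEATURES + [LABEL]].dropna()
    feature_cols, label_cols = split_columns(list(df.columns), LABEL)

    # scikit-learn-like estimator: fit(X, y)
    model = LinearRegression()
    model.fit(df[feature_cols], df[label_cols])
    return model
```

Calling `train_penguin_model("your-gcp-project-id")` should return a fitted model whose `predict` and `score` methods, like their scikit-learn counterparts, also run in BigQuery.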

BigQuery DataFrames is an open-source package; run pip install --upgrade bigframes to install the latest version. Here is a small blueprint of how to use it[2]:

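import bigframes.pandas as bpd

# Google Cloud project in which BigQuery jobs run (replace the placeholder)
bpd.options.bigquery.project = "your-gcp-project-id"

# Read a whole table by its ID ...
df1 = bpd.read_gbq("project.dataset.table")
# ... or the result of a SQL query
df2 = bpd.read_gbq("SELECT a, b, c FROM `project.dataset.table`")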
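Because bigframes.pandas aims for partial pandas compatibility, analysis code written against the pandas API can often be pointed at a BigQuery-backed DataFrame unchanged. The sketch below assumes that the methods it uses (`dropna`, `groupby`, `mean`) are among the supported ones and that the table has `species` and `body_mass_g` columns; the function name and the sample data are hypothetical, and the check runs locally with plain pandas.

```python
import pandas as pd


def mean_mass_by_species(df):
    """Average body mass per species.

    Works on a pandas DataFrame and, assuming the same methods are
    supported, on a bigframes.pandas DataFrame, where the aggregation
    is executed as a query in BigQuery.
    """
    return (
        df[["species", "body_mass_g"]]
        .dropna()  # drop rows with a missing value in either column
        .groupby("species")["body_mass_g"]
        .mean()
    )


# Local check with a small in-memory pandas DataFrame (hypothetical data)
local = pd.DataFrame(
    {
        "species": ["Adelie", "Adelie", "Gentoo", "Gentoo"],
        "body_mass_g": [3700.0, 3900.0, 5000.0, None],
    }
)
result = mean_mass_by_species(local)
```

Against BigQuery, the same function could be applied to `bpd.read_gbq("bigquery-public-data.ml_datasets.penguins")`; the aggregation then runs in BigQuery, and calling `.to_pandas()` on the result downloads only the small aggregated Series.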

So this is once again good news if you are working with Google Cloud and BigQuery in an enterprise setting but also want to do more with Python and run data science and machine learning tasks without switching between additional interfaces. If you prefer SQL, Google also offers BigQuery ML, which lets you create and run machine learning models with SQL statements. To get started with BigQuery DataFrames, Google provides a BigQuery DataFrames quickstart[2].
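For comparison, the SQL route via BigQuery ML expresses model training as a `CREATE MODEL` statement. The sketch below builds such a statement for a linear regression and submits it with the `google-cloud-bigquery` client library. The model ID, source table, column names and helper names are hypothetical placeholders, and running `train_with_sql` requires that library plus Google Cloud credentials.

```python
from typing import List


def create_model_sql(model_id: str, source_table: str,
                     features: List[str], label: str) -> str:
    """Build a BigQuery ML CREATE MODEL statement for linear regression."""
    columns = ", ".join(features + [label])
    return (
        f"CREATE OR REPLACE MODEL `{model_id}`\n"
        f"OPTIONS (model_type = 'linear_reg', input_label_cols = ['{label}']) AS\n"
        f"SELECT {columns}\n"
        f"FROM `{source_table}`\n"
        f"WHERE {label} IS NOT NULL"
    )


def train_with_sql(project_id: str, sql: str) -> None:
    """Submit the statement to BigQuery and wait for training to finish."""
    from google.cloud import bigquery  # pip install google-cloud-bigquery

    client = bigquery.Client(project=project_id)
    client.query(sql).result()  # blocks until the query job completes
```

The SQL route keeps the whole training step in a single statement, while bigframes.ml wraps BigQuery ML behind a scikit-learn-style `fit`/`predict` interface that composes with DataFrame operations in Python.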

So, as I said, this is a really useful new feature for Google's flagship product BigQuery, which this week also received other interesting new features, one of them being BigQuery Studio. You can read more about it in the linked article below.

Sources and Further Readings

[1] Google, BigQuery release notes (2023)

[2] Google, BigQuery DataFrames (2023)
