How to Use Logging Query Language to Analyze GCP Logs Data in Python

How to use Google Cloud’s Logs API and Logging Query Language in Python to obtain real-time data on active GCP instances.

Using LQL in Python

Logging query language (LQL) lets you query Cloud Logging data. Integrating LQL with Python allows for more dynamic querying than building and running queries in the UI.

If you’re like me, up until a certain point, you may have only employed LQL queries in the GCP UI to discover, for instance, which cloud functions are triggering the most errors.

However, recently, I dug a bit deeper into the full syntax of LQL and how to integrate the query language with Python via the logging API client library.

In creating a feature that monitored real-time statuses of Google Compute Engine (GCE) instances, I discovered that there are not many pieces of documentation or tutorials that comprehensively cover how to retrieve logs data via Python.

Therefore, in the next few minutes I will explain:

How to use the logging client library to list info relevant to GCP resources (VMs, cloud functions, etc.)
How to create a filtering string using LQL
How to properly format and integrate a time filter into the LQL string

In this story we’ll specifically examine the process for retrieving logging information relevant to virtual machines run with Compute Engine.

For your convenience, the full script is included at the end of this story.

Creating and Authorizing a Logging Client

Whenever using a GCP API, the first steps are always authentication and instantiation of the relevant clients.

Google uses OAuth 2.0 to authorize applications. You can also retrieve credentials stored in a Google Cloud Storage bucket or another secure, external repository.

We’ll want to use the Google Cloud logging library (google.cloud.logging) and import its Client class, which we’ll pass two parameters: project and credentials.

from google.cloud.logging import DESCENDING, Client

client = Client(project=project_id, credentials=credentials)

List_Entries

To obtain information relevant to VMs, we’ll want to use the list_entries method.

It is important to remember that there are several different logs you can query. Keep in mind that machine abstractions like containers and virtual machines are distinct and will appear in different logs.

List_entries provides us with information specific to each VM instance. Crucially, however, it does not provide the instance name.

(Note: Querying the containers end point, typically preceded by ‘cos’, will provide an instance name).

To find the instance name you’ll have to use the Compute Engine client, passing in the project name and zone of your instance as parameters.

Full credit goes to this Stack Overflow answer for helping me with that component of the build.

request = service.instances().list(project=project, zone=zone)

Getting the instance name and instance ids at the same time you fetch the logging data is essential because instance ids are not static values.

For instance, if you delete and re-create an instance, it will have a new instance id, meaning you can’t necessarily use a look up table or any other method that assumes the instance id is unchanging.

At this point, the code looks like this:

request = service.instances().list(project=project, zone=zone)

client = Client(project=project_id, credentials=credentials)

client.list_entries(filter_=FILTER, order_by=DESCENDING, page_size=1000)
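Once the instances request has been executed, its result can be turned into a lookup from instance id to instance name. Here is a minimal sketch, assuming the response is a dictionary with an items list whose elements carry id and name keys. The helper name build_id_to_name and the sample values are made up for illustration.

```python
def build_id_to_name(response):
    """Map each instance id (as a string) to its instance name."""
    # "items" may be missing when the zone has no instances, so default to []
    return {str(item["id"]): item["name"] for item in response.get("items", [])}


# Hypothetical response, shaped like the result of request.execute()
sample_response = {
    "items": [
        {"id": "1234567890", "name": "etl-worker-1"},
        {"id": "9876543210", "name": "etl-worker-2"},
    ]
}

id_to_name = build_id_to_name(sample_response)
print(id_to_name["1234567890"])  # etl-worker-1
```

Because the mapping is rebuilt on every run, a deleted and re-created instance simply shows up under its new id.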

To specify the instance records that the GCP logs return, you’ll need to include a filter, written in logging query language (LQL) syntax.

For your reference, I created a comprehensive guide to GCP’s LQL in a previous story.

So, with a filter, that client.list_entries() function might look like this:

client.list_entries(filter_='resource.type=gce_instance severity=ERROR', order_by=DESCENDING, page_size=1000)

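Note that page_size sets how many entries are fetched per API request, which can help with overall code optimization. To cap the total number of entries returned, recent versions of the client library also accept a max_results parameter.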
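As filters grow, it can help to assemble them from individual comparisons rather than typing one long string. The sketch below is one way to do that; build_filter is a hypothetical helper, not part of the logging library, and it simply joins the clauses with AND.

```python
def build_filter(*clauses):
    """Join individual LQL comparisons into one filter string with AND."""
    # Skip empty clauses so optional conditions can be passed as ""
    cleaned = [clause.strip() for clause in clauses if clause.strip()]
    return " AND ".join(cleaned)


FILTER = build_filter("resource.type=gce_instance", "severity=ERROR")
print(FILTER)  # resource.type=gce_instance AND severity=ERROR
```

The resulting string can be passed to list_entries through the filter_ parameter, just like the hand-written version.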

Even though we have configured the list_entries function, we’ll need to ensure that the data is not only queried, but retained and cleaned in a data frame.

Retrieving Instances

In order to analyze the logs comprehensively, we’ll have to make sure that every matching log entry is returned.

To accomplish this, we’ll need to loop through the entries that list_entries returns and extract the following fields from each ‘entry’:

timestamp
resource
payload

for entry in client.list_entries(filter_=FILTER, order_by=DESCENDING, page_size=1000):
    timestamp = entry.timestamp.isoformat()
    resource = entry.resource.labels
    payload = entry.payload

If you know Python, this loop will look familiar. Pay particular attention, though, to the timestamp format.

After the isoformat() call, the value is neither a datetime nor a timestamp (both fairly common Pandas data types). Instead, it is a string in ISO 8601 format.

Timestamp is helpful in configuring a time filter, which we’ll get to in a moment.

Resource contains labels like zone and instance id that can be joined on instance id retrieved from the GCE API to yield the instance name.

To ensure that these fields are accurately stored and formatted within a data frame, we can collect one dictionary per entry in a list and build the data frame from that list. (DataFrame.append was removed in pandas 2.0, so appending to an empty data frame row by row no longer works.)

import pandas as pd

rows = []
for entry in client.list_entries(filter_=FILTER, order_by=DESCENDING, page_size=1000):
    resource = entry.resource.labels
    rows.append({
        'timestamp': entry.timestamp.isoformat(),
        'zone': resource['zone'],
        'instance_id': resource['instance_id'],
        'payload': entry.payload
    })

df = pd.DataFrame(rows)

The most important information, including error messages and compute metrics, is stored in the payload field.
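To see what one row looks like without calling the API, here is a small sketch that flattens a single entry into a dictionary and attaches the instance name. It assumes entries expose timestamp, resource.labels and payload as in the loop above. The entry_to_row helper, the instance_name column, the stand-in entry and the id-to-name mapping are all hypothetical.

```python
from datetime import datetime, timezone
from types import SimpleNamespace


def entry_to_row(entry, id_to_name=None):
    """Flatten one log entry into a dictionary suitable for a data frame row."""
    labels = entry.resource.labels
    instance_id = labels.get("instance_id")
    return {
        "timestamp": entry.timestamp.isoformat(),
        "zone": labels.get("zone"),
        "instance_id": instance_id,
        # None when the id is not in the mapping (or no mapping was given)
        "instance_name": (id_to_name or {}).get(instance_id),
        "payload": entry.payload,
    }


# Stand-in for a real log entry, for illustration only
fake_entry = SimpleNamespace(
    timestamp=datetime(2022, 5, 20, 14, 0, tzinfo=timezone.utc),
    resource=SimpleNamespace(labels={"zone": "us-east1-b", "instance_id": "1234567890"}),
    payload={"message": "disk full"},
)

row = entry_to_row(fake_entry, {"1234567890": "etl-worker-1"})
print(row["timestamp"])  # 2022-05-20T14:00:00+00:00
```

Using labels.get() instead of square brackets means an entry that lacks a label produces None rather than raising a KeyError.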

Creating a Time Stamp

In the GCP logs UI you can configure time parameters using timestamp. However, the GCP logging client docs only include examples of static timestamps.

For instance:

'timestamp >= "2022-05-20" AND timestamp <= "2022-05-23"'

Timestamps help provide context for the data returned and prevent overloading the API with excessive read requests.

However, like anything else with Python, we can configure timestamp to be a dynamic variable.

My first thought would be to use pd.Timestamp to create an equivalent variable. But, as we noted before, GCP timestamps are in ISO 8601 format.

That means we’ll have to ensure that any timestamp we create is in a similar format.

As a strftime format string, ISO format looks like this:

iso = "%Y-%m-%dT%H:%M:%SZ"

That means we’ll have to convert any timestamp to ISO format. The trailing Z in the format string stands for UTC, so the times are built in UTC here rather than in a local time zone:

from datetime import datetime, timedelta, timezone

iso = "%Y-%m-%dT%H:%M:%SZ"

# Round the current UTC time down to the start of the hour
current_time = datetime.now(timezone.utc)
top_of_hour = current_time.replace(minute=0, second=0, microsecond=0)

# Window boundaries: four hours ago and the start of the current hour
previous_four_hour = (top_of_hour - timedelta(hours=4)).strftime(iso)
current_hour = top_of_hour.strftime(iso)
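If you need more than one window size, the same idea can be wrapped in a function. This is a sketch under the assumption that the window should always end at the top of the current hour in UTC; the name hour_window and its hours parameter are my own, and the fixed datetime is only there to make the output predictable.

```python
from datetime import datetime, timedelta, timezone

ISO_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def hour_window(now, hours=4):
    """Return (start, end) ISO strings covering the last `hours` whole hours in UTC."""
    # Convert to UTC first so the literal "Z" suffix is truthful
    end = now.astimezone(timezone.utc).replace(minute=0, second=0, microsecond=0)
    start = end - timedelta(hours=hours)
    return start.strftime(ISO_FORMAT), end.strftime(ISO_FORMAT)


fixed_now = datetime(2022, 5, 20, 14, 37, 12, tzinfo=timezone.utc)
start, end = hour_window(fixed_now)
print(start, end)  # 2022-05-20T10:00:00Z 2022-05-20T14:00:00Z
```

In a real script you would pass datetime.now(timezone.utc) instead of the fixed value.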

Filtering Based on Timestamp

After writing Python, we’ll briefly switch back to LQL.

Fair warning: The string formatting is a bit tricky in this part, so definitely take your time.

To be fully transparent, I constructed my filter based on this StackOverflow answer.

You can see my example below:

FILTER = 'resource.type = gce_instance AND log_name="your-log-name" AND severity=ERROR AND timestamp >= "' + previous_four_hour + '" AND timestamp <= "' + current_hour + '"'

Python also lets you build the string with triple quotes. Combined with an f-string, the variables are filled in directly, with no concatenation needed.

This will allow you to place the code on multiple lines and make it a bit neater in the development process.

FILTER = f"""
resource.type = gce_instance
AND log_name = "your-log-name"
AND severity = ERROR
AND timestamp >= "{previous_four_hour}"
AND timestamp <= "{current_hour}"
"""

Finally, I suggest you store this filter externally in a config file to avoid unnecessary clutter within the body of your script.
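As one possible way to do that, the sketch below keeps the filter as a template in an INI-style config and fills in the time window at runtime. The section name, the filter_template key and the placeholder names start and end are assumptions for illustration; the config text is inlined here so the example runs on its own, whereas in practice it would be read from a file.

```python
import configparser

# Hypothetical contents of a config file such as config.ini
CONFIG_TEXT = """
[logging]
filter_template = resource.type = gce_instance AND severity = ERROR AND timestamp >= "{start}" AND timestamp <= "{end}"
"""

# interpolation=None keeps configparser from treating % or braces specially
parser = configparser.ConfigParser(interpolation=None)
parser.read_string(CONFIG_TEXT)
template = parser["logging"]["filter_template"]

# Fill in the window boundaries (sample values shown here)
FILTER = template.format(start="2022-05-20T10:00:00Z", end="2022-05-20T14:00:00Z")
print(FILTER)
```

To load from disk instead, parser.read("config.ini") would replace the read_string call.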

Conclusion and Full Script

Like many other metadata sources, logging data can be easily overlooked on a day-to-day basis.

However, once you fully understand how to wield the power of metadata and how to leverage such data sources to gain insights into your infrastructure, metadata knowledge becomes essential.

By integrating GCP logging information into a Python script you can dynamically query your logs in real-time and even create automated checks using Google Cloud Functions.

One potential use case is checking to see which errors are causing container or virtual machine failures.

Please see the code block below for the full script (note: You may need to configure certain variables based on your environment/project context; some of these values were simply placeholders for demonstration purposes only).

I hope this provides a bit more transparency and information when it comes to querying GCP logs via Python.

import pandas as pd
from google.cloud.logging import DESCENDING, Client

# project_id, credentials, previous_four_hour and current_hour
# are defined as shown in the earlier sections
client = Client(project=project_id, credentials=credentials)

FILTER = 'resource.type = gce_instance AND log_name="your-log-name" AND severity=ERROR AND timestamp >= "' + previous_four_hour + '" AND timestamp <= "' + current_hour + '"'

def get_logging_errors():
    # Collect one dictionary per log entry, then build the data frame once
    rows = []
    for entry in client.list_entries(filter_=FILTER, order_by=DESCENDING, page_size=1000):
        resource = entry.resource.labels
        rows.append({
            'timestamp': entry.timestamp.isoformat(),
            'zone': resource['zone'],
            'instance_id': resource['instance_id'],
            'payload': entry.payload
        })

    return pd.DataFrame(rows)

if __name__ == "__main__":
    get_logging_errors()
