Byron Cabrera

Building Reliable Workflows with Temporal in Go: Essential Best Practices

Temporal is an open-source platform designed to manage the execution of workflows in a reliable, fault-tolerant way. With features like built-in retry logic, scheduling, and failure handling, Temporal is well-suited for both short and long-running background tasks. After running Temporal in production over the past few months, I’ve gathered several best practices for improving performance and maintainability when using it with Go.

In this article, I’ll capture some of these best practices and share practical strategies for workflow execution, activity configuration, testing, and error handling.

To learn more about Temporal in depth, please visit the official Temporal documentation.

Best Practices for Temporal Workflows in Go

In Temporal, workflows are long-running stateful functions, while activities are discrete, stateless tasks that workflows orchestrate. Together, they provide developers the tools needed to build fault-tolerant and highly scalable applications that can automatically handle retries and complex dependency chains.

Using workflows and activities well comes down to knowing how to configure their executions:

1. Workflow IDs

Adopting a consistent naming strategy for workflow IDs makes workflows easy to track and manage. Clear, descriptive workflow IDs also make debugging and logging more straightforward.

One way to create consistent IDs is by generating them from key inputs. For example, if you’re creating a workflow for new hire notifications within a company, you might use the workflow type, company ID, and a timestamp:

func CompanyNewHireNotificationWorkflowID(companyID string) string {
    return fmt.Sprintf("CompanyNewHireNotificationWorkflow-%s-%d", companyID, time.Now().Unix())
}

Best Practice: Build workflow IDs from unique, relevant input fields, such as entity IDs, to keep them consistent and easy to retrieve.
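One caveat with the timestamp approach above: every call yields a different ID, so the ID cannot be recomputed later or used to spot a duplicate start. The following is a minimal sketch of a deterministic alternative, assuming the notification is scoped to one company and one calendar day. The function name, the ID layout, and the daily bucket are illustrative assumptions, not anything Temporal prescribes.

```go
package main

import (
	"fmt"
	"time"
)

// exampleDay is a fixed sample date used only for this illustration.
var exampleDay = time.Date(2024, time.March, 5, 14, 30, 0, 0, time.UTC)

// NewHireNotificationWorkflowID builds an ID from the workflow type, the
// company ID, and the UTC calendar day of `day`. The same inputs always
// produce the same ID, so a second start request for the same company and
// day can be recognized as a duplicate.
func NewHireNotificationWorkflowID(companyID string, day time.Time) string {
	return fmt.Sprintf("CompanyNewHireNotificationWorkflow-%s-%s",
		companyID, day.UTC().Format("2006-01-02"))
}

func main() {
	// Prints: CompanyNewHireNotificationWorkflow-acme-42-2024-03-05
	fmt.Println(NewHireNotificationWorkflowID("acme-42", exampleDay))
}
```

Whether a deterministic ID or a timestamped one is the better fit depends on whether repeated starts for the same entity should collide or coexist.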

2. Search Attributes

Search Attributes let you attach custom metadata to workflows, making it easier to query for workflows with specific parameters (e.g., customer IDs). Custom search attributes must be registered with the Temporal cluster before use; the command below does this with tctl, Temporal’s legacy command-line tool.

Register a search attribute:

tctl admin cluster add-search-attributes --name CustomerID --type Int

When starting a workflow, include any relevant search attributes you registered:

client.StartWorkflowOptions{
    ID:        workflowID, // the options field is named ID, not WorkflowID
    TaskQueue: "someTaskQueue",
    SearchAttributes: map[string]interface{}{
        "CustomerID": 12345,
    },
}

Best Practice: Define search attributes at the start of the project and consistently add them in workflows for efficient querying and filtering in Temporal Web.

3. Logging Context

To add logging context in your Temporal workflows, especially if you’re using a custom logger, you can enhance your logs with valuable details such as the workflow ID, run ID, activity name, and other execution-specific data. This is useful for tracing and debugging as it allows you to pinpoint exactly where issues arise within Temporal’s complex workflow orchestration.

For workflows, use workflow.GetInfo(ctx), which returns details about the workflow such as its ID and run ID.

For activities, use activity.GetInfo(ctx), which likewise returns details about the activity such as its name.

Best Practice: Attach Temporal-related context (workflow ID, run ID, activity name) to your logs wherever possible. This makes it much easier to correlate Temporal-related and non-Temporal-related logs within your application.
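As a rough illustration of the activity side, here is a minimal sketch that assumes Go 1.21+ and uses log/slog as the custom logger. The helper name withTemporalContext and the attribute keys are hypothetical. In a real activity the three values would be read from activity.GetInfo(ctx) (its WorkflowExecution.ID, WorkflowExecution.RunID, and ActivityType.Name fields); they are passed as plain strings here so the sketch runs without the SDK.

```go
package main

import (
	"bytes"
	"fmt"
	"log/slog"
)

// withTemporalContext returns a child logger that stamps every record with
// the identifiers needed to correlate it with a Temporal execution.
func withTemporalContext(base *slog.Logger, workflowID, runID, activityName string) *slog.Logger {
	return base.With(
		slog.String("workflowID", workflowID),
		slog.String("runID", runID),
		slog.String("activity", activityName),
	)
}

// logSample writes one record through a text handler and returns the raw
// output so the attached attributes can be inspected.
func logSample() string {
	var buf bytes.Buffer
	base := slog.New(slog.NewTextHandler(&buf, nil))
	logger := withTemporalContext(base, "CompanyNewHireNotificationWorkflow-acme-42", "run-1", "SendEmail")
	logger.Info("email sent", "recipients", 3)
	return buf.String()
}

func main() {
	// The printed line ends with:
	// workflowID=CompanyNewHireNotificationWorkflow-acme-42 runID=run-1 activity=SendEmail recipients=3
	fmt.Print(logSample())
}
```

Inside workflow code it is generally safer to log through workflow.GetLogger(ctx), which is replay-aware, rather than calling a custom logger directly.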

4. WorkflowIDReusePolicy

The WorkflowIDReusePolicy determines how Temporal handles duplicate workflow IDs. The most common options are:

  • AllowDuplicateFailedOnly: Allows reuse only if the previous workflow with the same ID closed in one of the following states: terminated, canceled, timed out, or failed.
  • AllowDuplicate: Allows reuse once the previous workflow with the same ID has closed, regardless of how it closed.
  • RejectDuplicate: Never allows a workflow ID to be reused.

In the Go SDK, the policy values are enum constants from the go.temporal.io/api/enums/v1 package (imported here as enumspb):

client.StartWorkflowOptions{
    WorkflowIDReusePolicy: enumspb.WORKFLOW_ID_REUSE_POLICY_ALLOW_DUPLICATE_FAILED_ONLY,
}

Best Practice: Set a WorkflowIDReusePolicy to manage conflicts and avoid accidentally running duplicate workflows, particularly when using client-generated IDs.

5. Workflow Retry Policy

Temporal’s RetryPolicy allows workflows to automatically retry in the event of failure. Use it to define:

  • Maximum attempts
  • Initial interval between retries
  • Backoff coefficients to increase intervals over time
  • Maximum retry interval

client.StartWorkflowOptions{
    RetryPolicy: &temporal.RetryPolicy{
        InitialInterval:    1 * time.Second,
        BackoffCoefficient: 2.0,
        MaximumInterval:    1 * time.Minute,
        MaximumAttempts:    5,
    },
}

Best Practice: Define RetryPolicy in a way that gracefully handles transient errors without overwhelming the system with retries, especially during high-load periods. This also ensures that workflows aren’t retried indefinitely!
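To make the effect of these settings concrete, the sketch below computes the wait before each retry for the policy shown above. It assumes each wait is the initial interval multiplied by the backoff coefficient once per previous retry, capped at the maximum interval, and that MaximumAttempts counts the first attempt. The helper retryDelays is hypothetical and only mirrors how the settings are described; it is not the Temporal server's implementation.

```go
package main

import (
	"fmt"
	"time"
)

// retryDelays returns the wait before each retry. maxAttempts includes the
// first try, so the result holds maxAttempts-1 delays.
func retryDelays(initial time.Duration, coefficient float64, maxInterval time.Duration, maxAttempts int) []time.Duration {
	var delays []time.Duration
	next := float64(initial)
	for retry := 1; retry < maxAttempts; retry++ {
		// Cap the wait at the maximum interval, if one is set.
		if maxInterval > 0 && next > float64(maxInterval) {
			next = float64(maxInterval)
		}
		delays = append(delays, time.Duration(next))
		next *= coefficient
	}
	return delays
}

func main() {
	// Same values as the policy above: 1s initial, 2.0 coefficient, 1m cap, 5 attempts.
	// Prints: [1s 2s 4s 8s]
	fmt.Println(retryDelays(time.Second, 2.0, time.Minute, 5))
}
```

With these values a workflow that keeps failing gives up after roughly 15 seconds of waiting, plus the run time of the five attempts themselves.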

6. Activity Timeout Settings

Each Temporal activity should have a well-defined StartToCloseTimeout to prevent it from running indefinitely. This setting specifies the maximum time allowed for an activity to complete.

options := workflow.ActivityOptions{
    StartToCloseTimeout: 2 * time.Minute,
    // ... other options
}

Best Practice: Set realistic timeouts for each activity based on its expected duration, and configure retries to match. Avoid excessively long timeouts, as they can lead to workflow delays.

7. Activity RetryPolicy

Just like workflows, activities benefit from a RetryPolicy to handle transient errors. By implementing retry logic, you can account for temporary issues such as network outages.

activityOptions := workflow.ActivityOptions{
    StartToCloseTimeout: time.Minute,
    RetryPolicy: &temporal.RetryPolicy{
        InitialInterval:    500 * time.Millisecond,
        BackoffCoefficient: 2.0,
        MaximumAttempts:    3,
    },
}

Best Practice: Set reasonable retry policies that balance resilience with system load, taking into account the likelihood and severity of transient failures.
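One way to sanity-check a timeout and retry policy together is to estimate the worst case: every attempt runs until StartToCloseTimeout and then fails. The sketch below does that arithmetic for the options above. The helper worstCaseDuration is hypothetical, and it deliberately ignores the maximum retry interval and any time the activity spends queued before a worker picks it up.

```go
package main

import (
	"fmt"
	"time"
)

// worstCaseDuration sums the per-attempt timeouts and the backoff waits
// between attempts, assuming every attempt times out.
func worstCaseDuration(startToClose, initial time.Duration, coefficient float64, maxAttempts int) time.Duration {
	total := time.Duration(maxAttempts) * startToClose
	wait := float64(initial)
	for retry := 1; retry < maxAttempts; retry++ {
		total += time.Duration(wait)
		wait *= coefficient
	}
	return total
}

func main() {
	// Values from the options above: 1m timeout, 500ms initial, 2.0 coefficient, 3 attempts.
	// Prints: 3m1.5s
	fmt.Println(worstCaseDuration(time.Minute, 500*time.Millisecond, 2.0, 3))
}
```

If that upper bound is longer than the workflow can tolerate, it is a hint to shorten the timeout or reduce the number of attempts.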

8. Child Workflows

Use Child Workflows to divide complex workflows into modular, manageable units. This can improve system performance and reduce the time it takes to retry or replay workflows.

childWorkflowFuture := workflow.ExecuteChildWorkflow(ctx, MyChildWorkflow, input)

Best Practice: Divide large workflows into smaller child workflows to simplify workflow orchestration, allowing for finer-grained control and isolation.

9. Continue-As-New Pattern

The ContinueAsNew feature is ideal for workflows that need to run indefinitely or for extended durations. This allows a workflow to “restart” with a clean history, preventing history size from becoming too large.

// Returning this error from the workflow function starts a fresh run with nextInput.
return workflow.NewContinueAsNewError(ctx, MyWorkflow, nextInput)

Best Practice: Use ContinueAsNew for workflows that manage real-time or high-frequency events, reducing the storage burden on Temporal.

10. Error Handling Best Practices

When handling errors in Temporal, distinguish between retryable and non-retryable errors. For example, transient network issues should trigger a retry, while permanent issues like “invalid input” should terminate the workflow.

if err != nil {
    // NewNonRetryableApplicationError lives in go.temporal.io/sdk/temporal.
    return temporal.NewNonRetryableApplicationError("Invalid input", "ValidationError", err)
}

Best Practice: Clearly categorize errors to ensure Temporal retries only transient errors, avoiding unnecessary workload from retries that are doomed to fail.
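The categorization itself can be plain Go. The sketch below is one possible approach, assuming the application marks permanent failures with a sentinel error; ErrInvalidInput, ErrUpstreamUnavailable, isRetryable, and validateCompanyID are all hypothetical names. In an activity, the non-retryable branch is where temporal.NewNonRetryableApplicationError would be returned. Treating unknown errors as retryable is a choice made for this sketch.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrInvalidInput marks failures that no retry can fix.
var ErrInvalidInput = errors.New("invalid input")

// ErrUpstreamUnavailable marks transient failures that are worth retrying.
var ErrUpstreamUnavailable = errors.New("upstream unavailable")

// isRetryable reports whether an error should be left to the retry policy.
// Only errors wrapping ErrInvalidInput are considered permanent.
func isRetryable(err error) bool {
	return err != nil && !errors.Is(err, ErrInvalidInput)
}

// validateCompanyID wraps the sentinel so callers keep the context and can
// still classify the error with errors.Is.
func validateCompanyID(id string) error {
	if id == "" {
		return fmt.Errorf("company ID is empty: %w", ErrInvalidInput)
	}
	return nil
}

func main() {
	// Prints: false
	fmt.Println(isRetryable(validateCompanyID("")))
	// Prints: true
	fmt.Println(isRetryable(fmt.Errorf("calling HR API: %w", ErrUpstreamUnavailable)))
}
```

An alternative with a similar effect is to list the error type names in the NonRetryableErrorTypes field of the RetryPolicy, which keeps the decision in configuration rather than in the activity code.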

11. Testing Temporal Workflows

Testing workflows with Temporal can be done using the TestWorkflowEnvironment. This environment provides a framework to test workflows deterministically, simulating retries and error handling as they would occur in production.

testSuite := &testsuite.WorkflowTestSuite{} // from go.temporal.io/sdk/testsuite
env := testSuite.NewTestWorkflowEnvironment()
env.RegisterActivity(MyActivity)
env.ExecuteWorkflow(MyWorkflow, input)
if err := env.GetWorkflowResult(&result); err != nil {
    t.Fatal("Workflow failed with error", err)
}

Best Practice: Use TestWorkflowEnvironment to validate workflows and activities before deploying to production, focusing on scenarios like retries, errors, and input variations.

Conclusion

These best practices offer a starting point for building robust, fault-tolerant workflows with Temporal in Go. By following structured strategies around workflow IDs, retry policies, search attributes, and error handling, you can achieve resilient background processing suited to both short and long-running operations.

Temporal is a powerful tool for orchestrating workflows, and its flexibility with Go unlocks significant benefits in terms of both maintainability and scalability.

Cheers!
