Data Engineers: Data Analysts Are Your Repeat Customers — How To Make Both Your Jobs Easier

Data science is a team sport; learn how to leverage your data analyst’s strengths to build more impactful data pipelines.

Data engineers power other data jobs, even in the shadows. Image courtesy of Unsplash & Getty Images.

Data Jobs == Customer Service Jobs

Like it or not, data engineering and many data science-adjacent disciplines are customer-facing roles.

Instead of mindlessly mining data, data engineers exist to serve the needs of an end user.

Most of the time this means you’re fulfilling requests for internal (or external) clients who have their own agenda for the data you provide.

These clients, while mostly non-technical, are invaluable to an organization because they use data to drive organizational growth and, in turn, generate recurring revenue.

However, it can sometimes be difficult to communicate specific data or technical concepts to those who don’t understand the nuances of the process.

That’s why some of the best clients you’ll work with are fellow data practitioners.

You’ll find that fulfilling data science or data analytics requests is different for one reason.

These folks know their stuff.

And while data engineers serve both data scientists (or ML engineers) and data analysts, I’m more on the analytics side of data engineering, so I’ll specifically speak to how data engineers can serve data analysts.

Data Science Is A Team Sport — Leverage Your Strengths

The stereotype of the “lone hacker” doesn’t apply to data science.

At larger firms, the data science process is split into separate roles, namely:

Data Scientist
Data Analyst
Data Engineer

As a data engineer, I’m responsible for the first part in the data science process: Ingesting, cleaning and preprocessing data for production.

I get the first look at any new data we’d like to incorporate into our data warehouse. However, once this data is ingested, my experience with it, aside from maintaining pipelines, is largely over.

Although I spend hours with the initial raw data, on many teams a data analyst develops the deepest understanding.

From the beginning it’s important to remember that data is a team sport.

Models don’t function without input data, and if there were no dashboards or models, there would be no need for data engineers to build pipelines.

Ideally, for larger projects, an engineer and data teammate (data analyst) should work in tandem, making decisions about which reports are necessary to import and how the data must be used.

You can certainly adopt an ingest-now, ask-questions-later approach, but it is far more efficient to have these infrastructure conversations up front.

While data engineers talk to technical teammates, data analysts often report to product owners and managers.

So, if you have a question about a product you’re pulling data from, the data analyst can act as an important liaison between technical and non-technical teammates.

By keeping these channels of communication open, you improve your chances of delivering the desired end product, even if you don’t entirely understand every nuance of the data you provide.

I Just Get The Data. You Know What To Do With It.

While many stakeholders you’ll work with are highly specialized professionals who can manipulate data to make decisions, there are some aspects of data infrastructure they just don’t (and won’t) understand.

Again, this isn’t to diminish the value they provide to their teams and the organization at large.

It’s just to say that working with a data analyst typically makes my job easier because from the beginning I know:

The desired schema
The ingestion cadence (how often are we loading the data?)
Desired transformations/metrics
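
As a rough illustration, the three items above could be written down in a small requirements record before any pipeline code exists. This is only a sketch: the `PipelineRequirements` class, the `ad_spend` table and its column names are hypothetical and not taken from any real project.

```python
from dataclasses import dataclass, field


@dataclass
class PipelineRequirements:
    """Hypothetical record of what the analyst and engineer agree on up front."""

    table_name: str
    schema: dict[str, str]  # column name -> type agreed with the analyst
    cadence: str            # how often the data is loaded, e.g. "daily"
    transformations: list[str] = field(default_factory=list)

    def missing_columns(self, row: dict) -> list[str]:
        """Return agreed-upon columns that an incoming row does not contain."""
        return [col for col in self.schema if col not in row]


# Illustrative values only; a real project would fill these in during
# the requirements conversation with the analyst.
requirements = PipelineRequirements(
    table_name="ad_spend",
    schema={"campaign_id": "STRING", "spend": "FLOAT", "loaded_at": "TIMESTAMP"},
    cadence="daily",
    transformations=["convert spend to USD"],
)

sample_row = {"campaign_id": "abc-123", "spend": 41.5}
missing = requirements.missing_columns(sample_row)
print(missing)
```

Keeping the agreement in one place means a row that lacks an expected column can be flagged at ingestion time rather than discovered later in a dashboard.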

Requests from business stakeholders are rarely this straightforward, hence the need for requirements-gathering sessions.

Often, a good data analyst will read the same documentation as a data engineer does and understand the structure of the raw data.

And, if they’re really nice, they’ll provide you with starter or QA queries.

If you work with a particular analyst repeatedly you may even get a feel for what they prefer in a table.

For example:

Who is strict on maintaining house style in naming conventions and scripting syntax and who is a bit more lax
Whether they need a recurring column (I almost always provide my analysts with a timestamp column so they have a better idea of the data’s recurrence; this helps with backfilling and identifying missing data)
What views they’ll ask for (a recurring ask is for a view constrained to the most recent pull)
What transformations are needed (does this table need to be flattened?)
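
To make the timestamp and "most recent pull" preferences concrete, here is a minimal Python sketch. It assumes a daily load cadence and a hypothetical `loaded_on` column; the rows are invented for illustration, and in a warehouse the same logic would usually live in a SQL view.

```python
from datetime import date, timedelta

# Hypothetical rows; "loaded_on" stands in for the timestamp column.
rows = [
    {"order_id": 1, "loaded_on": date(2024, 1, 1)},
    {"order_id": 2, "loaded_on": date(2024, 1, 2)},
    {"order_id": 3, "loaded_on": date(2024, 1, 4)},
    {"order_id": 4, "loaded_on": date(2024, 1, 4)},
]


def missing_load_dates(rows, start, end):
    """List the days between start and end (inclusive) with no loaded rows."""
    loaded = {row["loaded_on"] for row in rows}
    day_count = (end - start).days + 1
    expected = (start + timedelta(days=i) for i in range(day_count))
    return [day for day in expected if day not in loaded]


def most_recent_pull(rows):
    """Keep only the rows from the latest load, like a 'latest pull' view."""
    latest = max(row["loaded_on"] for row in rows)
    return [row for row in rows if row["loaded_on"] == latest]


gaps = missing_load_dates(rows, date(2024, 1, 1), date(2024, 1, 4))
latest_rows = most_recent_pull(rows)
print(gaps)
print([row["order_id"] for row in latest_rows])
```

The gap list tells you which days to backfill, and the filtered rows are what an analyst would see through a view limited to the latest load.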

As with many things in life, however, there is a big exception to the “best customer” label.

Never Try This Seemingly-Innocent Behavior With A Data Analyst

Just as the best data engineers are protective of the speed, volume and cleanliness of the data we provide, data analysts have their own focus.

Accuracy.

Just like the garbage in/garbage out motto for data science, data analysts are concerned that a bad pipeline could provide lies in/lies out.

Since data analysts tend to get a lot of ad hoc requests from internal stakeholders, they’re often scrutinized for the insights they provide.

Neither the data engineer nor the data analyst wants to be in a position that results in inaccurate data and poor insights.

This is all to say that a data analyst will likely be stricter on your final product than an organizational stakeholder.

Even if you’ve worked well together throughout the process, an analyst can and will scrutinize (read: rip apart) any data that might not provide sound business value.

And sometimes this won’t be your fault.

For instance, I’ve been in situations where the analyst thought they knew what they wanted but then, when they saw the output of the API, decided that the data didn’t serve a particular use case.

That’s OK. But, ideally, you want to communicate any discrepancies early so you’re not in the position of scrapping or redoing a pipeline.

Bottom line: Don’t try to fudge the numbers, especially with those whose job it is to know them inside and out.

Takeaway

After working with those who don’t quite understand the ETL process (or what a data engineer even is), it can be a relief to work with someone who is, essentially, “on the same team.”

Forming solid working relationships with data analysts is essential to your work as a data engineer since you both want to provide the best data presented in the best light possible.

Data analysts typically work closely with stakeholders on frequent requests so, depending on the organization, they can also make excellent liaisons between the technical and non-technical workforce.

The flip side of all of this is that the bar is higher when providing a data product to a fellow data professional. You’ll want to double- and triple-check your work with rigorous QA before an analyst does the same.
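
What "rigorous QA" looks like will vary by team, but a few cheap checks catch many problems before an analyst sees them. The sketch below is one possible shape, not a standard: `run_qa_checks`, the `id` key and the sample rows are all hypothetical, and it only checks for empty tables, null required fields and duplicate keys.

```python
def run_qa_checks(rows, key, required):
    """Return a list of readable QA failures; an empty list means all passed."""
    if not rows:
        return ["table is empty"]

    failures = []
    seen_keys = set()
    for index, row in enumerate(rows):
        # Required columns must be present and non-null.
        for column in required:
            if row.get(column) is None:
                failures.append(f"row {index}: {column} is null")
        # The key column must be unique across the table.
        value = row.get(key)
        if value in seen_keys:
            failures.append(f"row {index}: duplicate {key}={value}")
        seen_keys.add(value)
    return failures


# Invented sample data containing one null and one duplicate key.
sample_rows = [
    {"id": 1, "spend": 10.0},
    {"id": 2, "spend": None},
    {"id": 2, "spend": 5.0},
]

failures = run_qa_checks(sample_rows, key="id", required=["id", "spend"])
for failure in failures:
    print(failure)
```

Running checks like these as the last step of a pipeline gives you a short failure list to review before you hand the table over.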

At the end of the day, data engineers exist to fuel a data analyst’s work so, like the data you provide, be prompt, accurate and easy to work with.
