Ben Rogojan

Onboarding For Data Teams

How to set up a streamlined onboarding experience that empowers your data engineers and analysts from day one.

Onboarding at most companies would be a great bit for a comedian. Many of us have simply accepted that it’s a poor experience, one where you will likely spend the first two weeks just trying to get access to email.

But it doesn’t have to be this way, nor should it be.

Onboarding should be considered a critical part of any team and company’s processes. This is especially true with technical roles like analytics and data engineers.

These roles require access to multiple systems, an understanding of business context, and access to key data sets. A slow or bumpy onboarding process for these roles can be costly and lead to poor retention.

Thus, creating a smooth onboarding process is a worthwhile investment. In this article, I will review some of my past onboarding experiences and discuss some key steps and considerations a data team should build into its onboarding process.

Reviewing Some Of My Past Onboarding Experiences

When you’re a new employee joining a new team or company, it can be a stressful experience.

Everyone else on the team clicks, everyone knows how to get access to the right data, and everyone else understands the general politics of the larger organization.

The faster you can understand both the technical and people aspects of your new role, the sooner you start making good decisions with confidence.

Experience #1: The problem is that many companies don’t provide a clear path from clueless to confident. Instead, I have experienced situations where the onboarding is a day or two about the company, its founding in 1867, and its principles.

Then I spent the next three weeks waiting for access to a database that wasn’t even the database I was supposed to get access to, and that was after multiple discussions with various employees and senior team members.

Experience #2: Compare that to Facebook, where on day one they were able to onboard hundreds of people and connect you to all the systems you needed. There was also an entire four-week period, known as Data Bootcamp, focused on getting you up to speed on their various technologies.

Now, this was during the before times and it was all in person. We spent time in college-like rooms going through various technologies and tutorials to get up to speed.

We learned about Facebook-specific technologies, listed here with their rough outside equivalents: DevServers (EC2), Dataswarm (Airflow), iData (data catalog), and Daiquery (Snowsight).

By the time I joined my team, I was already comfortable working in Facebook’s infrastructure. I knew how to commit code, find data sets, and traverse multiple code directories, all on my own.

This meant I was able to make good decisions the day I started working with my team rather than having to constantly chase down other team members to understand what was going on. What’s more, the bootcamp itself was regularly re-examined to see if it could be better.

Goals Of Onboarding

To set up an exceptional data team onboarding experience, you need to understand what the overall goals are: not just the technical goals, but what state you want your new employee to be in at the end of their onboarding. Here are several goals I believe are important.

  • Understanding Business Goals and XFN Needs — Simply focusing on onboarding new employees from a technical perspective doesn’t provide the correct alignment on how to use those tools. Onboarding should be an opportunity for new employees to be introduced to a team’s goals, how they align with the business, and the role the team plays in the larger ecosystem of other departments.
  • Productivity — A key goal of any onboarding process is to empower the new employee to be productive. If they need to constantly ask questions or constantly get roadblocked by data sets that they can’t access, then they aren’t being productive.
  • Empowering Team Members To Make Decisions — Finally, the end goal is to have employees who go beyond just doing the tasks they are told to do and feel comfortable making calls and big bets on future data projects. This requires an individual to understand not only the technology they are working on but also the business goals and needs, so they can see possible gaps and opportunities.

Onboarding For A Data Team

There is onboarding for a company (which is really just orientation) and then there is onboarding for a team. Generally, I have found that most companies only spend a day or two onboarding all employees. Facebook does a two-day intro (at least it did for me) where you hear from various people about the mission, principles, and guidelines, and take care of more administrative tasks like getting your picture taken for your badge.

After that, it’s time to onboard onto your team.

Quick Pause — Right size for your organization. Some of what I will be referencing below needs to be right-sized per company. A start-up doesn’t have as much time or as many resources to spend on onboarding, nor does it require as much, since things change quickly and communication can happen quickly via in-person discussions. I guess that’s not always the case these days.

That doesn’t mean you should throw caution to the wind and document nothing. Instead, continually grow your documentation. Set up an initial document and handbook for engineering that others can add to as they join the team.

Once you get to engineer 20, you will have a pretty well-documented process.

Onboarding For Context

Gaining an understanding of what is important to a business or a team is crucial, especially in data teams, where many individuals need to balance their business and technical knowledge to deliver impact.

This involves understanding not only the key metrics, projects, and reports that a data team is creating but also who the key stakeholders and drivers on external teams are (at Facebook they were referred to as XFNs). A great point brought up by Nate Sooter in a recent conversation I had with him on my YouTube Live was that it’s important to understand not only who your skip-levels (your boss’s boss) and direct XFNs are, but also who the individuals are who disproportionately make things happen.

That is to say, if things get blocked or stuck, who seems to always be the one who can make them unstuck with a conversation or two?

But that takes time to figure out. From an onboarding perspective new hires should be introduced to:

XFNs (cross-functional partners) — Besides being introduced to team members and skip-levels, you will want to know:

  • Who your XFNs are
  • What their teams’ goals are
  • What role they play

Current Projects And Initiatives — It can be really easy on data teams to get stuck in ad hoc hell. Data requests, maintenance, and data quality issues can all bog data teams down and keep them from actual impactful work. This problem gets worse if new data team members don’t have a good idea of what projects are going on and what initiatives are driving the team. That leaves far too much ambiguity and makes it even easier to continually get sucked into ad hoc hell.

KPIs — A great way to explain what is important to a business is by looking at its KPIs and other metrics. When done correctly they should provide the context in terms of what is worth tracking and asking questions about for the business.

Key Reports And Data Pipelines — Understanding what the key reports are and what pipelines support them provides two benefits. It provides further context in terms of who the team is supporting but it also provides an opportunity for experienced team members to explain to new team members if there are any issues with said reports and pipelines. Do they go down often? Or have any weird quirks? Hopefully, the answer is no since they are key reports but you never know.

Environment Set-Up

When it comes to onboarding for a team, the bare minimum is a document that contains a clear guide on how to set up your environment. This can be in the form of a basic checklist or a set of wiki pages.

Even the 12-person start-up I worked for had a guide that went from 0 to 1 for your environment set-up.

Here are some key considerations you should have in your environment set-up document.

  • VPN — Many companies put their data and cloud services behind a VPN to manage access to the internal network. Without VPN access on day one, a new employee will likely be locked out of key systems.
  • Database access — Obviously, don’t store the passwords and hosts for your various databases in a document. However, do reference how an individual should set up a data warehouse account and where they can get secrets and connection information for source databases (usually in some form of password manager).
  • Cloud API keys and token set-up — Somewhat connected to database access is API key set-up. Whether it’s for S3 or Cloud Storage, most data teams rely on cloud providers to both store and provide data.
  • Code repo access — Whether you’re using GitHub, GitLab, or some other repo, making sure an engineer has access on day one ensures they can become familiar with your code base quickly.
  • Binary and custom library downloads — Some companies have specific libraries and packages that are required in order to run either their custom software or legacy applications. Make sure these are noted to avoid any blockage.
  • Jira Account — Or whatever tool you’re using to manage tickets and projects. This isn’t really environment set-up, but I wanted to include it somewhere.
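
A checklist like the one above can also be turned into a small script that a new hire runs on day one. The following is a minimal sketch in Python, not a definitive implementation: the tool name, the environment variable names (WAREHOUSE_USER, CLOUD_STORAGE_TOKEN), and the hints are hypothetical placeholders for whatever your own set-up guide lists. It only checks that things are present; it never stores or prints secrets.

```python
import os
import shutil
from dataclasses import dataclass


@dataclass
class CheckResult:
    name: str   # what was checked
    ok: bool    # whether the check passed
    hint: str   # where to look if it failed


def check_tool(tool: str, hint: str) -> CheckResult:
    """Pass if the command-line tool can be found on PATH."""
    return CheckResult(f"tool:{tool}", shutil.which(tool) is not None, hint)


def check_env_var(var: str, hint: str) -> CheckResult:
    """Pass if the environment variable is set to a non-empty value."""
    return CheckResult(f"env:{var}", bool(os.environ.get(var)), hint)


def run_checks(checks: list[CheckResult]) -> list[CheckResult]:
    """Print a checklist and return only the checks that failed."""
    failed = []
    for check in checks:
        print(f"[{'x' if check.ok else ' '}] {check.name}")
        if not check.ok:
            print(f"      -> {check.hint}")
            failed.append(check)
    return failed


# Hypothetical requirements; replace them with the items in your own guide.
missing = run_checks([
    check_tool("git", "Install git and request code repo access."),
    check_env_var("WAREHOUSE_USER", "See the wiki page on warehouse accounts."),
    check_env_var("CLOUD_STORAGE_TOKEN", "Fetch the token from the password manager."),
])
print(f"{len(missing)} item(s) still need attention")
```

Each failed check prints a pointer back to the relevant documentation, so the script complements the guide rather than replacing it.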

Commit Something Day One

Committing code on day one is a recurring theme for many companies. Adam referenced it above, but I am also aware that GitLab and Facebook have their new engineers do the same thing.

This can help ease the anxiety of committing code to a new repo. It also makes the environment set-up a little more real and immediately applicable.

Standards And Style Guides

There is enough ambiguity in data projects already. Creating clear standards and style guides that can be given to new employees during onboarding helps get them up to speed quickly. Rather than taking up a large amount of their teammates’ time on code reviews full of nits, a new data engineer can create and push PRs already knowing what the general standards are.

Whether this is for SQL or other code, having some form of style guide can provide a lot of benefits. What is great is that you don’t need to develop one from scratch. Several companies have shared their style guides publicly.
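
Whatever guide you adopt, some of its rules can be checked automatically so that reviewers spend less time on nits. Here is a minimal sketch in Python, assuming three hypothetical rules (no SELECT *, a 100-character line limit, and no trailing whitespace). These rules are illustrative only and are not taken from any particular company's guide. The check works line by line, so a SELECT and a * split across two lines would not be caught.

```python
import re

MAX_LINE_LENGTH = 100  # hypothetical limit; use whatever your guide says

# Matches "select *" in any letter case, but not "count(*)".
SELECT_STAR = re.compile(r"\bselect\s+\*", re.IGNORECASE)


def lint_sql(sql: str) -> list[str]:
    """Return one human-readable message per style problem found."""
    problems = []
    for number, line in enumerate(sql.splitlines(), start=1):
        if SELECT_STAR.search(line):
            problems.append(f"line {number}: avoid SELECT *, list columns explicitly")
        if len(line) > MAX_LINE_LENGTH:
            problems.append(f"line {number}: longer than {MAX_LINE_LENGTH} characters")
        if line != line.rstrip():
            problems.append(f"line {number}: trailing whitespace")
    return problems


for message in lint_sql("SELECT *\nFROM orders"):
    print(message)
```

A script like this could run before a PR is opened, leaving human review for the questions a style guide cannot answer.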

Handbooks And Process Guides

In early-stage startups, it’s particularly tempting to avoid a documentation strategy. With only a few team members, it’s feasible to keep everyone informed via meetings, Slack, or email threads. Long-term, this oversight becomes increasingly harmful.

As a team scales, the need for documentation increases in parallel with the cost of not doing it. Said another way, implementing a documentation strategy becomes more difficult — yet more vital — as a company ages and matures.

-Gitlab

Many teams have specific processes for workflows, project management, change management, and data governance, all of which is information that eventually needs to be passed on to new employees. Initially, while a start-up is small, this will mainly happen by word of mouth. However, there does come a tipping point where it starts to make sense to document all of these various processes.

I believe GitLab has a great explanation of their handbook approach.

Run Books — Run books are not exactly handbooks, but they live in a similar category. As teams mature and a large portion of their work becomes maintenance-focused, run books become very helpful. They will often answer questions like:

  • How do you take a system down and bring it back up again (and not take down AWS)?
  • How do you manage data requests for data that needs to be more secure?
  • How do you debug that one-off pipeline that seems to break down once a month?

All of this can be captured in run books that can help get new team members up to speed. Who am I kidding, they also keep more experienced members sane so they don’t have to remember every step to restart their internal systems.
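
As a rough illustration of what a run book entry might capture, here is a minimal sketch in Python that stores the steps as structured data and renders them as a numbered checklist. The Runbook and Step classes, the pipeline name, the owner, and the steps themselves are all hypothetical; a wiki page with the same fields would work just as well.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    action: str       # what to do
    verify: str = ""  # optional: how to confirm the step worked


@dataclass
class Runbook:
    title: str
    owner: str
    steps: list[Step] = field(default_factory=list)

    def render(self) -> str:
        """Render the run book as a numbered plain-text checklist."""
        lines = [f"{self.title} (owner: {self.owner})"]
        for number, step in enumerate(self.steps, start=1):
            lines.append(f"{number}. {step.action}")
            if step.verify:
                lines.append(f"   check: {step.verify}")
        return "\n".join(lines)


# Hypothetical example: restarting a pipeline that breaks once a month.
restart = Runbook(
    title="Restart the monthly export pipeline",
    owner="data-eng",
    steps=[
        Step("Pause the scheduler", "No export jobs are running"),
        Step("Clear the failed task and rerun it", "The task finishes successfully"),
        Step("Resume the scheduler"),
    ],
)
print(restart.render())
```

Pairing each action with a check is a design choice: it tells a new team member not just what to do but how to know it worked.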

Now What

Onboarding, in my opinion, doesn’t stop once an environment is set up or XFN intro meetings end. Instead, it is a continuous process that bleeds into the first few bits of work and possibly even lasts until the first project is delivered.

This is because, as referenced above, one of the goals of onboarding is to provide confidence and independence. A great way to do that is to deliver on your first mini-project and get feedback on it.

Feedback is always a gift, but in my experience, detailed feedback early on is especially helpful. It sets the tone for what the new team cares about and what it expects.

Onboarding Is Your First Impression

If first impressions matter, then onboarding matters. All the small details matter and can come at a cost when not done right.

If you’ve experienced a well-thought-out onboarding, you notice it, almost by not noticing it. You don’t have too many questions about where to find key information, nor do you need to spend hours chasing down IT tickets to get access to a database or download a one-off application.

I would love to hear some stories from you, the reader. Do you have any examples of what you think makes a great onboarding process?

Data Engineering
Data Science
Tech
Big Data
Management