Jason Chong

Ideally, your dashboard should answer the following questions:

  • What are the trends in the underlying data?
  • Which customer segment has the highest customer value?
  • What do you propose should be Sprocket Central Pty Ltd’s marketing and growth strategy?
  • What additional external datasets may be useful to obtain greater insights into customer preferences and propensity to purchase the products?

If you would like a crash course on how to use Power BI, I ran a training workshop a few months ago that walks step by step through building your first interactive dashboard.

ANZ Australia — Data@ANZ Program

For more information about this program, please check out my GitHub repository and YouTube video.

Background

This task is based on a synthesised transaction dataset containing 3 months’ worth of transactions for 100 hypothetical customers. It contains purchases, recurring transactions, and salary transactions.

The dataset is designed to simulate realistic transaction behaviours that are observed in ANZ’s real transaction data, so many of the insights you can gather from the tasks below will be genuine.

Task 1: Exploratory data analysis

For task 1, you are required to:

  • Load the dataset into an analysis tool of your choice, for example, Excel, R, SAS, Tableau or similar
  • Start by doing basic checks: are there any data issues? Does the data need to be cleaned?
  • Gather some interesting overall insights about the data, for example, what is the average transaction amount? How many transactions do customers make each month, on average?
  • Segment the dataset by transaction date and time, and visualise transaction volume and spending over the course of an average day or week
  • Put together 2–3 slides summarising your most interesting findings for ANZ management

Task 2: Predictive analytics

For task 2, you will need to use statistical software such as R, SAS, or Python. Specifically, you are required to:

  • Identify the annual salary for each customer
  • Explore correlations between annual salary and various customer attributes. These attributes could be those that are readily available in the data or those that you construct or derive yourself. Visualise any interesting correlations using a scatter plot
  • Build a simple regression model to predict the salary for each customer using the attributes you identified
  • How accurate is your model? Should ANZ use it to segment customers into income brackets for reporting purposes?
  • Build a tree-based model to predict salary. Does it perform better? How would you accurately test the performance of this model?

BCG — Data Science & Analytics Virtual Experience Program

Background

Your client is PowerCo, a major gas and electricity utility company that supplies corporate, SME (small and medium enterprise) and residential customers. The liberalization of the energy market in Europe has led to significant customer churn, especially in the SME segment. PowerCo has partnered with BCG to help diagnose the source of churn among its SME customers.

A fair hypothesis is that price changes affect customer churn. It is therefore helpful to know which customers are more likely to churn at their current price, and a good predictive model could help identify them.

Moreover, a discount might incentivise customers who are at risk of churning to stay with our client. The head of the SME division considers a 20% discount large enough to dissuade almost everyone from churning, especially those for whom price is the primary concern.

Task 1: Business understanding and hypothesis testing

Your first task is to understand what is going on with the client, think about how you would approach this problem, and decide how to test the specific hypothesis.

You must formulate the hypothesis as a data science problem and lay out the major steps needed to test it. Focus on the data you would need from the client as well as the analytical models you would use.

If you are stuck, consider:

  • What are the key factors in a customer’s decision to stay with or switch providers?
  • Which data sources and fields could be used to explore the contribution of various factors to a customer’s potential action?
  • What would a data frame of your choice look like: what should each column and row represent?
  • What kind of exploratory analyses on the relevant fields can give more insights into churn behaviour?

Task 2: Exploratory data analysis

The BCG project team thinks that building a churn model to understand whether price sensitivity is the largest driver of churn has potential. The client has sent over some data, which includes:

  • Historical customer data: customer data such as usage, sign-up date, forecasted usage
  • Historical pricing data: fixed and variable pricing data
  • Churn indicator: whether or not each customer has churned

For task 2, you need to:

  • Perform some exploratory data analysis. Look into data types, data statistics, specific parameters, and variable distributions
  • Verify the hypothesis that price sensitivity is correlated with churn
  • Prepare a half-page summary of key findings and add some suggestions for data augmentation: which other data sources should the client provide you with, and which open source datasets might be useful?

Task 3: Feature engineering and modelling

The team now has a good understanding of the data and feels confident using it to further understand the business problem. The next step is to brainstorm and build out features that uncover signals in the data that could inform the churn model.

Feature engineering is one of the keys to unlocking predictive insight through mathematical modelling. Based on the cleaned data available, identify what you think could be drivers of churn for our client and build those features for later use in your model.

Your colleague has done some work on engineering the features within the cleaned dataset and has calculated a feature that seems to have predictive power.

For task 3:

  • Try to think of ways to improve the feature’s predictive power and elaborate on why you made those choices
  • Train a random forest classifier, evaluate the results, and document the advantages and disadvantages of using a random forest for this particular use case
  • Bonus: how much money could the client save with the use of the model?

Task 4: Findings and recommendations

The client wants a quick update on the progress of the project.

For task 4, develop an abstract slide synthesising all the findings from the project so far.

A few things to think about for this abstract include:

  • What is the most important number or metric to share with the client?
  • How much detail should you go into, especially with the technical details of your work?
  • What impact would the model have on the client’s bottom line? Always test what you write with the “so what” test

I hope this blog post has provided some good starting points into the world of virtual internships, specifically those you can undertake if you are aiming to expand your data science skills through practical projects and real-world problems.
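Returning to task 2 of the Data@ANZ program above, the first two steps (annual salary, then a simple regression) can be sketched in plain Python. This sketch rests on several assumptions:

  • Salary transactions can be identified by a label, here a hypothetical "SALARY" description.
  • The 3 months of data are representative, so the observed salary total is scaled to 12 months.
  • The customers, ages and amounts are invented.
  • Age is just one candidate attribute.

The regression is ordinary least squares with a single feature, and R² gives a first rough accuracy measure.

```python
from collections import defaultdict

MONTHS_OBSERVED = 3  # the dataset covers 3 months of transactions

# Invented rows: (customer_id, description, amount, age); "SALARY" is a hypothetical label.
transactions = [
    ("c1", "SALARY", 3000.0, 25), ("c1", "SALARY", 3000.0, 25), ("c1", "SALARY", 3000.0, 25),
    ("c1", "POS", -45.0, 25),
    ("c2", "SALARY", 4500.0, 35), ("c2", "SALARY", 4500.0, 35), ("c2", "SALARY", 4500.0, 35),
    ("c3", "SALARY", 5800.0, 45), ("c3", "SALARY", 5800.0, 45), ("c3", "SALARY", 5800.0, 45),
]


def annual_salaries(txns):
    """Sum salary credits per customer and scale the observed total to 12 months."""
    totals, ages = defaultdict(float), {}
    for cid, desc, amount, age in txns:
        ages[cid] = age
        if desc == "SALARY":
            totals[cid] += amount
    return {cid: (ages[cid], totals[cid] * 12 / MONTHS_OBSERVED) for cid in totals}


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x with a single feature."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b


def r_squared(xs, ys, a, b):
    """Share of the variance in ys explained by the fitted line."""
    my = sum(ys) / len(ys)
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


salaries = annual_salaries(transactions)
xs = [age for age, _ in salaries.values()]
ys = [annual for _, annual in salaries.values()]
intercept, slope = fit_line(xs, ys)
fit_quality = r_squared(xs, ys, intercept, slope)
print(f"salary = {intercept:.0f} + {slope:.0f} * age, R^2 = {fit_quality:.3f}")
```

Three customers make this purely illustrative. With the real data, a fairer accuracy check scores the model on customers held out from fitting, for example via a train/test split or cross-validation. The same idea answers the program's question about how to test the tree-based model.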

Struggling to Land a Data Science Job? Try These Virtual Internships For Free

Company-sponsored internships that will help develop your data skills and get your foot in the door

Photo by Kishor on Unsplash

“Start the job before you have it.”

I wrote a blog post a year ago sharing my own lessons and experiences from applying for my first data science job out of university. One of the key takeaways from that post was the idea of pretending as though you are already working in the job that you are applying to.

What does that mean exactly? Well, assuming you are reading this with the intention of one day becoming a data scientist, you need to start adopting the mindset and responsibilities of a data scientist today.

Rather than simply applying to jobs and waiting to hear back, you can do a lot more to demonstrate your initiative and drive, which in turn will help you stand out from the crowd.

In the past, I have discussed Kaggle competitions and online courses on websites like Coursera and DataCamp, all of which are invaluable resources for upskilling.

In this blog post, however, we will focus specifically on virtual internships and a website called Forage, which offers company-sponsored virtual internships completely free of charge. I will also share the data science and analytics programs on the website that I highly recommend checking out if you aspire to break into the industry.

What is Forage?

Forage website homepage; Image by Author

Forage is an online platform that provides free access to virtual internship programs. These programs are co-designed and endorsed by reputable companies from around the world with the goal of giving participants a taste of what it is really like to work in the industry.

You will be given tasks that replicate real-life scenarios and day-to-day work that an analyst would undertake at one of these places.

The programs are also self-paced and will take on average 5–6 hours to complete. While the tasks will not be formally graded, you will be provided with model answers after you submit a genuine attempt for each task.

Upon completion of each program, you will earn a completion certificate that you can put on your resume and LinkedIn profile. You can also opt to share your data with Forage, which enables the partner company to contact you about upcoming networking events or job opportunities.

To learn more about Forage, please check out its FAQ page.

Without further ado, let’s take a look at all the data science-related programs that are currently offered on Forage.

Accenture North America — Data Analytics Virtual Experience

Background

Your client is Social Buzz, a social media company. To ensure that trending content is at the forefront of user feeds, Social Buzz emphasizes content by keeping all users anonymous and only tracking user reactions (there are over 100 ways users can react to content, spanning beyond the traditional reactions of likes, dislikes, and comments).

Social Buzz has scaled quicker than anticipated over the years and needs some advice on properly managing this growth going into its IPO next year. In addition to guidance on managing scale, they also need help analysing the vast amount of unstructured data that they create and collect every day, from text, images, videos and GIFs.

Task 1: Project Understanding

You are given a client brief and an internal stakeholder chart.

The client brief outlines the client’s background, plans for the future, and the scope of this project. The internal stakeholder chart, on the other hand, maps out the Accenture team working on this project, along with each individual’s role and responsibilities.

The goal of this task is to understand the business problem, what Accenture is expected to deliver, and what tasks are most relevant to you as a data analyst.

Task 2: Data Cleaning & Modelling

Once task 1 is completed, you will be given the following data in the form of Excel spreadsheets:

  • User
  • Profile
  • Location
  • Session
  • Content
  • Reaction
  • Reaction types

In addition to these spreadsheets, you will also have a data dictionary that helps explain the data fields in more detail.

Task 2 involves understanding the tables and how they are linked together and figuring out which data sets are useful for your analysis. Once the data sets have been determined, you will need to clean the data, retain only the relevant columns, and finally merge the tables together via unique keys.

If all of these concepts are new to you, don’t worry. You will find links to resources on the topic of merge types and how to clean and merge tables using Microsoft Excel.
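To make the clean-and-merge step above concrete, here is a small, dependency-free Python sketch. The rows and column names (content_id, category, reaction_type, score) are hypothetical stand-ins, not the actual Social Buzz schema. The program itself walks you through Excel, and a library such as pandas would be the usual choice in Python. The sketch does three things:

  • It keeps only the relevant columns.
  • It drops rows with missing values.
  • It inner-joins the tables on their keys.

```python
# Hypothetical rows standing in for the Content, Reaction and Reaction types sheets.
content = [
    {"content_id": "c1", "category": "food", "url": "https://example.com/1"},
    {"content_id": "c2", "category": "travel", "url": "https://example.com/2"},
]
reactions = [
    {"content_id": "c1", "reaction_type": "love", "user_id": "u9"},
    {"content_id": "c1", "reaction_type": None, "user_id": "u3"},  # missing value
    {"content_id": "c2", "reaction_type": "dislike", "user_id": "u4"},
    {"content_id": "c3", "reaction_type": "love", "user_id": "u5"},  # no matching content
]
reaction_types = [
    {"reaction_type": "love", "sentiment": "positive", "score": 65},
    {"reaction_type": "dislike", "sentiment": "negative", "score": 10},
]


def clean(rows, keep):
    """Keep only the relevant columns and drop rows with any missing value."""
    out = []
    for row in rows:
        trimmed = {k: row.get(k) for k in keep}
        if all(v is not None for v in trimmed.values()):
            out.append(trimmed)
    return out


def inner_join(left, right, key):
    """Inner-join two lists of dicts on `key`, assumed unique within `right`."""
    index = {row[key]: row for row in right}
    return [{**row, **index[row[key]]} for row in left if row[key] in index]


reactions_clean = clean(reactions, ["content_id", "reaction_type"])
content_clean = clean(content, ["content_id", "category"])
types_clean = clean(reaction_types, ["reaction_type", "score"])

merged = inner_join(
    inner_join(reactions_clean, content_clean, "content_id"),
    types_clean,
    "reaction_type",
)
for row in merged:
    print(row)
```

An inner join silently drops unmatched rows, such as the reaction to "c3" here. It is worth counting rows before and after each merge so the client can see what was excluded and why.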

Task 3: Data Visualization & Storytelling

This is followed by task 3, which aims to create insightful and creative visualisations that address the requirements of the project.

Your task is to create a PowerPoint presentation that takes the client on a journey through their data and business problems while communicating the results from your analysis.

You should try to structure the presentation in an engaging and persuasive manner.

Task 4: Present to the Client

Finally, imagine the client is in the room and it is time to deliver the presentation that you have prepared. Your task is to make a video recording of your final presentation.

To deliver a compelling presentation, remember to use domain-specific, business-friendly language and to present with confidence and conviction.

Quantium — Data Analytics Virtual Experience Program

For more information about this program, please check out my GitHub repository and YouTube video.

Background

Supermarkets regularly change their store layouts, product selections, prices and promotions to satisfy their customers’ changing needs and preferences, keep up with increasing market competition, or capitalise on new opportunities. The Quantium analytics team is engaged in these processes to evaluate and analyse the performance of each change and recommend whether it has been successful.

Julia, the category manager for chips, has asked Quantium for help to better understand the types of customers who purchase chips and their purchasing behaviour within the region.

You are an analyst within the Quantium analytics team and are responsible for delivering highly valued data analytics and insights to help the supermarket make strategic decisions.

Task 1: Data preparation and customer analytics

For task 1, you are given two data sets: transaction and customer data. Your task is to analyse these data sets and identify customer purchasing behaviours to generate insights and provide commercial recommendations to Julia.

In summary, you need to do the following:

  • Examine the data: look for inconsistencies, missing data, outliers, etc.
  • Derive extra features such as pack size and brand name, and define metrics to draw insights on who spends on chips and what drives spending for each customer segment
  • Create charts and graphs and note any interesting trends
  • Generate insights and form a strategy that can have a commercial application, for example, which customer segment to target
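Deriving pack size and brand name usually comes down to parsing the product name. Below is a minimal Python sketch. It assumes each name contains a weight such as "175g" and starts with the brand. The sample names and sales figures are illustrative only. It also aggregates a simple metric (total sales per brand) of the kind you would then break down by customer segment.

```python
import re
from collections import defaultdict

# Illustrative product names; real names may need extra cleaning.
products = [
    "Natural Chip Compny SeaSalt175g",
    "Smiths Crinkle Cut Chips Chicken 170g",
    "Kettle Tortilla ChpsHny&Jlpno Chili 150g",
]

# Digits followed by an optional space and "g"/"G" at a word boundary.
PACK_SIZE = re.compile(r"(\d+)\s*[gG]\b")


def pack_size(name):
    """Return the pack size in grams, or None if no weight is found."""
    match = PACK_SIZE.search(name)
    return int(match.group(1)) if match else None


def brand(name):
    """Assume the first word of the product name is the brand."""
    return name.split()[0]


# Hypothetical transactions: (product name, total sales in dollars).
transactions = [
    (products[0], 3.0),
    (products[1], 2.9),
    (products[2], 9.2),
    (products[0], 6.0),
]

sales_by_brand = defaultdict(float)
for name, sales in transactions:
    sales_by_brand[brand(name)] += sales

for name in products:
    print(brand(name), pack_size(name))
print(dict(sales_by_brand))
```

The first-word rule is deliberately crude. If the same brand appears under several spellings or abbreviations, a small lookup table mapping the variants to one canonical name keeps brand-level metrics from being split.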

Task 2: Experimentation and uplift testing

Following that, task 2 involves identifying benchmark stores that will allow you to test the impact of the trial store layouts on customer sales.

Specifically, you are required to do the following:

  • Select control stores based on total sales, total customers, and the average number of transactions per customer
  • Assess the trial: compare each trial store individually with its control store to get a clear view of its overall performance
  • Collate findings and provide recommendations on the impact on sales during the trial period
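One common way to pick a control store is to score each candidate by how closely its pre-trial monthly metrics track the trial store's. The Python sketch below combines two measures:

  • A Pearson correlation, which captures whether the candidate's sales move in step with the trial store's.
  • A rescaled magnitude distance, which captures whether they are of a similar size.

It then compares trial-period sales against the control store's sales, scaled to the trial store's pre-trial level. The store labels, monthly figures and the 50/50 weighting are assumptions for illustration, not the program's prescribed method.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length series (assumes neither is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def magnitude_similarity(trial, candidates):
    """Mean absolute gap to the trial store, rescaled to [0, 1] (1 = closest)."""
    dist = {s: sum(abs(a - b) for a, b in zip(trial, v)) / len(trial)
            for s, v in candidates.items()}
    lo, hi = min(dist.values()), max(dist.values())
    return {s: 1.0 if hi == lo else 1 - (d - lo) / (hi - lo) for s, d in dist.items()}


def best_control(trial, candidates, weight=0.5):
    """Blend correlation and magnitude similarity; return the best store and all scores."""
    mags = magnitude_similarity(trial, candidates)
    scores = {s: weight * pearson(trial, v) + (1 - weight) * mags[s]
              for s, v in candidates.items()}
    return max(scores, key=scores.get), scores


def trial_uplift(trial_pre, control_pre, trial_during, control_during):
    """Percent gap between trial sales and control sales scaled to the trial's pre-trial level."""
    scale = sum(trial_pre) / sum(control_pre)
    return [100 * (t - c * scale) / (c * scale) for t, c in zip(trial_during, control_during)]


# Hypothetical pre-trial monthly total sales for trial store "T" and three candidates.
trial_pre = [250, 240, 260, 255, 245, 270, 265]
candidates = {
    "A": [248, 238, 262, 250, 247, 268, 266],
    "B": [400, 420, 390, 410, 405, 415, 400],
    "C": [255, 260, 240, 250, 265, 245, 255],
}

store, scores = best_control(trial_pre, candidates)
print(store, {s: round(v, 3) for s, v in scores.items()})

# Hypothetical sales during a 3-month trial for "T" and the chosen control store.
uplift = trial_uplift(trial_pre, candidates[store], [290, 300, 285], [255, 250, 260])
print([round(u, 1) for u in uplift])
```

A positive percentage in a trial month means the trial store outsold what its control store would predict. Before calling the result a success, you would still check whether gaps of that size exceed the normal month-to-month variation seen in the pre-trial period.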

Task 3: Analytics and commercial application

Finally, use the analytics insights from tasks 1 and 2 to construct a report for your client, the category manager. Your report should include data visualisations, key callouts, insights, recommendations, and next steps.

Task 3 is targeted specifically at building your ability to recognise commercial, actionable insights from your analysis and to present them clearly and concisely to your client with minimal jargon. Here, you are introduced to the Pyramid Principle, a top-down communication approach commonly used in consulting when presenting to a client.

General Electric — Digital Technology Data Analytics Program

Background

At GE Aviation, a data lake is a single database instance that contains data from across the aviation business: everything from financial, parts and engine delivery, supplier, engine and customer data, and so on.

The advantage of having a data lake is that a developer can connect to it with a single connection string. As long as they have permission to see the data, they can immediately start creating data-driven insights from a centralised repository.

For this program, you are given 8 data sets:

  • Remaining useful life (RUL): the variable to be predicted, used to prevent unscheduled maintenance and identify manufacturing issues that cause RUL to decrease more rapidly than expected
  • Flight data from 4 airline operators: health of the mechanical engine of a plane
  • Airport location data: where the airline operators are flying when their data is collected
  • Manufacturing data: key characteristic measurements for parts at certain operations within their production
  • Manufacturing bill of material: which engines have been used

Task 1: Data engineering

You have been asked by GE Aviation leadership to combine all the data sets listed above into a single table.

You can complete this task using either Excel (beginner level) or Tableau (intermediate level).
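To illustrate both the single-connection idea and the single-table goal, here is a small Python sketch using the standard-library sqlite3 module. One in-memory connection stands in for the data lake connection, and LEFT JOINs flatten several tables into one result. The table names, columns, join keys (engine_serial, airport_code) and rows are all hypothetical; the real GE data sets have their own structure, and the program itself expects Excel or Tableau.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # one connection, many tables
conn.executescript("""
CREATE TABLE flight (engine_serial TEXT, airport_code TEXT, sensor_value REAL);
CREATE TABLE airport (airport_code TEXT PRIMARY KEY, city TEXT);
CREATE TABLE bom (engine_serial TEXT, part_number TEXT);
CREATE TABLE rul (engine_serial TEXT PRIMARY KEY, rul_cycles INTEGER);
INSERT INTO flight VALUES ('E1', 'CVG', 610.5), ('E2', 'SIN', 640.2);
INSERT INTO airport VALUES ('CVG', 'Cincinnati'), ('SIN', 'Singapore');
INSERT INTO bom VALUES ('E1', 'P-100'), ('E2', 'P-200');
INSERT INTO rul VALUES ('E1', 1200);
""")

# LEFT JOINs keep every flight row even when another table has no match (E2 has no RUL row).
rows = conn.execute("""
SELECT f.engine_serial, a.city, f.sensor_value, b.part_number, r.rul_cycles
FROM flight AS f
LEFT JOIN airport AS a ON a.airport_code = f.airport_code
LEFT JOIN bom AS b ON b.engine_serial = f.engine_serial
LEFT JOIN rul AS r ON r.engine_serial = f.engine_serial
ORDER BY f.engine_serial
""").fetchall()
conn.close()

for row in rows:
    print(row)
```

Decide the grain of the final table first, for example one row per flight. If a key is one-to-many, such as an engine with many parts in its bill of materials, the join will repeat rows at that grain.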

Task 2: Data visualisation

At GE, we use many different types of visualisations to make decisions based on real-time data. These visualisations support quality control in our manufacturing process and help determine whether the parts we manufacture are made accurately.

A run chart is a line graph of data plotted over time; in other words, a time series chart of data points over a given time frame. This helps us track how our manufacturing process performs over time as our machinery ages.

KPIs (key performance indicators), on the other hand, are calculated values that summarise the performance of a process in a single number. They help us quickly and efficiently decide what improvements we can make to our process.

When parts are manufactured, individual design attributes are machined into each part as it moves down the assembly line, eventually resulting in a finished part. Each of these machining steps is called an operation.

After each operation, the design attribute has an expected nominal measurement, and we record the measurement in the manufacturing records. This ensures each part is made the same way and will fit its required purpose. There is also an acceptable tolerance, which defines how far the measurement may deviate from that nominal value.

Your task is to create a run chart in Tableau that visualises the measurement of a given feature at each operation for a given part number. We need to identify whether each measurement is in or out of specification for that feature. This shows your supply chain management team how the manufacturing process is performing.
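Before building the chart in Tableau, it can help to compute the in/out-of-spec flag and a headline KPI yourself, so you know what the chart should show. The Python sketch below assumes that each record has a date and a measured value, and that the specification is nominal ± tolerance. The readings and the 10.00 ± 0.05 specification are made-up numbers. It also computes the median, which run charts commonly use as a centre line.

```python
from dataclasses import dataclass
from datetime import date
from statistics import median


@dataclass
class Measurement:
    taken_on: date
    value: float  # measured size of the feature (unit assumed, e.g. mm)


def in_spec(value, nominal, tolerance):
    """True if value lies inside [nominal - tolerance, nominal + tolerance]."""
    return nominal - tolerance <= value <= nominal + tolerance


def run_chart_table(measurements, nominal, tolerance):
    """Sort by date and flag each point, ready to plot as a run chart."""
    ordered = sorted(measurements, key=lambda m: m.taken_on)
    return [(m.taken_on, m.value, "IN" if in_spec(m.value, nominal, tolerance) else "OUT")
            for m in ordered]


def percent_in_spec(measurements, nominal, tolerance):
    """Single-number KPI: percentage of measurements within specification."""
    hits = sum(in_spec(m.value, nominal, tolerance) for m in measurements)
    return 100 * hits / len(measurements)


# Made-up readings for one feature of one part number, spec 10.00 ± 0.05.
readings = [
    Measurement(date(2022, 3, 4), 10.07),
    Measurement(date(2022, 3, 1), 10.01),
    Measurement(date(2022, 3, 2), 9.98),
    Measurement(date(2022, 3, 3), 10.03),
    Measurement(date(2022, 3, 5), 9.99),
    Measurement(date(2022, 3, 6), 9.92),
]

table = run_chart_table(readings, nominal=10.00, tolerance=0.05)
kpi = percent_in_spec(readings, nominal=10.00, tolerance=0.05)
centre_line = median(m.value for m in readings)

for row in table:
    print(row)
print(f"{kpi:.1f}% in spec, median {centre_line:.2f}")
```

Readings that sit exactly on a limit are sensitive to floating-point rounding. If the real data has many such values, compare rounded measurements (to the instrument's precision) rather than raw floats.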

KPMG Australia — Data Analytics Virtual Experience Program

Background

Sprocket Central Pty Ltd, a medium-sized bike and cycling accessories organisation, needs help with its customer and transaction data. The company has a large dataset relating to its customers, but its team is unsure how to analyse it effectively to help optimise its marketing strategy.

The client has provided KPMG with 3 data sets:

  • Customer demographic
  • Customer addresses
  • Transaction data in the past 3 months

Task 1: Data quality assessment

Your task is to draft an email to the client identifying the data quality issues and strategies to mitigate them.

You can check for these data issues against the data quality framework table:

  • Accuracy
  • Completeness
  • Consistency
  • Currency
  • Relevancy
  • Validity
  • Uniqueness
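Several of these dimensions can be checked programmatically before drafting the email. The Python sketch below runs completeness, consistency, accuracy and uniqueness checks over a few invented customer rows. The field names, the accepted gender and state codes, and the 120-year age cutoff are illustrative assumptions, not the actual Sprocket Central schema or business rules; the remaining dimensions usually need business context, so they are left out.

```python
from collections import Counter
from datetime import date

# Hypothetical sample rows; field names are illustrative, not the real schema
customers = [
    {"customer_id": 1, "gender": "F", "dob": date(1985, 4, 12), "state": "NSW"},
    {"customer_id": 2, "gender": "Female", "dob": date(1990, 7, 3), "state": "New South Wales"},
    {"customer_id": 3, "gender": "M", "dob": None, "state": "VIC"},
    {"customer_id": 3, "gender": "M", "dob": date(1843, 12, 21), "state": "VIC"},
]

ALLOWED = {  # assumed canonical codes for categorical fields
    "gender": {"F", "M", "U"},
    "state": {"NSW", "VIC", "QLD"},
}
MAX_AGE_YEARS = 120  # assumed plausibility cutoff for date of birth


def quality_issues(rows, as_of):
    """Return (dimension, customer_id, field) tuples for each detected issue."""
    issues = []
    for row in rows:
        cid = row["customer_id"]
        # Completeness: required fields must be present and non-empty
        for field in ("gender", "dob", "state"):
            if row.get(field) in (None, ""):
                issues.append(("Completeness", cid, field))
        # Consistency: categorical values must follow one coding scheme
        for field, allowed in ALLOWED.items():
            value = row.get(field)
            if value not in (None, "") and value not in allowed:
                issues.append(("Consistency", cid, field))
        # Accuracy: a date of birth implying an implausible age
        dob = row.get("dob")
        if dob is not None and as_of.year - dob.year > MAX_AGE_YEARS:
            issues.append(("Accuracy", cid, "dob"))
    # Uniqueness: the primary key should appear exactly once
    for cid, n in Counter(r["customer_id"] for r in rows).items():
        if n > 1:
            issues.append(("Uniqueness", cid, "customer_id"))
    return issues


for issue in quality_issues(customers, as_of=date(2023, 1, 1)):
    print(issue)
```

Each flagged issue maps naturally to a mitigation the email could propose, for example standardising codes through a lookup table or asking the client to confirm which of two duplicated records is correct.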

Task 2: Data insights

You are now provided with a list of 1,000 potential customers with their demographics and attributes; however, these customers have no prior transaction history with the organisation.

The marketing team is sure that, if correctly analysed, the data would reveal useful customer insights which could help optimise resource allocation for targeted marketing, specifically, marketing that focuses on high-value customers.

Using the existing 3 datasets, your task is to recommend which of these 1,000 customers should be targeted to drive the most value for the organisation. In building this recommendation, start with a PowerPoint presentation that outlines the approach we will be taking.

Prepare the detailed approach for completing the analysis:

  • Understanding the data distributions
  • Feature engineering
  • Data transformations
  • Modelling
  • Results interpretation and reporting
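One common way to put a number on customer value, though not one the program prescribes, is RFM analysis: recency (days since the last purchase), frequency (number of purchases) and monetary value (here, profit, assumed to be list price minus standard cost). The Python sketch below computes these features from a few invented transactions and ranks customers by the sum of their per-metric ranks. The tuple layout, the profit definition and the equal weighting of the three metrics are all assumptions.

```python
from datetime import date

# Hypothetical transactions: (customer_id, transaction_date, list_price, standard_cost)
transactions = [
    (1, date(2022, 12, 20), 1500.0, 900.0),
    (1, date(2022, 11, 2), 700.0, 450.0),
    (2, date(2022, 10, 15), 300.0, 250.0),
    (3, date(2022, 12, 28), 2000.0, 1100.0),
    (3, date(2022, 12, 1), 1800.0, 1000.0),
    (3, date(2022, 10, 9), 900.0, 600.0),
]


def rfm(rows, as_of):
    """Per-customer recency (days), frequency (count) and monetary value (profit)."""
    stats = {}
    for cid, day, price, cost in rows:
        s = stats.setdefault(cid, {"last": day, "frequency": 0, "monetary": 0.0})
        s["last"] = max(s["last"], day)
        s["frequency"] += 1
        s["monetary"] += price - cost  # profit on this transaction
    return {
        cid: {"recency": (as_of - s["last"]).days,
              "frequency": s["frequency"],
              "monetary": s["monetary"]}
        for cid, s in stats.items()
    }


def ranks(values, higher_is_better=True):
    """Map each customer to a rank from 1 (worst) to n (best); ties keep input order."""
    ordered = sorted(values, key=values.get, reverse=not higher_is_better)
    return {cid: i + 1 for i, cid in enumerate(ordered)}


profile = rfm(transactions, as_of=date(2023, 1, 1))
score = {cid: 0 for cid in profile}
for metric, higher_is_better in (("recency", False), ("frequency", True), ("monetary", True)):
    for cid, r in ranks({c: p[metric] for c, p in profile.items()}, higher_is_better).items():
        score[cid] += r

ranking = sorted(score, key=score.get, reverse=True)
print(ranking)  # [3, 1, 2], highest-value customer first
```

Because the 1,000 new customers have no transactions, scores like these can only label the existing customers; a model relating demographic attributes to those scores could then be applied to the new list.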

Task 3: Data insights and presentation

Visualisations such as interactive dashboards often help us highlight key findings and convey our ideas in a more succinct manner. A list of customers or algorithms won’t cut it with the client, so we need to support our results with the use of visualisations.

Your task is to develop a dashboard that we can present to the client at our next meeting. Display your data summary and the results of the analysis in a dashboard, using either Tableau or Power BI. Creativity in layout and presentation is welcome.

Keep the business context in mind when presenting your findings: specify who the marketing team should target from the new list of 1,000 customers, as well as the broader market segment to reach out to.

Ideally, your dashboard should answer the following questions:

  • What are the trends in the underlying data?
  • Which customer segment has the highest customer value?
  • What do you propose should be Sprocket Central Pty Ltd’s marketing and growth strategy?
  • What additional external datasets may be useful to obtain greater insights into customer preferences and propensity to purchase the products?

If you would like a crash course on how to use Power BI, I did a training workshop a few months ago that walks through step-by-step how to build your first interactive dashboard.

ANZ Australia — Data@ANZ Program

For more information about this program, please check out my GitHub repository and YouTube video.

Background

This task is based on a synthesised transaction dataset containing 3 months’ worth of transactions for 100 hypothetical customers. It contains purchases, recurring transactions, and salary transactions.

The dataset is designed to simulate realistic transaction behaviours that are observed in ANZ’s real transaction data, so many of the insights you can gather from the tasks below will be genuine.

Task 1: Exploratory data analysis

For task 1, you are required to:

  • Load the dataset into an analysis tool of your choice, for example, Excel, R, SAS, Tableau or similar
  • Start by doing basic checks — are there any data issues? Does the data need to be cleaned?
  • Gather some interesting overall insights about the data, for example, what is the average transaction amount? How many transactions do customers make each month, on average?
  • Segment the dataset by transaction date and time, and visualise transaction volume and spending over the course of an average day or week
  • Put together 2–3 slides summarising your most interesting findings for ANZ management
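The descriptive questions above map onto a few simple aggregations. The Python sketch below uses invented rows in a hypothetical (customer_id, timestamp, amount) layout, not the actual ANZ column names, and computes the average transaction amount, the average number of transactions per customer per month, and transaction volume and spending by hour of day and day of week.

```python
from collections import Counter, defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical rows: (customer_id, ISO timestamp, amount)
rows = [
    ("C1", "2018-08-01T08:15:00", 25.50),
    ("C1", "2018-08-01T12:40:00", 14.00),
    ("C2", "2018-08-02T19:05:00", 60.25),
    ("C1", "2018-09-03T08:55:00", 30.00),
    ("C2", "2018-09-05T12:10:00", 20.25),
]
txns = [(cid, datetime.fromisoformat(ts), amount) for cid, ts, amount in rows]

# Overall insight: average transaction amount
avg_amount = mean(amount for _, _, amount in txns)

# Transactions per customer per month, averaged over every customer-month in the window
customers = {cid for cid, _, _ in txns}
months = {(ts.year, ts.month) for _, ts, _ in txns}
avg_per_customer_month = len(txns) / (len(customers) * len(months))

# Totals by hour of day and day of week; divide by the number of days
# (or weeks) in the window to turn them into per-day (or per-week) averages
volume_by_hour = Counter(ts.hour for _, ts, _ in txns)
spend_by_hour = defaultdict(float)
spend_by_weekday = defaultdict(float)
for _, ts, amount in txns:
    spend_by_hour[ts.hour] += amount
    spend_by_weekday[ts.strftime("%a")] += amount

print(f"Average transaction: {avg_amount:.2f}")  # Average transaction: 30.00
print(f"Transactions per customer per month: {avg_per_customer_month:.2f}")  # 1.25
print(dict(sorted(volume_by_hour.items())))  # {8: 2, 12: 2, 19: 1}
```

The per-month average divides by every customer in every month, which assumes all customers are observed for the full window; customers who join or leave part-way through would need a different denominator.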

Task 2: Predictive analytics

For task 2, you will need to use statistical software such as R, SAS, or Python. Specifically, you are required to:

  • Identify the annual salary for each customer
  • Explore correlations between annual salary and various customer attributes. These attributes could be those that are readily available in the data or those that you construct or derive yourself. Visualise any interesting correlations using a scatter plot
  • Build a simple regression model to predict the salary for each customer using the attributes you identified
  • How accurate is your model? Should ANZ use it to segment customers into income brackets for reporting purposes?
  • Build a tree-based model to predict salary. Does it perform better? How would you accurately test the performance of this model?
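The first few steps can be sketched without a machine learning library. The Python example below annualises each customer's salary by summing the salary payments seen in the three-month window and scaling to twelve months (assuming pay is regular across the window), fits an ordinary least-squares line of salary against a single attribute, age, and scores it with R² on held-out customers. The data, the choice of age as the predictor and the every-third-customer holdout are illustrative assumptions; a real model would use more attributes, more customers and a random split or cross-validation.

```python
from statistics import mean

# Hypothetical data: customer -> (age, salary payments observed in the 3-month window)
data = {
    "C1": (22, [1150.0] * 6),  # fortnightly pay
    "C2": (28, [3100.0] * 3),  # monthly pay
    "C3": (35, [850.0] * 13),  # weekly pay
    "C4": (41, [4400.0] * 3),
    "C5": (47, [2300.0] * 6),
    "C6": (53, [5500.0] * 3),
}


def annual_salary(payments, months_observed=3):
    """Annualise the total salary received over the observation window."""
    return sum(payments) * 12 / months_observed


def fit_simple_linear(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def r_squared(ys, preds):
    """Share of the variance in ys explained by the predictions."""
    y_bar = mean(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


def features_and_target(ids):
    xs = [data[c][0] for c in ids]
    ys = [annual_salary(data[c][1]) for c in ids]
    return xs, ys


ids = sorted(data)
test_ids = ids[2::3]  # hold out every third customer
train_ids = [c for c in ids if c not in test_ids]

intercept, slope = fit_simple_linear(*features_and_target(train_ids))
test_x, test_y = features_and_target(test_ids)
preds = [intercept + slope * x for x in test_x]
print(f"salary ≈ {intercept:,.0f} + {slope:,.0f} × age")
print(f"holdout R² = {r_squared(test_y, preds):.2f}")
```

A tree-based model (for example scikit-learn's `DecisionTreeRegressor`) could be scored on the same held-out customers, so that both models are compared on identical, unseen data.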

BCG — Data Science & Analytics Virtual Experience Program

Background

Your client is PowerCo, a major gas and electricity utility that supplies corporate, SME (small and medium enterprise) and residential customers. The liberalisation of the European energy market has led to significant customer churn, especially in the SME segment. PowerCo has partnered with BCG to help diagnose the source of churn among its SME customers.

A fair hypothesis is that price changes affect customer churn. It would therefore help to know which customers are most likely to churn at their current price, and a good predictive model could identify them.

Moreover, customers at risk of churning might be incentivised by a discount to stay with our client. The head of the SME division is considering a 20% discount, which is thought to be large enough to dissuade almost everyone from churning, especially those for whom price is the primary concern.

Task 1: Business understanding and hypothesis testing

Your first task is to understand what is going on with the client and think about how you would approach this problem and test the specific hypothesis.

You must formulate the hypothesis as a data science problem and lay out the major steps needed to test this hypothesis, focusing on the data you would need from the client as well as the analytical models you would use to test the hypothesis.

If you are stuck:

  • What are the key factors for a customer deciding to stay with or switch providers?
  • What data sources and fields could be used to explore the contribution of various factors to a customer’s potential action?
  • What would a data frame of your choice look like — what should each column and row represent?
  • What kind of exploratory analyses on the relevant fields can give more insights into churn behaviour?
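One natural layout for a churn model is one row per customer, one column per candidate driver, and a final column holding the churn label. Pricing history typically arrives as one row per customer per month, so it has to be aggregated before it can be joined on. The Python sketch below does this with invented records; the field names (`consumption_12m`, `energy_price` and so on) and the two price features (average price and last-minus-first-month change) are illustrative assumptions, not the client's actual schema.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical inputs; field names are illustrative, not the client's schema
customers = [
    {"id": "a1", "channel": "online", "consumption_12m": 5400.0},
    {"id": "b2", "channel": "direct", "consumption_12m": 12000.0},
]
price_history = [  # one row per customer per month
    {"id": "a1", "month": 1, "energy_price": 0.150},
    {"id": "a1", "month": 12, "energy_price": 0.165},
    {"id": "b2", "month": 1, "energy_price": 0.140},
    {"id": "b2", "month": 12, "energy_price": 0.140},
]
churned = {"a1": 1, "b2": 0}  # target label: 1 = churned

# Collapse the monthly price rows into one month -> price mapping per customer
prices_by_customer = defaultdict(dict)
for p in price_history:
    prices_by_customer[p["id"]][p["month"]] = p["energy_price"]

# One row per customer: static attributes, aggregated price features, then the label
table = []
for c in customers:
    monthly = prices_by_customer[c["id"]]
    table.append({
        **c,
        "mean_price": mean(list(monthly.values())),
        "price_change": monthly[max(monthly)] - monthly[min(monthly)],  # last month minus first
        "churn": churned[c["id"]],
    })

for row in table:
    print(row)
```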

Task 2: Exploratory data analysis

The BCG project team thinks that building a churn model to understand whether price sensitivity is the largest driver of churn has potential. The client has sent over some data which includes:

  • Historical customer data: customer data such as usage, sign-up date, forecasted usage
  • Historical pricing data: fixed and variable pricing data
  • Churn indicator: whether or not each customer has churned

For task 2, you need to:

  • Perform some exploratory data analysis. Look into data types, data statistics, specific parameters, and variable distributions
  • Verify the hypothesis of price sensitivity being correlated with churn
  • Prepare a half-page summary of key findings and add some suggestions for data augmentation — which other data sources should the client provide you with and which open source datasets might be useful?
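A first, rough check of the hypothesis is whether a price-sensitivity measure moves with the churn flag. With a binary target, the Pearson correlation between a numeric feature and the 0/1 label is the point-biserial correlation; comparing churn rates for customers with and without a price increase gives a more readable view of the same question. The Python sketch below uses invented values, and `price_change` is a hypothetical stand-in for whatever price-sensitivity feature you derive from the pricing data.

```python
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation; with 0/1 ys this is the point-biserial correlation."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical values: one entry per customer
price_change = [0.020, 0.015, 0.000, -0.005, 0.010, 0.000]
churn = [1, 1, 0, 0, 0, 0]

r = pearson(price_change, churn)

# Churn rate for customers whose price went up versus everyone else
up = [c for p, c in zip(price_change, churn) if p > 0]
not_up = [c for p, c in zip(price_change, churn) if p <= 0]
rate_up = sum(up) / len(up)
rate_not_up = sum(not_up) / len(not_up)

print(f"point-biserial r = {r:.2f}")
print(f"churn rate with price increase: {rate_up:.0%}, without: {rate_not_up:.0%}")
```

A correlation on its own does not show that price causes churn; on the real data you would also check whether the relationship is statistically significant and how it compares with other candidate drivers.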

Task 3: Feature engineering and modelling

The team now has a good understanding of the data and feels confident to use the data to further understand the business problem. The team now needs to brainstorm and build out features to uncover signals in the data that could inform the churn model.

Feature engineering is one of the keys to unlocking predictive insight through mathematical modelling. Based on the available cleaned data, identify what you think could be drivers of churn for our client and build those features to use later in your model.

Your colleague has done some work on engineering the features within the cleaned dataset and has calculated a feature that seems to have predictive power.

For task 3:

  • Try to think of ways to improve the feature’s predictive power and elaborate on why you made those choices
  • Train a random forest classifier, evaluate the results, and document the advantages and disadvantages of using a random forest for this particular use case
  • Bonus: how much money could the client save with the use of the model?
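The bonus question comes down to arithmetic once the model's flags are known. A minimal sketch, assuming every flagged customer is offered the 20% discount mentioned above: flagged customers who would have churned (true positives) and accept the offer keep paying 80% of their previous revenue instead of nothing, while flagged customers who would have stayed anyway (false positives) cost 20% of their revenue. The revenue figures and the retention rate below are hypothetical; on real data they would come from the model's predictions on held-out customers, joined to each customer's revenue.

```python
def estimated_savings(churner_revenue, stayer_revenue, discount=0.20, retention_rate=1.0):
    """
    Net revenue impact of offering a discount to every customer the model flags.

    churner_revenue: annual revenue from flagged customers who would have churned
    stayer_revenue:  annual revenue from flagged customers who would have stayed anyway
    retention_rate:  assumed share of would-be churners the discount persuades to stay
    """
    # Revenue kept from would-be churners who stay, at the discounted price
    retained = churner_revenue * retention_rate * (1 - discount)
    # Revenue given away to customers who were never going to leave
    given_away = stayer_revenue * discount
    return retained - given_away


# Hypothetical annual revenue figures
print(estimated_savings(1_000_000, 400_000))                      # 720000.0
print(estimated_savings(1_000_000, 400_000, retention_rate=0.5))  # 320000.0
```

Comparing this figure with the cost of discounting every customer, rather than only those the model flags, shows how much of the saving comes from the model itself.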

Task 4: Findings and recommendations

The client wants a quick update on the progress of the project.

For task 4, develop an abstract slide synthesising all the findings from the project so far.

A few things to think about for this abstract include:

  • What is the most important number or metric to share with the client?
  • How much detail should you go into, especially with the technical details of your work?
  • What impact would the model have on the client’s bottom line? Always test what you write with the “so what” test

I hope this blog post has provided some good starting points into the world of virtual internships, specifically those that you can undertake if you are aiming to expand your data science skills via practical projects and solving real-world problems.

