7 Lessons from an ML Internship at Intel

Automation, machine learning and LLMs in the chip industry

I felt like one of those guys from Monsters Inc. You know, the ones in the big yellow hazmat suits. A necessary precaution! I was entering the most complex manufacturing environment in the world. One that requires so much precision that even microscopic particulates from your breath can disrupt it.

During a 6-month internship at Intel, I learned a lot about the semiconductor industry: how it is going through a time of turmoil, how Intel is reacting and why this means machine learning will become more important. I saw how everything from old-school CV algorithms to LLMs helps to produce the world's most valuable commodity: chips.

I want to share this experience with you.

1) “Intel is like a bakery”

is how my manager tried to summarise a process that I could never fully understand. They:

Create recipes → design chips
Bake bread → manufacture chips

But they do not build the machines to do this.

I saw some of these up close. The most impressive are the photolithography machines built by ASML. They use light waves to etch patterns a few atoms wide into silicon wafers. The shorter the wavelength, the smaller you can make the transistors in a chip. This is the key driver of Moore’s law.

… to make EUV lithography possible, we needed to engineer a way to create such light within a system. So, we developed a radically new approach to generating light for lithography. — ASML

The most advanced machine produces extreme ultraviolet (EUV) light. This is done by shooting laser pulses at droplets of tin to produce plasma. To produce enough light, 50,000 droplets are hit per second. The machine I was looking at took over 17 years and $6 billion to create.

So yeah, like a bakery … with some very expensive ovens!

2) Everything is automated

When I say I saw the machines, I mean the big white boxes they’re kept in. Walking around Intel’s fab felt like I was in 2001: A Space Odyssey. The tools are kept in a clean room also known as a fab. These are big spaceship-like rooms lit up with orange lights. Robots were flying around everywhere!

These robots carry the wafers between the different stages of the chip manufacturing process. A process which has over 1500 steps. In the most advanced fabs, humans do not touch the wafers at any point. They are only there to monitor and maintain the tools.

This is where machine learning comes in. It does not play a large role in the actual manufacturing process. It is used more to help automate the monitoring of the process. That is to help flag defective chips and identify misaligned machines. Automation on automation! We will come back to this point in a later lesson.

3) The chip industry is changing

I read Chip War during my internship. It is an interesting read about the history and current state of the semiconductor industry. Governments have realised its geopolitical importance. The US is pumping tons of money into it to bring production onshore. In short, the industry is going through massive changes. At the same time, Intel is going through its own internal changes.

Intel used to be the top baker. They had the best recipes and baked the tastiest bread. Now, they do neither. NVIDIA is the most valuable chip-designing company. Their GPUs power AI. More practically, transistor size is what determines the power of a chip. TSMC can produce transistors 3nm wide. Intel can manage 7nm.

The separation of design and fabrication became known as the foundry model, with fabless manufacturing outsourcing to semiconductor foundries — Wikipedia

To stay competitive, Intel is shifting towards a “foundry” business model. This means they will start to produce chips for other companies. This is what TSMC does. They do not design chips. They only manufacture chips designed by other companies. Historically, Intel has only manufactured the chips that it has designed. Shifting away from this model is why…

4) Machine learning is becoming more important

Intel used to get away with massive profit margins. They could design and manufacture the most powerful chips in the world. And, they were the only ones who could do it. This dominant position has led to inefficiencies.

The chip manufacturing process is incredibly complex and many things can go wrong along the way. Still, Intel could get away with minimal quality checks. No customers to please and large profit margins meant they could just test chips at the end. Then, if something had gone wrong, they could simply make more.

They also solved many problems the easy (and expensive) way. That is by making highly skilled workers do repetitive monitoring work. Engineers and technicians have to manually look through images of chips to identify defects and misaligned tools.

Now, they want to automate as much of this work as possible. This will drive down costs by reducing repetitive manual work. They also want to catch faults and identify misaligned tools as soon as possible. This will reduce wasted time spent on these chips further down the line. Machine learning will be used to do much of the automation but keep in mind…

5) Machine learning is a last resort

This is the most important lesson from my machine learning internship. Ironic, I know. I worked directly on computer vision problems that aimed to automate fault detection. Doing this in an industry setting meant I needed to consider both the accuracy of the solution and the cost in terms of time spent developing it. This helped me understand when and when not to use machine learning.

Intel’s fabs are highly controlled environments. The image data that comes out of them is consistent. This means that the only differences between defective and non-defective images are the defects themselves. You can often use old-school CV algorithms to identify these. These are methods like thresholding, the Hough transform and the FAST algorithm. In fact, these approaches are often preferred.
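To make the idea concrete, here is a minimal sketch of thresholding against a known-good reference image. It is only an illustration of the general technique, not the method used at Intel. It assumes grayscale images stored as nested lists of 0-255 values, and the function names, the threshold of 40 and the minimum pixel count are all hypothetical choices. In practice you would likely reach for a library such as OpenCV rather than plain Python loops.

```python
from typing import List, Tuple

Image = List[List[int]]  # grayscale pixel values in the range 0-255


def find_defect_pixels(
    image: Image, reference: Image, threshold: int = 40
) -> List[Tuple[int, int]]:
    """Return (row, col) of every pixel that differs from the reference
    image by more than `threshold` grey levels."""
    defects = []
    for r, (row, ref_row) in enumerate(zip(image, reference)):
        for c, (pixel, ref_pixel) in enumerate(zip(row, ref_row)):
            if abs(pixel - ref_pixel) > threshold:
                defects.append((r, c))
    return defects


def is_defective(
    image: Image, reference: Image, threshold: int = 40, min_pixels: int = 2
) -> bool:
    """Flag an image when at least `min_pixels` pixels exceed the threshold.
    Requiring more than one pixel ignores isolated sensor noise."""
    return len(find_defect_pixels(image, reference, threshold)) >= min_pixels


# Toy 4x4 images: a uniform reference, a clean image with slight noise,
# and the same image with a bright two-pixel "scratch".
reference = [[100] * 4 for _ in range(4)]
clean = [
    [102, 99, 101, 100],
    [100, 98, 100, 103],
    [101, 100, 99, 100],
    [100, 100, 102, 100],
]
scratched = [row[:] for row in clean]
scratched[1][1] = 220
scratched[1][2] = 215

print(is_defective(clean, reference))            # False
print(is_defective(scratched, reference))        # True
print(find_defect_pixels(scratched, reference))  # [(1, 1), (1, 2)]
```

A rule like this only works because the images are so consistent. The same input always gives the same answer, and there is no dataset to collect or label.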

You don’t need ML if you can determine a target value by using simple rules, computations, or predetermined steps that can be programmed without needing any data-driven learning — AWS

Building a machine learning model is hard work and can add unnecessary complexity. You need to collect a representative dataset, label it, then train and evaluate the model. Even then, you cannot be sure it won’t act unexpectedly on future data. In comparison, classical CV algorithms are deterministic and easy to develop. Even if ML could improve performance, this may be outweighed by the time required to develop a model.

Still, there will be many problems that do require machine learning. These are the ones with a lot of variation in the data: variation introduced by changes in light conditions, different chip types or the defects themselves. Moving past predictive algorithms also requires more advanced machine-learning methods.

6) Generative AI is incredibly useful

The team I worked with was developing an internal chatbot using GPT-4. This showed me that LLMs are incredibly useful. It also showed me the biggest challenge with implementing these models.

Hint: it has nothing to do with code or other technical aspects.

LLMs really have changed the game. What would have taken a team of NLP experts months to achieve can now be done by a few developers. Many companies have the internal resources to build complex AI systems. These all work in a similar way:

Create a dataset of your company’s documents
Chunk and embed the text
For a given question, find the most relevant text using cosine similarity
Send the question and text to an LLM

This is known as a RAG system. They allow companies to turn their internal information into a smart, searchable dataset.
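The four steps above can be sketched in a few lines of Python. This is a toy under stated assumptions, not the system the team built: the `embed` function is a stand-in that uses simple word counts where a real system would call an embedding model, the two documents are invented, and the last step stops at building the prompt because the actual LLM call depends on whichever provider you use. All function names are hypothetical.

```python
import math
import re
from collections import Counter
from typing import List


def chunk(text: str, size: int = 12) -> List[str]:
    """Split text into chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def embed(text: str) -> Counter:
    """Toy embedding: lower-cased word counts.
    Assumption: a real system would call an embedding model here."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(question: str, chunks: List[str], top_k: int = 1) -> List[str]:
    """Return the `top_k` chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(
        chunks, key=lambda c: cosine_similarity(q, embed(c)), reverse=True
    )
    return ranked[:top_k]


def build_prompt(question: str, context: List[str]) -> str:
    """Combine the retrieved text and the question into one LLM prompt."""
    joined = "\n".join(context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {question}"


# Step 1: a (made-up) dataset of company documents.
documents = [
    "The etch tool must be recalibrated after every maintenance window.",
    "Cafeteria opening hours are 8am to 3pm on weekdays.",
]
# Step 2: chunk the text (embedding happens inside `retrieve`).
chunks = [c for doc in documents for c in chunk(doc)]
# Step 3: find the most relevant text for a question.
question = "When should the etch tool be recalibrated?"
context = retrieve(question, chunks)
# Step 4: this prompt is what would be sent to the LLM.
prompt = build_prompt(question, context)
print(prompt)
```

A production system would embed the chunks once and store the vectors in a search index, rather than re-embedding every chunk for each question as this sketch does.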

Actually building a RAG system is easy. Don’t get me wrong, there are still technical challenges: security, data ingestion, cleaning and embedding search. But these have all been solved before. So the question is, why aren’t all companies building them? They are, but it will take them time due to the biggest challenge facing these systems. It has to do with the first step: creating a dataset.

The most useful documents are often highly confidential. Intel has loads on how they manufacture chips. It would be disastrous if these fell into the wrong hands. The result is strict rules around who can see what. Access to each data source is controlled by different people. To use just one, you have to go through multiple layers of bureaucracy. So, the biggest challenge? Overcoming this bureaucracy.

This leads me to the last lesson…

7) All organisations are chaotic

I learned this during my first job as a data scientist in the banking industry. But, this internship was on another level. Intel is a massive company. They have organisations within organisations. The team I worked with was like an internal startup. They approached engineers directly to solve their problems. This all makes it even harder to implement technical solutions like a RAG system. Experiencing it confirmed what I already knew:

Communication is your most important skill as a data scientist

In any organisation, you cannot take for granted that an analytical solution will be readily accepted. You need to explain how it works in simple terms to the different stakeholders. At Intel, they took this further. They had to really sell their ideas and even compete with other teams to implement them. This is a completely different approach from any of my previous roles and it changed my understanding of how a team or data scientist can operate. This alone made the internship worth it.

Overall, it was a great experience. I got to be a part of one of the most important companies in the world during an interesting time in its history. Seeing the complexity of a chip fab opened my eyes to what humans have created and what we are capable of. It also reignited my interest in machine learning or, rather, automation.

If you want some contrast to these lessons, you may enjoy this article. I discuss what I learned in an industry very different to semiconductors.

6 Lessons from a Data Scientist in the Banking Industry: Why my first job in data science was not what I expected (towardsdatascience.com)
