Data Engineering 101: Introduction to Data Engineering
Participating in the #100daysofcode
On 1 July 2022, I joined the #100daysofcode challenge.
I didn’t have a plan in mind; I decided to play around with programming languages and figure out which one would suit me. I settled on SQL and Python for Data Science. I’ve tried to learn to code in the past, but I always hit a roadblock:
- Procrastinating.
- Not understanding code editors.
- No structured courses.
- Pressuring myself to learn quickly as opposed to learning efficiently.
- A laptop with 4GB of RAM. This is not a bottleneck if you’re learning online, but it becomes one when you need to install code editors and other tools on your laptop.
I started with SQL and chose DataCamp as my platform. I love DataCamp because the platform is intuitive and the courses are structured. On DataCamp, you earn credits (XP) and you can see how you compare with other learners. I’m highly competitive, and seeing my XP increase motivated me to learn SQL in the first 3 weeks of July.

This also gave me the confidence to apply for, and get into, a 12-week Data Engineering mentorship program by Data Science East Africa (DSEA) and Lux Tech Academy.

While I’m fairly comfortable with PostgreSQL, I’m not as familiar with Python yet. In this article, I will share some of what I’ve learnt about PostgreSQL.
SQL
SQL stands for Structured Query Language. It is used to query relational databases. A relational database contains a collection of tables, and the data stored in one table can relate to data in another. In each table, every column is a field and every row is a record.
A query is a request for data from a table or a combination of tables in a database.
A query uses keywords such as SELECT and FROM. SQL keywords are not case sensitive, so FROM and from, or SELECT and select, are treated the same. However, it’s good practice to write your keywords in upper case to distinguish them from other parts of your query, such as column and table names.
Each query ends with a semicolon (;), which marks the end of the statement.
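To make this concrete, here is a minimal sketch of a query. It is only an illustration: the learners table, its columns and its rows are made up, and it uses Python's built-in sqlite3 module as a stand-in for PostgreSQL so that it runs without installing a database server. The SELECT statement itself should read the same way in PostgreSQL.

```python
import sqlite3

# In-memory SQLite database, used here as a stand-in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical table: each column is a field, each row is a record.
cur.execute("CREATE TABLE learners (name TEXT, platform TEXT, xp INTEGER);")
cur.executemany(
    "INSERT INTO learners (name, platform, xp) VALUES (?, ?, ?);",
    [("Amina", "DataCamp", 1200), ("Brian", "DataCamp", 800)],
)

# Keywords in upper case, table and column names in lower case.
upper_rows = cur.execute("SELECT name, xp FROM learners;").fetchall()

# The same query with lower-case keywords returns the same rows.
lower_rows = cur.execute("select name, xp from learners;").fetchall()

print(upper_rows)
print(upper_rows == lower_rows)

conn.close()
```

Both versions return the same records, which is why upper-case keywords are a readability convention rather than a requirement.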
In the 52 days since I began, I’ve learnt how to:
- Select columns with SELECT and SELECT DISTINCT, and count rows with COUNT.
- Filter rows with WHERE, AND, OR, BETWEEN, IS NULL, IS NOT NULL, LIKE and NOT LIKE.
- Use aggregate functions and alias their results with AS.
- Sort and group data with ORDER BY, GROUP BY and HAVING.
- Join data with inner joins and self joins, and write conditional logic with CASE, WHEN and THEN.
- Write nested queries (subqueries).
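Here is a small sketch that touches several of the items above. Everything in it is hypothetical: the customers and orders tables, their columns and their values are invented for illustration, and the queries run on Python's built-in sqlite3 module rather than PostgreSQL so that nothing needs to be installed. The SQL uses only standard features, so it should carry over to PostgreSQL with little or no change.

```python
import sqlite3

# In-memory SQLite database, used here as a stand-in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Two hypothetical tables related through orders.customer_id.
cur.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount INTEGER);
INSERT INTO customers (id, name, city) VALUES
    (1, 'Amina', 'Nairobi'),
    (2, 'Brian', 'Mombasa'),
    (3, 'Chao', NULL);
INSERT INTO orders (id, customer_id, amount) VALUES
    (1, 1, 500),
    (2, 1, 300),
    (3, 2, 150),
    (4, 3, 900);
""")

# SELECT DISTINCT with an IS NOT NULL filter, sorted alphabetically.
cities = cur.execute(
    "SELECT DISTINCT city FROM customers WHERE city IS NOT NULL ORDER BY city;"
).fetchall()

# Filtering rows with WHERE and BETWEEN (both bounds are inclusive).
mid_orders = cur.execute(
    "SELECT id FROM orders WHERE amount BETWEEN 200 AND 600 ORDER BY id;"
).fetchall()

# Inner join, aggregate with an alias, then GROUP BY and HAVING.
totals = cur.execute("""
    SELECT c.name, SUM(o.amount) AS total_spent
    FROM customers AS c
    INNER JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.name
    HAVING SUM(o.amount) > 200
    ORDER BY total_spent DESC;
""").fetchall()

# Nested query: orders larger than the average order amount.
above_avg = cur.execute("""
    SELECT id, amount
    FROM orders
    WHERE amount > (SELECT AVG(amount) FROM orders)
    ORDER BY id;
""").fetchall()

print(cities)
print(mid_orders)
print(totals)
print(above_avg)

conn.close()
```

Note that WHERE filters individual rows before grouping, while HAVING filters the groups after the aggregate has been calculated. That difference is why the total is checked in HAVING and not in WHERE.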
The most challenging part has been learning joins and nested queries. Setting up the environment hasn’t been easy either. I managed to do it after procrastinating for about a week. I set up pgAdmin, and I’m using the Windows command line interface.
Next Steps
- Continue learning SQL.
- Continue learning Python for Data Engineering.
- Create a great workspace and environment.
- Document my journey on Medium.
Follow me on Medium and Twitter to keep up with me and my tech journey.





