Day 3 of 30 days of Data Engineering Series
With examples and projects …

Welcome back peeps to Day 3 of Data Engineering!
What’s covered in 30 days of Data Engineering with projects Series till now —
Day 3 : Complete Advanced Python for Data Engineering — Part 2
This is Day 3 of 30 days of Data Engineering Series where we will be covering —
Complete Advanced Python for Data Engineering — Part 2
The full syllabus for 30 days of Data Engineering —
I’ll be covering only the most important Data Engineering topics, with projects (listed below) —
1. Data Engineering
2. Python for Data Engineering
3. Scripting and Automation
Shell Scripting
CRON
ETL
4. Relational Databases and SQL
RDBMS
Data Modeling
Basic SQL
Advanced SQL
5. NoSQL Databases and MapReduce
Unstructured Data
Advanced ETL
Map-Reduce
Data Warehouses
Data API
6. Data Analysis
Pandas
Numpy
Web Scraping
Data Visualization
7. Data Processing Techniques
Batch Processing : Apache Spark
Stream Processing — Spark Streaming
Build Data Pipelines
Target Databases
Machine learning Algorithms
8. Big Data
Big data basics
HDFS in detail
Hadoop Yarn
Sqoop Hadoop
Hive
Pig
Hbase
9. WorkFlows
Introduction to Airflow
Airflow hands on project
10. Infrastructure
Docker
Kubernetes
Business Intelligence
11. Cloud Computing
AWS
Google Cloud Platform
12. Research Papers — Data Engineering
Some amazing data engineering research papers that I have read over the years, to help you get up to industry standards and see what’s next in this field.
Let’s get started with Day 3 —
We will be covering below Python topics in detail with hands on coding exercise —
1. Data types, strings, operators, and Chaining Comparison Operators with Logical Operators
8. First Class functions, Private Variables, Global and Non Local Variables, __import__ function
9. Magic Functions, Tuple Unpacking
10. Static Variables and Methods in Python
12. Inheritance and Polymorphism, Errors and Exception Handling
13. User-defined functions, Python garbage collection, debugger in Python
14. Iterators, Generators, and Decorators, Memoization using Decorators
15. Ordered and Defaultdict, Coroutine
16. Regular expression, Magic methods, Closures
17. ChainMap
18. Python Itertools
19. Advanced python constructs
20. Comprehensions, Named Tuple, Type hinting in Python
21. How to write efficient Code in Python
22. Efficient Code and Optimization techniques for Python
Open up colab/jupyter notebook and start coding.
Let’s dive in!
Magic Methods in Python
In Python, magic methods are the special methods whose names start and end with double underscores
- Magic methods are not meant to be invoked directly by you. Python invokes them internally when a certain action is performed on an object
- Examples of magic methods are: __new__, __repr__, __init__, __add__, __len__, __del__, etc. The __init__ method used for initialization is invoked automatically when an instance is created, without an explicit call
- Use the dir() function to list the magic methods a class defines or inherits
- The advantage of using Python’s magic methods is that they provide a simple way to make objects behave like built-in types
- Magic methods let user-defined objects emulate the behavior of built-in types. So whenever you want to control how a user-defined object is printed, compared, added, or measured, implement the corresponding magic methods.
Example :
v = 4
v.__add__(2)
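To make the points above concrete, here is a minimal sketch of a user-defined class that implements a few magic methods. The Vector class and its fields are hypothetical and exist only for illustration; they are not part of the original series.

```python
class Vector:
    """Hypothetical 2-D vector used only to illustrate magic methods."""

    def __init__(self, x, y):
        # Called automatically when Vector(...) is evaluated
        self.x = x
        self.y = y

    def __repr__(self):
        # Used by repr() and, since __str__ is not defined, by print()
        return f"Vector({self.x}, {self.y})"

    def __add__(self, other):
        # Called for the expression: self + other
        if not isinstance(other, Vector):
            return NotImplemented
        return Vector(self.x + other.x, self.y + other.y)

    def __eq__(self, other):
        # Called for the expression: self == other
        return isinstance(other, Vector) and (self.x, self.y) == (other.x, other.y)

    def __len__(self):
        # Called by len(self); a 2-D vector always has two components
        return 2


total = Vector(1, 2) + Vector(3, 4)
print(total)
print(len(total))
print(total == Vector(4, 6))
```

None of the magic methods is called by name here: the + operator, len(), == and print() trigger them.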
Implementation —
# __del__ method
from os.path import expanduser, join

class FileObject:
    def __init__(self, file_path='~', file_name='test.txt'):
        # expanduser turns '~' into the home directory; open() does not do this itself
        self.file = open(join(expanduser(file_path), file_name), 'rt')

    def __del__(self):
        # runs when the object is about to be destroyed
        self.file.close()
        del self.file
Implementation —
# __repr__ method
class String:
    def __init__(self, string):
        self.string = string

    def __repr__(self):
        return 'Object: {}'.format(self.string)
Some of the other best Series —
30 days of Data Structures and Algorithms and System Design Simplified
Data Science and Machine Learning Research (papers) Simplified
100 days : Your Data Science and Machine Learning Degree Series with projects
Complete Data Visualization and Pre-processing Series with projects
Inheritance and Polymorphism in Python
- In Python, inheritance and polymorphism are powerful and important concepts
- With inheritance, a class can use (inherit) all the data fields and methods available in its parent class
- On top of that, you can add your own methods and data fields
- Python allows multiple inheritance, i.e. a class can inherit from more than one class
- Inheritance provides a way to write better organized code and to reuse code
One of the best articles I have read on class inheritance is by Erdem Isbilen
Syntax —
class ParentClass:
    Body of parent class
class DerivedClass(ParentClass):
    Body of derived class
- In Python, polymorphism lets a child class define a method with the same name as one in its parent class (method overriding), so the same call behaves differently depending on the object’s class
Example —
class X:
    def sample(self):
        print("sample() method from class X")

class Y(X):
    def sample(self):
        print("sample() method from class Y")
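Multiple inheritance is mentioned above but not shown, so here is a small sketch of how it might look. The class names are hypothetical. The example relies on Python’s method resolution order (MRO), which decides which parent’s method super() reaches first.

```python
class Readable:
    def describe(self):
        return "readable"


class Writable:
    def describe(self):
        return "writable"


class ReadWriteFile(Readable, Writable):
    """Inherits from both parents; Readable is listed first, so it wins."""

    def describe(self):
        # super() follows the MRO, so this reaches Readable.describe()
        return "file: " + super().describe()


# The MRO lists the classes in the order Python searches them
mro_names = [cls.__name__ for cls in ReadWriteFile.__mro__]
description = ReadWriteFile().describe()

print(mro_names)
print(description)
```

Swapping the order of the base classes to (Writable, Readable) would make super().describe() reach Writable instead.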
Implementation —
# Inheritance
class Vehicle:
    def __init__(self, name, color):
        self.__name = name
        self.__color = color

    def getColor(self):
        return self.__color

    def setColor(self, color):
        self.__color = color

    def get_Name(self):
        return self.__name

class Bike(Vehicle):
    def __init__(self, name, color, model):
        super().__init__(name, color)  # call the parent class constructor
        self.__model = model

    def get_details(self):
        # parentheses allow the expression to span two lines
        return (self.get_Name() + " " + self.__model + " in " +
                self.getColor() + " color")

b_obj = Bike("Cziar", "red", "TK720")
print(b_obj.get_details())
print(b_obj.get_Name())
Output —
Cziar TK720 in red color
Cziar
Implementation —
# Polymorphism
from math import pi

class Shape:
    def __init__(self, name):
        self.name = name

    def area(self):
        pass

class Square(Shape):
    def __init__(self, length):
        super().__init__("Square")
        self.length = length

    def area(self):
        return self.length**2

class Circle(Shape):
    def __init__(self, radius):
        super().__init__("Circle")
        self.radius = radius

    def area(self):
        return pi*self.radius**2

a = Square(6)
b = Circle(10)
print(a.area())
print(b.area())
Output —
36
314.1592653589793
Errors and Exception Handling in Python
In Python, an error is either a syntax error or an exception.
A syntax error occurs when the parser detects an incorrect statement, before any code runs.
- An exception is raised at run time when syntactically correct code hits a condition it cannot handle, which interrupts the normal flow of the program
- Python comes with many built-in exceptions, and you can also create user-defined exceptions
Some of Python’s built-in exceptions —
IndexError : raised when a sequence index is out of range
ImportError : raised when an import fails, for example because the module is not found
KeyError : raised when a dictionary key is not found
NameError : raised when a name (variable) is not defined
MemoryError : raised when a program runs out of memory
TypeError : raised when an operation or function is applied to an object of an inappropriate type
AssertionError : raised when an assert statement fails
AttributeError : raised when an attribute reference or assignment fails
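The sketch below triggers several of the exceptions listed above on purpose and records which one was raised. The helper describe_failure and the sample names are hypothetical, chosen only for this illustration.

```python
def describe_failure(action):
    """Run a zero-argument callable and return the name of the exception it raised."""
    try:
        action()
    except (IndexError, KeyError, NameError, TypeError,
            AttributeError, AssertionError, ImportError) as exc:
        return type(exc).__name__
    return "no error"


def use_undefined_name():
    return undefined_variable  # this name is intentionally never defined


def failing_assert():
    assert 1 + 1 == 3


def import_missing():
    import module_that_does_not_exist_xyz  # hypothetical module name


samples = {
    "index": lambda: [1, 2, 3][10],
    "key": lambda: {"a": 1}["b"],
    "name": use_undefined_name,
    "type": lambda: "abc" + 5,
    "attribute": lambda: (42).missing_attribute,
    "assertion": failing_assert,
    "import": import_missing,
}

raised = {label: describe_failure(action) for label, action in samples.items()}
for label, name in raised.items():
    print(label, "->", name)
```

The failed import is reported as ModuleNotFoundError, which is a subclass of ImportError, so an except ImportError clause still catches it.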
Try and Except in Python
In Python, exceptions are handled with a try statement
- The code that may raise an exception goes inside the try clause. The code that handles the exception goes in the except clause
- If no exception occurs, the except block is skipped and normal program flow continues
- A try statement can have any number of except clauses to handle different exceptions, but at most one of them runs when an exception occurs
- You can also raise exceptions yourself with the raise keyword
- The try statement can have an optional finally clause, which runs regardless of what happens in the try and except blocks
Example :
try:
    print(a)
except:
    print("Something went wrong")
finally:
    print("Exit")
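The rules above (several except clauses, only one of which runs, plus finally and raise) can be sketched as follows. The functions safe_divide and parse_age are hypothetical examples, and the else clause, which runs only when no exception occurred, is an addition not covered in the text.

```python
def safe_divide(numerator, denominator):
    """Return (result, log), where log records which clauses ran."""
    log = []
    result = None
    try:
        result = numerator / denominator
    except ZeroDivisionError:
        log.append("except ZeroDivisionError")
    except TypeError:
        log.append("except TypeError")
    else:
        # runs only when the try block raised nothing
        log.append("else")
    finally:
        # runs in every case
        log.append("finally")
    return result, log


def parse_age(text):
    age = int(text)  # int() itself raises ValueError for non-numeric text
    if age < 0:
        # raising an existing built-in exception explicitly
        raise ValueError("age must be non-negative")
    return age


print(safe_divide(6, 3))
print(safe_divide(1, 0))
print(safe_divide("a", 2))

try:
    parse_age("-5")
except ValueError as exc:
    print("Rejected:", exc)
```

Catching specific exception types, as done here, is usually safer than a bare except, which also hides unrelated errors.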
Implementation —
# try, except, finally
try:
    print(1 / 0)
except ZeroDivisionError:
    print("Error occurred")
finally:
    print("Exit")
Output —
Error occurred
Exit
User-defined Exceptions
In Python, you can define your own error types by creating a new exception class
- Custom exceptions should derive from the Exception class, either directly or indirectly
- User-defined error conditions can be signalled by raising an exception explicitly, by using an assert statement, or by defining custom exception classes
- Use the assert statement to enforce constraints in the program. When the condition in an assert statement is not met, Python raises an AssertionError
- You can raise an existing exception with the raise keyword followed by the name of the exception
- To create a custom exception class with its own error message, derive it from Exception (or one of its subclasses) and pass the message to the constructor
- When a module can raise several distinct errors, a common practice is to create one base exception class for the module and subclass it for each specific error condition. This is called a hierarchy of custom exceptions
Example —
class ClassName(Exception):
    pass
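A hierarchy of custom exceptions, as described above, might look like the sketch below. All class and function names (PipelineError, MissingColumnError, EmptyBatchError, validate_batch, run) are hypothetical and only illustrate the pattern.

```python
class PipelineError(Exception):
    """Base class for every error raised by this hypothetical module."""


class MissingColumnError(PipelineError):
    def __init__(self, column):
        self.column = column
        # pass a readable message up to Exception
        super().__init__(f"required column is missing: {column}")


class EmptyBatchError(PipelineError):
    pass


def validate_batch(rows, required_column):
    if not rows:
        raise EmptyBatchError("batch contains no rows")
    for row in rows:
        if required_column not in row:
            raise MissingColumnError(required_column)
    return len(rows)


def run(rows):
    try:
        return f"ok: {validate_batch(rows, 'id')} rows"
    except MissingColumnError as exc:
        # most specific handler first
        return f"bad schema ({exc})"
    except PipelineError as exc:
        # the base class catches every other error from the module
        return f"pipeline error ({exc})"


print(run([{"id": 1}]))
print(run([{"name": "x"}]))
print(run([]))
```

Callers that do not care about the exact failure can catch PipelineError alone, while callers that do care can catch the specific subclasses.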
Implementation —
class Error(Exception):
    pass

class TooSmallValueError(Error):
    pass

number = 100
while True:
    try:
        num = int(input("Enter a number: "))
        if num < number:
            raise TooSmallValueError
        break
    except TooSmallValueError:
        print("Value too small")
Output —
Enter a number: 40
Value too small
Garbage Collection in Python
In Python, garbage collection is the memory management feature that reclaims memory a running program no longer needs, so that it can be reused
- Garbage collection works automatically, so you rarely have to free memory yourself
- You can force a collection by calling the collect() function of the gc module
- When no reference to an object is left, the object is destroyed automatically and its __del__() method, if defined, is executed. In CPython this happens through reference counting, while the collector in the gc module additionally finds objects that only reference each other in cycles
Example :
import gc
gc.collect()
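The difference between an object that simply loses its last reference and objects caught in a reference cycle can be observed with weak references. This is a minimal sketch that assumes CPython (other interpreters may free objects at different times); the Node class is hypothetical.

```python
import gc
import weakref


class Node:
    """Hypothetical class used only for this illustration."""

    def __init__(self, name):
        self.name = name
        self.partner = None


# Case 1: no cycle. In CPython, deleting the last reference frees the object at once.
a = Node("a")
ref_a = weakref.ref(a)  # a weak reference does not keep the object alive
del a
freed_without_gc = ref_a() is None

# Case 2: two objects that reference each other form a cycle.
gc.disable()  # keep the automatic collector from running in the middle of the demo
x, y = Node("x"), Node("y")
x.partner, y.partner = y, x
ref_x = weakref.ref(x)
del x, y
alive_before_collect = ref_x() is not None  # the cycle keeps both objects alive
gc.collect()                                # the cycle collector finds and frees them
freed_after_collect = ref_x() is None
gc.enable()

print(freed_without_gc, alive_before_collect, freed_after_collect)
```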
Implementation —
# manual garbage collection
import sys, gc

def test():
    data = [18, 19, 20, 34, 78]
    data.append(data)  # the list references itself, creating a cycle

def main():
    print("Garbage Creation")
    for i in range(5):
        test()
    print("Collecting..")
    n = gc.collect()
    print("Unreachable objects collected by GC:", n)
    print("Uncollectable garbage list:", gc.garbage)

if __name__ == "__main__":
    main()
    sys.exit()
Output —
Garbage Creation
Collecting..
Unreachable objects collected by GC: 33
Python Debugger
Debugging is the process of locating and fixing errors in a program. In Python, pdb, which is part of the standard library, is used to debug code
- The pdb module internally makes use of the bdb and cmd modules
- It supports setting breakpoints and single stepping at the source line level, inspection of stack frames, source code listing, etc.
Syntax —
import pdb
pdb.set_trace()
- Since Python 3.7 there is also a built-in function, breakpoint(), which by default drops into pdb at the point where it is called
Implementation —
import pdb

def multiply(a, b):
    answer = a * b
    return answer

pdb.set_trace()  # execution pauses here and the (Pdb) prompt opens
a = int(input("Enter first number : "))
b = int(input("Enter second number : "))
product = multiply(a, b)
Decorators in Python
In Python, a decorator is any callable that takes a function or a class, adds some functionality, and returns it (or a replacement for it).
- Decorators are a powerful tool because they let you modify or control the behavior of a function or class without changing its source.
- In a decorator, the function is passed as an argument to another function and then called inside a wrapper function.
- A decorator is usually applied with the @ syntax on the line directly above the definition of the function you want to decorate.
There are two different kinds of decorators in Python:
Function decorators
Class decorators
- When several decorators are stacked on a single function, they are applied from the bottom up: the decorator closest to the function is applied first
- A decorator can be reused by applying it to any number of functions
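To show why stacking order matters, here is a small sketch with two hypothetical decorators, repeat and bracket. It also uses functools.wraps from the standard library, which is not covered in the text above but keeps the decorated function’s name and docstring intact.

```python
import functools


def repeat(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # call the original function and double its string result
        return func(*args, **kwargs) * 2
    return wrapper


def bracket(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # call the original function and wrap its result in brackets
        return "[" + func(*args, **kwargs) + "]"
    return wrapper


@bracket
@repeat
def inner_repeat(text):
    """repeat is closest to the function, so it is applied first."""
    return text


@repeat
@bracket
def inner_bracket(text):
    """bracket is closest to the function, so it is applied first."""
    return text


print(inner_repeat("ab"))
print(inner_bracket("ab"))
print(inner_repeat.__name__)
```

Accepting *args and **kwargs in the wrapper lets the same decorator be reused on functions with any signature.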
Implementation —
# Decorators
def test_decorator(func):
    def function_wrapper(x):
        print("Before calling " + func.__name__)
        res = func(x)
        print(res)
        print("After calling " + func.__name__)
    return function_wrapper

@test_decorator
def sqr(n):
    return n**2

sqr(20)
Output —
Before calling sqr
400
After calling sqr
Implementation —
# Multiple Decorators
def lowercase_decorator(function):
    def wrapper():
        func = function()
        make_lowercase = func.lower()
        return make_lowercase
    return wrapper

def split_string(function):
    def wrapper():
        func = function()
        split_string = func.split()
        return split_string
    return wrapper

@split_string
@lowercase_decorator
def test_func():
    return 'MOTHER OF DRAGONS'

test_func()
Output —
['mother', 'of', 'dragons']
Memoization using Decorators
In Python, memoization is a technique which allows you to optimize a Python function by caching its output based on the parameters you supply to it.
- Once you memoize a function, it will only compute its output once for each set of parameters you call it with. Every call after the first will be quickly retrieved from a cache.
- If you want to speed up the parts in your program that are expensive, memoization can be a great technique to use.
One of the best articles I have read about decorators is by Hensle Joseph
There are several approaches to memoization —
Using global
Using objects
Using default parameter
Using a Callable Class
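Besides the hand-written approaches listed above, the standard library offers a ready-made memoizing decorator, functools.lru_cache. It is not mentioned in the original text, so treat this as an alternative sketch rather than the method the series uses.

```python
from functools import lru_cache


@lru_cache(maxsize=None)  # maxsize=None means the cache can grow without limit
def fib(n):
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)


value = fib(30)
info = fib.cache_info()  # named tuple with hits, misses, maxsize, currsize

print(value)
print(info)
```

Each distinct argument is computed once (a cache miss); every repeated call is served from the cache (a hit), which turns the exponential recursion into a linear one.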
Implementation —
# Fibonacci series using memoization with decorators
def memoization_func(t):
    dict_one = {}
    def h(z):
        if z not in dict_one:
            dict_one[z] = t(z)
        return dict_one[z]
    return h

@memoization_func
def fib(n):
    if n == 0:
        return 0
    elif n == 1:
        return 1
    else:
        return fib(n-1) + fib(n-2)

print(fib(20))
Output —
6765
Defaultdict
In Python, a dictionary is a container that holds key-value pairs. Keys must be unique and hashable (in practice, immutable objects)
- If you try to read or delete a key that doesn’t exist in a regular dictionary, it raises a KeyError and interrupts your code. To tackle this, Python provides defaultdict, a dictionary subclass in the collections module
- If you look up a missing key with d[key], defaultdict automatically creates the key and generates a default value for it
- As long as a default factory is set, d[key] never raises a KeyError. Methods such as get() and pop() behave as on a regular dict and do not create the key
- Any key that does not exist gets the value returned by the default factory
- Hence, whenever you need a dictionary in which each element’s value should start from a default value, use a defaultdict
Syntax —
from collections import defaultdict
demo = defaultdict(int)
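Two common uses of defaultdict are counting and grouping. The sketch below shows both with a made-up list of words, and also checks that get() does not trigger the default factory.

```python
from collections import defaultdict

words = ["spark", "hive", "spark", "pig", "hive", "spark"]

# Counting: int() returns 0, so every new key starts at 0
counts = defaultdict(int)
for word in words:
    counts[word] += 1

# Grouping: list() returns [], so every new key starts with an empty list
by_length = defaultdict(list)
for word in words:
    by_length[len(word)].append(word)

# get() behaves like on a regular dict: no default is created
missing = counts.get("kafka")
created_by_get = "kafka" in counts

print(dict(counts))
print(dict(by_length))
print(missing, created_by_get)
```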
Implementation —
from collections import defaultdict

default_dict_var = defaultdict(list)
for i in range(10):
    default_dict_var[i].append(i)
print(default_dict_var)
Output —
defaultdict(<class 'list'>, {0: [0], 1: [1], 2: [2], 3: [3], 4: [4], 5: [5], 6: [6], 7: [7], 8: [8], 9: [9]})
OrderedDict
In Python, OrderedDict is one of the container datatypes in the collections module and a subclass of dict. It maintains the order in which keys are inserted. If a key is deleted and inserted again, it moves to the end of the order
- It’s a dictionary subclass that remembers the order in which its contents are added
- When the value of an existing key is changed, the ordering of keys does not change
- If an item is overwritten in the OrderedDict, its position is maintained
- OrderedDict popitem() removes items in LIFO order by default, and in FIFO order when called as popitem(last=False)
- The reversed() function can be used with OrderedDict to iterate over elements in reverse order
- OrderedDict has a move_to_end() method to efficiently reposition an element to either end
Example —
from collections import OrderedDict
my_dict = {'Sunday': 0, 'Monday': 1, 'Tuesday': 2}
# creating ordered dict
ordered_dict = OrderedDict(my_dict)
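The ordering rules listed above can be checked directly. The following sketch walks through overwriting, move_to_end(), reversed() and both forms of popitem(); the day names are just sample data.

```python
from collections import OrderedDict

days = OrderedDict([("Sunday", 0), ("Monday", 1), ("Tuesday", 2)])

# Overwriting a value keeps the key in its original position
days["Sunday"] = 7
after_overwrite = list(days)

# move_to_end() moves a key to the back by default, or to the front with last=False
days.move_to_end("Sunday")
days.move_to_end("Tuesday", last=False)
after_moves = list(days)

# reversed() iterates over the keys from last to first
backwards = list(reversed(days))

# popitem() removes the last item (LIFO); popitem(last=False) removes the first (FIFO)
last_item = days.popitem()
first_item = days.popitem(last=False)
remaining = list(days)

print(after_overwrite, after_moves, backwards)
print(last_item, first_item, remaining)
```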
Generators in Python
In Python, generator functions look just like regular functions with one difference: they use the yield keyword instead of return. A generator function is a function that returns an iterator. A generator expression is an expression that also returns an iterator
- Generator objects are consumed either by calling next() on them or by using them in a “for in” loop.
- A return statement terminates a function entirely, but a yield statement pauses the function, saving all its state, and later continues from there on successive calls.
- Generator expressions can be passed directly as function arguments. Just like list comprehensions, they let you create a generator object in a single line of code.
- The major difference between a list comprehension and a generator expression is that a list comprehension builds the entire list in memory, while a generator expression produces one item at a time through lazy evaluation. For this reason, a generator expression is much more memory efficient than a list comprehension for large sequences
Example —
def generator():
    yield "x"
    yield "y"

for i in generator():
    print(i)
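The points about next(), pausing, and memory use can be sketched as follows. The countdown function is a hypothetical example, and the size comparison uses sys.getsizeof, which measures only the container object itself, so it is a rough indication rather than an exact memory figure.

```python
import sys


def countdown(start):
    while start > 0:
        yield start   # pause here and hand one value to the caller
        start -= 1


gen = countdown(3)
first = next(gen)     # runs the function up to the first yield
rest = list(gen)      # resumes and collects the remaining values

try:
    next(gen)         # nothing is left to produce
    exhausted = False
except StopIteration:
    exhausted = True

# A generator expression passed directly as a function argument
total = sum(n * n for n in range(1, 4))

# The list holds every element at once; the generator holds only its current state
squares_list = [n * n for n in range(100_000)]
squares_gen = (n * n for n in range(100_000))
lazy_is_smaller = sys.getsizeof(squares_gen) < sys.getsizeof(squares_list)

print(first, rest, exhausted, total, lazy_is_smaller)
```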
Implementation —
def test_sequence():
    num = 0
    while num < 10:
        yield num
        num += 1

for i in test_sequence():
    print(i, end=",")
Output —
0,1,2,3,4,5,6,7,8,9,
Implementation —
# Python generator with Loop
# Reverse a string
def reverse_str(test_str):
    length = len(test_str)
    for i in range(length - 1, -1, -1):
        yield test_str[i]

for char in reverse_str("Trojan"):
    print(char, end=" ")
Output —
n a j o r T
Implementation —
# Generator Expression
# Initialize the list
test_list = [1, 3, 6, 10]
# list comprehension
list_comprehension = [x**3 for x in test_list]
# generator expression
test_generator = (x**3 for x in test_list)
print(list_comprehension)
print(type(test_generator))
print(tuple(test_generator))
Output —
[1, 27, 216, 1000]
<class 'generator'>
(1, 27, 216, 1000)
Coroutine in Python
- Coroutines are program components that generalize subroutines for non-preemptive multitasking by allowing execution to be suspended and resumed
- Because coroutines can pause and resume their execution context, they’re well suited to concurrent processing
- A coroutine is a special kind of function that yields control back to the caller without ending its context, which it keeps in an idle state until it is resumed
- In a coroutine, yield can also appear on the right-hand side of an = operator, meaning the coroutine accepts a value at that point. The caller supplies the value with send()
Example —
def func():
    print("My first Coroutine")
    while True:
        var = (yield)
        print(var)

coroutine = func()
next(coroutine)
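The example above starts the coroutine but never sends it a value. A minimal sketch of the full round trip, using a hypothetical running_average coroutine, could look like this:

```python
def running_average():
    """Generator-based coroutine: receives numbers, yields the average so far."""
    total = 0.0
    count = 0
    average = None
    while True:
        # yield hands `average` to the caller, then waits for the next send()
        value = yield average
        total += value
        count += 1
        average = total / count


avg = running_average()
next(avg)  # prime the coroutine: run it up to the first yield

results = [avg.send(v) for v in (10, 20, 30)]
avg.close()  # stop the coroutine once it is no longer needed

print(results)
```

Between calls to send(), the coroutine keeps total and count alive, which is what "maintaining its context in an idle state" means in practice.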
Implementation —
def func():
    print("My first Coroutine")
    while True:
        var = (yield)
        print(var)

coroutine = func()
next(coroutine)
Output —
My first Coroutine
That’s it for now!
Day 4 : Coming Soon :)
Follow for more updates. Stay tuned !
Keep learning and coding :)





