</p></article></body>
Download Millions of Images at Once Using Programming Skills
A simple yet interesting mini-project to add to your resume
While surfing the web, I often come across situations where I need to download images from the internet.
Downloading a single image is very straightforward: just right-click it and save it from the menu.
But what if I need to download more than one image and, importantly, all of them in bulk, at once?
To be honest, it is quite simple; anyone can do it.
In this post, I will show you how to use Python to download images from the internet.
I was very excited while doing this task in Python because of how dynamic this process can get.
Because I am doing it in Python, there is no end to the different things I can do while downloading images from the internet.
So let us get into the exciting stuff.
Getting Everything Ready!
You might be eager to jump into the coding part, but first there are some imports and setup steps we need to take care of before we get going with our main objective.
Code editor
Downloads
Code Editor
PyCharm is the Python IDE I use, and it creates a virtual environment for me every time I start a new project.
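If you're testing a program for the first time, a virtual environment is ideal because it is flexible and forgiving: the packages you install stay inside that project, so your experiments cannot break other projects or your system-wide Python installation.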
Note: In this program, we will focus on downloading images from a given list of links, so that multiple images can be handled in one go.
So browse the internet and copy the links of the images you wish to download with your very own image-downloading program.
Downloads
To continue with our project, we need to install a few libraries. We will install them using pip.
It is quite easy to do. You just need to type pip install requests in the terminal window, and the library is installed on your computer.
Note: Use the terminal window provided in PyCharm so that the library is installed only in that project's virtual environment.
In total, we need to install two libraries:
requests
beautifulsoup4
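If you want to confirm that both installs worked, a quick check like the sketch below can help. It relies only on the standard library's importlib.util.find_spec, which returns None when a module cannot be found. The REQUIRED mapping and the missing_packages name are hypothetical, chosen for illustration. Note that the pip package beautifulsoup4 is imported under a different name, bs4.

```python
import importlib.util

# Hypothetical mapping: pip package name -> module name used in the import statement
REQUIRED = {"requests": "requests", "beautifulsoup4": "bs4"}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose modules cannot be found."""
    missing = []
    for package, module in required.items():
        # find_spec returns None if the module is not installed in this environment
        if importlib.util.find_spec(module) is None:
            missing.append(package)
    return missing


if __name__ == "__main__":
    print(missing_packages() or "All libraries are installed.")
```

Run it from PyCharm's terminal; any package name it prints still needs a pip install in that environment.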
But why do we need these libraries?
Requests
Making HTTP requests is a really complicated task, with lots of moving parts and things to be careful about.
The requests library is the most popular library for making HTTP requests in Python.
It hides the difficulties and complexities of making requests behind a beautiful, simple API, allowing you to concentrate on interacting with services and consuming data in your app.
Beautiful Soup
Beautiful Soup is a Python package for extracting data from HTML and XML files for web scraping needs.
It generates a tree from the page source code, which can be used to extract data in a more legible and hierarchical manner.
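The program in this post works from a list of direct image links, so Beautiful Soup is not strictly needed there. As a sketch of how it could build that list for you, the hypothetical extract_image_urls function below parses an HTML page and collects the src attribute of every img tag. It uses urljoin from the standard library to turn relative paths into full URLs. Fetching the page itself (for example with requests.get(page_url).text) is left out, and the example.com addresses are placeholders.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def extract_image_urls(html, base_url):
    """Return the absolute URL of every <img> tag that has a src attribute."""
    soup = BeautifulSoup(html, "html.parser")  # build the parse tree
    urls = []
    for img in soup.find_all("img", src=True):  # only <img> tags that carry a src
        # Resolve relative paths such as "/pics/cat.png" against the page URL
        urls.append(urljoin(base_url, img["src"]))
    return urls


page = """
<html><body>
  <img src="/pics/cat.png" alt="cat">
  <img src="https://example.com/dog.jpg">
  <img alt="no source">
</body></html>
"""
print(extract_image_urls(page, "https://example.com/gallery/"))
# ['https://example.com/pics/cat.png', 'https://example.com/dog.jpg']
```

The returned list can be used directly as the list of image links to download.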
Download Multiple Images With a URL List
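The URL for my example is listed below; it is a copyright-free image from unsplash.com:
https://images.unsplash.com/photo-1519060825752-c4832f2d400a?ixlib=rb1.2.1&ixid=MnwxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8&auto=format&fit=crop&w=1170&q=80
I highly recommend reading on to fully understand the code, as I am going to go through it step by step. Enjoy!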
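1. These are all the imports our program needs:
import requests  # sends the HTTP requests
import urllib.request  # standard-library HTTP client; the steps below do not use it
from bs4 import BeautifulSoup  # parses HTML pages; not needed when you already have direct image links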
2. A list that holds the links for the images you want to download.
# A list is used so that multiple images can be downloaded with a loop
Urls_img = ["URL for the file"]
Urls_faulty stores all the links of images that could not be downloaded.
# A list to store all the faulty links
Urls_faulty = []
3. This is a loop that goes through all the links in the list and downloads the images.
# The counter must start at 0 before the loop; otherwise count += 1 raises a NameError
count = 0
for image in Urls_img:
    count += 1
    # Naming the file using our counter variable
    name = f"Image #{count}.png"
    print(f"This is the image name: {name}")
A counter variable keeps track of the number of the image currently being downloaded.
This count variable is also used to name the file. For example, if count is equal to 3, the image will be saved as Image #3.png.
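Instead of maintaining the counter by hand, Python's built-in enumerate() can number the links for you. The sketch below uses a hypothetical image_names helper that produces the same Image #1.png, Image #2.png, ... names; passing start=1 makes the numbering begin at 1 instead of 0.

```python
def image_names(urls):
    """Return one file name per URL, numbered from 1."""
    names = []
    # enumerate(..., start=1) yields (1, first_url), (2, second_url), ...
    for count, _image in enumerate(urls, start=1):
        names.append(f"Image #{count}.png")
    return names


print(image_names(["https://example.com/a.png", "https://example.com/b.png", "https://example.com/c.png"]))
# ['Image #1.png', 'Image #2.png', 'Image #3.png']
```

Because enumerate creates count itself, there is no separate variable to initialize and no way to forget it.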
4. In this part, we send an HTTP request to each link using the requests library we installed.
req = requests.get(image, stream=True)
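requests.get() sends the request and returns a Response object, which we store in req. Because of stream=True, only the status code and headers are fetched right away; the image data itself is downloaded later, as we iterate over the response.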
5. Check the status code
# We need to check that the status code is 200 before doing anything else
if req.status_code == 200:
The condition req.status_code == 200 checks the status code that the server sent back.
If the code is 200, the request succeeded. Any other code means something went wrong, for example the image no longer exists or the server ran into an error. If the server cannot be reached at all, requests.get() raises an exception instead of returning a status code.
You might be wondering what this 200 code is all about.
HTTP response status codes show whether or not a particular HTTP request was completed successfully.
In simple terms, they are a standard way for a server to communicate the outcome of an HTTP request.
There are many other codes as well, but we won't go in-depth on them here; let me know if you want me to write about that!
For example, you have probably seen the very popular 404 status code, which simply means the requested page or file was not found.
The 200 code means OK, which tells us we are good to go ahead.
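Python's standard library ships these codes and their official reason phrases in the http.HTTPStatus enum, which is handy for readable log messages. The describe_status helper below is a hypothetical sketch; it also groups codes into their standard classes (2xx success, 3xx redirection, 4xx client error, 5xx server error).

```python
from http import HTTPStatus


def describe_status(code):
    """Return a readable label such as '404 Not Found (client error)'."""
    status = HTTPStatus(code)  # raises ValueError for codes the enum does not know
    if 200 <= code < 300:
        kind = "success"
    elif 300 <= code < 400:
        kind = "redirection"
    elif 400 <= code < 500:
        kind = "client error"
    elif code >= 500:
        kind = "server error"
    else:
        kind = "informational"
    return f"{status.value} {status.phrase} ({kind})"


for code in (200, 404, 500):
    print(describe_status(code))
# 200 OK (success)
# 404 Not Found (client error)
# 500 Internal Server Error (server error)

# HTTPStatus members are integers, so they compare equal to plain numbers
print(HTTPStatus.OK == 200)  # True
```

Writing the check as req.status_code == HTTPStatus.OK works the same as comparing with 200, but reads more clearly.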
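Since the status check passed, we can go ahead and save the image. There is no need to send a second request here: req already holds the response from step 4, and because of stream=True its image data has not been downloaded yet, so we can read it directly in the next step.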
6. Write the file
# This command below will allow us to write the data to a file as binary
with open(name, 'wb') as f:
    for data in req:
        f.write(data)
This is the standard way to open a file in Python.
We use the built-in Python function open(); the mode 'wb' opens the file for writing in binary mode, which is what image data needs. If you want to read more about how to open files in Python, do check my previous post dedicated to file manipulation in Python.
The write() function writes data to the opened file. Iterating over req downloads the image body that requests.get() deferred, 128 bytes at a time; req.iter_content(chunk_size=8192) does the same with larger chunks.
7. Store the faulty links
else:
    # We will add all the faulty URLs to the Urls_faulty list:
    Urls_faulty.append(image)
In this line of code, we simply append the URL that could not be downloaded, so that we know which links failed.
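The steps above can be put together into one script. The sketch below is not the author's exact code: wrapping the loop in a download_images function, passing in a requests.Session, the 10-second timeout, the try/except around the request, and the 8192-byte chunk size are my assumptions. They make the script skip unreachable links instead of crashing, and let the loop be tested without touching the network.

```python
import requests

# Replace the placeholder with the image links you copied
Urls_img = ["URL for the file"]


def download_images(urls, session, timeout=10):
    """Save each URL as 'Image #N.png'; return the URLs that could not be downloaded."""
    Urls_faulty = []
    # enumerate replaces the manual counter: count runs 1, 2, 3, ...
    for count, image in enumerate(urls, start=1):
        name = f"Image #{count}.png"
        print(f"This is the image name: {name}")
        try:
            # One request per image; stream=True defers downloading the body
            req = session.get(image, stream=True, timeout=timeout)
        except requests.RequestException:
            # Unreachable hosts, timeouts and malformed URLs raise instead of returning a status
            Urls_faulty.append(image)
            continue
        if req.status_code == 200:
            with open(name, "wb") as f:
                # Read the body in 8 KiB chunks instead of the default 128 bytes
                for data in req.iter_content(chunk_size=8192):
                    f.write(data)
        else:
            Urls_faulty.append(image)
    return Urls_faulty


if __name__ == "__main__":
    # A Session reuses connections across requests and is closed by the with block
    with requests.Session() as session:
        faulty = download_images(Urls_img, session)
    print(f"Links that could not be downloaded: {faulty}")
```

Every image is saved with a .png name, as in the steps above, even if the server actually sends a JPEG; most image viewers open the file either way.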
Conclusion
“Imagination is more important than knowledge. Knowledge is limited. Imagination encircles the world.”
— Albert Einstein
Coding can be simple and fun to do. It helps you put your creativity to use and helps you learn many things.
With that said, you can now download images in bulk through Python.
Thank you for reading till the end. If you guys want another tutorial like this, hit me up in the comments below!
I love hearing from you all and will continue to do my best for many more posts to come. Au revoir!