Tanu N Prabhu

Summary

This guide shows how to use the Wikipedia API from Python: searching for content, fetching summaries, switching languages, loading pages, handling disambiguation, and extracting images and links.

Abstract

The document is a tutorial by Tanu N. Prabhu on accessing Wikipedia from Python through the 'wikipedia' library. After an overview of the library's capabilities and uses, it gives step-by-step instructions, from installing the package in environments such as Jupyter Notebook to reading articles in languages other than English. Using the topical example 'Coronavirus', the tutorial demonstrates searching with parameters, dealing with disambiguation errors when accessing articles, and obtaining readable summaries with the built-in summary method. It also covers the suggest method and shows how to retrieve the images and links of a page.
The Python library for Wikipedia is demonstrated to handle disambiguation errors efficiently, which often occur with common search terms having multiple meanings. Tanu emphasises the significance of precise input to mitigate these issues. In conclusion, readers are guided to extract, manipulate, and tailor Wikipedia content according to their preferences, all in a few lines of code. The tutorial closes with encouragement for comments and a reminder for the audience to stay safe in the current global context, hinting at the coronavirus pandemic.

Opinions

  • The author, Tanu N. Prabhu, acknowledges Wikipedia as an extensively used resource, underlining its role in providing vast information and functioning as an online encyclopedia.
  • The 'Wikipedia' library in Python is praised for its utility and ease of use, being described as a handy scraper that can also extract limited data from Wikipedia, thus expanding its utility for users with specific data requirements.
  • There is an apparent excitement about the Wikipedia API's features, which is evident when the author encourages the audience to explore and contribute to the GitHub repository provided in the tutorial, fostering a collaborative learning environment.
  • The author expresses satisfaction with the Wikipedia API's capability to return results in the form of tuples and its ability to suggest suitable Wikipedia articles matching the input query.
  • The discussion pays special attention to Wikipedia's multilanguage support, highlighted in the practical demonstrations of how easy it is to switch the article language from Python.
  • The tutorial gently warns about potential ambiguities in search terms, illustrating how Python's Wikipedia library reports disambiguation problems, which matters when extracting precise content.

Wikipedia API for Python

In this tutorial, let us learn how to use the Wikipedia API.

Image Credits: Urelles

Introduction

Wikipedia is the world's largest free encyclopedia, a land full of information. I mean, who hasn't used Wikipedia at some point in their life? (If you say you haven't, then most probably you are lying.) The Python library called wikipedia allows us to easily access and parse data from Wikipedia. In other words, you can also use this library as a little scraper that pulls a limited amount of information from Wikipedia. We will see how to do that in this tutorial. Also, the entire code of this tutorial can be found on my GitHub repository below:

Installation

The first step in using the API is installing it. Because this is an external library and not built into Python, just type the following command to install it.

  • If you are using a Jupyter notebook, use the command below with the leading '!'. The '!' tells the notebook to run the line as a shell command rather than as Python code.
!pip install wikipedia
  • If you are using a terminal or command prompt, run the same command without the '!'.
pip install wikipedia

After you enter the command, in either of the two cases you will see a success message like the one shown below. This indicates that the library was installed successfully.

Successful installation of the library
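If the success message scrolls past, a quick way to double-check is to ask Python whether it can locate the package. This is a minimal sketch using only the standard library; the helper name is_installed is my own and is not part of the Wikipedia library.

```python
import importlib.util


def is_installed(package_name):
    """Return True if Python can locate a top-level package with this name."""
    return importlib.util.find_spec(package_name) is not None


# After a successful `pip install wikipedia`, this should print True.
print(is_installed("wikipedia"))
```

If this prints False inside a notebook, the package was probably installed into a different Python environment than the one the notebook is running.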

Search and Suggestion

Now let us see some of the built-in methods provided by the Wikipedia API. The first two are search and suggest. I'm pretty sure you can guess what these two methods do from their names.

Search

The search method returns the search results for a query. Just like other search engines, Wikipedia has its own search engine; you can have a look at it below:

Now let us see how to retrieve the search results of a query using Python. I will use "Coronavirus" as the topic in today's tutorial because, as we all know, it's trending and spreading worldwide. Before you start using the API, you first need to import it.

import wikipedia
print(wikipedia.search("Coronavirus"))

The moment we execute this code we get the results in the form of a list as shown below:

['Coronavirus',  
 '2019–20 coronavirus pandemic',  
 '2020 coronavirus pandemic in the United States',  
 'Severe acute respiratory syndrome coronavirus 2',  
 '2019–20 coronavirus pandemic by country and territory',  
 'Middle East respiratory syndrome-related coronavirus',  
 '2020 coronavirus pandemic in Italy',  
 '2020 coronavirus pandemic in Europe',  
 'Timeline of the 2019–20 coronavirus pandemic',  
 'Timeline of the 2019–20 coronavirus pandemic in February 2020']

The above are the article titles that Wikipedia's own search returns for this query. If you don't believe me, go to the above link I have given, search for the topic, and compare the results. The results can change over time as articles are added and renamed.

You can control the search with two parameters, results and suggestion. The results parameter sets the maximum number of results returned. If suggestion is True, the method returns the results and the suggestion (if any) together in a tuple.

print(wikipedia.search("Coronavirus", results = 5, suggestion = True))

On executing the above code you will get only 5 search results, returned together with the suggestion (None here) in a tuple as shown below:

(['Coronavirus',   
  '2019–20 coronavirus pandemic',  
  'Severe acute respiratory syndrome coronavirus 2',  
  'Severe acute respiratory syndrome-related coronavirus',   
  '2019–20 coronavirus pandemic by country and territory'], 
   None)
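Since the return value is a two-element tuple, it is handy to unpack it straight away. Here is a small sketch of that; the helper describe_search is a name I made up for illustration, and the sample tuple is simply the output shown above, so the snippet runs without a network connection.

```python
def describe_search(outcome):
    """Format the (titles, suggestion) tuple from wikipedia.search(..., suggestion=True)."""
    titles, suggestion = outcome
    lines = [f"{rank}. {title}" for rank, title in enumerate(titles, start=1)]
    if suggestion is not None:
        lines.append(f"Did you mean: {suggestion}?")
    return "\n".join(lines)


# Copied from the output above; in a live session this would come from
# wikipedia.search("Coronavirus", results=5, suggestion=True).
outcome = (
    [
        "Coronavirus",
        "2019–20 coronavirus pandemic",
        "Severe acute respiratory syndrome coronavirus 2",
        "Severe acute respiratory syndrome-related coronavirus",
        "2019–20 coronavirus pandemic by country and territory",
    ],
    None,
)
print(describe_search(outcome))
```

Checking the suggestion against None before using it matters, because the second element is None whenever Wikipedia has nothing to suggest.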

Suggestion

The suggest method, as the name suggests, returns the suggested Wikipedia title for the query, or None if there isn't one.

print(wikipedia.suggest('Coronavir'))

You might have noticed that many search engines suggest topics the moment you start typing. In this case, the suggest method returns 'coronavirus' as the suggestion.

'coronavirus'

Summary

To get the summary of an article use the “summary” method as shown below:

print(wikipedia.summary("Coronavirus"))

On executing this line of code you get the summary of the article that you're looking for. The return type of this method is a string (str).

'Coronaviruses are a group of related viruses that cause diseases in mammals and birds. In humans, coronaviruses cause respiratory tract infections that can be mild, such as some cases of the common cold (among other possible causes, predominantly rhinoviruses), and others that can be lethal, such as SARS, MERS, and COVID-19. Symptoms in other species vary: in chickens, they cause an upper respiratory tract disease, while in cows and pigs they cause diarrhea. There are yet to be vaccines or antiviral drugs to prevent or treat human coronavirus infections. \nCoronaviruses constitute the subfamily Orthocoronavirinae, in the family Coronaviridae, order Nidovirales, and realm Riboviria. They are enveloped viruses with a positive-sense single-stranded RNA genome and a nucleocapsid of helical symmetry. The genome size of coronaviruses ranges from approximately 27 to 34 kilobases, the largest among known RNA viruses. The name coronavirus is derived from the Latin corona, meaning "crown" or "halo", which refers to the characteristic appearance reminiscent of a crown or a solar corona around the virions (virus particles) when viewed under two-dimensional transmission electron microscopy, due to the surface being covered in club-shaped protein spikes.'

But be careful: sometimes you might run into a DisambiguationError. This happens when the same word has several different meanings and Wikipedia cannot tell which article you want. For example, the word "bass" can refer to a fish, a sound, and many more things. In that case the summary method raises an error as shown below.

Hint: Be specific in your approach

print(wikipedia.summary("bass"))

When you execute the above line of code, you will get an error message as shown below:

DisambiguationError                Traceback (most recent call last)
<ipython-input-34-d1d15ef541d1> in <module>()
----> 1 wikipedia.summary("bass")
/usr/local/lib/python3.6/dist-packages/wikipedia/wikipedia.py in __load(self, redirect, preload)
    391       may_refer_to = [li.a.get_text() for li in filtered_lis if li.a]
    392 
--> 393       raise DisambiguationError(getattr(self, 'title', page['title']), may_refer_to)
    394 
    395     else:
DisambiguationError: "Bass" may refer to:
Bass (fish)
Bass (sound)
Acoustic bass guitar
Bass clarinet
cornett
Bass drum
Bass flute
Bass guitar
Bass recorder
Bass sarrusophone
Bass saxophone
Bass trombone
Bass trumpet
Bass violin
Double bass
Electric upright bass
Tuba
Bass (voice type)
Bass clef
Bass note
Bassline
Culture Vulture (EP)
Simon Harris (musician)
Simon Harris (musician)
Tubular Bells 2003
Bass Brewery
Bass Anglers Sportsman Society
G.H. Bass & Co.
Bass (surname)
Bass Reeves
Chuck Bass
Bass Armstrong
Bass Monroe
Mega Man characters
Bass Strait
Bass Pyramid
Bass, Victoria
Division of Bass
Division of Bass (state)
Electoral district of Bass
Shire of Bass
Bass, Alabama
Bass, Arkansas
Bass, Casey County, Kentucky
Bass, Missouri
Bass, West Virginia
Nancy Lee and Perry R. Bass Performance Hall
Bass, Hansi
Bass River (disambiguation)
Bass Rock
Basses, Vienne
Bass diffusion model
Beneath a Steel Sky
Buttocks
BASS
USS Bass
Bas (disambiguation)
Base (disambiguation)
Bass House (disambiguation)
Basse (disambiguation)
Bassline (disambiguation)
Drum and bass
Figured bass
Miami bass
Ghettotech
Sebastian (name)
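Rather than letting the program stop with this traceback, you can catch the error and look at the candidates yourself. The sketch below assumes the exception carries the candidate titles in an options attribute, which is how I understand the library's DisambiguationError to work. The function name summary_or_options and its wiki parameter are my own additions; the parameter only exists so that a stand-in object can be used when testing offline.

```python
def summary_or_options(title, wiki=None):
    """Return (summary, []) for a clear title, or (None, options) if it is ambiguous.

    `wiki` exists only so a stand-in object can be passed in tests; by default
    the real `wikipedia` package is imported.
    """
    if wiki is None:
        import wikipedia as wiki  # third-party: pip install wikipedia

    try:
        return wiki.summary(title), []
    except wiki.exceptions.DisambiguationError as err:
        # err.options holds the candidate titles listed in the error message.
        return None, err.options
```

Calling summary_or_options("bass") should give back None together with the list of candidates, and you can then call it again with a specific title from that list, such as "Bass (fish)".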

The Wikipedia API also lets us change the language in which we read the articles. All you have to do is set the language to the one you want. Any French readers in the house? I will use French as an example.

wikipedia.set_lang("fr")
wikipedia.summary("Coronavirus")

Now, just as I promised, the summary of the article is returned in French, as shown below. Note that it comes from the French-language Wikipedia rather than being a translation of the English article:

"Coronavirus ou CoV (du latin, virus à couronne) est le nom d'un genre de virus correspondant à la sous-famille des orthocoronavirinæ  (de la famille des coronaviridæ). Le virus à couronne doit son nom à l'apparence des virions sous un microscope électronique, avec une frange de grandes projections bulbeuses qui ressemblent à la couronne solaire.  \nLes coronavirus sont munis d'une enveloppe virale ayant un génome à ARN de sens positif et une capside (coque) kilobases, incroyablement grosse pour un virus à ARN. Ils se classent parmi les Nidovirales, puisque tous les virus de cet ordre produisent un jeu imbriqué d'ARNm sous-génomique lors de l'infection. Des protéines en forme de pic, enveloppe, membrane et capside contribuent à la structure d'ensemble de tous les coronavirus. Ces virus à ARN sont monocaténaire (simple brin) et de sens positif (groupe IV de la classification Baltimore). Ils peuvent muter et se recombiner. \nLes chauves-souris et les oiseaux, en tant que vertébrés volants à sang chaud, sont des hôtes idéaux pour les coronavirus, avec les chauves-souris et les oiseaux, assurant l'évolution et la dissémination du coronavirus.\nLes coronavirus sont normalement spécifiques à un taxon animal comme hôte, mammifères ou oiseaux selon leur espèce ; mais ces virus peuvent parfois changer d'hôte à la suite d'une mutation. Leur transmission interhumaine se produit principalement par contacts étroits via des gouttelettes respiratoires générées par les éternuements et la toux.\nLes coronavirus ont été responsables des graves épidémies de SRAS de 2002-2004, de l'épidémie de MERS et de la pandémie de Covid-19 en 2020.  chez l'homme des graves épidémies de syndrome respiratoire aigu sévère (SRAS) en 2002/2003 et du syndrome respiratoire du Moyen-Orient (MERS) à partir de 2012, ainsi que la pandémie de Covid-19 de 2020, causée par le coronavirus SARS-CoV-2, contre lequel on ne dispose pas encore de vaccin ni de médicament à l'efficacité prouvée."

Languages supported

Now let us see which languages Wikipedia supports; this is a common question that people ask. At the time of writing, Wikipedia supports 444 different languages. To list them, see the code below:

wikipedia.languages()

This lists all the languages that Wikipedia supports. The languages are stored in a dictionary as key-value pairs, where the key is the language prefix and the value is the name of the language.

{'aa': 'Qafár af',  'ab': 'Аҧсшәа',  'abs': 'bahasa ambon',  'ace': 'Acèh',  'ady': 'адыгабзэ',  'ady-cyrl': 'адыгабзэ',  'aeb': 'تونسي/Tûnsî',  'aeb-arab': 'تونسي',  'aeb-latn': 'Tûnsî',  'af': 'Afrikaans',  'ak': 'Akan',  'aln': 'Gegë',  'als': 'Alemannisch',  'am': 'አማርኛ',  'an': 'aragonés',  'ang': 'Ænglisc',  'anp': 'अङ्गिका',  'ar': 'العربية',  'arc': 'ܐܪܡܝܐ',  'arn': 'mapudungun',  'arq': 'جازايرية',  'ary': 'Maġribi',  'arz': 'مصرى',  'as': 'অসমীয়া',  'ase': 'American sign language',  'ast': 'asturianu',  'atj': 'Atikamekw',--------------}

To check whether a language is supported, write a condition as shown below:

'en' in wikipedia.languages()

Here 'en' stands for English, and you know the answer for the above code. It is obviously either True or False; here it's True.

True

Also, to get the language name for a given prefix, try:

wikipedia.languages()['en']

The result will be the name of the language:

English
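The language pieces fit together nicely: check the prefix first, switch the language, fetch the summary, and then switch back. Switching back matters because set_lang changes the language for every later call in the session. The sketch below is only one way to do it. The function name summary_in, the wiki parameter (there so a stand-in can be used when testing offline), and the assumption that the session was in English ("en") beforehand are all mine.

```python
def summary_in(lang, title, wiki=None, default_lang="en"):
    """Return the summary of `title` from the `lang` edition of Wikipedia.

    `wiki` exists only so a stand-in object can be passed in tests; by default
    the real `wikipedia` package is imported.
    """
    if wiki is None:
        import wikipedia as wiki  # third-party: pip install wikipedia

    if lang not in wiki.languages():
        raise ValueError(f"Unsupported language prefix: {lang!r}")

    wiki.set_lang(lang)
    try:
        return wiki.summary(title)
    finally:
        # set_lang changes module-wide state, so switch back even on errors.
        # Assumption: the session was using `default_lang` before this call.
        wiki.set_lang(default_lang)
```

With this helper, summary_in("fr", "Coronavirus") would return the French summary while leaving later calls in English.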

Page Access

The API also gives us full access to a Wikipedia page, so we can read its title, URL, content, images, and links. In order to access the page you need to load it first as shown below:

Just a heads up, I will use a single article topic (Coronavirus) as a reference in this example:

covid = wikipedia.page("Coronavirus")

Title

To access the title of the above-provided page use:

print(covid.title)
'Coronavirus'

URL

To get the URL of the page use:

print(covid.url)
'https://en.wikipedia.org/wiki/Coronavirus'

Content

To access the content of the page use:

print(covid.content)
'Coronaviruses are a group of related viruses that cause diseases in mammals and birds. In humans, coronaviruses cause respiratory tract infections that can be mild, such as some cases of the common cold (among other possible causes, predominantly rhinoviruses), and others that can be lethal, such as SARS, MERS, and COVID-19.----------------'

Hint: You can get the content of the entire page using the above method
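The content comes back as one long string, so it helps to find where the sections start. The sketch below assumes that section headings appear in the text as lines of the form "== Title ==", with more equals signs for deeper levels. That matches what I have seen in the returned text, but treat it as an assumption. The helper name section_headings and the sample string are made up for illustration.

```python
import re

# A heading line: two or more '=' signs, a title, then the same run of '=' signs.
HEADING = re.compile(r"^(={2,})[ \t]*(.+?)[ \t]*\1[ \t]*$", re.MULTILINE)


def section_headings(content):
    """Return (depth, title) pairs for lines that look like '== Title =='."""
    return [(len(marks), title) for marks, title in HEADING.findall(content)]


# Hypothetical sample text; in a live session you would pass covid.content.
sample = "Intro text.\n\n\n== Etymology ==\nSome text.\n\n\n=== History ===\nMore text."
print(section_headings(sample))
```

The depth is just the number of equals signs, so 2 marks a top-level section and 3 a subsection.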

Images

Yes, you are right, we can get the images from a Wikipedia article. The catch is that we can't render the images here; instead we get them as URLs, as shown below:

print(covid.images)
['https://upload.wikimedia.org/wikipedia/commons/8/82/SARS-CoV-2_without_background.png',
 'https://upload.wikimedia.org/wikipedia/commons/9/96/3D_medical_animation_coronavirus_structure.jpg',
 'https://upload.wikimedia.org/wikipedia/commons/f/f4/Coronavirus_replication.png',
 'https://upload.wikimedia.org/wikipedia/commons/e/e5/Coronavirus_virion_structure.svg',
 'https://upload.wikimedia.org/wikipedia/commons/d/dd/Phylogenetic_tree_of_coronaviruses.jpg',
 'https://upload.wikimedia.org/wikipedia/commons/7/74/Red_Pencil_Icon.png',
 'https://upload.wikimedia.org/wikipedia/commons/8/82/SARS-CoV-2_without_background.png',
---]
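Notice that the same URL can show up more than once in this list, and that it mixes photos with icons and SVG diagrams. Here is a small sketch for tidying it up. The helper name unique_images and the choice of extensions are my own, and the sample URLs are taken from the output above.

```python
def unique_images(urls, extensions=(".jpg", ".jpeg", ".png")):
    """Drop duplicate URLs (keeping first-seen order) and keep only the given file types."""
    seen = set()
    kept = []
    for url in urls:
        if url in seen or not url.lower().endswith(extensions):
            continue
        seen.add(url)
        kept.append(url)
    return kept


# A few URLs copied from the output above; in a live session pass covid.images.
sample_urls = [
    "https://upload.wikimedia.org/wikipedia/commons/8/82/SARS-CoV-2_without_background.png",
    "https://upload.wikimedia.org/wikipedia/commons/9/96/3D_medical_animation_coronavirus_structure.jpg",
    "https://upload.wikimedia.org/wikipedia/commons/e/e5/Coronavirus_virion_structure.svg",
    "https://upload.wikimedia.org/wikipedia/commons/8/82/SARS-CoV-2_without_background.png",
]
for url in unique_images(sample_urls):
    print(url)
```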

Links

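Similarly, we can get the links on the page. These are the titles of the other Wikipedia articles that the page links to; the external URLs that the article cites are available separately through the references attribute.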

print(covid.links)
['2002–2004 SARS outbreak',  '2012 Middle East respiratory syndrome coronavirus outbreak',  '2015 Middle East respiratory syndrome outbreak in South Korea',  '2018 Middle East respiratory syndrome outbreak',  '2019–2020 coronavirus pandemic',  '2019–20 coronavirus pandemic',  'Acute bronchitis',  'Adenoid',  'Adenoviridae',  'Adenovirus infection',  'Adult T-cell leukemia/lymphoma',  'Alpaca',  'Alphacoronavirus',  'Anal cancer',------]

So, there you go, you have reached the end of the tutorial on the Wikipedia API for Python. To learn about more methods, visit the Wikipedia API documentation. I hope you had a lot of fun learning and implementing. If you have any comments or concerns, let me know via the comment section below. Until then, goodbye.

Be Safe.
