Renu Khandelwal


Python — Reading and Writing Data from Files

Prerequisites: Basic knowledge of any programming language and different file formats to store data. We will explore how to read the data from different files like csv, excel, JSON, html, and xml.

Jupyter notebook can be found here

As a first step toward Machine Learning, we need to know how to read data from files for data analysis. Data commonly arrives in these file formats:

  • CSV — comma separated file
  • XLS — Excel files
  • JSON — Data stored in json format
  • HTML — Hypertext Markup Language files

We will first start with Python’s built-in file object and then dive deeper into how to read different types of files using the pandas library.

Let’s start with creating a file, writing the data to the file, and then reading the file. Also, let’s not forget to close the file.

File Operations using Python’s File Object

We will use Python’s built-in open() function, which returns a file object. It takes two arguments: the file name and the mode in which we want to operate on the file, such as reading, writing, or appending.

open(Filename,Mode)

Different modes are

  • w : file will be opened in write only mode. If the file already exists, its existing content will be overwritten
  • r : file will be opened in read only mode. This is the default mode
  • a : file will be opened in append mode and the data will be written at the end of the file
  • r+ : file will be opened in read and write mode
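The modes listed above can be tried side by side. The following is a minimal sketch, assuming a scratch file in a temporary directory (the path and file name are hypothetical and only for illustration):

```python
import os
import tempfile

# Hypothetical scratch file used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "modes_demo.txt")

with open(path, "w") as f:        # "w": create the file or overwrite it
    f.write("first\n")

with open(path, "a") as f:        # "a": write at the end of the file
    f.write("second\n")

with open(path, "r+") as f:       # "r+": read and write, existing content is kept
    original = f.read()           # read what is there so far
    f.seek(0, os.SEEK_END)        # make sure we are positioned at the end
    f.write("third\n")

with open(path) as f:             # no mode given, so the default "r" is used
    lines = f.readlines()

print(lines)
```

Note that "r+" expects the file to exist already; unlike "w" and "a", it does not create one.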

Files Types — Text and Binary

Files are opened in text mode by default, which lets you read and write data as strings of text.

If you want to open the file in binary mode, append “b” to the mode, for example “rb” or “wb”.
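As a small illustration of binary mode, the sketch below writes a few raw bytes and reads them back. The file name and the byte values are made up for the example:

```python
import os
import tempfile

# Hypothetical scratch file and payload, only for illustration.
path = os.path.join(tempfile.mkdtemp(), "binary_demo.bin")
payload = bytes([0, 255, 16, 32])

with open(path, "wb") as f:       # "wb": write bytes instead of text
    f.write(payload)

with open(path, "rb") as f:       # "rb": read the raw bytes back
    raw = f.read()

print(type(raw))
print(raw == payload)
```

In binary mode the data is read and written as bytes objects, with no text decoding applied.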

Opening file in write only mode

We open “file_1.txt” in write only mode. If the file does not exist, it will be created; if it already exists, its contents will be overwritten. Always make sure to close the file.

file = open("file_1.txt", "w")
file.write("This a test write\n")
file.close()

Now that we have written the data, we will read the contents of the file.

Reading the file in read-only mode

readline() reads a single line from the file

file = open("file_1.txt", "r")
str1 = file.readline()
print(str1)
file.close()
This a test write

Writing to the file in append mode

file = open("file_1.txt", "a")
file.write("This should be written at the end of the file\n")
file.close()

Opening the file in “r+” mode and reading all its content

file = open("file_1.txt", "r+")
for str1 in file.readlines():
    print(str1)
file.close()
This a test write

This should be written at the end of the file

Reading the file with the with statement

The with statement is a cleaner way to work with file objects. It closes the file automatically when the block ends, even if an exception is raised inside the block, so you do not need to remember to close the file.

with open("file_1.txt") as file:
    print(file.readlines())
['This a test write\n', 'This should be written at the end of the file\n']
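To see that a file opened in a with statement is closed even when something goes wrong, here is a minimal sketch. The scratch file and the simulated error are assumptions made for the example:

```python
import os
import tempfile

# Hypothetical scratch file, only for illustration.
path = os.path.join(tempfile.mkdtemp(), "with_demo.txt")

with open(path, "w") as f:
    f.write("This is a test write\n")

try:
    with open(path) as f:
        first = f.readline()
        # Simulated failure while processing the file.
        raise ValueError("something went wrong")
except ValueError:
    pass

# The with statement closed the file before the exception left the block.
print(f.closed)
```

The name f is still bound after the block, which is why its closed attribute can be checked.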

We can also write data to a file using the with statement. Since we use “w” mode, the existing file’s content will be overwritten.

with open("file_1.txt", "w") as file:
    file.write("This is a with statement")
    
with open("file_1.txt", "r") as file:    
    print(file.readlines())
['This is a with statement']

We will now use the pandas library to read a comma-separated values (csv) file.

Writing to a csv file and reading from a csv file

We will first create some data in a dictionary called data

data ={"Name":['Jack', 'Jill', 'Em', 'David'],
           "Profession":['Manager', 'Analyst', 'Data scientist', 'VP'],
           "Experience":[10,5,2,20],
           "Education":['MBA', 'BS', 'MS','MBA']}
data
{'Education': ['MBA', 'BS', 'MS', 'MBA'],
 'Experience': [10, 5, 2, 20],
 'Name': ['Jack', 'Jill', 'Em', 'David'],
 'Profession': ['Manager', 'Analyst', 'Data scientist', 'VP']}

We will create a dataframe “df” from the dictionary and then write the data from dataframe to a csv file — “emp.csv”.

To create a dataframe, we first need to import the pandas library.

import pandas as pd
df = pd.DataFrame(data)
df.to_csv("emp.csv")

The csv file will be created in your default Jupyter directory. To find that directory, use pwd (print working directory); you will find the new file “emp.csv” there.

pwd
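pwd works inside a Jupyter notebook. In a plain Python script, the same information is available from the standard library. A minimal sketch, where csv_path is just an illustrative name:

```python
import os

# Current working directory, the equivalent of pwd.
cwd = os.getcwd()

# Full path where a file written as "emp.csv" would be placed.
csv_path = os.path.join(cwd, "emp.csv")

print(cwd)
print(csv_path)
print(os.path.exists(csv_path))   # True only if emp.csv has been written there
```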

Now that we have written the data to a csv file, let’s read the data from the emp.csv file

csv_data = pd.read_csv("emp.csv")
csv_data
csv_data — data from emp.csv file

The data type of csv_data is a dataframe, so we can use all of the dataframe operations.

type(csv_data)
pandas.core.frame.DataFrame
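One detail worth knowing: to_csv also writes the dataframe index, which comes back as an extra unnamed column when the file is read again. The sketch below shows the difference, using an in-memory buffer in place of a file on disk (an assumption made to keep the example self-contained):

```python
import io

import pandas as pd

data = {"Name": ["Jack", "Jill", "Em", "David"],
        "Profession": ["Manager", "Analyst", "Data scientist", "VP"],
        "Experience": [10, 5, 2, 20],
        "Education": ["MBA", "BS", "MS", "MBA"]}
df = pd.DataFrame(data)

# Default behaviour: the index is written as the first, unnamed column.
buffer = io.StringIO()
df.to_csv(buffer)
buffer.seek(0)
with_index = pd.read_csv(buffer)

# With index=False only the four data columns are written.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)
without_index = pd.read_csv(buffer)

print(list(with_index.columns))
print(list(without_index.columns))
```

Whether to keep the index is a design choice; for data like this, where the index is only a row counter, index=False keeps the file cleaner.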

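To know more, read about dataframe (https://readmedium.com/python-data-structures-dataframe-888fef6872bf) and dictionary (https://readmedium.com/python-data-structures-dictionary-9b746b94b421).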

Writing to an excel file and reading from an excel file

Here we will use the same df dataframe to write the data to an excel file, in a sheet named “employee_data”

# Recent pandas versions write the .xlsx format (this needs the openpyxl package);
# support for writing legacy .xls files was removed in pandas 2.0.
df.to_excel("emp_1.xlsx", sheet_name="employee_data")
Excel file with sheet name employee_data

Now let’s read the data from the excel file emp_1.xlsx, sheet name “employee_data”, into a dataframe xls_data.

# sheet_name selects the sheet to read
xls_data = pd.read_excel("emp_1.xlsx", sheet_name="employee_data")
xls_data
DataFrame xls_data

Writing data to a JSON file

Here we will use the same dataframe df to write to a json file

df.to_json('emp.json')

Contents of the emp.json file

{"Education":{"0":"MBA","1":"BS","2":"MS","3":"MBA"},"Experience":{"0":10,"1":5,"2":2,"3":20},"Name":{"0":"Jack","1":"Jill","2":"Em","3":"David"},"Profession":{"0":"Manager","1":"Analyst","2":"Data scientist","3":"VP"}}

Now that we have written the data to a json file, let’s read the data

emp= pd.read_json("emp.json")
emp
DataFrame emp created by reading the json file

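If you have a JSON string, json.loads() parses it into the matching Python object, here a dictionary that contains nested dictionaries and a list. We can then use all the usual dictionary and list operations for data manipulation.

Read here for operations of a List: https://readmedium.com/python-data-structures-list-9131e7386c8d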

import json

area_json ="""
{ "office": 
    {"medical": [
      { "room-number": 100,
        "use": "reception",
        "sq-ft": 50,
        "price": 75
      },
      { "room-number": 101,
        "use": "waiting",
        "sq-ft": 250,
        "price": 75
      },
      { "room-number": 102,
        "use": "examination",
        "sq-ft": 125,
        "price": 150
      },
      { "room-number": 103,
        "use": "examination",
        "sq-ft": 125,
        "price": 150
      },
      { "room-number": 104,
        "use": "office",
        "sq-ft": 150,
        "price": 100
      }
    ]},
    "parking": {
      "location": "premium",
      "style": "covered",
      "price": 750
    }
} 
"""
home_data = json.loads(area_json)
home_data
{'office': {'medical': [{'price': 75,
    'room-number': 100,
    'sq-ft': 50,
    'use': 'reception'},
   {'price': 75, 'room-number': 101, 'sq-ft': 250, 'use': 'waiting'},
   {'price': 150, 'room-number': 102, 'sq-ft': 125, 'use': 'examination'},
   {'price': 150, 'room-number': 103, 'sq-ft': 125, 'use': 'examination'},
   {'price': 100, 'room-number': 104, 'sq-ft': 150, 'use': 'office'}]},
 'parking': {'location': 'premium', 'price': 750, 'style': 'covered'}}
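Once the JSON has been parsed, the result is ordinary Python data that can be indexed and looped over. The following sketch uses the same room data in compact form; the summary values it computes (total area, examination rooms) are chosen only as examples:

```python
import json

# Same structure as area_json above, written compactly.
area_json = """
{ "office": {"medical": [
    {"room-number": 100, "use": "reception",   "sq-ft": 50,  "price": 75},
    {"room-number": 101, "use": "waiting",     "sq-ft": 250, "price": 75},
    {"room-number": 102, "use": "examination", "sq-ft": 125, "price": 150},
    {"room-number": 103, "use": "examination", "sq-ft": 125, "price": 150},
    {"room-number": 104, "use": "office",      "sq-ft": 150, "price": 100}
  ]},
  "parking": {"location": "premium", "style": "covered", "price": 750}
}
"""
home_data = json.loads(area_json)

# Dictionary keys lead down to the list of rooms.
rooms = home_data["office"]["medical"]

# List operations then work as usual.
total_sq_ft = sum(room["sq-ft"] for room in rooms)
exam_rooms = [room["room-number"] for room in rooms if room["use"] == "examination"]
parking_price = home_data["parking"]["price"]

print(total_sq_ft)
print(exam_rooms)
print(parking_price)
```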

We can also take Python data and dump it into a JSON string.

Here we will use the data dictionary that we created earlier and dump as json data.

json_data = json.dumps(data)
json_data
'{"Name": ["Jack", "Jill", "Em", "David"], "Profession": ["Manager", "Analyst", "Data scientist", "VP"], "Experience": [10, 5, 2, 20], "Education": ["MBA", "BS", "MS", "MBA"]}'
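json.dumps() and json.loads() work with strings. The json module also has json.dump() and json.load(), which write to and read from a file object directly. A minimal sketch, assuming a scratch file in a temporary directory (the file name is hypothetical):

```python
import json
import os
import tempfile

data = {"Name": ["Jack", "Jill", "Em", "David"],
        "Profession": ["Manager", "Analyst", "Data scientist", "VP"],
        "Experience": [10, 5, 2, 20],
        "Education": ["MBA", "BS", "MS", "MBA"]}

# Hypothetical scratch file, only for illustration.
path = os.path.join(tempfile.mkdtemp(), "emp_dict.json")

with open(path, "w") as f:
    json.dump(data, f, indent=2)   # write the dictionary as JSON text

with open(path) as f:
    restored = json.load(f)        # parse the file back into a dictionary

print(restored == data)
```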

Retrieve data from a URL

We will use two libraries for retrieving data from a URL: requests and BeautifulSoup. We will also use re for regular expressions.

import requests
import re
from bs4 import BeautifulSoup

We will use the url of one of my posts on Medium and retrieve its text. We use the get method of the requests library to download the page. response will then hold the content of the web page.

url = "https://readmedium.com/power-of-habit-how-to-accomplishing-big-goals-by-making-small-changes-5872a788fcb8"
response = requests.get(url)

response.content will contain the contents of the html page.

We will use the BeautifulSoup constructor and pass it the html content that we have in response.content

soup = BeautifulSoup(response.content,'html.parser')

We will now locate the title tag with the find method and then use get_text to extract its text.

We then find all the “p” tags in the html page whose class property contains text like “graf graf--p”. We use a regular expression here because we want to match any p tag whose class value contains “graf graf--p”.

title = soup.find('title').get_text()
any_data = soup.find_all('p', class_=re.compile("graf graf--p"))

We will now print the title and iterate through any_data, which is a bs4 ResultSet

print(title)
for p in any_data:
   print(p.get_text())
Power of Habit — How to accomplish big goals by making small changes
Photo by Vidar Nordli-Mathisen on Unsplash
You might have heard that a 1000 mile journey begins one step at a time. I also used to be temporarily inspired by such quotes. But a few days would pass, and I would find it had slipped my mind.
Are you wanting to change things in your life?
Are you stuck in the same old circumstances and feel a need to change?
Do you want to meet a happy version of yourself in future?
Then read on
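The same find and find_all calls can be tried without downloading anything. The sketch below parses a small, made-up HTML string in place of response.content; the tag contents and the extra class names are assumptions for illustration only:

```python
import re

from bs4 import BeautifulSoup

# Hypothetical HTML standing in for a downloaded page.
html = """
<html><head><title>Sample post</title></head>
<body>
  <p class="graf graf--p">First paragraph.</p>
  <p class="caption">Photo credit.</p>
  <p class="graf graf--p graf-after--p">Second paragraph.</p>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# Text of the title tag.
title = soup.find("title").get_text()

# p tags whose class value contains "graf graf--p"; extra classes are allowed.
paragraphs = soup.find_all("p", class_=re.compile("graf graf--p"))
texts = [p.get_text() for p in paragraphs]

print(title)
print(texts)
```

The paragraph with class "caption" is skipped because its class value does not contain the pattern.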