Yang Zhou



File Handling in Python

Photo by Savannah Wakefield on Unsplash

Python is a very convenient tool to handle files. It will save you lots of time if you use it properly.

Read a File

We use the built-in open() function to open a file. The second parameter sets the mode the file is opened in:

f = open('test_file.txt', 'r')

The default value of the second parameter is ‘r’, which stands for ‘read’ mode and is used for reading text files. To read binary files, we use ‘rb’.

If the file doesn’t exist, open() raises a FileNotFoundError.

The file must be closed after use, because an open file object holds operating system resources, and the number of files the operating system allows to be open at the same time is limited.

f.close()

However, it is easy to forget to close a file after use, or an error may occur before close() is reached, and this can cause unexpected bugs. To make sure the file is always closed, we can use try ... finally:
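The original code for this step is not shown here, so the following is only a minimal sketch of the pattern. The file name and its contents are hypothetical and are created in a temporary folder just to make the sketch runnable.

```python
import tempfile
from pathlib import Path

# Hypothetical sample file, created only so the sketch can run anywhere.
path = Path(tempfile.mkdtemp()) / "test_file.txt"
path.write_text("hello\n")

f = open(path, "r")
try:
    content = f.read()
finally:
    # This runs whether or not read() raised, so the file is always closed.
    f.close()

print(repr(content))
print(f.closed)
```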

But the code looks a little fussy if we do this every time we open a file. The best way to open a file in Python is to use with, which closes the file automatically:
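A minimal sketch of the with form, again using a hypothetical file created in a temporary folder. The file is closed as soon as the indented block ends, even if an exception is raised inside it.

```python
import tempfile
from pathlib import Path

# Hypothetical sample file, created only so the sketch can run anywhere.
path = Path(tempfile.mkdtemp()) / "test_file.txt"
path.write_text("hello\n")

with open(path, "r") as f:
    content = f.read()

# No explicit close() call is needed: leaving the block closed the file.
print(repr(content))
print(f.closed)
```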

The open() function returns a file object. A Python file object has 3 reading methods, which give us enough flexibility to get the contents:

  • read() : reads the entire file and returns the contents as a Python string. A file can be larger than the available memory, so to be safe we can pass the parameter n to read at most n characters each time: read(n). The default value of n is -1, which means read the whole file.
  • readlines() : also reads the entire file, but splits the contents into a list of lines, which makes it convenient to deal with the contents as a list of strings.
  • readline() : reads only one line of the file each time it is called.
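The three methods can be compared side by side. This is a small sketch on a hypothetical three-line file; the chunk size of 8 is an arbitrary choice for illustration.

```python
import tempfile
from pathlib import Path

# Hypothetical sample file with three lines.
path = Path(tempfile.mkdtemp()) / "test_file.txt"
path.write_text("first\nsecond\nthird\n")

# read(): the whole file as one string.
with open(path, "r") as f:
    whole = f.read()

# read(n): at most n characters per call, useful for large files.
chunks = []
with open(path, "r") as f:
    while True:
        chunk = f.read(8)
        if not chunk:  # an empty string means the end of the file
            break
        chunks.append(chunk)

# readlines(): a list of lines, each still ending with '\n'.
with open(path, "r") as f:
    lines = f.readlines()

# readline(): one line per call.
with open(path, "r") as f:
    first_line = f.readline()
    second_line = f.readline()

print(lines)
```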

A real interview question I met about file handling in Python:

There are two files, each with many lines of IP addresses. Please find the IP addresses that appear in both files.

Four points in this solution:

  • Use with to open files.
  • Use rstrip() to remove the trailing ‘\n’ from each line.
  • Binary search on a sorted list takes O(log n) per lookup, which is more efficient than a brute force scan of the whole list.
  • Use set to remove the duplicate IP addresses.
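The original solution is not reproduced here, so the following is a reconstruction under assumptions rather than the author's code. It applies the four points above; the helper names (load_ips, contains, common_ips) and the sample addresses are hypothetical.

```python
import tempfile
from bisect import bisect_left
from pathlib import Path


def load_ips(path):
    """Return the unique, non-empty lines of a file with '\\n' stripped."""
    with open(path, "r") as f:
        return {line.rstrip() for line in f if line.strip()}


def contains(sorted_items, target):
    """Binary search: True if target is present in the sorted list."""
    i = bisect_left(sorted_items, target)
    return i < len(sorted_items) and sorted_items[i] == target


def common_ips(path_a, path_b):
    """Return the sorted IP addresses that appear in both files."""
    ips_a = load_ips(path_a)
    sorted_b = sorted(load_ips(path_b))
    return sorted(ip for ip in ips_a if contains(sorted_b, ip))


# Hypothetical sample data, written to a temporary folder.
folder = Path(tempfile.mkdtemp())
file_a = folder / "a.txt"
file_b = folder / "b.txt"
file_a.write_text("10.0.0.1\n10.0.0.2\n10.0.0.2\n192.168.1.1\n")
file_b.write_text("192.168.1.1\n10.0.0.2\n172.16.0.5\n")

result = common_ips(file_a, file_b)
print(result)
```

Since both files are already loaded as sets here, a set intersection would give the same answer; the binary search is kept only because it is one of the four points listed.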

Write to a File

The ‘w’ mode stands for ‘write’: if the file doesn’t exist, a new file is created; if it does, its original content is emptied before anything new is written. So if you don’t want to clear the original content but want to append new content to the end of the file, use ‘a’ mode.
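The difference between the two modes is easy to check. A minimal sketch, with a hypothetical file in a temporary folder:

```python
import tempfile
from pathlib import Path

# Hypothetical target file; it does not exist yet.
path = Path(tempfile.mkdtemp()) / "test_file.txt"

with open(path, "w") as f:  # the file is created
    f.write("first version\n")

with open(path, "w") as f:  # the existing content is emptied first
    f.write("second version\n")

with open(path, "a") as f:  # new content goes to the end of the file
    f.write("appended line\n")

with open(path, "r") as f:
    content = f.read()

print(content)
```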

When we write to a file, the data is often not written to the disk immediately but held in a memory buffer first. Calling close() flushes any data that is still buffered. If we forget to call close(), only part of the data may be written and the rest may be lost. This is another reason to use the with statement, which closes the file for us.

A Python file object has 2 writing methods:

  • write() : writes a string to the file.
  • writelines() : receives a list of strings and writes each of them to the file. (Tip: it does not add line breaks, so we should add the \n manually.)
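A short sketch of both methods, showing what happens with and without the manual \n. The file names and the sample strings are hypothetical.

```python
import tempfile
from pathlib import Path

folder = Path(tempfile.mkdtemp())
with_newlines = folder / "with_newlines.txt"
without_newlines = folder / "without_newlines.txt"

names = ["alpha", "beta", "gamma"]

with open(with_newlines, "w") as f:
    f.write("header\n")
    # Add '\n' ourselves, because writelines() writes the strings as-is.
    f.writelines([name + "\n" for name in names])

with open(without_newlines, "w") as f:
    f.writelines(names)  # everything ends up on a single line

with open(with_newlines, "r") as f:
    good = f.read()
with open(without_newlines, "r") as f:
    joined = f.read()

print(repr(good))
print(repr(joined))
```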

Move the Pointer in a File

The file object has a seek(offset, whence=0) method, which moves the file pointer. offset indicates how far to move. The optional parameter whence indicates where the offset is measured from: 0 (the default) for the beginning of the file, 1 for the current position, and 2 for the end of the file. Let’s look at an example:
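The original example is not shown here, so this is a small sketch of how the three whence values behave. It opens a hypothetical ten-byte file in binary mode, because text-mode files only allow seeking relative to the beginning (apart from a zero offset).

```python
import tempfile
from pathlib import Path

# Hypothetical ten-byte file.
path = Path(tempfile.mkdtemp()) / "digits.bin"
path.write_bytes(b"0123456789")

with open(path, "rb") as f:
    f.seek(3)              # whence=0: 3 bytes from the beginning
    from_start = f.read(2)

    f.seek(2, 1)           # whence=1: 2 bytes forward from the current position
    from_current = f.read(1)

    f.seek(-3, 2)          # whence=2: 3 bytes before the end
    from_end = f.read()

    position = f.tell()    # current position of the file pointer

print(from_start, from_current, from_end, position)
```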

Character Encoding Issues

To read text files that are not UTF-8 encoded, pass the encoding parameter to the open() function. For example, to read GBK encoded files:

f = open('test.txt', 'r', encoding='gbk', errors='ignore')

Sometimes you may encounter a UnicodeDecodeError, because the text file contains some bytes that are not valid in the given encoding. For this case, the open() function also accepts an errors parameter, which indicates what to do when an encoding error is encountered:

  • If errors is set to “strict”, a ValueError exception is raised when there is an encoding error (when reading, this is its subclass UnicodeDecodeError). The default value is None, which has the same effect.
  • If errors is set to “ignore”, the encoding errors in the file are skipped, but this can lead to data loss.
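The two settings can be compared on a file that contains one invalid byte. This is a sketch under assumptions: the file and its contents are hypothetical, and the byte 0xFF is used because it is not a valid GBK lead byte.

```python
import tempfile
from pathlib import Path

# Hypothetical file: valid GBK text, one invalid byte (0xFF), then '!'.
original = "你好"
path = Path(tempfile.mkdtemp()) / "test.txt"
path.write_bytes(original.encode("gbk") + b"\xff" + b"!")

# errors is not given, so it behaves like 'strict' and raises.
try:
    with open(path, "r", encoding="gbk") as f:
        f.read()
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# errors='ignore' drops the invalid byte and keeps the rest.
with open(path, "r", encoding="gbk", errors="ignore") as f:
    text = f.read()

print(strict_failed)
print(text)
```

The invalid byte is silently dropped in the second case, which is the data loss mentioned above.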

Thanks for reading. If you like it, please follow my publication TechToFreedom, where you can enjoy other Python tutorials and topics about programming, technology and investment.
