available memory, in order to be safe, we can pass the parameter n to read at most n characters each time: <code>read(n)</code>. The default value of n is -1, which means read the whole file.</li><li><code>readlines()</code> also reads the entire file, but it splits the contents into a list of lines, which makes it convenient to process them as a list of strings.</li><li><code>readline()</code> reads only one line of the file each time.</li></ul><blockquote id="26da"><p>A real interview question I met about file handling in Python:</p></blockquote><blockquote id="a861"><p>There are two files, each with many lines of IP addresses. Please find the IP addresses that appear in both files.</p></blockquote>
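<p>One way to approach this question can be sketched as follows. This is a minimal sketch, not the embedded solution itself: the function names <code>read_ips</code> and <code>common_ips</code> are hypothetical, and the sketch assumes each file holds one IP address per line and fits in memory.</p>

```python
from bisect import bisect_left


def read_ips(path):
    """Return the distinct, non-empty lines of a file as a set of strings."""
    with open(path) as f:  # 'with' closes the file even if an error occurs
        # rstrip() drops the trailing '\n'; the set drops duplicate addresses
        return {line.rstrip() for line in f if line.strip()}


def common_ips(path_a, path_b):
    """Return the addresses found in both files, sorted as plain strings."""
    sorted_a = sorted(read_ips(path_a))
    common = []
    for ip in read_ips(path_b):
        i = bisect_left(sorted_a, ip)  # binary search in the sorted list
        if i < len(sorted_a) and sorted_a[i] == ip:
            common.append(ip)
    return sorted(common)
```

<p>Because both inputs are already sets, <code>read_ips(path_a) &amp; read_ips(path_b)</code> would give the same addresses with less code; the binary search is kept here only to illustrate the idea.</p>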
<figure id="5818">
<div>
<div>
<iframe class="gist-iframe" src="/gist/ZhouYang1993/9308d77f253727c3f0ce7410e9229d0a.js" allowfullscreen="" frameborder="0" height="undefined" width="undefined"></iframe>
</div>
</div>
</figure><p id="3ec4">Four points in this solution:</p><ul><li>Use <code>with</code> to open files.</li><li>Use <code>rstrip()</code> to remove the trailing <code>\n</code>.</li><li>Binary search is more efficient than a brute-force comparison.</li><li>Use <code>set</code> to remove the duplicate IP addresses.</li></ul><h1 id="babf">Write to a File</h1>
<figure id="eb80">
<div>
<div>
<iframe class="gist-iframe" src="/gist/ZhouYang1993/247cd9dd3d04a407addbe359eb9abe32.js" allowfullscreen="" frameborder="0" height="undefined" width="undefined"></iframe>
</div>
</div>
</figure><p id="65b8">The ‘w’ mode stands for ‘write’: if the file does not exist, a new file is created; if it does, its original content is emptied before anything new is written. So if you want to keep the original content and append new content to the end of the file, use ‘a’ mode.</p><p id="cf83">When we write to a file, the data is often not written to the disk immediately but is held in a memory buffer first. Calling the <code>close()</code> method flushes any data still held in that buffer. If you forget to call <code>close()</code>, only part of the data may reach the disk and the rest may be lost. Therefore, it is again recommended to use the <code>with</code> statement, which closes the file automatically.</p><p id="4ff4">A Python file object has 2 writing methods:</p><ul><li><code>write()</code>: writes a string to the file.</li><li><code>writelines()</code>: takes a list (or any other iterable) of strings and writes each one to the file. (Tip: it does not add line separators, so we should add the <code>\n</code> manually.)</li></ul>
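<p>The difference between the two modes and the two methods can be sketched as follows. This is a minimal sketch: the file name <code>demo.txt</code> and its contents are made up for illustration, and the file is created in a temporary directory.</p>

```python
import os
import tempfile

# Hypothetical file in a temporary directory, used only for this demo.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

with open(path, "w") as f:   # 'w' creates the file or empties an existing one
    f.write("first line\n")  # write() takes a single string

with open(path, "a") as f:   # 'a' keeps the content and appends to the end
    # writelines() adds no line separators, so each string carries its own '\n'
    f.writelines(["second line\n", "third line\n"])

with open(path) as f:
    lines = f.readlines()    # each element keeps its trailing '\n'

print(lines)  # ['first line\n', 'second line\n', 'third line\n']
```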
<figure id="5361">
<div>
<div>
<iframe class="gist-iframe" src="/gist/ZhouYang1993/d1fc933d70805386fc4383245e85a90e.js" allowfullscreen="" frameborder="0" height="undefined" width="undefined"></iframe>
</div>
</div>
</figure><h1 id="393f">Move the Pointer in a File</h1><p id="bfdc">The file object has a <code>seek(offset, whence=0)</code> method, which moves the file pointer within the file. <code>offset</code> indicates how far to move. The optional parameter <code>whence</code> indicates where the offset is measured from: 0 (the default) for the beginning of the file, 1 for the current position, and 2 for the end of the file. Let’s look at an example:</p>
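<p>A minimal sketch of the three <code>whence</code> values follows. The file name and its ten-byte content are made up for illustration. The file is opened in binary mode because text-mode files reject nonzero offsets relative to the current position or the end.</p>

```python
import os
import tempfile

# Hypothetical file in a temporary directory, used only for this demo.
path = os.path.join(tempfile.mkdtemp(), "seek_demo.bin")
with open(path, "wb") as f:
    f.write(b"0123456789")

with open(path, "rb") as f:
    f.seek(3)                 # whence=0: 3 bytes from the beginning
    from_start = f.read(2)    # b"34", the pointer is now at 5
    f.seek(2, 1)              # whence=1: 2 bytes forward from the current position
    from_current = f.read(1)  # b"7"
    f.seek(-3, 2)             # whence=2: 3 bytes before the end
    from_end = f.read()       # b"789"
    position = f.tell()       # 10, the end of the file

print(from_start, from_current, from_end, position)
```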
<figure id="f816">
<div>
<div>
<iframe class="gist-iframe" src="/gist/ZhouYang1993/ee4350059b58d4c0dacd2b1ad1619ddb.js" allowfullscreen="" frameborder="0" height="undefined" width="undefined"></iframe>
</div>
</div>
</figure><h1 id="b02b">Character Encoding Issues</h1><p id="bd00">To read text files that are not UTF-8 encoded, you need to pass the <code>encoding</code> parameter to the <code>open()</code> function. For example, to read GBK-encoded files:</p><div id="f921"><pre>f = open(<span class="hljs-string">'test.txt'</span>, <span class="hljs-string">'r'</span>, <span class="hljs-attribute">encoding</span>=<span class="hljs-string">'gbk'</span>, <span class="hljs-attribute">errors</span>=<span class="hljs-string">'ignore'</span>)</pre></div><p id="aefd">Sometimes you may encounter a <code>UnicodeDecodeError</code>, because the file may contain byte sequences that are invalid in the chosen encoding. For this case, the <code>open()</code> function also accepts an <code>errors</code> parameter, which indicates what to do when an encoding error is encountered:</p><ul><li>If <code>errors</code> is set to “<b>strict</b>”, a <code>ValueError</code> exception is raised when there is an encoding error (<code>UnicodeDecodeError</code> is a subclass of <code>ValueError</code>). The default value is <code>None</code>, which has the same effect.</li><li>If <code>errors</code> is set to “<b>ignore</b>”, the encoding errors in the file are ignored, but this can lead to data loss.</li></ul><p id="bab6"><b><i>Thanks for reading. If you like it, please follow my publication <a href="https://medium.com/techtofreedom">TechToFreedom</a>, where you can enjoy other Python tutorials and topics about programming, technology and investment.</i></b></p></article></body>