Python Fundamentals
Types of Files
- Text Files: A text file consists of human-readable characters and can be opened by any text editor. Files with extensions such as .txt, .py, and .csv are examples of text files.
- Encoding: When we open a text file using a text editor (e.g., Notepad), we see several lines of text. However, the file contents are not stored that way internally. They are stored as a sequence of bytes made up of 0s and 1s. Under ASCII, Unicode, or any other encoding scheme, each character of the text file is stored as one or more bytes. So, when opening a text file, the text editor decodes each stored value and shows us the equivalent human-readable character. For example, the ASCII value 65 (binary equivalent 1000001) is displayed by a text editor as the letter 'A', since the number 65 in the ASCII character set represents 'A'.
- End of Line: Each line of a text file is terminated by a special character, called the End of Line (EOL). For example, the default EOL character in Python is the newline (\n). However, other characters can be used to indicate EOL. When a text editor or a program interpreter encounters the ASCII equivalent of the EOL character, it displays the remaining file contents starting from a new line.
- Binary Files: Binary files are made up of characters and symbols that are not human-readable, and specific programs are required to access their contents. Binary files are also stored as bytes (0s and 1s), but unlike text files, these bytes do not represent the ASCII values of characters. Rather, they represent the actual content, such as an image, audio, video, a compressed version of other files, or an executable. Trying to open a binary file in a text editor will therefore show garbage values. We need specific software to read or write the contents of a binary file.
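The ideas in the list above can be checked directly in Python. This is a minimal sketch using only built-in functions; the four sample bytes are the start of the PNG file signature, picked here purely as an illustration of binary content.

```python
# Character <-> number <-> bits, as in the 'A' = 65 example.
code_point = ord('A')              # numeric value of the character
bits = format(code_point, 'b')     # the same value written in binary
char = chr(65)                     # back from number to character

# Encoding turns text into bytes; decoding reverses it.
encoded = 'A\n'.encode('ascii')    # the letter A followed by a newline (EOL)
byte_values = list(encoded)        # the numeric value of each byte
decoded = encoded.decode('ascii')

# Binary content is usually not valid text, so decoding it fails
# (a text editor would show garbage instead).
binary_data = bytes([0x89, 0x50, 0x4E, 0x47])  # first four bytes of a PNG file
try:
    binary_data.decode('utf-8')
    is_text = True
except UnicodeDecodeError:
    is_text = False

print(code_point, bits, char, byte_values, is_text)
```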
Opening Files
Below are the options we can provide to the open() function for opening (and creating) a file:
file: Required argument, path to the file
mode: read (r), write (w), append (a), + for both reading and writing, plus binary (b) or text (t). If not specified, the file is opened for reading in text mode (rt). A file opened in text mode treats its contents as str: the raw bytes are first decoded using a platform-dependent encoding, or using the specified encoding if given.
<aside>
💡 When you open a file in write (w) mode, it is truncated, i.e., its existing contents are deleted.
</aside>
encoding: Encoding to use in text mode. If you don't specify one, Python uses a platform-dependent default taken from the locale module (locale.getencoding(), or locale.getpreferredencoding(False) on older Python versions).
# f will be a file object
f = open('filename', mode='wt', encoding='utf-8')
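To see how the modes differ, here is a small sketch that writes to a scratch file in a temporary directory. The file name demo.txt and the use of the with statement (which closes the file automatically) are choices made for this example, not part of the notes above.

```python
import os
import tempfile

# Work in a temporary directory so no real file is overwritten.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'demo.txt')  # hypothetical file name

    # 'w' creates the file, or truncates it if it already exists.
    with open(path, mode='w', encoding='utf-8') as f:
        f.write('first\n')
    with open(path, mode='w', encoding='utf-8') as f:
        f.write('second\n')  # the earlier 'first' line is gone

    # 'a' keeps the existing contents and writes at the end.
    with open(path, mode='a', encoding='utf-8') as f:
        f.write('third\n')

    # No mode given: the file is opened for reading in text mode.
    with open(path, encoding='utf-8') as f:
        contents = f.read()

print(repr(contents))
```

Passing encoding explicitly, as done here, avoids depending on the platform default.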
At the filesystem level, files are stored as a sequence of bytes. Files opened in binary mode read and write their contents as bytes objects, without any decoding. Binary mode reflects the raw data in the file.
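The difference between the two modes shows up when the same file is read both ways. The sketch below assumes a scratch file named demo.txt (a hypothetical name) containing one non-ASCII character, which UTF-8 stores as two bytes.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'demo.txt')  # hypothetical file name

    # Write four characters; '\u00e9' is the letter e with an acute accent.
    with open(path, mode='wt', encoding='utf-8') as f:
        f.write('caf\u00e9')

    # Text mode: bytes are decoded and returned as str.
    with open(path, mode='rt', encoding='utf-8') as f:
        as_text = f.read()

    # Binary mode: the raw bytes are returned as a bytes object.
    with open(path, mode='rb') as f:
        as_bytes = f.read()

print(type(as_text).__name__, len(as_text))
print(type(as_bytes).__name__, len(as_bytes))
```

The text has four characters, while the raw data has five bytes, because the accented letter takes two bytes in UTF-8.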
<aside>
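💡 When providing a complete path on Windows, use a raw string, for example open(r"C:\Users\johndoe\Downloads\filename.txt"), so that each backslash is treated as a literal character rather than the start of an escape sequence. Paths written with forward slashes, such as "/Users/johndoe/Downloads/filename.txt", do not need a raw string.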
</aside>
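The effect of a raw string is easiest to see by comparing string values, without opening any file. The path below is hypothetical and was chosen because it happens to contain \t and \n.

```python
# Hypothetical Windows-style path containing \t and \n.
plain = 'C:\temp\notes.txt'      # \t becomes a tab, \n becomes a newline
raw = r'C:\temp\notes.txt'       # backslashes are kept as literal characters
escaped = 'C:\\temp\\notes.txt'  # doubling each backslash also works

print(repr(plain))
print(repr(raw))
print(raw == escaped)
```

The plain string no longer contains any backslash, so it would not name the intended file.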