
Data File Handling

Data stored in variables and lists is temporary: it is lost when the program terminates. Python allows a program to read data from a file and write data to a file. Once data is stored in a file on disk, it remains intact after the program stops running and can be retrieved and used later.

In Python, files can be broadly categorized into two main types: text files and binary files.

Text File

A text file is processed as a sequence of characters. A special end-of-line character (the newline, written \n in Python) divides the file into lines. It can also help to imagine an end-of-file marker following the last character, although Python does not store such a symbol; it signals the end of a file by returning an empty string from a read. Text files can be opened and read in a simple text editor such as Notepad, and their contents can be easily understood and modified by humans. Examples of text files include .txt files, .csv (Comma-Separated Values) files, .xml (Extensible Markup Language) files, and .json (JavaScript Object Notation) files.
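The idea of characters divided into lines can be sketched as follows. This is a minimal illustration, not a prescribed method; the file name notes.txt and the temporary folder are assumptions made only for the example.

```python
import os
import tempfile

# Hypothetical file name, created in a temporary folder for illustration.
path = os.path.join(tempfile.mkdtemp(), "notes.txt")

# Write three lines; "\n" is the end-of-line character.
with open(path, "w", encoding="utf-8") as f:
    f.write("first line\nsecond line\nthird line\n")

# Read the whole file back as one string of characters.
with open(path, "r", encoding="utf-8") as f:
    text = f.read()

# Split the characters into lines at each end-of-line character.
lines = text.splitlines()
print(lines)

# Once the end of the file is reached, read() returns an empty string.
with open(path, "r", encoding="utf-8") as f:
    f.read()          # consume everything
    at_end = f.read() # nothing left to read
print(repr(at_end))
```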

Binary File

A binary file stores data that has not been translated into character form, so it is not directly human-readable. Binary files typically use the same bit patterns to represent data as those used to represent the data in the computer's main memory. All files are ultimately stored as sequences of binary digits (0s and 1s); what distinguishes a binary file is that its bytes are not meant to be decoded as characters. These files require specific software or applications to open and interpret their content. Examples of binary files include image files (.jpg, .png, .gif), audio files (.mp3, .wav), video files (.mp4, .avi), executable files (.exe), and database files (.db, .sqlite).
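To make the difference concrete, the sketch below stores the same integer once as text and once in binary form. It assumes a 4-byte little-endian integer layout (the "<i" format of the struct module), and the file names are hypothetical; real binary formats define their own layouts.

```python
import os
import struct
import tempfile

# Hypothetical file names in a temporary folder.
folder = tempfile.mkdtemp()
text_path = os.path.join(folder, "number.txt")
bin_path = os.path.join(folder, "number.bin")

number = 123456789

# Text form: the number is translated into nine digit characters.
with open(text_path, "w", encoding="utf-8") as f:
    f.write(str(number))

# Binary form: the number is packed into 4 bytes (little-endian int).
with open(bin_path, "wb") as f:
    f.write(struct.pack("<i", number))

text_size = os.path.getsize(text_path)
bin_size = os.path.getsize(bin_path)
print(text_size, bin_size)

# Reading in "rb" mode returns bytes, which must be unpacked to be understood.
with open(bin_path, "rb") as f:
    raw = f.read()
restored = struct.unpack("<i", raw)[0]
print(raw, restored)
```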

Modules for Text and Binary Formats

Python's standard library, together with third-party packages, lets you handle different types of text and binary files. Here are a few examples:

Text Files:

  • Plain Text (.txt): Built-in open() function
  • CSV (Comma-Separated Values) (.csv): csv module
  • JSON (JavaScript Object Notation) (.json): json module
  • XML (eXtensible Markup Language) (.xml): xml.etree.ElementTree module
  • HTML (Hypertext Markup Language) (.html): The standard library offers html.parser for basic parsing. For more convenient parsing and manipulation of HTML, third-party libraries such as Beautiful Soup or lxml are commonly used.
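As a rough sketch of two of the modules listed above, the following writes and reads a small CSV file and a small JSON file. The file names and the sample data are made up for illustration.

```python
import csv
import json
import os
import tempfile

# Hypothetical file names in a temporary folder.
folder = tempfile.mkdtemp()
csv_path = os.path.join(folder, "marks.csv")
json_path = os.path.join(folder, "student.json")

# CSV: each row is a list of strings; newline="" lets the csv module
# control line endings itself.
rows = [["name", "marks"], ["Asha", "91"], ["Ravi", "84"]]
with open(csv_path, "w", newline="", encoding="utf-8") as f:
    csv.writer(f).writerows(rows)
with open(csv_path, "r", newline="", encoding="utf-8") as f:
    rows_back = list(csv.reader(f))

# JSON: dictionaries, lists, strings and numbers map directly to JSON.
student = {"name": "Asha", "marks": [91, 78]}
with open(json_path, "w", encoding="utf-8") as f:
    json.dump(student, f)
with open(json_path, "r", encoding="utf-8") as f:
    student_back = json.load(f)

print(rows_back)
print(student_back)
```

Note that csv.reader returns every field as a string, so numeric columns need to be converted explicitly after reading.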

Binary Files:

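  • Images (e.g., JPEG, PNG, GIF): Pillow (third-party; a fork of the Python Imaging Library)
  • Audio files (e.g., WAV, MP3): wave module (WAV only; MP3 requires a third-party library)
  • Video files (e.g., MP4, AVI): OpenCV (third-party)
  • Databases (e.g., SQLite, MySQL): built-in sqlite3 module for SQLite; SQLAlchemy (third-party) for SQLite, MySQL, and others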
  • Serialization (Pickling): pickle module
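Of the binary options above, pickle is part of the standard library and is easy to try. The sketch below saves a Python dictionary to a binary file and loads it back; the file name record.dat and the record contents are assumptions for the example.

```python
import os
import pickle
import tempfile

# Hypothetical file name in a temporary folder.
path = os.path.join(tempfile.mkdtemp(), "record.dat")

record = {"roll": 12, "name": "Asha", "marks": (91, 78)}

# Pickle files are binary, so they are opened in "wb" and "rb" modes.
with open(path, "wb") as f:
    pickle.dump(record, f)

with open(path, "rb") as f:
    loaded = pickle.load(f)

print(loaded)
```

Only unpickle data from sources you trust, since loading a pickle can execute arbitrary code.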

Steps to Process File Input/Output in Python

  1. Open the file
     Opening a file creates a connection between the file and the program.

  2. Process the file
     Once the file is open, you can perform the necessary operations, depending on whether it is an input or output file.

  3. Close the file
     When the program has finished using the file, the file must be closed. Closing a file disconnects it from the program, flushes any buffered output to disk, and releases the file so that other processes or programs can access it.
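The three steps can be sketched as follows, first with an explicit close() and then with a with statement, which closes the file automatically. The file name greeting.txt is hypothetical.

```python
import os
import tempfile

# Hypothetical file name in a temporary folder.
path = os.path.join(tempfile.mkdtemp(), "greeting.txt")

# Step 1: open the file (here for writing).
f = open(path, "w", encoding="utf-8")
try:
    # Step 2: process the file.
    f.write("Hello, file!\n")
finally:
    # Step 3: close the file, even if processing raised an error.
    f.close()
closed_after_write = f.closed

# The with statement performs steps 1 and 3 for you:
# the file is closed as soon as the block ends.
with open(path, "r", encoding="utf-8") as f:
    content = f.read()
closed_after_with = f.closed

print(content, closed_after_write, closed_after_with)
```

The with form is generally preferred because the file is closed even when an exception occurs inside the block.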


