Read Text Files into Pandas DataFrame

The Pandas library provides many functions for reading files. Rather than holding the data in individual variables, Pandas stores it in a tabular structure called a DataFrame. This is especially useful for data made up of multiple rows and fields. In this article, we will discuss how to read text files into a Pandas DataFrame, and how to customize the DataFrame to suit our needs.


How to read Text Files into Pandas DataFrames

We will be using “data.txt” as the sample file for this lesson. The contents of “data.txt” are shown below:

Name Age Salary
William 25 $3900
Elizabeth 29 $4200
Bruce 45 $9740
Sam 36 $7600

Use read_csv():

The read_csv() function takes the file path as a parameter and returns a Pandas DataFrame. It also accepts many optional parameters, such as sep and header, that we will discuss in this tutorial.

The sep parameter is used to specify the character(s) that separate different data values from each other. By default, it uses a comma (,) to separate data values.

Moreover, the header parameter specifies the row index to be used to create the column labels. Its default value is 0. This means that the first row will be used to assign the column labels. One interesting fact to note is that if we write header=1, the DataFrame will use the second row for the column labels, and will only include data from the third row and onward. Similarly, if we write header=3, the DataFrame will only include data from the fifth row and onward.
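As a quick check of how header picks the label row, the sketch below parses the same contents as “data.txt” from an in-memory string. Using io.StringIO and the names SAMPLE, df_default, df_second and df_none is an assumption of this example, made only so the snippet runs without a file on disk.

```python
import io

import pandas as pd

# Same contents as data.txt, held in memory so the snippet is self-contained.
SAMPLE = """Name Age Salary
William 25 $3900
Elizabeth 29 $4200
Bruce 45 $9740
Sam 36 $7600
"""

# header=0 (the default): the first line supplies the column labels.
df_default = pd.read_csv(io.StringIO(SAMPLE), sep=" ")

# header=1: the second line supplies the labels; the first line is discarded.
df_second = pd.read_csv(io.StringIO(SAMPLE), sep=" ", header=1)

# header=None: no line is treated as labels; columns are numbered 0, 1, 2.
df_none = pd.read_csv(io.StringIO(SAMPLE), sep=" ", header=None)

for label, frame in [("header=0", df_default),
                     ("header=1", df_second),
                     ("header=None", df_none)]:
    print(label, list(frame.columns), "rows:", len(frame))
```

Note that with header=1 the labels are taken from William's row, so they are the strings "William", "25" and "$3900".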

Example 1:

For the first example, we passed “data.txt” as the file name and sep=" " as parameters. This causes pandas to split each line into separate values/cells at the space character. The header is assigned a value of 0 by default, so the first line is used to create the column labels.

import pandas as pd

df = pd.read_csv("data.txt", sep=" ")
print(df)

Output:

        Name  Age Salary
0    William   25  $3900
1  Elizabeth   29  $4200
2      Bruce   45  $9740
3        Sam   36  $7600

Example 2:

To replace the current column labels with new labels, we passed the names parameter with a list of custom titles, which pandas uses as the column labels. We also wrote header=0 because, when names is passed explicitly, header defaults to None and the first line of the file would be kept as an ordinary data row. Setting header=0 tells pandas that the first line holds labels, which are then replaced by the entries in names.

import pandas as pd

df = pd.read_csv("data.txt", sep=" ", names=["A", "B", "C"], header=0)
print(df)

Output:

           A   B      C
0    William  25  $3900
1  Elizabeth  29  $4200
2      Bruce  45  $9740
3        Sam  36  $7600
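The effect of header=0 in this example is easiest to see by comparing it with the same call without it. The sketch below is a minimal illustration that reads the sample contents from an in-memory string; io.StringIO and the variable names SAMPLE, labels, kept and replaced are assumptions of this example.

```python
import io

import pandas as pd

# Same contents as data.txt, held in memory so the snippet is self-contained.
SAMPLE = """Name Age Salary
William 25 $3900
Elizabeth 29 $4200
Bruce 45 $9740
Sam 36 $7600
"""

labels = ["A", "B", "C"]

# names only: header falls back to None, so "Name Age Salary" stays as row 0.
kept = pd.read_csv(io.StringIO(SAMPLE), sep=" ", names=labels)

# names with header=0: the original label line is consumed and replaced.
replaced = pd.read_csv(io.StringIO(SAMPLE), sep=" ", names=labels, header=0)

print(kept)
print(replaced)
```

In the first DataFrame the old labels appear as a data row, which also forces column “B” to hold strings instead of integers.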

Filtering Data from Text Files

Since we have covered the basics of the read_csv() method, we will go over several new parameters and showcase their uses.

nrows specifies the number of data rows to read into the DataFrame, and skiprows specifies the lines to skip. nrows counts data rows after the header, whereas an integer skiprows counts lines from the top of the file, including the label line. For example, nrows=3 will read only the first three data rows, and skiprows=2 will skip the first 2 lines of the file. usecols specifies which columns should be read into the DataFrame.
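Each of these parameters can be tried on its own. The sketch below reads the sample contents from an in-memory string (io.StringIO and all variable names are assumptions of this example). It passes skiprows as a list of 0-based line numbers rather than an integer, because an integer would also skip the label line at the top of the file.

```python
import io

import pandas as pd

# Same contents as data.txt, held in memory so the snippet is self-contained.
SAMPLE = """Name Age Salary
William 25 $3900
Elizabeth 29 $4200
Bruce 45 $9740
Sam 36 $7600
"""

# nrows=3: keep only the first three data rows after the header.
first_three = pd.read_csv(io.StringIO(SAMPLE), sep=" ", nrows=3)

# skiprows=[1, 2]: skip file lines 1 and 2 (William, Elizabeth);
# line 0 is left in place, so it still supplies the column labels.
without_first_two = pd.read_csv(io.StringIO(SAMPLE), sep=" ", skiprows=[1, 2])

# usecols: read only the listed columns.
name_salary = pd.read_csv(io.StringIO(SAMPLE), sep=" ",
                          usecols=["Name", "Salary"])

print(first_three)
print(without_first_two)
print(name_salary)
```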

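In the example below, we specified names and set header to 0 to replace the existing column labels. skiprows=1 is applied first and skips the first line of the file (“Name Age Salary”). header=0 then treats the next line (“William 25 $3900”) as the label row, and names replaces it with “A”, “B” and “C”, so William does not appear in the result. nrows=2 reads the following two rows. In addition, usecols keeps only columns “A” and “C”, which were formerly known as “Name” and “Salary”.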

import pandas as pd

df = pd.read_csv("data.txt", sep=" ",
                 skiprows=1, nrows=2, usecols=["A", "C"],
                 names=["A", "B", "C"], header=0)
print(df)

Output:

           A      C
0  Elizabeth  $4200
1      Bruce  $9740
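To see why William is missing from this output, it can help to add the parameters one at a time. The sketch below does so on an in-memory copy of the sample contents; io.StringIO and the names SAMPLE, step1, step2 and step3 are assumptions of this example.

```python
import io

import pandas as pd

# Same contents as data.txt, held in memory so the snippet is self-contained.
SAMPLE = """Name Age Salary
William 25 $3900
Elizabeth 29 $4200
Bruce 45 $9740
Sam 36 $7600
"""

# Step 1: skiprows=1 drops the first file line, so header=0 now points
# at William's line and uses it as the column labels.
step1 = pd.read_csv(io.StringIO(SAMPLE), sep=" ", skiprows=1, header=0)

# Step 2: names replaces that consumed label row with "A", "B", "C".
step2 = pd.read_csv(io.StringIO(SAMPLE), sep=" ", skiprows=1, header=0,
                    names=["A", "B", "C"])

# Step 3: nrows and usecols trim the result to two rows and two columns.
step3 = pd.read_csv(io.StringIO(SAMPLE), sep=" ", skiprows=1, header=0,
                    names=["A", "B", "C"], nrows=2, usecols=["A", "C"])

print(list(step1.columns))
print(step2)
print(step3)
```

If the goal is to keep William, one option is to drop skiprows from the call, since header=0 together with names already replaces the original label line.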

Use read_table():

The read_table() method is almost identical in functionality to read_csv(). The only difference is that read_table() assigns “\t” as the default value of sep while read_csv() assigns “,” as the default value of sep.

import pandas as pd

df = pd.read_table("data.txt", sep=" ",
                   skiprows=1, nrows=2, usecols=["A", "C"],
                   names=["A", "B", "C"], header=0)
print(df)

Output:

           A      C
0  Elizabeth  $4200
1      Bruce  $9740
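The difference in default separators only shows up when sep is left out. The sketch below uses a small tab-separated string as hypothetical input (it is not the tutorial's “data.txt”, and TSV, from_table, from_csv and one_column are names chosen for this example) to compare the two functions.

```python
import io

import pandas as pd

# Hypothetical tab-separated input, used only for this comparison.
TSV = "Name\tAge\tSalary\nWilliam\t25\t$3900\nElizabeth\t29\t$4200\n"

# read_table splits on tabs by default.
from_table = pd.read_table(io.StringIO(TSV))

# read_csv needs sep="\t" to produce the same result.
from_csv = pd.read_csv(io.StringIO(TSV), sep="\t")

# Without sep, read_csv looks for commas and finds a single column.
one_column = pd.read_csv(io.StringIO(TSV))

print(from_table.equals(from_csv))
print(from_table.shape, one_column.shape)
```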

Use read_fwf():

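The read_fwf() function reads a file of fixed-width formatted lines into a DataFrame, and can also be used on ordinary text files. It optionally supports iterating over the file or breaking it into chunks. It takes the file path as a parameter and shares most of its parameters with read_csv(), with some exceptions such as widths and colspecs, which describe where each field begins and ends. When neither is given, read_fwf() infers the column boundaries from the whitespace shared by the lines. Because “data.txt” is space-delimited rather than aligned in fixed-width columns, no such boundary is found, so each line is read as a single column, as the output below shows.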

import pandas as pd

df = pd.read_fwf("data.txt")
print(df)

Output:

      Name Age Salary
0    William 25 $3900
1  Elizabeth 29 $4200
2      Bruce 45 $9740
3        Sam 36 $7600
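read_fwf() is a better fit when the fields are padded to fixed widths. The sketch below builds such input in memory from the same records and reads it with widths and, equivalently, with colspecs. The padded layout (10, 4 and 6 characters), io.StringIO and all variable names are assumptions of this example, not part of “data.txt”.

```python
import io

import pandas as pd

records = [
    ("Name", "Age", "Salary"),
    ("William", "25", "$3900"),
    ("Elizabeth", "29", "$4200"),
    ("Bruce", "45", "$9740"),
    ("Sam", "36", "$7600"),
]

# Pad every field so the columns are exactly 10, 4 and 6 characters wide.
fixed_text = "\n".join(
    f"{name:<10}{age:<4}{salary:<6}" for name, age, salary in records
)

# widths gives the length of each consecutive field.
by_widths = pd.read_fwf(io.StringIO(fixed_text), widths=[10, 4, 6])

# colspecs gives the same layout as half-open (start, end) positions.
by_colspecs = pd.read_fwf(io.StringIO(fixed_text),
                          colspecs=[(0, 10), (10, 14), (14, 20)])

print(by_widths)
print(by_widths.equals(by_colspecs))
```

With explicit widths, the padding is stripped from each field and the result has three separate columns instead of one.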

This marks the end of the “Read Text Files into Pandas DataFrame” tutorial. Any suggestions or contributions for CodersLegacy are more than welcome.
