Pandas - Read Text Files with Delimiters

We previously covered the different ways you can read text files with Pandas. In this tutorial, we will go into more detail on how to split the data read from text files into separate values with the help of delimiters.

What are Delimiters?

A delimiter consists of one or more characters that mark the boundary between two data values. Let's explain this with an example. If we have a string that contains “Jack-032-$25” and we want to separate “Jack”, “032”, and “$25”, we can use “-” as the delimiter. Doing so tells the program where one data value ends and the next begins.
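The same idea can be tried in plain Python before bringing Pandas in. This is only a minimal sketch using the built-in str.split() on the example string from above:

```python
# Split one record on the "-" delimiter using plain Python.
record = "Jack-032-$25"
fields = record.split("-")

print(fields)  # ['Jack', '032', '$25']
```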

In read_csv(), “delimiter” is an alias for the “sep” parameter, so you may use the two names interchangeably.
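A quick way to convince yourself that the two parameter names behave the same is to parse the same text with each one. The sketch below uses io.StringIO and a small made-up sample instead of a file on disk, purely for illustration:

```python
import io

import pandas as pd

# Small in-memory sample (hypothetical data, not one of the tutorial files).
text = "Name,Age\nWilliam,25\nBruce,45\n"

df_sep = pd.read_csv(io.StringIO(text), sep=",")
df_delimiter = pd.read_csv(io.StringIO(text), delimiter=",")

# Both calls produce identical DataFrames.
print(df_sep.equals(df_delimiter))  # True
```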

Using read_csv() to read Text Files with Delimiters:

The read_csv() function takes a file path and a sep argument, and returns a Pandas DataFrame. Note that, by default, the first row of the text file is used to create the column labels.

Space Delimiters:

We will store the following contents in a file called “data1.txt”.

Name Age Salary
William 25 $3900
Elizabeth 29 $4200
Bruce 45 $9740
Sam 36 $7600

First, we will demonstrate how to read data from a file and separate the data values on the space character. Simply passing sep=" " into the function will allow us to do this. Note that you could also pass sep=r"\s+", which matches one or more whitespace characters (useful when the number of spaces between values varies). Writing it as a raw string keeps Python from treating \s as an invalid escape sequence.

import pandas as pd

df = pd.read_csv("data1.txt", sep=" ")
print(df)

Output:

        Name  Age Salary
0    William   25  $3900
1  Elizabeth   29  $4200
2      Bruce   45  $9740
3        Sam   36  $7600
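To see why the whitespace pattern is useful, here is a small sketch that reads a variant of the data with uneven spacing between the columns. The sample text is an assumption made up for this illustration, and io.StringIO stands in for a real file:

```python
import io

import pandas as pd

# Hypothetical variant of data1.txt with uneven runs of spaces.
text = (
    "Name      Age  Salary\n"
    "William   25   $3900\n"
    "Elizabeth 29   $4200\n"
)

# r"\s+" matches one or more whitespace characters, so each run of
# spaces counts as a single separator.
df_ws = pd.read_csv(io.StringIO(text), sep=r"\s+")

print(df_ws.columns.tolist())  # ['Name', 'Age', 'Salary']
print(df_ws.shape)             # (2, 3)
```

With sep=" " the same text would not parse cleanly, because every single space would be treated as its own separator.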

Comma Delimiters:

We will store the following contents in a file called “data2.txt”.

Name,Age,Salary
William,25,$3900
Elizabeth,29,$4200
Bruce,45,$9740
Sam,36,$7600

This time, instead of assigning the space character to sep, we will assign it a comma. Just like in the previous example, the data values will be separated wherever the comma character appears. Since the comma is the default value of sep in read_csv(), it could also be left out here.

df = pd.read_csv("data2.txt", sep=",")
print(df)

Output:

        Name  Age Salary
0    William   25  $3900
1  Elizabeth   29  $4200
2      Bruce   45  $9740
3        Sam   36  $7600
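The sketch below shows two things about the comma example: leaving sep out gives the same result, and choosing a separator that does not appear in the data leaves every row in a single column. It uses a shortened in-memory copy of the data rather than data2.txt itself:

```python
import io

import pandas as pd

# Shortened copy of the comma-separated data, kept in memory.
text = "Name,Age,Salary\nWilliam,25,$3900\nSam,36,$7600\n"

df_default = pd.read_csv(io.StringIO(text))            # sep defaults to ","
df_explicit = pd.read_csv(io.StringIO(text), sep=",")
print(df_default.equals(df_explicit))  # True

# A separator that never occurs in the data: nothing gets split.
df_wrong = pd.read_csv(io.StringIO(text), sep=" ")
print(df_wrong.shape)             # (2, 1)
print(df_wrong.columns.tolist())  # ['Name,Age,Salary']
```

A DataFrame with only one column is usually the first sign that the separator does not match the file.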

Multiple Delimiters:

We will store the following contents in a file called “data3.txt”. The data values are separated by space, comma, hash, and hyphen characters.

Name Age Salary
William,25 $3900
Elizabeth#29 $4200
Bruce 45-$9740
Sam-36 $7600

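Next, we will discuss how to specify multiple separators. Any sep value longer than one character (other than "\s+") is interpreted as a regular expression. This means you can place the | character between separators to match any one of them, or you can enclose all the separator characters between [].

Note that we will also pass engine="python". The default C engine does not support regular expression separators, so without this argument Pandas falls back to the Python engine and emits a ParserWarning.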

The following example shows how to use the | character to separate different data values:

df = pd.read_csv("data3.txt", sep=",|#| |-", engine="python")
print(df)

Output:

        Name  Age Salary
0    William   25  $3900
1  Elizabeth   29  $4200
2      Bruce   45  $9740
3        Sam   36  $7600

The next example showcases the use of [] to separate different data values. This treats every character inside the square brackets as a separator. The hyphen is placed last so that it is read as a literal character rather than as a range:

df = pd.read_csv("data3.txt", sep="[,# -]", engine="python")
print(df)

Output:

        Name  Age Salary
0    William   25  $3900
1  Elizabeth   29  $4200
2      Bruce   45  $9740
3        Sam   36  $7600
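Because these sep values are ordinary regular expressions, both patterns can be tried out on single lines with Python's re module. This is only an illustration of how the patterns split text, not of how Pandas parses files internally:

```python
import re

# The data rows from data3.txt.
lines = [
    "William,25 $3900",
    "Elizabeth#29 $4200",
    "Bruce 45-$9740",
    "Sam-36 $7600",
]

for line in lines:
    with_pipe = re.split(r",|#| |-", line)    # alternation form
    with_class = re.split(r"[,# -]", line)    # character-class form
    print(with_pipe, with_pipe == with_class)

# Two separators in a row produce an empty field between them.
print(re.split(r"[,# -]", "a, b"))  # ['a', '', 'b']
```

Both forms split every line into the same three fields. The last line shows why data with back-to-back separators may need a pattern such as "[,# -]+" instead.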

Using read_table() to read Text Files with Delimiters:

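The read_table() function can also be used in place of read_csv(). The main difference is that its default separator is the tab character instead of the comma. As long as sep is passed explicitly, all prior examples also apply to read_table().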

df = pd.read_table("data3.txt", sep=",|#| |-", engine="python")
print(df)

Output:

        Name  Age Salary
0    William   25  $3900
1  Elizabeth   29  $4200
2      Bruce   45  $9740
3        Sam   36  $7600
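The difference in default separators is easiest to see with tab-separated data. The sketch below uses a made-up in-memory sample (not one of the tutorial files) and compares read_table() without sep against read_csv() with sep="\t":

```python
import io

import pandas as pd

# Hypothetical tab-separated sample.
tab_text = "Name\tAge\tSalary\nWilliam\t25\t$3900\nSam\t36\t$7600\n"

df_table = pd.read_table(io.StringIO(tab_text))          # sep defaults to "\t"
df_csv = pd.read_csv(io.StringIO(tab_text), sep="\t")

print(df_table.equals(df_csv))    # True
print(df_table.columns.tolist())  # ['Name', 'Age', 'Salary']
```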

This marks the end of the “Pandas – Read Text Files with Delimiters” Tutorial. Any suggestions or contributions for CodersLegacy are more than welcome. Questions regarding the tutorial content can be asked in the comments section below.
