We previously covered the different ways to read text files with Pandas. In this tutorial, we go into more detail on how to split the data read from a text file into separate columns with the help of delimiters.
What are Delimiters?
A delimiter is one or more characters used to mark the boundary between two data values. Let's explain this with an example. If we have a string that contains “Jack-032-$25” and we want to separate “Jack”, “032”, and “$25”, we can use “-” as the delimiter. Doing so tells the program where one data value ends and the next begins.
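As a minimal sketch of the idea in plain Python (no Pandas involved yet), the string from the example can be split on the hyphen with the built-in str.split(). The variable names here are only for illustration.

```python
# Split a hyphen-delimited string into its individual values.
record = "Jack-032-$25"
fields = record.split("-")

print(fields)  # ['Jack', '032', '$25']
```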
In Pandas, “delimiter” is an alias for the “sep” parameter, so you may pass either one.
Using read_csv() to read Text Files with Delimiters:
The read_csv() function takes a file name and a sep argument, among other parameters, and returns a Pandas DataFrame. Note that, by default, the first row of the text file is used to create the column labels.
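The header behaviour can be checked with a short sketch. It assumes a small in-memory sample (read through io.StringIO instead of a real file) and uses the header and names parameters of read_csv() to show the alternative; the labels "a" and "b" are made up for the illustration.

```python
import io
import pandas as pd

# In-memory stand-in for a small text file (hypothetical sample data).
text = "Name Age\nWilliam 25\nElizabeth 29\n"

# Default behaviour: the first row becomes the column labels.
df = pd.read_csv(io.StringIO(text), sep=" ")
print(list(df.columns))  # ['Name', 'Age']

# With header=None the first row is kept as data and the labels come from names.
raw = pd.read_csv(io.StringIO(text), sep=" ", header=None, names=["a", "b"])
print(len(df), len(raw))  # 2 3
```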
Space Delimiters:
We will store the following contents in a file called “data1.txt”.
Name Age Salary
William 25 $3900
Elizabeth 29 $4200
Bruce 45 $9740
Sam 36 $7600
First, we will demonstrate how to read data from a file and separate the data values on the space character. Simply passing sep=" " into the function allows us to do this. Note that you could also pass sep=r"\s+", a regular expression that matches one or more whitespace characters (useful when the values are separated by a variable number of spaces).
import pandas as pd
df = pd.read_csv("data1.txt", sep=" ")
print(df)
Output:
Name Age Salary
0 William 25 $3900
1 Elizabeth 29 $4200
2 Bruce 45 $9740
3 Sam 36 $7600
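The whitespace pattern mentioned earlier can be tried on data with uneven spacing. This is a small sketch that assumes a padded variant of the sample data, held in memory rather than in a file.

```python
import io
import pandas as pd

# Hypothetical variant of data1.txt where columns are padded with extra spaces.
text = (
    "Name      Age  Salary\n"
    "William   25   $3900\n"
    "Elizabeth 29   $4200\n"
)

# The raw string keeps the backslash intact; \s+ matches one or more
# whitespace characters, so runs of spaces count as a single separator.
df = pd.read_csv(io.StringIO(text), sep=r"\s+")

print(df.shape)          # (2, 3)
print(list(df.columns))  # ['Name', 'Age', 'Salary']
```

With sep=" " the same input would produce extra empty columns, because every single space would count as a separator.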
Comma Delimiters:
We will store the following contents in a file called “data2.txt”.
Name,Age,Salary
William,25,$3900
Elizabeth,29,$4200
Bruce,45,$9740
Sam,36,$7600
This time, instead of assigning the space character to sep, we will assign it a comma. Just like in the previous example, the data values are now split on the comma character.
df = pd.read_csv("data2.txt", sep=",")
print(df)
Output:
Name Age Salary
0 William 25 $3900
1 Elizabeth 29 $4200
2 Bruce 45 $9740
3 Sam 36 $7600
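Since the comma is the default value of sep in read_csv(), the argument can also be left out for comma-separated data. The sketch below compares both calls, using the same contents as data2.txt held in memory instead of a file.

```python
import io
import pandas as pd

# Same contents as data2.txt, held in memory.
text = (
    "Name,Age,Salary\n"
    "William,25,$3900\n"
    "Elizabeth,29,$4200\n"
    "Bruce,45,$9740\n"
    "Sam,36,$7600\n"
)

explicit = pd.read_csv(io.StringIO(text), sep=",")
default = pd.read_csv(io.StringIO(text))  # sep defaults to ","

print(explicit.equals(default))  # True
```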
Multiple Delimiters:
We will store the following contents in a file called “data3.txt”. The data values are separated by space, comma, hash, and hyphen characters.
Name Age Salary
William,25 $3900
Elizabeth#29 $4200
Bruce 45-$9740
Sam-36 $7600
Next, we will discuss how to assign multiple sep values. A sep value longer than one character is interpreted as a regular expression, so you can place the | character between the individual delimiters to match any one of them, or you can enclose all the delimiters between [] to form a character class.
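Note that we will also pass engine="python", because the default C parser does not support regular-expression separators such as these; without it, Pandas falls back to the Python parser and emits a warning.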
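The warning mentioned above can be observed with a short sketch. It assumes a recent Pandas version, where the warning category is pandas.errors.ParserWarning, and reads a two-row sample from memory rather than from a file.

```python
import io
import warnings
import pandas as pd

# Hypothetical two-row sample with mixed delimiters.
text = "Name Age Salary\nWilliam,25 $3900\nElizabeth#29 $4200\n"

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # No engine given: Pandas is expected to fall back to the Python
    # parser and report that it did so.
    df = pd.read_csv(io.StringIO(text), sep=",|#| ")

for w in caught:
    print(w.category.__name__, "-", w.message)
```

Passing engine="python" explicitly gives the same DataFrame without the warning.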
The following example shows how to use the | character to separate different data values:
df = pd.read_csv("data3.txt", sep=",|#| |-", engine="python")
print(df)
Output:
Name Age Salary
0 William 25 $3900
1 Elizabeth 29 $4200
2 Bruce 45 $9740
3 Sam 36 $7600
The next example showcases the use of [] to separate different data values. This treats every character inside the square brackets as a separator:
df = pd.read_csv("data3.txt", sep="[,# -]", engine="python")
print(df)
Output:
Name Age Salary
0 William 25 $3900
1 Elizabeth 29 $4200
2 Bruce 45 $9740
3 Sam 36 $7600
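To see why both forms give the same result, the two patterns can be tried directly with Python's re module, which uses the same regular-expression syntax. This is only a sketch on the four data rows of data3.txt; it does not involve Pandas.

```python
import re

# The four data rows of data3.txt.
lines = [
    "William,25 $3900",
    "Elizabeth#29 $4200",
    "Bruce 45-$9740",
    "Sam-36 $7600",
]

alternation = ",|#| |-"  # comma OR hash OR space OR hyphen
char_class = "[,# -]"    # any single character listed in the brackets

rows = [re.split(alternation, line) for line in lines]
rows_class = [re.split(char_class, line) for line in lines]

for row in rows:
    print(row)
print(rows == rows_class)  # True
```

The hyphen is placed last inside the brackets so that it is read as a literal character and not as a range.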
Using read_table() to read Text Files with Delimiters:
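The read_table() function can also be used in place of read_csv(). The two behave the same except that read_table() uses a tab as its default separator, so all prior examples also apply to read_table() as long as sep is passed explicitly.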
df = pd.read_table("data3.txt", sep=",|#| |-", engine="python")
print(df)
Output:
Name Age Salary
0 William 25 $3900
1 Elizabeth 29 $4200
2 Bruce 45 $9740
3 Sam 36 $7600
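The main practical difference between the two functions is the default separator: read_table() defaults to a tab. The sketch below assumes a tab-separated version of the sample data, held in memory, and compares the two calls.

```python
import io
import pandas as pd

# Hypothetical tab-separated version of the sample data.
text = "Name\tAge\tSalary\nWilliam\t25\t$3900\nElizabeth\t29\t$4200\n"

# read_table() defaults to sep="\t", so no separator is passed here.
from_table = pd.read_table(io.StringIO(text))
from_csv = pd.read_csv(io.StringIO(text), sep="\t")

print(from_table.equals(from_csv))  # True
```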
This marks the end of the “Pandas – Read Text Files with Delimiters” Tutorial. Any suggestions or contributions for CodersLegacy are more than welcome. Questions regarding the tutorial content can be asked in the comments section below.