CSV, or comma-separated values, is a way of storing tabular data in plain text files. It has its own file extension, .csv. As you may have guessed from the name, CSV’s identifying feature is that its data is separated by commas. So why do we use CSV files with Python?
Compared to other data storage formats, CSV has several benefits. For one, it’s human readable (just like regular text files) and you can edit it manually. It’s also compact and, most importantly, easy to parse, because every CSV file follows the same basic format: values separated by commas. The one downside appears when the data itself contains commas. That can make things complicated if not handled correctly, which is why such fields are normally wrapped in double quotes.
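The comma problem is usually handled by quoting. Here is a minimal sketch using Python's built-in csv module (covered in the rest of this article). The two-column sample is made up for illustration and is held in memory with io.StringIO rather than read from a file.

```python
import csv
import io

# Hypothetical sample: the first field of each data row contains a comma,
# so it is wrapped in double quotes.
sample = 'Name,City\n"Smith, Bob",London\n"Jones, Alice",Paris\n'

# csv.reader treats a quoted field as a single value, commas included.
rows = list(csv.reader(io.StringIO(sample)))

for row in rows:
    print(row)
```

Splitting each line on commas by hand would break "Smith, Bob" into two pieces, which is one reason to let the csv module do the parsing.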
This is the general layout of a CSV file. It’s just a regular text file with a header row of column names followed by one row per record, in a format we call CSV or comma-separated values.
column1,column2,column3
data1,data2,data3
data1,data2,data3
.
.
data1,data2,data3
In this article we’ll walk you through how to read, process and parse CSV files. There is nothing to install before we get started: the csv module is part of Python’s standard library, so a plain import csv is all you need.
Reading from CSV Files in Python
We’ll be using the following CSV file, saved as FileData.txt, in the examples below. Be sure to identify the columns of any CSV file before attempting to parse it.
Name,Age,Gender
Bob,20,Male
Alice,19,Female
Lara,23,Female
Jonah,23,Male
We’ll now take the first step and create a reader object. The CSV file we created is opened as a text file with the open() function, which returns a file object. The reader object created from this file object handles most of the parsing for you; all you have to do is loop over it.
import csv
with open("FileData.txt", 'r') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        print(row)
(You don’t have to use the with statement here. You can open the file just as you would during regular file handling, as long as you remember to close it yourself.)
['Name', 'Age', 'Gender']
['Bob', '20', 'Male']
['Alice', '19', 'Female']
['Lara', '23', 'Female']
['Jonah', '23', 'Male']
Each row is returned to you as a list of strings. You can access individual values through list indexing, such as row[0] for the first element of that row, and so on.
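As a rough sketch of list indexing in practice, the snippet below skips the header with next() and collects two of the columns. It assumes the same sample data as above, held in memory with io.StringIO so it runs without a file; the variable names are only illustrative.

```python
import csv
import io

# Same sample data as above, kept in memory instead of FileData.txt.
sample = (
    "Name,Age,Gender\n"
    "Bob,20,Male\n"
    "Alice,19,Female\n"
    "Lara,23,Female\n"
    "Jonah,23,Male\n"
)

reader = csv.reader(io.StringIO(sample))
header = next(reader)            # consume the header row

names = []
ages = []
for row in reader:
    names.append(row[0])         # first column
    ages.append(int(row[1]))     # values arrive as strings, so convert explicitly

print(header)
print(names)
print(ages)
```

Note that every value comes back as a string, so numeric columns such as Age need an explicit int() or float() conversion.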
Read CSV files with initial spaces
To keep things simple, we typically don’t leave spaces after the commas (delimiters). However, if you weren’t the one who wrote the CSV file, chances are there will be some. We’ll work with the following data set for this section. It’s identical to the one above save for the whitespace.
Name, Age, Gender
Bob, 20, Male
Alice, 19, Female
Lara, 23, Female
Jonah, 23, Male
In this case, pass an additional parameter called skipinitialspace and set it to True. This tells the reader to ignore any whitespace that immediately follows a delimiter.
import csv
with open("FileData.txt", 'r') as csvfile:
    reader = csv.reader(csvfile, skipinitialspace=True)
    for row in reader:
        print(row)
You may not see the immediate need for such a parameter, but once you actually begin parsing data, you’ll find that stray whitespace is an extremely annoying obstacle at times.
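To make the difference concrete, here is a small side-by-side sketch. It parses the same in-memory text twice, once with the default settings and once with skipinitialspace=True. The variable names are illustrative only.

```python
import csv
import io

# Two lines of the spaced-out data set, held in memory.
sample = "Name, Age, Gender\nBob, 20, Male\n"

# Default behaviour: the space after each comma stays in the value.
default_rows = list(csv.reader(io.StringIO(sample)))

# With skipinitialspace=True the leading space is dropped.
trimmed_rows = list(csv.reader(io.StringIO(sample), skipinitialspace=True))

print(default_rows[1])   # ['Bob', ' 20', ' Male']
print(trimmed_rows[1])   # ['Bob', '20', 'Male']
```

Without the parameter, a comparison such as row[2] == 'Male' would quietly fail, because the stored value is ' Male' with a leading space.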
CSV files with Custom Delimiters
By default, a comma is used as the delimiter in a CSV file. However, some CSV files use other delimiters. Some of the more popular ones are | and \t (tab).
You just need to add the delimiter parameter to the reader() function and assign it a delimiter of your choice. Try to make sure the delimiter is something unlikely to appear in the data itself, such as a character that is rarely used in ordinary text.
Name|Age|Gender
Bob|20|Male
Alice|19|Female
Lara|23|Female
Jonah|23|Male
import csv
with open("FileData.txt", 'r') as csvfile:
    reader = csv.reader(csvfile, delimiter='|')
    for row in reader:
        print(row)
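The same approach should work for tab-separated data. A minimal sketch, assuming a small in-memory sample in place of a real file:

```python
import csv
import io

# Tab-separated version of the sample data, held in memory.
sample = "Name\tAge\tGender\nBob\t20\tMale\nAlice\t19\tFemale\n"

# The delimiter must be a single character; "\t" is one tab character.
rows = list(csv.reader(io.StringIO(sample), delimiter="\t"))

for row in rows:
    print(row)
```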
Tip: Never name your Python file “csv.py”. The same goes for other libraries. When you use the import csv statement, Python will usually find your own file first and import it instead of the library.
Registering CSV Dialects
When you’re dealing with CSV Files on a large scale, for the sake of readability and to keep your line count small, creating dialects can come in handy. Below is an example of dialects in action.
import csv
csv.register_dialect('My_Setting',
                     delimiter='|',
                     skipinitialspace=True)
with open("FileData.txt", 'r') as csvfile:
    reader = csv.reader(csvfile, dialect="My_Setting")
    for row in reader:
        print(row)
The first parameter in the register_dialect() function is the name of the dialect. You will use this name to refer to the dialect later on.
['Name', 'Age', 'Gender']
['Bob', '20', 'Male']
['Alice', '19', 'Female']
['Lara', '23', 'Female']
['Jonah', '23', 'Male']
Using the register_dialect() function, you can create dialects that you can reuse across multiple reader objects. Similarly, you can create multiple dialects and swap between them as you wish.
If you want to delete a dialect that you created, you can use the unregister_dialect() function. Just pass it the name of the dialect and it will be removed.
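The life cycle of a dialect can be sketched as follows. This example uses csv.list_dialects(), which returns the names of all currently registered dialects, to check the result before and after removal. The flag names are only illustrative.

```python
import csv

# Register the dialect under a name of our choosing.
csv.register_dialect("My_Setting", delimiter="|", skipinitialspace=True)
registered = "My_Setting" in csv.list_dialects()

# Remove it again; unregistering an unknown name raises csv.Error.
csv.unregister_dialect("My_Setting")
still_registered = "My_Setting" in csv.list_dialects()

print(registered)        # True
print(still_registered)  # False
```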
Reading CSV Files as Dictionaries
Remember earlier we mentioned that all CSV data is returned as lists? That’s only the case if you use the reader() function. You can instead have each row returned as a dictionary by using DictReader, which takes its keys from the first row of the file. Some people may prefer the key-value pair format that the dictionary has.
import csv
csv.register_dialect('My_Setting',
                     delimiter='|',
                     skipinitialspace=True)
with open("FileData.txt", 'r') as csvfile:
    reader = csv.DictReader(csvfile, dialect="My_Setting")
    for row in reader:
        print(dict(row))
Notice that the column names are repeated with each data value.
{'Name': 'Bob', 'Age': '20', 'Gender': 'Male'}
{'Name': 'Alice', 'Age': '19', 'Gender': 'Female'}
{'Name': 'Lara', 'Age': '23', 'Gender': 'Female'}
{'Name': 'Jonah', 'Age': '23', 'Gender': 'Male'}
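The dict() call in the output is only needed on older Python versions. In Python 3.6 and 3.7, DictReader returns each row as an OrderedDict, which prints less cleanly; from Python 3.8 onward, rows are plain dictionaries and dict() makes no difference. If you’re curious, remove the dict() and run the code to see what your version prints.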
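The main benefit of the dictionary form is that you can refer to columns by name rather than by position. A minimal sketch, assuming the pipe-delimited sample from earlier held in memory; the filter on the Gender column is just an illustration.

```python
import csv
import io

# Pipe-delimited sample data, kept in memory instead of FileData.txt.
sample = (
    "Name|Age|Gender\n"
    "Bob|20|Male\n"
    "Alice|19|Female\n"
    "Lara|23|Female\n"
    "Jonah|23|Male\n"
)

reader = csv.DictReader(io.StringIO(sample), delimiter="|")

# Look values up by column name instead of by index.
females = [row["Name"] for row in reader if row["Gender"] == "Female"]

print(reader.fieldnames)   # column names taken from the first row
print(females)
```

Code written this way keeps working if the columns are later reordered, which is not true of row[0]-style indexing.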
Writing to CSV Files in Python
You won’t be writing to CSV Files as much as you will be reading from them, so we’ll keep this section a bit brief.
We’ll be using some of the data from our previous examples.
['Name', 'Age', 'Gender']
['Bob', '20', 'Male']
['Alice', '19', 'Female']
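The first thing to do is create the writer object by passing a file object to the csv.writer() function. Remember to open the file with 'w' (write) instead of 'r' (read), and pass newline='' so that the csv module controls the line endings itself; without it, extra blank rows can appear on Windows.
import csv
data = [['Name', 'Age', 'Gender'],
        ['Bob', '20', 'Male'],
        ['Alice', '19', 'Female']]
with open("Data.txt", 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)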
We now have two different ways to write the data. The first is to write it all in one go using the writerows() function, as shown below.
import csv
data = [['Name', 'Age', 'Gender'],
        ['Bob', '20', 'Male'],
        ['Alice', '19', 'Female']]
# newline='' lets the csv module write its own line endings
with open("Data.txt", 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerows(data)
The second option is to write it line by line with writerow(), which gives you more control over the process.
import csv
data = [['Name', 'Age', 'Gender'],
        ['Bob', '20', 'Male'],
        ['Alice', '19', 'Female']]
# newline='' lets the csv module write its own line endings
with open("Data.txt", 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    for row in data:
        writer.writerow(row)
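The writer also takes care of fields that contain commas, which was the downside mentioned at the start of the article. The sketch below writes to an in-memory io.StringIO buffer instead of Data.txt so the result can be inspected directly. The City column and its value are made up for illustration.

```python
import csv
import io

# Hypothetical data: the last field of the second row contains a comma.
data = [["Name", "Age", "City"],
        ["Bob", "20", "London, UK"]]

buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerows(data)

# With the default settings, only the field that needs it gets quoted.
written = buffer.getvalue()
print(repr(written))

# Reading the text back returns the original rows unchanged.
buffer.seek(0)
round_trip = list(csv.reader(buffer))
print(round_trip)
```

Because the writer quotes such fields automatically, there is no need to escape commas by hand before writing.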
This marks the end of the Python CSV Files article. Any suggestions or contributions for CodersLegacy are more than welcome. Questions about the above material can be asked in the comments section below.