Java CSV Parser – Apache Commons



This article covers Apache Commons CSV, a CSV parser library for Java.

CSV, or comma-separated values, is a way of storing data in text files. CSV files have their own extension, .csv. As you may have guessed from the name, CSV's identifying feature is that its data is separated by commas. So why do we use CSV files in Java?

CSV files have several benefits compared to other data storage formats. For one, they are human readable (just like regular text files), so you can edit them manually. They are also small and fast, and most importantly, they are easy to parse. The fact that CSV files have a standard format (values separated by commas) is what makes them easy to parse. The one downside of CSV files shows up when the data itself contains commas, which can make things complicated if not handled correctly.
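Apache Commons CSV, the library used in this article, handles embedded commas the usual CSV way: a value that contains the delimiter is wrapped in double quotes when it is written, and the quotes are removed again when it is read. The minimal sketch below shows that round trip using CSVFormat.format() and CSVParser.parse(String, CSVFormat). The class name CommaInValueDemo and the sample values are made up for illustration.

```java
import org.apache.commons.csv.*;
import java.io.IOException;
import java.io.UncheckedIOException;

public class CommaInValueDemo {
    // Formats one record; "Smith, John" contains a comma, so it gets quoted
    static String formatRow() {
        return CSVFormat.DEFAULT.format("Smith, John", 30, "Male");
    }

    // Parses the formatted line back and returns its first field
    static String firstFieldAfterRoundTrip() {
        try (CSVParser parser = CSVParser.parse(formatRow(), CSVFormat.DEFAULT)) {
            CSVRecord record = parser.getRecords().get(0);
            return record.get(0);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(formatRow());                // "Smith, John",30,Male
        System.out.println(firstFieldAfterRoundTrip()); // Smith, John
    }
}
```

The comma inside the quotes is treated as part of the value, so the name survives the round trip intact.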

This is the general syntax of CSV Files. It’s just a regular text file with data in a format we call CSV or Comma Separated Values.

column1,column2,column3
data1,data2,data3
data1,data2,data3
.
.
data1,data2,data3
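One detail worth knowing about this syntax: spaces are not separators. With Apache Commons CSV's standard CSVFormat.DEFAULT, a space after a comma is kept as part of the value, so a line written as data1, data2, data3 yields " data2" rather than "data2". The sketch below compares DEFAULT with a format that trims those spaces via withIgnoreSurroundingSpaces(true). SurroundingSpacesDemo and its method names are hypothetical, chosen for illustration.

```java
import org.apache.commons.csv.*;
import java.io.IOException;
import java.io.UncheckedIOException;

public class SurroundingSpacesDemo {
    // A line written with a space after each comma
    static final String LINE = "data1, data2, data3";

    // Parses LINE with the given format and returns its second field
    private static String secondField(CSVFormat format) {
        try (CSVParser parser = CSVParser.parse(LINE, format)) {
            return parser.getRecords().get(0).get(1);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // DEFAULT keeps the space that follows the comma
    static String defaultSecondField() {
        return secondField(CSVFormat.DEFAULT);
    }

    // withIgnoreSurroundingSpaces(true) trims spaces around each value
    static String trimmedSecondField() {
        return secondField(CSVFormat.DEFAULT.withIgnoreSurroundingSpaces(true));
    }

    public static void main(String[] args) {
        System.out.println("[" + defaultSecondField() + "]"); // [ data2]
        System.out.println("[" + trimmedSecondField() + "]"); // [data2]
    }
}
```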

We'll be using Apache Commons CSV, a CSV parser for Java developed by the Apache Software Foundation. It was designed to be a simple and easy way to interact with CSV files in Java, and, as you'll see further on, it supports many different CSV formats and styles.

In this article, we'll cover how to both read from and write to CSV files, starting with reading.

Reading from CSV Files in Java

We'll use the following CSV file in the examples below. Be sure to identify the columns of any CSV file before attempting to parse it.

Name,Age,Gender
Bob,20,Male
Alice,19,Female
Lara,23,Female

CSVParser

There are different approaches you can take to read CSV data. The one we're going to show in this section uses the CSVParser class.

CSVParser parser;
		
FileReader Data = new FileReader("Data.txt");
parser = CSVParser.parse(Data, CSVFormat.DEFAULT.withFirstRecordAsHeader());

We declare a CSVParser variable called parser, which we'll use to parse data from the CSV file. We also create a FileReader object, passing the path of the CSV file as a parameter. (The file extension doesn't matter: a .txt file works too, as long as its contents follow the CSV format.)

CSVParser.parse() takes two parameters: the FileReader object and the CSVFormat that describes how the data should be read. (More on CSVFormat later.)

The .withFirstRecordAsHeader() method is used when the first line of the CSV file contains column names, as in the example above. It tells the parser to use that line as the header rather than as data. Column headers are covered in more detail later in this article.
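To see what withFirstRecordAsHeader() changes, the sketch below parses the sample data from a String (using CSVParser.parse(String, CSVFormat), so no file is needed) and counts the records returned with and without it. HeaderRecordDemo is an illustrative name, and the data is the sample file shown above.

```java
import org.apache.commons.csv.*;
import java.io.IOException;
import java.io.UncheckedIOException;

public class HeaderRecordDemo {
    // The sample file from above, held in a String
    static final String DATA = "Name,Age,Gender\nBob,20,Male\nAlice,19,Female\nLara,23,Female\n";

    // Counts the records returned, optionally treating the first line as the header
    static int countRecords(boolean firstRecordIsHeader) {
        CSVFormat format = firstRecordIsHeader
                ? CSVFormat.DEFAULT.withFirstRecordAsHeader()
                : CSVFormat.DEFAULT;
        try (CSVParser parser = CSVParser.parse(DATA, format)) {
            return parser.getRecords().size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countRecords(false)); // 4: the header line is returned as data
        System.out.println(countRecords(true));  // 3: the header line is consumed
    }
}
```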

Since there are many rows in the file, you have to iterate over the parser to get each individual record. Each record is a CSVRecord object. We can then call the get() method on a record, passing a column name as the parameter, to return the corresponding value in that record.

System.out.println(csvRecord.get("Name"));

The full example code is shown below.

import org.apache.commons.csv.*;
import java.io.*;

public class Example {
	public static void main(String[] args) throws IOException {
		// try-with-resources closes the parser and the underlying FileReader when done
		try (FileReader Data = new FileReader("Data.txt");
		     CSVParser parser = CSVParser.parse(Data, CSVFormat.DEFAULT.withFirstRecordAsHeader())) {

			// Each iteration returns one data row; the header row is not returned
			for (CSVRecord csvRecord : parser) {
				System.out.println(csvRecord.get("Name"));
				System.out.println(csvRecord.get("Age"));
				System.out.println(csvRecord.get("Gender"));
			}
		}
	}
}

One thing to note is that the column names are not displayed in the output. This is because withFirstRecordAsHeader() told the parser that the first record contains column names, so the parser uses that row as the header instead of returning it as data.

Bob
20
Male
Alice
19
Female
Lara
23
Female

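Alternatively, you can use the getRecords() method to read all of the records at once into a List. You can then iterate over the list as shown below, which produces the same output as the previous example. Note that getRecords() returns the records from the parser's current position onward, so use it instead of looping over the parser, not after the loop: once a parser has been iterated, there are no records left to return.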

import java.util.List;

List<CSVRecord> records = parser.getRecords();
for (CSVRecord row : records) {
	System.out.println(row.get("Name"));
	System.out.println(row.get("Age"));
	System.out.println(row.get("Gender"));
}	
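Because getRecords() returns an ordinary java.util.List, you also get size() and access by position, which the for loop over the parser does not offer. A minimal sketch, again reading the sample data from a String instead of Data.txt (GetRecordsDemo and its method names are hypothetical):

```java
import org.apache.commons.csv.*;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.List;

public class GetRecordsDemo {
    static final String DATA = "Name,Age,Gender\nBob,20,Male\nAlice,19,Female\nLara,23,Female\n";

    // Reads all records into a List, then picks one row by position
    static String nameAt(int row) {
        try (CSVParser parser = CSVParser.parse(DATA, CSVFormat.DEFAULT.withFirstRecordAsHeader())) {
            List<CSVRecord> records = parser.getRecords();
            return records.get(row).get("Name");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Number of data records (the header line is not counted)
    static int recordCount() {
        try (CSVParser parser = CSVParser.parse(DATA, CSVFormat.DEFAULT.withFirstRecordAsHeader())) {
            return parser.getRecords().size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(recordCount()); // 3
        System.out.println(nameAt(1));     // Alice
    }
}
```

The trade-off is memory: getRecords() holds every record at once, while the for loop processes one record at a time.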

Parsing without Headers

Chances are you'll come across a CSV file that does not have a header (column names). In such a situation, you can't use the method shown above. For one, you will have to remove the .withFirstRecordAsHeader() method from CSVFormat; otherwise the first row of data would be treated as the header.

Secondly, you can no longer retrieve values by their column names, only by their indexes. Indexes start at zero: to get the value in the first column, pass 0 to the get() method; for the second column, pass 1, and so on.

import org.apache.commons.csv.*;
import java.io.*;

public class Example {
	public static void main(String[] args) throws IOException {
		// Assumes Data.txt has no header row; otherwise the header line is printed as an ordinary record
		try (FileReader Data = new FileReader("Data.txt");
		     CSVParser parser = CSVParser.parse(Data, CSVFormat.DEFAULT)) {

			// Values are retrieved by zero-based column index instead of by name
			for (CSVRecord csvRecord : parser) {
				System.out.println(csvRecord.get(0));
				System.out.println(csvRecord.get(1));
				System.out.println(csvRecord.get(2));
			}
		}
	}
}
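The example above hard-codes the indexes 0, 1 and 2. If you don't know the number of columns in advance, CSVRecord.size() returns the number of values in a record, so you can loop over the indexes instead. A sketch with inline sample data that has no header row (IndexAccessDemo and allValues() are hypothetical names):

```java
import org.apache.commons.csv.*;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class IndexAccessDemo {
    // Sample data without a header row
    static final String DATA = "Bob,20,Male\nAlice,19,Female\n";

    // Collects every value, walking each record by index from 0 to size() - 1
    static List<String> allValues() {
        List<String> values = new ArrayList<>();
        try (CSVParser parser = CSVParser.parse(DATA, CSVFormat.DEFAULT)) {
            for (CSVRecord record : parser) {
                for (int i = 0; i < record.size(); i++) {
                    values.add(record.get(i));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(allValues()); // [Bob, 20, Male, Alice, 19, Female]
    }
}
```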

If you come across a CSV file with no column names, or you simply want to define your own header, you can use the withHeader() method when creating the parser.

parser = CSVParser.parse(Data, CSVFormat.DEFAULT.withHeader("Name", "Age", "Gender"));

You can use the getHeaderNames() method to get the header names defined for a parser.

parser = CSVParser.parse(Data, CSVFormat.DEFAULT.withHeader("Name", "Age", "Gender"));
System.out.println(parser.getHeaderNames());

[Name, Age, Gender]
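One thing to keep in mind: a header passed to withHeader() only names the columns; it does not skip the first line of the file. If the file already has its own header row and you want to use different names, that row comes back as an ordinary record unless you also add withSkipHeaderRecord(true). The sketch below demonstrates this with inline data; the column names FullName, Years and Sex, the class name and the helper method are made up for illustration.

```java
import org.apache.commons.csv.*;
import java.io.IOException;
import java.io.UncheckedIOException;

public class RenameHeaderDemo {
    // Sample data that already has a header row
    static final String DATA = "Name,Age,Gender\nBob,20,Male\nAlice,19,Female\n";

    // Returns a column of the first record, using our own column names
    static String firstValue(boolean skipHeaderRecord, String column) {
        CSVFormat format = CSVFormat.DEFAULT
                .withHeader("FullName", "Years", "Sex")
                .withSkipHeaderRecord(skipHeaderRecord);
        try (CSVParser parser = CSVParser.parse(DATA, format)) {
            return parser.getRecords().get(0).get(column);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Without skipping, the file's own header line is the first record
        System.out.println(firstValue(false, "FullName")); // Name
        // With skipping, the first record is Bob's row
        System.out.println(firstValue(true, "FullName"));  // Bob
        System.out.println(firstValue(true, "Years"));     // 20
    }
}
```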

CSVFormat Settings

So far we have used the DEFAULT format with CSVFormat. The format is the second argument to CSVParser.parse(), so switching to another predefined format only means changing that argument:

parser = CSVParser.parse(Data, CSVFormat.EXCEL);
parser = CSVParser.parse(Data, CSVFormat.DEFAULT);
parser = CSVParser.parse(Data, CSVFormat.TDF);

Let's take a look at some of the formats available to us and what makes them different.

  • RFC4180 – Comma separated format as defined by RFC 4180.
  • DEFAULT – Similar to RFC4180, but ignores empty lines between rows of data. This is the format used in the examples in this article.
  • EXCEL – Similar to RFC4180, but allows missing column names and, unlike DEFAULT, does not skip empty lines. Intended for CSV data from Microsoft Excel; note that the delimiter Excel actually uses depends on the system locale.
  • TDF – Tab-delimited format, which uses tabs (\t) as delimiters instead of commas.
  • MONGODB_CSV and MONGODB_TSV – Designed to be compatible with data from the MongoDB database. _CSV is for comma separated and _TSV for tab separated values.

Other predefined formats include INFORMIX_UNLOAD, INFORMIX_UNLOAD_CSV, MYSQL, ORACLE, POSTGRESQL_CSV and POSTGRESQL_TEXT.
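To see how the choice of format changes parsing, here is a minimal sketch with inline sample data: TDF splits a line on tabs, and DEFAULT skips the blank line between two rows. PredefinedFormatsDemo and its method names are illustrative.

```java
import org.apache.commons.csv.*;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.List;

public class PredefinedFormatsDemo {
    // Parses text with the given format and returns every record
    private static List<CSVRecord> parseAll(String text, CSVFormat format) {
        try (CSVParser parser = CSVParser.parse(text, format)) {
            return parser.getRecords();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // TDF splits on tabs, so the second field is the age
    static String tdfSecondField() {
        return parseAll("Bob\t20\tMale", CSVFormat.TDF).get(0).get(1);
    }

    // DEFAULT ignores the empty line between the two rows
    static int defaultRecordCount() {
        return parseAll("Bob,20,Male\n\nAlice,19,Female\n", CSVFormat.DEFAULT).size();
    }

    public static void main(String[] args) {
        System.out.println(tdfSecondField());     // 20
        System.out.println(defaultRecordCount()); // 2
    }
}
```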


Writing to CSV Files

To write to CSV files we are going to use the CSVPrinter class. The first thing we do is declare a CSVPrinter variable called printer.

CSVPrinter printer;
		
FileWriter writer = new FileWriter("Data.txt");
printer = new CSVPrinter(writer, CSVFormat.DEFAULT.withHeader("Name", "Age", "Gender"));

Next we create a FileWriter object using the path of the CSV file to be written to. Note that new FileWriter("Data.txt") overwrites the file if it already exists. Then we create the printer object with the CSVPrinter constructor, which takes two parameters: the FileWriter object, writer, and the CSVFormat of our choosing. If you want column names, be sure to include the .withHeader() method.

Finally, we call the printer's printRecord() method, passing one value per column. printRecord() accepts any number of values, so try to keep the number of values equal to the number of column names. Each call to printRecord() writes a single line (one record) to the file in CSV format.

printer.printRecord("Person1", 20, "Female");
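To check exactly what printRecord() produces without touching the file system, a CSVPrinter can also write to any Appendable, such as a StringBuilder. The sketch below (PrintRecordDemo is a hypothetical name) prints the header and two records; the second call passes a List, using the printRecord(Iterable) overload instead of separate arguments. CSVFormat.DEFAULT ends each record with \r\n.

```java
import org.apache.commons.csv.*;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

public class PrintRecordDemo {
    // Prints a header and two records into a StringBuilder instead of a file
    static String render() {
        StringBuilder out = new StringBuilder();
        try (CSVPrinter printer = new CSVPrinter(out, CSVFormat.DEFAULT.withHeader("Name", "Age", "Gender"))) {
            // Varargs form: one argument per column
            printer.printRecord("Person1", 20, "Female");
            // Iterable form: the values come from a List
            printer.printRecord(Arrays.asList("Person2", 23, "Male"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Prints three lines: the header followed by the two records
        System.out.print(render());
    }
}
```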

Here is the full example code.

import org.apache.commons.csv.*;
import java.io.*;

public class Example {
	public static void main(String[] args) throws IOException {
		// try-with-resources closes the printer, which also closes the FileWriter
		try (FileWriter writer = new FileWriter("Data.txt");
		     CSVPrinter printer = new CSVPrinter(writer, CSVFormat.DEFAULT.withHeader("Name", "Age", "Gender"))) {

			printer.printRecord("Person1", 20, "Female");
			printer.printRecord("Person2", 23, "Male");
			printer.printRecord("Person3", 18, "Male");

			// Pushes any buffered output to the file
			printer.flush();
		}
	}
}

One thing to note is that the header row is also written to the file, as you can see in the output below.

Name,Age,Gender
Person1,20,Female
Person2,23,Male
Person3,18,Male

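Also, don't forget to flush or close the printer. FileWriter buffers its output, so records that are never flushed may not appear in the file. Calling printer.flush() pushes them out, and closing the printer (for example with try-with-resources) closes the underlying FileWriter, which writes out any remaining buffered data.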
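As a sketch of the close-based approach, the example below writes two records to a temporary file inside try-with-resources, with no explicit flush() call, and then reads the file back with CSVParser to confirm the data arrived. The temporary file (created with Files.createTempFile) stands in for Data.txt, and the class and method names are hypothetical.

```java
import org.apache.commons.csv.*;
import java.io.*;
import java.nio.file.Files;
import java.nio.file.Path;

public class WriteThenReadDemo {
    // Writes two records, then reads back the Name of the first one
    static String firstNameWritten() {
        try {
            Path file = Files.createTempFile("people", ".csv");
            try {
                // Closing the printer closes the FileWriter, which writes out its buffer
                try (FileWriter writer = new FileWriter(file.toFile());
                     CSVPrinter printer = new CSVPrinter(writer, CSVFormat.DEFAULT.withHeader("Name", "Age", "Gender"))) {
                    printer.printRecord("Person1", 20, "Female");
                    printer.printRecord("Person2", 23, "Male");
                }
                // Read the file back, treating its first line as the header
                try (FileReader reader = new FileReader(file.toFile());
                     CSVParser parser = CSVParser.parse(reader, CSVFormat.DEFAULT.withFirstRecordAsHeader())) {
                    return parser.getRecords().get(0).get("Name");
                }
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(firstNameWritten()); // Person1
    }
}
```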


This marks the end of the Java Apache Commons CSV Parser article. Any suggestions or contributions for CodersLegacy are more than welcome. Relevant questions can be asked in the comments section below.
