Convert Excel file to CSV with Apache POI

In this article, you will learn how to convert an Excel file to a CSV file with Apache POI in java. Here you will see the conversion of the Excel file of both formats XLS and XLSX to CSV.

comma-separated values (CSV) file is a delimited text file that uses a comma to separate values. Each line of the file is a data record and each record consists of one or more fields separated by delimiters. A CSV file typically stores tabular data in plain text, in which each line will have the same number of fields.

How to convert excel files to CSV?

We can easily convert an Excel file to a CSV file using Apache POI. The Apache POI is a pure Java library to work with Microsoft Excel file format. We will see here the conversion of both types of Excel .xls or .xlsx file format to a CSV file.

Steps to Convert Excel to CSV

Follow the simple steps to convert an excel file into a CSV file:

  1. Create a FileInputStream object of a given Excel file.
  2. Now create HSSFWorkbook or XSSFWorkbook object based on Excel (.xls or .xlsx) file format.
  3. Get the Sheet from workbook using getSheet().
  4. Iterate each sheet and each row of the sheet to read each cell value.
  5. Now based on cell type read the cell value and store them with a comma delimiter(,) in a string as CSV data.
  6. Now write the CSV format string data to a .csv file.

Let’s see the below java code to understand it better.

ConvertExcelToCSV.java

package com.javacodepoint.excel;

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.util.Iterator;

import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

public class ConvertExcelToCSV {

	public static void main(String[] args) {

		// Creating a inputFile object with specific file path
		File inputFile = new File("E:/Excel/employee.xlsx");

		// Creating a outputFile object to write excel data to csv
		File outputFile = new File("E:/Excel/employee.csv");

		// For storing data into CSV files
		StringBuffer data = new StringBuffer();

		try {
			// Creating input stream
			FileInputStream fis = new FileInputStream(inputFile);
			Workbook workbook = null;

			// Get the workbook object for Excel file based on file format
			if (inputFile.getName().endsWith(".xlsx")) {
				workbook = new XSSFWorkbook(fis);
			} else if (inputFile.getName().endsWith(".xls")) {
				workbook = new HSSFWorkbook(fis);
			} else {
				fis.close();
				throw new Exception("File not supported!");
			}

			// Get first sheet from the workbook
			Sheet sheet = workbook.getSheetAt(0);

			// Iterate through each rows from first sheet
			Iterator<Row> rowIterator = sheet.iterator();

			while (rowIterator.hasNext()) {
				Row row = rowIterator.next();
				// For each row, iterate through each columns
				Iterator<Cell> cellIterator = row.cellIterator();
				while (cellIterator.hasNext()) {

					Cell cell = cellIterator.next();

					switch (cell.getCellType()) {
					case BOOLEAN:
						data.append(cell.getBooleanCellValue() + ",");
						break;

					case NUMERIC:
						data.append(cell.getNumericCellValue() + ",");
						break;

					case STRING:
						data.append(cell.getStringCellValue() + ",");
						break;

					case BLANK:
						data.append("" + ",");
						break;

					default:
						data.append(cell + ",");
					}
				}
				// appending new line after each row
				data.append('\n');
			}

			FileOutputStream fos = new FileOutputStream(outputFile);
			fos.write(data.toString().getBytes());
			fos.close();

		} catch (Exception e) {
			e.printStackTrace();
		}

		System.out.println("Conversion of an Excel file to CSV file is done!");
	}

}

Advanced Excel to CSV file Converter

Now we will see some advanced concepts to converting or change Excel files to CSV files in Java. We will implement the following features to this converter:

  1. Where the Excel workbook contains more than one worksheet, then a single CSV file will contain the data from all of the worksheets.
  2. The data matrix contained in the CSV file will be square. This means that the number of fields in each record of the CSV file will match the number of cells in the longest row found in the Excel workbook. Any short records will be ‘padded’ with empty fields – an empty field is represented in the CSV file in this way – ,,.
  3. Empty fields will represent missing cells.
  4. A record consisting of empty fields will be used to represent an empty row in the Excel workbook.
  5. Support for embedded characters; the field separator, EOL, and double quotes or speech marks. In addition, giving the client the ability to select how these are handled, either obeying Excel’s or UNIX formatting conventions.

Let’s assume we have the following Excel file;

convert excel to csv java code

Then, the resulting CSV file will contain the following lines (records);

1,2,3,4,5
,,,,
,A,,B,
,,,,Z
“1,400”,,250,,

Typically, the comma is used to separate each of the fields that, together, constitute a single record or line within the CSV file. This is not however a hard and fast rule, this program allows the user to determine which character is used as the field separator.

If a field contains the separator then it will be escaped. If the file should obey Excel’s CSV formatting rules, then the field will be surrounded with speech marks whilst if it should obey UNIX conventions, each occurrence of the separator will be preceded by the backslash character.

If a field contains an end-of-the-line (EOL) character then it too will be escaped. If the file should obey Excel’s CSV formatting rules then the field will again be surrounded by speech marks. On the other hand, if the file should follow UNIX conventions then a single backslash will precede the EOL character. There is no single applicable standard for UNIX and some applications replace the CR with \r and the LF with \n but this converter will not do so.

If the field contains double quotes then that character will be escaped. It seems as though UNIX does not define a standard for this whilst Excel does. Should the CSV file have to obey Excel’s formatting rules then the speech mark character will be escaped with a second set of speech marks. Finally, an enclosing set of speech marks will also surround the entire field. Thus, if the following line of text appeared in a cell – “Hello” he said – it would look like this when converted into a field within a CSV file – “””Hello”” he said”.

Finally, it is worth noting that talk of CSV ‘standards’ is really slightly misleading as there is no such thing. It may well be that the code in this converter has to be modified to produce files to suit a specific application or requirement.

Excel to CSV file converter Java code

Let’s see the Java code for this converter program below:

AdvancedExcelToCSVConverter.java

package com.javacodepoint.excel;

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.ArrayList;

import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.CellType;
import org.apache.poi.ss.usermodel.DataFormatter;
import org.apache.poi.ss.usermodel.FormulaEvaluator;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.ss.usermodel.WorkbookFactory;

/**
 * 
 * @author javacodepoint.com
 *
 */
public class AdvancedExcelToCSVConverter {

	public static final int EXCEL_STYLE_ESCAPING = 0;
	public static final int UNIX_STYLE_ESCAPING = 1;
	private static final String DEFAULT_SEPARATOR = ",";
	private static final String CSV_FILE_EXTENSION = ".csv";
	private ArrayList<ArrayList<String>> csvData;
	private int maxRowWidth;
	
	/*
	 * This method convert the contents of the Excel workbook into CSV format and
	 * save the resulting file to the specified folder using the same name as the
	 * original workbook with the .xls or .xlsx extension replaced by .csv
	 */
	public void convertExcelToCSV(String strSource, String strDestination, String separator, int formattingConvention)
			throws FileNotFoundException, IOException, IllegalArgumentException {

		// Check that the source file exists.
		File sourceFile = new File(strSource);
		if (!sourceFile.exists()) {
			throw new IllegalArgumentException("The source Excel file cannot be found at " + sourceFile);
		}

		// Check that the destination folder exists to save the CSV file.
		File destination = new File(strDestination);
		if (!destination.exists()) {
			throw new IllegalArgumentException(
					"The destination directory " + destination + " for the " + "converted CSV file(s) does not exist.");
		}
		if (!destination.isDirectory()) {
			throw new IllegalArgumentException(
					"The destination " + destination + " for the CSV file is not a directory/folder.");
		}

		// Checking the value of formattingConvention parameter is within range.
		if (formattingConvention != ToCSV.EXCEL_STYLE_ESCAPING && formattingConvention != ToCSV.UNIX_STYLE_ESCAPING) {
			throw new IllegalArgumentException("The value passed to the "
					+ "formattingConvention parameter is out of range: " + formattingConvention + ", expecting one of "
					+ AdvancedExcelToCSVConverter.EXCEL_STYLE_ESCAPING + " or "
					+ AdvancedExcelToCSVConverter.UNIX_STYLE_ESCAPING);
		}
		FileInputStream fis = null;
		Workbook workbook = null;
		try {
			fis = new FileInputStream(sourceFile);
			System.out.println("Opening workbook [" + sourceFile.getName() + "]");
			workbook = WorkbookFactory.create(fis);

			// Convert it's contents into a CSV file
			convertToCSV(workbook);

			// Build the name of the csv folder from that of the Excel workbook.
			// Simply replace the .xls or .xlsx file extension with .csv
			String destinationFilename = sourceFile.getName();
			destinationFilename = destinationFilename.substring(0, destinationFilename.lastIndexOf('.'))
					+ CSV_FILE_EXTENSION;

			// Save the CSV file away using the newly constructed file name
			// and to the specified directory.
			saveCSVFile(new File(destination, destinationFilename), formattingConvention);

		} catch (Exception e) {
			System.out.println("Unexpected exception");
			e.printStackTrace();
		} finally {
			if (fis != null) {
				fis.close();
			}
			if (workbook != null) {
				workbook.close();
			}
		}
	}

	/*
	 * Checks to see whether the field - which consists of the formatted contents of
	 * an Excel worksheet cell encapsulated within a String - contains any embedded
	 * characters that must be escaped. The method is able to comply with either
	 * Excel's or UNIX formatting conventions in the following manner;
	 */
	private String escapeEmbeddedCharacters(String field, int formattingConvention) {

		StringBuilder buffer;

		// If the fields contents should be formatted to conform with Excel's
		// convention....
		if (AdvancedExcelToCSVConverter.EXCEL_STYLE_ESCAPING == formattingConvention) {

			// Firstly, check if there are any speech marks (") in the field;
			// each occurrence must be escaped with another set of speech marks
			// and then the entire field should be enclosed within another
			// set of speech marks. Thus, "Yes" he said would become
			// """Yes"" he said"
			if (field.contains("\"")) {
				buffer = new StringBuilder(field.replace("\"", "\\\"\\\""));
				buffer.insert(0, "\"");
				buffer.append("\"");
			} else {
				// If the field contains either embedded separator or EOL
				// characters, then escape the whole field by surrounding it
				// with speech marks.
				buffer = new StringBuilder(field);
				if ((buffer.indexOf(DEFAULT_SEPARATOR)) > -1 || (buffer.indexOf("\n")) > -1) {
					buffer.insert(0, "\"");
					buffer.append("\"");
				}
			}
			return buffer.toString().trim();
		}
		// The only other formatting convention this class obeys is the UNIX one
		// where any occurrence of the field separator or EOL character will
		// be escaped by preceding it with a backslash.
		else {
			if (field.contains(DEFAULT_SEPARATOR)) {
				field = field.replaceAll(DEFAULT_SEPARATOR, ("\\\\" + DEFAULT_SEPARATOR));
			}
			if (field.contains("\n")) {
				field = field.replace("\n", "\\\\\n");
			}
			return field;
		}
	}

	/*
	 * Called to convert the contents of the currently opened workbook into a CSV
	 * file.
	 */
	private void convertToCSV(Workbook workbook) {

		// Create the FormulaEvaluator and DataFormatter instances
		// that will be needed to, respectively,
		// force evaluation of formulae found in cells and create a
		// formatted String encapsulating the cells contents.
		FormulaEvaluator evaluator = workbook.getCreationHelper().createFormulaEvaluator();
		DataFormatter formatter = new DataFormatter(true);

		// Initialize csvData with empty
		this.csvData = new ArrayList<>();

		System.out.println("Converting files contents to CSV format.");

		// Discover how many sheets there are in the workbook and then iterate through
		// them.
		int numSheets = workbook.getNumberOfSheets();

		for (int i = 0; i < numSheets; i++) {

			// Get a reference to a sheet and check to see if it contains
			// any rows.
			Sheet sheet = workbook.getSheetAt(i);
			if (sheet.getPhysicalNumberOfRows() > 0) {
				// Note down the index number of the bottom-most row and
				// then iterate through all of the rows on the sheet starting
				// from the very first row - number 1 - even if it is missing.
				// Recover a reference to the row and then call another method
				// which will strip the data from the cells and build lines
				// for inclusion in the restyling CSV file.
				int lastRowNum = sheet.getLastRowNum();
				for (int j = 0; j <= lastRowNum; j++) {
					Row row = sheet.getRow(j);
					// this.rowToCSV(row);
					ArrayList<String> csvLine = new ArrayList<>();

					// Get the index for the right most cell on the row and then
					// step along the row from left to right recovering the contents
					// of each cell, converting that into a formatted String and
					// then storing the String into the csvLine ArrayList.
					int lastCellNum = row.getLastCellNum();
					for (int k = 0; k <= lastCellNum; k++) {
						Cell cell = row.getCell(k);
						if (cell == null) {
							csvLine.add("");
						} else {
							if (cell.getCellType() != CellType.FORMULA) {
								csvLine.add(formatter.formatCellValue(cell));
							} else {
								csvLine.add(formatter.formatCellValue(cell, evaluator));
							}
						}
					}
					// Make a note of the index number of the right most cell. This value
					// will later be used to ensure that the matrix of data in the CSV file
					// is square.
					if (lastCellNum > this.maxRowWidth) {
						this.maxRowWidth = lastCellNum;
					}

					this.csvData.add(csvLine);
				}
			}
		}
	}

	/*
	 * Called to actually save the data recovered from the Excel workbook as a CSV
	 * file.
	 */
	private void saveCSVFile(File file, int formattingConvention) throws IOException {
		ArrayList<String> line;
		StringBuilder buffer;
		String csvLineElement;

		// Open a writer onto the CSV file.
		try (BufferedWriter bw = Files.newBufferedWriter(file.toPath(), StandardCharsets.ISO_8859_1)) {

			System.out.println("Saving the CSV file [" + file.getName() + "]");

			// Step through the elements of the ArrayList that was used to hold
			// all of the data recovered from the Excel workbooks' sheets, rows
			// and cells.
			for (int i = 0; i < this.csvData.size(); i++) {
				buffer = new StringBuilder();

				// Get an element from the ArrayList that contains the data for
				// the workbook. 
				line = this.csvData.get(i);
				for (int j = 0; j < this.maxRowWidth; j++) {
					if (line.size() > j) {
						csvLineElement = line.get(j);
						if (csvLineElement != null) {
							buffer.append(escapeEmbeddedCharacters(csvLineElement, formattingConvention));
						}
					}
					if (j < (this.maxRowWidth - 1)) {
						buffer.append(DEFAULT_SEPARATOR);
					}
				}

				// Once the line is built, write it away to the CSV file.
				bw.write(buffer.toString().trim());

				// Condition the inclusion of new line characters so as to
				// avoid an additional, superfluous, new line at the end of
				// the file.
				if (i < (this.csvData.size() - 1)) {
					bw.newLine();
				}
			}
		}
	}

	/*
	 * Testing the Excel to CSV converter program using the main method
	 */
	public static void main(String[] args) {
		long startTime = System.currentTimeMillis();
		boolean converted = true;
		try {
			AdvancedExcelToCSVConverter converter = new AdvancedExcelToCSVConverter();
			String strSource = "E:/Excel/employee.xlsx";
			String strDestination = "E:/Excel/";
			converter.convertExcelToCSV(strSource, strDestination, DEFAULT_SEPARATOR, EXCEL_STYLE_ESCAPING);
		} catch (Exception e) {
			System.out.println("Unexpected exception");
			e.printStackTrace();
			converted = false;
		}

		if (converted) {
			System.out.println("Conversion took " + ((System.currentTimeMillis() - startTime) / 1000) + " seconds");
		}
	}

}

Convert multiple Excel to CSV

In some scenarios, you may have the requirements like converting multiple Excel files into a single CSV file. Let’s see how you can do it easily with some modifications in the above examples.

You just assume your all excel files are available in a single directory or folder. Now see the following java code to read all the excel files:

	 // Check to see if the sourceFolder variable holds a reference to
     // a folder of excel files.
     final File[] filesList;
     if(source.isDirectory()) {
         // Get a list of all of the Excel spreadsheet files (workbooks) in
         // the source folder/directory
         filesList = source.listFiles(new ExcelFilenameFilter());
     }
	 
	 for(File excelFile : filesList) {
		//open the workbook
		//read csv data from workbook
		//save the data to csv file
	 }
	 
	 
	 /**
	  * An instance of this class can be used to control the files returned
	  * be a call to the listFiles() method when made on an instance of the
	  * File class and that object refers to a folder/directory
	  */
	 class ExcelFilenameFilter implements FilenameFilter {
		@Override
		 public boolean accept(File file, String name) {
			 return(name.endsWith(".xls") || name.endsWith(".xlsx"));
		 }
	 }

Conclusion

In this article, you have seen how you convert excel files to CSV files using the Apache POI library in java. You have also seen some advanced features to change the XLS or XLSX file to a CSV file.

You can download the maven application of this converter program from the end of this article.

FAQ

What is the delimiter in a CSV file?

A comma-separated values (CSV) file is a delimited text file that uses a comma to separate values. Each line of the file is a data record. Each record consists of one or more fields, separated by commas.

How to convert excel to CSV with delimiter?

The data present in the CSV file must be separated with delimiters such as commas, semicolons, etc. In this article, we have explained how you convert Excel to CSV with a comma(,) delimiter.

How to convert an excel file to a text file with comma-delimited?

You can refer to this article, the logic will be the same only you have to do here is save the data in a file with a .txt extension instead of .csv.

Related Articles:

You may also like:


Share with friends

1 thought on “Convert Excel file to CSV with Apache POI”

Leave a Comment

Your email address will not be published. Required fields are marked *