
March 2016

Volume 31 Number 3

[Modern Apps]


By Frank La Vigne

Parsing a Comma Separated Value (CSV) file sounds easy enough at first. Quickly, however, the task becomes more and more intricate as the pain points of CSV files become clear. If you’re not familiar with the format, CSV files store data in plain text. Each line in the file constitutes a record. Each record has its fields typically delineated by a comma, hence the name.

Developers today enjoy standards among data exchange formats. The CSV file “format” harks back to an earlier time in the software industry, before JSON and before XML. While there’s a Request for Comments (RFC) for CSV files (bit.ly/1NsQlvw), it doesn’t enjoy official status. Additionally, it was created in 2005, decades after CSV files started to appear in the 1970s. As a result, there’s quite a bit of variation among CSV files and the rules are a bit murky. For instance, a CSV file might have fields separated by tabs, semicolons or any other character.

In practical terms, the Excel implementation of CSV import and export has become the de-facto standard and is seen most widely in the industry, even outside the Microsoft ecosystem. Accordingly, the assumptions I make in this article about what constitutes “correct” parsing and formatting will be based upon how Excel imports/exports CSV files. While most CSV files will fall in line with the Excel implementation, not every file will. Toward the end of this column, I introduce a strategy to handle such uncertainty.

A fair question to ask is, “Why even write a parser for a decades-old quasi-format in a very new platform?” The answer is simple: Many organizations have legacy data systems. Thanks to the file format’s long life span, nearly all of these legacy data systems can export to CSV. Furthermore, it costs very little in terms of time and effort to export data to CSV. Accordingly, there are plenty of CSV-formatted files in larger enterprise and government data sets.

Designing an All-Purpose CSV Parser

Despite the lack of an official standard, CSV files typically share some common traits.

Generally speaking, CSV files: are plain text, contain one record per line, have the fields in each record separated by a delimiter, use one-character delimiters and present fields in the same order.


These common traits outline a general algorithm, which would consist of three steps:

  1. Split a string along the line delimiter.
  2. Split each line along the field delimiter.
  3. Assign each field value to a variable.

This would be fairly easy to implement. The code in Figure 1 parses the CSV input string into a List<Dictionary<string, string>>.

Figure 1 Parsing the CSV Input String into List<Dictionary<string,string>>

This approach works great using an example like the following office divisions and their sales numbers:

To retrieve values from the string, you would iterate through the List and pull out values in the Dictionary using the zero-based field index. Retrieving the office division field, for example, would be as simple as this:
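The listings referenced above did not survive in this copy of the column. As a stand-in, here is a minimal sketch of the three-step approach, not the author's original Figure 1. The class name `NaiveCsvParser`, the method name `Parse` and the sample values in the usage note are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper illustrating the three steps; not the original Figure 1.
public static class NaiveCsvParser
{
    public static List<Dictionary<string, string>> Parse(
        string input, char fieldDelimiter = ',')
    {
        var records = new List<Dictionary<string, string>>();

        // Step 1: split the string along the line delimiter.
        string[] lines = input.Split(
            new[] { "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries);

        foreach (string line in lines)
        {
            // Step 2: split each line along the field delimiter.
            string[] fields = line.Split(fieldDelimiter);

            // Step 3: assign each field value to a slot keyed by its
            // zero-based index (stored as a string key).
            var record = new Dictionary<string, string>();
            for (int i = 0; i < fields.Length; i++)
            {
                record[i.ToString()] = fields[i];
            }
            records.Add(record);
        }
        return records;
    }
}
```

With made-up input such as `"East,73,8300\nSouth,42,3000"`, the first field of the first record would be read as `records[0]["0"]`. This sketch deliberately ignores quoted fields and header rows.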

While this works, the code isn’t as readable as it could be.

A Better Dictionary

Many CSV files include a header row for the field name. The parser would be easier for developers to consume if it used the field name as a key for the dictionary. As any given CSV file might not have a header row, you should add a property to convey this information:

For instance, a sample CSV file with a header row might look something like this:


Ideally, the CSV parser would be able to take advantage of this piece of metadata. This would make the code more readable. Retrieving the office division field would look something like this:
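The original listings for this section are missing from this copy, so the following is only a sketch of the idea under stated assumptions. The property name `HasHeaderRow`, the class name `HeaderAwareCsvParser` and the header names used in the usage note are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical parser that keys each record by field name when a header exists.
public class HeaderAwareCsvParser
{
    // Assumed property name: true when the first line holds the field names.
    public bool HasHeaderRow { get; set; }

    public List<Dictionary<string, string>> Parse(string input)
    {
        var records = new List<Dictionary<string, string>>();
        string[] lines = input.Split(
            new[] { "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries);
        if (lines.Length == 0)
        {
            return records;
        }

        // With a header row, the first line supplies keys and data starts at line 1.
        string[] header = HasHeaderRow ? lines[0].Split(',') : new string[0];
        int firstDataLine = HasHeaderRow ? 1 : 0;

        for (int row = firstDataLine; row < lines.Length; row++)
        {
            string[] fields = lines[row].Split(',');
            var record = new Dictionary<string, string>();
            for (int i = 0; i < fields.Length; i++)
            {
                // Fall back to the zero-based index when no header name
                // is available for this column.
                string key = i < header.Length ? header[i] : i.ToString();
                record[key] = fields[i];
            }
            records.Add(record);
        }
        return records;
    }
}
```

Assuming a header line of `Office Division,Employees,Unit Sales`, a lookup then reads as `records[0]["Office Division"]` rather than `records[0]["0"]`.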

Blank Fields

Blank fields occur commonly in data sets. In CSV files, a blank field is represented by having an empty field in a record. The delimiter is still required. For example, if there were no Employee data for the East office, the record would look like this:

If there were no Unit Sales data, as well as no Employee data, the record would look like this:
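The example records are missing from this copy. As an illustration only, the sketch below uses made-up values to show how blank fields keep their position when a line is split. `BlankFieldDemo` and `SplitFields` are hypothetical names.

```csharp
using System;

public static class BlankFieldDemo
{
    // string.Split keeps empty entries by default, so a blank field comes
    // back as an empty string and the fields after it keep their positions.
    public static string[] SplitFields(string line, char delimiter = ',')
    {
        return line.Split(delimiter);
    }
}
```

Under these assumptions, `SplitFields("East,,8300")` yields three entries with an empty string in the middle, and `SplitFields("East,,")` also yields three entries, the last two being empty. The delimiters are what preserve the field count.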

[Figure 3 (not reproduced): sample quotation data, including a quotation attributed to President Roosevelt and "Logic will get you from A to B. Imagination will take you everywhere." attributed to Albert Einstein]

The data in Figure 3 would be represented in CSV as this:


It might be clearer now that the field is wrapped in quotation marks and that the individual quotation marks in the field’s content are doubled up.
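The quoting rule described here (wrap the field in quotation marks, double any quotation marks inside it) can be sketched in both directions. This is an illustrative helper under assumed names (`QuotedFieldHelper`, `Escape`, `SplitLine`), not code from the article's project, and it works on a single line, so it does not handle line breaks inside quoted fields.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class QuotedFieldHelper
{
    // Wrap a value in quotation marks and double any quotation marks inside it.
    public static string Escape(string value)
    {
        return "\"" + value.Replace("\"", "\"\"") + "\"";
    }

    // Split one record into fields, treating delimiters inside quotation
    // marks as content and a doubled quotation mark as one literal mark.
    public static List<string> SplitLine(string line, char delimiter = ',')
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;

        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (inQuotes)
            {
                if (c == '"' && i + 1 < line.Length && line[i + 1] == '"')
                {
                    current.Append('"'); // doubled mark becomes one literal mark
                    i++;                 // skip the second mark of the pair
                }
                else if (c == '"')
                {
                    inQuotes = false;    // closing quotation mark
                }
                else
                {
                    current.Append(c);   // delimiters here are plain content
                }
            }
            else if (c == '"')
            {
                inQuotes = true;         // opening quotation mark
            }
            else if (c == delimiter)
            {
                fields.Add(current.ToString());
                current.Clear();
            }
            else
            {
                current.Append(c);
            }
        }
        fields.Add(current.ToString());  // last field has no trailing delimiter
        return fields;
    }
}
```

Round-tripping a value through `Escape` and then `SplitLine` should return the original text as a single field, even when it contains commas or quotation marks.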

Edge Cases

As I mentioned in the opening section, not all files will adhere to the Excel implementation of CSV. The lack of a true specification for CSV makes it difficult to write one parser to handle every CSV file in existence. Edge cases will most certainly exist and that means the code has to leave a door open to interpretation and customization.

Inversion of Control to the Rescue

Given the CSV format’s hazy standard, it’s not practical to write a comprehensive parser for all imaginable cases. It might be better to write a parser that suits the particular needs of an app. Using Inversion of Control lets you customize a parsing engine for those needs.

To accomplish this, I’ll create an interface to outline the two core functions of parsing: extracting records and extracting fields. I decided to make the IParserEngine interface asynchronous. This makes sure any app using this component will remain responsive no matter how large the CSV file is:

Then I add the following property to the CSVParser class:

I then offer developers a choice: use the default parser or inject their own. To make it simple, I’ll overload the constructor:

The CSVParser class now provides the basic infrastructure, but the actual parsing logic sits behind the IParserEngine interface. For developers’ convenience, I created the DefaultParserEngine, which can process most CSV files. I took into account the most likely scenarios developers will encounter.
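The interface, property and constructor listings are missing from this copy. The sketch below shows one way the pieces described above could fit together. The type names `IParserEngine`, `DefaultParserEngine` and `CSVParser` come from the text; every member name (`ExtractRecordsAsync`, `ExtractFieldsAsync`, `ParserEngine`, `FieldDelimiter`, `ParseAsync`) is an assumption, and this default engine is far simpler than the one the column describes.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Assumed shape: one asynchronous method per core parsing function.
public interface IParserEngine
{
    Task<IList<string>> ExtractRecordsAsync(string csvText);
    Task<IList<string>> ExtractFieldsAsync(string record, char fieldDelimiter);
}

// Simplified stand-in for the default engine; it ignores quoted fields.
public class DefaultParserEngine : IParserEngine
{
    public Task<IList<string>> ExtractRecordsAsync(string csvText)
    {
        // Task.Run moves the work off the calling (UI) thread.
        return Task.Run<IList<string>>(() => csvText.Split(
            new[] { "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries));
    }

    public Task<IList<string>> ExtractFieldsAsync(string record, char fieldDelimiter)
    {
        return Task.Run<IList<string>>(() => record.Split(fieldDelimiter));
    }
}

public class CSVParser
{
    // The engine that does the actual parsing work.
    public IParserEngine ParserEngine { get; private set; }

    public char FieldDelimiter { get; set; } = ',';

    // Default constructor falls back to the built-in engine.
    public CSVParser() : this(new DefaultParserEngine()) { }

    // Overload that lets callers inject their own engine.
    public CSVParser(IParserEngine parserEngine)
    {
        if (parserEngine == null)
        {
            throw new ArgumentNullException(nameof(parserEngine));
        }
        ParserEngine = parserEngine;
    }

    public async Task<List<IList<string>>> ParseAsync(string csvText)
    {
        var rows = new List<IList<string>>();
        foreach (string record in await ParserEngine.ExtractRecordsAsync(csvText))
        {
            rows.Add(await ParserEngine.ExtractFieldsAsync(record, FieldDelimiter));
        }
        return rows;
    }
}
```

Consuming code calls `ParseAsync` the same way whether it constructs `new CSVParser()` or passes a custom engine, which is the point of the injection.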

Reader Challenge

I have taken into account the bulk of scenarios developers will encounter with CSV files. However, the indefinite nature of the CSV format makes creating a universal parser for all cases impractical. Factoring in all the variations and edge cases would add significant cost and complexity, along with impacting performance.

I’m certain that there are CSV files out “in the wild” that the DefaultParserEngine will not be able to handle. This is what makes the dependency injection pattern a great fit. If developers need a parser that can handle an extreme edge case, or want to write something more performant, they certainly are welcome to do so. Parser engines can be swapped out with no changes to the consuming code.

The code for this project is available at bit.ly/1To1IVI.


Wrapping Up

CSV files are a leftover from days gone by and, despite the best efforts of XML and JSON, are still a commonly used data exchange format. CSV files lack a common specification or standard and, while they often share common traits, those traits are not certain to be present in any given file. This makes parsing a CSV file a non-trivial exercise.

Given a choice, most developers would probably exclude CSV files from their solutions. However, their widespread presence in legacy enterprise and government data sets may preclude that as an option in many scenarios.

Simply put, there is a need for a CSV parser for Universal Windows Platform (UWP) apps, and a real-world CSV parser has to be flexible and robust. Along the way, I demonstrated a practical use for dependency injection to provide that flexibility. While this column and its associated code target UWP apps, the concept and code apply to other platforms capable of running C#, such as Microsoft Azure or Windows desktop development.

Frank La Vigne is a technology evangelist on the Microsoft Technology and Civic Engagement team, where he helps users leverage technology in order to create a better community. He blogs regularly at FranksWorld.com and has a YouTube channel called Frank’s World TV (youtube.com/FranksWorldTV).


Thanks to the following technical expert for reviewing this article: Rachel Appel