CSV vs. Delimited Flat Files: How to Choose | ThoughtSpot

If you work with data, eventually you're going to want to move it from one database to another. No big deal, right? You use the native table exporter to produce a nifty text file. Then you use the target database's loader, and the data is magically reconstituted. 

But when you run a report, the data is unrecognizable, with bizarre characters, transposed fields, and disappearing NULL values. "Why is this not easier?" you wail. 

I can't remove all of your pain in one blog post, but I'll start with the basics and build from here in future posts. The simplest concept in data loading is the mighty flat file. 

Understanding flat files in depth is the first step to mastering data loading. Flat files are the universal mechanism for moving data from one database or system to another. There are two common types of flat files: CSV (comma separated values) and delimited files. Both are file formats that represent relational data in a text file. 

What's the difference between CSV and delimited files? 

CSV files include both a delimiter and an optional enclosing character. 

  • A delimiter separates the data fields. It is usually a comma, but can also be a pipe, a tab, or any single value character. 

  • An enclosing character occurs at the beginning and the end of a value. It is sometimes called a quote character (because it is usually double quotes), but you can use another character instead.

Delimited files only have a delimiter; an enclosing character is not used. 

In practice, I've found that CSV is the only format that will guarantee seamless data movement between systems. But there are other instances where delimited files make sense, like when you're more comfortable working with this type of file and the delimiter itself does not occur within the data.

Simply put, there's a time and place to use each. We'll dive deeper into best practices for creating and manipulating the two file formats over the next few weeks. Stay tuned!