Overview
CSV is the simplest and most universally supported format for tabular data exchange. A CSV file represents a table as plain text: each line is a row, and values within each row are separated by commas (or sometimes semicolons, tabs, or other delimiters depending on regional conventions). The first line often serves as a header row naming the columns. This extreme simplicity means CSV files can be opened in any spreadsheet application, parsed by any programming language, imported into any database, and processed by any ETL pipeline.
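The line/row and header structure described above can be seen in a few lines using Python's standard `csv` module (a minimal sketch; the column names and values are illustrative):

```python
import csv
import io

# Header row names the columns; each subsequent line is one record.
text = "name,age,city\nAda,36,London\nGrace,45,Arlington\n"

rows = list(csv.reader(io.StringIO(text)))
print(rows[0])   # ['name', 'age', 'city']  (the header row)
print(rows[1])   # ['Ada', '36', 'London']
```

Note that the parser returns the header as an ordinary row; it is up to the consumer to treat the first record as column names.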
CSV owes its longevity to its transparency. There is no binary encoding, no compression, no schema — just text that a human can read in a terminal window. A developer can write a CSV parser in a few lines of code, and debugging data issues is as simple as opening the file in a text editor. This accessibility makes CSV the lowest-common-denominator format for data interchange between systems that may have nothing else in common.
Despite its apparent simplicity, CSV has subtleties that catch the unwary. Values containing commas, newlines, or quotation marks must be enclosed in double quotes, and literal double quotes within quoted values must be escaped by doubling them. There is no standard way to represent data types (numbers, dates, and booleans are all plain text), no encoding declaration, and no way to embed metadata about the dataset. RFC 4180, the closest thing to a formal standard, defines the most common conventions but is informational rather than normative.
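The quoting and escaping rules can be demonstrated by round-tripping a value that contains a comma, a newline, and a literal double quote through Python's standard `csv` module (a minimal sketch):

```python
import csv
import io

# A field containing a comma, a newline, and a literal double quote.
tricky = 'She said, "hi"\nbye'

buf = io.StringIO()
csv.writer(buf).writerow(["id", tricky])
encoded = buf.getvalue()
# The writer wraps the field in double quotes and doubles the embedded
# quote characters, exactly as the rules above require.
print(encoded)

# Reading the encoded text back recovers the original value unchanged.
row = next(csv.reader(io.StringIO(encoded)))
assert row == ["id", tricky]
```

The round trip only works because writer and reader agree on the same quoting convention; mismatched dialects are a common source of the parsing failures mentioned later.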
History
Comma-separated data formats predate personal computing. IBM Fortran supported list-directed I/O with comma-separated values as early as 1972. The concept was natural and obvious: write values with a delimiter between them. As personal computers and spreadsheet software (VisiCalc in 1979, Lotus 1-2-3 in 1983, Excel in 1985) became widespread, CSV emerged as the standard way to import and export tabular data between applications.
The IETF published RFC 4180 (Common Format and MIME Type for Comma-Separated Values) in 2005, codifying the most widely used conventions: CRLF line endings, optional header row, double-quote escaping, and the text/csv MIME type. Despite this, CSV remains a convention rather than a strict standard, and variations abound: European systems often use semicolons as delimiters (because the comma is the decimal separator in many European locales), tab-separated values (TSV) are common in scientific contexts, and some systems use pipe characters or other delimiters.
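Most CSV libraries expose the delimiter as a parameter, so the dialect variations above need only a one-argument change; a sketch with Python's `csv` module (the sample data is illustrative):

```python
import csv
import io

# The same logical table in three common dialects.
comma = "name,price\nwidget,1.50\n"
semi  = "name;price\nwidget;1,50\n"   # European locales: ';' delimiter, ',' decimal
tsv   = "name\tprice\nwidget\t1.50\n"

def parse(text: str, delim: str) -> list[list[str]]:
    # Only the delimiter changes; the quoting rules stay the same.
    return list(csv.reader(io.StringIO(text), delimiter=delim))

print(parse(comma, ","))
print(parse(semi, ";"))
print(parse(tsv, "\t"))
```

In practice the delimiter often has to be guessed from the file contents (or configured per data source), since nothing in the file declares it.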
Technical Details
A CSV file is plain text (typically UTF-8 or ASCII) with records delimited by line breaks and fields delimited by a separator character (most commonly a comma). Per RFC 4180: fields containing the delimiter, double quotes, or line breaks must be enclosed in double quotes; double quotes within a quoted field are escaped by doubling them; each record should have the same number of fields; and the optional header record has the same format as data records.
There is no type system — every value is a string that the consumer must parse. Dates are notoriously problematic because different producers use different formats (2024-01-15, 01/15/2024, 15-Jan-2024). Numbers may use dots or commas as decimal separators depending on locale. Empty fields are represented by consecutive delimiters or empty quoted strings. Most parsers handle common variations gracefully, but edge cases around encoding, newlines within quoted fields, and trailing delimiters continue to cause interoperability issues.
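Because every field arrives as a string, the consumer typically layers its own type coercion on top of the parse. A hedged sketch (the column names and formats are illustrative assumptions, not part of any standard):

```python
import csv
import io
from datetime import datetime

text = "date,qty,price,active\n2024-01-15,3,19.99,true\n"

def coerce(row: dict) -> dict:
    # Every value in `row` is a str; the consumer must know (or guess)
    # each column's intended type and format.
    return {
        "date": datetime.strptime(row["date"], "%Y-%m-%d").date(),
        "qty": int(row["qty"]),
        "price": float(row["price"]),
        "active": row["active"].lower() == "true",
    }

records = [coerce(r) for r in csv.DictReader(io.StringIO(text))]
print(records[0])
```

The `strptime` format string is exactly the kind of per-producer knowledge that CSV cannot carry in-band: a file using `01/15/2024` would need different code.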
Pros & Cons
Pros
- Universal support — every spreadsheet, database, and programming language handles CSV
- Human-readable plain text that can be inspected and edited in any text editor
- Minimal overhead — no markup, file headers, or framing beyond the raw delimited values
- Trivial to generate programmatically with simple string concatenation
- Extremely compact for simple tabular data compared to XML or JSON alternatives
Cons
- No data type information — everything is a string that consumers must interpret
- No standard encoding declaration, leading to character encoding issues
- Quoting and escaping edge cases cause frequent parsing failures
- Cannot represent hierarchical or nested data structures
- No way to include metadata, schemas, or documentation within the file
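The tension between "trivial to generate with string concatenation" and "quoting edge cases cause parsing failures" is worth making concrete: naive concatenation produces corrupt output the moment a value contains the delimiter. A sketch (the sample values are illustrative):

```python
import csv
import io

row = ["Smith, John", "Accounting"]

# Naive generation: looks fine until a value contains a comma.
naive = ",".join(row)
print(naive)   # the embedded comma now reads as a field separator

# A conforming writer quotes the field, so the row survives a round trip.
buf = io.StringIO()
csv.writer(buf).writerow(row)
print(buf.getvalue())
```

Reading `naive` back yields three fields instead of two, while the writer's quoted output parses back to the original row; this is why generation should go through a CSV library even though string joining "works" in simple cases.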
Common Use Cases
- Exporting and importing data between spreadsheet applications and databases
- Providing bulk data downloads from government open-data portals
- Feeding data into ETL pipelines for transformation and warehouse loading
- Exchanging financial transaction records between banking and accounting systems
- Sharing scientific datasets and experimental results in academic research
- Migrating contact lists, product catalogs, and CRM records between platforms