Comma-delimited text, or comma-separated variables/values (CSV), is a popular format for storing tabular record data in TextFiles. Most spreadsheets and databases support import and export to this format.
Compared to ExtensibleMarkupLanguage, CSV is more compact and simpler to parse.
However, XML can handle a few things that CSV can't. In CSV:
- There is no way to specify the character encoding.
- There is no easy (visual) way to specify hierarchy. (Although hierarchy is often not needed -- LimitsOfHierarchies)
- There is no standard for specifying the schema (but one could include a header row of column names).
Compared to TabDelimitedTables, CSV is ...
CSV has the PowerOfPlainText compared to binary formats.
Can't CSV files really be something other than comma separated? From a Microsoft web page:
If a user selects English (United States), the decimal symbol is a period (for example, 3.14). If a user selects German (Germany), the decimal symbol is a comma (for example, 3,14). Similarly, the list separator character used in .csv files is a comma (,) in the United States but a semicolon (;) in Germany.
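A minimal sketch of reading such a locale-dependent file, assuming Python's standard csv module. The sample data and the column names are made up for illustration; the only point is that the delimiter has to be stated explicitly, and that decimal commas need separate handling.

```python
import csv
import io

# Hypothetical sample: semicolon-separated fields with decimal commas,
# roughly what a German-locale spreadsheet export might look like.
german_export = "name;price\nWidget;3,14\nGadget;2,50\n"

# Tell the reader that the list separator is a semicolon, not a comma.
reader = csv.reader(io.StringIO(german_export), delimiter=";")
rows = list(reader)

# The csv module does not interpret numbers, so the decimal comma
# must be converted to a period before calling float().
prices = [float(row[1].replace(",", ".")) for row in rows[1:]]
```

Reading the same text with the default comma delimiter would instead split "3,14" into two fields, which is why the separator cannot simply be guessed from the .csv extension.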
''Someone once wrote:''
- Commas themselves must be escaped when used in field data.
Huh? If I need to put a string containing one or more commas into a CSV file, the string must be quoted -- but the quotes go once around the entire string. There's no need to escape each individual comma. (In XML, each and every less-than sign in the data must be escaped with &lt;.)
- So then any field containing both a comma and a quotation mark must be surrounded by quotation marks, and the quotation marks in the field must be escaped. You can't escape some form of escaping ...
From what I've seen, typically CSV strings are encoded using C-style escape characters.
CSV strings are also frequently encoded using two quotation marks to represent one inside the string.
Also, newlines in a quoted string may sometimes be converted to C escapes and sometimes left unconverted.
It's also common for quotation marks within quotation marks to be left unescaped; this is normally considered a bug but tends not to get fixed.
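The two escaping conventions mentioned above (C-style backslash escapes versus doubled quotation marks) can be compared side by side. This is only a sketch using Python's standard csv module; the helper name write_row and the sample field are invented for illustration.

```python
import csv
import io

# A field that contains both a comma and quotation marks.
field = 'He said "hi", then left'

def write_row(row, **fmt):
    """Hypothetical helper: format one record and return it as text."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n", **fmt).writerow(row)
    return buf.getvalue()

# Convention 1: an embedded quote is written as two quotes.
doubled = write_row([field, "x"])

# Convention 2: an embedded quote is preceded by a backslash.
escaped = write_row([field, "x"], doublequote=False, escapechar="\\")

# Each form only reads back correctly when the reader
# is configured with the same convention as the writer.
back_doubled = next(csv.reader(io.StringIO(doubled)))
back_escaped = next(
    csv.reader(io.StringIO(escaped), doublequote=False, escapechar="\\")
)
```

In both cases the quotes go once around the whole field; the conventions differ only in how a quotation mark inside the field is represented, which is exactly where files from different producers tend to disagree.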
RFC4180 recommends quoting fields containing double-quote, comma and/or newline, and doubling double-quotes within the fields. According to RFC4180, there's nothing special about backslash.
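The RFC4180 recommendation is small enough to sketch by hand. The function names quote_field and format_record below are made up for this example, and the round trip through Python's csv module is just a sanity check, not part of the RFC.

```python
import csv
import io

def quote_field(value: str) -> str:
    """Quote a field only if it contains a double quote, comma, CR or LF."""
    if any(ch in value for ch in '",\r\n'):
        # Double every embedded double quote, then wrap the whole field.
        return '"' + value.replace('"', '""') + '"'
    return value

def format_record(fields):
    # RFC4180 uses CRLF as the record terminator.
    return ",".join(quote_field(f) for f in fields) + "\r\n"

# The backslash in the last field is left alone: it is not special.
record = ["plain", "a,b", 'say "hi"', "two\nlines", "back\\slash"]
line = format_record(record)

# Round-trip through the standard csv module as a sanity check.
# newline="" keeps the CRLF and the embedded LF untouched.
parsed = next(csv.reader(io.StringIO(line, newline="")))
```

Note that the embedded newline survives inside the quoted field, so a reader that splits the file on line breaks before parsing would cut this record in two.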
See also:
TabDelimitedTables,
ExtensibleMarkupLanguage,
RelationalAlternativeToXml