Software such as SuperCalc, dBase II, Excel, and many other RegisteredTrademarks have been able to emit TabDelimitedTables for years. Often, these tables have a header row that names the field in each column below it. This facilitates data interchange between various pieces of software.
Rod Manis, Evan Schaffer, and Robert Jørgensen have written a book titled Unix Relational Database Management (ISBN 013938622x). They describe a suite of tools which they collectively call /rdb.
If a table uses something other than tab as the field separator, it is more generally a DelimiterSeparatedValues file.
Their tools operate on TabDelimitedTables in a Unix shell environment and interoperate with other Unix tools. I found the suite just as I was learning the Bourne Shell. It operates on tab-delimited tables with "headlines" (see below).
- Without indexing, some operations, such as joins, are very difficult to do. I was *forced* to use a tool like that once, and I almost went insane trying to work around it.
The table object has been quite useful to me, and to others I work with - including the computers I interact with. With tables of named information, both the computer and I can do operations on named columns, such as sort or subtotal.
With the promise of XML, humans and machines will be able to refer to the same sorts of information. If a computer were to join StateAbbreviations on the ST field, it could name the capital of the state with area code 513. Can you?
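A minimal sketch of such a join, assuming two small tables that share an ST field. The table contents, the `areacode` field name, and the `join_on` helper are illustrative assumptions, not part of any tool mentioned here. A dictionary stands in for the index that makes the join cheap.

```python
# Illustrative data only; the table layouts and field names are assumptions.
state_abbreviations = [
    {"ST": "OH", "state": "Ohio", "capital": "Columbus"},
    {"ST": "NY", "state": "New York", "capital": "Albany"},
]
area_codes = [
    {"areacode": "513", "ST": "OH"},
    {"areacode": "212", "ST": "NY"},
]

def join_on(left, right, field):
    """Join two lists of row dicts on a shared field name."""
    # Build a lookup from field value to matching rows of the right table.
    index = {}
    for row in right:
        index.setdefault(row[field], []).append(row)
    joined = []
    for lrow in left:
        for rrow in index.get(lrow[field], []):
            joined.append({**rrow, **lrow})
    return joined

joined = join_on(area_codes, state_abbreviations, "ST")
capital_for_513 = [r["capital"] for r in joined if r["areacode"] == "513"]
print(capital_for_513)
```

Because both tables name the column ST, the join needs no knowledge of column positions.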
Headlines are bits of metadata included within the result of the query.
In /rdb, from RevolutionarySoftware, a headline consists of a tab-delimited list of fields followed by a tab-delimited string of hyphens underlining each field.
In /RDB, Hobbes replaces the hyphens with an N for a numeric field, and with M for a field with a month.
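As a rough illustration of the hyphen-underlined form, the sketch below reads such a table into a list of dictionaries keyed by field name. The function name `read_rdb_table` and the sample data are assumptions made for this example. It is not code from /rdb.

```python
import io

def read_rdb_table(stream):
    """Read a tab-delimited table whose first line names the fields and
    whose second line underlines each field with hyphens."""
    lines = [line.rstrip("\n") for line in stream]
    fields = lines[0].split("\t")
    dashes = lines[1].split("\t")
    # The underline row must have one all-hyphen entry per field.
    if len(dashes) != len(fields) or any(set(d) - {"-"} for d in dashes):
        raise ValueError("second line is not a hyphen row matching the field names")
    rows = []
    for line in lines[2:]:
        if line:  # skip blank lines
            rows.append(dict(zip(fields, line.split("\t"))))
    return fields, rows

# Hypothetical sample table.
sample = (
    "payee\tamount\n"
    "-----\t------\n"
    "Alice\t100\n"
    "Bob\t250\n"
)
fields, rows = read_rdb_table(io.StringIO(sample))
print(fields, rows)
```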
The names of the fields in the headline simplify pipelines by allowing a field to be referred to by name. Imagine a table called checkbook with a headline of checknumber date payee amount withholding memo.
Dataflow for making W2 forms might be about as simple as:
cat checkbook | sorttable payee | subtotal -l payee amount withholding
or, more simply:
sorttable payee < checkbook | subtotal -l payee amount withholding
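For readers without the /rdb tools, here is a sketch of what that pipeline appears to compute. It assumes that `subtotal -l payee amount withholding` sums the amount and withholding columns for each payee. The exact meaning of `-l` is not documented here, so treat that as a guess. The checkbook rows are invented for the example.

```python
from itertools import groupby
from operator import itemgetter

def sort_table(rows, key):
    """Sort rows (dicts) by a named field, like sorting a table on a column."""
    return sorted(rows, key=itemgetter(key))

def subtotal(rows, break_field, sum_fields):
    """Sum sum_fields over each run of equal break_field values.
    The input must already be sorted on break_field."""
    out = []
    for value, group in groupby(rows, key=itemgetter(break_field)):
        totals = {f: 0.0 for f in sum_fields}
        for row in group:
            for f in sum_fields:
                totals[f] += float(row[f])
        out.append({break_field: value, **totals})
    return out

# Hypothetical checkbook rows; only the fields used here are shown.
checkbook = [
    {"payee": "Lee", "amount": "100.00", "withholding": "10.00"},
    {"payee": "Kim", "amount": "200.00", "withholding": "20.00"},
    {"payee": "Lee", "amount": "50.00", "withholding": "5.00"},
]

totals = subtotal(sort_table(checkbook, "payee"), "payee", ["amount", "withholding"])
print(totals)
```

As in the shell version, every step refers to columns by name, never by position.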
The beauty of this suite of tools is that the beginning and end of almost every pipeline are TabDelimitedTables.
Tabs tend to be problematic characters for cross-system data sharing. Sometimes they get converted into spaces for various reasons. Thus, CommaSeparatedValues tends to be more popular because it relies purely on WYSIWYG characters.
Note: If a user puts a comma in a single field, put quotes around that field. If the user wants to put quotes in a single field, escape them with a backslash, or double the quote character.
That's a weird mashup of both DelimiterSeparatedValues quoting rules, and I haven't seen a program that handles both conventions in the same file.
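To make the two quoting conventions concrete, this sketch writes the same row once with doubled quotes and once with backslash escapes, using Python's standard `csv` module. The helper names `write_row` and `read_row` are assumptions for this example. Each convention is a separate dialect setting, and a reader has to be told which one a file uses.

```python
import csv
import io

row = ["Smith, John", 'He said "hi"']

def write_row(row, **dialect_options):
    """Serialize one row with the given csv dialect options."""
    buf = io.StringIO()
    csv.writer(buf, **dialect_options).writerow(row)
    return buf.getvalue()

def read_row(text, **dialect_options):
    """Parse the first row of text with the given csv dialect options."""
    return next(csv.reader(io.StringIO(text), **dialect_options))

# Convention 1: quote the field and double any embedded quote character.
doubled = write_row(row)

# Convention 2: escape embedded quote characters with a backslash.
escaped = write_row(row, doublequote=False, escapechar="\\")

# Each convention round-trips only when the reader uses the same settings.
assert read_row(doubled) == row
assert read_row(escaped, doublequote=False, escapechar="\\") == row
print(doubled, escaped, sep="")
```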
Precisely because tab is *not* a WYSIWYG character, it's useful as an out-of-band character to indicate "next field", while allowing *all* WYSIWYG characters to be used in any field without any weird escape codes.
That is true, but in my opinion its downsides outweigh that benefit.
ASCII allocates four control characters (28-31) for record structuring:
- FS - File Separator
- GS - Group Separator
- RS - Record Separator
- US - Unit Separator
It is a shame that these aren't used more widely for their intended purpose.
Probably because it defeats the purpose of being "ASCII", which usually implies it is human-viewable in a generic editor. Using those would make it not much different than a "binary format".
Only because the ASCII separator codes never caught on for general use, unlike other standard ASCII control codes used in text files, such as TAB, CR, and LF.
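A small sketch of how the separator characters listed above could be used, assuming a made-up convention in which US (31) ends a field and RS (30) ends a record. The `encode` and `decode` helpers are hypothetical. Fields may then contain commas, quotes, and even tabs without any escaping, at the cost of a file that is awkward to read in a generic editor.

```python
US = "\x1f"  # Unit Separator, ASCII 31: used here between fields
RS = "\x1e"  # Record Separator, ASCII 30: used here between records

def encode(records):
    """Join fields with US and records with RS; no quoting or escaping."""
    return RS.join(US.join(fields) for fields in records)

def decode(text):
    """Split text back into a list of records, each a list of fields."""
    return [record.split(US) for record in text.split(RS)]

records = [
    ["Smith, John", 'He said "hi"', "tab\there"],
    ["Doe", "", "x"],
]
encoded = encode(records)
assert decode(encoded) == records
print(repr(encoded))
```

The scheme only works if no field contains the separator characters themselves, which is the same trade-off as with tabs, moved to characters that users almost never type.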
In some of the seminal texts about UnixProgramming and the MakeProgram, the fact that a tab is difficult to spot among spaces has been mentioned by some as a design flaw [SyntacticallySignificantWhitespaceConsideredHarmful].
Perhaps it is okay for same-program usage, but not for cross-system.
See also: RelationalAlternativeToXml