Yaml Aint Markup Language

http://www.yaml.org/ and a wiki at http://yaml.kwiki.org/

From yaml.org: YAML(tm) (rhymes with "camel") is a straightforward machine parsable data serialization format designed for human readability and interaction with scripting languages such as Perl and Python.

YAML has a lot of the same appeal for me as Wiki does. Comparing YAML to XML (ExtensibleMarkupLanguage) is roughly like comparing the WikiWiki MarkupLanguage (see TextFormattingRules) to HTML.

Contributors: SteveHowell


Simple Example:
 ---
 language: YAML
 acronymn stands for: Yaml Aint Markup Language
 acronym doesn't stand for: Yet Another Markup Language
 features: 
  - documents are very readable by humans.
  - interacts well with scripting languages.
  - uses host languages' native data structures.
  - has a consistent information model.
  - enables stream-based processing.
  - is expressive and extensible.
  - is easy to implement.

Which, in the perl implementation, turns into this (as stringified by Data::Dumper):

 $VAR1 = {
          'acronym doesn\'t stand for' => 'Yet Another Markup Language',
          'language' => 'YAML',
          'acronymn stands for' => 'Yaml Aint Markup Language',
          'features' => [
                          'documents are very readable by humans.',
                          'interacts well with scripting languages.',
                          'uses host languages\' native data structures.',
                          'has a consistent information model.',
                          'enables stream-based processing.',
                          'is expressive and extensible.',
                          'is easy to implement.'
                        ]
        };

And in Python:

 {"acronym doesn't stand for": 'Yet Another Markup Language',
  'acronymn stands for': 'Yaml Aint Markup Language',
  'features': ['documents are very readable by humans',
               'interacts well with scripting languages',
               "uses host languages' native data structures",
               'has a consistent information model',
               'enables stream-based processing',
               'is expressive and extensible',
               'is easy to implement'],
  'language': 'YAML'}

More Full-Featured Example:
 --- 
 invoice: 34843 
 date   : '2001/01/23'
 bill-to: &id001
  first name: Chris
  last name : Dumars
  address:
   lines: |
    458 Walkman Dr.
    Suite #292
   city    : Royal Oak
   state   : MI
   postal  : 48046
 ship-to: *id001
 products:
  - sku         : BL394D
    quantity    : 4
    description : Basketball
    price       : 45.00
  - sku         : BL4438H
    quantity    : 1
    description : Super Hoop
    price       : 2392.00
 tax  : 251.42
 total: 4443.52
 note: >
  Late afternoon is best.
  Backup contact is Nancy
  Billsmer @ 338-4338.

Which, in the perl implementation, turns into this (as stringified by Data::Dumper):

 $VAR1 = {
          'tax' => '251.42',
          'invoice' => 34843,
          'total' => '4443.52',
          'bill-to' => {
                         'address' => {
                                        'state' => 'MI',
                                        'postal' => 48046,
                                        'city' => 'Royal Oak',
                                        'lines' => '458 Walkman Dr.
 Suite #292
 '
                                      },
                         'first name' => 'Chris',
                         'last name' => 'Dumars'
                       },
          'date' => '2001/01/23',
          'ship-to' => $VAR1->{'bill-to'},
          'note' => 'Late afternoon is best. Backup contact is Nancy Billsmer @ 338-4338.',
          'products' => [
                          {
                            'description' => 'Basketball',
                            'price' => '45',
                            'quantity' => 4,
                            'sku' => 'BL394D'
                          },
                          {
                            'description' => 'Super Hoop',
                            'price' => '2392',
                            'quantity' => 1,
                            'sku' => 'BL4438H'
                          }
                        ]
        };

And in Python:

 {'bill-to': {'address': {'city': 'Royal Oak',
                          'lines': '458 Walkman Dr.\nSuite #292\n',
                          'postal': 48046,
                          'state': 'MI'},
              'family': 'Dumars',
              'given': 'Chris'},
  'comments': 'Late afternoon is best. Backup contact is Nancy Billsmer @ 338-4338.\n',
  'date': yaml.timestamp('2001-01-23T00:00:00.00Z'),
  'invoice': 34843,
  'product': [{'description': 'Basketball',
               'price': 450.0,
               'quantity': 4,
               'sku': 'BL394D'},
              {'description': 'Super Hoop',
               'price': 2392.0,
               'quantity': 1,
               'sku': 'BL4438H'}],
  'ship-to': {'address': {'city': 'Royal Oak',
                          'lines': '458 Walkman Dr.\nSuite #292\n',
                          'postal': 48046,
                          'state': 'MI'},
              'family': 'Dumars',
              'given': 'Chris'},
  'tax': 251.41999999999999,
  'total': 4443.5200000000004}


YamlAintMarkupLanguage QuickQuestions

Q: For YAML to function as an alternative to ExtensibleMarkupLanguage, it needs conversions to/from ExtensibleMarkupLanguage. Does that exist for PythonLanguage? Also in an earlier IBM article it was mentioned YAML spec did not have good namespace representation (see http://www-106.ibm.com/developerworks/library/x-matters23.html). Would this and other "unfinished" area prove to be ShowStoppers for using YAML in real-life enterprises where web services are processed (supplied and consumed)?

A: YAML isn't necessarily meant as a drop-in replacement for each and every instance of XML. It's primarily meant as an alternative for those instances where XmlSucks and readability is considered important - you can avoid all the angle brackets without RollingYourOwnDataFormat?. XML bindings are still in draft: http://yaml.org/xml.html

YAML 1.1 does have shortcuts for specifying namespace-like collections of tags. Unlike XML, this is done without complicating the InfoSet; it's just SyntacticSugar.

Q: For WindowsOperatingSystems users, are there freeware tools to create YAML structures, and to import/export these to XML? How about tools to process YAML in manners similar to XSLT, DOM and SAX? And what additional requirements (e.g. Python xxx) do these tools have?

A: For creating YAML structures: TextEditors! Or marshaling from a YAML-enabled language. For parsing you have to rely on the capacities of your favorite language. Support is built into RubyLanguage 1.8. There are also libraries (check GoogleSearch since they're evolving) for PythonLanguage, PhpLanguage, JavaScript, PerlLanguage and JavaLanguage. In C there's libyaml, though it seems to be targeted more at ScriptingLanguages than CeeLanguage itself.
This looks like what I have thought XML may as well evolve to but how do you specify schemas?

XML can't evolve to become simpler without breaking BackwardCompatibility? (and YAML is considerably simpler than XML). How you specify schemas is a work in progress. Note that none of the other AlternativesToXml have schemas yet, either.

For instance "postal" field above is it left to functions in the host language to test if only numeric characters length 5 are valid?

At the moment, yes, but that's not difficult.

Or is there some way to specify metadata in YAML itself? (It does not seem clear from yaml.org either).

-- MartinZarate

[MarcosEliziario] Most of time, an XML file is the serialized form of some object coming from an application, and thus, it has its metadata already defined in the classes that make up these objects(at least on statically typed languages). But even when it's not the case, as it is when you want to share a piece of data accross diferent systems written in different language, It's easy to see that as soon as the needs arise, ways of describing metadata will be designed for YAML, and with time one or more standars will solidify from use. Once again, I think that YAML simpler sintax will lead also to more readable metadata definition languages.

[Another person] Martin your viewpoint appear to be supported by a KuroShin story at http://www.kuro5hin.org/story/2004/10/29/14225/062 . Having said that, there are strong views expressed in comments responding to that post, not all for YAML over XML


Is the use of SyntacticallySignificantWhitespace mandatory (or highly desirable) in some YAML usage scenarios? Why should this not be generated/compressed as required using a simple utility? SyntacticallySignificantWhitespace is not mandatory. YAML, at least in its current versions, is largely JSON compatible (JSON values are YAML values). This provides access to sequences between [] and maps between {} and strings inside "". However, the use of SyntacticallySignificantWhitespace is desirable for text editing. As far as: "Why should this not be generated/compressed as required?", it is (typically) generated/compressed as required, or at least access to that sort of feature is provided in any YAML serialization library compatible with the model described on the YAML webpage.

But the problem is that you cannot control what other people give you, but only what you give them. Thus, if a vendor gives you white-space-sensitive markup because you are a one-to-many recipient, then you risk TabMunging anytime an editor touches the code. I would prefer built-in white-space insensitivity.

If a vendor gives you white-space-sensitive markup, you're free to convert it however you wish while editing the code. As far as 'TabMunging' in particular goes, since you are so hung up on that: YAML simply doesn't allow tabs for indentation. Thus, there is no risk of TabMunging.

But suppose you go in with an editor and hand edit a few items, and put in spaces inadvertently (which is easy to do). Sure, the parser may detect the problem, but not fix it.

I'm not certain how this problem you describe is specific to whitespace. I can mess up almost any file by inadvertently inserting characters, and it isn't as though spaces at the beginning of a line are invisible. The one exception is the use of empty lines. I'm not certain how often empty lines are used by others, but I do know that they require special treatment in most other data transport languages as well.

"Real" characters are usually easy to spot appearing/changing on your screen. White-space chars are not. (I suppose this is degenerating into a standard white-space fight as found in existing topics such as SyntacticallySignificantWhitespaceConsideredHarmful.)

I already anticipated and responded to that complaint... as you'd know if you bothered reading my response. White-space characters are easy to see at the beginning of any line except an empty line. If you really need them to be visible, of course, you can use a text editor that marks whitespace.

That is wrong. If a tab is displayed as 4 spaces in a given editor and you type 4 actual spaces in its place, they are indistinguishable. And if they are close, such as 3 or 5, one may not notice for an indented portion or where say parenthesis are compared to capital letters. I've been through that dance already. Sure, an editor with the right settings may make it easier to spot, but that is a work-around to what is a poor design decision (in my opinion). You can't just grab plane-jane Wordpad or the like, you have to have your Special Editor around and set properly. And if you've ever done consulting work, the client site does not always let you install your own software without special permission.

It isn't unusual that working with 'plane-jane' tools requires you take a little extra care. But unless you happen to be a moronic monkey who blindly pounds away at the space and tab keys (even knowing that YAML doesn't use tabs for indentation), I really don't see how this is a problem of sufficient significance to warrant special attention. As to whether this is a "poor design decision", you aren't examining both sides of the equation. A "real" engineer would look at the Cons AND the Pros before declaring it a "poor design decision".

When one is switching between many different tools, systems, languages, and files in the course of a high-pressure work-day; the chance of accidents is reasonably high such that I'd just rather avoid tab-reliance to begin with. Let's not risk PersonalChoiceElevatedToMoralImperative here. Given a choice, I'd rather not have it available. If your preference is different, so be it. Let's not degenerate into calling each other "monkeys".

RE: "the chance of accidents is reasonably high" - Maybe so. But the idea that SyntacticallySignificantWhitespace related accidents is statistically greater than accidents related to the alternatives (such as forgetting to terminate a block or a quote) is unsupported. You don't have sufficient reason to believe that avoiding SyntacticallySignificantWhitespace will actually help you reduce accidents. And, therefore, your arguments in that vein are fallacious.

RE: "I'd just rather avoid tab-reliance to begin with" - YAML doesn't rely on tabs.

RE: "Let's not risk PersonalChoiceElevatedToMoralImperative here. Given a choice, I'd rather not have it available" - Technically, making the alternatives to one's personal choice become 'unavailable' is a fine example of PersonalChoiceElevatedToMoralImperative. Allowing both options, as is done by YAML, is the solution you should favor if your engineering aim is to avoid PersonalChoiceElevatedToMoralImperative.

I don't favor or disfavor SyntacticallySignificantWhitespace. There are reasons to not support both (such as LanguageIdiomClutter). But this idea you have that you can just 'avoid accidents' by avoiding SyntacticallySignificantWhitespace is utter nonsense; you'll just trade one class of accidents for another. And your "high-pressure work day" won't be any easier on these other accidents than they are on whitespace related ones.

I'd rather have WYSIWYG accidents over ghostly accidents. Anyhow, this debate is going nowhere.


And is Syck, fast YAML serialization, works equally well in PhpLanguage as it does for the PythonLanguage?


See also InfoSet, JavaScriptObjectNotation.
OctoberZeroEight

CategoryInformation

EditText of this page (last edited October 9, 2008) or FindPage with title or text search