Consistently bad parsing of YAML

By Jonathan Stoikovitch (@jstoiko)

Parsers are not easy to get right. libyaml, the reference parser for YAML, does most things right. However, there's one little thing it gets wrong, and because it does everything else so right, this little thing has been ignored. Even worse, other parser implementations have knowingly been doing it wrong because "that's how libyaml does it". There is hope though, so keep reading.

Found unexpected ':'

If you're parsing YAML (and chances are you are, one way or another), you may have stumbled upon this error while parsing things like:

(i.e. flow sequences)

or:

location: {url: https://medium.com}

(i.e. flow mappings)

even though the YAML spec explicitly says it's valid:

Normally, YAML insists the ":" mapping value indicator be separated from the value by white space. A benefit of this restriction is that the ":" character can be used inside plain scalars, as long as it is not followed by white space. This allows for unquoted URLs and timestamps.
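To make the rule quoted above concrete, here is a minimal Python sketch of it. This is not how libyaml or any real parser is implemented; the function names are hypothetical, and it deliberately ignores quoting, comments, and flow indicators. It only shows that a ":" counts as a key/value separator when white space (or the end of the line) follows it.

```python
def find_value_indicator(line: str) -> int:
    """Return the index of the first ':' acting as a mapping value
    indicator under the simplified rule, or -1 if there is none."""
    for i, ch in enumerate(line):
        if ch != ":":
            continue
        # A ':' separates key from value only when it is followed by
        # white space or when it ends the line.
        if i + 1 == len(line) or line[i + 1] in " \t":
            return i
    return -1


def split_key_value(line: str):
    """Split 'key: value' on the value indicator; return (None, text)
    when the whole line is a plain scalar."""
    idx = find_value_indicator(line)
    if idx == -1:
        return None, line.strip()
    return line[:idx].strip(), line[idx + 1:].strip()
```

Under this rule, split_key_value("url: https://medium.com") splits only at the first colon, because the colon inside the URL is followed by "/" rather than white space.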

What's happening?

This is because your YAML parser for <insert_your_language_here> either relies on libyaml (loading it as a shared library and providing bindings to it) or used libyaml as its reference parser, in other words as the "mother of all YAML parsers", and mirrored its behavior rather than strictly following the YAML spec. There is nothing extremely wrong about that; I am just relaying the facts.

The good news is that there is an easy fix: quoting a string that contains a colon in a flow context will do (e.g. 'https://medium.com').
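If you generate YAML yourself, the workaround can be applied mechanically. The sketch below is only an illustration under stated assumptions: the helper names are made up, keys are assumed to be simple plain scalars, and values are only checked for colons (commas, brackets, and other special characters are not handled). Inside a single-quoted YAML scalar, a single quote is escaped by doubling it.

```python
def quote_for_flow(value: str) -> str:
    """Single-quote a scalar if it contains a ':' so that parsers
    mirroring libyaml accept it inside a flow collection."""
    if ":" not in value:
        return value
    # YAML escapes a single quote inside single quotes by doubling it.
    return "'" + value.replace("'", "''") + "'"


def flow_mapping(pairs: dict) -> str:
    """Render a dict of strings as a one-line flow mapping.
    Hypothetical helper: keys are emitted unquoted, as-is."""
    items = ", ".join(f"{key}: {quote_for_flow(val)}" for key, val in pairs.items())
    return "{" + items + "}"


print("location: " + flow_mapping({"url": "https://medium.com"}))
```

This prints location: {url: 'https://medium.com'}, the quoted form of the flow mapping example from earlier.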

The bad news is that parsers across languages seem to handle this inconsistently:

  • Python pyyaml throws an error; a PR that fixes this has been merged but hasn't been released yet
  • Ruby psych throws an error
  • Golang go-yaml throws an error; an issue has been submitted
  • Java snakeyaml throws an error; an issue has been submitted
  • JavaScript JS-YAML handles this properly

and:

  • libyaml throws an error, but the other good news is that there is a PR that addresses it

To sum up, the only parser that handles this properly as of now is the JavaScript one. The problem is that if your stack consists of JavaScript and any other language, and you're parsing YAML across the board, the same document may parse in one place and fail in another, and that's not great.
