Parsers are not easy to get right. libyaml, the reference parser for YAML, does most things right. However, there is one little thing it gets wrong, and because it does everything else so right, this little thing has been ignored. Even worse, other parser implementations have knowingly been doing it wrong because “that’s how libyaml does it”. There is hope though, so keep reading.
Found unexpected ‘:’
If you’re parsing YAML (and chances are you are, one way or another), you may have stumbled upon this error while parsing things like:
urls: [https://medium.com]
(i.e. flow sequences)
or:
location: {url: https://medium.com}
(i.e. flow mappings)
even though the YAML specs explicitly say it’s valid:
Normally, YAML insists the “:” mapping value indicator be separated from the value by white space. A benefit of this restriction is that the “:” character can be used inside plain scalars, as long as it is not followed by white space. This allows for unquoted URLs and timestamps.
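To make the quoted rule concrete, here is a minimal Python sketch of it. The function names are hypothetical, and this is not how libyaml or any other parser is implemented: it only encodes the simplified reading "a colon is a value indicator when it is followed by white space or ends the line", and it ignores quoted scalars, comments, and the other flow-context subtleties of the spec.

```python
from typing import List


def is_value_indicator(text: str, i: int) -> bool:
    """Return True if the ':' at text[i] acts as a mapping value indicator.

    Simplified assumption: the colon separates a key from a value only when
    it is followed by white space or is the last character of the text.
    """
    if text[i] != ":":
        raise ValueError("text[i] must be a colon")
    return i + 1 == len(text) or text[i + 1] in " \t\n"


def value_indicators(text: str) -> List[int]:
    """Indexes of every colon that would be read as a value indicator."""
    return [
        i for i, ch in enumerate(text)
        if ch == ":" and is_value_indicator(text, i)
    ]


flow_sequence = "urls: [https://medium.com]"
flow_mapping = "location: {url: https://medium.com}"

# Only the colons after "urls", "location" and "url" qualify; the colon in
# "https:" is followed by "/" and therefore stays part of the plain scalar.
print(value_indicators(flow_sequence))  # [4]
print(value_indicators(flow_mapping))   # [8, 14]
```

Under this reading, the colon inside the URL is never a candidate for a mapping value indicator, which is why the examples above are expected to parse.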
This is because your YAML parser for <insert_your_language_here> either relies on libyaml (loading it as a shared library and providing bindings to it) or used libyaml as its reference parser, in other words as the “mother of all YAML parsers”, and mirrored its behavior rather than strictly following the YAML specs. There is nothing extremely wrong with that; I am just relaying the facts.
The good news is that there is an easy fix. Quoting a string that contains a colon in a flow context will do (i.e. 'https://medium.com').
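If you generate YAML yourself, the workaround can be applied mechanically. The snippet below is a rough sketch with hypothetical helper names; it assumes colons are the only concern and does not try to handle other characters that may also need quoting in a flow context (commas, brackets, braces).

```python
from typing import Iterable


def quote_flow_scalar(value: str) -> str:
    """Single-quote a scalar if it contains a colon, otherwise return it as-is.

    In a YAML single-quoted scalar, a literal single quote is written as two
    single quotes, so any existing quote is doubled.
    """
    if ":" not in value:
        return value
    return "'" + value.replace("'", "''") + "'"


def flow_sequence(items: Iterable[str]) -> str:
    """Render the items as a YAML flow sequence, quoting where needed."""
    return "[" + ", ".join(quote_flow_scalar(item) for item in items) + "]"


print("urls: " + flow_sequence(["https://medium.com"]))
# urls: ['https://medium.com']
```

Quoting only when a colon is present keeps the output close to what you would write by hand, while sidestepping the parsers that reject unquoted colons in flow collections.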
The bad news is that parsers across languages handle this inconsistently:
- pyyaml throws an error; a PR that fixes this has been merged but hasn’t been released yet
- psych throws an error
- go-yaml throws an error; an issue has been submitted
- snakeyaml throws an error; an issue has been submitted
- JS-YAML handles this properly

and:
- libyaml throws an error, but the other good news is that there is a PR that addresses it

To sum up, the only parser that handles this properly as of now is the JavaScript one. The problem is that if your stack consists of JavaScript and any other language, and you’re parsing YAML across the board, you may end up with inconsistent parsing behavior, and that’s not great.