Disclaimer: This is a rant, but I mean it to be a useful rant. The point of this article is to be mad at things, offer solutions and explore them. However, I won’t fully explore the solutions offered, so they may appear half-baked. My intention is not to dwell too much on their details but to stimulate your mind and try to shake up the stupid status quo.
Oh, and it’s also not as long as it seems. It features some images and an optional appendix. It’s going to be fun, promise!
Think about a program you’ve had to write. Because you’re a true Unixer you accept your input from stdin and write your output to stdout. There’s a beautiful simplicity to it, which is further enhanced with pipes. Pipes are truly beautiful. No joke. They’re an amazing concept, and they make sense conceptually. Not a lot of things have that.
So let’s look at a program we all know and love, curl:
% curl -vvv http://api.icndb.com/jokes/random | jq '.value.joke'
"Two wrongs don't make a right. Unless you're Chuck Norris. Then two wrongs make a roundhouse kick to the face."
And curl heartily replies with the IPs it’s connecting to, the request and response headers, and even progress bars! Finally the output is passed to jq where it can go its merry way and get us our joke.
Hold that thought. Something’s fishy here.
How did curl distinguish what’s data (the HTTP response) to be passed down the pipe from what’s info meant for the user (progress bars et al)? After all, jq isn’t interested in the request’s progress bar, but the user sure is.
You most likely know the answer: stderr. stdout is what’s piped to the next command, while stderr, which isn’t touched by the pipe, still goes to your terminal. We can handle stderr explicitly like this:
% curl -vvv http://api.icndb.com/jokes/random 2>/dev/null | jq '.value.joke'
"Chuck Norris can win at solitaire with only 18 cards."
The file descriptor 2 is stderr, so we can tell our shell to redirect it to a file (here, /dev/null). Now you won’t see it.
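To make the plumbing concrete, here’s a self-contained sketch of the usual redirection gymnastics, using `sh -c` as a stand-in for curl so both streams are explicit (the “data”/“progress” strings are made up for illustration):

```shell
# A stand-in for curl: writes data to stdout and chatter to stderr
sh -c 'echo data; echo progress >&2' | tr a-z A-Z
# The pipe carries only stdout, so this prints "DATA" through the pipe
# and a stray "progress" straight to your terminal.

# Silence the chatter entirely:
sh -c 'echo data; echo progress >&2' 2>/dev/null | tr a-z A-Z

# Or merge it into the pipe with 2>&1 if a downstream tool wants both:
sh -c 'echo data; echo progress >&2' 2>&1 | tr a-z A-Z
```

Note how the last form loses the distinction forever: once merged, jq downstream can no longer tell the joke from the progress bar.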
You may not have noticed what’s “wrong” here, especially if you’re a shell user with experience under your belt. I urge you to reread the previous couple of paragraphs.
Take your time.
What the Fuck
How on Earth do we use stderr as a convention to signify output meant for the user? It’s literally named “standard error”, and we bastardised it into also meaning “completely legit output”.
I’ve tried to come up with an appropriate analogy but failed, since the only suitable one seems to be a restaurant with a dish named “punch my face” which is sometimes a delicious soufflé and sometimes earns you a punch in the face.
This isn’t just stderr. stdin is also faulty:
% echo 4 | read -E
Have a program which accepts input from stdin but also wants user input (“are you sure”s, multi-step input, etc)? Too bad if you’re piping!
Yes, you can do things like manipulate /dev/tty and friends. But that’s beside the point. You can also use curses, or write a GUI or a server, or you can just kick off and go to Bermuda or the Bahamas (come on pretty mama).
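For completeness, here’s the kind of /dev/tty workaround I mean: a sketch of a filter whose stdin is occupied by a pipe but which still wants to ask the human something. The script and its behaviour are illustrative, not a recommendation:

```shell
#!/bin/sh
# Sum the numbers arriving on stdin (possibly a pipe), but confirm with
# the human before printing. /dev/tty always names the controlling
# terminal, no matter where stdin and stdout have been redirected.
total=0
while read -r n; do
    total=$((total + n))
done
printf 'print the total? [y/N] ' > /dev/tty   # user-facing prompt
read -r answer < /dev/tty                     # user-facing input
[ "$answer" = y ] && echo "$total"
```

It works, but every program has to hand-roll this, and it falls apart the moment there is no controlling terminal at all.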
The point is that the holy trio of stdin, stdout and stderr is flawed. It conflates two distinct concepts: program interaction and user interaction. We want pipes for programmatic manipulation of data and the terminal for interactive manipulation of data.
A Humble Proposal
We add two more streams: userin and userout. Pipes bind stdin to stdout; userin and userout are bound to the terminal.
If there’s no pipe before you and you read from stdin then you’re reading from the terminal, which is userin.
If you’re not connected to a terminal, then things go as normal: Reading will block forever, writing has no effect.
Here are some complications which arise:
- How do we, and should we, redirect userin or userout to other pipes?
- Anything which hardcodes the fds 3 or 4 will be screwed.
- What about programs like apt-get, which you usually combine with yes? If they accept input from userin, how will yes write to it, considering point 1 in this list? Frankly, I’m not too sorry about them. That’s supposed to be a flag you pass to the program. yes is a hack.
- It does however raise a valid programming decision: What do we accept from stdin and what from userin? There’s no clear-cut answer, but if you think about it you may already have it: Structured input which suits programs (like init file formats, HTML, or any structured data really) comes from stdin. Human input (like prompts, queries, etc) comes from userin.
It’ll have to be thought through.
- The same question goes for stdout and userout. My gut tells me that this is also not that complex a decision: Use userout whenever you feel you’d have reached for stderr but didn’t mean an error. stdout is still your program’s output, that hasn’t changed. userout is output explicitly meant for the user.
- What about ssh? What about tmux?
- This isn’t just a libc change to include those variables — the terminal needs to co-operate. What happens when you use an up-to-date libc, but you use a non-compat terminal? What’re the semantics of userin and userout then?
A possible solution to some of these problems is to have a libc function give you the handles to userin and userout so they won’t be ordinary global variables. That means they won’t be pipe-able, at least not without special syntax. I’m happy with that.
I really like this idea because it adds to the beauty while not breaking everything: Programs still interact through pipes. And it’s opt-in — programs written before the introduction of userin and userout won’t feel anything, programs written after can finally not write user output to stderr and know that they can get user input from userin.
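You can even half-emulate userout today by dedicating fd 3 to user-facing output by convention. To be clear, this is a made-up convention: nothing will bind fd 3 to your terminal for you, so here I bind it to a file to make the effect visible:

```shell
# "userout" emulation: fd 3 carries user-facing chatter past the pipe,
# while stdout goes down the pipe as usual.
{ echo "real output"; echo "42% done..." >&3; } 3>user.log | tr a-z A-Z
# stdout went through the pipe: REAL OUTPUT
# user.log now holds "42% done..." — in a userout world the terminal
# would render that line for the user instead of a file catching it.
# (Bind 3>/dev/tty instead of 3>user.log to see it live.)
```

The missing piece is exactly what the proposal supplies: an agreed-upon stream the terminal itself knows to render, instead of an ad-hoc fd every program would have to wire up.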
Output Meant for Humans
But enough about that. Let’s move on to even more controversial ground.
It’s currently 2016 and I still can’t view images in my terminal, or see a syntax-highlighted file, or interactively read a PDF, or plot a graph. The trend here is that you can view text and nothing else.
It doesn’t have to be this way. I’m getting cold feet just from writing these lines but what if…what if…the terminal wasn’t limited to text?
The future is here and its prompt is a Nyan cat. Terminology implements all that I asked for and more. In the picture above I typed tycat 3d-save.png and boom, an image appeared, so what am I complaining about, right?
The problem is that of standards. We need a standard way to tell our terminal that hey, this output here is an image or a video or PostScript or a code file, please format it as appropriate. One terminal implementing a separate command for this kind of output isn’t good enough. That guy or gal who just finished installing Ubuntu as their first Linux distro should be able to fire up gnome-terminal and cat a cat image.
Remember that with the addition of userout we have a way to communicate that a piece of information is meant for the user and not for a stray program, so piping isn’t a huge concern of ours.
This isn’t a walk in the park. Once again we face a list of problems:
- How do we communicate the output type? How do I say that I’m about to write an svg or an ogg?
- How do we write several things of a different nature? If I have a video I want to play and also a gif to display, do I have to completely write out the video before starting to write the gif?
- What are the supported formats? Is the list implementation-dependent? Is there a minimal requirement? Is there a list of canonical implementations for the various formats? How do we pick one over the other?
- What about controls? Can you play/pause/mute/loop videos? Is it mandatory to have that support? What’re the minimal required controls? Do gifs have controls?
- What about screen and tmux? I connect and in my SuperAwesomeTerm cat an image, but then connect with OldGrumpyTerm. Not only that, I then decide to ssh into myself and attach. What happens?
- How do we turn it off? I actually want to see the image’s binary blob.
- I’m Bob and I wrote BobTerm. You’re making me handle all sorts of awful things! It used to be simple text but now I have to know how to output random binary formats! libvte doesn’t even support this (…yet).
- Isn’t this obsoleting things like mplayer or feh? If you can cat an image then what do you need feh for? In fact, isn’t this giving the terminal too much power?
The simplest problem to tackle is signalling the file format. We have MIME types for that. But it’s not a matter of simply writing a MIME header followed by a newline: What if we don’t want to send a MIME header? What if we don’t send one but our output just happens to look exactly like a MIME header?
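One classic escape from that ambiguity is framing rather than sniffing: a netstring-style length prefix means the reader knows exactly how many bytes the type declaration occupies, so the payload can contain anything, including things that look like MIME headers. The `9:image/png` wire format below is purely illustrative:

```shell
# Netstring-style framing: "<length>:<payload>".
# Unambiguous, because the reader consumes exactly <length> bytes and
# never has to guess whether it's looking at a header or at data.
type='image/png'
printf '%d:%s' "${#type}" "$type"
# → 9:image/png
```

Any framing scheme would do; the point is that the type channel must be out-of-band relative to the data, not inferred from it.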
To tackle both 1 and 2, a relatively straightforward way is to be able to “fork” a stream. C pseudo-code:
FILE *userout = getuserout();
FILE *video_stream = forkstream(userout, "video/ogg");
FILE *gif_stream = forkstream(userout, "image/gif");
This’ll allow us to provide a MIME type and pass the file around without worry. Cooler still would be ensuring that writing to gif_stream and video_stream at the same time wouldn’t cause a collision. That will be difficult to implement, but I think it’s worth it.
Annoyingly, we still need to tackle 3, 4 and 5: What about terminals that don’t support this? What’s the minimal set of formats needed to be compliant? What about multiple terminals?
Let’s do a sidebar for a moment, a sidebar which I think is relevant not just for this point or this rant, but for the entire series of rants. Instead of listening to my babbling (which I’m sure you’re tired of by now), I recommend you listen to someone smarter than me say smart things. Tune in until he shows the slide “Do web sites need to look exactly the same in every browser”. Maybe more. You should watch that lecture anyway. Nicholas Zakas on Progressive Enhancement:
Decided to skip it or just want to see me talk? Oh you flatter-mouth. Here’s a recap of what he said:
TVs used to only display black and white. Then, people started making colour TVs, and then high-def TVs. Despite having vastly different capabilities, they’re all capable of showing the same content: You plug them in and they show you the 100th rerun of Friends.
A question I still haven’t acknowledged is whether this is giving too much responsibility to the terminal. I think the answer is No. The terminal should support exactly how much it wants to support and not a format beyond. But it should have this basic capability: Programs should be able to signal what they’re displaying to the user.
This is possible. We’ve been doing it elsewhere. We’ve just been neglecting the terminal.
Let’s Stop Neglecting the Terminal
fish has done amazing things for your shell. You have completion suggestions and colours and so many things which make so much sense that you wonder why anyone would still voluntarily use bash!? zsh, which is an absolute beast, is still not ubiquitous even though it’s a drop-in replacement for bash.
Why? Why doesn’t your distro ship with zsh or fish? To be fair some do, sort of. I can speak of Arch whose installation image comes with a configured zsh. But that’s a drop in the ocean. Why doesn’t Ubuntu use zsh as the default? Why doesn’t CentOS give you fish?
I don’t know why we do this to ourselves. Your screen is amazing. You view HD movies on it. But when you work in the terminal, you probably don’t even have anti-aliased fonts.
The Things We Almost Had
While writing this I googled around for my ideas. It’s impossible that I’m the only one who’s had the insane ideas of adding standard streams or making the shell less horrible. And indeed, two popular projects cropped up: FinalTerm and TermKit.
FinalTerm hits some of the points I raised in the terminal section. It’s also never been out of alpha, and it also suffers from the severe disadvantage of being dead.
TermKit amazed me, for better and for worse. It hit on what I aimed for and then some. It also suffers from the same symptom of death. I’m not going to cover everything here since a post-mortem was published on this reddit page.
The failure of these projects doesn’t make me happy. FinalTerm died for technological reasons alongside the regular OSS ones (with the lead developer demotivated, who picks up the reins?), and TermKit died both for technological reasons and because it tried to do too many things in too short an amount of time. This isn’t motivating.
The Way Forward
What saddens me most is that I just don’t know what to do with these ideas. Even if I were to implement my own shell and terminal that wouldn’t be enough — these things require co-operation on the side of program makers. It’s possible that it’s too late and we’ll always be in the state that we are today. It’s not that bad. It’s just…ugh.
I don’t know what’s next.
Update: What I did next
I started working on a terminal emulator called plex. It won’t be a complete TE and will only provide a POC of the ideas presented here (hopefully).
Let’s see where it goes.
But wait a minute…where do you think you’re going? We’re not finished yet. This section details problems to which I have no solution, or where my thoughts on the matter are still too immature to matter. Feel free to skip this section or the entire article! Might be a bit too late to do the latter.
Shell and getopt Syntax
Let’s start out with something fresh: passing arguments to commands.
# It always starts out simple:
% ls something
% ls -l something
% ls --format=long
# wait, or was it
% ls --format long
# what about spaces
% ls '01. Serenity.mp4'
# and quotes
% ls "I'm a teapot"
% ls I\'m\ a\ teapot
# now you'll want to kill yourself
% grep -EIrn 'can \'you [\\doubt] \"w\\/hy'
I don’t even know if the last example is correct. I don’t have the heart to go through it.
You have a bunch of ways to pass arguments to a command, some of which have to be prefixed in a certain format. Some flags need a single dash, some flags need a double. And there’s a lot of inconsistent logic: Usually a double dash prefixes a multi-letter word, but sometimes it doesn’t (example: mplayer -sub subtitle-path but also mplayer --help, and find is a repeat offender). Sometimes you can specify several single-letter arguments after one dash, à la grep, but sometimes things are just weird, like head -20 path, which treats a numeric argument after a dash as -n, and the list just goes on.
And don’t get me started on find -exec and the kind of juggling you have to go through there, not to mention piping and passing things into sh -c!
Oh, and let’s not forget --help, which sometimes works but sometimes isn’t handled. Imagine how powerful it’d be if your shell could intelligently infer a program’s arguments by first running it with --help and parsing the results, providing you with intelligent hints and completions. But of course the usage output isn’t standardised at all, so you can forget that. It is somewhat alleviated by built-in argument parsers like Python’s argparse or Ruby’s optparse, but you’ll never see something like that as part of libc.
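To illustrate how fragile that inference would be, here’s a naive flag harvester run over a canned snippet of GNU-style help text. The snippet is made up, and real --help output varies wildly, which is exactly the problem:

```shell
# Harvest candidate long flags from help text: works only as long as
# the program happens to print "  -x, --long-flag  description" lines.
help_text='  -a, --all          do not ignore hidden entries
  -l                 use a long listing format
      --color[=WHEN] colorize the output'
printf '%s\n' "$help_text" | grep -oE -- '--[a-z-]+' | sort -u
# → --all
#   --color
```

A shell could bolt this on today, but without a standardised usage format it stays a heuristic, never a contract.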
There’s also how downright weird variable expansion is. It’s like dumb textual macros. We deserve so much better.
fish sort of improves upon this, and on the way breaks a lot of things (which is both good and bad). But I don’t think they went far enough. I want a proper programming language.
Now tell me: Have you ever run ipython?
It’s too mind-boggling an experience. If I were a braver man I’d chsh to ipython. This is much more like what we deserve. We deserve a proper programming language, with sane syntax for piping and variables and actual help.
We’re never going to get it.
Let’s say you have two programs and they want to send data to each other. To ease things a bit you run both through the shell so you pipe them up. One of the most famous examples is ps and grep:
% ps aux | grep dbus
Once again, there’s something wrong here. Something I’ve subtly alluded to in my first example with curl and jq. Programs don’t send structured, machine-friendly data to one another. Because of the conflation of program output and human output, programs have over time adopted their own output conventions, and you, the output consumer, are left on your own to make sense of what’s going on.
To clarify: ps’ output isn’t easily machine-readable and parseable. Our program needs to know both that we’re accepting input from ps and how to parse that input.
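To see the scraping we’re all doing, here’s the whitespace surgery a downstream tool needs just to pull a PID and a command out of one fabricated line of ps-style output:

```shell
# A fabricated ps aux line, split on runs of whitespace by awk:
# field 2 is the PID, the last field is the command.
line='root       123  0.0  0.1  12345  6789 ?  Ss  10:00  0:01 /usr/bin/dbus-daemon'
echo "$line" | awk '{print $2, $NF}'
# → 123 /usr/bin/dbus-daemon
```

This works until a column is added, a field contains spaces, or the layout shifts, none of which would matter if ps handed us structured records instead of a table drawn for humans.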
This is both good and bad. This is good because having your program output free-form text is amazing. This is bad because having your program output free-form text is difficult to deal with.
This is not an easy problem to solve. Most people say “just start passing objects around instead of text”, but it’s not as easy as it seems. The semantics of communication between programs is a hard problem. Here are some difficulties:
- Changing this breaks everything. Without some sort of hacky solution, programs will need to know when the program that yesterday wrote output in format X now writes it in format Y. Most easily solvable via flags, but that’s just meh.
- This requires changing everything. From ls to ssh to mount to strace. This change is huge and won’t happen overnight which means adoption will be laggard.
- What happens if you write structured output to stdout? You don’t want users to see some ugly raw data.
- You lose a lot of freedom. If you’re bound to an output format and you want to express things which aren’t included there, you’re stuck with hacks.
- Speaking of which, I haven’t said what format we’ll be using. JSON? MessagePack? BERT? Cap’n Proto? Any other of the millions out there? How will we pick one over the other?
If we opt for strongly typed solutions like Protobuf or Thrift, then how do we communicate our schemas?
If we opt for weakly typed solutions like S-expressions, we’re left with problems of validation and de-serialisation, which points us to the next problem:
- How are we going to implement the serialization and deserialization? Different languages have vastly different concepts about types and values. How do you write a really large integer over the wire without blowing up the program next to you which can’t handle that without a BigDecimal library? What about dates and random objects? Can you send functions?
- Who’s in charge of verification? Of course programs should always sanitize their inputs, but what happens if a program writes out invalid output? Does the shell give it a stern talking to? Will it just roll with it?
- Once a format is chosen we’re pretty much stuck, forever. New languages will have to implement this serialization and deserialization. Who’s going to be writing all of these parsers? Writing parsers is hard.
- Not to mention streaming — our data serialization format needs to be streamable to properly be consumed via pipes.
- How does this affect the numerous Linux APIs which are accessed through files, like those in the proc tree? Will /proc/mounts and /proc/net/arp and so forth be changed to give you this serialized output? These are a major source of backwards incompatibility.
- Fuck this shit, I’m out.
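For what it’s worth, line-delimited JSON is one existing compromise on the streaming point: one record per line means a consumer can act on each record as it arrives, with the same jq from the curl example as the duct tape. The fake process records below are made up for illustration:

```shell
# One record per line: streamable and pipe-friendly, no framing needed,
# because the newline is the record boundary.
printf '%s\n' '{"pid":1,"cmd":"init"}' '{"pid":123,"cmd":"dbus-daemon"}' \
  | jq -r 'select(.cmd == "dbus-daemon") | .pid'
# → 123
```

It dodges the streaming problem but none of the others: it’s weakly typed, schema-less, and every program still has to agree to speak it.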
You may be thinking that PowerShell has already solved this problem. That’d be like saying Python solved it: PowerShell only communicates between programs written for a specific platform; it is not a general-purpose mediator between entirely different platforms. How does Java communicate date objects to Erlang? How does Io (which uses prototypical inheritance) send objects to C?
This is a hard problem. One which isn’t going to be solved today. For a smarter person than me saying smart things, I recommend you watch Joe Armstrong’s “The How and Why of Fitting Things Together”: