Taking the Harder Path
Pluralitas non est ponenda sine neccesitate.
Is almost never the right answer when developing systems, at least the sort that I build. The easy path is almost always best and that feeling of “something being easy” is often a sure sign of being on the right track.
So last week when I found myself “grinding” on something, I had to stop and ask myself why?
What it was is so only so-so interesting: some data ingestion code that unzipped a large downloaded archive onto the file system. It was part of a larger application I was migrating to the google application engine. When you use the file system on GAE, you are actually using RAM. Unzipping a 2.5 GB csv file is going to blow up even the largest instance possible (2GB). We were were already using gcloud buckets for file storage, so moving to that instead of an in-memory file system was the obvious way forward.
Unzipping files in a bucket with PHP proved to be more way more difficult than I expected! Check out this sad blog post
concluding it was impossible and this equally sad, long running thread in the google bug tracker
I persevered and finally figured out that you download the relatively small file to gae instance and then you could use a stream to move it over to the gcloud bucket, minimizing the memory used. The details, for anyone interested, is on Stack Overflow
I could have taken the easy path here. The code had been containerized and could have been deployed easily and without modification on any number of gcloud services with access to a real file system. But this would have required doubling the number of services in our stack and the introduction of some new tooling. Instead, I worked a bit harder to "make do" for the sake of those who would come after me. I probably wouldn't be around to explain the circumstances that drove me to introduce the new twist, and even if super well documented, I'd be doubling the conceptual area of the code base.
Often when people think of Occam’s razor
, they think in terms of explanation, something like simplest solution is most likely right. But in this case I prefer this translation: "Entities should not be multiplied unnecessarily."
Adhering to this maxim makes it easier for others to interpret the artifacts I'm creating, enabling future collaborators to orient themselves quickly and be productive with the fewest WTFs
per minute possible.
Subscribe to get your daily round-up of top tech stories!