Taking the Harder Path

by Robert Moskal, February 29th, 2020

Taking the harder path is almost never the right answer when developing systems, at least the sort that I build. The easy path is almost always best, and that feeling of “something being easy” is often a sure sign of being on the right track.

So last week, when I found myself “grinding” on something, I had to stop and ask myself why.

What it was is only so-so interesting: some data ingestion code that unzipped a large downloaded archive onto the file system. It was part of a larger application I was migrating to Google App Engine (GAE). When you use the file system on GAE, you are actually using RAM. Unzipping a 2.5 GB CSV file is going to blow up even the largest instance possible (2 GB). We were already using gcloud buckets for file storage, so moving to that instead of an in-memory file system was the obvious way forward.

Unzipping files in a bucket with PHP proved to be way more difficult than I expected! Check out this sad blog post concluding it was impossible and this equally sad, long-running thread in the Google bug tracker.

I persevered and finally figured out that you can download the relatively small archive to the GAE instance and then use a stream to move its contents over to the gcloud bucket, minimizing the memory used. The details, for anyone interested, are on Stack Overflow.
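A minimal sketch of the streaming idea follows, written in Python rather than PHP and not taken from the Stack Overflow answer. It decompresses one archive member a chunk at a time into any writable binary stream, so memory use stays near the chunk size instead of the full 2.5 GB. The function name `stream_zip_member`, the chunk size, and the file names are all hypothetical, and a local file stands in for the bucket's upload stream.

```python
import zipfile

CHUNK_SIZE = 1024 * 1024  # 1 MiB per read; an arbitrary value chosen for this sketch


def stream_zip_member(archive, member_name, destination, chunk_size=CHUNK_SIZE):
    """Decompress one zip member into a writable binary stream, chunk by chunk.

    `archive` is a path or a seekable binary file object. `destination` is any
    object with a write(bytes) method; in the scenario above it would be the
    bucket's upload stream (an assumption, not shown here).
    Returns the number of decompressed bytes written.
    """
    written = 0
    with zipfile.ZipFile(archive) as zf:
        # zf.open() returns a file-like object that decompresses lazily,
        # so only about one chunk is held in memory at a time.
        with zf.open(member_name) as source:
            while True:
                chunk = source.read(chunk_size)
                if not chunk:
                    break
                destination.write(chunk)
                written += len(chunk)
    return written


if __name__ == "__main__":
    import os
    import tempfile

    with tempfile.TemporaryDirectory() as workdir:
        # Build a tiny archive so the demo is self-contained.
        archive_path = os.path.join(workdir, "export.zip")
        with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
            zf.writestr("export.csv", "id,name\n1,example\n")

        # A local file stands in for the bucket object here.
        target_path = os.path.join(workdir, "export.csv")
        with open(target_path, "wb") as destination:
            copied = stream_zip_member(archive_path, "export.csv", destination)
        print(copied, "bytes copied")
```

The point of the design is that only the compressed archive has to sit on the instance. The decompressed data passes through in small pieces and never exists in full in RAM or on the in-memory file system.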

I could have taken the easy path here. The code had been containerized and could have been deployed easily and without modification on any number of gcloud services with access to a real file system. But this would have required doubling the number of services in our stack and the introduction of some new tooling. Instead, I worked a bit harder to "make do" for the sake of those who would come after me. I probably wouldn't be around to explain the circumstances that drove me to introduce the new twist, and even if it were super well documented, I'd be doubling the conceptual area of the code base.

Often when people think of Occam’s razor, they think in terms of explanation, something like "the simplest explanation is most likely right." But in this case I prefer this translation: "Entities should not be multiplied unnecessarily." Adhering to this maxim makes it easier for others to interpret the artifacts I'm creating, enabling future collaborators to orient themselves quickly and be productive with the fewest WTFs per minute possible.