Port Your Medium Articles to Your Personal Blog with a Simple Bash Script [A How To Guide]

Written by michael-li | Published 2020/02/02
Tech Story Tags: programming | blogging-tips | scripting | bash | automation | programming-top-story | hackernoon-top-story | remove-my-stories-from-medium

TLDR: This script works only with the Pelican static site generator, but the gist of it can be applied to any blogging platform. For every Medium article, I need to copy the URL, run a command to convert it into a Markdown file, then generate the blog site with Pelican. That workflow is simple, but not as simple as I'd like it to be, so it's a great opportunity for a quick-and-dirty Bash script to come to the rescue.

Medium is a great publication platform. It has good exposure, quality content, readers who really appreciate good articles, and a neat, easy-to-use UI. It's especially great for writers who are just starting their journey.

As good as it is, though, having your own blog outside of Medium is still not a bad idea. It gives you another channel, one you totally own, for communicating with your readers. And who knows? No company lasts forever. What if Medium got acquired by some other company, or something even worse happened? You can still sleep well at night knowing you won't lose all your articles.

I built my own blog using Pelican, a Python-based static site generator, and wrote an article explaining the whole process. For every Medium article, I need to copy the URL, run a command to convert it into a Markdown file, then generate the blog site with Pelican. It is simple, but not as simple as I'd like it to be. So this is a great opportunity for a quick-and-dirty Bash script to come to the rescue. Let's see what we can do.

Structure the Script

Before writing the script, it helps to lay out what we want to accomplish; that makes it easier to write quality code. Basically, we need to:
1. Put all article URLs into one text file manually (I plan to automate this part too in the future, maybe with a scraping framework).
2. Read every line of the file; for each line, extract the title and subtitle.
3. Use the title and subtitle to create the metadata Pelican needs to turn the Markdown file into a post.
4. Run the Pelican command to generate the static site.
5. Push the site to GitHub and trigger Netlify's auto-build.
6. Profit.
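For reference, the input text file (articles.txt in this script) is just one article URL per line. Something like this, where the second URL is a made-up placeholder for illustration:

```
https://towardsdatascience.com/9-things-i-learned-from-blogging-on-medium-for-the-first-month-2bace214b814
https://towardsdatascience.com/another-article-slug-0123456789ab
```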

Let’s Write the Code

First of all, let's define our variables:
#!/bin/bash 
# Define variables
filename='articles.txt'
n=1
Then structure the loop to read every line of the text file:
# Read in file and do processing on each one
while IFS= read -r line; do 
    # reading each line
    n=$((n+1)) 
    slug=$(echo "$line" | sed 's/https:\/\/towardsdatascience.com\///')  # get slug from URL 
    FILE="$HOME/wayofnumbers.github.io/content/$slug.md"   # generate Markdown file name from slug 
    mediumexporter "$line" > "$FILE"   # convert Medium article to a Markdown file    
    # some processing ...
done < "$filename"
We used the sed command to remove the first part of the URL, https://towardsdatascience.com/, so the rest can be used as our slug. For example, https://towardsdatascience.com/9-things-i-learned-from-blogging-on-medium-for-the-first-month-2bace214b814 turns into 9-things-i-learned-from-blogging-on-medium-for-the-first-month-2bace214b814, perfect for a slug. We also use the slug to create the filename for the Markdown file. Then we use mediumexporter to convert the article at that URL into a Markdown file. You can find out more about mediumexporter on its project page.
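As a quick sanity check, here is the same sed substitution run on the example URL above, as a standalone sketch outside the script:

```shell
url="https://towardsdatascience.com/9-things-i-learned-from-blogging-on-medium-for-the-first-month-2bace214b814"
# Strip the scheme-and-domain prefix, leaving only the slug
slug=$(echo "$url" | sed 's/https:\/\/towardsdatascience.com\///')
echo "$slug"   # 9-things-i-learned-from-blogging-on-medium-for-the-first-month-2bace214b814
```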
Now that we have the Markdown file, let’s fill in the processing code we want:
# Processing the Markdown file 
    tail -n +2 "$FILE" > "$FILE.tmp" && mv "$FILE.tmp" "$FILE"  # remove the first line 
    fl=$(head -n 1 "$FILE")                  # put the first line (title) into fl 
    firstline=$(echo "$fl" | sed 's/# //')   # remove the leading '# ' 
    tail -n +3 "$FILE" > "$FILE.tmp" && mv "$FILE.tmp" "$FILE"  # remove the title line and the blank line after it 
    subtitle=$(head -n 1 "$FILE")            # put the first line (subtitle) into subtitle 
    tail -n +2 "$FILE" > "$FILE.tmp" && mv "$FILE.tmp" "$FILE"  # remove the subtitle line
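To see what this head/tail dance does, here is a self-contained sketch that runs the same steps on a fake export file (the file contents are made up purely for illustration):

```shell
FILE=$(mktemp)
# Fake export: leading blank line, '# Title' heading, blank line, subtitle, blank line, body
printf '\n# My Title\n\nMy Subtitle\n\nBody text\n' > "$FILE"
tail -n +2 "$FILE" > "$FILE.tmp" && mv "$FILE.tmp" "$FILE"   # drop the leading blank line
firstline=$(head -n 1 "$FILE" | sed 's/# //')                # title without the '# '
tail -n +3 "$FILE" > "$FILE.tmp" && mv "$FILE.tmp" "$FILE"   # drop the title line and the blank after it
subtitle=$(head -n 1 "$FILE")                                # next non-consumed line is the subtitle
echo "$firstline / $subtitle"   # My Title / My Subtitle
rm -f "$FILE"
```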
These lines are rather self-explanatory. With the firstline variable holding the title and the subtitle variable holding the subtitle, we are ready to construct the Markdown metadata for Pelican:
# handle metadata for Pelican  
meta="
Title: $firstline
Slug: $slug
Subtitle: $subtitle
Date: $(date "+%Y-%m-%d %H:%M")
Category: Machine Learning
Tags: Machine Learning, Artificial Intelligence
author: Michael Li
Summary: $firstline
[TOC]
"
You can refer to Pelican's documentation for more information about the metadata format. Simply put, the Markdown file doesn't need to spell out the title and subtitle in its body; as long as we specify the title and subtitle fields in the metadata, Pelican will automatically render them in the post, styled according to the theme you choose.
With the correct metadata in hand, we can finally update the Markdown file and get it ready for site generation:
{ echo -n "$meta"; cat "$FILE"; } > "$FILE.new"   # stitch metadata and article content together 
mv "$FILE"{.new,} 
head -n -8 "$FILE" > "$FILE.new"   # remove Medium's recommended articles 
mv "$FILE"{.new,}
done < "$filename"  # don't forget to close the loop
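The brace-group trick above is worth a closer look: it concatenates the metadata string and the existing file into a new file, and mv FILE{.new,} is bash brace expansion shorthand for mv FILE.new FILE, swapping the new file into place. A minimal standalone sketch (the metadata value here is invented for illustration):

```shell
FILE=$(mktemp)
echo "Article body" > "$FILE"
meta="Title: Example Post
"
{ echo -n "$meta"; cat "$FILE"; } > "$FILE.new"   # prepend metadata to the article
mv "$FILE.new" "$FILE"                             # the script writes this as: mv $FILE{.new,}
head -n 1 "$FILE"   # Title: Example Post
```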
All my Medium articles end with several recommendations for further reading. I removed those for my blog (the head -n -8 line above). Now that the Markdown file is ready, it's time to generate the site and push it to the server:
# push to server
cd $HOME/wayofnumbers.github.io
pelican content -s publishconf.py 
git add .
git commit -m "fix"
git push origin dev

Conclusion

So there you go. This script only works with the Pelican static site generator, but the gist of it can be applied to any blogging platform. I hope you learned a thing or two. Happy blogging and coding!
Found this article useful? Follow me (Michael Li) on Medium, or find me on Twitter @lymenlee or on my blog, wayofnumbers.com.

Written by michael-li | Product Manager | Machine Learning Practitioner | UI/UX Designer/Preacher | Full-Stack Developer
Published by HackerNoon on 2020/02/02