About a year ago, GitHub launched a feature that lets you add a README to your user profile. To do so, create a public repository whose name matches your username and put a README.md file in the root of the repository. You can learn more about it in the GitHub documentation.
A dynamic GitHub profile is one that updates automatically, either on some external event or on a schedule. This is possible with GitHub Actions, another relatively recent GitHub feature: essentially a CI/CD system that lets you create and run custom workflows.
I first learned about the profile README from this article on Hackernoon. The author used PHP to fetch and update a list of the latest posts on his blog. Although I am a PHP expert, I wanted to make it more challenging. I realized that parsing XML and replacing text in a file can be done with native Bash tools alone.
An RSS feed is a plain XML file with a simple schema. Here’s a sample from my freshly launched blog:
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
<channel>
<title>Posts on Aleksandr Tabakov's technical blog | atabakoff</title>
<link>https://atabakoff.com/posts/</link>
<description>Recent content in Posts on Aleksandr Tabakov's technical blog | atabakoff</description>
<image>
<url>https://atabakoff.com/aleksandr_tabakov.jpeg</url>
<link>https://atabakoff.com/aleksandr_tabakov.jpeg</link>
</image>
<generator>Hugo -- gohugo.io</generator>
<lastBuildDate>Fri, 27 May 2022 21:33:51 +0200</lastBuildDate><atom:link href="https://atabakoff.com/posts/index.xml" rel="self" type="application/rss+xml" />
<item>
<title>How to run Vaultwarden in Docker/Podman as a systemd service</title>
<link>https://atabakoff.com/how-to-run-vaultwarden-in-podman-as-a-systemd-service/</link>
<pubDate>Fri, 27 May 2022 21:33:51 +0200</pubDate>
<guid>https://atabakoff.com/how-to-run-vaultwarden-in-podman-as-a-systemd-service/</guid>
<description>Running Vaultwarden in a container as a systemd service using Podman. How to install Podman, run Vaultwarden in a container, create a systemd config for Vaultwarden service and manage it using systemctl.</description>
</item>
</channel>
</rss>
Each post is represented by an item element, from which we need title, link, and pubDate.
grep
The naive approach is to use grep and then build markdown in a Bash loop. Let’s first try grep:
wget --quiet -O rss.xml https://atabakoff.com/posts/index.xml
cat rss.xml | grep -Po '<(title|link|pubDate)>[^<]+'
<title>Posts on Aleksandr Tabakov's technical blog | atabakoff
<link>https://atabakoff.com/posts/
<link>https://atabakoff.com/aleksandr_tabakov.jpeg
<title>How to run Vaultwarden in Docker/Podman as a systemd service
<link>https://atabakoff.com/how-to-run-vaultwarden-in-podman-as-a-systemd-service/
<pubDate>Fri, 27 May 2022 21:33:51 +0200
Not bad, but we need to get rid of the first three lines and the opening tag:
cat rss.xml | grep -Po '<(title|link|pubDate)>[^<]+' | tail -n +4 \
| grep -oE '>([^>]+)' | grep -oE '([^>]+)'
How to run Vaultwarden in Docker/Podman as a systemd service
https://atabakoff.com/how-to-run-vaultwarden-in-podman-as-a-systemd-service/
Fri, 27 May 2022 21:33:51 +0200
Just to test grep
https://atabakoff.com/testing-grep/
Fri, 31 May 2022 18:33:51 +0200
I added one extra item to test whether my expression works with multiple posts. At this point, I started thinking that grep might not be the best option. Before researching other options, I quickly wrote a converter from RSS to markdown:
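#!/bin/bash
# Extract the title/link/pubDate lines, skip the three channel-level lines,
# then strip the opening tag from each remaining line.
items=$( cat rss.xml | grep -Po '<(title|link|pubDate)>[^<]+' | tail -n +4 \
  | grep -oE '>([^>]+)' | grep -oE '([^>]+)' )
IFS=$'\n'
count=0
# Values arrive in groups of three: title, link, pubDate.
for item in $items
do
  case $((count % 3)) in
    0)
      title=$item
      link=''
      pubDate=''
      ;;
    1)
      link=$item
      ;;
    2)
      # date -d is GNU date syntax
      pubDate=$( date -d "$item" +'%d/%m/%Y' )
      cat <<EOF
* $pubDate [$title]($link)
EOF
      ;;
  esac
  count=$((count + 1))
done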
Run it to validate:
./grep-rss.sh
* 27/05/2022 [How to run Vaultwarden in Docker/Podman as a systemd service](https://atabakoff.com/how-to-run-vaultwarden-in-podman-as-a-systemd-service/)
* 31/05/2022 [Just to test grep](https://atabakoff.com/testing-grep/)
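The two trailing grep passes in the pipeline above exist only to drop the opening tag. As a rough sketch, the same step could be done with Bash parameter expansion instead. Note that strip_tag is a hypothetical helper name, not something from the original scripts, and the sample lines are copied from the grep output shown earlier.

```shell
#!/bin/bash
# strip_tag is a hypothetical helper, not part of the original scripts.
# It removes everything up to and including the first '>' of each input line.
strip_tag() {
  local line
  while IFS= read -r line; do
    printf '%s\n' "${line#*>}"
  done
}

# Sample input in the same shape as the first grep pass produces
printf '%s\n' \
  '<title>Just to test grep' \
  '<link>https://atabakoff.com/testing-grep/' \
  '<pubDate>Fri, 31 May 2022 18:33:51 +0200' \
  | strip_tag
```

This keeps the work inside the shell instead of starting two more grep processes, though it makes the same assumption as the original: the value itself must not contain a tag.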
My RSS feed is tiny, so to measure performance we need to run the parser many times. I created a test.sh file:
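#!/bin/bash
# Run the given script (with any arguments) 100 times, discarding its output.
x=100
while [ "$x" -gt 0 ]; do
  /bin/bash "$@" > /dev/null 2>&1
  x=$((x-1))
done
It accepts a script file as a parameter, followed by any arguments for that script, and runs it 100 times in a loop. Let’s run it with time to see how long it takes to parse the feed: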
time ./test.sh grep-rss.sh
./test.sh grep-rss.sh 1,87s user 0,72s system 137% cpu 1,883 total
Not very impressive but expected due to the use of regular expressions.
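For a self-contained variant of this measurement, here is a minimal sketch that times N runs of a command from inside Bash using the time keyword. The function name bench is hypothetical, and the timed command (true) is only a stand-in for a parser script.

```shell
#!/bin/bash
# bench is a hypothetical helper: run a command N times and let the
# Bash 'time' keyword report how long the whole loop took.
bench() {
  local runs=$1; shift
  local i
  # TIMEFORMAT controls what the 'time' keyword prints (to stderr)
  local TIMEFORMAT='%R s real, %U s user, %S s sys'
  time {
    for ((i = 0; i < runs; i++)); do
      "$@" > /dev/null 2>&1
    done
  }
}

# 'true' is a stand-in; replace it with e.g.: /bin/bash parse-rss.sh rss.xml
bench 100 true
```

The timing report goes to stderr, so it stays visible even though the output of the timed command is discarded.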
I started googling for a way to parse XML in Bash and found this awesome solution. It addresses the same problem of parsing an RSS feed. I modified the code for my needs and stored it in the parse-rss.sh file:
#!/bin/bash
# Read up to the next '<' and split that chunk on '>':
# TAG gets the tag name, VALUE the text that follows it.
xmlgetnext () {
  local IFS='>'
  read -r -d '<' TAG VALUE
}
# $1 is the path to the RSS file
while xmlgetnext ; do
  case $TAG in
    'item')
      title=''
      link=''
      pubDate=''
      ;;
    'title')
      title=$VALUE
      ;;
    'link')
      link=$VALUE
      ;;
    'pubDate')
      pubDate=$( date -d "$VALUE" +'%d/%m/%Y' )
      ;;
    '/item')
      cat <<EOF
* $pubDate [$title]($link)
EOF
      ;;
  esac
done < "$1"
I ran the same test to compare performance:
time ./test.sh parse-rss.sh rss.xml
./test.sh parse-rss.sh rss.xml 0,81s user 0,33s system 109% cpu 1,042 total
Almost two times faster: 1,042 s vs 1,883 s. This is the approach I settled on for processing the RSS feed.
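To see what xmlgetnext actually yields, the sketch below feeds it a one-line snippet and prints each TAG/VALUE pair. The function is the same splitting trick as in parse-rss.sh; the inline sample and the dump_pairs helper name are assumptions made only for illustration.

```shell
#!/bin/bash
# Same splitting trick as in parse-rss.sh: read up to the next '<',
# then split that chunk on '>' into the tag name and the text after it.
xmlgetnext () {
  local IFS='>'
  read -r -d '<' TAG VALUE
}

# dump_pairs is a hypothetical helper that prints every pair it reads.
# The chunk after the very last '<' is not printed, because read reports
# end of input there and the loop stops.
dump_pairs () {
  while xmlgetnext; do
    printf '[%s] => [%s]\n' "$TAG" "$VALUE"
  done
}

dump_pairs <<< '<channel><item><title>Hello</title><link>https://example.com/</link></item></channel>'
```

Among the printed pairs is [title] => [Hello], while closing tags such as /title and /item come through with an empty VALUE. That is why the case statement can treat '/item' as the signal that one post is complete.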
Updating the list of posts is a simple text replacement. Since markdown allows HTML, we can use HTML comments to mark a placeholder for the posts:
<!--blog:start-->
<!--blog:end-->
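To make the placeholder idea concrete, here is a minimal sketch that swaps the text between the two markers with a plain read loop. It is only an illustration, not the script used in this article: replace_between_markers is a hypothetical name, and the sketch assumes each marker sits on a line of its own.

```shell
#!/bin/bash
# replace_between_markers is a hypothetical helper, shown for illustration.
# It copies stdin to stdout, replacing whatever sits between the
# <!--blog:start--> and <!--blog:end--> lines with the text given in $1.
# Assumption: each marker is on its own line.
replace_between_markers() {
  local new_block=$1 line inside=0
  while IFS= read -r line || [ -n "$line" ]; do
    if [ "$line" = '<!--blog:start-->' ]; then
      printf '%s\n%s\n' "$line" "$new_block"
      inside=1
    elif [ "$line" = '<!--blog:end-->' ]; then
      printf '%s\n' "$line"
      inside=0
    elif [ "$inside" -eq 0 ]; then
      printf '%s\n' "$line"
    fi
  done
}

# Sample README text; the old entry between the markers gets replaced.
printf '%s\n' '# Hi there' '<!--blog:start-->' '* old post' '<!--blog:end-->' 'Bye' \
  | replace_between_markers '* 27/05/2022 [New post](https://example.com/new/)'
```

The loop works line by line, so it never needs the whole file in a single string; the price is more code than a one-line substitution.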
The standard tool for replacing text in Bash is sed, but it has one limitation: it is a stream editor that processes its input one line at a time. In our case, both the placeholder and the list of posts are multiline text. Here’s how I solved it:
#!/bin/bash
# Usage: update-readme.sh <posts-file> <number-of-posts>
# parse-rss.sh prints one line per post, so lines == posts
NUM=$2
POSTS=$( head -n "$NUM" "$1" | tr '\n' '\t' )
# Flatten README.md to one line, replace the text between the markers,
# then restore the newlines
tr '\n' '\t' < README.md \
  | sed -E "s#(<!--blog:start-->).*(<!--blog:end-->)#\1\t${POSTS}\2#g" \
  | tr '\t' '\n' > README.tmp
mv README.tmp README.md
rm -f rss.xml posts.md
Some things worth explaining:
* NUM=$2 is the number of lines to take from posts.md; parse-rss.sh prints one line per post, so in my case five posts means five lines
* tr '\n' '\t' converts the text to a single line so that sed can process it
* tr '\t' '\n' brings back the newlines
Now we have our scripts, and we need to put them into a pipeline. GitHub Actions looks in the special .github/workflows directory and processes each .yml or .yaml file there. I’ve created a posts.yml file there with the following content:
name: Update blog posts
on:
  push:
  workflow_dispatch:
  schedule:
    - cron: '0 0 * * *'
jobs:
  update-readme-with-latest-posts:
    runs-on: ubuntu-latest
    steps:
      - name: Clone repository
        uses: actions/checkout@v2
        with:
          fetch-depth: 1
      - name: Fetch RSS feed
        run: wget --quiet -O rss.xml https://atabakoff.com/posts/index.xml
      - name: Parse RSS feed
        run: |
          cd ${GITHUB_WORKSPACE}
          ./src/parse-rss.sh rss.xml > posts.md
      - name: Update README.md
        run: |
          cd ${GITHUB_WORKSPACE}
          ./src/update-readme.sh posts.md 5
      - name: Push changes
        run: |
          git config --global user.name "${GITHUB_ACTOR}"
          git config --global user.email "${GITHUB_ACTOR}@users.noreply.github.com"
          # Stop successfully when there is nothing new to commit
          git commit -am "Updated blog posts" || exit 0
          git push
Here’s what needs to be explained:
* push runs the workflow on every push
* workflow_dispatch allows starting the workflow manually
* cron: '0 0 * * *' is a schedule, in my case every day at midnight (UTC)
* uses: actions/checkout@v2 clones the repository
Then I split fetching, parsing, and updating into separate steps. This lets me quickly localize a problem if something goes wrong. Something worth noting:
* cd ${GITHUB_WORKSPACE} moves to the working directory, which is the newly cloned repository
* ${GITHUB_ACTOR} is the username of the account that triggered the workflow (here, yours)
* ${GITHUB_ACTOR}@users.noreply.github.com is a special GitHub email one can use to push the changes to the repository
You can find the full solution in my profile repository. It’s been a lot of fun solving this problem with pure Bash.
That being said, there are lots of community-made GitHub Actions that let you create a dynamic profile without writing any code. All you need to do is write some YAML. But there’s little challenge in that. It is not the warrior’s way.
Also published here.