Felipe Hoffa

@hoffa

The digital divide: Software developers in Africa through GitHub and Stack Overflow stats

April 11th 2017

I wish we had more data — in average each African country only has 190.3 active GitHub users. Let’s take a look (or see the original article for context):

Average number of GitHub stars and users per country in each continent 2016.

These are the top countries in Africa by GitHub active users — and the stats from Stack Overflow on BigQuery too:

On the left: Active GitHub users in African countries. On the right: Active Stack Overflow users. 2016

My main message: Don’t stop where I stopped. If there’s any area that’s interesting for you — please take these queries and tools, and dig deeper into the data. I’ll be happy to help.

Queries

Continent stats:

#standardSQL
SELECT continent, SUM(stars) stars
  , ROUND(SUM(stars)/COUNT(DISTINCT country_code), 1) avg_stars_per_country
  , ROUND(SUM(users)/COUNT(DISTINCT country_code), 1) users_per_country
FROM (
      SELECT country_code, COUNT(*) stars
        , COUNT(DISTINCT actor.login) AS users
      FROM `githubarchive.year.2016` a
      JOIN `ghtorrent-bq.ght_2017_01_19.users` b  
      ON a.actor.login=b.login
      WHERE country_code IS NOT null 
      AND a.type='WatchEvent'
      GROUP BY 1
      HAVING stars>5
) a
JOIN `gdelt-bq.extra.countryinfo` c
ON a.country_code=LOWER(c.iso)
GROUP BY 1
ORDER BY stars DESC

African countries stats:

#standardSQL
SELECT country, users, stars, ROUND(stars/users, 1) stars_per_user
FROM (
      SELECT country_code, COUNT(*) stars
        , COUNT(DISTINCT actor.login) AS users
      FROM `githubarchive.year.2016` a
      JOIN `ghtorrent-bq.ght_2017_01_19.users` b  
      ON a.actor.login=b.login
      WHERE country_code IS NOT null 
      AND a.type='WatchEvent'
      GROUP BY 1
      HAVING stars>5
) a
JOIN `gdelt-bq.extra.countryinfo` c
ON a.country_code=LOWER(c.iso)
WHERE continent='AF'
ORDER BY users DESC

African countries by Stack Overflow users

SELECT country,SUM(users) stackoverflow_users
FROM (
  SELECT location, COUNT(*) users
  FROM `bigquery-public-data.stackoverflow.users` a
  WHERE EXTRACT(YEAR FROM last_access_date)>=2016
  GROUP BY 1
) a
CROSS JOIN (SELECT * FROM `gdelt-bq.extra.countryinfo` WHERE continent='AF') b
WHERE ENDS_WITH(LOWER(a.location), LOWER(b.country))
GROUP BY 1
ORDER BY 2 DESC

The main article:

The top GitHub projects per country

See also:

What countries have more open source developers per capita than the US?

Note: In the GitHub queries we can only see the country of a GitHub user if GHTorrent has been able to parse it from each user’s profile. Check out yours on https://github.com/settings/profile.

More by Felipe Hoffa

More Related Stories