The making of Good Dogs of NYC, a Twitter bot about real doggies

🎉 I’ve launched a new Twitter bot that tells the stories of all of the very good dogs who live in New York City.

The data the bot uses originates from a ~90k-row CSV of dog licensing data obtained from the city of New York via a FOIA request. I was alerted to the existence of this data by Parker Higgins, who was messing around with it and had converted it to a more useful format.

The moment I got my hands on the data, I knew what I had to do:

Now that the bot is live, I wanted to share some info about the process of making it.

Nuts and Bolts

Ok, this is pretty banal, but there are a thousand ways to build and deploy a Twitter bot, and sometimes it’s interesting to see how others do it. So here goes, very quickly:

Good Dogs of NYC is a Ruby application that uses Active Record to talk to a PostgreSQL database and the Twitter gem to tweet. It’s deployed on Heroku as an app with no dynos. I use the free Heroku Scheduler add-on to run an executable script that does the tweeting. It runs every 10 minutes and tweets 1 out of every 8 times it runs.

Ok that was pretty boring. Here’s another good dog to cleanse your palate:

The Neighborhood Problem

As always with this kind of project, a bunch of labor went into cleaning and massaging the data into a usable form. The most time-consuming part of this was around locations. The data provided each dog’s ZIP code, but I wanted to refer to NYC neighborhoods. So I set out to find a mapping of NYC ZIP codes to neighborhoods. Easy, right?

Nope. The only ZIP-neighborhood mapping available online is provided by the NYC Department of Health, but it isn’t specific enough with its neighborhoods (“Southern Brooklyn” doesn’t cut it), and it was missing a bunch of codes that appeared in my data set.

So I took a closer look at the ZIP codes in the data, and discovered why no one had created this mapping before:

1. The dataset contained 307 ZIP codes (!).
2. There are several types of ZIP codes. Some ZIP codes in the data set are the standard kind that map to a region of the city. Others (so-called “unique ZIP codes”) are for large entities such as hospitals, and still others only apply to PO Boxes. (Do dogs live in hospitals and PO Boxes? WTH?)
3. Neighborhood boundaries in NYC are vague and there is no central authority that defines them.
4. And the big one: Even for standard ZIP codes, the ZIP-neighborhood relationship is highly many-to-many, meaning that there are many ZIPs that span multiple neighborhoods, and many neighborhoods that span multiple ZIP codes.

So, I improvised. I settled on 182 codes that covered more than 99% of my doggie friends. For each of these, I searched for the ZIP on Google Maps. Then (and this is the real 1337 haxor part) I squinted at the screen and chose the neighborhood that, according to Google, looked like it mapped best to the ZIP code’s region.

Here’s the final result of that tedious work. (If you want to use this data, please keep in mind that some arbitrary choices went into the generation of that data because of (4) above.)

Data into Words

I’ve long been interested in programmatically generating prose descriptions of data. This is something I worked on in a pretty naive way in my first tech job, and it’s still a big business (for at least one firm).

This interest is one reason I like Twitter bots. Each bot I’ve created (The Meaning of Life, the now-defunct Important Animal, and Father John Botsy) is in the text generation genre. Each time I built a bot, I did just enough work to generate the text for that bot. This time, I wanted to take a more abstract, reusable approach.

The result is the Wordz gem. Wordz is a lightweight generative text library. You declare the structure of the text you want to generate (a “grammar”) and optionally specify some objects that hold the text bits you want to compose.
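To make the idea of a declared grammar concrete, here is a minimal sketch of a grammar-driven generator in plain Ruby. It is not Wordz's actual API: the names Dog, GRAMMAR, and expand are hypothetical, and the element conventions (angle-bracketed keys, #object#method# calls, and an optional |probability suffix) are borrowed from the way Good Dogs' grammar is written.

```ruby
# A tiny grammar-driven text generator in plain Ruby.
# NOT the Wordz API: Dog, GRAMMAR, and expand are hypothetical names for this sketch.

Dog = Struct.new(:name)

# Each key maps to a list of alternatives; each alternative is a list of elements.
GRAMMAR = {
  "<root>"       => [["<salutation>", "Woof.|0.5"]],
  "<salutation>" => [["Hi, my name is", "#dog#name#", "."],
                     ["I am", "#dog#name#", "."]]
}.freeze

# Recursively expand one element into a flat list of phrases.
def expand(element, grammar, objects, rng)
  # Optional "|p" suffix: keep the element with probability p, else emit nothing.
  if (m = element.match(/\A(.+)\|(\d*\.?\d+)\z/))
    return [] unless rng.rand < m[2].to_f
    element = m[1]
  end

  if grammar.key?(element)
    # Key in the grammar: pick one alternative at random and expand its parts.
    alternative = grammar.fetch(element).sample(random: rng)
    alternative.flat_map { |part| expand(part, grammar, objects, rng) }
  elsif (m = element.match(/\A#(\w+)#(\w+)#\z/))
    # Method-call element: send the message to the named object.
    [objects.fetch(m[1].to_sym).public_send(m[2]).to_s]
  else
    # Anything else is a string literal.
    [element]
  end
end

dog = Dog.new("Biscuit")
phrases = expand("<root>", GRAMMAR, { dog: dog }, Random.new)
puts phrases.inspect
```

Because alternatives and optional elements are chosen at random, repeated calls produce different phrase lists; passing a seeded Random (for example Random.new(42)) makes a run reproducible.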
Here’s a small fragment of Good Dogs’ grammar as JSON:

    {
      "<root>": [["<body>", "<tag>|0.1", "<sign_off>"]],
      "<body>": [
        ["<salutation_with_name>", "Hi.|0.1", "<short_sentences_body>", "Hi.|0.1"],
        ["<long_sentence_body>", "<salutation_with_name>", "Hi.|0.1"]
      ],
      "<salutation_with_name>": [
        ["Hi, my name is", "#dog#name#", "."],
        ["<bark>|0.2", "I am", "#dog#name#", "."],
        ["Oh hi. It is", "#dog#name#", "here|0.3", "."]
      ],
      .... etc etc etc
    }

The angle-bracketed elements here, like <body>, refer to other keys in the grammar, and enable you to build up complex structures from simple pieces. Method call elements such as #dog#name# will pass the message (in this case "name") to the specified object (in this case the one you’ve called "dog"). For Good Dogs, this is a record of a good dog pulled from the database. Other elements are string literals. For every type of element, you can optionally append a probability, e.g. Hi.|0.1. This means: add “Hi.” to the text 10% of the time, and do nothing the rest of the time. Wordz recursively evaluates your grammar using the objects you input, and assembles it into a list of phrases.

Once Wordz has a list of phrases, a post-processing step joins the text together into sentences with proper spacing. Most of the time we want to join everything together with spaces in between, but there are exceptions, such as ["I am", "#dog#name#", "!"], which we want to generate "I am dog name!" not "I am dog name !".

There’s also a facility for string literals which have special effects in post-processing. Right now the only example of this is the string $INDEF_ARTICLE, which gets replaced in post-processing by either “a” or “an” depending on the phrase that follows it.

Planned future work on Wordz:

- Add conditional logic to the generation step, whereby you could specify in the grammar that a fragment of text should be included only if some condition, defined by a function call, is met.
- Allow Wordz users to define their own post-processing functions. (E.g. you might want one that pluralizes a noun phrase if it’s preceded by a number word greater than one.)

Links

Ok, thanks for reading. Here’s a list of links to the work described above!

- Good Dogs of NYC on Twitter
- Good Dogs of NYC on GitHub
- Wordz on RubyGems
- Wordz on GitHub
- Michael (that’s me!) on Twitter
