Programmatic Hack Idea to Bulk Spell Check (Sort Of) An Entire Web Site

Is it just me, or have modern open-source CMSs basically failed where spell check comes into play? I mean, sure, most editors (meaning, the box you type your blog posts into in WordPress, Joomla, etc.) feature a spell check button. But typos slip into web sites here and there nevertheless and, once they’re there, they may sit unnoticed by the site administrator(s) for ages.

Some sites — even ones containing massive numbers of articles — have never been spell checked at all. Sadly, it’s just not at the top of everyone’s mind all the time, prior to publishing.

But, approaching spell check later — after you’ve published tons of posts — is a real chore because you have to open each post, run the check, fix anything that needs it, and then save. (Not to mention text that doesn’t sit nicely inside an article or post, eh? What about footer text and sidebar modules / widgets?)

It’d be nice to be able to treat ALL of one’s site content as a single document and run a spell check over that, wouldn’t it?

I got to thinking about that a while back and momentarily considered building out something of my own to handle it. But, in the end, I decided that coding up an entire spell checker would probably take way longer than the time I’d spend just doing the boring one-by-one approach already described. So, I gave up on it.

However, it struck me the other day that a standard web browser has a built-in feature that can be leveraged to accomplish such a task — well, maybe not in a fully automated way, but at least in a hackable way that helps.

I’m speaking of the magical <textarea> field in normal HTML. You usually encounter it as the large box in HTML forms that take longer messages or comments. Ever notice that, if you make a typo in one of those, the browser has its own spell check that kicks in?

The red squiggly underline that shows up under a typo? That’s not HTML; it’s from the web browser!

That’s kind of an “aha” moment because, if you load an article (doesn’t matter if it’s WordPress, Joomla, or whatever) into a textarea, that textarea will automatically incorporate your browser’s native spell check functionality. (At least, this works in Chrome, my own browser of choice.)

This opens up some pretty cool possibilities for spell checking web content. I believe one could leverage it in a variety of ways, ranging from (1) a simplistic yet powerful way to bulk-recognize spelling errors to (2) a more sophisticated means of identifying and actually fixing them.

The Quick / Easy Way:

On the easier side of this (identification only), what you would do is write a small script that iterates through as many chunks of text as you care to review at once and dumps each one into its own textarea. Your pseudo-code would be something like:

// go to your database and retrieve a custom recordset of various items you want to iterate through (e.g., a whole bunch of blog posts, and/or any other text-content from your site)

// iterate through the whole bunch, dumping the contents into textareas.

// It would probably be a good idea to identify these items with whatever identifying information would help you locate the items on your site (e.g., posting IDs or titles, or whatever).
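The pseudo-code above could be sketched roughly as follows. This is only an illustration, written in Python with SQLite for brevity: the `posts` table, its `id`, `title`, and `body` columns, and the function names are all assumptions, and a real CMS will have its own schema and database driver. The `spellcheck="true"` attribute is standard HTML and simply asks the browser to check the field.

```python
import html
import sqlite3


def fetch_items(conn):
    """Return (id, title, body) rows for every item to be checked."""
    # Hypothetical schema: a `posts` table with id, title, and body columns.
    return conn.execute("SELECT id, title, body FROM posts ORDER BY id").fetchall()


def render_page(items):
    """Build one HTML page with a labeled textarea per item."""
    parts = [
        "<!DOCTYPE html>",
        "<html><head><meta charset='utf-8'>",
        "<title>Bulk spell check</title></head><body>",
    ]
    for item_id, title, body in items:
        # The heading carries the ID and title so the item can be found in the CMS.
        parts.append(f"<h2>#{item_id}: {html.escape(title)}</h2>")
        # Escaping keeps any markup in the body from breaking out of the textarea.
        parts.append(
            '<textarea spellcheck="true" rows="12" cols="100">'
            f"{html.escape(body)}</textarea>"
        )
    parts.append("</body></html>")
    return "\n".join(parts)


if __name__ == "__main__":
    # Demo data in an in-memory database; a real run would connect to the CMS database.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT, body TEXT)")
    conn.executemany(
        "INSERT INTO posts (title, body) VALUES (?, ?)",
        [
            ("First post", "Thiss sentence has a tyypo."),
            ("Second post", "<p>All good here.</p>"),
        ],
    )
    print(render_page(fetch_items(conn)))
```

Redirecting the output to a file (for example, `spellcheck.html`) and opening that file in the browser gives one long page of textareas, each labeled with its ID and title.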

From there, load your page and then you’ll have to click into each textarea to engage the spell check. Errors are easily spotted as they’re underlined in a nice, visible red squiggly line.

If you see no errors, simply go on to the next one. If you do see an error, just right-click and fix the error (which usually works for common misspellings). Then, select all inside that textarea, copy it, go to your CMS, pull up that article, and paste the fixed text.

Your time savings here are, say, 1 minute for each article or posting for which you do not have to do anything. In other words, this saves you the time of having to open up the post, manually spell check it, save it, find your place in the list of posts you were looking at, and then resume.

The More Complex Way:

While the above is simple and easy for small to medium web sites, you may want to build out something more robust for larger sites. Such pseudo-code would be something like this:

// go to your database and retrieve a custom recordset of various items you want to iterate through (e.g., a whole bunch of blog posts, and/or any other text-content from your site)

// iterate through the whole bunch, dumping the contents into textareas (perhaps even paginated).

// Instead of dumping contents into bare textareas, generate entire forms, either bulk-forms by the page or individual forms by the item. (I'd probably do the latter.) That way, you could build in "save" buttons and actually save any changes you make right there.
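One way the per-item forms might look is sketched below, again in Python with SQLite purely for illustration. The `posts` table, the `/save` endpoint, and every function name here are hypothetical; the web handler that receives the POST and calls `save_item` is left out, since it depends on whatever framework or CMS hooks you have available.

```python
import html
import sqlite3


def fetch_page(conn, page, per_page=20):
    """Fetch one page of items so the browser never loads the whole site at once."""
    offset = (page - 1) * per_page
    return conn.execute(
        "SELECT id, title, body FROM posts ORDER BY id LIMIT ? OFFSET ?",
        (per_page, offset),
    ).fetchall()


def render_item_form(item_id, title, body, action="/save"):
    """Return one self-contained form: hidden ID, editable textarea, save button."""
    return (
        f'<form method="post" action="{html.escape(action)}">\n'
        f"  <h2>#{item_id}: {html.escape(title)}</h2>\n"
        f'  <input type="hidden" name="id" value="{item_id}">\n'
        f'  <textarea name="body" spellcheck="true" rows="12" cols="100">'
        f"{html.escape(body)}</textarea>\n"
        f'  <button type="submit">Save</button>\n'
        f"</form>"
    )


def save_item(conn, item_id, new_body):
    """Write a corrected body back; returns True if exactly one row was updated."""
    cur = conn.execute(
        "UPDATE posts SET body = ? WHERE id = ?", (new_body, int(item_id))
    )
    conn.commit()
    return cur.rowcount == 1
```

Using one form per item keeps each save small and independent, so a mistake in one correction cannot overwrite the others on the page.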

Sure, it would take some time to script that. But, for a large-enough site, the investment would be worthwhile.

Put it this way: If it takes 1 minute per article to spell check a site, and your site has 10,000 articles, that’s about 167 hours of work. But, if you could get to 5 articles per minute via the above approach (not having to open, spell check, save, close, and keep track of your place in the stack), then you’re down to about 33 hours of work, plus another 4 or 5 to code up the functionality described here.

If your site has 1,000 articles, that’s still about 16.7 hours of manual work versus about 8 total this new way (counting the coding). So, I guess the break-even point, assuming you know PHP already, is probably somewhere around a 500+ article web site.
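The arithmetic behind those estimates can be checked with a few lines of Python. The function names are made up for this sketch, and the 4.5 hours of coding time is simply the midpoint of the "4 or 5" hours mentioned above, so treat all inputs as rough assumptions rather than measurements.

```python
def manual_hours(articles, minutes_per_article=1.0):
    """Hours needed to open, check, and save each article one by one."""
    return articles * minutes_per_article / 60


def bulk_hours(articles, articles_per_minute=5.0, coding_hours=4.5):
    """Hours needed with the textarea approach, including one-time coding."""
    return articles / articles_per_minute / 60 + coding_hours


def break_even_articles(minutes_per_article=1.0, articles_per_minute=5.0,
                        coding_hours=4.5):
    """Site size at which the coding time is paid back by faster checking."""
    saved_minutes_per_article = minutes_per_article - 1 / articles_per_minute
    return coding_hours * 60 / saved_minutes_per_article


for n in (1_000, 10_000):
    print(f"{n:>6} articles: manual {manual_hours(n):.1f} h, "
          f"bulk {bulk_hours(n):.1f} h")
print(f"break-even near {break_even_articles():.0f} articles")
```

With these particular inputs the break-even lands in the low hundreds of articles, so the 500+ figure leaves some margin in case the scripting takes longer than planned or the per-article speedup is smaller.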

I really like the thought of this, as it’s technically applicable to all CMSs as well as to the extensions used by those CMSs. In other words, you could use the approach above to spell check all of the articles on a web site, and then tweak a few items in your code and use it to spell check all of the items for sale on that same site, then tweak a few items and use it to spell check all modular content, etc.
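The "tweak a few items" idea could be made explicit by describing each kind of content as a small configuration entry, so the same fetching code serves articles, products, modules, and so on. The sketch below is only one possible shape for that: the `Source` class, the table and column names, and the SQLite backend are all assumptions chosen for illustration.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    """Describes where one kind of site text lives (all names hypothetical)."""
    label: str
    table: str
    id_column: str
    text_column: str


SOURCES = [
    Source("Article", "posts", "id", "body"),
    Source("Product", "products", "sku", "description"),
]


def fetch_source(conn, source):
    """Return (label, text) pairs for one content source."""
    # Table and column names cannot be bound as SQL parameters, so they must
    # come only from the hard-coded SOURCES list, never from user input.
    query = (
        f'SELECT "{source.id_column}", "{source.text_column}" '
        f'FROM "{source.table}"'
    )
    return [(f"{source.label} {row_id}", text) for row_id, text in conn.execute(query)]


def fetch_all(conn, sources=SOURCES):
    """Gather every configured source into one list of labeled text chunks."""
    items = []
    for source in sources:
        items.extend(fetch_source(conn, source))
    return items
```

Adding another kind of content then means adding one more `Source` line rather than copying the whole script.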

Jim Dee heads up Array Web Development, LLC in Portland, OR. He’s the editor of “Web Designer | Web Developer” magazine and a contributor to many online publications. You can reach him at: Jim [at] ArrayWebDevelopment.com. Photo atop piece is adapted from “typewriter” by Ak~i (Flickr, Creative Commons).