

In a previous post, I was working on a small site where you can search Donald Trump's recent speeches and get statistics on how often he uses certain words or phrases, when he uses them, and where he most frequently says certain things. For reference, here's the working prototype of the site I'm building:
https://trumpspeechdata.herokuapp.com/
Now, suppose I want to add a feature that lets you search for a topic and get back any speeches on that topic. Have you ever searched for something on a site and gotten results that weren't close to what you had in mind? We've probably all had that happen. With tons and tons of data, what I'm about to suggest might not be feasible, but for a smaller data set like this site's, it's both reasonable and lets us return more accurate search results.
Here's the data I have so far, if you want to follow along:
https://www.dropbox.com/s/u4vuwazx609uvvw/trumpspeeches.json?dl=0
Our current data structure looks like this:
{
  "speechtitle": "Title of Speech",
  "speechdate": "Date of Speech",
  "speechlocation": "Location of Speech",
  "text": "Entire transcript of Speech"
}
If a user searches for a specific topic and the search matches the topic of a speech, we'll probably want to return all of that data to the user. To start off, let's use this User Story:
Bob, a user, searches for "Budget" so he can view speeches made by Donald Trump on the budget.
One way we could do this would be the following:
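// `api` is the array of speech objects; `inputValue` is the user's search text.
let matchingSpeeches = [];
for (let i = 0; i < api.length; i++) {
  // Keep the speech if its title contains the search text (case-sensitive).
  if (api[i].speechtitle.indexOf(inputValue) > -1) {
    matchingSpeeches.push([
      api[i].speechtitle,
      api[i].speechdate,
      api[i].speechlocation,
      api[i].text,
    ]);
  }
}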
This basically says, "if the speech title contains the word or phrase searched for, push it into the matching speeches array". Then we can format the contents of the array for the user. Back to Bob, though: would this get him what he wants? Yes, as long as "Budget" is included in the speech title. But what if there's a speech on the budget, but "Budget" doesn't appear in the title? Sorry Bob, you're SOL.
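A side note on the title search: `indexOf` is case-sensitive, so a search for "budget" would miss a title containing "Budget". Here is a minimal sketch of a case-insensitive version. The `searchByTitle` name and the sample entries are assumptions made up for illustration; only the field names come from the data structure shown earlier.

```javascript
// Hypothetical sample data in the same shape as the speech JSON.
const sampleSpeeches = [
  {
    speechtitle: "Sample Remarks on the Budget",
    speechdate: "Date of Speech",
    speechlocation: "Location of Speech",
    text: "speech text here",
  },
  {
    speechtitle: "Remarks by President Trump at Tax Reform Event",
    speechdate: "September 2017",
    speechlocation: "Indiana",
    text: "speech text here",
  },
];

// Return every speech whose title contains the query, ignoring case.
function searchByTitle(speeches, query) {
  const needle = query.trim().toLowerCase();
  // An empty query would match every title, so return nothing instead.
  if (needle === "") {
    return [];
  }
  return speeches.filter(function (speech) {
    return speech.speechtitle.toLowerCase().includes(needle);
  });
}

const titleMatches = searchByTitle(sampleSpeeches, "budget");
console.log(titleMatches.map(function (speech) {
  return speech.speechtitle;
}));
```

This fixes the capitalization problem only. Bob is still out of luck when the word he typed is missing from the title altogether.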
Maybe we could do the same thing, as above, but include a search of the speech text too, like this:
let matchingSpeeches = [];
for (let i = 0; i < api.length; i++) {
  // Keep the speech if the search text appears in the title OR the transcript.
  if (api[i].speechtitle.indexOf(inputValue) > -1 || api[i].text.indexOf(inputValue) > -1) {
    matchingSpeeches.push([
      api[i].speechtitle,
      api[i].speechdate,
      api[i].speechlocation,
      api[i].text,
    ]);
  }
}
Here we're saying, "okay, if the search value appears in either the speech title or the text of the speech, we'll push that into the array." Better, right? Well, yes and no. What if a speech is all about abortion or gun control, but mentions the budget once? The speech isn't really about the budget at all, but we're still returning it to Bob and making him sort through that mess. Or what if the speech is about the budget, but uses another word, like "Spending", throughout and never actually mentions "Budget"? We could put Bob in a scenario where he gets a speech on abortion but doesn't get the one on spending. Not a great thing for our end user.
Here's another idea. Let's add a field to our data structure called "tags". Then, for each speech, we can add tags for its topics. For instance, let's take this entry from our JSON data:
{
  "speechtitle": "Remarks by President Trump at Tax Reform Event",
  "speechdate": "September 2017",
  "speechlocation": "Indiana",
  "text": "speech text here"
}
We could modify that to the following:
{
  "speechtitle": "Remarks by President Trump at Tax Reform Event",
  "speechtags": ["budget", "taxes"],
  "speechdate": "September 2017",
  "speechlocation": "Indiana",
  "text": "speech text here"
}
Then when Bob makes his search, we could use our code from earlier, loop over the tags instead, and return any speech with a tag that matches the search input. However, even though this is more targeted and in theory makes our search results better, we could still run into issues. For example, what if Bob searches for "Spending" instead of "Budget"? Even though the two are close, this speech wouldn't be sent to Bob because the query doesn't match.
Here's one way we could solve that problem. What we want to do is boil many of the popular search terms down to a handful of tags. So if a user searches for "Spending", "Budget", "Tax Reform" or "Deficit", we still send the user the results with the "budget" tag, since that's a pretty close match. To do that, we're going to build a word-mapping object and put any words we want in it. The structure will look something like this:
var mapObj = {
  "a": "b",
  "c": "b",
  "d": "b",
  "e": "b",
};
The idea here is that if our user, Bob, searches for "a", we'll give him "b". If he searches for "c" or "d" or "e", we're still going to give him "b". This is just what we described above. If he searches for "spending", we'll return "budget". And if he searches for "tax reform" or "deficit" or "budget", we'll still return the results tagged "budget", since that's still a good match.
Now we'll need to add a regex to match the input string. Here's what the code could look like:
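// Each search term (key) maps to the tag (value) we want to search on.
var mapObj = {
  "spending": "budget",
  "tax reform": "budget",
  "deficit": "budget",
  "budget": "budget",
};
// Build a case-insensitive pattern that matches any key in mapObj.
var re = new RegExp(Object.keys(mapObj).join("|"), "gi");
// `str` is the user's search input; swap each matched term for its tag.
var keyWord = str.replace(re, function (matched) {
  return mapObj[matched.toLowerCase()];
});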
We're using a regex to match what the user searched for and replace it with the corresponding tag. So now we can use the variable "keyWord" in our code from earlier, along with the speechtags field we created:
let matchingSpeeches = [];
for (let i = 0; i < api.length; i++) {
  // speechtags is an array, so indexOf looks for an exact tag match.
  if (api[i].speechtags.indexOf(keyWord) > -1) {
    matchingSpeeches.push([
      api[i].speechtitle,
      api[i].speechdate,
      api[i].speechlocation,
      api[i].text,
    ]);
  }
}
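Putting the pieces together, here is a minimal sketch of the whole lookup as a pair of functions. It is a variation on the code above rather than a copy: instead of replacing text inside the query, it pulls out the first recognized term and maps it to its tag, so a longer query such as "government spending" still resolves to exactly "budget". The names `termToTag`, `normalizeQuery` and `findSpeechesByTopic`, and the second sample entry, are assumptions for illustration.

```javascript
// Search terms (keys) mapped to the tag (value) they should resolve to.
const termToTag = {
  "spending": "budget",
  "tax reform": "budget",
  "deficit": "budget",
  "budget": "budget",
};

// One case-insensitive pattern that matches any known search term.
const termPattern = new RegExp(Object.keys(termToTag).join("|"), "i");

// Return the tag for the first known term in the query, or null if none appears.
function normalizeQuery(query) {
  const match = query.match(termPattern);
  return match ? termToTag[match[0].toLowerCase()] : null;
}

// Return every speech tagged with the topic the query resolves to.
function findSpeechesByTopic(speeches, query) {
  const tag = normalizeQuery(query);
  if (tag === null) {
    return [];
  }
  return speeches.filter(function (speech) {
    // Treat speeches that have not been tagged yet as having no tags.
    return (speech.speechtags || []).includes(tag);
  });
}

// Sample data: the first entry is from the post, the second is hypothetical.
const taggedSpeeches = [
  {
    speechtitle: "Remarks by President Trump at Tax Reform Event",
    speechtags: ["budget", "taxes"],
    speechdate: "September 2017",
    speechlocation: "Indiana",
    text: "speech text here",
  },
  {
    speechtitle: "Sample Untagged Speech",
    speechdate: "Date of Speech",
    speechlocation: "Location of Speech",
    text: "speech text here",
  },
];

const topicMatches = findSpeechesByTopic(taggedSpeeches, "Government Spending");
console.log(topicMatches.map(function (speech) {
  return speech.speechtitle;
}));
```

Returning `null` for an unrecognized query keeps "we don't know this topic" separate from "we know the topic but have no speeches on it", which makes it easier to show the user a useful message in each case.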
Now, as I mentioned earlier, this might not work well on a really large scale, but I think it works in this scenario because we have a pretty limited scope. Because the topic is politics, there are only so many search terms a user might enter. And we can always add some code that returns something to the user if we don't get any matches for their input. For example, we probably won't have a lot of data available if Bob searches for "chicken soup".
Since the possible search terms are somewhat limited, we can extend our mapping object to include as many possibilities as we like, to match all the tags we are using, like so:
var mapObj = {
  // Budget-related terms
  "spending": "budget",
  "tax reform": "budget",
  "deficit": "budget",
  "budget": "budget",
  // Abortion-related terms
  "abortion": "abortion",
  "women's rights": "abortion",
  "pro life": "abortion",
  "pro choice": "abortion",
  // Healthcare-related terms
  "healthcare": "healthcare",
  "obamacare": "healthcare",
  "health reform": "healthcare",
  "medicaid": "healthcare",
};
var re = new RegExp(Object.keys(mapObj).join("|"), "gi");
// `str` is the user's search input; swap each matched term for its tag.
var keyWord = str.replace(re, function (matched) {
  return mapObj[matched.toLowerCase()];
});
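The fallback mentioned earlier, returning something useful when the input matches nothing, could be sketched like this. It is only one possible shape: the `resolveTopic` and `escapeRegExp` names, the result object, and the idea of suggesting the known topics are all assumptions, and `topicMap` is a shortened copy of the map above. The escaping step is an extra precaution so that a future term containing characters such as "." or "+" is matched literally.

```javascript
// Search terms mapped to tags (a shortened copy of the map above).
const topicMap = {
  "spending": "budget",
  "budget": "budget",
  "obamacare": "healthcare",
  "healthcare": "healthcare",
};

// Escape characters that have a special meaning inside a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// One case-insensitive pattern that matches any known search term.
const topicPattern = new RegExp(
  Object.keys(topicMap).map(escapeRegExp).join("|"),
  "i"
);

// Resolve a query to a tag, or fall back to a list of topics we do cover.
function resolveTopic(query) {
  const match = query.match(topicPattern);
  if (match) {
    return { found: true, tag: topicMap[match[0].toLowerCase()], suggestions: [] };
  }
  // No known term in the query: offer the distinct tags as suggestions.
  const knownTopics = Array.from(new Set(Object.values(topicMap)));
  return { found: false, tag: null, suggestions: knownTopics };
}

const soupResult = resolveTopic("chicken soup");
if (!soupResult.found) {
  console.log("No speeches on that topic. Try: " + soupResult.suggestions.join(", "));
}
```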
Then we could go back and add tags to each speech to match the keywords we're using. We could do that manually, especially if our data set is small, or we could use JavaScript or Python or whatever, but I won't cover that here. Again, even with this approach we would still run into some problems, but it's not bad if you're looking for a quick way to make your search results more targeted, especially for smaller data sets.