Way back in January (remember January), I wrote a blog post describing how to use gen ai to improve image filenames. This worked by uploading the image to Google Gemini, asking for a short description, and using that description for a new filename. Recently, I was thinking about that demo and was curious how well it would work for video games.
As always, I did a few quick tests in Google AI Studio. I did some quick Googling for various games and screenshots, and the results were pretty impressive. Here are three mostly modern examples:
And here's a first failure, identifying this as Final Fantasy 14, not 16.
It did well for one really old game, although, to be fair, the name is in the picture:
Although failed on this rather obscure one. It's Bard's Tale 2, not Betrayal at Krondor.
Also, note it didn't follow my directions to answer with just the name of the game.
I then tried a few race games. I was really curious about this as, with the high fidelity of modern racing games, it feels like it would be a difficult task. Surprisingly, it got them right:
I then went super old school and obscure, and it failed, but it was worth a shot. (The first person to identify it without using reverse image search earns 200 Nerd Points.)
So, all in all... it worked reasonably well. So, let's talk automation!
Nearly two years ago, I blogged about copying Nintendo Switch screenshots to Dropbox. This made use of the fact that the Switch can post to Twitter, and you can use Pipedream to scrape media from a Twitter account. As far as I know, that probably doesn't work since Twitter crapped the bed on developer access. While it did work, the images I got had random names for images.
The XBox can upload to OneDrive and actually does name the images, which is super helpful.
In case you can't read it, the filename is Diablo IV-2024_02_19-1514-24.png
. It's not only got the game in the screenshot but the date and time as well. For the purposes of this blog post, I'm going to ignore that, but in a real application, I'd probably just handle the XBox images with a bit of custom code, so no AI is needed.
PlayStation is a bit more wonky and a bit surprising. As far as I can tell, you can share videos on YouTube, but screenshots can only be added to the PS app or copied to a USB drive. Again, surprising.
Here's a screenshot showing how the images are named, and as you can see, nothing relevant is included.
For the purpose of this, let's make some assumptions (nothing ever goes wrong with that, right?). We will assume that our screenshots all get stored in a Dropbox folder named, SSIn
.
I'm going to build a Pipedream workflow that will:
/SSOut/Galaga
Let's break it down:
This part is relatively simple in Pipedream. I created a trigger for new files based on Dropbox action. I specified the path and told it to include a link so it could be downloaded.
The next step is another built-in Pipedream action, downloading a file. In this case, it gets downloaded to /tmp
with the original filename.
The next step is a Node.js step that essentially takes the code output from AI Studio and has it work with the file in /tmp
:
import fs from 'fs';
import {
GoogleGenerativeAI,
HarmCategory,
HarmBlockThreshold,
} from "@google/generative-ai";
async function identifyPic(path, key) {
const MODEL_NAME = "gemini-1.0-pro-vision-latest";
const API_KEY = key;
const genAI = new GoogleGenerativeAI(API_KEY);
const model = genAI.getGenerativeModel({ model: MODEL_NAME });
const generationConfig = {
temperature: 0.4,
topK: 32,
topP: 1,
maxOutputTokens: 4096,
};
const safetySettings = [
{
category: HarmCategory.HARM_CATEGORY_HARASSMENT,
threshold: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
},
{
category: HarmCategory.HARM_CATEGORY_HATE_SPEECH,
threshold: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
},
{
category: HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT,
threshold: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
},
{
category: HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT,
threshold: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
},
];
const parts = [
{text: "Can you identify what game this screenshot comes from? Just tell me the game.\n\n"},
{
inlineData: {
mimeType: "image/png",
data: Buffer.from(fs.readFileSync(path)).toString("base64")
}
},
];
const result = await model.generateContent({
contents: [{ role: "user", parts }],
generationConfig,
safetySettings,
});
return result.response.candidates[0].content.parts[0].text;
}
export default defineComponent({
async run({ steps, $ }) {
return await identifyPic(steps.download_file_to_tmp.$return_value[1], process.env.PALM_KEY);
},
})
The only really important part here is the first argument to identifyPic
, the path. Note that the path is the second result value from the download step. This definitely surprised me.
The next step was also a 'built-in,'’ creating the folder. Now, I should warn you. I'm pretty sure this step is going to throw an error if you run twice in the same game. The action's configuration lets you pick a new name if it already exists but doesn't have a "don't make it if it already exists" parameter. I didn't get a chance to test that, but if it is an issue, I'd simply switch to the Pipedream feature that lets you hit any Dropbox API with the right credentials, check the Dropbox API docs, and see if it's possible there, or heck, just wrap in a try/catch
.
Outside of that, note that I added a trim()
to the result from Google as it had a space in front.
Also note that sometimes Gemini will return a sentence, not just a game name. For example, "The game is X". I figure that will be pretty obvious in the output and a human can handle that.
The final step is to upload the picture to the new output directory. To get the name to be date-based, I used a pretty long expression:
Did it work? Sure did! I mean it's brittle as heck, but here's the output from a few runs. First, top-level folder:
And here's the contents of the Valhalla one:
All in all, I think this works "OK" to "Well", but it is not perfect. I definitely think a human could help, and in fact, one thing Pipedream makes easy is sending emails. You could easily add an email notification at the end where a person could see if something got misfiled.
Anyway, this was fun, and if you want to use this yourself in Pipedream, you can find the workflow here: https://github.com/cfjedimaster/General-Pipedream-AI-Stuff/tree/production/identify-and-sort-screenshots-p_brCmQz3
Also published here.