In this article, we will walk through an image recognition project in Go. We will also create a Telegram bot through which we can send images for recognition.
The first thing we need is an already trained model; in this article we will not train our own. For this exercise, we will take a pretrained Inception model and run it on top of the ctava/tfcgo Docker image.
To launch the project, we will need four terminals at the same time.
In the first terminal, we will run the image recognition server. In the second, the bot. In the third, a public tunnel that exposes our bot to the outside world. In the fourth, we will execute the command that registers our bot with Telegram.
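As a rough sketch, the four terminals will run something like the following (the ngrok URL and bot token are placeholders; the exact commands are covered step by step below):

```
# Terminal 1: build and run the recognition server (Makefile targets defined later)
make recognition_build && make recognition_run

# Terminal 2: run the bot server
go run ./src/bot

# Terminal 3: open a public tunnel to the bot's port
ngrok http 3000

# Terminal 4: register the webhook (URL and token are placeholders)
curl -F "url=https://<your-ngrok-id>.ngrok.io" "https://api.telegram.org/bot<YOUR_TOKEN>/setWebhook"
```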
To start the recognition server, create a Dockerfile:
FROM ctava/tfcgo
RUN mkdir -p /model && \
curl -o /model/inception5h.zip -s "http://download.tensorflow.org/models/inception5h.zip" && \
unzip /model/inception5h.zip -d /model
WORKDIR /go/src/imgrecognize
COPY src/ .
RUN go build
ENTRYPOINT [ "/go/src/imgrecognize/imgrecognize" ]
EXPOSE 8080
This builds an image that contains both our server (src/imgrecognize) and the model, unpacked into the /model directory.
The first thing the server needs to do is set an environment variable:
os.Setenv("TF_CPP_MIN_LOG_LEVEL", "2")
This is necessary to avoid errors like:
I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
unable to make a tensor from image: Expected image (JPEG, PNG, or GIF), got empty file
We will not optimize the server here; we simply run it via "ListenAndServe" on port 8080. Before starting the server, we load our model (loadModel) and obtain the graph (modelGraph) and labels (labels). The graph is stored in a file in protobuf format: "/model/tensorflow_inception_graph.pb".
func loadModel() (*tensorflow.Graph, []string, error) {
// Load inception model
model, err := ioutil.ReadFile(graphFile)
if err != nil {
return nil, nil, err
}
graph := tensorflow.NewGraph()
if err := graph.Import(model, ""); err != nil {
return nil, nil, err
}
// Load labels
labelsFile, err := os.Open(labelsFile)
if err != nil {
return nil, nil, err
}
defer labelsFile.Close()
scanner := bufio.NewScanner(labelsFile)
var labels []string
for scanner.Scan() {
labels = append(labels, scanner.Text())
}
return graph, labels, scanner.Err()
}
In "modelGraph" we keep the "structure" of our model and the key tools for working with it, while "labels" contains the "dictionary" used to interpret the model's output.
Inside our HTTP handler, we first need to normalize the received image (normalizeImage) so that it can then be passed to the recognition input. To normalize, we convert our image from a Go value to a tensor:
tensor, err := tensorflow.NewTensor(buf.String())
After that, we get three variables
graph, input, output, err := getNormalizedGraph()
"graph" is needed to decode, resize, and normalize the image. "input", together with the tensor, is the "entry point" for communication between our application and TensorFlow. "output" is used as the output signal.
Through "graph", we also open a session to perform the normalization itself:
session, err := tensorflow.NewSession(graph, nil)
Normalization Code:
func normalizeImage(imgBody io.ReadCloser) (*tensorflow.Tensor, error) {
var buf bytes.Buffer
_, err := io.Copy(&buf, imgBody)
if err != nil {
return nil, err
}
tensor, err := tensorflow.NewTensor(buf.String())
if err != nil {
return nil, err
}
graph, input, output, err := getNormalizedGraph()
if err != nil {
return nil, err
}
session, err := tensorflow.NewSession(graph, nil)
if err != nil {
return nil, err
}
defer session.Close()
normalized, err := session.Run(
map[tensorflow.Output]*tensorflow.Tensor{
input: tensor,
},
[]tensorflow.Output{
output,
},
nil)
if err != nil {
return nil, err
}
return normalized[0], nil
}
After normalizing the image, we create a session for inference over modelGraph.
session, err := tensorflow.NewSession(modelGraph, nil)
With the help of this session, we start the recognition itself. The input is our normalized image:
modelGraph.Operation("input").Output(0): normalizedImg,
The result of the computation (recognition) is saved in the "outputRecognize" variable.
From the received data, we take the top 3 results (despite its name, getTopFiveLabels returns ResultCount = 3 entries):
res := getTopFiveLabels(labels, outputRecognize[0].Value().([][]float32)[0])
func getTopFiveLabels(labels []string, probabilities []float32) []Label {
var resultLabels []Label
for i, p := range probabilities {
if i >= len(labels) {
break
}
resultLabels = append(resultLabels, Label{Label: labels[i], Probability: p})
}
sort.Sort(Labels(resultLabels))
return resultLabels[:ResultCount]
}
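As an aside, the sort.Interface boilerplate (the Labels type with Len/Swap/Less) can be replaced with sort.Slice in modern Go. A minimal standalone sketch of the same top-N selection (the topLabels name is mine, not from the project):

```go
package main

import (
	"fmt"
	"sort"
)

// Label pairs a class name with its probability, mirroring the server code.
type Label struct {
	Label       string
	Probability float32
}

// topLabels keeps only indices that have a label, sorts by descending
// probability with sort.Slice, and returns at most n entries.
func topLabels(labels []string, probs []float32, n int) []Label {
	var res []Label
	for i, p := range probs {
		if i >= len(labels) {
			break
		}
		res = append(res, Label{Label: labels[i], Probability: p})
	}
	sort.Slice(res, func(i, j int) bool {
		return res[i].Probability > res[j].Probability
	})
	if n > len(res) {
		n = len(res)
	}
	return res[:n]
}

func main() {
	labels := []string{"cat", "dog", "fox", "owl"}
	probs := []float32{0.10, 0.70, 0.15, 0.05}
	fmt.Println(topLabels(labels, probs, 3)) // highest probabilities first
}
```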
For the HTTP response, we return only the single most likely result:
msg := fmt.Sprintf("This is: %s (%.2f%%)", res[0].Label, res[0].Probability*100)
_, err = w.Write([]byte(msg))
The full code of our recognition server:
package main
import (
"bufio"
"bytes"
"fmt"
"io"
"io/ioutil"
"log"
"net/http"
"os"
"sort"
tensorflow "github.com/tensorflow/tensorflow/tensorflow/go"
"github.com/tensorflow/tensorflow/tensorflow/go/op"
)
const (
ResultCount = 3
)
var (
graphFile = "/model/tensorflow_inception_graph.pb"
labelsFile = "/model/imagenet_comp_graph_label_strings.txt"
)
type Label struct {
Label string
Probability float32
}
type Labels []Label
func (l Labels) Len() int {
return len(l)
}
func (l Labels) Swap(i, j int) {
l[i], l[j] = l[j], l[i]
}
func (l Labels) Less(i, j int) bool {
return l[i].Probability > l[j].Probability
}
var (
modelGraph *tensorflow.Graph
labels []string
)
func main() {
// I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
// unable to make a tensor from image: Expected image (JPEG, PNG, or GIF), got empty file
err := os.Setenv("TF_CPP_MIN_LOG_LEVEL", "2")
if err != nil {
log.Fatalln(err)
}
modelGraph, labels, err = loadModel()
if err != nil {
log.Fatalf("unable to load model: %v", err)
}
log.Println("Run RECOGNITION server ....")
http.HandleFunc("/", mainHandler)
err = http.ListenAndServe(":8080", nil)
if err != nil {
log.Fatalln(err)
}
}
func mainHandler(w http.ResponseWriter, r *http.Request) {
normalizedImg, err := normalizeImage(r.Body)
if err != nil {
// log.Fatalf here would kill the whole server on one bad request
log.Printf("unable to make a normalizedImg from image: %v", err)
http.Error(w, "unable to normalize image", http.StatusBadRequest)
return
}
// Create a session for inference over modelGraph
session, err := tensorflow.NewSession(modelGraph, nil)
if err != nil {
log.Printf("could not init session: %v", err)
http.Error(w, "internal error", http.StatusInternalServerError)
return
}
defer session.Close()
outputRecognize, err := session.Run(
map[tensorflow.Output]*tensorflow.Tensor{
modelGraph.Operation("input").Output(0): normalizedImg,
},
[]tensorflow.Output{
modelGraph.Operation("output").Output(0),
},
nil,
)
if err != nil {
log.Printf("could not run inference: %v", err)
http.Error(w, "internal error", http.StatusInternalServerError)
return
}
res := getTopFiveLabels(labels, outputRecognize[0].Value().([][]float32)[0])
log.Println("--- recognition result:")
for _, l := range res {
fmt.Printf("label: %s, probability: %.2f%%\n", l.Label, l.Probability*100)
}
log.Println("---")
msg := fmt.Sprintf("This is: %s (%.2f%%)", res[0].Label, res[0].Probability*100)
_, err = w.Write([]byte(msg))
if err != nil {
log.Printf("could not write server response: %v", err)
}
}
func loadModel() (*tensorflow.Graph, []string, error) {
// Load inception model
model, err := ioutil.ReadFile(graphFile)
if err != nil {
return nil, nil, err
}
graph := tensorflow.NewGraph()
if err := graph.Import(model, ""); err != nil {
return nil, nil, err
}
// Load labels
labelsFile, err := os.Open(labelsFile)
if err != nil {
return nil, nil, err
}
defer labelsFile.Close()
scanner := bufio.NewScanner(labelsFile)
var labels []string
for scanner.Scan() {
labels = append(labels, scanner.Text())
}
return graph, labels, scanner.Err()
}
func getTopFiveLabels(labels []string, probabilities []float32) []Label {
var resultLabels []Label
for i, p := range probabilities {
if i >= len(labels) {
break
}
resultLabels = append(resultLabels, Label{Label: labels[i], Probability: p})
}
sort.Sort(Labels(resultLabels))
return resultLabels[:ResultCount]
}
func normalizeImage(imgBody io.ReadCloser) (*tensorflow.Tensor, error) {
var buf bytes.Buffer
_, err := io.Copy(&buf, imgBody)
if err != nil {
return nil, err
}
tensor, err := tensorflow.NewTensor(buf.String())
if err != nil {
return nil, err
}
graph, input, output, err := getNormalizedGraph()
if err != nil {
return nil, err
}
session, err := tensorflow.NewSession(graph, nil)
if err != nil {
return nil, err
}
defer session.Close()
normalized, err := session.Run(
map[tensorflow.Output]*tensorflow.Tensor{
input: tensor,
},
[]tensorflow.Output{
output,
},
nil)
if err != nil {
return nil, err
}
return normalized[0], nil
}
// Creates a graph to decode, resize, and normalize an image
func getNormalizedGraph() (graph *tensorflow.Graph, input, output tensorflow.Output, err error) {
s := op.NewScope()
input = op.Placeholder(s, tensorflow.String)
decode := op.DecodeJpeg(s, input, op.DecodeJpegChannels(3)) // 3 RGB
output = op.Sub(s,
op.ResizeBilinear(s,
op.ExpandDims(s,
op.Cast(s, decode, tensorflow.Float),
op.Const(s.SubScope("make_batch"), int32(0))),
op.Const(s.SubScope("size"), []int32{224, 224})),
op.Const(s.SubScope("mean"), float32(117)))
graph, err = s.Finalize()
return graph, input, output, err
}
Now we need to build this image. Of course, we could build and run it from the console with the appropriate commands, but it is more convenient to collect them in a Makefile. So, let's create this handy file:
recognition_build:
docker build -t imgrecognition .
recognition_run:
docker run -it -p 8080:8080 imgrecognition
After that, open the terminal and run the command:
make recognition_build && make recognition_run
Now, in the first terminal, we have a local HTTP server that accepts images and responds with a text message describing what was recognized in the image.
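Before wiring up the bot, we can smoke-test the server directly from the console. Assuming some local image file cat.jpg (the filename is just an example), a check could look like:

```
curl -s -X POST --data-binary @cat.jpg http://localhost:8080/
```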
This is, so to speak, the "core" of our project.
Next, we need to create a Telegram bot.
To build the bot, we write a second HTTP server. The first server recognizes our images and listens on port 8080; the second is the bot's server and listens on port 3000.
First, we create the bot via BotFather in the Telegram app. During registration, you will receive the bot's name and its token. Keep this token secret.
Let's put this token in the "BotToken" constant. You should get something like this:
const BotToken = "1695571234:AAEbodyrfOjto2xNE5yjpQpW2Gyq0Ob5X24D5"
Our bot's handler decodes the JSON body of the webhook request:
json.NewDecoder(r.Body).Decode(webhookBody)
We are interested in the photo in the sent message: webhookBody.Message.Photo.
Using the unique image ID (photoSize.FileID), we first request the file metadata from fmt.Sprintf(GetFileUrl, BotToken, photoSize.FileID) to obtain the file_path, and then download the image itself: downloadResponse, err = http.Get(downloadFileUrl).
We send the image bytes to the handler of our first server:
msg := recognitionClient.Recognize(downloadResponse)
In response, we get a text message, which we then forward to the user, as is, through the Telegram bot.
The entire bot code:
package main
import (
"bytes"
"encoding/json"
"errors"
"fmt"
"io/ioutil"
"log"
"net/http"
"github.com/romanitalian/recognition/src/bot/recognition"
)
// Register Bot: curl -F "url=https://9068b6869da7.ngrok.io" https://api.telegram.org/bot1695571234:AAEbodyrfOjto2xNE5yjpQpW2Gyq0Ob5X24D5/setWebhook
const (
BotToken = "1695571234:AAEbodyrfOjto2xNE5yjpQpW2Gyq0Ob5X24D5"
GetFileUrl = "https://api.telegram.org/bot%s/getFile?file_id=%s"
DownloadFileUrl = "https://api.telegram.org/file/bot%s/%s"
SendMsgToUserUrl = "https://api.telegram.org/bot%s/sendMessage"
)
type webhookReqBody struct {
Message Msg
}
type Msg struct {
MessageId int `json:"message_id"`
Text string `json:"text"`
From struct {
ID int64 `json:"id"`
FirstName string `json:"first_name"`
Username string `json:"username"`
} `json:"from"`
Photo *[]PhotoSize `json:"photo"`
Chat struct {
ID int64 `json:"id"`
FirstName string `json:"first_name"`
Username string `json:"username"`
} `json:"chat"`
Date int `json:"date"`
Voice struct {
Duration int64 `json:"duration"`
MimeType string `json:"mime_type"`
FileId string `json:"file_id"`
FileSize int64 `json:"file_size"`
} `json:"voice"`
}
type PhotoSize struct {
FileID string `json:"file_id"`
Width int `json:"width"`
Height int64 `json:"height"`
FileSize int64 `json:"file_size"`
}
type ImgFileInfo struct {
Ok bool `json:"ok"`
Result struct {
FileId string `json:"file_id"`
FileUniqueId string `json:"file_unique_id"`
FileSize int `json:"file_size"`
FilePath string `json:"file_path"`
} `json:"result"`
}
func main() {
log.Println("Run BOT server ....")
err := http.ListenAndServe(":3000", http.HandlerFunc(Handler))
if err != nil {
log.Fatalln(err)
}
}
// This handler is called every time Telegram sends us a webhook event
func Handler(w http.ResponseWriter, r *http.Request) {
// First, decode the JSON response body
webhookBody := &webhookReqBody{}
err := json.NewDecoder(r.Body).Decode(webhookBody)
if err != nil {
log.Println("could not decode request body", err)
return
}
// ------------------------- Download last img
var downloadResponse *http.Response
if webhookBody.Message.Photo == nil {
log.Println("no photo in webhook body. webhookBody: ", webhookBody)
return
}
for _, photoSize := range *webhookBody.Message.Photo {
// GET JSON ABOUT OUR IMG (ORDER TO GET FILE_PATH)
imgFileInfoUrl := fmt.Sprintf(GetFileUrl, BotToken, photoSize.FileID)
rr, err := http.Get(imgFileInfoUrl)
if err != nil {
log.Println("unable retrieve img by FileID", err)
return
}
defer rr.Body.Close()
// READ JSON
fileInfoJson, err := ioutil.ReadAll(rr.Body)
if err != nil {
log.Println("unable read img by FileID", err)
return
}
// UNMARSHAL JSON
imgInfo := &ImgFileInfo{}
err = json.Unmarshal(fileInfoJson, imgInfo)
if err != nil {
log.Println("unable unmarshal file description from api.telegram by url: "+imgFileInfoUrl, err)
}
// GET FILE_PATH
downloadFileUrl := fmt.Sprintf(DownloadFileUrl, BotToken, imgInfo.Result.FilePath)
downloadResponse, err = http.Get(downloadFileUrl)
if err != nil {
log.Println("unable download file by file_path: "+downloadFileUrl, err)
return
}
defer downloadResponse.Body.Close()
}
// --------------------------- Send img to server recognition.
recognitionClient := recognition.New()
msg := recognitionClient.Recognize(downloadResponse)
if err := sendResponseToUser(webhookBody.Message.Chat.ID, msg); err != nil {
log.Println("error in sending reply: ", err)
return
}
}
// The below code deals with the process of sending a response message
// to the user
// Create a struct to conform to the JSON body
// of the send message request
// https://core.telegram.org/bots/api#sendmessage
type sendMessageReqBody struct {
ChatID int64 `json:"chat_id"`
Text string `json:"text"`
}
// sendResponseToUser notifies the user about what was found in the image.
func sendResponseToUser(chatID int64, msg string) error {
// Create the request body struct
msgBody := &sendMessageReqBody{
ChatID: chatID,
Text: msg,
}
// Create the JSON body from the struct
msgBytes, err := json.Marshal(msgBody)
if err != nil {
return err
}
// Send a post request with your token
res, err := http.Post(fmt.Sprintf(SendMsgToUserUrl, BotToken), "application/json", bytes.NewBuffer(msgBytes))
if err != nil {
return err
}
if res.StatusCode != http.StatusOK {
buf := new(bytes.Buffer)
_, err := buf.ReadFrom(res.Body)
if err != nil {
return err
}
return errors.New("unexpected status: " + res.Status)
}
return nil
}
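For reference, the JSON body that sendResponseToUser posts to the sendMessage endpoint looks roughly like this (the chat_id and text values are illustrative):

```json
{"chat_id": 123456789, "text": "This is: tabby (78.12%)"}
```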
The client code that sends the image from the bot to the recognition server:
package recognition
import (
"io/ioutil"
"log"
"net/http"
)
const imgRecognitionAddress = "http://localhost:8080/"
type Client struct {
httpClient *http.Client
}
func New() *Client {
return &Client{
httpClient: &http.Client{},
}
}
func (c *Client) Recognize(downloadResponse *http.Response) string {
var msg string
method := "POST"
req, err := http.NewRequest(method, imgRecognitionAddress, downloadResponse.Body)
if err != nil {
log.Println("error from server recognition", err)
return msg
}
req.Header.Add("Content-Type", "image/png")
// do request to server recognition.
recognitionResponse, err := c.httpClient.Do(req)
if err != nil {
log.Println(err)
return msg
}
defer func() {
er := recognitionResponse.Body.Close()
if er != nil {
log.Println(er)
}
}()
recognitionResponseBody, err := ioutil.ReadAll(recognitionResponse.Body)
if err != nil {
log.Println("error on read response from server recognition", err)
return msg
}
msg = string(recognitionResponseBody)
return msg
}
By the way, for the bot to work, we must register our webhook handler. To do this, run:
ngrok http 3000
Immediately after executing this command, you will see a list of public addresses. The last one is an HTTPS address, which is the one we need. For example:
https://9068b6869da7.ngrok.io
Now we register our bot directly, telling Telegram where to send webhooks:
curl -F "url=https://9068b6869da7.ngrok.io" https://api.telegram.org/bot1695571234:AAEbodyrfOjto2xNE5yjpQpW2Gyq0Ob5X24D5/setWebhook
Now you can send a photo to your bot and receive information about what is depicted in it.
Thanks for your attention.