The modern world demands fast and cheap delivery of value to the end user. That's why IT companies test dozens of hypotheses per week. For fast experiments, we usually prefer a ready-made solution to a self-developed one, so we constantly need to integrate with external services via their APIs. Today I'd like to share some best practices for these integrations.
Timeouts are a crucial part of fault tolerance. Set one on every external call; otherwise, if an external service hangs, your service will hang along with it. For example, in Go your code might look like this:
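import (
	"fmt"
	"net/http"
	"time"
)
type Service struct {
	httpClient *http.Client
}
func NewService() *Service {
	httpClient := &http.Client{
		// set up your own timeout; it covers connecting, redirects and reading the response body
		Timeout: 5 * time.Second,
	}
	return &Service{
		httpClient: httpClient,
	}
}
func (s *Service) CallAPI(req *http.Request) error {
	res, err := s.httpClient.Do(req)
	if err != nil {
		return fmt.Errorf("do request: %w", err) // a timeout also ends up here
	}
	defer res.Body.Close()
	...
}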
Any external service (even Google or Amazon) can go down. Plan fallback logic for 5xx and other unexpected responses: for instance, return a default response object or run some fallback job.
import (
	"fmt"
	"io"
	"log"
	"net/http"
	"time"
)
type Service struct {
	httpClient *http.Client
}
type CallResponse struct {
	Payload string
}
func NewService() *Service {
	httpClient := &http.Client{
		Timeout: 5 * time.Second,
	}
	return &Service{
		httpClient: httpClient,
	}
}
func (s *Service) CallAPI(req *http.Request) (CallResponse, error) {
	res, err := s.httpClient.Do(req)
	if err != nil {
		return CallResponse{}, fmt.Errorf("do request: %w", err)
	}
	defer res.Body.Close()
	// io.ReadAll replaces the deprecated ioutil.ReadAll (Go 1.16+)
	content, err := io.ReadAll(res.Body)
	if err != nil {
		return CallResponse{}, fmt.Errorf("read response body: %w", err)
	}
	// gracefully handle server-side failures (5xx) with a default response
	if res.StatusCode >= 500 {
		log.Printf("external service returned bad response. Code: %d. Content: %s", res.StatusCode, string(content))
		return CallResponse{Payload: "default"}, nil
	}
	...
}
Every extra API call adds overhead for both you and the external system. Pore over the API docs to find batch methods that fit your needs.
For example, suppose one call that creates a single item takes 20 ms. Creating 10 items sequentially would then take 200 ms (in practice even longer, because external services usually start throttling requests under load). With a batch method, you can create all 10 items in a single request that takes, say, 50 ms.
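Batch methods usually cap how many items one request may carry. Assuming a hypothetical cap of 10 items per request, a small generic helper (the name `chunk` is illustrative) can split the work so that 25 items become 3 requests instead of 25:

```go
package main

import "fmt"

// chunk splits items into consecutive batches of at most size elements.
// Each batch can then be sent with one call to a batch API method.
func chunk[T any](items []T, size int) [][]T {
	if size <= 0 {
		panic("chunk: size must be positive")
	}
	var batches [][]T
	for start := 0; start < len(items); start += size {
		end := start + size
		if end > len(items) {
			end = len(items)
		}
		batches = append(batches, items[start:end])
	}
	return batches
}

func main() {
	items := make([]int, 25)
	for i := range items {
		items[i] = i
	}
	batches := chunk(items, 10) // hypothetical limit of 10 items per batch request
	fmt.Println(len(batches), len(batches[2])) // 3 5
}
```

The batches are sub-slices sharing the original backing array, so no items are copied.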
The more requests you send, the more noticeable the difference becomes, and it can save you a tremendous amount of execution time. If there is no batch method, try to send your requests in parallel.
Most services enforce API limits. Investigate them and calculate whether your expected request volume fits within them. In Go, the go.uber.org/ratelimit library can help you control the rate of API calls:
import (
	"net/http"
	"time"

	"go.uber.org/ratelimit"
)
type Service struct {
	limiter    ratelimit.Limiter
	httpClient *http.Client
}
func NewService() *Service {
	httpClient := &http.Client{
		Timeout: 5 * time.Second,
	}
	return &Service{
		httpClient: httpClient,
		limiter:    ratelimit.New(10), // 10 is the max RPS that the external API can handle
	}
}
func (s *Service) CallAPI(req *http.Request) error {
	s.limiter.Take() // blocks until the next call fits within the configured rate
	res, err := s.httpClient.Do(req)
	if err != nil {
		...
	}
	...
}
Even when an external service returns successful responses, it can sometimes suffer from performance problems. To catch such cases, collect metrics and set up alerts on your side so that you notice the problem and can react quickly.
In my team, we prefer widely used tools such as Prometheus and Grafana:
import (
	"net/http"
	"time"

	"github.com/prometheus/client_golang/prometheus"
)
var (
	// ExternalServiceHTTPCallHistogram observes http call duration in seconds
	ExternalServiceHTTPCallHistogram = prometheus.NewHistogramVec(
		prometheus.HistogramOpts{
			Namespace: "namespace",
			Subsystem: "subsystem",
			Name:      "external_service_call_duration_in_seconds",
			Help:      "http call duration in seconds to an external service",
		}, []string{"path", "method"},
	)
)
func init() {
	// register the histogram so Prometheus can scrape it from the default registry
	prometheus.MustRegister(ExternalServiceHTTPCallHistogram)
}
type Service struct {
	httpClient *http.Client
}
func NewService() *Service {
	httpClient := &http.Client{
		Timeout: 5 * time.Second,
	}
	return &Service{
		httpClient: httpClient,
	}
}
func (s *Service) CallAPI(req *http.Request) error {
	// save the request starting time point
	start := time.Now()
	// do the API call
	res, err := s.httpClient.Do(req)
	// calculate how much time the request takes
	spentSeconds := time.Since(start).Seconds()
	// send the measurement to Prometheus
	ExternalServiceHTTPCallHistogram.WithLabelValues(req.URL.Path, req.Method).Observe(spentSeconds)
	...
}
With this data in Prometheus, we can set up Grafana alerts that fire when an external service goes down or its responses take too long.