The following is an example project that tracks each request to our RESTful API, then analyzes and visualizes the collected data in Google's Data Studio reports.
This is a straightforward way of tracking API calls while keeping the flexibility to answer different questions from the collected data. A link to the GitHub project is at the bottom of this article.
We’ll be using the following Google Services:
App Engine Flex gives us a lot of advantages over manual server management. Our instances autoscale, so we don't have to handle increased traffic ourselves. Deployment management is easy as pie. TLS management is automated by Google, so certificate renewal is also handled for us if we want to use a custom domain secured by TLS. That said, for our example this could be any Java application server or hosting provider.
BigQuery is going to be our data warehouse for analytics.
We're going to use Data Studio, which connects to BigQuery, to visualize the analytics.
All we need is a filter that intercepts each request and queues it for BigQuery ingestion. But first, let's prepare the data structure for the events we will be ingesting.
The Event object stores the name of the event, which is our API request path (in our example there are two API services: /api/ping and /api/greeting), the current system time of the event, the userId (random in the GitHub example), the HTTP method (GET/POST/PUT/DELETE) and the geolocation from which the request originates. You can easily extend the data you'd like to capture simply by adding fields to the Event object.
public class Event {
    public String name;
    public LocalDateTime time;
    public String userId;
    public String method;
    public String country;
    public String city;
    public String region;
}
Country, City and Region are extracted from Google's headers, which are added by the load balancer that sits in front of our App Engine Flex instances. Check the filter code below. You can also read more about it here.
With a simple filter we intercept each request to our API. We ignore all other requests, since we're only analyzing the RESTful API.
@WebFilter(filterName = "RequestFilter", urlPatterns = "/api/*")
public class RequestFilter implements Filter {

    @Override
    public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain)
            throws IOException, ServletException {
        HttpServletRequest req = (HttpServletRequest) request;
        // catching all requests to api and logging
        Event event = new Event();
        event.name = req.getRequestURI();
        event.time = LocalDateTime.now();
        event.method = req.getMethod();
        event.userId = "user_" + (new Random().nextInt(1000) + 1);
        event = augmentEventWithHeaders(event, req);
        BigQueryUtil.eventQueue.offer(event);
        chain.doFilter(request, response);
    }

    private Event augmentEventWithHeaders(Event event, HttpServletRequest request) {
        event.country = request.getHeader("X-AppEngine-Country");
        event.city = request.getHeader("X-AppEngine-City");
        event.region = request.getHeader("X-AppEngine-Region");
        return event;
    }
}
Before we dive into the utility methods, let's think about scalability and reliability. In the example we chose the Producer-Consumer pattern: on each request we offer the event to a bounded, thread-safe BlockingQueue with a capacity of 10,000 events. Learn more about BlockingQueue here.
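To see why a bounded queue protects the request path, here is a minimal, self-contained sketch (not from the project) showing that offer() never blocks: when the queue is full it simply returns false, so under extreme load events are dropped instead of stalling API responses.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class OfferDemo {
    public static void main(String[] args) {
        // Tiny capacity to demonstrate the bounded behavior; the project uses 10000
        BlockingQueue<String> queue = new LinkedBlockingQueue<>(2);
        System.out.println(queue.offer("event-1")); // true
        System.out.println(queue.offer("event-2")); // true
        // Queue is full: offer() returns false immediately instead of blocking
        System.out.println(queue.offer("event-3")); // false
    }
}
```

Dropping events on overflow is a deliberate trade-off here: losing an analytics event is cheaper than blocking a user-facing request thread.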
Now we need a consumer to push those queued events into BigQuery. We create a separate thread and start it on server startup by implementing a @WebListener (a ServletContextListener). That way we ingest events independently of the REST API functionality. For this example we use only one consumer, so we define an ExecutorService with a fixed thread pool of 1.
private static ExecutorService queueService = Executors.newFixedThreadPool(1);
public static final BlockingQueue<Event> eventQueue = new LinkedBlockingQueue<>(10000);

public static void startQueueListener() {
    Runnable startTask = () -> {
        try {
            while (true) {
                int numberOfIngested = 0;
                if (eventQueue.size() > 0) {
                    List<Event> events = new ArrayList<>();
                    eventQueue.drainTo(events, 50);
                    insertEventsToBigQuery(events);
                    numberOfIngested = eventQueue.size();
                }
                if (numberOfIngested < 50) {
                    Thread.sleep(5000);
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    };
    queueService.submit(startTask);
}
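As a framework-free illustration of the same producer/consumer handoff (the real project starts this loop from the ServletContextListener and calls insertEventsToBigQuery, which is replaced here by a counter), the lifecycle can be sketched like this:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

public class QueueConsumerDemo {
    public static void main(String[] args) throws Exception {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>(10000);
        AtomicInteger ingested = new AtomicInteger();
        ExecutorService queueService = Executors.newFixedThreadPool(1);

        // Same drain-in-batches loop the listener starts on server startup
        queueService.submit(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                List<String> batch = new ArrayList<>();
                queue.drainTo(batch, 50);
                ingested.addAndGet(batch.size()); // stand-in for insertEventsToBigQuery(batch)
                if (batch.size() < 50) {
                    try { Thread.sleep(100); } catch (InterruptedException e) { return; }
                }
            }
        });

        for (int i = 0; i < 120; i++) queue.offer("event-" + i);
        Thread.sleep(500);
        queueService.shutdownNow(); // contextDestroyed would do this on server shutdown
        System.out.println(ingested.get()); // 120
    }
}
```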
There are a couple of things we need to be aware of:
For production systems with heavy loads, where we want ingestion of our events to be as reliable as possible, this is not a good solution. When we need a more robust approach, we should look into separating our producers and consumers. Let's see what we could gain by using something like Google Cloud Pub/Sub:
All of the interactions with BigQuery are handled in BigQueryUtil class.
BigQueryUtil also implements methods that check whether the table we need exists, extract the schema from our Event object and create the necessary table structure in BigQuery. All of this happens on server start, so by the time the first request reaches our REST API methods, everything is ready.
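The article doesn't show that code, but one plausible way to derive column names and types from the Event object is plain reflection over its public fields. This is only a sketch (with a minimal copy of the Event class); the actual BigQueryUtil implementation may differ.

```java
import java.lang.reflect.Field;
import java.time.LocalDateTime;

public class SchemaFromEvent {
    // Minimal copy of the project's Event class for this sketch
    public static class Event {
        public String name;
        public LocalDateTime time;
        public String userId;
        public String method;
        public String country;
        public String city;
        public String region;
    }

    public static void main(String[] args) {
        // Walk the public fields, mapping each to a BigQuery column type;
        // adding a field to Event automatically adds a column to the schema
        for (Field field : Event.class.getFields()) {
            String type = field.getType() == LocalDateTime.class ? "TIMESTAMP" : "STRING";
            System.out.println(field.getName() + " " + type);
        }
    }
}
```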
After we have a table ready in BigQuery we can start sending our events:
private static void streamRows(final BigQuery bigquery, final String tableId,
        final String suffix, final List<Map<String, Object>> rows) throws IOException {
    TableId table = TableId.of(BIGQUERY_DATASET, tableId);
    Builder insertRequestBuilder = InsertAllRequest.newBuilder(table);
    insertRequestBuilder.setTemplateSuffix(suffix);
    for (Map<String, Object> row : rows) {
        insertRequestBuilder.addRow(row);
    }
    InsertAllRequest insertAllRows = insertRequestBuilder.build();
    InsertAllResponse insertResponse = bigquery.insertAll(insertAllRows);
}
We use a template suffix in this example. This simply creates multiple tables based on our suffix, in our case YEAR_MONTH. The suffix can be anything we want; we just chose to format table names as eventsYEAR_MONTH (for example events2017_DECEMBER, followed by events2018_JANUARY the next month, and so on). Read more about it here.
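As a small illustration (assuming the suffix is derived from the current date; the project's exact formatting code isn't shown), the YEAR_MONTH suffix can be produced with java.time:

```java
import java.time.LocalDate;

public class SuffixDemo {
    // Builds the YEAR_MONTH string passed to setTemplateSuffix
    static String tableSuffix(LocalDate date) {
        // Month is an enum, so string concatenation yields e.g. "DECEMBER"
        return date.getYear() + "_" + date.getMonth();
    }

    public static void main(String[] args) {
        System.out.println("events" + tableSuffix(LocalDate.of(2017, 12, 25))); // events2017_DECEMBER
    }
}
```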
We can now deploy our example and start calling our REST API methods:
mvn appengine:deploy
Check the terminal for your endpoint URL and replace http://localhost:8080 with your own. Let's make a couple of calls to our API:
while true; do curl -H "Content-Type: application/json" http://localhost:8080/api/ping ;done
Press Ctrl-C to stop.
Let's do a couple more:
while true; do curl -H "Content-Type: application/json" -X POST -d '{"time":"2017-12-25T14:26:50.466"}' http://localhost:8080/api/ping ;done
while true; do curl -H "Content-Type: application/json" http://localhost:8080/api/greeting ;done
while true; do curl -H "Content-Type: application/json" -X POST -d '{"userName":"Igor"}' http://localhost:8080/api/greeting ;done
In the BigQuery console we can preview our events with a simple SQL query:
SELECT * FROM [myproject:stream.events2017_DECEMBER] order by time desc
Our events in BigQuery table
Let's connect our table to Google Data Studio and try to visualize how many calls were made to each of our REST API methods.
Google Data Studio
First we need to create a new custom data source for BigQuery. Go to Data Sources and find BigQuery in the list, then select Custom Query. An input field opens up. Insert the following SQL query (replace 'myproject' with your project id and events2017_DECEMBER with the desired table):
SELECT concat(method, ' ', name) as name_method, count(*) as request_count FROM [myproject:stream.events2017_DECEMBER] group by name_method
Result of the SQL query if we ran it in BigQuery Query editor
The SQL query counts all requests made to our API and groups the results by endpoint and method (either GET or POST in our case).
Click Connect. You should see our custom BigQuery connection.
BigQuery Custom connection from Data Studio
Click Create Report and in the menu select Insert -> Pie Chart.
Now you can click the View button and voilà, we have our custom analytics for the REST API. There is also a refresh button that fires a new query against our BigQuery table, so we can monitor the calls in near real time.
Analyzing request counts to each REST method
Based on the collected data you could see the number of requests from each user, or requests grouped by country, city, or basically anything the SQL language allows you to do.
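For instance, a per-country breakdown could use a query in the same legacy SQL dialect as the earlier examples (same hypothetical project and table names):

```sql
SELECT country, count(*) as request_count
FROM [myproject:stream.events2017_DECEMBER]
GROUP BY country
ORDER BY request_count DESC
```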
How far you can take this depends on your SQL skills and your use case. If you're tracking the number of calls to each API endpoint, that is probably all you'll need. You can go pretty far with SQL querying alone.
If your goal is to see how many users clicked a specific button in an email you've sent them, and what the conversion rate was in terms of new subscribers or purchases from that specific button, then this is probably the wrong approach.
In case you are looking for a solution that gives you more insight into customer behavior from email marketing campaigns, you may want to have a look at SixthMass (still in development; if you'd like to learn more, leave your email below).
SixthMass Customer Journeys Retention Analysis
Example GitHub project:
igorrendulic/GoogleFlexBigQuery: Analytics for RESTful interface via Google Appengine Flex, BigQuery and Datastudio (github.com)