The following is an example project on how to track each request to our RESTful API, analyze and visualize the data in Googles DataStudio Reports. This solution is very straight forward way of tracking API calls while having the ability to have different questions answered from our collected data. GitHub link is attached on the bottom of this article. We’ll be using the following Google Services: for Java (hosting our REST API) Google Appengine Flex (storing request events) Google BigQuery (visualization) Google DataStudio Appengine Flex gives as a lot of advantages over manual server management. We can autoscale our instances and not handle increased traffic. Deployment management is easy as pie. TLS management is automated by google so renewing certificates is also managed for us if we want to use custom domain secured by TLS. But for our example this could by any Java Application Server or host provider. BigQuery is going to be our data warehouse for analytics. We’re going to use DataStudio which connects to BigQuery to visualize the analytics. Code — Model All we need is a filter that would intercept each request and queue it for BigQuery ingestion. But first lets prepare the data structure for events we will be ingesting. Event object stores name of the event which is our API request path. In our example there are 2 API services: and _/api/greeting, c_urrent system time of the event, userId (random in the GitHub example), method and geo location from which request originates. You can easily extend the data you’d like to capture simply by adding fields to the Event object. /api/ping GET/POST/PUT/DELETE public class Event {public String name;public LocalDateTime time;public String userId;public String method;public String country;public String city;public String region;} Country, City and Region are extracted from googles headers which are added by load balancer which sits in front of our Appengine Flex instances. Check the filter code bellow. You can also Read more about it here. Code — Filter With a simple filter we intercept each request to our API. We ignore all other requests since we’re trying to analyze only RESTful API. (filterName = "RequestFilter", urlPatterns="/api/*")public class RequestFilter implements Filter { @WebFilter public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain)throws IOException, ServletException { @Override HttpServletRequest req = (HttpServletRequest)request;// catching all requests to api and loggingEvent event = new Event();event.name = req.getRequestURI();event.time = LocalDateTime.now();event.method = req.getMethod();event.userId = "user_" + (new Random().nextInt(1000) + 1); event = augmentEventWithHeaders(event, req); BigQueryUtil.eventQueue.offer(event); chain.doFilter(request, response);} private Event augmentEventWithHeaders(Event event, HttpServletRequest request) { event.country = request.getHeader("X-AppEngine-Country");event.city = request.getHeader("X-AppEngine-City");event.region = request.getHeader("X-AppEngine-Region");return event;} } Before we dive into the utility methods lets just think about scalability and reliability. In the example we chose to use Producer-Consumer pattern. On each request we insert event in our bounded thread-safe BlockingQueue with capacity of 10000 events. . Learn more about BlockingQueue here Now we need a consumer to push those queued events into BigQuery. We create separate thread and start it on server start by implementing a WebListener — ServletContextListener. That way we ingest events independent from REST API functionality. For this example we use only 1 consumer so we define ExecutorService with a fixed thread pool of 1. private static ExecutorService queueService = Executors.newFixedThreadPool(1);public static final BlockingQueue<Event> eventQueue = new LinkedBlockingQueue<>(10000); public static void startQueueListener() {Runnable startTask = () -> {try { while (true) {int numberOfIngested = 0;if (eventQueue.size() > 0) {List<Event> events = new ArrayList<>();eventQueue.drainTo(events, 50); insertEventsToBigQuery(events);  
   
   numberOfIngested = eventQueue.size();  
}  
if (numberOfIngested < 50) {   
 Thread.sleep(5000);  
} }} catch (Exception e) {e.printStackTrace();}};queueService.submit(startTask);} Disadvantages of this approach There is a couple of things we need to be aware of: What happens if the queue is full? Are we ok with dropping the events? How do we handle scenario when BigQuery is not available (this can and probably will happen in some point in time) What if our consumer dies due to a bug or something else? How do we monitor it, restart it? How many messages can we afford to loose? Is one consumer enough if load surges? For production systems with heavy loads where we want ingestion of our events to be as precise as possible this is not a good solution. When we need more robust approach we would want to look into separating our producers and consumers. Lets see what we could gain by using something like : Google Cloud Pub/Sub Loose coupling — we can make our consumers run on different computer instances, separating the BigQuery ingestion logic from our REST API service code. This also makes our design cleaner. Scalability — In cases of heavy traffic we could scale number of our consumers to match the load made by producers without touching the core REST API. There is an option with Google Pub/Sub to have queue-based autoscaling. . Read more about it here Easier testing and monitoring of our Pub/Sub Reliability — In case there is a problem with our consumer or BigQuery service, our events will persist on Googles Pub/Sub for about 7 days from the day they were published. That would give our consumers possibility to catch up with ingestion after all services are operational again. BigQuery table scheme and event ingestion All of the interactions with BigQuery are handled in . BigQueryUtil class BigQueryUtil also implements methods for checking if the table we need exists, extracts the schema from our Event object and creates necessary table structure in BigQuery. All of this happens on server start so when the first request reaches our REST API methods we’re all ready. After we have a table ready in BigQuery we can start sending our events: private static void streamRows(final BigQuery bigquery, final String tableId, final String suffix, final List<Map<String, Object>> rows) throws IOException { TableId table = TableId.of(BIGQUERY_DATASET, tableId);Builder insertRequestBuilder = InsertAllRequest.newBuilder(table);insertRequestBuilder.setTemplateSuffix(suffix); for (Map<String, Object> row : rows) {insertRequestBuilder.addRow(row);} InsertAllRequest insertallRows = insertRequestBuilder.build(); InsertAllResponse insertResponse =   bigquery.insertAll(insertallRows);} We use a template suffix in this example. This simply creates multiple tables based on our suffix. In our case this is YEAR_MONTH. Suffix can be anything we want. We just chose to have it formatted as eventsYEAR_MONTH (example: events2017_DECEMBER, next month would be as events2018_JANUARY and so on). . Read more about it here We can now deploy our example and start calling our REST API methods: mvn appengine:deploy Check in the terminal for your endpoint URL and replace http://localhost:8080 with your own. Lets make a couple of calls to our API: while true; do curl -H "Content-Type: application/json" http://localhost:8080/api/ping ;done Command-C to stop. Lets do a couple more: while true; do curl -H "Content-Type: application/json" -X POST -d '{"time":"2017-12-25T14:26:50.466"}' [http://localhost:8080/api/ping ;](http://localhost:8080/api/ping;)done while true; do curl -H "Content-Type: application/json" http://localhost:8080/api/greeting ;done while true; do curl -H "Content-Type: application/json" -X POST -d '{"userName":"Igor"}' http://localhost:8080/api/greeting ;done Visualization In we can preview our events with simple SQL query: BigQuery console SELECT * FROM [myproject:stream.events2017_DECEMBER] order by time desc Our events in BigQuery table Lets connect our table to and try to visualize our how many calls were made to each of our REST APIs methods. Google Data Studio Google Data Studio First we need to create a new Custom DataSource to BigQuery. Go to and find in the list. Then select . The input field opens up. Insert following SQL query (replace ‘myproject’ with your project id and replace events2017_DECEMBER with the desired table): Data Sources BigQuery Custom Query SELECT concat(method, ' ', name) as name_method, count(*) as request_count FROM [myproject:stream.events2017_DECEMBER] group by name_method Result of the SQL query if we ran it in BigQuery Query editor SQL query will count all request made to our API and group the results by endpoint and method of call (either GET or POST in our case). Click . You should see our custom BigQuery connection. Connect BigQuery Custom connection from Data Studio Click and in the menu select . Create Report Insert -> Pie Chart Now you can click on View button and Voila we have our custom analytics for REST API. There is also button which fires a new query to our BigQuery data table so we can monitor the calls in almost real-time. refresh data Analyzing request counts to each REST method Based on collected data you could see number of request from each user or requests grouped by country, city or basically anything SQL language allows you to do. What else can we do with those metrics? Depends on your SQL skills and use case. If you’re tracking number of calls to each endpoint API that is probably all you’ll need. You could go pretty far with SQL querying. If your goal is to see how many users clicked on a specific button from an email you’ve sent them and what was the conversion rate in terms of new subscribers or purchases from that specific email button then you’re probably looking at the wrong approach here. Just in case you are looking for a solution that would give you more insight into customer behavior from email marketing campaigns then maybe you should have a look at (still in development but if you’d like to learn more leave your email bellow). SixthMass SixthMass Customer Journeys Retention Analysis Example GitHub project: _GoogleFlexBigQuery - Analytics for RESTful interface via Google Appengine Flex, BigQuery and Datastudio_github.com igorrendulic/GoogleFlexBigQuery

Chain

Analytics for RESTful interfaces

About Author

Comments

TOPICS

THIS ARTICLE WAS FEATURED IN

Related Stories

Untitled Story

AngularJS ng-routing links without hash bang on Appengine with Python

10 Great Articles On Data Science And Programming!

10 FinTech Trends in 2021 [Part II]

10 Financial Predictions for 2021 to Protect Your Wealth

13 API Metrics That Every Platform Team Should be Tracking

AngularJS ng-routing links without hash bang on Appengine with Python

10 Great Articles On Data Science And Programming!

10 FinTech Trends in 2021 [Part II]

10 Financial Predictions for 2021 to Protect Your Wealth

13 API Metrics That Every Platform Team Should be Tracking

Light-Mode

Classic

Newspaper

Dark-Mode

Neon Noir

Minty

HN StartUps