A lot of data over the web or internally in an enterprise is available and accessible only through Rest APIs. Rest APIs do provide a secure and authenticated mechanism to access these services, but they are not very intuitive sometimes to query upon the data which is fetched through them. For example, this fetches the real-time stock data in the Nifty 50 index from (National Stock Exchange) India. It returns a JSON response with a list of symbols in the Nifty 50 index and their price and other values. url NSE Apologies if the stock market is something that is not of your interest, you can give it a quick read though, if curious. If this data would have been present in a database, then we can very easily write different queries like below and query at ease: , , , Nifty50 trdVol > ; Select open high low close From where 100 Imagine if all the data through Rest APIs would have been available and hosted like a database, how easy it would have been to query them or join them with other data tables. Another good thing with a database is that they also mostly provide a standard ODBC / JDBC driver, which can be used to connect through any language and used for analysis. So how do we convert Rest APIs into a database…? Apache Calcite This is an amazing open-source project from Apache which can be used to create a JDBC driver over Rest APIs, or for that matter, allows you to create a JDBC driver over any underlying data source (like Cassandra, ElasticSearch, Local File Folder, AWS S3, etc). Apache Calcite To begin with, this is a well-explained from Apache Calcite, which gives a walk-through of an example to run JDBC queries over data stored in local CSV files. article Once you follow the example and set up the same over the CSV files, the JDBC config ( file in Calcite terms) to use Apache Calcite JDBC driver looks like this: model.json If we look at the config, it has a key called which allows fetching the schema in your own custom way. factory In this case, it is defined as a custom class called which inherits and implements the interface. The interface has a method which creates and returns CsvSchemaFactory SchemaFactory create Schema . Schema internally defines the and its definitions. The complete CSV example implementation can be found in this . Table github project Going back to our example of NSE data - the NSE website provides data across multiple different categories, like: Broad Market Indices Sectoral Indices, etc and, within each category data is provided for multiple Indexes, ETFs or Bonds, like Nifty50, Nifty Auto, Nifty FMCG, etc. So, when we the NSE data, the categories can be the and could be the tables. jdbcify schema indexes / ETFs +-----------------------+-----------------+ | Schema Name | Table Name | +-----------------------+-----------------+ | Broad Market Indices | Nifty 50 | | | Nifty Next 50 | | Sectoral Indices | Nifty Auto | +-----------------------+-----------------+ For the implementation, we implement class, which has a method and return the Map of Table with their names. AbstractSchema getTableMap The in the above code snippet should return an implementation of the interface There are multiple types of tables supported and provided in Apache Calcite, as: createTable Table . ScannableTable QueryableTable , etc.. StreamableTable The above serves the purpose as their name suggests. For our example, both Scannable or Streamable table can be created, the latter would be an interesting implementation as the data emitted by NSE APIs ticks and provides the latest prices at a particular frequency. For now and for simplicity we would create a ScannableTable. ScannableTable implementation has to implement a method called which should return: scan, Enumerable<java.lang.Object[]> The returned enumerable should emit each row as an array of values, where values should be in the order of the fields defined in the table column definition. For example, if the table definition contained below fields defined in this same order: +----------------+--------------+ | Column Name | Column Type | +----------------+--------------+ | symbol | String | | open | Float | | high | Float | +----------------+--------------+ then the method should return rows as: scan | Infosys | 123.24 | 128.65 | | Britannia | 1232.4 | 1244.01 | wherein the first row above, “Infosys” is the value for “symbol, 123.24 for “open” and 128.65 for “high” Table definition for the can be specified and returned by the method: Table getRowType And that is it!!.. that covers all missing links and places where to write some more code to fetch data from NSE and jdbcify it. The complete source code of the example can be found at this project. github You can experiment with the code using Calcite JDBC driver and . Apache Calcite internally uses for the JDBC driver implementation. sqlline utility Apache Avatica This is how the output looks like, when queried using command in !tables sqlline: You can run other SQL queries over Tables, the plan for the same is generated and can be viewed if logs are enabled and setup at DEBUG level in log4j.properties file. With that simple implementation, any rest API hosted can be used as a database, which then allows data exploration and query using a simple old SQL construct. Also published on: https://ashishjain-ash.medium.com/how-to-create-a-jdbc-driver-over-rest-apis-1571ab156e6a