MapR Platform is an excellent choice for solving many of the problems associated with the huge, continuously growing datasets of today’s businesses. The distributed, highly efficient file system, along with the powerful yet simple and standard streaming API, are key components of the platform’s success. However, one of its most celebrated pieces is its distributed, non-SQL, highly available JSON database.

MapR-DB supports the HBase API for backward compatibility, but the newer OJAI API is the core of it. Let’s look at an example of a document we could store in MapR-DB.

This is pure JSON that MapR-DB can store. The document can be as complicated as we want. There is virtually no limit on the document size, the number of fields, or the depth of nested fields.

Documents are stored across the MapR cluster so that reading from and writing to a table happens in parallel, distributing the workload and achieving impressive performance numbers, as shown in some independent benchmarks. The images below show some of them.

MapR-DB can do considerably more operations per second than its rivals.

MapR-DB keeps latency low, constant, and predictable.

The entire comparison can be found here.

When reading or updating documents, MapR-DB knows which parts of a document need to be read or updated, and only those parts are actually touched. MapR-DB tries to manipulate documents, tables, and the underlying file system efficiently in order to keep performance at its best.

Querying MapR-DB

MapR-DB is a non-SQL database, so it does not support SQL natively. The OJAI API is the preferred way to interact with MapR-DB, and by using this API we can take advantage of every feature this database offers.

We can use any of the provided clients to run queries on MapR-DB. An example of creating a document using the Java API is the following.

As we can see, the API allows manipulating objects in a friendly way, as they represent JSON documents.
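To make the idea of a nested JSON document concrete, here is a minimal sketch in plain Python. It does not use the OJAI API or any MapR client. The document contents, the field names, and the helpers get_path and set_path are all hypothetical. The only details taken from MapR-DB itself are that the row key is stored in the _id field and that nested fields can be addressed with dotted paths such as address.city.

```python
import json

# Hypothetical document: every field name and value here is made up for illustration.
# MapR-DB JSON tables store the row key in the "_id" field.
document = {
    "_id": "user001",
    "name": {"first": "Jane", "last": "Doe"},
    "address": {"city": "Miami", "zip": "33101"},
    "interests": ["scala", "spark"],
}


def get_path(doc, path):
    """Return the value at a dotted field path such as 'address.city'."""
    current = doc
    for key in path.split("."):
        current = current[key]
    return current


def set_path(doc, path, value):
    """Set the value at a dotted field path, creating nested maps as needed."""
    *parents, leaf = path.split(".")
    current = doc
    for key in parents:
        current = current.setdefault(key, {})
    current[leaf] = value


# Update one nested field; the sibling field "address.zip" is left untouched.
set_path(document, "address.city", "Orlando")

# Serialize to the JSON text that a client would send to the database.
serialized = json.dumps(document, sort_keys=True)
```

This only models, on the client side, the idea of addressing a single nested field. How MapR-DB limits a read or an update to the affected parts of a stored document is internal to the database and is not shown here.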
Through the OJAI API, we can do all kinds of operations against MapR-DB, such as inserts, updates, etc… Basically, from any application that is able to use the OJAI API, we are able to do most of the work in MapR-DB.

However, we could ask ourselves, what about other types of tools that require different processing capabilities? Examples of these are BI tools doing aggregations such as counts, group-bys, sums, etc… On the other hand, we should also be able to quickly look at values in the database without writing applications, but is this possible in MapR-DB?

Let’s explore our options.

MapR DB Shell

MapR-DB offers a tool called dbshell that can be used to query the database using its native language.

Using the dbshell, we can explore what tables we have, query them in all possible ways, and more. Let’s see some examples.

Let’s start by listing the tables we have under a path.

Let’s insert some values into this table.

Now, let’s list the documents.

We can query by id.

Or we can use any other fields.

Notice how the query is done. This is the OJAI query language and API playing their roles. This is native to MapR-DB. Remember, it is not a SQL database.

As you could imagine, the dbshell is a nice way to get a taste of how MapR-DB works and to do quick and simple explorations. However, it is hard to think of it as the preferred tool for large and complex queries.

Let’s continue to explore the options we have and how to use them.

MapR-DB Connector for Apache Spark

MapR offers a connector for Apache Spark that can be used for large-scale data processing on top of MapR-DB. The connector can be used with the different Spark APIs, such as RDD[A], DStream[A], and DataFrame/DataSet[A].

To use the connector, we must first add the right dependencies to our Spark project. The following is a build.sbt file from the [Reactor](https://github.com/anicolaspp/reactor) project.

Now, we should be able to use the connector without problems.
The example above only shows a fragment of the app, but notice how the connector is used to load and save DataFrames from and to MapR-DB. The same can be done for the other Apache Spark abstractions mentioned before.

Using the MapR-DB connector for Apache Spark, we open up limitless possibilities, since we can combine the distributed nature of MapR-DB and Apache Spark to truly process data at scale.

Even though Apache Spark is one of the best tools we can have in our toolset, sometimes it is just not enough. We need to ask ourselves how users with no coding experience can use the powerful features of MapR-DB without going through the learning process of Spark which, honestly, is neither short nor easy.

Distributed Processing using Apache Drill

When we need SQL, we have Drill.

Using Apache Drill, we can query almost any dataset living in the MapR Platform, regardless of where it is stored, how it is formatted, or its size.

Interacting with Drill can be done through its different interfaces. Let’s start by using the Drill shell, since it offers a very simple, shell-based solution.

As we can see, we can query MapR-DB, which is a non-SQL database, using pure SQL through Apache Drill. The result, as expected, comes back as a table.

As you might suspect, queries of all kinds can be executed; aggregations are especially interesting.

Running queries like this on top of MapR-DB is mind-blowing. Drill knows exactly how to transform the SQL queries to the underlying MapR-DB query language.

It is important to notice that Drill also runs distributed on the MapR cluster, so the same principles of data distribution and high performance continue to apply here.

Other Apache Drill Interfaces

The shell is not the only interface Drill supports. We can also use Drill through the REST interface.

Also, Drill offers a Web interface for friendlier usage.

Accompanying these interfaces are the JDBC and ODBC interfaces.
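As a rough illustration of the REST interface, the following Python sketch builds (but does not send) a request for Drill’s query endpoint, which accepts a JSON body with queryType and query fields at /query.json. The host and port, the table path /user/mapr/tables/users, the column name, and the helper build_drill_request are assumptions made for this example. It also assumes the MapR-DB table is reachable through the dfs storage plugin.

```python
import json
import urllib.request


def build_drill_request(base_url, sql):
    """Build a POST request for Drill's REST query endpoint (hypothetical helper)."""
    payload = json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/query.json",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical aggregation over a MapR-DB JSON table addressed by its path.
sql = (
    "SELECT name, COUNT(*) AS total "
    "FROM dfs.`/user/mapr/tables/users` "
    "GROUP BY name"
)

# Assumed address of a Drill web server; adjust to the actual cluster.
request = build_drill_request("http://localhost:8047", sql)

# To run against a live Drill instance, send the request and parse the JSON reply:
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
```

The request is only constructed here so that the sketch runs without a cluster. The REST interface is just one route into Drill; the JDBC and ODBC interfaces serve clients that expect a database driver instead.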
The JDBC and ODBC interfaces are especially important for BI tools like Tableau, Microstrategy, and others, which use them to connect to and interact with Drill.

The same ideas we discussed before apply here. For example, Tableau could connect to Drill through JDBC, and Drill would run distributed queries on top of MapR-DB. This makes MapR-DB a very versatile and capable database.

Conclusions

MapR-DB is one of the most capable non-SQL options out there. It offers HBase and JSON capabilities under the same platform. It runs distributed on the MapR cluster, sharing most of the properties of the underlying platform (MapR-FS).

MapR-DB can be queried in many ways: the OJAI API for applications, dbshell for quick and simple interactions, Apache Spark for data processing at scale, and Apache Drill for SQL queries, data analytics, and BI tool integrations.

Regardless of the tool being used, MapR-DB keeps performance a priority by maintaining low latency and a high number of operations per second at any scale, which makes it a strong fit for next-generation workloads.

Other tools for MapR-DB are independently developed, for instance [_maprdbcls_](https://github.com/anicolaspp/maprdb-cleaner), which allows deleting documents (records) based on queries.

Interacting with MapR-DB
