Last summer MapD open-sourced their technology and made it available to everybody. At that moment my colleagues at Dimebox and I were working on a POV for a potentially big client that we had to impress. Our data analytics capabilities were already advanced, but couldn’t handle a lot of data because everything ran client-side. We decided to hop on the MapD train, and the results thus far are pretty amazing.

MapD is a GPU database platform. It consists of a few standalone packages that work together. These are:

MapD Core: an in-memory, column-store, SQL relational database that was designed from the ground up to run on GPUs.
MapD Charting: dimensional charting built to work natively with crossfilter, rendered using d3.js.
MapD Crossfilter: a library for exploring large multivariate datasets in the browser. Based on JavaScript crossfilter.
MapD Connector: a JavaScript library for connecting to a MapD GPU database and running queries.

Combine them all together and you have a platform that can almost instantly visualize billions of data records.

The GPU database

The big difference between the MapD platform and a lot of other data visualization platforms is that MapD runs on a GPU database. GPU databases offer significant improvements over conventional CPU databases when performing repetitive operations on large amounts of data. This is because a GPU can have thousands of cores, while a CPU usually has just a few. A GPU can therefore handle a lot of simultaneous streams, while a CPU can handle only a few.

Mark Litwintschik conducted a benchmark with a 1.1 billion record taxi dataset.
The results of Mark Litwintschik’s benchmark are as follows:

[Image from https://www.mapd.com]

Query 1: SELECT cab_type, count(*) FROM trips GROUP BY cab_type;
Query 2: SELECT passenger_count, avg(total_amount) FROM trips GROUP BY passenger_count;
Query 3: SELECT passenger_count, extract(year from pickup_datetime) AS pickup_year, count(*) FROM trips GROUP BY passenger_count, pickup_year;
Query 4: SELECT passenger_count, extract(year from pickup_datetime) AS pickup_year, cast(trip_distance as int) AS distance, count(*) AS the_count FROM trips GROUP BY passenger_count, pickup_year, distance ORDER BY pickup_year, the_count desc;

System configurations:

MapD: 1 machine (16 cores, 512 GB RAM, 2 x 1TB SSD, 8 Nvidia Pascal Titan X GPUs)
Redshift: 6 machines (36 cores, 244 GB RAM, 16TB HDD, AWS ds2.8xlarge)
Presto: 50 machines (4 cores, 15 GB RAM, 100GB SSD, GCP n1-standard-4)
Spark: 11 machines (4 cores, 15 GB RAM, 2 x 40GB storage, AWS m3.xlarge)

As you can see, MapD runs on only one machine, but is around 10 to 100 times faster than the other options. Pretty awesome, isn’t it?

Visualizing the data

For me as a front-end developer, this is obviously the most exciting part. As mentioned earlier, we at Dimebox used a client-side solution first. This solution was a combination of dc.js and crossfilter. These libraries are pretty awesome, but since they run client-side, the amount of data you can display is limited. With MapD this problem is solved. With MapD Charting and MapD Crossfilter you have the same libraries, but with the ability to display billions of data records. The possibilities are endless. Here are some examples:

[Image: Because of crossfilter all graphs are linked]
[Image: You can also crossfilter while drawing on a map]

The map examples are especially awesome, but you will probably end up with some “normal” graphs more often.
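To make the “all graphs are linked” idea a bit more concrete, here is a minimal sketch of the principle in TypeScript. It is not the crossfilter or MapD Crossfilter API; the `LinkedCharts` class, the `Trip` fields and the chart names are all made up for illustration. The assumption is simply that each chart owns one filter, and that a chart’s counts are computed with every filter applied except its own.

```typescript
// Hypothetical trip record; the field names are made up for illustration.
interface Trip {
  cabType: string;
  passengerCount: number;
}

type Predicate<T> = (row: T) => boolean;

// A tiny stand-in for the crossfilter idea (NOT the crossfilter API):
// each chart owns one filter, and a chart's counts respect the filters
// of all other charts but ignore its own.
class LinkedCharts<T> {
  private rows: T[];
  private filters: Record<string, Predicate<T>> = {};

  constructor(rows: T[]) {
    this.rows = rows;
  }

  // Called when the user selects something in the chart named `chart`.
  // Passing null clears that chart's filter.
  setFilter(chart: string, predicate: Predicate<T> | null): void {
    if (predicate === null) {
      delete this.filters[chart];
    } else {
      this.filters[chart] = predicate;
    }
  }

  // Row counts per key for `chart`, with the other charts' filters applied.
  counts(chart: string, keyOf: (row: T) => string): Record<string, number> {
    const others = Object.keys(this.filters)
      .filter((name) => name !== chart)
      .map((name) => this.filters[name]);
    const result: Record<string, number> = {};
    for (const row of this.rows) {
      if (!others.every((accepts) => accepts(row))) continue;
      const key = keyOf(row);
      result[key] = (result[key] || 0) + 1;
    }
    return result;
  }
}

const trips: Trip[] = [
  { cabType: "yellow", passengerCount: 1 },
  { cabType: "yellow", passengerCount: 2 },
  { cabType: "green", passengerCount: 1 },
  { cabType: "green", passengerCount: 1 },
];

const charts = new LinkedCharts(trips);
// Selecting "1 passenger" in the passengers chart...
charts.setFilter("passengers", (trip) => trip.passengerCount === 1);
// ...changes what the cab type chart shows.
const byCabType = charts.counts("cabType", (trip) => trip.cabType);
console.log(byCabType); // { yellow: 1, green: 2 }
```

This sketch loops over every row in the browser, which is exactly what stops scaling at some point. As far as I understand it, the MapD variant keeps the same mental model but lets the database do the counting.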
The charts that are supported are:

Bar chart
Bubble chart
Row chart
Pie chart
Line chart
Count chart
Number chart
Geochoropleth chart

Some charts still have some issues; the MapD team is working on improving those and adding new ones. Nevertheless, combine the above charts and you can already make some pretty powerful dashboards. I have also created an example dashboard with all those graphs. You can find it on my GitHub profile: https://github.com/luukgruijs/mapd-examples. It is also a nice reference if you want to get started with any of the above graphs.

This all looks very promising

Yes it does, but there are also a few points that can certainly be improved or should be addressed:

First of all, the documentation is not very rich. A lot of the graphs have no examples, so it’s a bit of a shot in the dark if you’re new to dc.js. There are also no real written guidelines yet on how to, for example, leverage MapD in your existing API. You can of course ask yourself whether it’s their job to provide this, but I think it could help with bigger adoption and thus more open-source contributions. Luckily there is https://community.mapd.com/, where you can ask questions and usually get quality responses in a decent timeframe.

Second, the database does not support UPDATE and DELETE queries yet. They say that they are working on this, though. For now this means that you either have to wipe the entire database and re-insert new data, or work with partly duplicated data.

Third, by default MapD is vulnerable to SQL injection, since queries are sent from the browser to the server. Anyone can intercept the requests and extend or change the query in whatever way they like. You need to create some logic on your server to fix this and prevent bad things from happening.

Fourth, MapD has not published their packages on NPM yet.
You can of course still get them directly from their GitHub, but an NPM package would make them a lot easier to install in existing projects.

Last but not least, GPU instances are relatively expensive. While this of course is not really MapD’s problem, it’s worth mentioning. If you for example have multiple clients and need to run multiple GPU instances, things can get costly quite quickly. The cheapest GPU instance on Amazon costs 700 dollars a month at the time of writing. While you always have to put costs like this in perspective, let’s just say you probably can’t use MapD for a fun data-rich hobby project.

Conclusion

To me MapD is certainly one of the most exciting technologies out there right now. But it’s not for everyone, yet. To use MapD in your existing product you need some knowledge of d3, dc and crossfilter in the front-end. You also need enough back-end knowledge to make everything safe and polished to your needs. I hope the project receives more contributions over the coming months. I already started with some contributions myself in the MapD Charting project and am planning to do more. Exciting times!

Thanks for reading. Any feedback? Let me know.

Also check my other articles:

https://hackernoon.com/validating-reactive-forms-with-default-and-custom-form-field-validators-in-angular-5586dc51c4ae
https://hackernoon.com/manage-your-observable-subscriptions-in-angular-with-help-of-rx-js-f574b590a5cb
https://hackernoon.com/understanding-creating-and-subscribing-to-observables-in-angular-426dbf0b04a3
https://hackernoon.com/managing-large-s-css-projects-using-the-inverted-triangle-architecture-3c03e4b1e6df
https://hackernoon.com/understanding-map-filter-and-reduce-in-javascript-5df1c7eee464

Follow me on Medium or Twitter and let’s connect on LinkedIn.
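A short addendum on the SQL injection point mentioned earlier. One possible shape for the “logic on your server” is to let the browser send only names and have the server build the SQL from an allowlist. The sketch below assumes you route chart requests through your own server instead of sending raw SQL from the browser; `ChartRequest`, `buildQuery` and both allowlists are hypothetical names for illustration and are not part of any MapD package.

```typescript
// Shape of what the browser is allowed to send: names, never SQL.
interface ChartRequest {
  groupBy: string;
  aggregate: string;
}

// Hypothetical allowlists for a "trips" table like the one in the benchmark.
const ALLOWED_COLUMNS: string[] = ["cab_type", "passenger_count"];
const ALLOWED_AGGREGATES: Record<string, string> = {
  count: "count(*)",
  avg_total: "avg(total_amount)",
};

// Builds the SQL on the server from known-good pieces only.
// Anything that is not on an allowlist is rejected with an error.
function buildQuery(request: ChartRequest): string {
  if (ALLOWED_COLUMNS.indexOf(request.groupBy) === -1) {
    throw new Error("Column not allowed: " + request.groupBy);
  }
  // hasOwnProperty avoids matching inherited keys such as "constructor".
  if (!Object.prototype.hasOwnProperty.call(ALLOWED_AGGREGATES, request.aggregate)) {
    throw new Error("Aggregate not allowed: " + request.aggregate);
  }
  const column = request.groupBy;
  const aggregate = ALLOWED_AGGREGATES[request.aggregate];
  return "SELECT " + column + ", " + aggregate + " FROM trips GROUP BY " + column + ";";
}

// A normal request produces a query built entirely from allowlisted parts.
const safe = buildQuery({ groupBy: "cab_type", aggregate: "count" });
console.log(safe);

// A tampered request never reaches the database.
let rejected = false;
try {
  buildQuery({ groupBy: "cab_type; DROP TABLE trips", aggregate: "count" });
} catch (error) {
  rejected = true;
}
console.log(rejected);
```

The trade-off is flexibility: every new chart dimension has to be added to the allowlist first. For a dashboard with a fixed set of charts that is probably an acceptable price.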

Serious about big data visualization? Consider using MapD.
