MPI workloads performance on MapR Data Platform Part 2 — Matrix Multiplication

February 12th 2019
Nicolas A Perez (@anicolaspp)

In the first part of this series, we showed how we can use MPI on top of the MapR Data Platform to successfully find prime numbers within a rather large range. We also compared our Sieve of Eratosthenes implementation in MPI with the same algorithm in Spark, to discover how they both behave, while looking at some interesting implementation details.

In this second part of the series, we are going to implement matrix multiplication in both MPI and Apache Spark. Again, we will look at how each implementation behaves when running on MapR. Our MPI implementation will be based on Cannon's algorithm, while in Spark we will use the MLlib BlockMatrix functions for multiplying matrices.

Cannon's Algorithm

Parallel computing comes with implicit costs: memory requirements tend to increase as we increase the number of processors participating in the computation. This is not obvious, but it has been proven.

Cannon's algorithm is extremely scalable: as we increase the number of processors, the memory requirement per processor stays low. The following is a scalability analysis of matrix multiplication, comparing naive matrix-to-matrix multiplication against the block-decomposition multiplication used by Cannon's algorithm.

As we can see, in the first case (matrix-to-matrix) we don't get any scalability, since the memory requirement depends on P, the number of processors. In the second case (Cannon's algorithm) the memory requirement is independent of the number of processors; more specifically, it is a constant, which allows us to scale far better.
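The scalability argument can be sketched with back-of-the-envelope arithmetic. The accounting below is a simplification of my own (not taken from the article's analysis): it assumes the naive algorithm ends up replicating B on every process, while Cannon's algorithm keeps only three (n/√p) × (n/√p) blocks per process.

```python
# Back-of-the-envelope memory accounting (counts of 8-byte doubles).
# These cost models are illustrative assumptions, not the article's code.
import math

def naive_total_doubles(n, p):
    # strips of A and C (n^2 doubles each in total) + p full copies of B:
    # total memory grows linearly with the number of processors p
    return n * n + n * n + p * n * n

def cannon_total_doubles(n, p):
    q = int(math.sqrt(p))      # the process grid is q x q
    block = (n // q) ** 2      # doubles in one (n/q) x (n/q) block
    return p * 3 * block       # each process holds one block of A, B, and C

n = 1024
for p in (4, 16, 64):
    print(p, naive_total_doubles(n, p), cannon_total_doubles(n, p))
```

Running this shows the naive total growing with p while Cannon's total stays at 3n² doubles, which is the constant-memory behavior described above.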

Cannon's Algorithm Implementation Using MPI

Now we are going to show the main pieces of the implementation, its core. Then we will link to the entire source code for those interested in reviewing it.

First of all, let's look at the functions we are going to use so we can get a better idea of the application flow.

Notice that reading and writing the matrices from/to files happens in parallel and in a distributed fashion. Each processor is in charge of its own block (piece) of the matrix in question.
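The C signatures themselves are in the repository, but the block decomposition behind that I/O is easy to sketch. The helper below is hypothetical (not from the original code) and only shows how a rank in a √p × √p process grid maps to the row and column ranges of the block it reads and writes:

```python
import math

def block_bounds(rank, n, p):
    # map a rank in a q x q process grid to its block's row/column ranges;
    # assumes p is a perfect square and q divides n evenly
    q = int(math.sqrt(p))
    bsize = n // q                  # rows/cols per block
    br, bc = divmod(rank, q)        # block coordinates of this rank
    return (range(br * bsize, (br + 1) * bsize),
            range(bc * bsize, (bc + 1) * bsize))

rows, cols = block_bounds(rank=5, n=8, p=16)   # 4x4 grid of 2x2 blocks
print(list(rows), list(cols))                  # [2, 3] [2, 3]
```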

Our application then becomes the following.

As we can see, this is not a simple piece of code, but it presents the core elements of Cannon's algorithm. The entire source code can be found in this GitHub repository.
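For readers who want the gist without parsing the C, here is a serial Python simulation of the same block dance. This is a sketch, not the article's code: in the real implementation each block lives on its own MPI process, and the shifts below are communication steps (e.g. MPI_Sendrecv_replace).

```python
# Serial simulation of Cannon's algorithm on a q x q grid of blocks.

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def madd(A, B):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]

def cannon(Ab, Bb, q):
    # initial alignment: shift row i of A left by i, column j of B up by j
    A = {(i, j): Ab[i, (j + i) % q] for i in range(q) for j in range(q)}
    B = {(i, j): Bb[(i + j) % q, j] for i in range(q) for j in range(q)}
    b = len(A[0, 0])
    C = {(i, j): [[0] * b for _ in range(b)]
         for i in range(q) for j in range(q)}
    for _ in range(q):                      # q multiply-and-shift rounds
        C = {k: madd(C[k], matmul(A[k], B[k])) for k in C}
        A = {(i, j): A[i, (j + 1) % q] for i in range(q) for j in range(q)}
        B = {(i, j): B[(i + 1) % q, j] for i in range(q) for j in range(q)}
    return C

# sanity check against direct multiplication (4x4 matrix, 2x2 grid)
import random
random.seed(0)
n, q = 4, 2
b = n // q
M1 = [[random.randint(0, 9) for _ in range(n)] for _ in range(n)]
M2 = [[random.randint(0, 9) for _ in range(n)] for _ in range(n)]
blocks = lambda M: {(i, j): [row[j*b:(j+1)*b] for row in M[i*b:(i+1)*b]]
                    for i in range(q) for j in range(q)}
C = cannon(blocks(M1), blocks(M2), q)
Cfull = [[C[i // b, j // b][i % b][j % b] for j in range(n)] for i in range(n)]
assert Cfull == matmul(M1, M2)
```

After the initial alignment, every process holds a matching pair of blocks; each of the q rounds multiplies the local pair, accumulates into C, and shifts A left and B up by one block.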

In Spark, the same can be achieved with simple constructs. The following code snippet shows this process.
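In MLlib one converts the matrices to BlockMatrix and calls multiply on them; conceptually, Spark joins A's (i, k) blocks with B's (k, j) blocks on k and reduces the partial products by destination block (i, j). A serial Python sketch of that idea (the names here are illustrative, not Spark's API):

```python
from collections import defaultdict

def matmul(A, B):
    rows, inner, cols = len(A), len(B), len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(rows)]

def madd(A, B):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]

def block_multiply(Ablocks, Bblocks):
    # "join" A's (i, k) blocks with B's (k, j) blocks on the inner index k,
    # then "reduceByKey" the partial products on the destination key (i, j)
    partial = defaultdict(list)
    for (i, k), a in Ablocks.items():
        for (kk, j), b in Bblocks.items():
            if k == kk:
                partial[i, j].append(matmul(a, b))
    out = {}
    for key, mats in partial.items():
        acc = mats[0]
        for m in mats[1:]:
            acc = madd(acc, m)
        out[key] = acc
    return out

# 2x2 matrices stored as four 1x1 blocks
A = {(0, 0): [[1]], (0, 1): [[2]], (1, 0): [[3]], (1, 1): [[4]]}
B = {(0, 0): [[5]], (0, 1): [[6]], (1, 0): [[7]], (1, 1): [[8]]}
C = block_multiply(A, B)
print(C[0, 0], C[1, 1])   # [[19]] [[50]]
```

Note the difference from Cannon's algorithm: this approach shuffles all compatible block pairs, which is simpler to express but heavier on memory, matching the heap-space behavior reported below.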

As we can see, it is very simple, but let's see how fast it is compared to our MPI version.

Using MPI

1000x1000 matrix multiplication

mpirun -np 16 multiply 1000x1000.out
elapsed time: 2.082910 // around 2 seconds

10000x10000 matrix multiplication

mpirun -np 16 multiply 10000x10000.out
elapsed time: 307200 // around 5 mins

In Spark

1000x1000 matrix multiplication

res19: Long = 4118 // around 4 seconds

10000x10000 matrix multiplication

res0: Long = 973943 // around 16 mins

As we can see, the time taken by Spark increases incredibly fast, while MPI runs the process with only a small time penalty. Moreover, when multiplying the 10000x10000 matrices, Spark ran out of heap space, and the heap had to be increased to 12g per executor for the operation to complete.

Again, there is a tradeoff we cannot avoid: the simplicity and reliability of Spark against the high performance of MPI, which comes at the cost of harder coding and maintenance.

At the end of the day, it is all about choosing the right tool for the job, and the MapR Data Platform is able to run workloads built on Spark or MPI without any problems.

Notes on MPI reading from files

In order to multiply matrices, our MPI application must read the corresponding input from files. These files are stored in MapR-FS, which supports different interfaces for interacting with it, such as HDFS, POSIX, and others. Our C code for MPI uses regular POSIX calls to read the matrices, so that each process reads only the part of the matrix, a block, that it owns. Spark, when possible, exploits data locality, which helps speed up the process, since each processor reads its data locally from the node on which it is running. In MPI, that depends on the MapR POSIX client.
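The per-block reads can be illustrated with a serial sketch: each "rank" seeks to its own rows and columns instead of reading the whole file. The file layout here, row-major 8-byte doubles, is an assumption for illustration; the real C code issues its POSIX reads against MapR-FS.

```python
import math, os, struct, tempfile

n, p = 4, 4                         # 4x4 matrix, 2x2 process grid
q = int(math.sqrt(p))
b = n // q                          # block side
matrix = [[float(i * n + j) for j in range(n)] for i in range(n)]

fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:      # write the matrix row-major
    for row in matrix:
        f.write(struct.pack(f"{n}d", *row))

def read_block(rank):
    br, bc = divmod(rank, q)        # this rank's block coordinates
    block = []
    with open(path, "rb") as f:
        for r in range(br * b, (br + 1) * b):
            f.seek((r * n + bc * b) * 8)   # skip other ranks' columns
            block.append(list(struct.unpack(f"{b}d", f.read(b * 8))))
    return block

blk = read_block(3)                 # "rank 3" reads the bottom-right block
print(blk)                          # [[10.0, 11.0], [14.0, 15.0]]
os.remove(path)
```

Because every rank touches a disjoint region of the file, all p reads can proceed concurrently, which is what makes the distributed I/O described above possible.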


Even though MPI presents some limitations when interacting with the MapR distributed file system, its overall processing time is quite impressive compared to what Spark offers for matrix multiplication. Spark, on the other hand, offers constructs for doing these operations with ease while keeping the entire process reliable and fault tolerant, features that MPI lacks.

The tradeoff between ease of use and high performance has been around for years and cannot be ignored; being aware of it helps us decide in every situation.


