An important part of any application is the underlying log system we incorporate into it. Logs are not only for debugging and traceability, but also for business intelligence. Building a robust logging system within our apps can give us great insight into the business problems we are solving.

Log4j in Apache Spark

Spark uses log4j as the standard library for its own logging. Everything that happens inside Spark gets logged to the shell console and to the configured underlying storage. Spark also provides a template for app writers, so we can use the same log4j libraries to add whatever messages we want to the existing, in-place implementation of logging in Spark.

Configuring Log4j

Under the SPARK_HOME/conf folder, there is a log4j.properties.template file which serves as a starting point for our own logging system. Based on this file, we created the log4j.properties file and put it under the same directory.

Our log4j.properties does the following. Basically, we want to hide all the logs Spark generates so we don't have to deal with them in the shell. We redirect them to be logged in the file system. On the other hand, we want our own logs to be logged in the shell and in a separate file so they don't get mixed up with the ones from Spark. From here, we will point Splunk to the file where our own logs are, which in this particular case is /var/log/sparkU.log.

This log4j.properties file is picked up by Spark when the application starts, so we don't have to do anything aside from placing it in the mentioned location.

Writing Our Own Logs

Now that we have configured the components that Spark requires in order to manage our logs, we just need to start writing logs within our apps. To show how this is done, let's write a small app for the demonstration. Running this Spark app will demonstrate that our log system works.
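A minimal sketch of such an app might look like the following. It is not the original listing: the logger name myLogger, the appender names in the comment, and the object name DemoApp are assumptions made for illustration, and it assumes a Spark build that exposes the log4j 1.x API (org.apache.log4j).

```scala
import org.apache.log4j.LogManager
import org.apache.spark.{SparkConf, SparkContext}

// Assumed excerpt of SPARK_HOME/conf/log4j.properties (hypothetical names):
//   log4j.rootLogger=INFO, RollingAppender            <- Spark logs: file only
//   log4j.logger.myLogger=INFO, myConsoleAppender, RollingAppenderU
//   log4j.additivity.myLogger=false                    <- keep our logs separate
// where RollingAppenderU would write to /var/log/sparkU.log.
object DemoApp {
  // "myLogger" is an assumed name; it has to match the logger configured above.
  private val log = LogManager.getLogger("myLogger")

  // Logs on the driver before and after a trivial job, returns the element count.
  def run(sc: SparkContext): Long = {
    log.info("Hello demo")
    val count = sc.parallelize(1 to 100000).count()
    log.info("I am done")
    count
  }

  def main(args: Array[String]): Unit = {
    // The master URL is expected to come from spark-submit.
    val sc = new SparkContext(new SparkConf().setAppName("demo-app"))
    try run(sc) finally sc.stop()
  }
}
```

Both messages are written on the driver only, so no logger has to cross the network in this version.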
We will be able to see the Hello demo and I am done messages being logged in the shell and in the file system, while the Spark logs will only go to the file system.

So far, everything seems easy, yet there is a problem we haven't mentioned. The class org.apache.log4j.Logger is not serializable, which implies we cannot use it inside a closure while doing operations on some parts of the Spark API. For example, if we call log from inside a closure passed to an RDD operation in our app, it will fail when running on Spark. Spark complains that the log object is not Serializable, so it cannot be sent over the network to the Spark workers.

This problem is actually easy to solve. Let's create a class that does something to our data set while doing a lot of logging. Mapper receives an RDD[Int] and returns an RDD[String], and it also logs what value is being mapped. In this case, note how the log object has been marked as @transient, which allows the serialization system to ignore the log object. Now, Mapper is being serialized and sent to each worker, but the log object is being resolved when it is needed in the worker, solving our problem.

Another solution is to wrap the log object into an object construct and use it all over the place. We would rather have log within the class where we are going to use it, but the alternative is also valid.

At this point, our entire app consists of the main program plus the Mapper class.

Conclusions

Our logs are now being shown in the shell and also stored in their own files. Spark logs are hidden from the shell and logged into their own file. We also solved the serialization problem that appears when trying to log in different workers.

We can now build more robust BI systems based on our own Spark logs, as we do with other non-distributed systems and applications we have today. Business Intelligence is a very big deal for us, and having the right insights is always nice to have.
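For reference, the Mapper approach described earlier could be sketched roughly as follows. This is an illustration under stated assumptions rather than the original listing: the method name doSomeMappingOnDataSetAndLogIt, the logger name myLogger, the object name MapperDemo, and the choice to add n to each value are all hypothetical.

```scala
import org.apache.log4j.{LogManager, Logger}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Mapper is Serializable so Spark can ship it to the workers.
class Mapper(n: Int) extends Serializable {
  // @transient: the logger is skipped when Mapper is serialized.
  // lazy: it is created on first use in whichever JVM needs it.
  @transient lazy val log: Logger = LogManager.getLogger("myLogger")

  // Hypothetical method name: maps each Int to a String and logs each value.
  def doSomeMappingOnDataSetAndLogIt(rdd: RDD[Int]): RDD[String] =
    rdd.map { i =>
      log.info("mapping: " + i)
      (i + n).toString
    }
}

object Mapper {
  def apply(n: Int): Mapper = new Mapper(n)
}

object MapperDemo {
  def main(args: Array[String]): Unit = {
    val log = LogManager.getLogger("myLogger")
    val sc = new SparkContext(new SparkConf().setAppName("demo-app"))
    try {
      log.info("Hello demo")
      val data = sc.parallelize(1 to 100000)
      // collect() forces the map to run, so the "mapping" lines get logged on the workers.
      Mapper(1).doSomeMappingOnDataSetAndLogIt(data).collect()
      log.info("I am done")
    } finally {
      sc.stop()
    }
  }
}
```

The closure passed to map refers to log and n, so it captures the enclosing Mapper instance. That instance serializes cleanly because its only non-serializable member is transient.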