Why I'm in love with Julia

by David, March 5th, 2020

In this article I'm going to make the case that people serious about creating machine learning algorithms and high-performance data science programming should use Julia rather than Python.

For the record, I love Python - I love the freedom it gave me in the space between scripting and application programming. But having programmed high-performance data processing and analysis codes that live on HPC clusters for more than 25 years, I have a different perspective than perhaps many of the users and programmers who employ Python to solve problems.

For big data especially, be it in biotech, fintech, earth science, or otherwise, data processing and analysis codes can run for days or weeks. A 2x, 3x, or 5x slow-down in exchange for programmer productivity or abstractions is an unacceptable trade-off.

Languages like Python, Matlab, R and others that allow the researcher to formulate new approaches to a solution are certainly useful for research, but they cannot compete in the Big Data/HPC world unless their run-times come within a factor of two of optimized Fortran.

Myself, for general HPC applications I use a mixture of HPF (High Performance Fortran) and C/C++ with GPGPU. For commercial applications that need to be shipped to customers, I use a mixture of Java (NetBeans)/C and GPGPU. Currently for my front-end, user-facing application development I use Java NetBeans, but more and more I now use Python and Node.js, simply because they're simple and portable.

That being said, I was unsatisfied with the results, until I met Julia.

The launch post for Julia promised "the language combined the speed of C with the usability of Python, the dynamism of Ruby, the mathematical prowess of Matlab, and the statistical chops of R."

The language's core features from a technical standpoint, according to Edelman, one of Julia's core creators and promoters, "are its multiple dispatch paradigm, which allows it to express object-oriented and functional programming patterns, its support for "generic programming", and its "aggressive type system". This type system caters to many different use cases. It is dynamically typed, but with support for optional type declarations. The language "feels like a scripting language", but can be compiled to "efficient native code" for multiple platforms via LLVM."

"Julia’s published benchmarks show it performing close to or slightly worse than C, and Fortran, as usual, performing better than C for most tasks."

Julia's syntax is much like Matlab's, but with the flexibility of Python, and it offers Lisp-like macros, making it easier for programmers to get started. For me, the separation from Matlab/Python really shows itself in Julia's startlingly quick LLVM-based just-in-time (JIT) compiler, its easy-to-implement distributed parallel execution, and its care for numerical accuracy.
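
To make "easy-to-implement distributed parallel execution" concrete, here is a minimal sketch of my own (not from any library's documentation) using the standard Distributed module to estimate pi across several local worker processes:

using Distributed
addprocs(4)          # spin up four local worker processes

# Parallel Monte Carlo estimate of pi: the JIT compiles the loop body to
# native code, and @distributed farms the iterations out to the workers,
# reducing the partial counts with (+).
hits = @distributed (+) for i in 1:10_000_000
    rand()^2 + rand()^2 <= 1.0 ? 1 : 0
end

println(4 * hits / 10_000_000)   # ≈ 3.14159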

Julia also features a mathematical function library, most of which is written in Julia itself, alongside proven C and Fortran libraries. That can't be considered a bad thing from any perspective; most of the major machine learning tool sets out there (TensorFlow, for example) are written in C++ under the covers. This is good from a performance perspective, but sometimes a drawback, as we will see below.

For programmers, Julia is powerful because it is built around types. We can easily build generic code that has good performance over a large range of types. The result is something that approaches high-performance Fortran code, but that also retains high-level features.
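
As a toy illustration of that (my own example, not from the article): one generic definition, and the compiler specializes it for every concrete type it is called with.

# One generic definition; Julia emits specialized native code per element type.
sumsq(xs) = sum(x -> x * x, xs)

sumsq([1, 2, 3])            # Vector{Int}     -> integer specialization
sumsq(Float32[1.0, 2.5])    # Vector{Float32} -> a separate specialization
sumsq(1:1_000_000)          # ranges work too, with no copies or allocation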

Edelman said "Matlab and the other environments take previously written Fortran or C, or proprietary code, and then glue it together with what I call bubblegum and paperclips. This offers the advantage of easy access to programs written in more difficult languages, but at a cost. When you're ready to code yourself you don't have the benefit of the Fortran or C speeds".

The advantage of Julia that makes the biggest difference for me concerns legacy library issues. Fortran and C routines (compiled as shared libraries) can be called directly, even from the REPL, with "no glue code and no ceremony." 
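
As a tiny sketch of what "no ceremony" means in practice (my own toy example, separate from the NAG code below; the exact library name varies by platform, "libm" on typical Linux systems):

# Call cos straight out of the system C math library: no wrapper package,
# no build step, just a one-line ccall.
ccall((:cos, "libm"), Cdouble, (Cdouble,), 1.0)   # ≈ 0.5403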

Here is an example of using the Fortran NAG library without much fuss or muss:

# A thin wrapper around the NAG nearest-correlation routine: allocate the
# output matrix, then hand everything to the low-level library call.
function nag_nearest_correlation(g::Matrix{Float64}, order = Nag_ColMajor, errtol = 0.0, maxits = NagInt(0), maxit = NagInt(0))
    n, pdg = size(g)
    pdx = pdg
    x = Array{Float64}(undef, n, pdg)
    nag_nearest_correlation!(order, g, pdg, n, errtol, maxits, maxit, x, pdx)
    return x
end


G = [ 2.0 -1.0  0.0  0.0;
     -1.0  2.0 -1.0  0.0;
      0.0 -1.0  2.0 -1.0;
      0.0  0.0 -1.0  2.0]

X = nag_nearest_correlation(G)

This means that I can quickly build on years of previous development without spending weeks or months on complicated wrapper-building that itself must be tested and debugged; the code I'm importing is already tested and ready to use.

A case Edelman uses to promote this idea involves a machine-learning model trained to detect tuberculosis (TB) from the sound of a patient's cough:

Unfortunately, the model's ability to predict whether an individual had TB was hampered when those coughing had different accents.

"What you want to do, of course, is learn whether somebody was sick or not and you didn't want it to learn the difference in accents," said Edelman.

Resolving the confusion caused by different accents was difficult using a high-level language like Python with standard machine-learning libraries, which are typically written in a better performing low-level language like C++.

"What I was told is that all of the regular libraries just couldn't do it. It wasn't very difficult, but you had to tweak the neural networks in a way that the standard libraries just wouldn't let you do," he said.
"The current libraries are sort of like brick edifices, and if you to move them around you've got to be a pretty heavy-duty programmer to change them.
"But this fellow said with Julia, because it's high level, he was able to go in readily and solve this problem.
"So what we really want to do is enable more and more people to do that sort of thing, to be able to get beyond the walls of these existing libraries and to innovate with machine learning."

Julia's ability to import these "brick edifices" at will adds to its power, but its ability to supplant them performantly makes it the clear choice for someone like myself over, say, Python.

Here is a snippet of part of a neuron activation function from TensorFlow. The whole file, softsign_op.cc, and its associated headers are in some sense inaccessible to the average Python developer. In Julia, we can write similarly performant code that is easier to understand and modify.

template <typename Device, typename T>
class SoftsignGradOp
    : public BinaryElementWiseOp<T, SoftsignGradOp<Device, T>> {
 public:
  explicit SoftsignGradOp(OpKernelConstruction* context)
      : BinaryElementWiseOp<T, SoftsignGradOp<Device, T>>(context) {}

  void OperateNoTemplate(OpKernelContext* context, const Tensor& g,
                         const Tensor& a, Tensor* output);

  // INPUTS:
  //   g (gradients): backpropagated gradients
  //   a (inputs): inputs that were passed to SoftsignOp()
  // OUTPUT:
  //   gradients to backprop
  template <int NDIMS>
  void Operate(OpKernelContext* context, const Tensor& g, const Tensor& a,
               Tensor* output) {
    OperateNoTemplate(context, g, a, output);
  }
};

template <typename Device, typename T>
void SoftsignGradOp<Device, T>::OperateNoTemplate(OpKernelContext* context,
                                                  const Tensor& g,
                                                  const Tensor& a,
                                                  Tensor* output) {
  OP_REQUIRES(context, a.IsSameSize(g),
              errors::InvalidArgument("g and a must be the same size"));
  functor::SoftsignGrad<Device, T> functor;
  functor(context->eigen_device<Device>(), g.flat<T>(), a.flat<T>(),
          output->flat<T>());
}
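
For contrast, here is a rough Julia sketch of the same idea (my own illustration, not a drop-in replacement for the TensorFlow kernel):

# Softsign activation: x / (1 + |x|). Its derivative is 1 / (1 + |x|)^2,
# so the backpropagated gradient is g scaled by that factor.
softsign(x) = x / (1 + abs(x))
softsign_grad(g, a) = g / (1 + abs(a))^2

# Broadcasting gives the element-wise tensor versions for free:
A = randn(Float32, 4, 4)     # "inputs"
G = ones(Float32, 4, 4)      # "backpropagated gradients"
softsign.(A)
softsign_grad.(G, A)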

Recalling that I love Python almost as much as Julia:

Development in Julia is accelerated by a package manager that eases the development of add-ons.

IJulia was developed in conjunction with the IPython community to link together with the Jupyter browser-based graphical notebook interface, which our community utilizes for demonstrating research with easily repeatable results.

Edelman waxes poetic about Julia's built-in features that make it easier for developers to spread workloads between multiple CPU cores, both in the same processor and across multiple chips in a distributed system. Current development is focusing on Julia's native support for parallel processing on other types of processors, such as Graphics Processing Units (GPUs) and Google's Tensor Processing Units (TPUs), which are more and more being used to accelerate machine learning.
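
As one small, hedged example of where that GPU support already stands, array code can be moved onto an NVIDIA GPU with the CUDA.jl package (assuming a CUDA-capable card and driver) while keeping ordinary Julia syntax:

using CUDA                      # assumes an NVIDIA GPU and CUDA.jl installed

x = CUDA.rand(Float32, 10^6)
y = CUDA.rand(Float32, 10^6)

# Ordinary broadcast syntax; the fused expression compiles to a single
# GPU kernel and runs entirely on the device.
z = x .+ 2f0 .* y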

Julia is a powerful language, but it is still relatively young compared to Python, so perhaps you want to use a library of your own from outside Julia. For situations like this, Julia provides ways to call libraries from R and Python (and, with extensions, even Java via JNI, in a much more mature way than Jython and friends):

julia> using JavaCall

julia> JavaCall.init(["-Xmx128M"])

julia> jlm = @jimport java.lang.Math
JavaObject{:java.lang.Math} (constructor with 2 methods)

julia> jcall(jlm, "sin", jdouble, (jdouble,), pi/2)
1.0

Here is an example of using the Pandas data-analysis library from Python directly in Julia:

julia> using Pkg

julia> Pkg.add("PyCall")

using PyCall

@pyimport pandas as pd

df = pd.read_csv("train.csv")

Or more directly using https://github.com/JuliaPy/Pandas.jl
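
A rough sketch of the same read with Pandas.jl (I'm assuming its read_csv, head, and describe wrappers, and the same hypothetical train.csv as above):

using Pandas                 # the JuliaPy/Pandas.jl wrapper

df = read_csv("train.csv")   # hypothetical local file, as above
head(df)
describe(df)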

In this next example we call Python's matplotlib directly from Julia with little or no overhead (arrays are passed without making a copy):

using Pkg

Pkg.add("PyPlot")

using PyPlot

x = range(0,stop=2*pi,length=1000); y = sin.(3*x + 4*cos.(2*x))

plot(x, y, color="red", linewidth=2.0, linestyle="--")

This is what I truly love about Julia. I have accumulated a lot of specialist code and libraries over the last 30 years, and for the most part there is nothing to port or rewrite, and no tedious wrapper chores.

I want to be clear: I'm not saying that "Python is for kids" or some such thing. But for serious research combined with rapid development and deployment in a big-data world, Julia has distinct advantages that, in my opinion, far outweigh its disadvantages compared to, say, Python.

As the language gains support (and it's one of the fastest-growing languages out there), much of what is missing will be added. It's my opinion that Python cannot grow into Julia, and for that reason,

I am in love with Julia.

Previously published at https://www.linkedin.com/pulse/why-im-love-julia-david-markus/