Preparing for an interview or an exam, or just curious about what compilers and interpreters do? In a very basic sense, a compiler translates the entire program ahead of time for later execution, whereas an interpreter reads and executes the code statement by statement at run time.
However, to understand the depths of how modern-day compilers and interpreters work for various programming languages, we need to go through a lot more details.
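To settle the compiler vs interpreter debate, this article covers what compilers, interpreters, and JIT compilers are, and how C, Java, and Python source code is ultimately translated to machine instructions.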
The historical definition of a compiler defines it as software that converts the source code of a computer program to machine instructions or machine code. Source code is the code that developers write whereas machine code is all 0s and 1s, and consists of instructions for the computer CPU to perform tasks.
The above definition essentially means that a compiler must understand the mechanics of the programming language in which the source code is written, and for that reason compilers are language-specific. For example, GCC is one of the C compilers, and javac is the Java bytecode compiler.
The process of compiling source code into a machine-executable program requires many in-between steps.
The below diagram illustrates the typical flow of converting a human-readable program to the computer-executable code –
Also note that the output in the form of machine code or an executable is not 100% generic; it includes processor-specific instructions. An ARM processor, for example, cannot execute machine code generated for an x86 processor from Intel or AMD. So, compilers need to be platform-specific too.
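The in-between steps of compilation can be observed with Python's standard library. This is only a rough sketch: it uses Python's own bytecode compiler (through the `tokenize`, `ast`, and `compile` facilities), so the final output is bytecode rather than processor-specific machine code. The helper names `lex`, `parse`, and `generate` are made up for this illustration.

```python
import ast
import io
import tokenize

SOURCE = "total = 2 + 3 * 4\n"


def lex(source):
    """Stage 1 (lexing): split the source text into tokens."""
    reader = io.StringIO(source).readline
    # Drop whitespace-only tokens such as the trailing newline and end marker.
    return [tok.string for tok in tokenize.generate_tokens(reader) if tok.string.strip()]


def parse(source):
    """Stage 2 (parsing): build an abstract syntax tree from the source."""
    return ast.parse(source)


def generate(tree):
    """Stage 3 (code generation): turn the tree into a code object.

    Note: this is Python bytecode, not native machine code.
    """
    return compile(tree, "<demo>", "exec")


tokens = lex(SOURCE)
tree = parse(SOURCE)
code = generate(tree)

# Run the compiled code object and inspect the result.
namespace = {}
exec(code, namespace)
print(tokens)              # ['total', '=', '2', '+', '3', '*', '4']
print(namespace["total"])  # 14
```

A native compiler such as GCC goes through comparable stages, but finishes by emitting instructions for one specific processor family.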
Nowadays, the term compiler covers many other use cases too. It is also used for software that translates source code from one format to another. The more appropriate term for this kind of software is transpiler, but the line between the two is blurry.
The input to the transpiler doesn’t necessarily have to be a programming language source code and the output doesn’t necessarily have to be the machine code.
Take the example of a Java compiler that translates “.java” files to “.class” files. A “.class” file is not the final machine code; it is intermediate bytecode, which needs further translation or interpretation for the machine. But still, we call the software that converts “.java” to “.class” a compiler.
You can also find an alternate compiler for Java that translates Java code to C code by converting Java files to C language files. C language code then can further be compiled to machine code by the C compiler. Many would call this type of software a language source code converter (transpiler) rather than a compiler.
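To make the idea of source-to-source translation concrete, here is a toy transpiler. It is only an illustration under simple assumptions: it handles basic arithmetic expressions, and the target format (Lisp-style prefix notation) and the function names `to_prefix` and `transpile` are chosen for this example, not taken from any real tool.

```python
import ast

# Arithmetic operators this toy translator understands.
OPERATORS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}


def to_prefix(node):
    """Recursively rewrite an expression tree in prefix notation."""
    if isinstance(node, ast.Expression):
        return to_prefix(node.body)
    if isinstance(node, ast.BinOp):
        if type(node.op) not in OPERATORS:
            raise ValueError(f"unsupported operator: {type(node.op).__name__}")
        symbol = OPERATORS[type(node.op)]
        return f"({symbol} {to_prefix(node.left)} {to_prefix(node.right)})"
    if isinstance(node, ast.Constant):
        return repr(node.value)
    if isinstance(node, ast.Name):
        return node.id
    raise ValueError(f"unsupported syntax: {type(node).__name__}")


def transpile(expression):
    """Translate an infix arithmetic expression into prefix source text."""
    return to_prefix(ast.parse(expression, mode="eval"))


print(transpile("1 + 2 * x"))  # (+ 1 (* 2 x))
```

The output is still source code meant for another tool to process, which is why many people would call this a transpiler rather than a compiler.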
Again, if we look at the historical definition of an interpreter, an interpreter is software that reads the source code line by line and generates machine instructions at run time.
So, it doesn’t pre-compile anything but interprets the provided input on the fly, instructing the CPU to perform tasks in sequence.
The below diagram illustrates the simplistic flow of how the interpreter works-
Like compilers, interpreters are not universal either and are designed to read specific input formats. For example, an interpreter can be designed to interpret JavaScript source code or Java bytecode or for that matter any other input format.
Also, note that different programming languages implement interpreters differently; we will see a few examples in the subsequent sections of this article.
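The line-by-line behaviour described above can be sketched with a tiny interpreter. The three-instruction language used here (`set`, `add`, `print`) is invented purely for illustration; real interpreters are far more elaborate.

```python
def interpret(source):
    """Read a program line by line and execute each line immediately."""
    variables = {}
    output = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        op, args = parts[0], parts[1:]
        if op == "set":        # set NAME VALUE
            variables[args[0]] = int(args[1])
        elif op == "add":      # add NAME VALUE
            variables[args[0]] += int(args[1])
        elif op == "print":    # print NAME
            output.append(variables[args[0]])
        else:
            # The error surfaces only when execution reaches this line.
            raise SyntaxError(f"line {line_no}: unknown instruction {op!r}")
    return output


program = "set x 5\nadd x 3\nprint x"
print(interpret(program))  # [8]
```

Because nothing is checked ahead of time, a mistake on the last line of a program is reported only after all earlier lines have already run. A compiler would typically reject the whole program before executing anything.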
A just-in-time (JIT) compiler is another variation of the compiler that you encounter in today’s world. A JIT compiler typically reads bytecode generated ahead of time by a compiler and translates it into machine code, on the fly, at run time.
Let us extend our understanding of the Java compiler. The Java compiler converts .java to .class. A “.class” file contains the bytecode that runs on the Java virtual machine (JVM). But what does it have to do with JIT?
Early JVM implementations read bytecode instruction by instruction and executed it on the fly, much like an interpreter. Soon after, JVMs started to implement JIT compilation as well, converting bytecode (typically the frequently executed parts) into machine code in memory, just before or during execution.
Why JIT? To improve execution performance!
Machine code produced by the JIT compiler is optimized, so the CPU runs it faster than an interpreter can execute the same bytecode instruction by instruction. JIT compilation itself brings some overhead in terms of compilation time and memory consumption, but the benefits generally outweigh the overhead.
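The trade-off can be sketched with a toy model. This is an analogy only, not how a JVM is implemented: the "compiled" form here is an ordinary Python function rather than machine code, and the names `ToyJit` and `HOT_THRESHOLD` as well as the tuple-based expression format are assumptions made for this example.

```python
HOT_THRESHOLD = 3  # hypothetical: calls before an expression counts as "hot"


def interpret(expr, x):
    """Slow path: walk the expression tree on every call.

    An expression is the string "x", a number, or a tuple (op, left, right)
    where op is "+" or "*".
    """
    if isinstance(expr, tuple):
        op, left, right = expr
        a, b = interpret(left, x), interpret(right, x)
        return a + b if op == "+" else a * b
    return x if expr == "x" else expr


def to_source(expr):
    """Render the expression tree as Python source text."""
    if isinstance(expr, tuple):
        op, left, right = expr
        return f"({to_source(left)} {op} {to_source(right)})"
    return str(expr)


class ToyJit:
    def __init__(self, expr):
        self.expr = expr
        self.calls = 0
        self.compiled = None  # filled in once the expression becomes hot

    def run(self, x):
        if self.compiled is not None:
            return self.compiled(x)  # fast path: reuse the compiled function
        self.calls += 1
        if self.calls >= HOT_THRESHOLD:
            # One-time compilation cost, paid in time and memory.
            self.compiled = eval("lambda x: " + to_source(self.expr))
        return interpret(self.expr, x)


jit = ToyJit(("+", ("*", "x", "x"), 1))  # represents x * x + 1
results = [jit.run(n) for n in range(5)]
print(results)  # [1, 2, 5, 10, 17]
```

The first few calls go through the interpreter. Once the call count reaches the threshold, the expression is translated once and every later call reuses the result.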
Let us see how the source code of various programming languages ultimately translates to machine instructions –
This is a straightforward use case where a C compiler translates C code to the machine instructions. There are multiple tasks performed by the compiler in between though –
Once developer-written source code is compiled, it is no longer needed for program execution. All that a CPU needs is the final compiled code.
Java is clearly a two-step process. The first step is to generate platform-independent bytecode in the form of .class files from .java files. This is also what makes Java a platform-independent language from a developer's perspective, since developers only need to worry about generating standard class files.
If class files are platform-independent, then what about the different processor architectures? That is taken care of by the JVMs: there are platform-specific JVMs (interpreters) that produce output following the instruction set required by each specific platform.
Python is more like Java from a life cycle perspective. There is a minor difference, though: developers do not need to compile the code themselves. The Python implementation takes care of it and converts the source code in .py files to compiled code in .pyc files behind the scenes.
.pyc files are then interpreted by the PVM – Python Virtual Machine, at runtime, similar to how Java bytecode is interpreted by the JVM.
From a developer’s perspective, Python looks like an interpreted language, but in practice the source code is first compiled to bytecode, and that bytecode is then interpreted.
Furthermore, the Python ecosystem also has Jython, which converts .py code to bytecode that runs on the JVM instead of the PVM. It also has IronPython, which makes Python code run on .NET environments.
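The hidden compilation step can also be triggered by hand with the standard `py_compile` module. The sketch below assumes a CPython-style implementation and a writable temporary directory; the helper name `compile_to_pyc` and the file name `hello.py` are made up for this example.

```python
import pathlib
import py_compile
import tempfile


def compile_to_pyc(source_text):
    """Write source to a temporary .py file and byte-compile it to .pyc."""
    workdir = pathlib.Path(tempfile.mkdtemp())
    source_path = workdir / "hello.py"
    source_path.write_text(source_text)
    # Returns the path of the generated bytecode file; raises on syntax errors.
    pyc_path = py_compile.compile(str(source_path), doraise=True)
    return pathlib.Path(pyc_path)


pyc = compile_to_pyc("print('hello')\n")
print(pyc.name)    # exact name depends on the interpreter version
print(pyc.exists())
```

Normally this happens automatically the first time a module is imported, which is why most developers never run a separate compile step.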
Compiler vs interpreter is more like a scholarly discussion these days and brings in differing views without concrete definitions.
The ultimate goal is to get to machine code. Whether you do it one way or the other, with one tool or several, depends purely on the use case.
Furthermore, the journey of the source code is not as simple as it might seem. There are multiple steps in between, including code cleanup, comment removal, inclusion of referenced files, pre-compilation, assembly, module linking, language conversion in some cases, bytecode generation in others, and more.