If you are like me, you have probably had the experience of going over code you wrote a few months ago and not understanding a single line. It has happened to me numerous times. By contrast, if I read this write-up 10 years from now, I will still understand its meaning (although I might think I was naive and ignorant when writing it). The reason is obvious: this piece of text is written in plain English, not in a programming language like JavaScript or (lo and behold) C++. Even experienced programmers understand natural language better.
It’s pretty obvious that even though we might have spent years writing computer code, we are still much more comfortable with English. Using non-natural language for coding has a significant cost attached to it. Any programmer knows that writing the code is just the beginning. The code needs to be documented, debugged, maintained and refactored. Because code is not written in natural language, understanding its logic takes significant effort and time whenever it is touched. Training other developers to use the code is even more demanding. Working with public APIs (or company-wide APIs) presents an even greater challenge of documentation and training. If only we could code in English.
Initial coding is just 5% of software life-cycle costs
Natural language programming has long been the holy grail of the software world. There were some attempts at designing programming languages to resemble natural language (COBOL, SQL, AppleScript), but none came close to actual natural language. Natural language programming was simply considered an unattainable goal. None of the modern, frequently used programming languages attempts to resemble natural language. The most common reasons for abandoning NLP (Natural Language Programming) are:
While no one can deny the validity of these statements, they present a binary view of the goal: either you have natural language programming or you don’t.
I would like to suggest that reaching the goal of NLP should be a gradual process, moving along a continuum between non-natural computer code and natural language. I would like to offer the following phases towards achieving the daunting goal:
At this phase, we suggest a computer language that reads like a natural language, but writing it requires some understanding of programming and knowledge of the available natural language functions. Consider the following JavaScript expressions:
strlen(Str);
getElementById('button1').focus();
Node2.parentElement.insertBefore(Node1,Node2);
The notation is borrowed from mathematical functions, a remnant from the days when software was studied in math departments. There is nothing natural about it. Now, consider the following parallels:
length of Str
focus on element "button1"
insert Node1 before Node2
The meaning of the second sequence is exactly the same as that of the first, but the second sequence is readable and seems natural even to non-programmers. The main difference is that arguments are inlined in the function name rather than grouped in parentheses. A small syntactical difference makes a huge difference in the legibility of code. The inline format does not require any complex machine learning algorithm. It just requires changing the syntax of the language. Consider the following definition of a function:
DEFINE length of S AS
return S.length;
END
This definition can be part of a deterministic programming language, working with the same programming language paradigms we are used to. The only difference is that arguments are inlined.
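To illustrate that inlined arguments need nothing more than deterministic matching, here is a minimal sketch in plain JavaScript. The names `define`, `call` and `resolve`, and the explicit list of parameter names passed to `define`, are assumptions made for this illustration; they are not the syntax or implementation of any existing project.

```javascript
// Hypothetical sketch: phrase templates with inlined arguments.
// define(), call() and resolve() are illustrative names only.
const templates = [];

// Register a phrase such as "length of S"; `params` names the words that are argument slots.
function define(template, params, fn) {
  const parts = template.split(/\s+/).map((word) =>
    params.includes(word) ? { slot: word } : { literal: word }
  );
  templates.push({ parts, params, fn });
}

// Turn an argument token into a value: quoted string, number, or a name looked up in scope.
function resolve(token, scope) {
  if (/^".*"$/.test(token)) return token.slice(1, -1);
  if (/^-?\d+(\.\d+)?$/.test(token)) return Number(token);
  if (!(token in scope)) throw new Error(`Unknown name: ${token}`);
  return scope[token];
}

// Find the first template whose literal words line up with the phrase, then invoke it.
function call(phrase, scope = {}) {
  const tokens = phrase.match(/"[^"]*"|\S+/g) || [];
  for (const { parts, params, fn } of templates) {
    if (parts.length !== tokens.length) continue;
    const literalsMatch = parts.every(
      (part, i) => part.slot !== undefined || part.literal === tokens[i]
    );
    if (!literalsMatch) continue;
    const bound = {};
    parts.forEach((part, i) => {
      if (part.slot !== undefined) bound[part.slot] = resolve(tokens[i], scope);
    });
    return fn(...params.map((name) => bound[name]));
  }
  throw new Error(`No definition matches: ${phrase}`);
}

define('length of S', ['S'], (S) => S.length);
define('repeat S N times', ['S', 'N'], (S, N) => S.repeat(N));

console.log(call('length of Str', { Str: 'hello' })); // 5
console.log(call('repeat "ab" 3 times')); // ababab
```

The matching here is a plain word-by-word comparison, so the same phrase always resolves to the same function. A real compiler would do this at compile time rather than at run time, but the principle is the same.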
Another important “feature” of natural language that can be imported into a formal programming language is the use of context. In natural language, it is common to refer to an entity in the context of the phrase using the determiner “the”, e.g. move **the image** 23 pixels to the right. In the traditional programming paradigm, one of the greatest sins is referencing objects outside of the function body. In practice, referencing context is essential for programming. The workaround is passing the context as an additional argument. Needless to say, that does not improve the readability of the code. In a natural programming language, this might look like this:
//defining the function
DEFINE move to next token CONTEXT the text, the current position, the current token AS ... END
// ==> function moveToNextToken(theText, theCurrentPosition, theCurrentToken){...};
//calling the function
the text = "hello world"; the current position = 0;
move to next token; //referencing the text and the current position
show (the current token)
Obviously, the natural language compiler needs to support calling by reference using some context value object holding the primitive values.
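A minimal sketch of what such a context value object could look like in plain JavaScript follows. The `createContext` helper, the single `context` parameter, and the whitespace-based tokenizing are assumptions made for this illustration, not a description of how an actual compiler would generate code.

```javascript
// Hypothetical sketch: a context holder object makes updates to primitive values
// ("the current position", "the current token") visible to the caller.
function createContext() {
  return { theText: '', theCurrentPosition: 0, theCurrentToken: null };
}

// Stand-in for "DEFINE move to next token CONTEXT the text, the current position, the current token".
// Assumption: tokens are runs of non-whitespace characters.
function moveToNextToken(context) {
  const rest = context.theText.slice(context.theCurrentPosition);
  const match = /^\s*(\S+)/.exec(rest);
  if (match === null) {
    // No tokens left: clear the current token and park the position at the end.
    context.theCurrentToken = null;
    context.theCurrentPosition = context.theText.length;
    return;
  }
  context.theCurrentToken = match[1];
  context.theCurrentPosition += match[0].length;
}

// the text = "hello world"; the current position = 0;
const context = createContext();
context.theText = 'hello world';
context.theCurrentPosition = 0;

moveToNextToken(context); // move to next token
console.log(context.theCurrentToken); // hello
moveToNextToken(context);
console.log(context.theCurrentToken); // world
```

Because the function mutates the shared holder rather than receiving copies of the primitive values, the caller sees the new position and token after each call, which is the behavior the natural-language version relies on.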
There are additional “tweaks” to existing programming languages that can make them a lot more “natural”, such as typing and the use of pronouns.
Notice that while the programmer writing the code needs to be aware of the available function length of S, the arguments the function takes and, in general, how programs are written, any person reading the text length of Str can easily understand the meaning of the phrase. Restricted NLP means the writer must use existing NLP functions, but the reader can understand the code with no prior knowledge. The implication is that no additional documentation is required. The code is the documentation.
I suggested a possible syntax for such a language in a GitHub project, naturaljs. It is a suggestion for a JavaScript natural language extension that transpiles into JavaScript code, similar to the way TypeScript is transpiled. If you believe in the project, please star it. I pledged to implement it once the project gets 100 stars.
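To make the idea of transpiling concrete, here is a rough sketch that turns a DEFINE block like the one shown earlier into JavaScript source. It is not the naturaljs implementation. The function name `transpileDefinition` is hypothetical, and the rule that capitalized words in the phrase are parameters is an assumption made only for this example; the article does not say how parameters are recognized.

```javascript
// Hypothetical sketch: transpile "DEFINE <phrase> AS <body> END" into a JavaScript function.
// ASSUMPTION: words starting with an uppercase letter are parameters;
// the remaining words are joined in camelCase to form the function name.
function transpileDefinition(source) {
  const match = /^DEFINE\s+(.+?)\s+AS\s+([\s\S]*?)\s*END\s*$/.exec(source.trim());
  if (match === null) {
    throw new Error('Expected: DEFINE <phrase> AS <body> END');
  }
  const [, phrase, body] = match;
  const words = phrase.split(/\s+/);
  const isParam = (word) => /^[A-Z]/.test(word);
  const params = words.filter(isParam);
  const name = words
    .filter((word) => !isParam(word))
    .map((word, i) => (i === 0 ? word : word[0].toUpperCase() + word.slice(1)))
    .join('');
  return `function ${name}(${params.join(', ')}) {\n  ${body}\n}`;
}

const source = `DEFINE length of S AS
return S.length;
END`;

console.log(transpileDefinition(source));
```

For the definition above, this prints a function named `lengthOf` that takes `S` and returns `S.length`. The output is ordinary JavaScript, so everything downstream (debuggers, bundlers, runtimes) keeps working unchanged, which is the main appeal of the transpiling approach.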
In phase I, the programming experience is pretty similar to current programming language paradigms. The programmer is required to:
show (length of S)
rather than show length of S
Computer-aided NLP would ease the programming process by relaxing these limitations. The output would still be the same deterministic code. However, the programmer could write a bit more freely, and the interactive compiler would suggest different options based on previous programming history, the available NLP programming corpus and machine learning. It can suggest the following:
I do believe that by implementing phase I and phase II we will drastically improve the programming experience and significantly reduce the cost of software development and maintenance. However, phase III will totally change our concept of software development.
In my opinion, the ultimate goal of natural language programming should be automatically deriving computer code from a requirements document. This implies deeper natural language understanding capabilities, and a great part of the solution would be some level of automatic software design. The “computer” might use online profiling or requirement hinting to devise the best design for the job. The design might also change automatically as requirements are adapted.
While this might seem far-fetched at present, I do believe that with the advancement of machine learning technology, and by relying on the previous two phases, this goal is within reach. This might require some formalism in the writing of requirements, but the significance of reaching such a phase is tremendous. It would totally change software engineering as we know it.
If you believe in the concept of natural language programming, I would suggest you have a look at my naturaljs GitHub project. It contains some ideas of what phase I would look like. I pledged to start implementing phase I once the project receives 100 stars.
elshor/naturaljs: naturaljs - Add natural language programming capabilities to javascript (github.com)