What is a Programming Language?


A programming language is very similar to a real language. There is a structure behind it, and some rules which dictate what is, and isn’t, a valid thing to say. When we read and write natural language, we are unconsciously learning these rules, and the same is true for programming languages. We can utilise these rules to understand others, and generate our own speech, or code.

In the 1950s the linguist Noam Chomsky formalised a number of important observations about languages. Many of these form the basis of our understanding of language today. One of these was the observation that natural languages are built up of recursive and repeated substructures.

cat

Cat • cannot speak or program

As an example of this, we can examine the phrase.

The cat walked on the carpet.

Using the rules of English, the noun cat can be replaced by two nouns separated by and.

The **cat and dog** walked on the carpet.

Each of these new nouns could in turn be replaced again. We could use the same rule as before, and replace cat with two new nouns joined with and. Or we could use a different rule and replace each of the nouns with an adjective and a noun, to add description to them.

The **cat and mouse** and dog walked on the carpet.

The **white cat** and **black dog** walked on the carpet.

These are just two examples, but English has many different rules for how types of words can be changed, manipulated and replaced.

We notice this exact behaviour in programming languages too. In C, the body of an if statement contains a list of new statements. Each of these new statements, could themselves be another if statement. These repeated structures and replacements are reflected in all parts of the language. These are sometimes called re-write rules because they tell you how one thing can be re-written as something else.

if (x > 5) { return x; }

if (x > 5) { **if (x > 10) { return x; }** }

The consequence of this observation by Chomsky is important. It means that although there are an infinite number of different things that can be said, or written down in a particular language, it is still possible to process and understand all of them with a finite number of re-write rules. The name given to a set of re-write rules is a grammar.

We can describe re-write rules in a number of ways. One way is textual. We could say something like, “a sentence must be a verb phrase“, or “a verb phrase can be either a verb or, an adverb and a verb“. This method is good for humans but it is too imprecise for computers to understand. When programming we need to write down a more formal description of a grammar.

To write a programming language such as our Lisp we are going to need to understand grammars. For reading in the user input we need to write a grammar which describes it. Then we can use it along with our user input, to decide if the input is valid. We can also use it to build a structured internal representation, which will make the job of understanding it, and then evaluating it, performing the computations encoded within, much easier.

This is where a library called mpc comes in.