Break the Loop: How I Finally Understood Functional Programming (Without the Math)

Introduction

My first attempt to get acquainted with functional programming was a failure. The very first program on the opening pages of the textbook looked like Celtic runes. The familiar "here we define a function named foo, which uses a loop to accumulate the sum of odd elements from a list passed as an argument" — I just couldn't see it. As you’ve already gathered, I’m perfectly fine with imperative programming.

The second attempt — a video course — didn't lead to anything good either. By the start of the third lecture, it turned out that alpha-equivalence has little to do with beta-reduction for lambda terms. I realized that life is short, and there are still plenty of letters left in the Greek alphabet.

But since functional programming exists, it can be mastered! I didn't want to back down. And I thought: damn it, I’m somewhat familiar with programming language theory. I even know what a computation model is. Maybe it’s worth trying to approach it from this angle?

Spoiler: it worked.

Getting to Grips with the Model of Computation

A program is a way to obtain a result by processing input data. All definitions below are not strict in the academic sense. However, I suppose they are quite sufficient to begin with.

Data types define what values can exist in a program.
Memory management is responsible for storing data of these types.
Expressions describe the process of computation. The result of evaluating any expression is a value.
Control of computation organizes this process.

For a specific programming paradigm — for example, functional programming (referred to as FP from here on) — these are axioms.

Expressions connect data with functions.

Expressions allow connecting data with functions. Any expression is replaced by its value during evaluation.

Evaluation is a process of substitutions: parts of an expression are replaced by their values until everything “collapses” into a result. In more formal terms, this is called reduction.

We can use names, but there are no variables with mutable values. That means there is no assignment. Instead of loops, we use recursion. This is handled by recursive functions. But not all functions are necessarily recursive.

Functions in FP are values just like numbers or lists. They can be passed around and returned.

For example, if f = 2x - 1, then the expression f 4 will be replaced with 7 during the process of reduction. Parentheses are not required.

The natural fit for recursive functions is recursively defined data: lists and trees.

All data stored in memory is immutable. During execution, any number of temporary, immutable values may be created. There are no other kinds. There is no need to store them forever; garbage collectors work quite efficiently in FP.

Imagine you’ve written a program by defining what you want to obtain from the input data. Once finished, the program "comes alive" and begins to modify its own source text, replacing expressions with their values. The sum of two expressions is replaced by the resulting sum. A function call with an argument is replaced by the function's value for that argument. During this transformation, the imaginary program text might even grow in size. But ultimately, the result of the final reduction will be the solution to the problem for which you wrote the program.

Now we can start solving problems. I would recommend trying to find a solution yourself after reading the problem statement — using only the permitted methods of storing and processing data.

Reasoning

Task 0: Determining the length of a list

If the list is empty, its length is 0. No questions there. If the list is not empty, it consists of a head and a tail. The head is the first element, and the tail is the rest of the list. There’s your recursive definition! A list with a single element has an empty tail. Once we reach the empty tail, we stop.

The length of a non-empty list is defined as one plus the length of the tail. The logic of the solution strictly mirrors the structure of the list itself. Two list variants — empty or non-empty — define two paths in the algorithm.

So, we ask: what is the length of the list [3, 7, 11] — this is our expression. We then perform a substitution. It becomes: one plus the length of the list [7, 11]. Then another substitution: one plus one plus the length of the list [11]. As I was writing this, I came up with Figure 1. The cool thing is that we don’t care about the elements of the list at all. They could be numbers, strings, or other lists. The solution remains the same.

Task 1: Find the sum of the elements in a list

Variables and loops are not allowed.

Given list: [3, 7, 11].

We’ll reason just like in the previous task. The sum of an empty list is zero. If the list is not empty, the sum is the value of the head plus the sum of the tail. I’ve tried to show in the figure how the expression Sum([3, 7, 11]) changes through substitutions until, eventually, the function call for the original list equals the result: 21. I’ll show you how to name all of this in FP terms later; for now, let's move to the next task.

Task 2: Find the maximum value in the list [7, 11, 3]

If you feel like everything has become clear by this point, then it was worth suggesting this task. It’s unclear what to do with an empty list. An empty list has no maximum element. It has no minimum either. Let’s do this: we’ll forbid searching for the maximum element of an empty list. But for a non-empty one, we work the usual way. If a list has only one element, its value is the maximum. (And the minimum, by the way, too). If the list has more than one element, as in our task, the maximum will be chosen between the value of the head and the maximum of the tail's elements. As an exercise, try to draw the substitution process yourself.

Hint: for the first comparison, will the value 11 be compared with 3 or with 7?

Task 3: Increase each element of the list [3, 7, 11] by 5

To solve this, it’s helpful to imagine a conveyor belt with parts moving along it. A single action is performed on each part: drill a hole, paint it green, or add five to it. In FP, this action is performed by a function.

This function will know how to add five to its single argument and return the sum.

As we remember, data is immutable, so it’s logical to assume the result will be a new list: [8, 12, 16].

We apply a function—let’s call it Plus_5—to each element of the original list. We take the head of the original list, apply the Plus_5 function to it, and move on to processing the tail, which has its own head. As a result, the resulting list will be formed element by element.

Question: what should the Plus_5 function do with an empty list? Think about it for now; I’ll give the answer later.

Task 4: Remove all odd elements from the list [2, 3, 6, 7, 11, 12]

As usual, a new list appears.

There is nothing to remove from an empty list.

If the list is not empty, we take its head. If the value is even, we add it to the resulting list. If it’s odd, we simply skip it. Then we move on to processing the tail.

Notice that, unlike the previous task, we aren’t transforming every element here; instead, we are deciding whether to include it in the result or not.

Through recursive processing, we sequentially "collect" a new list consisting only of the suitable elements. In the end, we get [2, 6, 12].

We have considered solutions to five tasks using only the capabilities of FP. Naturally, questions must have arisen—about terminology, the details of creating intermediate data, and how to implement these ideas. We will discuss all of this in the following sections.

Definitions

As you may have noticed, most of the algorithms we built follow the same pattern:

base case;
recursive call for the “decreasing remainder” (poor tail);
combining the result.

For those who studied mathematics well, I will note that this is the same principle as mathematical induction, but applied to a data structure. Smart words appear!

Task 0 — list length. We process the list by considering two cases: an empty list and a list consisting of a head and a tail. This way of building an algorithm is called structural recursion on data. The function follows the structure of the list itself. This task demonstrates control of computation through the shape of the data: we simply follow the structure of the list until we reach emptiness.

Task 1 — sum of elements. Here we use the same pattern as in computing the length, but instead of adding one, we use addition. In functional programming, this universal way of processing is called a fold. A fold gradually combines elements of a structure into a single value.

Strictly speaking, the expression during reduction aggregates data, turning a recursive structure into a single inert value — in our case, a number — the sum of the list elements.

Task 2 — maximum of a list. This is another example of a fold: we compare elements pairwise, choosing the larger one. But an important detail appears — the maximum is not defined for an empty list. Such functions are called partial, because they are not defined for all possible inputs. This solution cannot take an empty list as input. This is a restriction at the level of “Data types” in the computation model: we declare that the domain of the function does not include the empty list, thus protecting the logic of computation from undefined behavior.

Task 3 — increase each element. In this task, the same function is applied to each element of the list. This technique is called map — mapping a function over a list. The result is a new list where each element is obtained from the corresponding element of the original list. Here “Function as data” is clearly visible: we pass the logic (add five) into the traversal mechanism, while keeping the data structure unchanged.

Task 4 — keep elements by condition. Here we select only those elements of the list that satisfy the condition. This pattern is called filter. It creates a new list containing only the elements that pass the check.

This pattern demonstrates “Memory management” through the creation of a new subset of data — we do not remove elements from the old list, but build a new one “on the fly” only from the required values of the original list.

If you look closely, the first three tasks are solved in exactly the same way.

We take the head of the list, recursively process the tail, and then combine the results in some way. Only the combining operation changes: addition, adding one, or selecting the maximum. The same data processing pattern is used — fold.

Implementation — Syntax + Haskell

Now let’s see how these same ideas look in a real language. The syntax might seem unusual, but behind it lie the same substitutions we discussed. By the way, why Haskell? A textbook for this language simply happened to be at hand.

Some notation:

[] — an empty list;

(x:xs) — a list as an argument, where x is the head and xs is the tail.

Task 0

lengthList :: [a] -> Int               -- function: list → number (length)
lengthList []     = 0                  -- base case: empty list → 0 (this is not an assignment!)
lengthList (_:xs) = 1 + lengthList xs  -- ignore the head (_), recursively count the tail

The _ is an "anonymous variable." To determine the length of a list, it’s important that it contains elements. It doesn't matter what the head stores; what matters is that it stores something. The same applies to the elements of the tail.

How to run it: In any Haskell compiler (for example, an online one), write:

main = print(lengthList [3, 7, 11])

You might notice that we called the function, but the parameter is provided without parentheses — get used to it. Single-line comments start with two dashes.

Task 1

sumList :: Num a => [a] -> a     -- function: list of numbers → number (sum)
sumList []     = 0               -- base case: empty list → 0
sumList (x:xs) = x + sumList xs  -- head + result for the tail (reduction / fold)

While in the previous task we didn’t care about the type of elements in the list, in this one, we need to add them up. Therefore, Num represents data types that behave like numbers (they can be added, multiplied, etc.).

Task 2

We agreed that searching for a maximum in an empty list makes no sense. There are several ways to solve this. Below is a fairly simple version that nonetheless demonstrates an important property of the computation model.

I’d be happy if you suggest your own versions in the comments — for example, using standard functions or more clever ways of traversing the list.

maxList :: Ord a => [a] -> a    -- Function: list of comparable values → one value (maximum)
maxList [x] = x                 -- Base case: list has one element? Then it's the maximum.
maxList (x:xs) =                -- Recursion: separate the "head" (x) from the "tail" (xs)
    let maxRest = maxList xs    -- SHARING: bind the tail's search result to the name maxRest
    in if x > maxRest then x else maxRest  -- Compare the head with the already found max of the tail

What’s interesting here: Ord means the function only works for elements for which comparison is defined. For numbers, this is handled at the compiler level. If you want to compare "round" and "green" — you’ll have to write the function yourself.

What’s important here: the use of Sharing.

An even simpler implementation (without Sharing) for choosing a local maximum would look like this:

if x > maxList xs then x else maxList xs

In this expression, the maxList function is called twice for the exact same argument. Using recursion this way leads to multiple redundant recalculations of the same sub-expressions. I’ve drawn a call tree for the maxList function for an initial list of three elements. Don’t even try to build a similar tree for a list with 30 elements.

To solve this problem, we used Sharing: let name = expression in compound_expression

The key here is that compound_expression includes several occurrences of expression.

Using let creates a single shared computation (thunk), the result of which is used in several places.

Thanks to laziness, this expression is evaluated only when necessary and no more than once.

Thus, instead of a computation tree, we get a graph: several references point to the same node.

This solution allows processing the list in linear time. The graph diagram will help you understand how this works.

Task 3

First, the promised plus5 function:

plus5 :: Num a => a -> a      -- function: number → number (add 5)
plus5 x = x + 5               -- pure function with no side effects

add5List :: Num a => [a] -> [a]  -- function: list → list (map-like behavior)
add5List []     = []             -- base case: empty list → empty list
add5List (x:xs) = plus5 x : add5List xs  -- apply function to the head and build a new list (:)

The : symbol is used to construct a list by adding an element to the front.

Repetition is the mother of learning: the plus5 function has one argument, and instead of plus5(x), we write plus5 x — get used to it.

Remember the question about plus5 and the empty list? The answer is unexpected: it’s not its concern. plus5 works with a number, not a list. It is add5List that knows when to stop — it checks if the list is empty before even calling plus5. We have separated what to do (add 5) from how to traverse the data (walk through the list).

Task 4

evenOnly :: [Int] -> [Int]    -- function: list → list (filtering)
evenOnly [] = []              -- base case: empty list → empty list
evenOnly (x:xs)
  | x `rem` 2 == 0 = x : evenOnly xs  -- head is even — keep the element
  | otherwise      = evenOnly xs      -- otherwise skip the element

Pay attention! `rem` is not an attempt to insert Markdown into the code! It’s a way to turn the remainder function into infix form. Otherwise, we would have to write rem x 2, which isn't very convenient.

In this task, we use the Int type because the rem operation only works with integers (Integral type). This is a clear example of how the choice of action dictates the requirements for the data type.

First, the remainder of the division is calculated (x rem 2), then the result is compared to zero. Only after that does the entire expression "collapse" (I meant reduce) into a boolean True or False.

The vertical bar | (Guards) is just a more convenient form of if-then-else.

If the expression to the left of = is true, one reduction branch is chosen; if not, otherwise sends us down another. Control depends entirely on the result of the expression's evaluation.

We are working with the Int type, but the result of x rem 2 == 0 is of the Bool type. Data of one type (numbers) generates data of another type (boolean values) through a transformation function.

Let’s recall temporary values in the context of memory management. The intermediate remainder value (e.g., the number 0 or 1) is born in memory and then disappears without our involvement as soon as the comparison with zero is complete. We don't need to create a variable for it or track its fate — the computation model does it for us. Garbage collection in action.

A small bonus: let's solve the task "create a list of even elements greater than 6, increased by 5, from the first twenty natural numbers." Here is the solution in two languages at once!

Python:

result = [x + 5 for x in range(1, 21) if x % 2 == 0 and x > 6]

Haskell:

result = map plus5 $ filter (>6) $ evenOnly [1..20]  

-- evenOnly      — keep the even ones 
-- filter (>6)   — keep elements greater than 6
-- map plus5     — apply the function to each element (map) 
-- [1..20]       — data source (lazy list)
-- $             — right-to-left function application (pipeline) to avoid extra parentheses

By the way, try writing a function that checks whether its argument is greater than 6 or not, and then apply it to the list processing pipeline.

Conclusion

Functional programming often seems intimidating because of its mathematical rigor, but behind the “Celtic runes” of the syntax there is a simple and honest computation model. Here, data structures naturally turn into the ways they are processed. When you stop thinking about creating temporary variables, programming becomes a description of computation rather than management of state.

Now any new language — be it Haskell, Scala, or functional features in Python — will be for you just a convenient syntax for ideas that, I hope, you already understand.

Let’s take a break from the loop, gentlemen. Recursion is waiting for you!

That’s all. Probably it’s time to look for a textbook on programming in... However, as you understood, a specific FP language does not matter. What matters is understanding the methods of solving problems.

P.S.

Did you solve the problem of summing the odd elements of a list? Good luck with FP!