Today I Learned: Tail Call Elimination

Who said interviews were just about testing knowledge? In some cases, it can be a bit of a learning experience for everyone involved. Case in point, I was asked about what kind of compiler optimization may be made from the following code:

int CountOccurences(LinkedNode const * const head, int const dataToCount, int occurences)
{
	if (head == nullptr)
	{
		return occurences;
	}
	if (head->data == dataToCount)
	{
		++occurences;
	}
	return CountOccurences(head->link, dataToCount, occurences);
}

My initial reaction was that maybe the compiler would attempt to do something to our const vars on the stack, considering they never change in the function. Turns out, there is something more going on here.

In recursion, the function call is that of an accordion typically. If you have a 1,000,000 element linked list and you recurse through it, you will have 1,000,000 stacks. But under certain conditions, this is not actually true. The compiler is smart enough to condense 1,000,000 stacks to just one (1) if the function is tail-recursive.

Now, I’ve had some experience with functional programming and tail-recursion is a pretty significant subject. But when using languages like C++ and Java, even when I use tail-recursion-like logic, it never occurred to me that the compiler would have the ability to reason that the stacks don’t need to be held under the prime condition I’ll demonstrate in a second. Even in functional programming, it never stood out. What I mean is sometimes I considered functions to be tail-recursive but they were not. It has to do with the final call in the function and how it gets handled.

Let’s consider a different scenario. We’re instead counting the number of elements in the list…

int CountList(LinkedNode const * const head)
{
	if (head == nullptr) { return 0; }
	return CountList(head->link) + 1;
}

The magic comes when we’re focusing on return control. In this scenario, the compiler cannot do the tail call elimination optimization because the return depends on the previous stacks. This is not tail end recursion and no optimization is made. Whereas in the first set of code counting the occurrences of a number in the list, it is tail end recursion and that stack and be eliminated.

Now that we’ve discussed what tail recursion is strictly, let’s discuss what the tail call elimination optimization is actually doing.

As I mentioned, if the stack isn’t needed, you can just disregard it. And when this was brought up in my interview, the first thing that came to mind was… “Is this something that only happens under release?” Otherwise, you wouldn’t be able to debug a recursive function which I know I have. This spawned a great discussion and experimentation about examining the dissasembly – the compiler and architecture’s point of view.

Let’s compare the first code snippet (CountOccurences) to the dissasembly in Debug vs Release:

LinkedNode a;
a.data = 5;
LinkedNode b;
b.data = 6;
a.link = &b;
CountOccurences(&a, 5, 0);
Debug:

[cpp] push ebp
mov ebp,esp
sub esp,0C0h
push ebx
push esi
push edi
lea edi,[ebp-0C0h] mov ecx,30h
mov eax,0CCCCCCCCh
rep stos dword ptr es:[edi] if (head == nullptr)
cmp dword ptr [head],0
jne CountOccurences+29h (0409489h)
{
return occurences;
mov eax,dword ptr [occurences] jmp CountOccurences+53h (04094B3h)
}
if (head->data == dataToCount)
mov eax,dword ptr [head] mov ecx,dword ptr [eax] cmp ecx,dword ptr [dataToCount] jne CountOccurences+3Ch (040949Ch)
{
++occurences;
mov eax,dword ptr [occurences] add eax,1
mov dword ptr [occurences],eax
}
return CountOccurences(head->link, dataToCount, occurences);
mov eax,dword ptr [occurences] push eax
mov ecx,dword ptr [dataToCount] push ecx
mov edx,dword ptr [head] mov eax,dword ptr [edx+4] push eax
call CountOccurences (0401E2Eh)
add esp,0Ch
[/cpp]
Release:
[cpp] push ebp
mov ebp,esp
if (head == nullptr)
mov eax,dword ptr [occurences] test ecx,ecx
je CountOccurences+1Ch (01C13ECh)
lea ebx,[ebx] {
return occurences;
}
if (head->data == dataToCount)
cmp dword ptr [ecx],edx
jne CountOccurences+15h (01C13E5h)
{
++occurences;
inc eax
}
return CountOccurences(head->link, dataToCount, occurences);
mov ecx,dword ptr [ecx+4] test ecx,ecx
jne CountOccurences+10h (01C13E0h)
[/cpp]

Right away, the sheer size difference should jump out at you. But the most significant difference is the return of the two functions. When return occurences; happens – the function will do a jmp in Debug to CountOcurrences+53h. However, in Release, the compiler will simply leave the function and return to the caller. There is no unrolling of the stack, its never kept around. There is no need to remember the previous stacks. The return simply aborts as opposed to going back in the accordion so-to-speak.

Wikipedia has an even better demonstration of the dissasembly with its own example here. The tail call elimination. Functional programming garuntees this optimization while other languages might not. I believe in Python this does not exist because the creator opposes it arguing it is harder to debug. If one is like C++ where debug and release allow this optimization to safely be turned on/off, I personally do not see the issue. Apparently even Javascript has this functionality now with ECMAScript6.

Leave a Reply

Your email address will not be published. Required fields are marked *