Today I Learned: Tail Call Elimination

Who said interviews were just about testing knowledge? In some cases, they can be a learning experience for everyone involved. Case in point: I was asked what kind of compiler optimization might be made on the following code:

int CountOccurences(LinkedNode const * const head, int const dataToCount, int occurences)
{
	if (head == nullptr)
	{
		return occurences;
	}
	if (head->data == dataToCount)
	{
		++occurences;
	}
	return CountOccurences(head->link, dataToCount, occurences);
}
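The LinkedNode type isn't shown in the post; a minimal definition consistent with how the snippet uses it (head->data, head->link) might look like this. The field names come from the code above; the zero-initializing defaults are my own assumption:

```cpp
// Hypothetical minimal node type; the data/link field names come from
// the snippet above, the zero-initializing defaults are assumed.
struct LinkedNode
{
	int data = 0;
	LinkedNode * link = nullptr;
};
```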

My initial reaction was that maybe the compiler would attempt to do something to our const vars on the stack, considering they never change in the function. Turns out, there is something more going on here.

In recursion, the call stack typically behaves like an accordion. If you have a 1,000,000-element linked list and you recurse through it, you will have 1,000,000 stack frames. But under certain conditions, this is not actually true: the compiler is smart enough to condense those 1,000,000 frames down to just one if the function is tail-recursive.

Now, I’ve had some experience with functional programming, where tail recursion is a pretty significant subject. But when using languages like C++ and Java, even when I wrote tail-recursion-like logic, it never occurred to me that the compiler could reason that the stack frames don’t need to be kept around, given the prime condition I’ll demonstrate in a second. Even in functional programming it never stood out: sometimes I considered functions to be tail-recursive when they were not. It comes down to the final call in the function and how its result gets handled.

Let’s consider a different scenario. We’re instead counting the number of elements in the list…

int CountList(LinkedNode const * const head)
{
	if (head == nullptr) { return 0; }
	return CountList(head->link) + 1;
}

The magic comes when we focus on how control returns. In this scenario, the compiler cannot perform tail call elimination because the return value depends on the previous frames: each call still has to add 1 after the recursive call comes back. This is not tail recursion, so no optimization is made. In the first snippet, which counts the occurrences of a number in the list, the recursive call is the very last thing the function does, so it is tail recursion and the stack frame can be eliminated.
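For contrast, CountList can be made tail-recursive by threading the running total through an accumulator parameter, the same trick CountOccurences uses with its occurences argument. This is my own sketch, not code from the interview; the name CountListAcc and the minimal LinkedNode definition are assumptions:

```cpp
// Assumed minimal node type matching the snippets' usage.
struct LinkedNode
{
	int data = 0;
	LinkedNode * link = nullptr;
};

// Tail-recursive variant: the "+ 1" happens before the recursive call,
// so the call is the last thing the function does and the current frame
// no longer needs to survive it.
int CountListAcc(LinkedNode const * const head, int const count = 0)
{
	if (head == nullptr) { return count; }
	return CountListAcc(head->link, count + 1);
}
```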

Now that we’ve discussed what tail recursion is strictly, let’s discuss what the tail call elimination optimization is actually doing.
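Conceptually, the optimization rewrites the tail call as a jump back to the top of the function. A hand-written loop equivalent of CountOccurences, a sketch of what the optimizer effectively produces (the name CountOccurencesLoop is hypothetical, and the LinkedNode definition is assumed), would be:

```cpp
// Assumed minimal node type matching the snippets' usage.
struct LinkedNode
{
	int data = 0;
	LinkedNode * link = nullptr;
};

// What tail call elimination effectively turns CountOccurences into:
// the parameters become loop variables, and the recursive call becomes
// a jump back to the top that reuses the same stack frame.
int CountOccurencesLoop(LinkedNode const * head, int const dataToCount, int occurences)
{
	while (head != nullptr)
	{
		if (head->data == dataToCount)
		{
			++occurences;
		}
		head = head->link;  // the "recursive call": advance and loop
	}
	return occurences;
}
```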

As I mentioned, if the stack frame isn’t needed, the compiler can simply discard it. When this was brought up in my interview, the first thing that came to mind was… “Is this something that only happens in Release?” Otherwise, you wouldn’t be able to step through a recursive function in the debugger, which I know I have. This spawned a great discussion and some experimentation examining the disassembly – the compiler and architecture’s point of view.

Let’s compare the first code snippet (CountOccurences) to the disassembly in Debug vs Release:

LinkedNode a;
a.data = 5;
LinkedNode b;
b.data = 6;
a.link = &b;
CountOccurences(&a, 5, 0);
Debug:

push        ebp
mov         ebp,esp
sub         esp,0C0h
push        ebx
push        esi
push        edi
lea         edi,[ebp-0C0h]
mov         ecx,30h
mov         eax,0CCCCCCCCh
rep stos    dword ptr es:[edi]
	if (head == nullptr)
cmp         dword ptr [head],0
jne         CountOccurences+29h (0409489h)
	{
		return occurences;
mov         eax,dword ptr [occurences]
jmp         CountOccurences+53h (04094B3h)
	}
	if (head->data == dataToCount)
mov         eax,dword ptr [head]
mov         ecx,dword ptr [eax]
cmp         ecx,dword ptr [dataToCount]
jne         CountOccurences+3Ch (040949Ch)
	{
		++occurences;
mov         eax,dword ptr [occurences]
add         eax,1
mov         dword ptr [occurences],eax
	}
	return CountOccurences(head->link, dataToCount, occurences);
mov         eax,dword ptr [occurences]
push        eax
mov         ecx,dword ptr [dataToCount]
push        ecx
mov         edx,dword ptr [head]
mov         eax,dword ptr [edx+4]
push        eax
call        CountOccurences (0401E2Eh)
add         esp,0Ch
Release:

push        ebp
mov         ebp,esp
	if (head == nullptr)
mov         eax,dword ptr [occurences]
test        ecx,ecx
je          CountOccurences+1Ch (01C13ECh)
lea         ebx,[ebx]
	{
		return occurences;
	}
	if (head->data == dataToCount)
cmp         dword ptr [ecx],edx
jne         CountOccurences+15h (01C13E5h)
	{
		++occurences;
inc         eax
	}
	return CountOccurences(head->link, dataToCount, occurences);
mov         ecx,dword ptr [ecx+4]
test        ecx,ecx
jne         CountOccurences+10h (01C13E0h)

Right away, the sheer size difference should jump out at you. But the most significant difference is how the two versions return. When return occurences; happens in Debug, the function does a jmp to CountOccurences+53h and then unwinds back through every caller, frame by frame. In Release, the function simply returns straight to the original caller: there is no unwinding of the stack, because the intermediate frames were never kept around. Notice too that the Release version’s final jne jumps back to CountOccurences+10h – the recursion has literally become a loop. There is no need to remember the previous frames; the return goes straight back instead of folding the accordion closed, so to speak.

Wikipedia has an even better demonstration of the disassembly with its own example here. Many functional languages guarantee this optimization, while other languages might not. I believe it does not exist in Python because the language’s creator opposes it, arguing it makes debugging harder. If a language is like C++, where Debug and Release allow this optimization to safely be turned on and off, I personally do not see the issue. Apparently even JavaScript has this functionality now with ECMAScript 6.
