Let's say we have some (total) function \(f\), defined on some finite domain \(A\): $$ f : A \mapsto B $$ For example, f might be a square function defined on \( \{1, 2, 3, 4\} \):
Input    Output
1        1
2        4
3        9
4        16
The question now arises, is it possible to define an effective procedure that will determine the fastest Turing machine that computes the function \(f\).

Without any limitations on the Turing machine, we can simply construct a Turing machine that computes f in one step - you basically encode f into the transition function of the Turing machine.

Theorem: There is an effective procedure for finding the fastest Turing machine that computes a function f with finite domain. The fastest Turing machine takes just 1 step.

Proof: To do this we construct a machine \(T_1\) that has a different tape symbol for each input value to the function - e.g. element in the domain \( A \). The transition function of \(T_1\) then simply maps from the read element \(a\) to the output element \( f(a) \). The machine \(T_1\) writes \( f(a) \) to the tape, overwriting \( a \), then halts. Q.E.D.

I don't find this a very interesting proof, since it doesn't really follow the ethos of Turing machines - which it violates by throwing more table symbols at the problem until it can do it in one step. A more interesting problem is the case where the tape alphabet \( \Gamma \) is fixed - e.g. to the binary case, where it consists of just the set \( \{0, 1, b\} \) - zero, one, and blank. This corresponds a lot better to 'real' computers. So let's fix the tape alphabet and ask the question again:

Given a fixed tape alphabet \( \Gamma \), is there an effective procedure for finding the fastest Turing machine that computes the given function \(f\), where \(f\) has a finite domain.

Let's also say that we have some 'reference' Turing machine \(T_{ref}\) that computes this function. Such a reference Turing machine can always be constructed by simply handling each possible input value as a special case and returning the corresponding output value. In normal programming pseudocode it could look something like

if input == 1
  return 1
else if input == 2
  return 4
else if input == 3
  return 9
else if input == 4
  return 16

First we need to define what it means to be fastest. I'll define the speed of a Turing machine that computes such a function (finite domain, total) to be the greatest number of steps that the machine takes over all of its inputs - so it's a worst case measure.

Given that definition of speed, the fastest turing machine that computes the function f will be the machine that takes the least number of steps for its worst case input - e.g. the input that it takes the largest number of steps on.

Theorem: Given a fixed tape alphabet \( \Gamma \), There is an effective procedure for finding the fastest Turing machine that computes the given function \(f\), where \(f\) has a finite domain. (e.g. the problem is decidable)

Proof: To prove this, we will first show that there is a finite number of effectively-different Turing machines that need to be considered. Then the fastest can be found by testing each Turing machine in turn, and selecting the fastest.

Suppose the worst case number of steps for the reference Turing machine \( T_{ref} \) is \(N_{ref}\).

Lemma: The maximum number of states that may be visited after N steps is $$ |\Gamma| ^ N $$ where \( |\Gamma| \) is the number of different tape symbols.

Proof: After zero steps the machine is in the initial state. From the initial state, it may transition to a different state based on each possible tape symbol, so after 1 step it may be in \( |\Gamma| \) states. Likewise, after two steps it may be in one of \( |\Gamma|^2 \) states. So after N steps it may be in one of up to \( |\Gamma|^N \) states. Q.E.D.

Consider any Turing machine faster than \( T_{ref} \). Since its worst case number of steps will be less than \(N_{ref}\), when run on any input in A, the number of possible states it may be in is less than or equal to \( |\Gamma|^ {N_{ref}} \).

Therefore we need only consider Turing machines with number of states less than or equal to \( M = |\Gamma|^ {N_{ref}} \).

Any machine with more states than \(M\) will be effectively the same as one of the machines with number of states <= \(M\), since the extra states can only be reached after more than \(N_{ref}\) steps.

The procedure to find the fastest Turing machine \(T_{f}\), then is as follows:

Consider each Turing machine \(T_{i}\) with \(M\) states, and a given transition function using the \(M\) states. The number of such Turing machines is finite, and bounded by a exponential function of \(M\) and \(|\Gamma|\).

For each element \(a\) in the domain \(A\), run \(T_{i}\) with input \(a\), for up to \(N_{ref}\) steps. If it halts with incorrect output, reject the machine. If it does not halt after \(N_{ref}\) steps, we can reject the machine, as it will not be faster than \(T_{ref}\), and therefore cannot be the fastest Turing machine. If it computes \(f(a)\) correctly for all elements in \(A\), and it has smallest worst-case running time of all Turing machines considered so far, remember it.

Once all potential Turing machines have been considered, the one with the lowest worst-case running time that computes \(f\) correctly will be the fastest Turing machine that computes \(f\) we are looking for. Q.E.D.

Remarks

There are a couple of interesting things about this proof.

First, we avoid the issue you usually get when enumerating over Turing machines run on some input - that the machine may not halt. We sidestep this problem as we have an upper bound on the number of steps to run the machine for.

Secondly, a lot of functions we deal with in computer programs are defined on finite domains, especially functions that take as input finite precision integers or floating point numbers. For example, the number of single precision floating point values is of course finite. So in theory, we can write a program to compute, for example, the fastest sqrt(float) function.

There's a humorous theorem called the 'Full employment theorem', which says that writing a compiler that outputs the smallest program for some function is impossible, as the problem is uncomputable. However, this article has shown that finding the fastest program is computable, at least for a finite domain.

A related question is if finding the fastest program for infinitely sized domains, e.g. the integers, is computable. I'm not sure about this one, let me know if you have a proof either way :)

Edit, 7th May 2014: Reddit user wildeye pointed me at Manual Blum's work: A Machine-Independent Theory of the Complexity of Recursive Functions (PDF) which has some things to say about bounds on more general functions.