Problem of the Month Solution to Problem 2: Manuals for Robots
November 2024
Solution
Here is Course 2, with some
labels on the places.
The string ends at place
. The string ends at place . The string ends at place . The string ends at place .
The finishing places of the course are and . Therefore and are in the manual, and and are not in the manual.
The manual for this course is the set of strings with exactly one
.
To justify this statement, name the places (for left, middle, and right)
from left to right respectively, as above. Note that is the starting place, and is the only finishing place.
There are a few important properties to notice about this course.
The only way to move from place to place is through the arrow.
The only way to move from place to place is through the arrow.
There is no way to move to place from another place.
There is no way to move from place to any other place.
Suppose now that a string contains zero 's. Then the robot ends at place so the string is not in the manual.
Suppose a string contains at least two 's. After the first two 's, the robot is at place . Since there is no way to move from
place , the robot stays at place
until the string is completed.
Therefore the string is not in the manual.
Finally, suppose a string contains exactly .
Until the robot encounters the ,
it stays at place . The moves the robot to place . The rest of the string consists only
of 's, so the robot ends at place
. We can conclude the string is
part of the manual.
Therefore the manual consists precisely of those strings with exactly
one .
The manual is the set of strings that has at least one and an even number of 's.
Let’s justify this. Label the places ,,, for top left, top right, bottom left,
and bottom right as above. The starting place is and the only finishing place is . Here are the important properties of
this course.
If the robot is in the top row ( or ), then moving along an arrow labelled
moves the robot to the bottom row
( or ). This is the only way to move from
the top row to the bottom row.
If the robot is in the bottom row, then moving along an arrow
labelled moves the robot to the
top row. This is the only way to move from the bottom row to the top
row.
If the robot is in the left column ( or ), then moving along an arrow labelled
moves the robot to the right
column ( or ). Further, this is the only way to
move from the left column to the right column.
Once the robot is in the right column, it stays in the right
column.
Suppose we have a string in the manual. Since the robot starts in the
left column and finishes in the right column, the string must contain at
least one by Properties (iii) and
(iv). Since the robot starts and ends in the top row, by Properties (i)
and (ii), there must be an even number of 's in the string.
On the other hand, suppose there is a string with an even number of
's and at least one . By Properties (i) and (ii), the robot
must move between the top and bottom rows an even number of times. Since
it starts at the top row, it must end at the top row. By Properties
(iii) and (iv), since the robot has traversed at least one arrow
labelled , it must end up in the
right column. We conclude that the robot ends in the place , and therefore the string belongs to
the manual.
Therefore the manual consists exactly of those strings with at least
one and an even number of 's.
There are many possible correct courses for this question. We
will give one correct answer for each part.
Here is such a course:
There are seven places, with six arranged into a row, and the
seventh above this row. From left to right, the places in the row are as
follows: shaded circle, unshaded circle, unshaded circle, unshaded
square, shaded circle, shaded circle. The place on top is an unshaded
circle.
There are fourteen arrows:
Starting at the first shaded circle in the row (left end), the
arrows labelled and both end at the top circle.
Starting at the first unshaded circle in the row, the arrow
labelled ends at the top circle,
and the arrow labelled ends at
the shaded circle (immediately to the left).
Starting at the second unshaded circle in the row, the arrow
labelled ends at the top circle,
and the arrow labelled ends at
the first shaded circle (immediately to the left).
Starting at the unshaded square in the row, the arrow labelled
ends at the the shaded circle
immediately to the right, and the arrow labelled ends at the second unshaded circle
(immediately to the left).
Starting at the second shaded circle in the row (two from the right end),
the arrow labelled ends at the top
circle, and the arrow labelled
ends at the third shaded circle in the row (right end).
Starting at the third shaded circle in the row (right end), the arrows
labelled and both end at the top circle.
Starting at the top circle, the arrows labelled and both loop around and end at the top
circle.
Let’s justify ourselves. There are three finishing places, and for
each one there is only one way to get from the starting place to that
finishing place. The three paths are ,, and . Therefore the manual for this course
is the set of three strings .
Notice how there is a place above which acts as a kind of garbage can
for strings that don’t belong to the desired manual. We can imitate this
construction in general to construct a course with any finite set of
strings as its manual!
Here is such a course:
There are three places arranged to form the vertices of a
triangle. Moving clockwise around the places, there is a shaded square,
a first unshaded circle, and a second unshaded circle.
There are six arrows:
Starting at the shaded square, the arrow labelled ends at the first unshaded circle, and
the arrow labelled loops around
and ends at the square.
Starting at the first unshaded circle, the arrow labelled ends at the second unshaded circle, and
the arrow labelled loops around
and ends at the first unshaded circle.
Starting at the second unshaded circle, the arrow labelled ends at the square, and the arrow
labelled loops around and ends at the second unshaded circle.
To see why this is the case, notice that every time an appears in a string, the robot moves
one place clockwise. Since there are places, the robot returns to the
starting place (which is also the only finishing place) exactly when
there has been a multiple of 's in the string.
Of course, there is nothing special about the number here. Fix a positive integer . We can imitate this construction to
build a course whose manual is the set of all strings with a multiple of
appearances of the letter .
Notice that if a string has length , then the robot will visit two, not
necessarily distinct, places - the starting place and some other place.
If a string has length , then the
robot will visit three, not necessarily distinct, places on its journey.
If a string has length , then the
robot will visit not
necessarily distinct places on its journey. Since there are places in our course, if a robot
follows the instructions of a string of length at least , then there must be some place which
the robot visits twice (by the pigeonhole principle).
Let be a string of length at
least in the manual. Let’s call
the first place that is visited twice . Let’s split up into three substrings. The first
substring, , will be the string
that takes the robot from the starting place to for the first time. Note that may be empty if is the starting place. Then there is
some substring that takes the
robot from back to for the second time. Note that since we are waiting for the
robot to visit again. Let be the rest of the string , which takes the robot from (which has now been visited twice by
the robot) to one of the finishing places. Note that may be empty. While travelling along
the course according to the instructions in , the robot may visit again, but we don’t care!
We now have that , where
and takes the robot from to . The string takes the robot from the starting place
to , and the string takes the robot from to a finishing place. Therefore the
string takes the robot from the
starting place to a finishing place. Alas, is also in the manual!
Let , where ,, and are constructed as in the solution to
part (a). Notice that since takes
the robot from place to place
, then also takes the robot from place to place . It simply repeats the path times.
Consider now the string
for any natural number . The
string takes the robot from the
starting place to , the string
takes the robot from to , and the string takes the robot from to a finishing place. Therefore a robot
following the instructions given by the string travels from the starting place to
a finishing place, so is in
the manual.
Suppose the set of strings of perfect-square length is the manual
for some course, and suppose that course has places. Choose any string of length . Since the length of is a perfect square, it is in the
manual. Since , by part
(b) we can write for some
substring so that is in the manual for every natural
number . Let be the length of and let be the length of .
Then the sequence of words in the manual have lengths
respectively. Said another way, the lengths of the words form an
arithmetic sequence. Therefore, in order to complete the problem, we
have to prove the following fact:
Fact: In an arithmetic sequence, there is at least
one number that is not a perfect square.
Proof. Let and be integers with , and consider the arithmetic
sequence .
Let be an integer. Then
Exactly one of every
consecutive numbers greater than
is in the arithmetic sequence. Choose to be any integer satisfying and . Since the difference between
and is greater than , there must be some number in the
arithmetic sequence strictly between and . However, there are no perfect
squares between and , completing the proof. ◻
Great! We have just shown that at least one of the lengths of is not a perfect
square, but they are all in the manual. Therefore, there is no course
whose manual consists exactly of all strings with perfect square
lengths.
As above, suppose there is a course with manual consisting
exactly of all palindromes, and suppose the course has places. Then the string is in the manual and has length
. By Part (b), we can
write with and in the manual for all natural
numbers .
By the construction of ,, and in the solution to Part (a), the
substring takes the robot from
the starting place to the second occurrence of some place. By the
pigeonhole principle, this has to happen within the robot visiting places, which happens after moving
across arrows. It follows that
has length at most .
Therefore to write
with and the length of
at most , we must have that and with and
. This is because any
substring at the beginning of the string of length at most must just be a string of 's. It follows that .
Now . Since , is not a palindrome but it is in
the manual. We have a contradiction and can conclude that there is no
course whose manual is the set of all palindromes.
Curious Context for Problem 2
Regular languages
In November's problem you were asked to study the set of strings that form the manual for a particular course. In computer science, these courses are called Deterministic Finite Automata, or DFAs. A set of strings which arises as the manual for a DFA is called a regular language. For example, the manual for the DFA below is the set of all strings with an odd number of s. So is an example of a regular language.
A DFA with two places: an unshaded square and a shaded circle.
There are four arrows:
Starting at the unshaded square, the arrow labelled loops around and ends at the square, and the arrow labelled ends at the circle.
Starting at the shaded circle, the arrow labelled loops around and ends at the circle,
and the arrow labelled ends at the square.
In Problem 4 you showed that the set of strings with length equal to a perfect square, and the set of palindromes, are both not regular languages. The tool used to prove that these are not regular languages is presented in Problem 4(b), and is known as the pumping lemma. The pumping lemma was first proved in 1959 by Michael Rabin and Dana Scott [1]. They used the pumping lemma to prove that the set of all strings of the form , where is a natural number, is not a regular language.
Nondeterministic finite automata
In the definition of a DFA, we could relax the condition on the arrows, and allow any number of arrows labelled , and any number of arrows labelled to leave from any place (instead of exactly one of each). We can even have some arrows without a label at all, allowing you to move between places without using a letter. Such courses are called Nondeterministic finite automata, or NFAs. Notice that in a DFA, once you have a string, your path through the arrows is completely determined. In an NFA, you may have some choices. Here is an example of an NFA:
An NFA with three places: an unshaded square, an unshaded circle, and a shaded circle.
There are six arrows:
Starting at the unshaded square, an arrow labelled loops around and ends at the square, another arrow labelled ends at the unshaded circle, and an unlabelled arrow ends at the unshaded circle.
Starting at the unshaded circle, an arrow labelled ends at the shaded circle.
Starting at the shaded circle, an arrow labelled ends at the unshaded circle and another arrow labelled ends at the square.
On first glance, it appears that NFAs allow you to have much more interesting languages (or "manuals" in the language of the POTM). Remarkably, this is not the case! In the same paper that introduced the pumping lemma, Rabin and Scott proved that any language attained from an NFA can be attained from a DFA. Said another way, if a set of strings is the manual for some NFA, then it's the manual for some DFA and is therefore a regular language.
Regular expressions
Regular expressions are a tool used in computer programming to search through a list of words and find those which satisfy some condition. For example, you can use regular expressions to search through a list of words and return those which start and end with the letter 'c'. Regular expressions are a great tool to study patterns in written pieces of work, or to get a computer to solve a crossword by brute force! When you instruct a computer to use a regular expression, the computer creates an NFA and runs each word in your list through the NFA, checking if it's part of the manual.
Chomsky hierarchy
In linguistics, there is a notion of computational complexity of a particular language, captured by something called the Chomsky hierarchy. Roughly, it measures how much computational power is needed to parse a sentence in that language. There are four types of grammars in the hierarchy, and regular languages are the simplest type (what is referred to as Type-3). The other types of grammars, in increasing order of complexity, are called context-free, context-sensitive, and recursively enumerable.
References
[1] Rabin, M. O. and Scott, D. Finite automata and their decision problems. IBM J. Res. Develop. 3 (1959), pp. 114-125.↩︎