INTRODUCTION TO AUTOMATA THEORY

Contents
1. Finite-state machine
   1.1. State diagram
   1.2. Deterministic finite automaton
   1.3. Nondeterministic finite automaton
   1.4. DFA minimization
   1.5. NDFA to DFA conversion algorithm
   1.6. Finite-state transducer
        1.6.1. Moore machine
        1.6.2. Mealy machine
2. Regular grammar
   2.1. Regular expression
   2.2. Regular language
   2.3. Pumping lemma for regular languages
3. Context-free grammar
   3.1. Production (computer science)
   3.2. Context-free language
   3.3. Ambiguous grammar
   3.4. Chomsky normal form
   3.5. Greibach normal form
   3.6. Pumping lemma for context-free languages
4. Pushdown automaton
   4.1. Nested stack automaton
5. Turing machine
   5.1. Linear bounded automaton
   5.2. Multitape Turing machine
   5.3. Multi-track Turing machine
   5.4. Non-deterministic Turing machine
6. Recursive language
   6.1. Recursive set
   6.2. Decision problem
   6.3. Undecidable problem
        6.3.1. Halting problem
   6.4. P (complexity)
   6.5. R (complexity)
   6.6. RP (complexity)
   6.7. Recursively enumerable set
   6.8. Rice's theorem
1. Finite-state machine

A finite-state machine (FSM) or finite-state automaton (FSA, plural: automata), finite automaton, or simply a state machine, is a mathematical model of computation. It is an abstract machine that can be in exactly one of a finite number of states at any given time. The FSM can change from one state to another in response to some external inputs; the change from one state to another is called a transition. An FSM is defined by a list of its states, its initial state, and the conditions for each transition. Finite state machines are of two types: deterministic finite state machines and non-deterministic finite state machines.[1]

The behavior of state machines can be observed in many devices in modern society that perform a predetermined sequence of actions depending on a sequence of events with which they are presented. Examples are vending machines, which dispense products when the proper combination of coins is deposited; elevators, whose sequence of stops is determined by the floors requested by riders; traffic lights, which change sequence when cars are waiting; and combination locks, which require the input of combination numbers in the proper order.

The finite state machine has less computational power than some other models of computation such as the Turing machine.[2] The computational power distinction means there are computational tasks that a Turing machine can do but an FSM cannot. This is because an FSM's memory is limited by the number of states it has.
Example: coin-operated turnstile
State diagram for a turnstile
A turnstile

An example of a simple mechanism that can be modeled by a state machine is a turnstile.[3][4] A turnstile, used to control access to subways and amusement park rides, is a gate with three rotating arms at waist height, one across the entryway. Initially the arms are locked, blocking the entry, preventing patrons from passing through. Depositing a coin or token in a slot on the turnstile unlocks the arms, allowing a single customer to push through. After the customer passes through, the arms are locked again until another coin is inserted.

Considered as a state machine, the turnstile has two possible states: Locked and Unlocked.[3] There are two possible inputs that affect its state: putting a coin in the slot (coin) and pushing the arm (push). In the locked state, pushing on the arm has no effect; no matter how many times the input push is given, it stays in the locked state. Putting a coin in – that is, giving the machine a coin input – shifts the state from Locked to Unlocked. In the unlocked state, putting additional coins in has no effect; that is, giving additional coin inputs does not change the state. However, a customer pushing through the arms, giving a push input, shifts the state back to Locked.

The turnstile state machine can be represented by a state transition table, showing for each possible state, the transitions between them (based upon the inputs given to the machine) and the outputs resulting from each input:

Current State   Input   Next State   Output
Locked          coin    Unlocked     Unlocks the turnstile so that the customer can push through.
Locked          push    Locked       None
Unlocked        coin    Unlocked     None
Unlocked        push    Locked       When the customer has pushed through, locks the turnstile.

The turnstile state machine can also be represented by a directed graph called a state diagram (above). Each state is represented by a node (circle). Edges (arrows) show the transitions from one state to another. Each arrow is labeled with the input that triggers that transition. An input that doesn't cause a change of state (such as a coin input in the Unlocked state) is represented by a circular arrow returning to the original state. The arrow into the Locked node from the black dot indicates it is the initial state.
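The transition table above maps directly onto a lookup-table implementation. The following is a minimal sketch in Python; the encoding and names are illustrative, not from the source:

# Minimal sketch of the turnstile state machine described above.
TRANSITIONS = {
    ("Locked",   "coin"): "Unlocked",
    ("Locked",   "push"): "Locked",
    ("Unlocked", "coin"): "Unlocked",
    ("Unlocked", "push"): "Locked",
}

def run_turnstile(inputs, state="Locked"):
    """Feed a sequence of 'coin'/'push' inputs; return the final state."""
    for symbol in inputs:
        state = TRANSITIONS[(state, symbol)]
    return state

print(run_turnstile(["coin", "push", "push"]))  # -> Locked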
Concepts and terminology

A state is a description of the status of a system that is waiting to execute a transition. A transition is a set of actions to be executed when a condition is fulfilled or when an event is received. For example, when using an audio system to listen to the radio (the system is in the "radio" state), receiving a "next" stimulus results in moving to the next station. When the system is in the "CD" state, the "next" stimulus results in moving to the next track. Identical stimuli trigger different actions depending on the current state. In some finite-state machine representations, it is also possible to associate actions with a state: an entry action: performed when entering the state, and an exit action: performed when exiting the state.
Representations
Fig. 1 UML state chart example (a toaster oven)
Fig. 2 SDL state machine example
Fig. 3 Example of a simple finite state machine

For an introduction, see State diagram.
State/Event table

Several state transition table types are used. The most common representation is shown below: the combination of current state (e.g. B) and input (e.g. Y) shows the next state (e.g. C). The complete action information is not directly described in the table and can only be added using footnotes. An FSM definition including the full action information is possible using state tables (see also virtual finite-state machine).

State transition table
             Current state
Input        State A   State B   State C
Input X      …         …         …
Input Y      …         State C   …
Input Z      …         …         …
Classification
Finite state machines can be subdivided into transducers, acceptors, classifiers and sequencers.[5]
Acceptors and recognizers
Fig. 4 Acceptor FSM: parsing the string "nice"

Acceptors, also called recognizers and sequence detectors, produce binary output, indicating whether or not the received input is accepted. Each state of an FSM is either "accepting" or "not accepting". Once all input has been received, if the current state is an accepting state, the input is accepted; otherwise it is rejected. As a rule, input is a sequence of symbols (characters); actions are not used. The example in figure 4 shows a finite state machine that accepts the string "nice". In this FSM, the only accepting state is state 7.

A (possibly infinite) set of symbol sequences, also known as a formal language, is called a regular language if there is some finite state machine that accepts exactly that set. For example, the set of binary strings with an even number of zeroes is a regular language (cf. Fig. 5), while the set of all strings whose length is a prime number is not.[6]:18,71

A machine could also be described as defining a language that contains every string accepted by the machine and none of the rejected ones; that language is "accepted" by the machine. By definition, the languages accepted by FSMs are the regular languages; a language is regular if there is some FSM that accepts it.

The problem of determining the language accepted by a given finite state acceptor is an instance of the algebraic path problem—itself a generalization of the shortest path problem to graphs with edges weighted by the elements of an (arbitrary) semiring.[7][8][9]
Fig. 5: Representation of a finite-state machine; this example shows one that determines whether a binary number has an even number of 0s, where S1 is an accepting state. The start state can also be an accepting state, in which case the automaton accepts the empty string.
Classifiers

A classifier is a generalization of a finite state machine that, similar to an acceptor, produces a single output on termination but has more than two terminal states.[citation needed]
Transducers
Fig. 6 Transducer FSM: Moore model example

Main article: Finite-state transducer

Transducers generate output based on a given input and/or a state using actions. They are used for control applications and in the field of computational linguistics. In control applications, two types are distinguished:

Moore machine
The FSM uses only entry actions, i.e., output depends only on the state. The advantage of the Moore model is a simplification of the behaviour. Consider an elevator door. The state machine recognizes two commands: "command_open" and "command_close", which trigger state changes. The entry action (E:) in state "Opening" starts a motor opening the door, the entry action in state "Closing" starts a motor in the other direction closing the door. States "Opened" and "Closed" stop the motor when fully opened or closed. They signal to the outside world (e.g., to other state machines) the situation: "door is open" or "door is closed".
Fig. 7 Transducer FSM: Mealy model example

Mealy machine
The FSM also uses input actions, i.e., output depends on input and state. The use of a Mealy FSM often leads to a reduction of the number of states. The example in figure 7 shows a Mealy FSM implementing the same behaviour as in the Moore example (the behaviour depends on the implemented FSM execution model and will work, e.g., for virtual FSM but not for event-driven FSM). There are two input actions (I:): "start motor to close the door if command_close arrives" and "start motor in the other direction to open the door if command_open arrives". The "opening" and "closing" intermediate states are not shown.
Generators

Sequencers, or generators, are a subclass of the acceptor and transducer types that have a single-letter input alphabet. They produce only one sequence, which can be seen as an output sequence of acceptor or transducer outputs.[citation needed]
Determinism

A further distinction is between deterministic (DFA) and non-deterministic (NFA, GNFA) automata. In a deterministic automaton, every state has exactly one transition for each possible input. In a non-deterministic automaton, an input can lead to one, more than one, or no transition for a given state. The powerset construction algorithm can transform any nondeterministic automaton into a (usually more complex) deterministic automaton with identical functionality.

A finite state machine with only one state is called a "combinatorial FSM". It only allows actions upon transition into a state. This concept is useful in cases where a number of finite state machines are required to work together, and when it is convenient to consider a purely combinatorial part as a form of FSM to suit the design tools.[10]
1.1. State diagram
A state diagram for a door that can only be opened and closed

A state diagram is a type of diagram used in computer science and related fields to describe the behavior of systems. State diagrams require that the system described is composed of a finite number of states; sometimes, this is indeed the case, while at other times this is a reasonable abstraction. Many forms of state diagrams exist, which differ slightly and have different semantics.

State diagrams are used to give an abstract description of the behavior of a system. This behavior is analyzed and represented as a series of events that can occur in one or more possible states. Here "each diagram usually represents objects of a single class and track the different states of its objects through the system".[1]

State diagrams can be used to graphically represent finite state machines. This was introduced by C. E. Shannon and W. Weaver in their 1949 book "The Mathematical Theory of Communication". Another source is Taylor Booth in his 1967 book "Sequential Machines and Automata Theory". Another possible representation is the state transition table.
1.2. Deterministic finite automaton

A deterministic finite automaton (DFA) is a finite state machine that accepts and rejects strings of symbols and only produces a unique computation (or run) of the automaton for each input string.[1] A DFA is also known as a deterministic finite acceptor, a deterministic finite state machine (DFSM), or a deterministic finite state automaton (DFSA). Deterministic refers to the uniqueness of the computation. In search of the simplest models to capture finite-state machines, Warren McCulloch and Walter Pitts were among the first researchers to introduce a concept similar to finite automata in 1943.[2][3]
An example of a deterministic finite automaton that accepts only binary numbers that are multiples of 3. The state S0 is both the start state and an accept state. The figure illustrates a deterministic finite automaton using a state diagram. In the automaton, there are three states: S0, S1, and S2 (denoted graphically by circles). The automaton takes a finite sequence of 0s and 1s as input. For each state, there is a transition arrow leading out to a next state for both 0 and 1. Upon reading a symbol, a DFA jumps deterministically from one state to another by following the
transition arrow. For example, if the automaton is currently in state S0 and the current input symbol is 1, then it deterministically jumps to state S1. A DFA has a start state (denoted graphically by an arrow coming in from nowhere) where computations begin, and a set of accept states (denoted graphically by a double circle) which help define when a computation is successful. A DFA is defined as an abstract mathematical concept, but is often implemented in hardware and software for solving various specific problems. For example, a DFA can model software that decides whether or not online input such as email addresses are valid.[4]
Formal definition

A deterministic finite automaton M is a 5-tuple, (Q, Σ, δ, q0, F), consisting of
a finite set of states (Q)
a finite set of input symbols called the alphabet (Σ)
a transition function (δ : Q × Σ → Q)
an initial or start state (q0 ∈ Q)
a set of accept states (F ⊆ Q)

Let w = a1a2 ... an be a string over the alphabet Σ. The automaton M accepts the string w if a sequence of states, r0, r1, ..., rn, exists in Q with the following conditions:
1. r0 = q0
2. ri+1 = δ(ri, ai+1), for i = 0, ..., n−1
3. rn ∈ F.
In words, the first condition says that the machine starts in the start state q0. The second condition says that given each character of string w, the machine will transition from state to state according to the transition function δ. The last condition says that the machine accepts w if the last input of w causes the machine to halt in one of the accepting states. Otherwise, it is said that the automaton rejects the string. The set of strings that M accepts is the language recognized by M and this language is denoted by L(M). A deterministic finite automaton without accept states and without a starting state is known as a transition system or semiautomaton.
Complete and incomplete

According to the above definition, deterministic finite automata are always complete: they define a transition for each state and each input symbol. While this is the most common definition, some authors use the term deterministic finite automaton for a slightly different notion: an automaton that defines at most one transition for each state and each input symbol; the transition function is allowed to be partial.[citation needed] When no transition is defined, such an automaton halts.
Example

The following example is of a DFA M, with a binary alphabet, which requires that the input contains an even number of 0s.
The state diagram for M

M = (Q, Σ, δ, q0, F) where
Q = {S1, S2},
Σ = {0, 1},
q0 = S1,
F = {S1}, and
δ is defined by the following state transition table:

       0    1
S1     S2   S1
S2     S1   S2
The state S1 represents that there has been an even number of 0s in the input so far, while S2 signifies an odd number. A 1 in the input does not change the state of the automaton. When the input ends, the state will show whether the input contained an even number of 0s or not. If the input did contain an even number of 0s, M will finish in state S1, an accepting state, so the input string will be accepted.
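A DFA of this kind is easy to simulate directly. The sketch below runs M on an input string; the dict-based encoding is a hypothetical choice for illustration, not from the source:

delta = {
    ("S1", "0"): "S2", ("S1", "1"): "S1",
    ("S2", "0"): "S1", ("S2", "1"): "S2",
}
start, accepting = "S1", {"S1"}

def accepts(w):
    """Follow delta one symbol at a time; accept if the run ends in F."""
    state = start
    for symbol in w:
        state = delta[(state, symbol)]
    return state in accepting

print(accepts("1001"))  # True: two 0s
print(accepts("10"))    # False: one 0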
Closure properties

If DFAs recognize the languages that are obtained by applying an operation on the DFA-recognizable languages, then DFAs are said to be closed under the operation. The DFAs are closed under the following operations:
Union
Intersection
Concatenation
Negation
Kleene closure
Reversal
Init
Quotient[citation needed]
Substitution[citation needed]
Homomorphism[citation needed]
For each operation, an optimal construction with respect to the number of states has been determined in the state complexity research. Since DFAs are equivalent to nondeterministic finite automata (NFA), these closures may also be proved using closure properties of NFA.
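As one illustration, closure under intersection follows from the standard product construction, which runs two DFAs in lockstep on the same input. A sketch under the dict-based encoding used in the example above (names are illustrative):

def intersect(delta1, start1, acc1, delta2, start2, acc2, alphabet):
    """Build the product DFA; its states are pairs of component states."""
    delta, start = {}, (start1, start2)
    states, todo = {start}, [start]
    while todo:
        p, q = todo.pop()
        for a in alphabet:
            r = (delta1[(p, a)], delta2[(q, a)])
            delta[((p, q), a)] = r
            if r not in states:
                states.add(r)
                todo.append(r)
    # A pair accepts only if both components accept.
    accepting = {s for s in states if s[0] in acc1 and s[1] in acc2}
    return delta, start, accepting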
Advantages and disadvantages

DFAs are one of the most practical models of computation, since there is a trivial linear-time, constant-space, online algorithm to simulate a DFA on a stream of input. Also, there are efficient algorithms to find a DFA recognizing:
the complement of the language recognized by a given DFA.
the union/intersection of the languages recognized by two given DFAs.
Because DFAs can be reduced to a canonical form (minimal DFAs), there are also efficient algorithms to determine:
whether a DFA accepts any strings
whether a DFA accepts all strings
whether two DFAs recognize the same language
the DFA with a minimum number of states for a particular regular language

DFAs are equivalent in computing power to nondeterministic finite automata (NFAs). This is because, firstly, any DFA is also an NFA, so an NFA can do what a DFA can do. Also, given an NFA, using the powerset construction one can build a DFA that recognizes the same language as the NFA, although the DFA could have an exponentially larger number of states than the NFA.[11][12]

On the other hand, finite state automata are of strictly limited power in the languages they can recognize; many simple languages, including any problem that requires more than constant space to solve, cannot be recognized by a DFA. The classic example of a simply described language that no DFA can recognize is the bracket or Dyck language, i.e., the language that consists of properly paired brackets such as the word "(()())". Intuitively, no DFA can recognize the Dyck language because DFAs are not capable of counting: a DFA-like automaton would need a state for every possible number of "currently open" parentheses, meaning it would need an unbounded number of states. Another, simpler example is the language consisting of strings of the form a^n b^n: some finite but arbitrary number of a's, followed by an equal number of b's.[13]
1.3. Nondeterministic finite automaton

A finite state machine is called a deterministic finite automaton (DFA) if
each of its transitions is uniquely determined by its source state and input symbol, and
reading an input symbol is required for each state transition.

A nondeterministic finite automaton (NFA), or nondeterministic finite state machine, does not need to obey these restrictions. In particular, every DFA is also an NFA. Sometimes the term NFA is used in a narrower sense, referring to an NFA that is not a DFA, but not in this article. NFAs were introduced in 1959 by Michael O. Rabin and Dana Scott,[2] who also showed their equivalence to DFAs.

An NFA, similar to a DFA, consumes a string of input symbols. For each input symbol, it transitions to a new state until all input symbols have been consumed. Unlike a DFA, it is non-deterministic, i.e., for some state and input symbol, there may be zero, one, or several possible next states. Thus, in the formal definition, the next state is an element of the power set of the states, which is a set of states to be considered at once. The notion of accepting an input is similar to that for the DFA. When the last input symbol is consumed, the NFA accepts if and only if there is some set of transitions that will take it to an accepting state. Equivalently, it rejects if, no matter what transitions are applied, it would not end in an accepting state.
Formal definition

An NFA is represented formally by a 5-tuple, (Q, Σ, Δ, q0, F), consisting of
a finite set of states Q
a finite set of input symbols Σ
a transition function Δ : Q × Σ → P(Q)
an initial (or start) state q0 ∈ Q
a set of states F distinguished as accepting (or final) states, F ⊆ Q.
Here, P(Q) denotes the power set of Q.

Let w = a1a2 ... an be a word over the alphabet Σ. The automaton M accepts the word w if a sequence of states, r0, r1, ..., rn, exists in Q with the following conditions:
1. r0 = q0
2. ri+1 ∈ Δ(ri, ai+1), for i = 0, ..., n−1
3. rn ∈ F.

In words, the first condition says that the machine starts in the start state q0. The second condition says that given each character of string w, the machine will transition from state to state according to the transition function Δ. The last condition says that the machine accepts w if the last input of w causes the machine to halt in one of the accepting states. For w to be accepted by M it is not required that every state sequence end in an accepting state; it is sufficient if one does. Otherwise, i.e. if it is impossible at all to get from q0 to a state from F by following w, it is said that the automaton rejects the string. The set of strings M accepts is the language recognized by M, and this language is denoted by L(M).

We can also define L(M) in terms of the extended transition function Δ* : Q × Σ* → P(Q) such that:
1. Δ*(r, ε) = {r} where ε is the empty string, and
2. if x ∈ Σ*, a ∈ Σ, and Δ*(r, x) = {r1, r2, ..., rk}, then Δ*(r, xa) = Δ(r1, a) ∪ ... ∪ Δ(rk, a).
Now L(M) = {w | Δ*(q0, w) ∩ F ≠ ∅}.

Note that the definition uses a single initial state, which is not necessary. Sometimes, NFAs are defined with a set of initial states. There is an easy construction that translates an NFA with multiple initial states to an NFA with a single initial state, which provides a convenient notation. For a more elementary introduction to the formal definition, see automata theory.
Example
The state diagram for M. It is not deterministic since in state p reading a 1 can lead to p or to q.

Let M be an NFA, with a binary alphabet, that determines if the input ends with a 1. In formal notation, let M = ({p, q}, {0, 1}, Δ, p, {q}) where the transition function Δ can be defined by this state transition table:

           Input
State      0      1
p          {p}    {p, q}
q          ∅      ∅

Note that Δ(p, 1) has more than one state, therefore M is nondeterministic. Some possible state sequences for the input word "1011" are:

Input:              1  0  1  1
State sequence 1:   p  q  ?
State sequence 2:   p  p  p  q  ?
State sequence 3:   p  p  p  p  q

The word is accepted by M since state sequence 3 satisfies the above definition; it doesn't matter that sequences 1 and 2 fail to do so. In contrast, the word "10" is rejected by M, since there is no way to reach the only accepting state, q, by reading the final 0 symbol or by an ε-transition.
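The acceptance condition can be checked mechanically by tracking the set of all states the NFA could currently be in. A sketch using an assumed dict-of-sets encoding (not from the source):

Delta = {
    ("p", "0"): {"p"}, ("p", "1"): {"p", "q"},
    ("q", "0"): set(), ("q", "1"): set(),
}
start, accepting = "p", {"q"}

def accepts(w):
    """Track every state reachable on the input read so far."""
    current = {start}
    for symbol in w:
        current = set().union(*(Delta[(s, symbol)] for s in current))
    return bool(current & accepting)

print(accepts("1011"))  # True: sequence 3 above reaches q
print(accepts("10"))    # False: q cannot be reached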
Equivalence to DFA

A deterministic finite automaton (DFA) can be seen as a special kind of NFA, in which for each state and input symbol, the transition function yields exactly one state. Thus, it is clear that every formal language that can be recognized by a DFA can be recognized by an NFA. Conversely, for each NFA, there is a DFA such that it recognizes the same formal language. The DFA can be constructed using the powerset construction. This result shows that NFAs, despite their additional flexibility, are unable to recognize languages that cannot be recognized by some DFA. It is also important in practice for converting easier-to-construct NFAs into more efficiently executable DFAs. However, if the NFA has n states, the resulting DFA may have up to 2^n states, which sometimes makes the construction impractical for large NFAs.
NFA with ε-moves

A nondeterministic finite automaton with ε-moves (NFA-ε) is a further generalization of the NFA. This automaton replaces the transition function with one that allows the empty string ε as a possible input. The transitions without consuming an input symbol are called ε-transitions. In state diagrams, they are usually labeled with the Greek letter ε. ε-transitions provide a convenient way of modeling systems whose current states are not precisely known: i.e., if we are modeling a system and it is not clear whether the current state (after processing some input string) should be q or q', then we can add an ε-transition between these two states, thus putting the automaton in both states simultaneously.
Formal definition

An NFA-ε is represented formally by a 5-tuple, (Q, Σ, Δ, q0, F), consisting of
a finite set of states Q
a finite set of input symbols called the alphabet Σ
a transition function Δ : Q × (Σ ∪ {ε}) → P(Q)
an initial (or start) state q0 ∈ Q
a set of states F distinguished as accepting (or final) states, F ⊆ Q.
Here, P(Q) denotes the power set of Q and ε denotes the empty string.

For a q ∈ Q, let E(q) denote the set of states that are reachable from state q by following ε-transitions in the transition function Δ, i.e., p ∈ E(q) if there is a sequence of states q1, ..., qk such that
q1 = q,
qi+1 ∈ Δ(qi, ε) for each 1 ≤ i < k, and
qk = p.
E(q) is known as the ε-closure of q. The ε-closure is also defined for a set of states: the ε-closure of a set of states P of an NFA is the set of states reachable from any state in P following ε-transitions. Formally, for P ⊆ Q, E(P) = ∪q∈P E(q).

Let w = a1a2 ... an be a word over the alphabet Σ. The automaton M accepts the word w if a sequence of states, r0, r1, ..., rn, exists in Q with the following conditions:
1. r0 ∈ E(q0),
2. ri+1 ∈ E(r') where r' ∈ Δ(ri, ai+1), for each i = 0, ..., n−1, and
3. rn ∈ F.

In words, the first condition says that the machine starts in a state that is reachable from the start state q0 via ε-transitions. The second condition says that after reading ai, the machine takes a transition of Δ from ri to r', and then takes any number of ε-transitions according to Δ to move from r' to ri+1. The last condition says that the machine accepts w if the last input of w causes the machine to halt in one of the accepting states. Otherwise, it is said that the automaton rejects the string. The set of strings M accepts is the language recognized by M, and this language is denoted by L(M).
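The ε-closure E(q) is computed by a simple graph search over ε-transitions alone. A sketch under an assumed encoding in which the string "eps" stands in for ε (not from the source):

def epsilon_closure(q, Delta):
    """All states reachable from q using only ε-transitions."""
    closure, todo = {q}, [q]
    while todo:
        s = todo.pop()
        for r in Delta.get((s, "eps"), set()):
            if r not in closure:
                closure.add(r)
                todo.append(r)
    return closure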
Example
The state diagram for M

Let M be an NFA-ε, with a binary alphabet, that determines if the input contains an even number of 0s or an even number of 1s. Note that 0 occurrences is an even number of occurrences as well. In formal notation, let M = ({S0, S1, S2, S3, S4}, {0, 1}, Δ, S0, {S1, S3}) where the transition relation Δ can be defined by this state transition table:

           Input
State      0      1      ε
S0         {}     {}     {S1, S3}
S1         {S2}   {S1}   {}
S2         {S1}   {S2}   {}
S3         {S3}   {S4}   {}
S4         {S4}   {S3}   {}
M can be viewed as the union of two DFAs: one with states {S1, S2} and the other with states {S3, S4}. The language of M can be described by the regular language given by this regular expression (1*(01*01*)*) ∪ (0*(10*10*)*). We define M using ε-moves but M can be defined without using ε-moves.
Equivalence to NFA

To show NFA-ε is equivalent to NFA, first note that NFA is a special case of NFA-ε, so it remains to show that for every NFA-ε, there exists an equivalent NFA. Let A = (Q, Σ, Δ, q0, F) be an NFA-ε. The NFA A' = (Q, Σ, Δ', E(q0), F) is equivalent to A, where for each a ∈ Σ and q ∈ Q, Δ'(q, a) = E(Δ(q, a)). Thus NFA-ε is equivalent to NFA. Since NFA is equivalent to DFA, NFA-ε is also equivalent to DFA.
Closure properties
Composed NFA accepting the union of the languages of some given NFAs N(s) and N(t). For an input word w in the language union, the composed automaton follows an ε-transition from q to the start state (left colored circle) of an appropriate subautomaton — N(s) or N(t) — which, by following w, may reach an accepting state (right colored circle); from there, state f can be reached by another ε-transition. Due to the ε-transitions, the composed NFA is properly nondeterministic even if both N(s) and N(t) were DFAs; vice versa, constructing a DFA for the union language (even of two DFAs) is much more complicated.

NFAs are said to be closed under a (binary/unary) operator if NFAs recognize the languages that are obtained by applying the operation on the NFA-recognizable languages. The NFAs are closed under the following operations:
Union (cf. picture)
Intersection
Concatenation
Negation
Kleene closure
Since NFAs are equivalent to nondeterministic finite automata with ε-moves (NFA-ε), the above closures are proved using closure properties of NFA-ε. The above closure properties imply that NFAs only recognize regular languages. NFAs can be constructed from any regular expression using Thompson's construction algorithm.
Properties

The machine starts in the specified initial state and reads in a string of symbols from its alphabet. The automaton uses the state transition function Δ to determine the next state using the current state, and the symbol just read or the empty string. However, "the next state of an NFA depends not only on the current input event, but also on an arbitrary number of subsequent input events. Until these subsequent events occur it is not possible to determine which state the machine is in".[3] If, when the automaton has finished reading, it is in an accepting state, the NFA is said to accept the string, otherwise it is said to reject the string. The set of all strings accepted by an NFA is the language the NFA accepts. This language is a regular language. For every NFA a deterministic finite automaton (DFA) can be found that accepts the same language.
Therefore, it is possible to convert an existing NFA into a DFA for the purpose of implementing a (perhaps) simpler machine. This can be performed using the powerset construction, which may lead to an exponential rise in the number of necessary states. For a formal proof of the powerset construction, please see the Powerset construction article.
Implementation

There are many ways to implement an NFA:
Convert to the equivalent DFA. In some cases this may cause exponential blowup in the number of states.[4]
Keep a set data structure of all states which the NFA might currently be in. On the consumption of an input symbol, unite the results of the transition function applied to all current states to get the set of next states; if ε-moves are allowed, include all states reachable by such a move (ε-closure). Each step requires at most s^2 computations, where s is the number of states of the NFA. On the consumption of the last input symbol, if one of the current states is a final state, the machine accepts the string. A string of length n can be processed in time O(ns^2),[5] and space O(s).
Create multiple copies. For each n-way decision, the NFA creates up to n copies of the machine. Each will enter a separate state. If, upon consuming the last input symbol, at least one copy of the NFA is in the accepting state, the NFA will accept. (This, too, requires linear storage with respect to the number of NFA states, as there can be one machine for every NFA state.)
Explicitly propagate tokens through the transition structure of the NFA and match whenever a token reaches the final state. This is sometimes useful when the NFA should encode additional context about the events that triggered the transition. (For an implementation that uses this technique to keep track of object references have a look at Tracematches.[6])
Application of NFA

NFAs and DFAs are equivalent in that if a language is recognized by an NFA, it is also recognized by a DFA and vice versa. The establishment of such equivalence is important and useful. It is useful because constructing an NFA to recognize a given language is sometimes much easier than constructing a DFA for that language. It is important because NFAs can be used to reduce the complexity of the mathematical work required to establish many important properties in the theory of computation. For example, it is much easier to prove closure properties of regular languages using NFAs than DFAs.
1.4. DFA minimization
DFA minimization is the task of transforming a given deterministic finite automaton (DFA) into an equivalent DFA that has a minimum number of states. Several different algorithms accomplishing this task are known and described in standard textbooks on automata theory.[1]
Example DFA. If in state c, it exhibits the same behavior for every input string as in state d, or in state e. Similarly, states a and b are nondistinguishable. The DFA has no unreachable states.

Equivalent minimal DFA. Nondistinguishable states have been merged into a single one.
Minimum DFA

For each regular language, there also exists a minimal automaton that accepts it, that is, a DFA with a minimum number of states, and this DFA is unique (except that states can be given different names).[2][3] The minimal DFA ensures minimal computational cost for tasks such as pattern matching.

There are two classes of states that can be removed or merged from the original DFA without affecting the language it accepts, in order to minimize it.
Unreachable states are the states that are not reachable from the initial state of the DFA, for any input string.
Nondistinguishable states are those that cannot be distinguished from one another for any input string.
DFA minimization is usually done in three steps, corresponding to the removal or merger of the relevant states. Since the elimination of nondistinguishable states is computationally the most expensive one, it is usually done as the last step.
Unreachable states

The state p of DFA M = (Q, Σ, δ, q0, F) is unreachable if no string w in Σ* exists for which p = δ*(q0, w). Reachable states can be obtained with the following algorithm:[citation needed]

let reachable_states := {q0};
let new_states := {q0};
do {
    temp := the empty set;
    for each q in new_states do
        for each c in Σ do
            temp := temp ∪ {p such that p = δ(q,c)};
        end;
    end;
    new_states := temp \ reachable_states;
    reachable_states := reachable_states ∪ new_states;
} while (new_states ≠ the empty set);
unreachable_states := Q \ reachable_states;
Unreachable states can be removed from the DFA without affecting the language that it accepts.
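A runnable rendering of the pseudocode above, assuming the transition function is encoded as a dict from (state, symbol) pairs to states (an illustrative sketch, not from the source):

def unreachable_states(Q, Sigma, delta, q0):
    """Return the states of the DFA that no input string can reach."""
    reachable = {q0}
    new_states = {q0}
    while new_states:
        temp = {delta[(q, c)] for q in new_states for c in Sigma}
        new_states = temp - reachable
        reachable |= new_states
    return set(Q) - reachable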
NFA minimization

While the above procedures work for DFAs, the method of partitioning does not work for non-deterministic finite automata (NFAs).[9] While an exhaustive search may minimize an NFA, there is no polynomial-time algorithm to minimize general NFAs unless P=PSPACE, an unsolved conjecture in computational complexity theory which is widely believed to be false. However, there are methods of NFA minimization that may be more efficient than brute force search.[10]
1.5. NDFA to DFA conversion algorithm

The powerset construction or subset construction is a standard method for converting a nondeterministic finite automaton (NFA) into a deterministic finite automaton (DFA) which recognizes the same formal language. It is important in theory because it establishes that NFAs, despite their additional flexibility, are unable to recognize any language that cannot be recognized by some DFA. It is also important in practice for converting easier-to-construct NFAs into more efficiently executable DFAs. However, if the NFA has n states, the resulting DFA may have up to 2^n states, an exponentially larger number, which sometimes makes the construction impractical for large NFAs. The construction, sometimes called the Rabin–Scott powerset construction (or subset construction) to distinguish it from similar constructions for other types of automata, was first published by Michael O. Rabin and Dana Scott in 1959.[1]
To simulate the operation of a DFA on a given input string, one needs to keep track of a single state at any
time: the state that the automaton will reach after seeing a prefix of the input. In contrast, to simulate an NFA, one needs to keep track of a set of states: all of the states that the automaton could reach after seeing the same prefix of the input, according to the nondeterministic choices made by the automaton. If, after a certain prefix of the input, a set S of states can be reached, then after the next input symbol x the set of reachable states is a deterministic function of S and x. Therefore, the sets of reachable NFA states play the same role in the NFA simulation as single DFA states play in the DFA simulation, and in fact the sets of NFA states appearing in this simulation may be re-interpreted as being states of a DFA. [2]
Construction

The powerset construction applies most directly to an NFA that does not allow state transformations without consuming input symbols (a.k.a. "ε-moves"). Such an automaton may be defined as a 5-tuple (Q, Σ, T, q0, F), in which Q is the set of states, Σ is the set of input symbols, T is the transition function (mapping a state and an input symbol to a set of states), q0 is the initial state, and F is the set of accepting states. The corresponding DFA has states corresponding to subsets of Q. The initial state of the DFA is {q0}, the (one-element) set of initial states. The transition function of the DFA maps a state S (representing a subset of Q) and an input symbol x to the set T(S,x) = ∪{T(q,x) | q ∈ S}, the set of all states that can be reached by an x-transition from a state in S. A state S of the DFA is an accepting state if and only if at least one member of S is an accepting state of the NFA.[2][3]

In the simplest version of the powerset construction, the set of all states of the DFA is the powerset of Q, the set of all possible subsets of Q. However, many states of the resulting DFA may be useless as they may be unreachable from the initial state. An alternative version of the construction creates only the states that are actually reachable.[4]
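The reachable-states variant can be sketched as follows, under an assumed encoding in which T maps (state, symbol) pairs to sets of NFA states and frozensets serve as DFA states (illustrative, not from the source):

from itertools import chain

def powerset_construct(T, Sigma, q0, F):
    """Build only the DFA states actually reachable from {q0}."""
    start = frozenset({q0})
    dfa_delta, seen, todo = {}, {start}, [start]
    while todo:
        S = todo.pop()
        for x in Sigma:
            # T(S, x) = union of T(q, x) over all q in S.
            U = frozenset(chain.from_iterable(T[(q, x)] for q in S))
            dfa_delta[(S, x)] = U
            if U not in seen:
                seen.add(U)
                todo.append(U)
    accepting = {S for S in seen if S & set(F)}
    return dfa_delta, start, accepting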
NFA with ε-moves

For an NFA with ε-moves (also called an ε-NFA), the construction must be modified to deal with these by computing the ε-closure of states: the set of all states reachable from some given state using only ε-moves. Van Noord recognizes three possible ways of incorporating this closure computation in the powerset construction:[5]
1. Compute the ε-closure of the entire automaton as a preprocessing step, producing an equivalent NFA without ε-moves, then apply the regular powerset construction. This version, also discussed by Hopcroft and Ullman,[6] is straightforward to implement, but impractical for automata with large numbers of ε-moves, as commonly arise in natural language processing applications.[5]
2. During the powerset computation, compute the ε-closure of each state q that is considered by the algorithm (and cache the result).
3. During the powerset computation, compute the ε-closure of each subset of states Q' that is considered by the algorithm, and add its elements to Q'.
Multiple initial states If NFAs are defined to allow for multiple initial states,[7] the initial state of the corresponding DFA is the set of all initial states of the NFA, or (if the NFA also has ε-moves) the set of all states reachable from initial states by ε-moves.
Example The NFA below has four states; state 1 is initial, and states 3 and 4 are accepting. Its alphabet consists of the two symbols 0 and 1, and it has ε-moves.
The initial state of the DFA constructed from this NFA is the set of all NFA states that are reachable from state 1 by ε-moves; that is, it is the set {1,2,3}. A transition from {1,2,3} by input symbol 0 must follow either the arrow from state 1 to state 2, or the arrow from state 3 to state 4. Additionally, neither state 2 nor state 4 have outgoing ε-moves. Therefore, T({1,2,3},0) = {2,4}, and by the same reasoning the full DFA constructed from the NFA is as shown below.
As can be seen in this example, there are five states reachable from the start state of the DFA; the remaining 11 sets in the powerset of the set of NFA states are not reachable.
Complexity
NFA with 5 states (left) whose DFA (right) requires 16 states.[4]

Because the DFA states consist of sets of NFA states, an n-state NFA may be converted to a DFA with at most 2^n states.[2] For every n, there exist n-state NFAs such that every subset of states is reachable from the initial subset, so that the converted DFA has exactly 2^n states, giving Θ(2^n) worst-case time complexity.[8][9]
A simple example requiring nearly this many states is the language of strings over the alphabet {0,1} in which there are at least n characters, the nth from last of which is 1. It can be represented by an (n + 1)-state NFA, but it requires 2^n DFA states, one for each n-character suffix of the input; cf. picture for n = 4.[4]
Applications

Brzozowski's algorithm for DFA minimization uses the powerset construction, twice. It converts the input DFA into an NFA for the reverse language, by reversing all its arrows and exchanging the roles of initial and accepting states, converts the NFA back into a DFA using the powerset construction, and then repeats this process. Its worst-case complexity is exponential, unlike some other known DFA minimization algorithms, but in many examples it performs more quickly than its worst-case complexity would suggest.[10]

Safra's construction, which converts a non-deterministic Büchi automaton with n states into a deterministic Muller automaton or into a deterministic Rabin automaton with 2^O(n log n) states, uses the powerset construction as part of its machinery.[11]
1.6. Finite-state transducer
A finite-state transducer (FST) is a finite-state machine with two memory tapes, following the terminology for Turing machines: an input tape and an output tape. This contrasts with an ordinary finite-state automaton, which has a single tape. An FST is a type of finite-state automaton that maps between two sets of symbols.[1] An FST is more general than a finite-state automaton (FSA). An FSA defines a formal language by defining a set of accepted strings, while an FST defines relations between sets of strings. An FST reads a set of strings on the input tape and generates a set of relations on the output tape. An FST can be thought of as a translator or relater between strings in a set. In morphological parsing, an example would be inputting a string of letters into the FST; the FST would then output a string of morphemes.
Overview

An automaton can be said to recognize a string if we view the content of its tape as input. In other words, the automaton computes a function that maps strings into the set {0,1}. Alternatively, we can say that an automaton generates strings, which means viewing its tape as an output tape. On this view, the automaton generates a formal language, which is a set of strings. The two views of automata are equivalent: the function that the automaton computes is precisely the indicator function of the set of strings it generates. The class of languages generated by finite automata is known as the class of regular languages.

The two tapes of a transducer are typically viewed as an input tape and an output tape. On this view, a transducer is said to transduce (i.e., translate) the contents of its input tape to its output tape, by accepting a string on its input tape and generating another string on its output tape. It may do so nondeterministically and it may produce more than one output for each input string. A transducer may also produce no output for a given input string, in which case it is said to reject the input. In general, a transducer computes a relation between two formal languages.

Each string-to-string finite-state transducer relates the input alphabet Σ to the output alphabet Γ. Relations R on Σ* × Γ* that can be implemented as finite-state transducers are called rational relations. Rational relations that are partial functions, i.e. that relate every input string from Σ* to at most one string in Γ*, are called rational functions.

Finite-state transducers are often used for phonological and morphological analysis in natural language processing research and applications. Pioneers in this field include Ronald Kaplan, Lauri Karttunen, Martin Kay and Kimmo Koskenniemi.[2][non-primary source needed] A common way of using transducers is in a so-called "cascade", where transducers for various operations are combined into a single transducer by repeated application of the composition operator (defined below).
Formal construction

Formally, a finite transducer T is a 6-tuple (Q, Σ, Γ, I, F, δ) such that:
Q is a finite set, the set of states;
Σ is a finite set, called the input alphabet;
Γ is a finite set, called the output alphabet;
I is a subset of Q, the set of initial states;
F is a subset of Q, the set of final states; and
δ ⊆ Q × (Σ ∪ {ε}) × (Γ ∪ {ε}) × Q (where ε is the empty string) is the transition relation.

We can view (Q, δ) as a labeled directed graph, known as the transition graph of T: the set of vertices is Q, and (q, a, b, r) ∈ δ means that there is a labeled edge going from vertex q to vertex r. We also say that a is the input label and b the output label of that edge.

NOTE: This definition of finite transducer is also called letter transducer (Roche and Schabes 1997); alternative definitions are possible, but can all be converted into transducers following this one.

Define the extended transition relation δ* as the smallest set such that:
δ ⊆ δ*;
(q, ε, ε, q) ∈ δ* for all q ∈ Q; and
whenever (q, x, y, r) ∈ δ* and (r, a, b, s) ∈ δ, then (q, xa, yb, s) ∈ δ*.

The extended transition relation is essentially the reflexive transitive closure of the transition graph that has been augmented to take edge labels into account. The elements of δ* are known as paths. The edge labels of a path are obtained by concatenating the edge labels of its constituent transitions in order.

The behavior of the transducer T is the rational relation [T] defined as follows: x[T]y if and only if there exist i ∈ I and f ∈ F such that (i, x, y, f) ∈ δ*. This is to say that T transduces a string x ∈ Σ* into a string y ∈ Γ* if there exists a path from an initial state to a final state whose input label is x and whose output label is y.
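To make the transduction concrete, here is a sketch (not from the source) of a single-state deterministic letter transducer that maps each binary string to its bitwise complement:

# (state, input letter) -> (output letter, next state)
delta = {
    ("q", "0"): ("1", "q"),
    ("q", "1"): ("0", "q"),
}

def transduce(w, state="q"):
    """Read the input tape letter by letter, writing the output tape."""
    out = []
    for a in w:
        b, state = delta[(state, a)]
        out.append(b)
    return "".join(out)

print(transduce("1011"))  # -> "0100"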
Weighted automata

Finite-state transducers can be weighted, where each transition is labelled with a weight in addition to the input and output labels. A weighted finite-state transducer (WFST) over a set K of weights can be defined similarly to an unweighted one as an 8-tuple T = (Q, Σ, Γ, I, F, E, λ, ρ), where:
Q, Σ, Γ, I, F are defined as above;
E ⊆ Q × (Σ ∪ {ε}) × (Γ ∪ {ε}) × K × Q (where ε is the empty string) is the finite set of transitions;
λ : I → K maps initial states to weights;
ρ : F → K maps final states to weights.
In order to make certain operations on WFSTs well-defined, it is convenient to require the set of weights to form a semiring.[3] Two typical semirings used in practice are the log semiring and the tropical semiring; unweighted automata may be regarded as having weights in the Boolean semiring.[4]
Stochastic FST

Stochastic FSTs (also known as probabilistic FSTs or statistical FSTs) are presumably a form of weighted FST.[citation needed]
Operations on finite-state transducers

The following operations defined on finite automata also apply to finite transducers:

Union. Given transducers T and S, there exists a transducer T ∪ S such that x[T ∪ S]y if and only if x[T]y or x[S]y.

Concatenation. Given transducers T and S, there exists a transducer T · S such that w[T · S]u if and only if there exist x, v, y, z with w = xv, u = yz, x[T]y and v[S]z.

Kleene closure. Given a transducer T, there exists a transducer T* with the following properties:
(k1) ε[T*]ε;
(k2) whenever w[T*]u and x[T]y, then wx[T*]uy; and
x[T*]y does not hold unless mandated by (k1) or (k2).

Composition. Given a transducer T on alphabets Σ and Γ and a transducer S on alphabets Γ and Δ, there exists a transducer T ∘ S on Σ and Δ such that x[T ∘ S]z if and only if there exists a string y such that x[T]y and y[S]z. This operation extends to the weighted case.[5]
This definition uses the same notation used in mathematics for relation composition. However, the conventional reading for relation composition is the other way around: given two relations T and S, x(T ∘ S)z holds when there exists some y such that x T y and y S z.

Projection to an automaton. There are two projection functions: π1 preserves the input tape, and π2 preserves the output tape. The first projection, π1(T), is defined as follows: given a transducer T, there exists a finite automaton π1(T) such that π1(T) accepts x if and only if there exists a string y for which x[T]y. The second projection, π2(T), is defined similarly.
Determinization. Given a transducer T, we want to build an equivalent transducer that has a unique initial state and such that no two transitions leaving any state share the same input label. The powerset construction can be extended to transducers, or even weighted transducers, but sometimes fails to halt; indeed, some non-deterministic transducers do not admit equivalent deterministic transducers.[6] Characterizations of determinizable transducers have been proposed,[7] along with efficient algorithms to test them:[8] they rely on the semiring used in the weighted case as well as a general property on the structure of the transducer (the twins property).

Weight pushing for the weighted case.[9]
Minimization for the weighted case.[10]
Removal of epsilon-transitions.
1.6.1. Moore machine

A Moore machine is a finite-state machine whose output values are determined only by its current state. This is in contrast to a Mealy machine, whose output values are determined both by its current state and by the values of its inputs. The Moore machine is named after Edward F. Moore, who presented the concept in a 1956 paper, "Gedanken-experiments on Sequential Machines."[1]
Formal definition

A Moore machine can be defined as a 6-tuple (S, s0, Σ, Λ, T, G) consisting of the following:
a finite set of states S
a start state (also called initial state) s0 which is an element of S
a finite set called the input alphabet Σ
a finite set called the output alphabet Λ
a transition function T : S × Σ → S mapping a state and the input alphabet to the next state
an output function G : S → Λ mapping each state to the output alphabet
A Moore machine can be regarded as a restricted type of finite-state transducer.
Visual representation

Table
A state transition table is a table showing the relation between inputs and states. The table below lists, for each state, the next state on inputs a and b, and the output associated with the state:

State   a    b    Output
q0      q1   q2   1
q1      q1   q1   0
q2      q1   q0   1
Diagram
The state diagram for a Moore machine, or Moore diagram, is a diagram that associates an output value with each state. A Moore machine is an output producer.
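The table above translates directly into code. A sketch under an assumed dict encoding (not from the source), in which the output is read from the state alone:

T = {  # transition function T : S × Σ → S
    ("q0", "a"): "q1", ("q0", "b"): "q2",
    ("q1", "a"): "q1", ("q1", "b"): "q1",
    ("q2", "a"): "q1", ("q2", "b"): "q0",
}
G = {"q0": "1", "q1": "0", "q2": "1"}  # output function G : S → Λ

def run(inputs, state="q0"):
    """Emit the output of every state entered, starting state included."""
    outputs = [G[state]]
    for a in inputs:
        state = T[(state, a)]
        outputs.append(G[state])
    return "".join(outputs)

print(run("ab"))  # -> "100": q0 outputs 1, then q1 outputs 0, then q1 again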
1.6.2. Mealy machine

A Mealy machine is a finite-state machine whose output values are determined both by its current state and the current inputs. (This is in contrast to a Moore machine, whose output values are determined solely by its current state.) A Mealy machine is a deterministic finite-state transducer: for each state and input, at most one transition is possible. The Mealy machine is named after George H. Mealy, who presented the concept in a 1955 paper, "A Method for Synthesizing Sequential Circuits".[1]
Formal definition

A Mealy machine is a 6-tuple (S, s0, Σ, Λ, T, G) consisting of the following:
a finite set of states S
a start state (also called initial state) s0 which is an element of S
a finite set called the input alphabet Σ
a finite set called the output alphabet Λ
a transition function T : S × Σ → S mapping pairs of a state and an input symbol to the corresponding next state
an output function G : S × Σ → Λ mapping pairs of a state and an input symbol to the corresponding output symbol
In some formulations, the transition and output functions are coalesced into a single function T × G : S × Σ → S × Λ.
Comparison of Mealy machines and Moore machines
1. Mealy machines tend to have fewer states: different outputs occur on arcs (n^2 of them) rather than on states (n of them).
2. Moore machines are safer to use: outputs change at the clock edge (always one cycle later). In Mealy machines, an input change can cause an output change as soon as the logic is done—a big problem when two machines are interconnected; asynchronous feedback may occur if one isn't careful.
3. Mealy machines react faster to inputs: they react in the same cycle and don't need to wait for the clock. In Moore machines, more logic may be necessary to decode state into outputs, adding more gate delays after the clock edge.
Not all sequential circuits can be implemented using the Mealy model. Some sequential circuits can only be implemented as Moore machines.[2]
Diagram
The state diagram for a Mealy machine associates an output value with each transition edge (in contrast to the state diagram for a Moore machine, which associates an output value with each state). When the input and output alphabet are both Σ, one can also associate to a Mealy automaton a helix directed graph (S × Σ, (x, i) → (T(x, i), G(x, i))).[3] This graph has as vertices the pairs of state and letter; every node has out-degree one, and the successor of (x, i) is the next state of the automaton and the letter that the automaton outputs when it is in state x and reads letter i. This graph is a union of disjoint cycles if the automaton is bireversible.
Examples

Simple
State diagram for a simple Mealy machine with one input and one output. A simple Mealy machine has one input and one output. Each transition edge is labeled with the value of the input (shown in red) and the value of the output (shown in blue). The machine starts in state Si. (In this example, the output is the exclusive-or of the two most-recent input values; thus, the machine implements an edge detector, outputting a one every time the input flips and a zero otherwise.)
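The edge detector translates into a table keyed on (state, input) pairs. A sketch under an assumed encoding (not from the source), taking the start state to remember a previous input of 0:

# (state, input) -> (next state, output); the state stores the last bit.
step = {
    ("0", "0"): ("0", "0"), ("0", "1"): ("1", "1"),
    ("1", "0"): ("0", "1"), ("1", "1"): ("1", "0"),
}

def detect_edges(bits, state="0"):
    """Output 1 exactly when the input differs from the previous bit."""
    out = []
    for b in bits:
        state, y = step[(state, b)]
        out.append(y)
    return "".join(out)

print(detect_edges("0011010"))  # -> "0010111"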
Complex
More complex Mealy machines can have multiple inputs as well as multiple outputs.
Applications

Mealy machines provide a rudimentary mathematical model for cipher machines. Considering the input and output alphabets to be the Latin alphabet, for example, a Mealy machine can be designed that, given a string of letters (a sequence of inputs), can process it into a ciphered string (a sequence of outputs). However, although one could use a Mealy model to describe the Enigma, the state diagram would be too complex to provide feasible means of designing complex ciphering machines.

Moore/Mealy machines are DFAs that also produce output at every tick of the clock. Modern CPUs, computers, cell phones, digital clocks and basic electronic devices/machines have some kind of finite state machine to control them. Simple software systems, particularly ones that can be represented using regular expressions, can be modeled as finite state machines. There are many such simple systems, such as vending machines or basic electronics.

By finding the intersection of two finite state machines, one can design, in a very simple manner, concurrent systems that exchange messages, for instance. For example, a traffic light is a system that consists of multiple subsystems, such as the different traffic lights, that work concurrently.

Some examples of applications:
number classification
watch with timer
vending machine
traffic light
bar code scanner
gas pumps
2. Regular grammar
A regular grammar is a formal grammar that is right-regular or left-regular.
Strictly regular grammars

A right regular grammar (also called right linear grammar) is a formal grammar (N, Σ, P, S) such that all the production rules in P are of one of the following forms:
1. B → a, where B is a non-terminal in N and a is a terminal in Σ
2. B → aC, where B and C are non-terminals in N and a is in Σ
3. B → ε, where B is in N and ε denotes the empty string, i.e. the string of length 0.
In a left regular grammar (also called left linear grammar), all rules obey the forms
1. A → a, where A is a non-terminal in N and a is a terminal in Σ
2. A → Ba, where A and B are in N and a is in Σ
3. A → ε, where A is in N and ε is the empty string.
A regular grammar is a left or right regular grammar. Some textbooks and articles disallow empty production rules, and assume that the empty string is not present in languages.
Examples

An example of a right regular grammar G with N = {S, A}, Σ = {a, b, c}, where P consists of the following rules:
S → aS
S → bA
A → ε
A → cA
and S is the start symbol. This grammar describes the same language as the regular expression a*bc*: arbitrarily many a's, followed by a single b, followed by arbitrarily many c's.
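The derivation process can be sketched directly: repeatedly replace a non-terminal using one of its rules until only terminals remain. An illustrative sketch, not from the source:

import random

RULES = {
    "S": ["aS", "bA"],
    "A": ["cA", ""],   # "" encodes the empty string ε
}

def derive(start="S"):
    """Expand the leftmost non-terminal until none remain."""
    form = start
    while any(n in form for n in RULES):
        for n in RULES:
            if n in form:
                form = form.replace(n, random.choice(RULES[n]), 1)
                break
    return form

print(derive())  # e.g. "aabcc"; always a string of the form a*bc*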
2.1. Regular expression
The match results of the pattern (?<=\.) {2,}(?=[A-Z])
At least two spaces are matched, but only if they occur directly after a period (.) and before an upper case letter.
Stephen Cole Kleene, who helped found the concept
A blacklist on Wikipedia which uses regular expressions to identify bad titles

A regular expression, regex or regexp[1] (sometimes called a rational expression)[2][3] is a sequence of characters that defines a search pattern. Usually this pattern is then used by string-searching algorithms for "find" or "find and replace" operations on strings, or for input validation. The concept arose in the 1950s when the American mathematician Stephen Cole Kleene formalized the description of a regular language. The concept came into common use with Unix text-processing utilities. Since the 1980s, different syntaxes for writing regular expressions have existed, one being the POSIX standard and another, widely used one, being the Perl syntax.

Regular expressions are used in search engines, in search-and-replace dialogs of word processors and text editors, in text-processing utilities such as sed and AWK, and in lexical analysis. Many programming languages provide regex capabilities, built-in or via libraries.
Patterns The phrase regular expressions, and consequently, regexes, is often used to mean the specific, standard textual syntax (distinct from the mathematical notation described below) for representing patterns for matching text. Each character in a regular expression (that is, each character in the string describing its pattern) is either a metacharacter, having a special meaning, or a regular character that has a literal meaning. For example, in the regex a., a is a literal character which matches just 'a' and '.' is a metacharacter that matches every character except a newline. Therefore, this regex matches, for example, 'a ', or 'ax', or 'a0'. Together, metacharacters and literal characters can be used to identify text of a given pattern, or process a number of instances of it. Pattern-matches may vary from a precise equality to a very general similarity, as controlled by the metacharacters. For example, . is a very general pattern, [a-z] (match all letters from 'a' to 'z') is less general, and a is a precise pattern (matches just 'a'). The metacharacter syntax is designed specifically to represent prescribed targets in a concise and flexible way to direct the automation of text processing of a variety of input data, in a form easy to type using a standard ASCII keyboard. A very simple case of a regular expression in this syntax is to locate a word spelled two different ways in a text editor: the regular expression seriali[sz]e matches both "serialise" and "serialize". Wildcards also achieve this, but are more limited in what they can pattern, as they have fewer metacharacters and a simple language-base. The usual context of wildcard characters is in globbing similar names in a list of files, whereas regexes are
usually employed in applications that pattern-match text strings in general. For example, the regex ^[ \t]+|[ \t]+$ matches excess whitespace at the beginning or end of a line. An advanced regular expression that matches any numeral is [+-]?(\d+(\.\d+)?|\.\d+)([eE][+-]?\d+)?.
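As a quick illustration of these two patterns, here is a short sketch using Python's re module (whose Perl-style syntax agrees with the patterns shown above):

    import re

    # Trim excess whitespace at the beginning or end of a line:
    print(re.sub(r"^[ \t]+|[ \t]+$", "", "  some text\t"))   # -> "some text"

    # Match any numeral (optional sign, integer or decimal part, optional exponent):
    numeral = re.compile(r"[+-]?(\d+(\.\d+)?|\.\d+)([eE][+-]?\d+)?")
    for s in ["42", "-3.14", ".5e10", "abc"]:
        print(s, bool(numeral.fullmatch(s)))                 # abc -> False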
Translating the Kleene star (s* means 'zero or more of s ') A regex processor translates a regular expression in the above syntax into an internal representation which can be executed and matched against a string representing the text being searched in. One possible approach is the Thompson's construction algorithm to construct a nondeterministic finite automaton (NFA), which is then made deterministic and the resulting deterministic finite automaton (DFA) is run on the target text string to recognize substrings that match the regular expression. The picture shows the NFA scheme N(s*) obtained from the regular expression s*, where s denotes a simpler regular expression in turn, which has already been recursively translated to the NFA N(s).
Basic concepts A regular expression, often called a pattern, is an expression used to specify a set of strings required for a particular purpose. A simple way to specify a finite set of strings is to list its elements or members. However, there are often more concise ways to specify the desired set of strings. For example, the set containing the three strings "Handel", "Händel", and "Haendel" can be specified by the pattern H(ä|ae?)ndel; we say that this pattern matches each of the three strings. In most formalisms, if there exists at least one regular expression that matches a particular set then there exists an infinite number of other regular expressions that also match it—the specification is not unique. Most formalisms provide the following operations to construct regular expressions. Boolean "or": A vertical bar separates alternatives. For example, gray|grey can match "gray" or "grey". Grouping: Parentheses are used to define the scope and precedence of the operators (among other uses). For example, gray|grey and gr(a|e)y are equivalent patterns which both describe the set of "gray" or "grey". Quantification: A quantifier after a token (such as a character) or group specifies how often that preceding element is allowed to occur. The most common quantifiers are the question mark ?, the asterisk * (derived from the Kleene star), and the plus sign + (Kleene plus).
The question mark indicates zero or one occurrences of the preceding element. For example, colou?r matches both "color" and "colour". The asterisk indicates zero or more occurrences of the preceding element. For example, ab*c matches "ac", "abc", "abbc", "abbbc", and so on. The plus sign indicates one or more occurrences of the preceding element. For example, ab+c matches "abc", "abbc", "abbbc", and so on, but not "ac".
{n}[19]
The preceding item is matched exactly n times.
{min,}[19]
The preceding item is matched min or more times.
{min,max}[19]
The preceding item is matched at least min times, but not more than max times.
These constructions can be combined to form arbitrarily complex expressions, much like one can construct arithmetical expressions from numbers and the operations +, −, ×, and ÷. For example, H(ae?|ä)ndel and H(a|ae|ä)ndel are both valid patterns which match the same strings as the earlier example, H(ä|ae?)ndel. The precise syntax for regular expressions varies among tools and with context; more detail is given in the Syntax section. Regular expressions describe regular languages in formal language theory. They have the same expressive power as regular grammars.
Formal definition
Regular expressions consist of constants, which denote sets of strings, and operator symbols, which denote operations over these sets. The following definition is standard, and found as such in most textbooks on formal language theory.[20][21] Given a finite alphabet Σ, the following constants are defined as regular expressions:
(empty set) ∅, denoting the set ∅.
(empty string) ε, denoting the set containing only the "empty" string, which has no characters at all.
(literal character) a in Σ, denoting the set containing only the character a.
Given regular expressions R and S, the following operations over them are defined to produce regular expressions:
(concatenation) RS denotes the set of strings that can be obtained by concatenating a string in R and a string in S. For example, let R = {"ab", "c"}, and S = {"d", "ef"}. Then, RS = {"abd", "abef", "cd", "cef"}.
(alternation) R | S denotes the set union of the sets described by R and S. For example, if R describes {"ab", "c"} and S describes {"ab", "d", "ef"}, the expression R | S describes {"ab", "c", "d", "ef"}.
(Kleene star) R* denotes the smallest superset of the set described by R that contains ε and is closed under string concatenation. This is the set of all strings that can be made by concatenating any finite number (including zero) of strings from the set described by R. For example, {"0","1"}* is the set of all finite binary strings (including the empty string), and {"ab", "c"}* = {ε, "ab", "c", "abab", "abc", "cab", "cc", "ababab", "abcab", …}.
To avoid parentheses it is assumed that the Kleene star has the highest priority, then concatenation, and then alternation. If there is no ambiguity then parentheses may be omitted. For example, (ab)c can be written as abc, and a|(b(c*)) can be written as a|bc*. Many textbooks use the symbols ∪, +, or ∨ for alternation instead of the vertical bar.
Examples:
a|b* denotes {ε, "a", "b", "bb", "bbb", …}
(a|b)* denotes the set of all strings with no symbols other than "a" and "b", including the empty string: {ε, "a", "b", "aa", "ab", "ba", "bb", "aaa", …}
ab*(c|ε) denotes the set of strings starting with "a", then zero or more "b"s and finally optionally a "c": {"a", "ac", "ab", "abc", "abb", "abbc", …}
(0|(1(01*0)*1))* denotes the set of binary numbers that are multiples of 3: {ε, "0", "00", "11", "000", "011", "110", "0000", "0011", "0110", "1001", "1100", "1111", "00000", …}
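The formal definition translates almost directly into code. The following Python sketch uses Brzozowski derivatives, a standard technique not described in the text above, to decide membership of a string in the set denoted by a regular expression built from the six cases of the definition; all class and function names are invented.

    from dataclasses import dataclass

    class Re:                                   # a regular expression over Σ
        pass

    @dataclass(frozen=True)
    class Empty(Re): pass                       # ∅: the empty set of strings

    @dataclass(frozen=True)
    class Eps(Re): pass                         # ε: the set { "" }

    @dataclass(frozen=True)
    class Lit(Re):                              # a: the set { "a" }
        c: str

    @dataclass(frozen=True)
    class Cat(Re):                              # RS: concatenation
        r: Re
        s: Re

    @dataclass(frozen=True)
    class Alt(Re):                              # R | S: union
        r: Re
        s: Re

    @dataclass(frozen=True)
    class Star(Re):                             # R*: Kleene star
        r: Re

    def nullable(r):
        # Does the denoted set contain the empty string?
        if isinstance(r, (Eps, Star)): return True
        if isinstance(r, Cat): return nullable(r.r) and nullable(r.s)
        if isinstance(r, Alt): return nullable(r.r) or nullable(r.s)
        return False                            # Empty, Lit

    def deriv(r, c):
        # Brzozowski derivative: the set { w : c·w is in the set denoted by r }.
        if isinstance(r, Lit): return Eps() if r.c == c else Empty()
        if isinstance(r, Alt): return Alt(deriv(r.r, c), deriv(r.s, c))
        if isinstance(r, Star): return Cat(deriv(r.r, c), r)
        if isinstance(r, Cat):
            d = Cat(deriv(r.r, c), r.s)
            return Alt(d, deriv(r.s, c)) if nullable(r.r) else d
        return Empty()                          # Empty, Eps

    def matches(r, w):
        for c in w:
            r = deriv(r, c)
        return nullable(r)

    # ab*(c|ε) from the examples above:
    r = Cat(Lit("a"), Cat(Star(Lit("b")), Alt(Lit("c"), Eps())))
    assert matches(r, "a") and matches(r, "abbc") and not matches(r, "ba")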
Expressive power and compactness The formal definition of regular expressions is purposely parsimonious and avoids defining the redundant quantifiers ? and +, which can be expressed as follows: a+ = aa*, and a? = (a|ε). Sometimes the complement operator is added, to give a generalized regular expression; here R^c matches all strings over Σ* that do not match R. In principle, the complement operator is redundant, as it can always be expressed using the other operators. However, the process for computing such a representation is complex, and the result may require expressions of a size that is double exponentially larger.[22][23] Regular expressions in this sense can express exactly the regular languages, the class of languages accepted by deterministic finite automata. There is, however, a significant difference in compactness. Some classes of regular languages can only be described by deterministic finite automata whose size grows exponentially in the size of the shortest equivalent regular expressions. The standard example here is the family of languages Lk consisting of all strings over the alphabet {a,b} whose kth-from-last letter equals a. On one hand, a regular expression describing L4 is given by (a|b)*a(a|b)(a|b)(a|b). Generalizing this pattern to Lk gives the expression (a|b)*a(a|b)^(k−1), i.e. (a|b)*a followed by k−1 copies of (a|b).
On the other hand, it is known that every deterministic finite automaton accepting the language Lk must have at least 2^k states. Luckily, there is a simple mapping from regular expressions to the more general nondeterministic finite automata (NFAs) that does not lead to such a blowup in size; for this reason NFAs are often used as alternative representations of regular languages. NFAs are a simple variation of the type-3 grammars of the Chomsky hierarchy.[20] In the opposite direction, there are many languages easily described by a DFA that are not easily described by a regular expression. For instance, determining the validity of a given ISBN requires computing the modulus of the integer base 11, and can be easily implemented with an 11-state DFA. However, a regular expression answering the same problem of divisibility by 11 is at least multiple megabytes in length.[citation needed] Given a regular expression, Thompson's construction algorithm computes an equivalent nondeterministic finite automaton. A conversion in the opposite direction is achieved by Kleene's algorithm. Finally, it is worth noting that many real-world "regular expression" engines implement features that cannot be described by regular expressions in the sense of formal language theory; rather, they implement regexes. See below for more on this.
Deciding equivalence of regular expressions As seen in many of the examples above, there is more than one way to construct a regular expression to achieve the same results. It is possible to write an algorithm that, for two given regular expressions, decides whether the described languages are equal; the algorithm reduces each expression to a minimal deterministic finite state machine, and determines whether the two are isomorphic (equivalent). Algebraic laws for regular expressions can be obtained using a method by Gischer, which is best explained by an example: in order to check whether (X+Y)* and (X* Y*)* denote the same regular language for all regular expressions X, Y, it is necessary and sufficient to check whether the particular regular expressions (a+b)* and (a* b*)* denote the same language over the alphabet Σ = {a,b}. More generally, an equation E = F between regular expressions with variables holds if, and only if, its instantiation with different variables replaced by different symbol constants holds.[24][25] The redundancy can be eliminated by using Kleene star and set union to find an interesting subset of regular expressions that is still fully expressive, but perhaps their use can be restricted.[clarification needed] This is a surprisingly difficult problem. As simple as the regular expressions are, there is no method to systematically rewrite them to some normal form. The lack of an axiomatization in the past led to the star height problem. In 1991, Dexter Kozen axiomatized regular expressions as a Kleene algebra, using equational and Horn clause axioms.[26] Already in 1964, Redko had proved that no finite set of purely equational axioms can characterize the algebra of regular languages.[27]
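The actual decision procedure works on minimal DFAs, as described above. As a lightweight illustration only, two patterns can also be compared empirically on every string up to a bounded length; a disagreement disproves equivalence, while agreement proves nothing. A Python sketch (function name invented):

    import re
    from itertools import product

    def agree_up_to(p1, p2, alphabet, max_len):
        # Exhaustively compare two patterns on all strings of length <= max_len.
        # This is a heuristic test, not the minimal-DFA decision procedure.
        r1, r2 = re.compile(p1), re.compile(p2)
        for n in range(max_len + 1):
            for chars in product(alphabet, repeat=n):
                w = "".join(chars)
                if bool(r1.fullmatch(w)) != bool(r2.fullmatch(w)):
                    return False, w          # a witness of inequivalence
        return True, None

    # (a|b)* and (a*b*)* denote the same language:
    print(agree_up_to(r"(a|b)*", r"(a*b*)*", "ab", 8))   # (True, None)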
Syntax A regex pattern matches a target string. The pattern is composed of a sequence of atoms. An atom is a single point within the regex pattern which it tries to match to the target string. The simplest atom is a literal, but grouping parts of the pattern to match an atom will require using ( ) as metacharacters. Metacharacters help form: atoms; quantifiers telling how many atoms (and whether it is a greedy quantifier or not); a logical OR character, which offers a set of alternatives, and a logical NOT character, which negates an atom's existence; and backreferences to refer to previous atoms of a completing pattern of atoms. A match is made, not when all the atoms of the string are matched, but rather when all the pattern atoms in the regex have matched. The idea is to make a small pattern of characters stand for a large number of possible strings, rather than compiling a large list of all the literal possibilities. Depending on the regex processor there are about fourteen metacharacters, characters that may or may not have their literal character meaning, depending on context, or whether they are "escaped", i.e. preceded by an escape sequence, in this case, the backslash \. Modern and POSIX extended regexes use metacharacters more often than their literal meaning, so to avoid "backslash-osis" it makes sense to have a metacharacter escape to a literal mode; but starting out, it makes more sense to have the four bracketing metacharacters ( ) and { } be primarily literal, and "escape" this usual meaning to become metacharacters. Common standards implement both. The usual metacharacters are {}[]()^$.|*+? and \. The usual characters that become metacharacters when escaped are dswDSW and N.
Delimiters When entering a regex in a programming language, it may be represented as a usual string literal, hence usually quoted; this is common in C, Java, and Python for instance, where the regex re is entered as "re". However, regexes are often written with slashes as delimiters, as in /re/ for the regex re. This originates in ed, where / is the editor command for searching, and an expression /re/ can be used to specify a range of lines (matching the pattern), which can be combined with other commands on either side, most famously g/re/p as in grep ("global regex print"), which is included in most Unix-based operating systems, such as Linux distributions. A similar convention is used in sed, where search and replace is given by s/re/replacement/ and patterns can be joined with a comma to specify a range of lines as in /re1/,/re2/. This notation is particularly well known due to its use in Perl, where it forms part of the syntax distinct from normal string literals. In some cases, such as sed and Perl, alternative delimiters can be used to avoid collision with contents, and to avoid having to escape occurrences of the delimiter character in the
contents. For example, in sed the command s,/,X, will replace a / with an X, using commas as delimiters.
Standards The IEEE POSIX standard has three sets of compliance: BRE (Basic Regular Expressions),[28] ERE (Extended Regular Expressions), and SRE (Simple Regular Expressions). SRE is deprecated[29] in favor of BRE, as both provide backward compatibility. The subsection below covering the character classes applies to both BRE and ERE. BRE and ERE work together. ERE adds ?, +, and |, and it removes the need to escape the metacharacters ( ) and { }, which are required in BRE. Furthermore, as long as the POSIX standard syntax for regexes is adhered to, there can be, and often is, additional syntax to serve specific (yet POSIX compliant) applications. Although POSIX.2 leaves some implementation specifics undefined, BRE and ERE provide a "standard" which has since been adopted as the default syntax of many tools, where the choice of BRE or ERE modes is usually a supported option. For example, GNU grep has the following options: "grep -E" for ERE, "grep -G" for BRE (the default), and "grep -P" for Perl regexes. Perl regexes have become a de facto standard, having a rich and powerful set of atomic expressions. Perl has no "basic" or "extended" levels. As in POSIX EREs, ( ) and { } are treated as metacharacters unless escaped; other metacharacters are known to be literal or symbolic based on context alone. Additional functionality includes lazy matching, backtracking, named capture groups, and recursive patterns.

POSIX basic and extended
In the POSIX standard, Basic Regular Syntax (BRE) requires that the metacharacters ( ) and { } be designated \( \) and \{ \}, whereas Extended Regular Syntax (ERE) does not.

Metacharacter  Description
^              Matches the starting position within the string. In line-based tools, it matches the starting position of any line.
.              Matches any single character (many applications exclude newlines, and exactly which characters are considered newlines is flavor-, character-encoding-, and platform-specific, but it is safe to assume that the line feed character is included). Within POSIX bracket expressions, the dot character matches a literal dot. For example, a.c matches "abc", etc., but [a.c] matches only "a", ".", or "c".
[ ]            A bracket expression. Matches a single character that is contained within the brackets. For example, [abc] matches "a", "b", or "c". [a-z] specifies a range which matches any lowercase letter from "a" to "z". These forms can be mixed: [abcx-z] matches "a", "b", "c", "x", "y", or "z", as does [a-cx-z]. The - character is treated as a literal character if it is the last or the first (after the ^, if present) character within the brackets: [abc-], [-abc]. Note that backslash escapes are not allowed. The ] character can be included in a bracket expression if it is the first (after the ^) character: []abc].
[^ ]           Matches a single character that is not contained within the brackets. For example, [^abc] matches any character other than "a", "b", or "c". [^a-z] matches any single character that is not a lowercase letter from "a" to "z". Likewise, literal characters and ranges can be mixed.
$              Matches the ending position of the string or the position just before a string-ending newline. In line-based tools, it matches the ending position of any line.
( )            Defines a marked subexpression. The string matched within the parentheses can be recalled later (see the next entry, \n). A marked subexpression is also called a block or capturing group. BRE mode requires \( \).
\n             Matches what the nth marked subexpression matched, where n is a digit from 1 to 9. This construct is vaguely defined in the POSIX.2 standard. Some tools allow referencing more than nine capturing groups.
*              Matches the preceding element zero or more times. For example, ab*c matches "ac", "abc", "abbbc", etc. [xyz]* matches "", "x", "y", "z", "zx", "zyx", "xyzzy", and so on. (ab)* matches "", "ab", "abab", "ababab", and so on.
{m,n}          Matches the preceding element at least m and not more than n times. For example, a{3,5} matches only "aaa", "aaaa", and "aaaaa". This is not found in a few older instances of regexes. BRE mode requires \{m,n\}.

Examples:
.at      matches any three-character string ending with "at", including "hat", "cat", and "bat".
[hc]at   matches "hat" and "cat".
[^b]at   matches all strings matched by .at except "bat".
[^hc]at  matches all strings matched by .at other than "hat" and "cat".
^[hc]at  matches "hat" and "cat", but only at the beginning of the string or line.
[hc]at$  matches "hat" and "cat", but only at the end of the string or line.
\[.\]    matches any single character surrounded by "[" and "]" since the brackets are escaped, for example: "[a]" and "[b]".
s.*      matches s followed by zero or more characters, for example: "s" and "saw" and "seed".
POSIX extended
The meaning of metacharacters escaped with a backslash is reversed for some characters in the POSIX Extended Regular Expression (ERE) syntax. With this syntax, a backslash causes the metacharacter to be treated as a literal character. So, for example, \( \) is now ( ) and \{ \} is now { }. Additionally, support is removed for \n backreferences and the following metacharacters are added:

Metacharacter  Description
?              Matches the preceding element zero or one time. For example, ab?c matches only "ac" or "abc".
+              Matches the preceding element one or more times. For example, ab+c matches "abc", "abbc", "abbbc", and so on, but not "ac".
|              The choice (also known as alternation or set union) operator matches either the expression before or the expression after the operator. For example, abc|def matches "abc" or "def".

Examples:
[hc]?at  matches "at", "hat", and "cat".
[hc]*at  matches "at", "hat", "cat", "hhat", "chat", "hcat", "cchchat", and so on.
[hc]+at  matches "hat", "cat", "hhat", "chat", "hcat", "cchchat", and so on, but not "at".
cat|dog  matches "cat" or "dog".
POSIX Extended Regular Expressions can often be used with modern Unix utilities by including the command line flag -E.

Character classes
The character class is the most basic regex concept after a literal match. It makes one small sequence of characters match a larger set of characters. For example, [A-Z] could stand for the upper case alphabet, and \d could mean any digit. Character classes apply to both POSIX levels. When specifying a range of characters, such as [a-Z] (i.e. lowercase a to upper-case Z), the computer's locale settings determine the contents by the numeric ordering of the character encoding. They could store digits in that sequence, or the ordering could be abc…zABC…Z, or aAbBcC…zZ. So the POSIX standard defines a character class, which will be known by the regex processor installed. Those definitions are in the following table:

POSIX       Non-standard          Perl/Tcl  Vim    Java         ASCII                                Description
[:ascii:]   [30]                                   \p{ASCII}    [\x00-\x7F]                          ASCII characters
[:alnum:]                                          \p{Alnum}    [A-Za-z0-9]                          Alphanumeric characters
            [:word:][citation needed]  \w     \w   \w           [A-Za-z0-9_]                         Alphanumeric characters plus "_"
                                  \W        \W     \W           [^A-Za-z0-9_]                        Non-word characters
[:alpha:]                                   \a     \p{Alpha}    [A-Za-z]                             Alphabetic characters
[:blank:]                                   \s     \p{Blank}    [ \t]                                Space and tab
                                  \b        \< \>  \b           (?<=\W)(?=\w)|(?<=\w)(?=\W)          Word boundaries
                                                   \B           (?<=\W)(?=\W)|(?<=\w)(?=\w)          Non-word boundaries
[:cntrl:]                                          \p{Cntrl}    [\x00-\x1F\x7F]                      Control characters
[:digit:]                         \d        \d     \p{Digit} or \d   [0-9]                           Digits
                                  \D        \D     \D           [^0-9]                               Non-digits
[:graph:]                                          \p{Graph}    [\x21-\x7E]                          Visible characters
[:lower:]                                   \l     \p{Lower}    [a-z]                                Lowercase letters
[:print:]                                   \p     \p{Print}    [\x20-\x7E]                          Visible characters and the space character
[:punct:]                                          \p{Punct}    [][!"#$%&'()*+,./:;<=>?@\^_`{|}~-]   Punctuation characters
[:space:]                         \s        \_s    \p{Space} or \s   [ \t\r\n\v\f]                   Whitespace characters
                                  \S        \S     \S           [^ \t\r\n\v\f]                       Non-whitespace characters
[:upper:]                                   \u     \p{Upper}    [A-Z]                                Uppercase letters
[:xdigit:]                                  \x     \p{XDigit}   [A-Fa-f0-9]                          Hexadecimal digits

POSIX character classes can only be used within bracket expressions. For example, [[:upper:]ab] matches the uppercase letters and lowercase "a" and "b".
An additional non-POSIX class understood by some tools is [:word:], which is usually defined as [:alnum:] plus underscore. This reflects the fact that in many programming languages these are the characters that may be used in identifiers. The editor Vim further distinguishes word and word-head classes (using the notation \w and \h) since in many programming languages the characters that can begin an identifier are not the same as those that can occur in other positions. Note that what the POSIX regex standards call character classes are commonly referred to as POSIX character classes in other regex flavors which support them. With most other regex flavors, the term character class is used to describe what POSIX calls bracket expressions.
2.2. Regular language A regular language (also called a rational language[1][2]) is a formal language that can be expressed using a regular expression, in the strict sense of the latter notion used in theoretical computer science (as opposed to the many regular expression engines provided by modern programming languages, which are augmented with features that allow recognition of languages that cannot be expressed by a classic regular expression). Alternatively, a regular language can be defined as a language recognized by a finite automaton. The equivalence of regular expressions and finite automata is known as Kleene's theorem[3] (after the American mathematician Stephen Cole Kleene). In the Chomsky hierarchy, regular languages are defined to be the languages that are generated by Type-3 grammars (regular grammars). Regular languages are very useful in input parsing and programming language design.
Formal definition The collection of regular languages over an alphabet Σ is defined recursively as follows: The empty language Ø, and the empty string language {ε} are regular languages. For each a ∈ Σ (a belongs to Σ), the singleton language {a} is a regular language. If A and B are regular languages, then A ∪ B (union), A • B (concatenation), and A* (Kleene star) are regular languages. No other languages over Σ are regular. See regular expression for its syntax and semantics. Note that the above cases are in effect the defining rules of regular expression.
Examples All finite languages are regular; in particular the empty string language {ε} = Ø* is regular. Other typical examples include the language consisting of all strings over the alphabet {a, b} which contain an even number of a's, or the language consisting of all strings of the form: several a's followed by several b's. A simple example of a language that is not regular is the set of strings { a^n b^n : n ≥ 0 }.[4] Intuitively, it cannot be recognized with a finite automaton, since a finite automaton has finite memory and cannot remember the exact number of a's. Techniques to prove this fact rigorously are given below.
Closure properties
The regular languages are closed under various operations; that is, if the languages K and L are regular, so is the result of the following operations:
the set-theoretic Boolean operations: union K ∪ L, intersection K ∩ L, and complement L^c, hence also relative complement K − L;[10]
the regular operations: union K ∪ L, concatenation K ∘ L, and Kleene star L*;[11]
the trio operations: string homomorphism, inverse string homomorphism, and intersection with regular languages. As a consequence they are closed under arbitrary finite state transductions, like the quotient K / L with a regular language. Even more, regular languages are closed under quotients with arbitrary languages: if L is regular then L/K is regular for any K;[citation needed]
the reverse (or mirror image) L^R.[citation needed]
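Closure under intersection, for example, is witnessed by the product construction: run DFAs for K and L in lockstep and accept when both accept. A minimal Python sketch, with a DFA encoded as a (start state, accepting set, transition dict) triple of my own choosing:

    def product_accepts(dfa1, dfa2):
        # Each DFA is (start, accepting_set, delta), delta[(state, symbol)] -> state.
        # The product automaton simulates both machines in lockstep.
        s1, acc1, d1 = dfa1
        s2, acc2, d2 = dfa2
        def accepts(word):
            q1, q2 = s1, s2
            for a in word:
                q1, q2 = d1[(q1, a)], d2[(q2, a)]
            return q1 in acc1 and q2 in acc2
        return accepts

    # K: even number of a's; L: ends in b (both over the alphabet {a, b})
    K = ("even", {"even"}, {("even", "a"): "odd", ("odd", "a"): "even",
                            ("even", "b"): "even", ("odd", "b"): "odd"})
    L = ("q0", {"q1"}, {("q0", "a"): "q0", ("q1", "a"): "q0",
                        ("q0", "b"): "q1", ("q1", "b"): "q1"})
    both = product_accepts(K, L)
    assert both("aab") and not both("ab") and not both("aabba")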
The number of words in a regular language Let s_L(n) denote the number of words of length n in L. The ordinary generating function for L is the formal power series
S_L(z) = Σ_{n ≥ 0} s_L(n) z^n.
The generating function of a language L is a rational function if L is regular.[26] Hence for any regular language L there exist an integer constant n0, complex constants λ1, …, λk, and complex polynomials p1(x), …, pk(x) such that for every n ≥ n0 the number of words of length n in L is s_L(n) = p1(n)λ1^n + ⋯ + pk(n)λk^n.[28][29][30][31] Thus, non-regularity of certain languages can be proved by counting the words of a given length in L. Consider, for example, the Dyck language of strings of balanced parentheses. The number of words of length 2n in the Dyck language is equal to the Catalan number C_n ~ 4^n / (n^{3/2} √π), which is not of the form p(n)λ^n, witnessing the non-regularity of the Dyck language. Care must be taken since some of the eigenvalues λi could have the same magnitude. For example, the number of words of length n in the language of all even-length binary words is not of the form p(n)λ^n, but the numbers of words of even or odd length separately are of this form; the corresponding eigenvalues are 2 and −2. In general, for every regular language there exists a constant d such that, for each residue class of lengths modulo d, the number of words of that length is either eventually zero or asymptotically of the form C n^k λ^n.[32] The zeta function of a language L is[26]
ζ_L(z) = exp( Σ_{n ≥ 1} s_L(n) z^n / n ).
The zeta function of a regular language is not in general rational, but that of a cyclic language is. [33][34]
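The counting statement can be made concrete with a transfer-matrix computation: for a DFA whose matrix M counts, for each pair of states, the symbols leading from one to the other, the number of accepted words of length n is an entry of M^n, which linear algebra expands into exactly the form p1(n)λ1^n + ⋯ above. A Python sketch for the binary-multiples-of-3 language (encoding invented):

    def count_words(n):
        # M[i][j] = number of bits taking state i (value mod 3) to state j,
        # for the DFA that reads a binary number most-significant-bit first.
        M = [[0] * 3 for _ in range(3)]
        for state in range(3):
            for bit in (0, 1):
                M[state][(2 * state + bit) % 3] += 1
        # P = M^n by repeated multiplication (exact integer arithmetic)
        P = [[1 if i == j else 0 for j in range(3)] for i in range(3)]
        for _ in range(n):
            P = [[sum(P[i][k] * M[k][j] for k in range(3)) for j in range(3)]
                 for i in range(3)]
        return P[0][0]          # start in state 0, accept in state 0

    print([count_words(n) for n in range(1, 8)])   # [1, 2, 3, 6, 11, 22, 43]
    # The eigenvalues of M are 2, 1, and -1; indeed
    # s_L(n) = (1/3)*2^n + 1/2 + (1/6)*(-1)^n for n >= 1.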
Generalizations The notion of a regular language has been generalized to infinite words (see ω-automata) and to trees (see tree automaton). Rational set generalizes the notion (of regular/rational language) to monoids that are not necessarily free. Likewise, the notion of a recognizable language (by a finite automaton) has namesake as recognizable set over a monoid that is not necessarily free. Howard Straubing notes in relation to these facts that “The term "regular language" is a bit unfortunate. Papers influenced by Eilenberg's monograph[35] often use either the term "recognizable language", which refers to the behavior of automata, or "rational language", which refers to important analogies between regular expressions and rational power series. (In fact, Eilenberg defines rational and recognizable subsets of arbitrary monoids; the two notions do not, in general, coincide.) This terminology, while better motivated, never really caught on, and "regular language" is used almost universally.”[36] Rational series is another generalization, this time in the context of a formal power series over a semiring. This approach gives rise to weighted rational expressions and weighted automata. In this algebraic context, the regular languages (corresponding to Boolean-weighted rational expressions) are usually called rational languages.[37][38] Also in this context, Kleene's theorem finds a generalization called the
Kleene-Schützenberger theorem.
Notes on equivalent characterizations A regular language can be characterized equivalently as, among other formulations, (1) the language of a regular expression, (2) the language accepted by a nondeterministic finite automaton, (3) the language accepted by a deterministic finite automaton, and (4) the language generated by a regular grammar; two further characterizations are (9) having finitely many equivalence classes under the relation u ~ v, defined by: uw ∈ L if and only if vw ∈ L for all w ∈ Σ*, and (10) having a finite syntactic monoid. The standard equivalence proofs:
1. ⇒ 2. by Thompson's construction algorithm
2. ⇒ 1. by Kleene's algorithm
2. ⇒ 3. by the powerset construction
3. ⇒ 2. since the former definition is stronger than the latter
2. ⇒ 4. see Hopcroft, Ullman (1979), Theorem 9.2, p. 219
4. ⇒ 2. see Hopcroft, Ullman (1979), Theorem 9.1, p. 218
3. ⇔ 9. by the Myhill–Nerode theorem
3. ⇔ 10. see the proof in the Syntactic monoid article, and see p. 160 in Holcombe, W.M.L. (1982). Algebraic automata theory. Cambridge Studies in Advanced Mathematics 1. Cambridge University Press. ISBN 0-521-60492-3. Zbl 0489.68046.
A related problem, checking whether L_A ∩ L_B = L_A for given regular languages, is NP-hard in general; see File:RegSubsetNP.pdf for an illustration of the proof idea.
2.3. Pumping lemma for regular languages The pumping lemma for regular languages is a lemma that describes an essential property of all regular languages. Informally, it says that all sufficiently long words in a regular language may be pumped—that is, have a middle section of the word repeated an arbitrary number of times—to produce a new word that also lies within the same language. Specifically, the pumping lemma says that for any regular language L there exists a constant p such that any word w in L with length at least p can be split into three substrings, w = xyz, where the middle portion y must not be empty, such that the words xz, xyz, xyyz, xyyyz, … constructed by repeating y zero or more times are still in L. This process of repetition is known as "pumping". Moreover, the pumping lemma guarantees that the length of xy will be at most p, imposing a limit on the ways in which w may be split. Finite languages vacuously satisfy the pumping lemma by having p equal to the maximum string length in L plus one. The pumping lemma is useful for disproving the regularity of a specific language in question. It was first proven by Michael Rabin and Dana Scott in 1959.
Formal statement Let L be a regular language. Then there exists an integer p ≥ 1, depending only on L, such that every string w in L of length at least p (p is called the "pumping length"[4]) can be written as w = xyz (i.e., w can be divided into three substrings) satisfying the following conditions:
1. |y| ≥ 1,
2. |xy| ≤ p, and
3. xy^n z ∈ L for all n ≥ 0.
y is the substring that can be pumped (removed or repeated any number of times, with the resulting string always in L). Condition (1) means the loop y to be pumped must be of length at least one; condition (2) means the loop must occur within the first p characters. |x| must be smaller than p (a consequence of (1) and (2)), but apart from that, there is no restriction on x and z. In simple words, for any regular language L, any sufficiently long word w in L can be split into three parts, w = xyz, such that all the strings xy^n z for n ≥ 0 are also in L. As a formal expression:
L regular ⇒ (∃p ≥ 1)(∀w ∈ L)(|w| ≥ p ⇒ (∃x, y, z)(w = xyz ∧ |y| ≥ 1 ∧ |xy| ≤ p ∧ (∀n ≥ 0)(xy^n z ∈ L)))
Use of the lemma The pumping lemma is often used to prove that a particular language is non-regular: a proof by contradiction (of the language's regularity) may consist of exhibiting a word (of the required length) in the language that lacks the property outlined in the pumping lemma. For example, the language L = { a^n b^n : n ≥ 0 } over the alphabet Σ = {a, b} can be shown to be non-regular as follows. Let w, x, y, z, p, and i be as used in the formal statement for the pumping lemma above. Let w in L be given by w = a^p b^p. By the pumping lemma, there must be some decomposition w = xyz with |xy| ≤ p and |y| ≥ 1 such that xy^i z is in L for every i ≥ 0. Using |xy| ≤ p, we know y consists only of instances of a. Moreover, because |y| ≥ 1, it contains at least one instance of the letter a. We now pump y up: xy^2 z has more instances of the letter a than the letter b, since we have added some instances of a without adding instances of b. Therefore, xy^2 z is not in L. We have reached a contradiction. Therefore, the assumption that L is regular must be incorrect. Hence L is not regular. The proof that the language of balanced (i.e., properly nested) parentheses is not regular follows the same idea. Given p, there is a string of balanced parentheses that begins with more than p left parentheses, so that y will consist entirely of left parentheses. By repeating y, we can produce a string that does not contain the same number of left and right parentheses, and so it cannot be balanced.
Proof of the pumping lemma
Proof idea: Whenever a sufficiently long string xyz is recognized by a finite automaton, it must have reached some state (qs = qt) twice. Hence, after repeating ("pumping") the middle part y arbitrarily often (xyyz, xyyyz, …), the word will still be recognized. For every regular language there is a finite state automaton (FSA) that accepts the language. The number of states in such an FSA is counted, and that count is used as the pumping length p. For a string of length at least p, let q0 be the start state and let q1, …, qp be the sequence of the next p states visited as the string is read. Because the FSA has only p states, within this sequence of p + 1 visited states there must be at least one state that is repeated. Write qs for such a state. The transitions that take the machine from the first encounter of state qs to the second encounter of state qs match some string. This string is called y in the lemma, and since the machine will match a string without the y portion, or with the string y repeated any number of times, the conditions of the lemma are satisfied. For example, the following image shows an FSA.
The FSA accepts the string abcd. Since this string has a length at least as large as the number of states, which is four, the pigeonhole principle indicates that there must be at least one repeated state among the start state and the next four visited states. In this example, only q1 is a repeated state. Since the substring bc takes the machine through transitions that start at state q1 and end at state q1, that portion could be repeated and the FSA would still accept, giving the string abcbcd. Alternatively, the bc portion could be removed and the FSA would still accept, giving the string ad. In terms of the pumping lemma, the string abcd is broken into an x portion a, a y portion bc and a z portion d.
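The proof is constructive: given a deterministic automaton and an accepted string at least as long as the number of states, the decomposition w = xyz can be computed by locating the first repeated state. A minimal Python sketch, with transitions encoded as a dict as in the earlier sketches (all names invented):

    def pump_split(start, delta, word):
        # Return (x, y, z): y is the substring read between the first and the
        # second visit to some repeated state, as in the proof above.
        seen = {start: 0}
        q = start
        for i, a in enumerate(word):
            q = delta[(q, a)]
            if q in seen:                       # state repeated: loop found
                s, t = seen[q], i + 1
                return word[:s], word[s:t], word[t:]
            seen[q] = i + 1
        raise ValueError("word is shorter than the number of states")

    # The four-state example from the text: q1 repeats after reading "bc"
    delta = {("q0", "a"): "q1", ("q1", "b"): "q2",
             ("q2", "c"): "q1", ("q1", "d"): "q3"}
    x, y, z = pump_split("q0", delta, "abcd")
    print(x, y, z)   # a bc d  -- xz = "ad" and xyyz = "abcbcd" are also accepted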
General version of pumping lemma for regular languages If a language L is regular, then there exists a number p ≥ 1 (the pumping length) such that every string uwv in L with |w| ≥ p can be written in the form uwv = uxyzv with strings x, y and z such that |xy| ≤ p, |y| ≥ 1, and uxy^i zv is in L for every integer i ≥ 0.[5] From this, the standard version above follows as a special case, with both u and v being the empty string. Since the general version imposes stricter requirements on the language, it can be used to prove the non-regularity of many more languages, such as { a^m b^n c^n : m ≥ 1 and n ≥ 1 }.[6]
Converse of lemma not true While the pumping lemma states that all regular languages satisfy the conditions described above, the converse of this statement is not true: a language that satisfies these conditions may still be non-regular. In other words, both the original and the general version of the pumping lemma give a necessary but not sufficient condition for a language to be regular. For example, consider the language L over the alphabet {0,1,2,3} consisting of all strings that contain a substring of length 3 including a duplicate character, together with all strings in which precisely 1/7 of the characters are 3's. This language is not regular but can still be "pumped" with p = 5. Suppose some string s has length at least 5. Then, since the alphabet has only four characters, at least two of the first five characters in the string must be duplicates. They are separated by at most three characters. If the duplicate characters are separated by 0 or 1 characters, pump one of the other two characters in the string, which will not affect the substring containing the duplicates. If the duplicate characters are separated by 2 or 3 characters, pump 2 of the characters separating them. Pumping either down or up results in the creation of a substring of size 3 that contains 2 duplicate characters. The second condition of L ensures that L is not regular: restricted to strings with no duplicated character in any window of length three, membership in L depends only on whether exactly 1/7 of the characters are 3's, and this condition distinguishes infinitely many equivalence classes, so L is not regular by the Myhill–Nerode theorem. The Myhill–Nerode theorem provides a test that exactly characterizes regular languages. The typical method for proving that a language is regular is to construct either a finite state machine or a regular expression for the language.
3. Context-free grammar
In formal language theory, a context-free grammar (CFG) is a certain type of formal grammar: a set of production rules that describe all possible strings in a given formal language. Production rules are simple replacements. For example, the rule A → α replaces the nonterminal A with the string α. There can be multiple replacement rules for any given nonterminal. For example, the pair of rules A → α and A → β means that A can be replaced with either α or β.
In context-free grammars, all rules are one-to-one, one-to-many, or one-to-none. These rules can be applied regardless of context. The left-hand side of the production rule is always a nonterminal symbol. This means that the symbol does not appear in the resulting formal language. So in our case, our language[1] contains the letters a and b but not the nonterminals. Rules can also be applied in reverse to check if a string is grammatically correct according to the grammar. Here is an example context-free grammar (writing the auxiliary nonterminal as T) that describes all two-letter strings over the letters a and b:
S → TT
T → a
T → b
If we start with the nonterminal symbol S, then we can use the rule S → TT to turn S into TT. We can then apply one of the two later rules. For example, if we apply T → a to the first T we get aT. If we then apply T → b to the second T we get ab. Since both a and b are terminal symbols, and in context-free grammars terminal symbols never appear on the left-hand side of a production rule, there are no more rules that can be applied. This same process, applying the last two rules in different orders, yields all possible strings within our simple context-free grammar: aa, ab, ba, and bb.
Formal definitions A context-free grammar G is defined by the 4-tuple G = (V, Σ, R, S),[5] where 1. V is a finite set; each element is called a nonterminal character or a variable. Each variable represents a different type of phrase or clause in the sentence. Variables are also sometimes called syntactic categories. Each variable defines a sub-language of the language defined by G. 2. Σ is a finite set of terminals, disjoint from V, which make up the actual content of the sentence. The set of terminals is the alphabet of the language defined by the grammar G. 3. R is a finite relation from V to (V ∪ Σ)*, where the asterisk represents the Kleene star operation. The members of R are called the (rewrite) rules or productions of the grammar (R is also commonly symbolized by a P). 4. S is the start variable (or start symbol), used to represent the whole sentence (or program). It must be an element of V.
Production rule notation A production rule in R is formalized mathematically as a pair (α, β) ∈ R, where α ∈ V is a nonterminal and β ∈ (V ∪ Σ)* is a string of variables and/or terminals; rather than using ordered pair notation, production rules are usually written using an arrow operator with α as its left hand side and β as its right hand side: α → β. It is allowed for β to be the empty string, and in this case it is customary to denote it by ε. The form α → ε is called an ε-production.[6] It is common to list all right-hand sides for the same left-hand side on the same line, using | (the pipe symbol) to separate them. Rules α → β1 and α → β2 can hence be written as α → β1 | β2. In this case, β1 and β2 are called the first and second alternative, respectively.
Rule application For any strings u, v ∈ (V ∪ Σ)*, we say u directly yields v, written as u ⇒ v, if there exist a rule (α, β) ∈ R and strings u1, u2 ∈ (V ∪ Σ)* such that u = u1 α u2 and v = u1 β u2. Thus, v is a result of applying the rule α → β to u.
Repetitive rule application For any strings u, v ∈ (V ∪ Σ)*, we say u yields v, written as u ⇒* v (or u ⇒⇒ v in some textbooks), if there exist a positive integer k and strings u1, …, uk ∈ (V ∪ Σ)* such that u = u1 ⇒ u2 ⇒ ⋯ ⇒ uk = v. In this case, if k ≥ 2 (i.e., at least one rule has been applied), the relation u ⇒+ v holds. In other words, ⇒* and ⇒+ are the reflexive transitive closure (allowing a word to yield itself) and the transitive closure (requiring at least one step) of ⇒, respectively.
Proper CFGs A context-free grammar is said to be proper[7] if it has no unreachable symbols (every nonterminal N occurs in some sentential form: S ⇒* αNβ for some α, β ∈ (V ∪ Σ)*), no unproductive symbols (every nonterminal N derives some terminal string: N ⇒* w for some w ∈ Σ*), no ε-productions (no rule N → ε in R, except possibly for the start symbol), and no cycles (no nonterminal N with N ⇒+ N). Every context-free grammar can be effectively transformed into a weakly equivalent one without unreachable symbols,[8] a weakly equivalent one without unproductive symbols,[9] and a weakly equivalent one without cycles.[10] Every context-free grammar not producing ε can be effectively transformed into a weakly equivalent one without ε-productions;[11] altogether, every such grammar can be effectively transformed into a weakly equivalent proper CFG.
Example The grammar G = ({S}, {a, b}, P, S), with productions
S → aSa, S → bSb, S → ε,
is context-free. It is not proper since it includes an ε-production. A typical derivation in this grammar is S → aSa → aaSaa → aabSbaa → aabbaa. This makes it clear that L(G) = { ww^R : w ∈ {a, b}* }, the language of even-length palindromes. The language is context-free; however, it can be proved that it is not regular.
3.1. Production (computer science)
A production or production rule in computer science is a rewrite rule specifying a symbol substitution that can be recursively performed to generate new symbol sequences. A finite set of productions P is the main component in the specification of a formal grammar (specifically a generative grammar). The other components are a finite set N of nonterminal symbols, a finite set (known as an alphabet) Σ of terminal symbols that is disjoint from N, and a distinguished symbol S ∈ N that is the start symbol. In an unrestricted grammar, a production is of the form u → v, where u and v are arbitrary strings of terminals and nonterminals, though u may not be the empty string. If v is the empty string, this is denoted by the symbol ε or λ (rather than leaving the right-hand side blank). So productions are members of the cartesian product V*NV* × V*, where V := N ∪ Σ is the vocabulary, * is the Kleene star operator, V*NV* indicates a string containing at least one nonterminal, and ∪ denotes set union. If we do not allow the start symbol to occur in v (the word on the right side), we have to replace V* by (V \ {S})* on the right side of the cartesian product symbol.[1] The other types of formal grammar in the Chomsky hierarchy impose additional restrictions on what constitutes a production. Notably in a context-free grammar, the left-hand side of a production must be a single nonterminal symbol, so productions are of the form A → v with A ∈ N and v ∈ V*.
Grammar generation To generate a string in the language, one begins with a string consisting of only a single start symbol, and then successively applies the rules (any number of times, in any order) to rewrite this string. This stops when we obtain a string containing only terminals. The language consists of all the strings that can be generated in this manner. Any particular sequence of legal choices taken during this rewriting process yields one particular string in the language. If there are multiple different ways of generating this single string, then the grammar is said to be ambiguous. For example, assume the alphabet consists of a and b, with the start symbol S, and we have the following rules:
1. S → aSb
2. S → ba
Then we start with S, and can choose a rule to apply to it. If we choose rule 1, we replace S with aSb and obtain the string aSb. If we choose rule 1 again, we replace S with aSb and obtain the string aaSbb. This process is repeated until we only have symbols from the alphabet (i.e., a and b). If we now choose rule 2, we replace S with ba and obtain the string aababb, and are done. We can write this series of choices more briefly, using symbols: S ⇒ aSb ⇒ aaSbb ⇒ aababb. The language of the grammar is the set of all the strings that can be generated using this process: { a^n ba b^n : n ≥ 0 } = { ba, abab, aababb, aaababbb, … }.
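This generate-by-rewriting process is easy to mechanize. Below is a minimal Python sketch that enumerates, breadth-first, all terminal strings up to a given length; the rule encoding (uppercase letters as nonterminals) and the crude length bound are simplifying assumptions for illustration, valid because these rules never shrink a sentential form.

    from collections import deque

    def generate(rules, start, max_len):
        # rules: list of (nonterminal, replacement) pairs, e.g. ("S", "aSb").
        # Returns all terminal strings of length <= max_len derivable from start.
        results, queue, seen = set(), deque([start]), {start}
        while queue:
            form = queue.popleft()
            if not any(c.isupper() for c in form):      # terminals only
                if len(form) <= max_len:
                    results.add(form)
                continue
            for lhs, rhs in rules:
                i = form.find(lhs)
                while i != -1:                          # rewrite each occurrence
                    new = form[:i] + rhs + form[i + len(lhs):]
                    if len(new) <= max_len + 1 and new not in seen:
                        seen.add(new)
                        queue.append(new)
                    i = form.find(lhs, i + 1)
        return results

    print(sorted(generate([("S", "aSb"), ("S", "ba")], "S", 8)))
    # ['aaababbb', 'aababb', 'abab', 'ba']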
3.2. Context-free language In formal language theory, a context-free language (CFL) is a language generated by a context-free grammar (CFG). Context-free languages have many applications in programming languages, in particular, most arithmetic expressions are generated by context-free grammars.
Context-free parsing Main article: Parsing The context-free nature of the language makes it simple to parse with a pushdown automaton. Determining an instance of the membership problem, i.e. given a string w, determining whether w ∈ L(G), where L(G) is the language generated by a given grammar G, is also known as recognition. Context-free recognition for Chomsky normal form grammars was shown by Leslie G. Valiant to be reducible to boolean matrix multiplication, thus inheriting its complexity upper bound of O(n^2.3728639).[2][3][note 2] Conversely, Lillian Lee has shown O(n^(3−ε)) boolean matrix multiplication to be reducible to O(n^(3−3ε)) CFG parsing, thus establishing some kind of lower bound for the latter.[4] Practical uses of context-free languages also require producing a derivation tree that exhibits the structure that the grammar associates with the given string. The process of producing this tree is called parsing. Known parsers have a time complexity that is cubic in the size of the string that is parsed. Formally, the set of all context-free languages is identical to the set of languages accepted by pushdown automata (PDA). Parser algorithms for context-free languages include the CYK algorithm and Earley's algorithm. A special subclass of context-free languages are the deterministic context-free languages, which are defined as the set of languages accepted by a deterministic pushdown automaton and can be parsed by an LR(k) parser.[5] See also parsing expression grammar as an alternative approach to grammar and parser.
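As an illustration of recognition in cubic time, here is a minimal Python sketch of the CYK algorithm for a grammar in Chomsky normal form; the grammar encoding is invented, and the example grammar generates { a^n b^n : n ≥ 1 }.

    def cyk(word, terminal_rules, binary_rules, start):
        # CYK membership test for a CNF grammar.
        # terminal_rules: {A: set of terminal symbols}
        # binary_rules:   {A: set of (B, C) pairs}
        n = len(word)
        if n == 0:
            return False                    # a rule S -> eps needs separate handling
        # T[i][j] = set of nonterminals deriving word[i..j] (inclusive)
        T = [[set() for _ in range(n)] for _ in range(n)]
        for i, c in enumerate(word):
            T[i][i] = {A for A, ts in terminal_rules.items() if c in ts}
        for length in range(2, n + 1):
            for i in range(n - length + 1):
                j = i + length - 1
                for k in range(i, j):       # split point: word[i..k] + word[k+1..j]
                    for A, pairs in binary_rules.items():
                        if any(B in T[i][k] and C in T[k + 1][j] for B, C in pairs):
                            T[i][j].add(A)
        return start in T[0][n - 1]

    # CNF grammar for { a^n b^n : n >= 1 }: S -> AB | AT, T -> SB, A -> a, B -> b
    terminal_rules = {"A": {"a"}, "B": {"b"}}
    binary_rules = {"S": {("A", "B"), ("A", "T")}, "T": {("S", "B")}}
    assert cyk("aabb", terminal_rules, binary_rules, "S")
    assert not cyk("abab", terminal_rules, binary_rules, "S")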
Closure Context-free languages are closed under the following operations. That is, if L and P are context-free languages, the following languages are context-free as well:
the union of L and P
the reversal of L
the concatenation of L and P
the Kleene star of L
the image of L under a homomorphism
the image of L under an inverse homomorphism
the cyclic shift of L (the language { vu : uv ∈ L })
Context-free languages are not closed under complement, intersection, or difference. This was proved by Scheinberg in 1960.[6] However, if L is a context-free language and D is a regular language then both their intersection and their difference are context-free languages.
Nonclosure under intersection, complement, and difference: The context-free languages are not closed under intersection. This can be seen by taking the languages A = { a^n b^n c^m : m, n ≥ 0 } and B = { a^m b^n c^n : m, n ≥ 0 }, which are both context-free.[note 3] Their intersection is A ∩ B = { a^n b^n c^n : n ≥ 0 }, which can be shown to be non-context-free by the pumping lemma for context-free languages. Context-free languages are also not closed under complementation, as for any languages A and B their intersection can be expressed by union and complement: A ∩ B = (A^c ∪ B^c)^c. Context-free languages are also not closed under difference, since the complement can be expressed as a difference: L^c = Σ* \ L.[6]
Languages that are not context-free The set { a^n b^n c^n : n ≥ 1 } is a context-sensitive language, but there does not exist a context-free grammar generating this language.[19] So there exist context-sensitive languages which are not context-free. To prove that a given language is not context-free, one may employ the pumping lemma for context-free languages[18] or a number of other methods, such as Ogden's lemma or Parikh's theorem.[20]
3.3. Ambiguous grammar An ambiguous grammar is a context-free grammar for which there exists a string that can have more than one leftmost derivation, while an unambiguous grammar is a context-free grammar for which every valid string has a unique leftmost derivation or parse tree. Many languages admit both ambiguous and unambiguous grammars, while some languages admit only ambiguous grammars. Any non-empty language admits an ambiguous grammar by taking an unambiguous grammar and introducing a duplicate rule or synonym (the only language without ambiguous grammars is the empty language). For computer programming languages, the reference grammar is often ambiguous, due to issues such as the dangling else problem. If present, these ambiguities are generally resolved by adding precedence rules or other context-sensitive parsing rules, so the overall phrase grammar is unambiguous.[citation needed] The set of all parse trees for an ambiguous sentence is called a parse forest.[1]
Examples Trivial language The simplest example is the following ambiguous grammar for the trivial language, which consists of only the empty string: A→A|ε …meaning that a production can either be itself again, or the empty string. Thus the empty string has leftmost derivations of length 1, 2, 3, and indeed of any length, depending on how many times the rule A → A is used. This language also has the unambiguous grammar, consisting of a single production rule: A→ε …meaning that the unique production can only produce the empty string, which is the unique string in the language.
In the same way, any grammar for a non-empty language can be made ambiguous by adding duplicates.
Unary string The regular language of unary strings of a given character, say 'a' (the regular expression a*), has the unambiguous grammar:
A → aA | ε
…but also has the ambiguous grammar:
A → aA | Aa | ε
These correspond to producing a right-associative tree (for the unambiguous grammar) or allowing both left- and right-association. This is elaborated below.
Addition and subtraction The context-free grammar
A → A + A | A − A | a
is ambiguous, since there are two leftmost derivations for the string a + a + a:
A → A + A → a + A → a + A + A → a + a + A → a + a + a
A → A + A → A + A + A → a + A + A → a + a + A → a + a + a
(in the second derivation, the first A is replaced by A + A; replacing the second A instead would yield a similar derivation)
As another example, the grammar above is ambiguous since there are two parse trees for the string a + a − a: one groups it as (a + a) − a, the other as a + (a − a).
The language that it generates, however, is not inherently ambiguous; the following is a non-ambiguous grammar generating the same language: A→A+a|A−a|a
Dangling else Main article: Dangling else A common example of ambiguity in computer programming languages is the dangling else problem. In many languages, the else in an if–then(–else) statement is optional, which results in nested conditionals having multiple ways of being recognized in terms of the context-free grammar. Concretely, in many languages one may write conditionals in two valid forms: the if-then form, and the if-then-else form; in effect, making the else clause optional.[note 1] In a grammar containing the rules Statement → if Condition then Statement | if Condition then Statement else Statement | ... Condition → ...
some ambiguous phrase structures can appear. The expression
if a then if b then s else s2
can be parsed as either
if a then begin if b then s end else s2
or as
if a then begin if b then s else s2 end
depending on whether the else is associated with the first if or the second if.
This is resolved in various ways in different languages. Sometimes the grammar is modified so that it is unambiguous, such as by requiring an endif statement or making else mandatory. In other cases the grammar is left ambiguous, but the ambiguity is resolved by making the overall phrase grammar contextsensitive, such as by associating an else with the nearest if. In this latter case the grammar is unambiguous, but the context-free grammar is ambiguous. [clarification needed]
Counter-example The simple grammar
S → A + A
A → 0 | 1
is an unambiguous grammar for the language { 0+0, 0+1, 1+0, 1+1 }. While each of these four strings has only one leftmost derivation, it has two different derivations, for example
S ⇒ A + A ⇒ 0 + A ⇒ 0 + 0
and
S ⇒ A + A ⇒ A + 0 ⇒ 0 + 0
Only the former one is a leftmost derivation.
Recognizing ambiguous grammars The decision problem of whether an arbitrary grammar is ambiguous is undecidable, because it can be shown to be equivalent to the Post correspondence problem.[2] There are, however, tools implementing semi-decision procedures for detecting ambiguity of context-free grammars.[3] The efficiency of context-free grammar parsing is determined by the automaton that accepts it. Deterministic context-free grammars are accepted by deterministic pushdown automata and can be parsed in linear time, for example by the LR parser.[4] They are a subset of the context-free grammars, which are accepted by pushdown automata and can be parsed in polynomial time, for example by the CYK algorithm. Unambiguous context-free grammars can be nondeterministic. For example, the language of even-length palindromes on the alphabet of 0 and 1 has the unambiguous context-free grammar S → 0S0 | 1S1 | ε. An arbitrary string of this language cannot be parsed without reading all its letters first, which means that a pushdown automaton has to try alternative state transitions to accommodate the different possible lengths of a semi-parsed string.[5] Nevertheless, removing grammar ambiguity may produce a deterministic context-free grammar and thus allow for more efficient parsing. Compiler generators such as YACC include features for resolving some kinds of ambiguity, such as by using precedence and associativity constraints.
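Undecidability does not preclude semi-decision procedures: one can count leftmost derivations of candidate strings, reporting ambiguity as soon as some string has two. The Python sketch below does this by brute force for ε-free grammars (the pruning step relies on that assumption; all names are invented):

    def count_leftmost(rules, form, target):
        # rules: dict nonterminal -> list of right-hand-side strings (no ε-rules,
        # so sentential forms never shrink and forms longer than target are dead).
        if len(form) > len(target):
            return 0
        for i, c in enumerate(form):
            if c in rules:                              # leftmost nonterminal
                return sum(count_leftmost(rules, form[:i] + rhs + form[i + 1:],
                                          target)
                           for rhs in rules[c])
        return 1 if form == target else 0               # all terminals

    # The grammar A -> A + A | A - A | a is ambiguous:
    rules = {"A": ["A+A", "A-A", "a"]}
    print(count_leftmost(rules, "A", "a+a+a"))          # 2 leftmost derivations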
Inherently ambiguous languages The existence of inherently ambiguous languages was proven with Parikh's theorem in 1961 by Rohit Parikh in an MIT research report.[6] While some context-free languages (the set of strings that can be generated by a grammar) have both ambiguous and unambiguous grammars, there exist context-free languages for which no unambiguous context-free grammar can exist. An example of an inherently ambiguous language is the union of { a^n b^m c^m d^n : n, m > 0 } with { a^n b^n c^m d^m : n, m > 0 }. This set is context-free, since the union of two context-free languages is always context-free. But Hopcroft & Ullman (1979) give a proof that there is no way to unambiguously parse strings in the (non-context-free) common subset { a^n b^n c^n d^n : n > 0 }.[7]
3.4. Chomsky normal form In formal language theory, a context-free grammar G is said to be in Chomsky normal form (first
described by Noam Chomsky)[1] if all of its production rules are of the form:[2]:92–93,106 A → BC, or A → a, or S → ε, where A, B, and C are nonterminal symbols, a is a terminal symbol (a symbol that represents a constant value), S is the start symbol, and ε denotes the empty string. Also, neither B nor C may be the start symbol, and the third production rule can only appear if ε is in L(G), the language produced by the context-free grammar G. Every grammar in Chomsky normal form is context-free, and conversely, every context-free grammar can be transformed into an equivalent one[note 1] which is in Chomsky normal form and has a size no larger than the square of the original grammar's size.
Converting a grammar to Chomsky normal form To convert a grammar to Chomsky normal form, a sequence of simple transformations is applied in a certain order; this is described in most textbooks on automata theory. [2]:87–94[3][4][5] The presentation here follows Hopcroft, Ullman (1979), but is adapted to use the transformation names from Lange, Leiß (2009).[6][note 2] Each of the following transformations establishes one of the properties required for Chomsky normal form.
START: Eliminate the start symbol from right-hand sides Introduce a new start symbol S0, and a new rule S0 → S, where S is the previous start symbol. This doesn't change the grammar's produced language, and S0 won't occur on any rule's right-hand side.
TERM: Eliminate rules with nonsolitary terminals To eliminate each rule A → X1 ... a ... Xn with a terminal symbol a being not the only symbol on the right-hand side, introduce, for every such terminal, a new nonterminal symbol Na, and a new rule Na → a. Change every rule A → X1 ... a ... Xn to A → X1 ... Na ... Xn. If several terminal symbols occur on the right-hand side, simultaneously replace each of them by its associated nonterminal symbol. This doesn't change the grammar's produced language. [2]:92
BIN: Eliminate right-hand sides with more than 2 nonterminals Replace each rule A → X1 X2 ... Xn with more than 2 nonterminals X1,...,Xn by rules A → X1 A1, A1 → X2 A2, ... , An-2 → Xn-1 Xn,
where Ai are new nonterminal symbols. Again, this doesn't change the grammar's produced language. [2]:93
DEL: Eliminate ε-rules An ε-rule is a rule of the form A → ε, where A is not S0, the grammar's start symbol. To eliminate all rules of this form, first determine the set of all nonterminals that derive ε. Hopcroft and Ullman (1979) call such nonterminals nullable, and compute them as follows: If a rule A → ε exists, then A is nullable. If a rule A → X1 ... Xn exists, and every single Xi is nullable, then A is nullable, too. Obtain an intermediate grammar by replacing each rule A → X1 ... Xn by all versions with some nullable Xi omitted. By deleting in this grammar each ε-rule, unless its left-hand side is the start symbol, the transformed grammar is obtained.[2]:90 For example, in the following grammar, with start symbol S0,
S0 → AbB | C
B → AA | AC
C → b | c
A → a | ε
the nonterminal A, and hence also B, is nullable, while neither C nor S0 is. Hence the following intermediate grammar is obtained, listing for each rule all versions with some nullable occurrences omitted:[note 3]
S0 → AbB | Ab | bB | b | C
B → AA | A | ε | AC | C
C → b | c
A → a | ε
In this grammar, all ε-rules have been "inlined at the call site".[note 4] In the next step, they can hence be deleted, yielding the grammar:
S0 → AbB | Ab | bB | b | C
B → AA | A | AC | C
C → b | c
A → a
This grammar produces the same language as the original example grammar, viz. {ab, aba, abaa, abab, abac, abb, abc, b, bab, bac, bb, bc, c}, but apparently has no ε-rules.
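The nullable computation used by DEL is a plain fixpoint iteration. A minimal sketch, assuming the grammar is encoded as (left-hand side, right-hand side) pairs with the empty tuple standing for ε; on the example grammar above it yields {A, B}, as stated.

def nullable_nonterminals(rules):
    """Iterate to a fixpoint: A is nullable if A -> eps exists, or if some
    rule A -> X1...Xn has every Xi already known to be nullable."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for lhs, rhs in rules:
            if lhs not in nullable and all(x in nullable for x in rhs):
                nullable.add(lhs)
                changed = True
    return nullable

rules = [("S0", ("A", "b", "B")), ("S0", ("C",)),
         ("B", ("A", "A")), ("B", ("A", "C")),
         ("C", ("b",)), ("C", ("c",)),
         ("A", ("a",)), ("A", ())]          # () encodes A -> eps
print(nullable_nonterminals(rules))         # {'A', 'B'}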
UNIT: Eliminate unit rules A unit rule is a rule of the form A → B, where A, B are nonterminal symbols. To remove it, for each rule B → X1 ... Xn, where X1 ... Xn is a string of nonterminals and terminals, add rule A → X1 ... Xn unless this is a unit rule which has already been (or is being) removed.
Order of transformations
When choosing the order in which the above transformations are to be applied, it has to be considered that some transformations may destroy the result achieved by other ones. For example, START will re-introduce a unit rule if it is applied after UNIT. The following table shows which orderings are admitted: transformation X always preserves (✓) resp. may destroy (✗) the result of Y.

Mutual preservation of transformation results
X \ Y    START   TERM   BIN    DEL    UNIT
START      -      ✓      ✓      ✗      ✗
TERM       ✓      -      ✗      ✓      ✓
BIN        ✓      ✓      -      ✓      ✓
DEL        ✓      ✓      ✓      -      ✗
UNIT       ✓      ✓      ✓     (✓)*    -
*UNIT preserves the result of DEL if START had been called before.

Moreover, the worst-case bloat in grammar size[note 5] depends on the transformation order. Using |G| to denote the size of the original grammar G, the size blow-up in the worst case may range from |G|^2 to 2^(2|G|), depending on the transformation algorithm used.[6]:7 The blow-up in grammar size depends on the order between DEL and BIN. It may be exponential when DEL is done first, but is linear otherwise. UNIT can incur a quadratic blow-up in the size of the grammar.[6]:5 The orderings START, TERM, BIN, DEL, UNIT and START, BIN, DEL, UNIT, TERM lead to the least (i.e. quadratic) blow-up.
Example
Abstract syntax tree of the arithmetic expression "a^2+4*b" wrt. the example grammar (top) and its Chomsky normal form (bottom)
The following grammar, with start symbol Expr, describes a simplified version of the set of all syntactically valid arithmetic expressions in programming languages like C or Algol60. Both number and variable are considered terminal symbols here for simplicity, since in a compiler front end their internal structure is usually not considered by the parser. The terminal symbol "^" denoted exponentiation in Algol60.

Expr    → Term | Expr AddOp Term | AddOp Term
Term    → Factor | Term MulOp Factor
Factor  → Primary | Factor ^ Primary
Primary → number | variable | ( Expr )
AddOp   → + | −
MulOp   → * | /
In step "START" of the above conversion algorithm, just a rule S0→Expr is added to the grammar. After step "TERM", the grammar looks like this: S0
→ Expr → Term → Factor → Primary → AddOp → MulOp → PowOp → Open → Close →
Expr Term | Factor | Primary | number | + | * | ^ ( )
Expr AddOp Term | AddOp Term Term MulOp Factor Factor PowOp Primary variable | Open Expr Close − /
After step "BIN", the following grammar is obtained: S0 Expr
42 of 76
→ Expr → Term
| Expr AddOp_Term
| AddOp Term
Term → Factor → Primary → AddOp → MulOp → PowOp → Open → Close → AddOp_Term → MulOp_Factor → PowOp_Primary → Expr_Close →
Factor | Term MulOp_Factor Primary | Factor PowOp_Primary number | variable | Open Expr_Close + |− * |/ ^ ( ) AddOp Term MulOp Factor PowOp Primary Expr Close
Since there are no ε-rules, step "DEL" doesn't change the grammar. After step "UNIT", the following grammar is obtained, which is in Chomsky normal form: | → | | Open | Factor | Term | Expr AddOp number variable Expr_Close PowOp_Primary MulOp_Factor AddOp_Term Term | → | | Open | Factor | Term | Expr Expr AddOp number variable Expr_Close PowOp_Primary MulOp_Factor AddOp_Term Term → | | Open | Factor | Term Term number variable Expr_Close PowOp_Primary MulOp_Factor → | | Open | Factor Factor number variable Expr_Close PowOp_Primary → | | Open Primary number variable Expr_Close AddOp →+ |− MulOp →* |/ PowOp →^ Open →( Close →) AddOp_Term → AddOp Term MulOp_Factor → MulOp Factor PowOp_Primary → PowOp Primary Expr_Close → Expr Close S0
The Na introduced in step "TERM" are PowOp, Open, and Close. The Ai introduced in step "BIN" are AddOp_Term, MulOp_Factor, PowOp_Primary, and Expr_Close.
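Once a grammar is in Chomsky normal form, membership can be decided in cubic time by the CYK algorithm mentioned earlier. A minimal sketch; for brevity it uses a small hypothetical CNF grammar for { 0^n 1^n | n ≥ 1 } instead of the full arithmetic-expression grammar above.

def cyk(word, start, rules):
    """CYK membership test. rules: list of (lhs, rhs) in Chomsky normal
    form, rhs either a 1-tuple (terminal) or a 2-tuple (two nonterminals)."""
    n = len(word)
    derives = {}      # (i, length) -> set of nonterminals deriving word[i:i+length]
    for i, a in enumerate(word):
        derives[(i, 1)] = {lhs for lhs, rhs in rules if rhs == (a,)}
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            cell = set()
            for split in range(1, length):
                left = derives[(i, split)]
                right = derives[(i + split, length - split)]
                for lhs, rhs in rules:
                    if len(rhs) == 2 and rhs[0] in left and rhs[1] in right:
                        cell.add(lhs)
            derives[(i, length)] = cell
    return start in derives.get((0, n), set())

# CNF grammar for {0^n 1^n | n >= 1}: S -> ZA | ZO, A -> SO, Z -> 0, O -> 1
rules = [("S", ("Z", "A")), ("S", ("Z", "O")), ("A", ("S", "O")),
         ("Z", ("0",)), ("O", ("1",))]
print(cyk("0011", "S", rules), cyk("0010", "S", rules))   # True False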
3.5. Greibach normal form In formal language theory, a context-free grammar is in Greibach normal form (GNF) if the right-hand sides of all production rules start with a terminal symbol, optionally followed by some variables. A non-strict form allows one exception to this format restriction for allowing the empty word (epsilon, ε) to be a member of the described language. The normal form was established by Sheila Greibach and it bears her name. More precisely, a context-free grammar is in Greibach normal form, if all production rules are of the form:
A → a A1 A2 ... An
or
S → ε
where A is a nonterminal symbol, a is a terminal symbol, A1 A2 ... An is a (possibly empty) sequence of nonterminal symbols not including the start symbol, S is the start symbol, and ε is the empty word.[1] Observe that the grammar does not have left recursions. Every context-free grammar can be transformed into an equivalent grammar in Greibach normal form.[2] Various constructions exist. Some do not permit the second form of rule and cannot transform context-free
grammars that can generate the empty word. For one such construction the size of the constructed grammar is O(n^4) in the general case and O(n^3) if no derivation of the original grammar consists of a single nonterminal symbol, where n is the size of the original grammar.[3] This conversion can be used to prove that every context-free language can be accepted by a real-time pushdown automaton, i.e., the automaton reads a letter from its input every step. Given a grammar in GNF and a derivable string in the grammar with length n, any top-down parser will halt at depth n.
3.6. Pumping lemma for context-free languages The pumping lemma for context-free languages, also known as the Bar-Hillel lemma, is a lemma that gives a property shared by all context-free languages and generalizes the pumping lemma for regular languages. As the pumping lemma does not suffice to guarantee that a language is context-free, there are more stringent necessary conditions, such as Ogden's lemma.
Proof idea: If s is sufficiently long, its derivation tree w.r.t. a Chomsky normal form grammar must contain some nonterminal N twice on some tree path (upper picture). Repeating n times the derivation part N ⇒ ... ⇒ vNx obtains a derivation for u v^n w x^n y (lower left and right picture for n = 0 and 2, respectively).
Formally: if a language L is context-free, then there exists some integer p ≥ 1 (called a "pumping length"[1]) such that every string s in L that has a length of p or more symbols (i.e. with |s| ≥ p) can be written as s = uvwxy with substrings u, v, w, x and y, such that
1. |vwx| ≤ p,
2. |vx| ≥ 1, and
3. u v^n w x^n y ∈ L for all n ≥ 0.
Informal statement and explanation The pumping lemma for context-free languages (called just "the pumping lemma" for the rest of this
article) describes a property that all context-free languages are guaranteed to have. The property is a property of all strings in the language that are of length at least p, where p is a constant, called the pumping length, that varies between context-free languages. Say s is a string of length at least p that is in the language. The pumping lemma states that s can be split into five substrings, s = uvwxy, where vx is non-empty and the length of vwx is at most p, such that repeating v and x any (and the same) number of times in s produces a string that is still in the language (it is possible and often useful to repeat zero times, which removes v and x from the string). This process of "pumping up" additional copies of v and x is what gives the pumping lemma its name. Finite languages (which are regular and hence context-free) obey the pumping lemma trivially by having p equal to the maximum string length in L plus one. As there are no strings of this length the pumping lemma is not violated.
Usage of the lemma The pumping lemma is often used to prove that a given language L is non-context-free, by showing that arbitrarily long strings s are in L that cannot be "pumped" without producing strings outside L. For example, the language L = { a^n b^n c^n | n > 0 } can be shown to be non-context-free by using the pumping lemma in a proof by contradiction. First, assume that L is context free. By the pumping lemma, there exists an integer p which is the pumping length of language L. Consider the string s = a^p b^p c^p in L. The pumping lemma tells us that s can be written in the form s = uvwxy, where u, v, w, x, and y are substrings, such that |vwx| ≤ p, |vx| ≥ 1, and u v^i w x^i y ∈ L for every integer i ≥ 0. By the choice of s and the fact that |vwx| ≤ p, it is easily seen that the substring vwx can contain no more than two distinct symbols. That is, we have one of five possibilities for vwx:
1. vwx = a^j for some j ≤ p.
2. vwx = a^j b^k for some j and k with j+k ≤ p.
3. vwx = b^j for some j ≤ p.
4. vwx = b^j c^k for some j and k with j+k ≤ p.
5. vwx = c^j for some j ≤ p.
For each case, it is easily verified that u v^i w x^i y does not contain equal numbers of each letter for any i ≠ 1. Thus, u v^2 w x^2 y does not have the form a^i b^i c^i. This contradicts the definition of L. Therefore, our initial assumption that L is context free must be false. While the pumping lemma is often a useful tool to prove that a given language is not context-free, it does not give a complete characterization of the context-free languages. If a language does not satisfy the condition given by the pumping lemma, we have established that it is not context-free. On the other hand, there are languages that are not context-free, but still satisfy the condition given by the pumping lemma, for example L = { b^j c^k d^l | j, k, l ∈ ℕ } ∪ { a^i b^j c^j d^j | i, j ∈ ℕ, i ≥ 1 }: for s = b^j c^k d^l with e.g. j ≥ 1 choose vwx to consist only of b's; for s = a^i b^j c^j d^j choose vwx to consist only of a's; in both cases all pumped strings are still in L.[2]
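For a fixed candidate pumping length the case analysis can even be checked mechanically. The sketch below enumerates every decomposition s = uvwxy of s = a^p b^p c^p with |vwx| ≤ p and |vx| ≥ 1 and confirms that pumping with n = 0 or n = 2 always produces a string outside L; the value p = 4 is an illustrative choice only, since the lemma quantifies over all p.

def in_L(s):
    """Membership in L = { a^n b^n c^n | n > 0 }."""
    n = len(s) // 3
    return n > 0 and s == "a" * n + "b" * n + "c" * n

def decompositions(s, p):
    """All splits s = u v w x y with |vwx| <= p and |vx| >= 1."""
    for i in range(len(s) + 1):                  # u = s[:i]
        end = min(i + p, len(s))                 # vwx must end by position i+p
        for j in range(i, end + 1):              # v = s[i:j]
            for k in range(j, end + 1):          # w = s[j:k]
                for l in range(k, end + 1):      # x = s[k:l]
                    if (j - i) + (l - k) >= 1:   # |vx| >= 1
                        yield s[:i], s[i:j], s[j:k], s[k:l], s[l:]

p = 4
s = "a" * p + "b" * p + "c" * p
ok = all(any(not in_L(u + v * n + w + x * n + y) for n in (0, 2))
         for u, v, w, x, y in decompositions(s, p))
print(ok)   # True: no decomposition of s can be pumped inside L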
4. Pushdown automaton A pushdown automaton (PDA) is a type of automaton that employs a stack. Pushdown automata are used in theories about what can be computed by machines. Deterministic pushdown automata can recognize all deterministic context-free languages while nondeterministic ones can recognize all context-free languages, with the former often used in parser design. The term "pushdown" refers to the fact that the stack can be regarded as being "pushed down" like a tray dispenser at a cafeteria, since the operations never work on elements other than the top element. A stack automaton, by contrast, does allow access to and operations on deeper elements. Stack automata can
recognize a strictly larger set of languages than pushdown automata.[1] A nested stack automaton allows full access, and also allows stacked values to be entire sub-stacks rather than just single finite symbols.
A diagram of a pushdown automaton A finite state machine just looks at the input signal and the current state: it has no stack to work with. It chooses a new state, the result of following the transition. A pushdown automaton (PDA) differs from a finite state machine in two ways: 1. It can use the top of the stack to decide which transition to take. 2. It can manipulate the stack as part of performing a transition. A pushdown automaton reads a given input string from left to right. In each step, it chooses a transition by indexing a table by input symbol, current state, and the symbol at the top of the stack. A pushdown automaton can also manipulate the stack, as part of performing a transition. The manipulation can be to push a particular symbol to the top of the stack, or to pop off the top of the stack. The automaton can alternatively ignore the stack, and leave it as it is. Put together: Given an input symbol, current state, and stack symbol, the automaton can follow a transition to another state, and optionally manipulate (push or pop) the stack. If, in every situation, at most one such transition action is possible, then the automaton is called a deterministic pushdown automaton (DPDA). In general, if several actions are possible, then the automaton is called a general, or nondeterministic, PDA. A given input string may drive a nondeterministic pushdown automaton to one of several configuration sequences; if one of them leads to an accepting configuration after reading the complete input string, the latter is said to belong to the language accepted by the automaton.
Formal definition
We use standard formal language notation: Σ* denotes the set of strings over alphabet Σ, and ε denotes the empty string.
A PDA is formally defined as a 7-tuple M = (Q, Σ, Γ, δ, q0, Z, F), where
Q is a finite set of states,
Σ is a finite set which is called the input alphabet,
Γ is a finite set which is called the stack alphabet,
δ is a finite subset of Q × (Σ ∪ {ε}) × Γ × Q × Γ*, the transition relation,
q0 ∈ Q is the start state,
Z ∈ Γ is the initial stack symbol,
F ⊆ Q is the set of accepting states.
An element (p, a, A, q, α) ∈ δ is a transition of M. It has the intended meaning that M, in state p, on the input a and with A as topmost stack symbol, may read a, change the state to q, and pop A, replacing it by pushing α. The (Σ ∪ {ε}) component of the transition relation is used to formalize that the PDA can either read a letter from the input, or proceed leaving the input untouched. In many texts the transition relation is replaced by an (equivalent) formalization, where δ: Q × (Σ ∪ {ε}) × Γ → P(Q × Γ*) is the transition function, mapping Q × (Σ ∪ {ε}) × Γ into finite subsets of Q × Γ*. Here δ(p, a, A) contains all possible actions in state p with A on the stack, while reading a on the input. One writes, for example, (q, BA) ∈ δ(p, a, A) precisely when (p, a, A, q, BA) ∈ δ. Note that finite in this definition is essential.
Computations

A step of the pushdown automaton
In order to formalize the semantics of the pushdown automaton a description of the current situation is introduced. Any 3-tuple (p, w, β) ∈ Q × Σ* × Γ* is called an instantaneous description (ID) of M, which includes the current state, the part of the input tape that has not been read, and the contents of the stack (topmost symbol written first). The transition relation δ defines the step-relation ⊢ of M on instantaneous descriptions. For instruction (p, a, A, q, α) ∈ δ there exists a step (p, ax, Aγ) ⊢ (q, x, αγ), for every x ∈ Σ* and every γ ∈ Γ*. In general pushdown automata are nondeterministic, meaning that in a given instantaneous description there may be several possible steps. Any of these steps can be chosen in a computation. With the above definition in each step always a single symbol (top of the stack) is popped, replacing it with as many symbols as necessary. As a consequence no step is defined when the stack is empty. Computations of the pushdown automaton are sequences of steps. The computation starts in the initial state q0 with the initial stack symbol Z on the stack, and a string w on the input tape, thus with initial description (q0, w, Z). There are two modes of accepting. The pushdown automaton either accepts by final state, which means after reading its input the automaton reaches an accepting state (in F), or it accepts by empty stack (ε), which means after reading its input the automaton empties its stack. The first acceptance mode uses the internal memory (state), the second the external memory (stack). Formally one defines
1. L(M) = { w ∈ Σ* | (q0, w, Z) ⊢* (f, ε, γ) with f ∈ F and γ ∈ Γ* }   (final state)
2. N(M) = { w ∈ Σ* | (q0, w, Z) ⊢* (q, ε, ε) with q ∈ Q }   (empty stack)
Here ⊢* represents the reflexive and transitive closure of the step relation ⊢, meaning any number of consecutive steps (zero, one or more).
For each single pushdown automaton these two languages need have no relation: they may be equal but usually this is not the case. A specification of the automaton should also include the intended mode of acceptance. Taken over all pushdown automata both acceptance conditions define the same family of languages.
Theorem. For each pushdown automaton M one may construct a pushdown automaton M' such that L(M') = N(M), and vice versa, for each pushdown automaton M one may construct a pushdown automaton M' such that N(M') = L(M).
Example
The following is the formal description of the PDA which recognizes the language { 0^n 1^n | n ≥ 0 } by final state:

PDA for { 0^n 1^n | n ≥ 0 } (by final state)
M = (Q, Σ, Γ, δ, p, Z, {r}), where
states: Q = {p, q, r}
input alphabet: Σ = {0, 1}
stack alphabet: Γ = {A, Z}
start state: p
start stack symbol: Z
accepting states: {r}

The transition relation δ consists of the following six instructions:
(p, 0, Z, p, AZ),
(p, 0, A, p, AA),
(p, ε, Z, q, Z),
(p, ε, A, q, A),
(q, 1, A, q, ε), and
(q, ε, Z, r, Z).
In words, the first two instructions say that in state p any time the symbol 0 is read, one A is pushed onto the stack. Pushing symbol A on top of another A is formalized as replacing top A by AA (and similarly for pushing symbol A on top of a Z). The third and fourth instructions say that, at any moment the automaton may move from state p to state q. The fifth instruction says that in state q, for each symbol 1 read, one A is popped. Finally, the sixth instruction says that the machine may move from state q to accepting state r only when the stack consists of a single Z. There seems to be no generally used representation for PDA. Here we have depicted the instruction (p, a, A, q, α) by an edge from state p to state q labelled by a; A/α (read a; replace A by α).
Understanding the computation process

Accepting computation for 0011
The following illustrates how the above PDA computes on different input strings. The subscript M from the step symbol ⊢ is here omitted.
a. Input string = 0011. There are various computations, depending on the moment the move from state p to state q is made. Only one of these is accepting.
   i. (p, 0011, Z) ⊢ (q, 0011, Z) ⊢ (r, 0011, Z). The final state is accepting, but the input is not accepted this way as it has not been read.
   ii. (p, 0011, Z) ⊢ (p, 011, AZ) ⊢ (q, 011, AZ). No further steps possible.
   iii. (p, 0011, Z) ⊢ (p, 011, AZ) ⊢ (p, 11, AAZ) ⊢ (q, 11, AAZ) ⊢ (q, 1, AZ) ⊢ (q, ε, Z) ⊢ (r, ε, Z). Accepting computation: ends in accepting state, while complete input has been read.
b. Input string = 00111. Again there are various computations. None of these is accepting.
   i. (p, 00111, Z) ⊢ (q, 00111, Z) ⊢ (r, 00111, Z). The final state is accepting, but the input is not accepted this way as it has not been read.
   ii. (p, 00111, Z) ⊢ (p, 0111, AZ) ⊢ (q, 0111, AZ). No further steps possible.
   iii. (p, 00111, Z) ⊢ (p, 0111, AZ) ⊢ (p, 111, AAZ) ⊢ (q, 111, AAZ) ⊢ (q, 11, AZ) ⊢ (q, 1, Z) ⊢ (r, 1, Z). The final state is accepting, but the input is not accepted this way as it has not been (completely) read.
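These computations can be reproduced with a breadth-first search over instantaneous descriptions. A minimal sketch of the example PDA, with ε written as the empty string; the function and variable names are illustrative.

from collections import deque

DELTA = [                      # the six instructions (state, input, pop, state, push)
    ("p", "0", "Z", "p", "AZ"),
    ("p", "0", "A", "p", "AA"),
    ("p", "",  "Z", "q", "Z"),
    ("p", "",  "A", "q", "A"),
    ("q", "1", "A", "q", ""),
    ("q", "",  "Z", "r", "Z"),
]

def accepts_by_final_state(word, start="p", stack="Z", final={"r"}):
    """Search the IDs (state, unread input, stack) reachable from the start."""
    seen, queue = set(), deque([(start, word, stack)])
    while queue:
        ident = queue.popleft()
        if ident in seen:
            continue
        seen.add(ident)
        state, rest, stk = ident
        if not rest and state in final:
            return True
        for p, a, A, q, alpha in DELTA:
            # startswith("") is True, so epsilon-moves need no special case
            if p == state and stk.startswith(A) and rest.startswith(a):
                queue.append((q, rest[len(a):], alpha + stk[len(A):]))
    return False

print(accepts_by_final_state("0011"))    # True
print(accepts_by_final_state("00111"))   # False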
PDA and context-free languages
Every context-free grammar can be transformed into an equivalent nondeterministic pushdown automaton. The derivation process of the grammar is simulated in a leftmost way. Where the grammar rewrites a nonterminal, the PDA takes the topmost nonterminal from its stack and replaces it by the right-hand part of a grammatical rule (expand). Where the grammar generates a terminal symbol, the PDA reads a symbol from input when it is the topmost symbol on the stack (match). In a sense the stack of the PDA contains the unprocessed data of the grammar, corresponding to a pre-order traversal of a derivation tree. Technically, given a context-free grammar, the PDA has a single state, 1, and its transition relation is constructed as follows.
1. (1, ε, A, 1, α) for each rule A → α (expand)
2. (1, a, a, 1, ε) for each terminal symbol a (match)
The PDA accepts by empty stack. Its initial stack symbol is the grammar's start symbol. [citation needed]
For a context-free grammar in Greibach normal form, defining (1,γ) ∈ δ(1,a,A) for each grammar rule A → aγ also yields an equivalent nondeterministic pushdown automaton.[2]:115
The converse, finding a grammar for a given PDA, is not that easy. The trick is to code two states of the PDA into the nonterminals of the grammar.
Theorem. For each pushdown automaton M one may construct a context-free grammar G such that N(M) = L(G).[2]:116
The language of strings accepted by a deterministic pushdown automaton is called a deterministic context-free language. Not all context-free languages are deterministic.[note 1] As a consequence, the DPDA is a strictly weaker variant of the PDA and there exists no algorithm for converting a PDA to an equivalent DPDA, if such a DPDA exists. [citation needed]
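The Greibach-normal-form variant of the construction above is the easiest to animate, because every transition consumes one input letter, so a search over configurations stops after |input| rounds. A sketch under illustrative assumptions: a hypothetical GNF grammar for { 0^n 1^n | n ≥ 1 } (S → 0ST | 0T, T → 1), single state "1", and acceptance by empty stack.

def gnf_to_pda(rules):
    """A GNF rule A -> a gamma becomes the transition (1, a, A, 1, gamma):
    read a, pop A, push gamma."""
    return [("1", a, A, "1", tuple(gamma)) for A, (a, *gamma) in rules]

def accepts_by_empty_stack(word, delta, start_symbol="S"):
    frontier = {(word, (start_symbol,))}          # (unread input, stack) pairs
    while frontier:
        if ("", ()) in frontier:                  # input consumed, stack empty
            return True
        frontier = {(rest[1:], gamma + stack[1:])
                    for rest, stack in frontier if rest and stack
                    for _, a, A, _, gamma in delta
                    if rest[0] == a and stack[0] == A}
    return False

delta = gnf_to_pda([("S", "0ST"), ("S", "0T"), ("T", "1")])
print(accepts_by_empty_stack("0011", delta))   # True
print(accepts_by_empty_stack("001", delta))    # False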
Generalized pushdown automaton (GPDA)
A GPDA is a PDA which writes an entire string of some known length to the stack or removes an entire string from the stack in one step. A GPDA is formally defined as a 6-tuple M = (Q, Σ, Γ, δ, q0, F), where Q, Σ, Γ, q0 and F are defined the same way as for a PDA, and δ: Q × (Σ ∪ {ε}) × Γ* → P(Q × Γ*) is the transition function. Computation rules for a GPDA are the same as for a PDA, except that the popped and pushed items are now strings instead of single symbols.
GPDAs and PDAs are equivalent in that if a language is recognized by a PDA, it is also recognized by a GPDA and vice versa. One can formulate an analytic proof for the equivalence of GPDAs and PDAs using the following simulation: Let (q2, y1 y2 ... yn) ∈ δ(q1, w, x1 x2 ... xm) be a transition of the GPDA, where w ∈ Σ ∪ {ε}. Construct the following transitions for the PDA, introducing new intermediate states p1, ..., pm+n-1, which first pop the symbols x1, ..., xm one at a time and then push the symbols yn, ..., y1 one at a time:
(p1, ε) ∈ δ'(q1, w, x1)
(p2, ε) ∈ δ'(p1, ε, x2)
...
(pm, ε) ∈ δ'(pm-1, ε, xm)
(pm+1, yn) ∈ δ'(pm, ε, ε)
(pm+2, yn-1) ∈ δ'(pm+1, ε, ε)
...
(q2, y1) ∈ δ'(pm+n-1, ε, ε)
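This simulation is a purely local rewriting of transitions, so it is easy to express in code. A sketch under the stated assumptions: the GPDA transition pops at least one symbol, a single PDA step may push a whole string (as in the formal definition above), and the intermediate state names generated here are invented for illustration (a real construction must keep them globally fresh).

def split_gpda_transition(q1, w, xs, q2, ys):
    """Turn the GPDA transition (q2, ys) in delta(q1, w, xs) into PDA
    transitions (state, input, pop, next state, push): pop xs one symbol
    at a time through intermediate states, pushing ys on the last pop."""
    assert xs, "assumes at least one popped symbol"
    states = [q1] + [f"{q1}>{q2}#{i}" for i in range(len(xs) - 1)] + [q2]
    return [(states[i], w if i == 0 else "", x,
             states[i + 1], ys if i == len(xs) - 1 else "")
            for i, x in enumerate(xs)]

for t in split_gpda_transition("q1", "a", "XYZ", "q2", "UV"):
    print(t)
# ('q1', 'a', 'X', 'q1>q2#0', '')
# ('q1>q2#0', '', 'Y', 'q1>q2#1', '')
# ('q1>q2#1', '', 'Z', 'q2', 'UV')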
Stack automaton As a generalization of pushdown automata, Ginsburg, Greibach, and Harrison (1967) investigated stack automata, which may additionally step left or right in the input string (surrounded by special endmarker symbols to prevent slipping out), and step up or down in the stack in read-only mode.[4][5] A stack automaton is called nonerasing if it never pops from the stack. The class of languages accepted by nondeterministic, nonerasing stack automata is NSPACE(n^2), which is a superset of the context-sensitive languages.[1] The class of languages accepted by deterministic, nonerasing stack automata is DSPACE(n·log(n)).[1]
Alternating pushdown automata
An alternating pushdown automaton (APDA) is a pushdown automaton with a state set Q = Q∃ ∪ Q∀, where Q∃ ∩ Q∀ = ∅.
States in Q∃ and Q∀ are called existential resp. universal. In an existential state an APDA nondeterministically chooses the next state and accepts if at least one of the resulting computations accepts. In a universal state the APDA moves to all next states and accepts if all the resulting computations accept. The model was introduced by Chandra, Kozen and Stockmeyer.[6] Ladner, Lipton and Stockmeyer[7] proved that this model is equivalent to EXPTIME, i.e. a language is accepted by some APDA if and only if it can be decided by an exponential-time algorithm. Aizikowitz and Kaminski[8] introduced synchronized alternating pushdown automata (SAPDA) that are equivalent to conjunctive grammars in the same way as nondeterministic PDA are equivalent to context-free grammars.
4.1. Nested stack automaton
In automata theory, a nested stack automaton is a finite automaton that can make use of a stack containing data which can be additional stacks.[1] Like a stack automaton, a nested stack automaton may step up or down in the stack, and read the current symbol; in addition, it may at any place create a new stack, operate on that one, eventually destroy it, and continue operating on the old stack. This way, stacks can be nested recursively to an arbitrary depth; however, the automaton always operates on the innermost stack only. A nested stack automaton is capable of recognizing an indexed language,[2] and in fact the class of indexed languages is exactly the class of languages accepted by one-way nondeterministic nested stack automata.[1][3] Nested stack automata should not be confused with embedded pushdown automata, which have less computational power.[citation needed]
Formal definition
A (nondeterministic two-way) nested stack automaton is a tuple ‹Q,Σ,Γ,δ,q0,Z0,F,[,],⟧› where
Q, Σ, and Γ are nonempty finite sets of states, input symbols, and stack symbols, respectively,
[, ], and ⟧ are distinct special symbols not contained in Σ ∪ Γ:
[ is used as left endmarker for both the input string and a (sub)stack string,
] is used as right endmarker for these strings,
⟧ is used as the final endmarker of the string denoting the whole stack.[note 1]
An extended input alphabet is defined by Σ' = Σ ∪ {[,]}, an extended stack alphabet by Γ' = Γ ∪ {]}, and the set of input move directions by D = {-1,0,+1}.
δ, the finite control, is a mapping from Q × Σ' × (Γ' ∪ [Γ' ∪ {⟧, []}) into finite subsets of Q × D × ([Γ* ∪ D), such that δ maps[note 2]
Q × Σ' × [Γ into subsets of Q × D × [Γ* (pushdown mode),
Q × Σ' × Γ' into subsets of Q × D × D (reading mode),
Q × Σ' × [Γ' into subsets of Q × D × {+1} (reading mode),
Q × Σ' × {⟧} into subsets of Q × D × {-1} (reading mode),
Q × Σ' × (Γ' ∪ [Γ') into subsets of Q × D × [Γ*] (stack creation mode), and
Q × Σ' × {[]} into subsets of Q × D × {ε} (stack destruction mode).
Informally, the top symbol of a (sub)stack together with its preceding left endmarker "[" is viewed as a single symbol;[4] then δ reads the current state, the current input symbol, and the current stack symbol, and outputs the next state, the direction in which to move on the input, and the direction in which to move on the stack, or the string of symbols to replace the topmost stack symbol.
q0 ∈ Q is the initial state,
Z0 ∈ Γ is the initial stack symbol,
F ⊆ Q is the set of final states.
Configuration
A configuration, or instantaneous description, of such an automaton consists of a triple ‹ q, [a1 a2 ... ai ... an-1], [Z1 X2 ... Xj ... Xm-1] ›, where
q ∈ Q is the current state,
[a1 a2 ... ai ... an-1] is the input string; for convenience, a0 = [ and an = ] is defined.[note 3] The current position in the input, viz. i with 0 ≤ i ≤ n, is marked by underlining the respective symbol.
[Z1 X2 ... Xj ... Xm-1] is the stack, including substacks; for convenience, X1 = [Z1 [note 4] and Xm = ] is defined. The current position in the stack, viz. j with 1 ≤ j ≤ m, is marked by underlining the respective symbol.
Example
An example run (input string not shown); the action in each row leads to the stack shown in the next row:

Step  Stack                        Action
1:    [a b [k ] [p ] c ]           create substack
2:    [a b [k ] [p [r s ] ] c ]    pop
3:    [a b [k ] [p [s ] ] c ]      pop
4:    [a b [k ] [p [] ] c ]        destroy substack
5:    [a b [k ] [p ] c ]           move down
6:    [a b [k ] [p ] c ]           move up
7:    [a b [k ] [p ] c ]           move up
8:    [a b [k ] [p ] c ]           push
9:    [a b [k ] [n o p ] c ]
Properties
When automata are allowed to re-read their input ("two-way automata"), nested stacks do not result in additional language recognition capabilities, compared to plain stacks. [5] Gilman and Shapiro used nested stack automata to solve the word problem in certain groups.[6]
5. Turing machine A Turing machine is a mathematical model of computation that defines an abstract machine,[1] which manipulates symbols on a strip of tape according to a table of rules.[2] Despite the model's simplicity, given any computer algorithm, a Turing machine capable of simulating that algorithm's logic can be constructed.[3] The machine operates on an infinite[4] memory tape divided into discrete cells.[5] The machine positions its head over a cell and "reads" (scans)[6] the symbol there. Then, as per the symbol and its present place in a finite table[7] of user-specified instructions, the machine (i) writes a symbol (e.g., a digit or a letter from a finite alphabet) in the cell (some models allowing symbol erasure or no writing),[8] then (ii) either moves the tape one cell left or right (some models allow no motion, some models move the head),[9] then (iii) (as determined by the observed symbol and the machine's place in the table) either proceeds to a subsequent instruction or halts the computation.[10] The Turing machine was invented in 1936 by Alan Turing,[11][12] who called it an a-machine (automatic machine).[13] With this model, Turing was able to answer two questions in the negative: (1) Does a machine exist that can determine whether any arbitrary machine on its tape is "circular" (e.g., freezes, or fails to continue its computational task); similarly, (2) does a machine exist that can determine whether any arbitrary machine on its tape ever prints a given symbol.[14] Thus by providing a mathematical description of a very simple device capable of arbitrary computations, he was able to prove properties of computation in general—and in particular, the uncomputability of the Entscheidungsproblem ("decision problem").[15] Thus, Turing machines prove fundamental limitations on the power of mechanical computation.[16] While they can express arbitrary computations, their minimalistic design makes them unsuitable for computation in practice: real-world computers are based on different designs that, unlike Turing machines, use random-access memory. Turing completeness is the ability for a system of instructions to simulate a Turing machine. A programming language that is Turing complete is theoretically capable of expressing all tasks accomplishable by computers; nearly all programming languages are Turing complete if the limitations of finite memory are ignored.
A Turing machine is a general example of a central processing unit (CPU) that controls all data manipulation done by a computer, with the canonical machine using sequential memory to store data. More specifically, it is a machine (automaton) capable of enumerating some arbitrary subset of valid strings of an alphabet; these strings are part of a recursively enumerable set. A Turing machine has a tape of infinite length that enables read and write operations to be performed. Assuming a black box, the Turing machine cannot know whether it will eventually enumerate any one specific string of the subset with a given program. This is due to the fact that the halting problem is unsolvable, which has major implications for the theoretical limits of computing. The Turing machine is capable of processing an unrestricted grammar, which further implies that it is capable of robustly evaluating first-order logic in an infinite number of ways. This is famously demonstrated through lambda calculus. A Turing machine that is able to simulate any other Turing machine is called a universal Turing machine (UTM, or simply a universal machine). A more mathematically oriented definition with a similar "universal" nature was introduced by Alonzo Church, whose work on lambda calculus intertwined with Turing's in a formal theory of computation known as the Church–Turing thesis. The thesis states that Turing machines indeed capture the informal notion of effective methods in logic and mathematics, and provide a precise definition of an algorithm or "mechanical procedure". Studying their abstract properties yields many insights into computer science and complexity theory.
The Turing machine mathematically models a machine that mechanically operates on a tape. On this tape are symbols, which the machine can read and write, one at a time, using a tape head. Operation is fully determined by a finite set of elementary instructions such as "in state 42, if the symbol seen is 0, write a 1; if the symbol seen is 1, change into state 17; in state 17, if the symbol seen is 0, write a 1 and change to state 6;" etc. In the original article ("On Computable Numbers, with an Application to the Entscheidungsproblem", see also references below), Turing imagines not a mechanism, but a person whom he calls the "computer", who executes these deterministic mechanical rules slavishly (or as Turing puts it, "in a desultory manner").
The head is always over a particular square of the tape; only a finite stretch of squares is shown. The instruction to be performed (q4) is shown over the scanned square. (Drawing after Kleene (1952) p. 375.)
Here, the internal state (q1) is shown inside the head, and the illustration describes the tape as being infinite and pre-filled with "0", the symbol serving as blank. The system's full state (its complete configuration) consists of the internal state, any non-blank symbols on the tape (in this illustration "11B"), and the position of the head relative to those symbols including blanks, i.e. "011B". (Drawing after Minsky (1967) p. 121.)
More precisely, a Turing machine consists of:
A tape divided into cells, one next to the other. Each cell contains a symbol from some finite alphabet. The alphabet contains a special blank symbol (here written as '0') and one or more other symbols. The tape is assumed to be arbitrarily extendable to the left and to the right, i.e., the Turing machine is always supplied with as much tape as it needs for its computation. Cells that have not been written before are assumed to be filled with the blank symbol. In some models the tape has a left end marked with a special symbol; the tape extends or is indefinitely extensible to the right.
A head that can read and write symbols on the tape and move the tape left and right one (and only one) cell at a time. In some models the head moves and the tape is stationary.
A state register that stores the state of the Turing machine, one of finitely many. Among these is the special start state with which the state register is initialized. These states, writes Turing, replace the "state of mind" a person performing computations would ordinarily be in.
A finite table[19] of instructions[20] that, given the state (qi) the machine is currently in and the symbol (aj) it is reading on the tape (the symbol currently under the head), tells the machine to do the following in sequence (for the 5-tuple models):
1. Either erase or write a symbol (replacing aj with aj1).
2. Move the head (which is described by dk and can have values: 'L' for one step left, or 'R' for one step right, or 'N' for staying in the same place).
3. Assume the same or a new state as prescribed (go to state qi1).
In the 4-tuple models, erasing or writing a symbol (aj1) and moving the head left or right (dk) are specified as separate instructions. Specifically, the table tells the machine to (ia) erase or write a symbol or (ib) move the head left or right, and then (ii) assume the same or a new state as prescribed, but not both actions (ia) and (ib) in the same instruction. In some models, if there is no entry in the table for the current combination of symbol and state then the machine will halt; other models require all entries to be filled.
Note that every part of the machine (i.e. its state, symbol-collections, and used tape at any given time) and its actions (such as printing, erasing and tape motion) is finite, discrete and distinguishable; it is the unlimited amount of tape and runtime that gives it an unbounded amount of storage space.
Formal definition
Following Hopcroft and Ullman (1979, p. 148), a (one-tape) Turing machine can be formally defined as a 7-tuple M = ⟨Q, Γ, b, Σ, δ, q0, F⟩ where
Q is a finite, non-empty set of states;
Γ is a finite, non-empty set of tape alphabet symbols;
b ∈ Γ is the blank symbol (the only symbol allowed to occur on the tape infinitely often at any step during the computation);
Σ ⊆ Γ \ {b} is the set of input symbols, that is, the set of symbols allowed to appear in the initial tape contents;
q0 ∈ Q is the initial state;
F ⊆ Q is the set of final states or accepting states. The initial tape contents is said to be accepted by M if it eventually halts in a state from F.
δ: (Q \ F) × Γ → Q × Γ × {L, R} is a partial function called the transition function, where L is left shift, R is right shift. (A relatively uncommon variant allows "no shift", say N, as a third element of the latter set.) If δ is not defined on the current state and the current tape symbol, then the machine halts.[21]
Anything that operates according to these specifications is a Turing machine.
The 7-tuple for the 3-state busy beaver looks like this (see more about this busy beaver at Turing machine examples):
Q = {A, B, C, HALT} (states);
Γ = {0, 1} (tape alphabet symbols);
b = 0 (blank symbol);
Σ = {1} (input symbols);
q0 = A (initial state);
F = {HALT} (final states);
δ = see state-table below (transition function).
Initially all tape cells are marked with 0.
State table for 3-state, 2-symbol busy beaver

             Current state A         Current state B         Current state C
Tape symbol  Write  Move  Next       Write  Move  Next       Write  Move  Next
0            1      R     B          1      L     A          1      L     B
1            1      L     C          1      R     B          1      R     HALT
Turing machine "state" diagrams The table for the 3-state busy beaver ("P" = print/write a "1") Tape symbol
0 1
54 of 76
Current state A Write symbol P P
Move tape R L
Next state B C
Current state B Write symbol P P
Move tape L R
Next state A B
Current state C Write symbol P P
Move tape L R
Next state B HALT
The "3-state busy beaver" Turing machine in a finite state representation. Each circle represents a "state" of the table—an "m-configuration" or "instruction". "Direction" of a state transition is shown by an arrow. The label (e.g. 0/P,R) near the outgoing state (at the "tail" of the arrow) specifies the scanned symbol that causes a particular transition (e.g. 0) followed by a slash /, followed by the subsequent "behaviors" of the machine, e.g. "P Print" then move tape "R Right". No general accepted format exists. The convention shown is after McClusky (1965), Booth (1967), Hill, and Peterson (1974). To the right: the above table as expressed as a "state transition" diagram. Usually large tables are better left as tables (Booth, p. 74). They are more readily simulated by computer in tabular form (Booth, p. 74). However, certain concepts—e.g. machines with "reset" states and machines with repeating patterns (cf. Hill and Peterson p. 244ff)—can be more readily seen when viewed as a drawing. Whether a drawing represents an improvement on its table must be decided by the reader for the particular context. See Finite state machine for more.
The evolution of the busy-beaver's computation starts at the top and proceeds to the bottom. The reader should again be cautioned that such diagrams represent a snapshot of their table frozen in time, not the course ("trajectory") of a computation through time and space. While every time the busy beaver machine "runs" it will always follow the same state-trajectory, this is not true for the "copy" machine that can be provided with variable input "parameters". The diagram "Progress of the computation" shows the three-state busy beaver's "state" (instruction) progress through its computation from start to finish. On the far right is the Turing "complete configuration" (Kleene "situation", Hopcroft–Ullman "instantaneous description") at each step. If the machine were to be stopped and cleared to blank both the "state register" and entire tape, these "configurations" could be used to rekindle a computation anywhere in its progress (cf. Turing (1936) The Undecidable, pp. 139–140).
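The busy beaver above is easy to run with a sparse tape (only non-blank cells stored). A minimal sketch of a single-tape simulator; encoding the moves as -1/+1 for the head (R interpreted as "head right", equivalently "tape left") is a choice of this sketch.

DELTA = {                    # (state, scanned symbol) -> (write, head move, next state)
    ("A", 0): (1, +1, "B"),  ("A", 1): (1, -1, "C"),
    ("B", 0): (1, -1, "A"),  ("B", 1): (1, +1, "B"),
    ("C", 0): (1, -1, "B"),  ("C", 1): (1, +1, "HALT"),
}

def run(delta, state="A", blank=0):
    tape, head, steps = {}, 0, 0      # unwritten cells read as the blank symbol
    while state != "HALT":
        write, move, state = delta[(state, tape.get(head, blank))]
        tape[head] = write
        head += move
        steps += 1
    return steps, sum(tape.values())

print(run(DELTA))   # (13, 6): halts after 13 transitions with six 1s on the tape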
5.1. Linear bounded automaton
A linear bounded automaton is a nondeterministic Turing machine that satisfies the following three conditions: Its input alphabet includes two special symbols, serving as left and right endmarkers. Its transitions may not print other symbols over the endmarkers. Its transitions may neither move to the left of the left endmarker nor to the right of the right endmarker.[1]:225 In other words: instead of having potentially infinite tape on which to compute, computation is restricted to the portion of the tape containing the input plus the two tape squares holding the endmarkers. An alternative, less restrictive definition is as follows: Like a Turing machine, an LBA possesses a tape made up of cells that can contain symbols from a finite alphabet, a head that can read from or write to one cell on the tape at a time and can be moved, and a finite number of states. An LBA differs from a Turing machine in that while the tape is initially considered to have unbounded length, only a finite contiguous portion of the tape, whose length is a linear function of the length of the initial input, can be accessed by the read/write head; hence the name linear bounded automaton. [1]:225
This limitation makes an LBA a somewhat more accurate model of a real-world computer than a Turing machine, whose definition assumes unlimited tape. The strong and the weaker definition lead to the same computational abilities of the respective automaton classes,[1]:225 due to the linear speedup theorem. In 1960, John Myhill introduced an automaton model today known as deterministic linear bounded automaton.[2] In 1963, Peter S. Landweber proved that the languages accepted by deterministic LBAs are context-sensitive.[3] In 1964, S.-Y. Kuroda introduced the more general model of (nondeterministic) linear bounded automata, noted that Landweber's proof also works for nondeterministic linear bounded automata, and showed that the languages accepted by them are precisely the context-sensitive languages. [4][5]
LBA and context-sensitive languages Linear bounded automata are acceptors for the class of context-sensitive languages.[1]:225-226 The only restriction placed on grammars for such languages is that no production maps a string to a shorter string. Thus no derivation of a string in a context-sensitive language can contain a sentential form longer than the string itself. Since there is a one-to-one correspondence between linear-bounded automata and such grammars, no more tape than that occupied by the original string is necessary for the string to be recognized by the automaton.
5.2. Multitape Turing machine A multi-tape Turing machine is like an ordinary Turing machine with several tapes. Each tape has its own head for reading and writing. Initially the input appears on tape 1, and the others start out blank.[1] This model intuitively seems much more powerful than the single-tape model, but any multi-tape machine, no matter how many tapes, can be simulated by a single-tape machine using only quadratically more computation time.[2] Thus, multi-tape machines cannot calculate any more functions than single-tape machines,[3] and none of the robust complexity classes (such as polynomial time) are affected by a change between single-tape and multi-tape machines.
Formal definition
A k-tape Turing machine can be described as a 6-tuple M = ⟨Q, Γ, s, b, F, δ⟩ where:
Q is a finite set of states;
Γ is a finite set of the tape alphabet;
s ∈ Q is the initial state;
b ∈ Γ is the blank symbol;
F ⊆ Q is the set of final or accepting states;
δ: Q × Γ^k → Q × (Γ × {L, R, S})^k is a partial function called the transition function, where k is the number of tapes, L is left shift, R is right shift and S is no shift.
Two-stack Turing machine Two-stack Turing machines have a read-only input and two storage tapes. If a head moves left on either tape a blank is printed on that tape, but one symbol from a "library" can be printed.
5.3. Multi-track Turing machine
A multi-track Turing machine is a specific type of multi-tape Turing machine. In a standard n-tape Turing machine, n heads move independently along n tracks. In an n-track Turing machine, one head reads and writes on all tracks simultaneously. A tape position in an n-track Turing machine contains n symbols from the tape alphabet.
Formal definition
A multitrack Turing machine with n tapes can be formally defined as a 6-tuple M = ⟨Q, Σ, Γ, δ, q0, F⟩, where
Q is a finite set of states;
Σ ⊆ Γ is the set of input symbols;
Γ is a finite set of symbols called the tape alphabet;
q0 ∈ Q is the initial state;
F ⊆ Q is the set of final or accepting states;
δ: Q × Γ^n → Q × Γ^n × {L, R} is a partial function called the transition function, sometimes also denoted as δ(qi, [x1, x2, ..., xn]) = (qj, [y1, y2, ..., yn], d), where d ∈ {L, R}.
A non-deterministic variant can be defined by replacing the transition function δ by a transition relation.
Proof of equivalency to standard Turing machine
This will prove that a two-track Turing machine is equivalent to a standard Turing machine; the argument generalizes to an n-track Turing machine. Let L be a recursively enumerable language. Let M = ⟨Q, Σ, Γ, δ, q0, F⟩ be a standard Turing machine that accepts L. Let M' be a two-track Turing machine. To prove M = M' it must be shown that M ⊆ M' and M' ⊆ M.
If all but the first track is ignored, then M and M' are clearly equivalent.
The tape alphabet of a one-track Turing machine equivalent to a two-track Turing machine consists of ordered pairs: the symbol read by the two-track Turing machine M' at one position can be identified with the ordered pair [x, y] read by the one-track machine. The one-track Turing machine is M = ⟨Q, Σ × {b}, Γ × Γ, δ', q0, F⟩ with the transition function δ'(qi, [x1, x2]) = δ(qi, [x1, x2]). This machine also accepts L.
5.4. Non-deterministic Turing machine Deterministic Turing Machine In a deterministic Turing machine (DTM), the set of rules prescribes at most one action to be performed for any given situation.
A deterministic Turing machine has a transition function that, for a given state and symbol under the tape head, specifies three things: the symbol to be written to the tape, the direction (left, right or neither) in which the head should move, and the subsequent state of the finite control. For example, an X on the tape in state 3 might make the DTM write a Y on the tape, move the head one position to the right, and switch to state 5.
Non-Deterministic Turing Machine By contrast, in a non-deterministic Turing machine (NTM), the set of rules may prescribe more than one action to be performed for any given situation. For example, an X on the tape in state 3 might allow the NTM to: write a Y, move right, and switch to state 5; or write an X, move left, and stay in state 3.
Resolution of multiple rules How does the NTM "know" which of these actions it should take? There are two ways of looking at it. One is to say that the machine is the "luckiest possible guesser"; it always picks a transition that eventually leads to an accepting state, if there is such a transition. The other is to imagine that the machine "branches" into many copies, each of which follows one of the possible transitions. Whereas a DTM has a single "computation path" that it follows, an NTM has a "computation tree". If at least one branch of the tree halts with an "accept" condition, we say that the NTM accepts the input.
Definition
A non-deterministic Turing machine can be formally defined as a 6-tuple M = (Q, Σ, ι, ⊔, A, δ), where
Q is a finite set of states
Σ is a finite set of symbols (the tape alphabet)
ι ∈ Q is the initial state
⊔ ∈ Σ is the blank symbol
A ⊆ Q is the set of accepting (final) states
δ ⊆ ((Q \ A) × Σ) × (Q × Σ × {L, S, R}) is a relation on states and symbols called the transition relation. L is the movement to the left, S is no movement, and R is the movement to the right.
The difference with a standard (deterministic) Turing machine is that for those, the transition relation is a function (the transition function). Configurations and the yields relation on configurations, which describes the possible actions of the Turing machine given any possible contents of the tape, are as for standard Turing machines, except that the yields relation is no longer single-valued. The notion of string acceptance is unchanged: a non-deterministic Turing machine accepts a string if, when the machine is started on the configuration in which the tape head is on the first character of the string (if any), and the tape is all blank otherwise, at least one of the machine's possible computations from that configuration puts the machine into a state in A. (If the machine is deterministic, the possible computations are the prefixes of a single, possibly infinite, path.)
Computational equivalence with DTMs NTMs can compute the same results as DTMs, that is, they are capable of computing the same values, given the same inputs. The time complexity of these computations varies, however, as is discussed below.
DTM as a special case of NTM NTMs effectively include DTMs as special cases, so it is immediately clear that DTMs are not more powerful. It might seem that NTMs are more powerful than DTMs, since they can allow trees of possible computations arising from the same initial configuration, accepting a string if any one branch in the tree accepts it.
DTM simulation of NTM It is possible to simulate NTMs with DTMs, and in fact this can be done in more than one way. Multiplicity of configuration states One approach is to use a DTM of which the configurations represent multiple configurations of the NTM, and the DTM's operation consists of visiting each of them in turn, executing a single step at each visit, and spawning new configurations whenever the transition relation defines multiple continuations. Multiplicity of tapes Another construction[1] simulates NTMs with 3-tape DTMs, of which the first tape always holds the original input string, the second is used to simulate a particular computation of the NTM, and the third encodes a path in the NTM's computation tree. The 3-tape DTMs are easily simulated with a normal single-tape DTM. In this construction, the resulting DTM effectively performs a breadth-first search of the NTM's computation tree, visiting all possible computations of the NTM in order of increasing length until it finds an accepting one. Therefore, the length of an accepting computation of the DTM is, in general, exponential in the length of the shortest accepting computation of the NTM. This is considered to be a general property of simulations of NTMs by DTMs; the most famous unresolved question in computer science, the P = NP problem, is related to this issue.
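The first simulation idea (one configuration visited per step, new configurations spawned at nondeterministic choices) is sketched below as a breadth-first search. The toy machine, the tape encoding, and the step bound are illustrative assumptions; the bound merely keeps the sketch total on machines that would otherwise run forever.

from collections import deque

def ntm_accepts(delta, word, start, accept, blank="_", max_steps=10_000):
    """delta maps (state, symbol) -> set of (write, move, state), move in {-1, 0, +1}.
    Returns True/False, or None if the step budget runs out first."""
    frontier = deque([(start, 0, tuple(word) or (blank,))])
    seen = set()
    for _ in range(max_steps):
        if not frontier:
            return False
        config = frontier.popleft()
        state, head, tape = config
        if state in accept:
            return True
        if config in seen:
            continue
        seen.add(config)
        for write, move, nxt in delta.get((state, tape[head]), ()):
            cells = list(tape)
            cells[head] = write
            h = head + move
            if h < 0:                       # grow the tape with blanks as needed
                cells.insert(0, blank); h = 0
            elif h == len(cells):
                cells.append(blank)
            frontier.append((nxt, h, tuple(cells)))
    return None

# Toy NTM over {a, b} accepting strings containing "ab": on an 'a' it guesses
# whether this 'a' starts the pattern.
delta = {
    ("q0", "a"): {("a", +1, "q0"), ("a", +1, "q1")},
    ("q0", "b"): {("b", +1, "q0")},
    ("q1", "b"): {("b", 0, "acc")},
}
print(ntm_accepts(delta, "ab", "q0", {"acc"}))   # True
print(ntm_accepts(delta, "ba", "q0", {"acc"}))   # False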
6. Recursive language A formal language (a set of finite sequences of symbols taken from a fixed alphabet) is called recursive if it is a recursive subset of the set of all possible finite sequences over the alphabet of the language. Equivalently, a formal language is recursive if there exists a total Turing machine (a Turing machine that halts for every given input) that, when given a finite sequence of symbols as input, accepts it if it belongs to the language and rejects it otherwise. Recursive languages are also called decidable. The concept of decidability may be extended to other models of computation. For example, one may speak of languages decidable on a non-deterministic Turing machine. Therefore, whenever an ambiguity is possible, the synonym for "recursive language" used is Turing-decidable language, rather than simply decidable. The class of all recursive languages is often called R, although this name is also used for the class RP. This type of language was not defined in the Chomsky hierarchy (Chomsky 1959). All recursive languages are also recursively enumerable. All regular, context-free and context-sensitive languages are recursive.
Definitions There are two equivalent major definitions for the concept of a recursive language: 1. A recursive formal language is a recursive subset in the set of all possible words over the alphabet of the language. 2. A recursive language is a formal language for which there exists a Turing machine that, when presented with any finite input string, halts and accepts if the string is in the language, and halts and rejects otherwise. The Turing machine always halts: it is known as a decider and is said to decide the recursive language. By the second definition, any decision problem can be shown to be decidable by exhibiting an algorithm for it that terminates on all inputs. An undecidable problem is a problem that is not decidable.
Examples
As noted above, every context-sensitive language is recursive. Thus, a simple example of a recursive language is the set L = {abc, aabbcc, aaabbbccc, ...}; more formally, the set
L = { w ∈ {a,b,c}* | w = a^n b^n c^n for some n ≥ 1 }
is context-sensitive and therefore recursive.
Examples of decidable languages that are not context-sensitive are more difficult to describe. For one such example, some familiarity with mathematical logic is required: Presburger arithmetic is the first-order theory of the natural numbers with addition (but without multiplication). While the set of well-formed formulas in Presburger arithmetic is context-free, every deterministic Turing machine accepting the set of true statements in Presburger arithmetic has a worst-case runtime of at least 2^(2^(cn)), for some constant c > 0 (Fischer & Rabin 1974). Here, n denotes the length of the given formula. Since every context-sensitive language can be accepted by a linear bounded automaton, and such an automaton can be simulated by a deterministic Turing machine with worst-case running time at most d^n for some constant d[citation needed], the set of true formulas in Presburger arithmetic is not context-sensitive. On the positive side, it is known that there is a deterministic Turing machine running in time at most triply exponential in n that decides the set of true formulas in Presburger arithmetic (Oppen 1978). Thus, this is an example of a language that is decidable but not context-sensitive.
Closure properties
Recursive languages are closed under the following operations. That is, if L and P are two recursive languages, then the following languages are recursive as well:
the Kleene star L*
the image φ(L) under an e-free homomorphism φ
the concatenation L ∘ P
the union L ∪ P
the intersection L ∩ P
the complement of L
the set difference L − P
The last property follows from the fact that the set difference can be expressed in terms of intersection and complement.
6.1. Recursive set A subset S of the natural numbers is called recursive if there exists a total computable function f such that f(x) = 1 if x ∈ S and f(x) = 0 if x ∉ S . In other words, the set S is recursive if and only if the indicator function 1S is computable.
Examples Every finite or cofinite subset of the natural numbers is computable. This includes these special cases: The empty set is computable. The entire set of natural numbers is computable. Each natural number (as defined in standard set theory) is computable; that is, the set of natural numbers less than a given natural number is computable. The set of prime numbers is computable. The set of Gödel numbers of arithmetic proofs described in Kurt Gödel's paper "On formally undecidable propositions of Principia Mathematica and related systems I"; see Gödel's incompleteness theorems.
Properties
If A is a recursive set then the complement of A is a recursive set. If A and B are recursive sets then A ∩ B, A ∪ B and the image of A × B under the Cantor pairing function are recursive sets. A set A is a recursive set if and only if A and the complement of A are both recursively enumerable sets. The preimage of a recursive set under a total computable function is a recursive set. The image of a computable set under a total computable bijection is computable.
A set is recursive if and only if it is at level Δ^0_1 of the arithmetical hierarchy. A set is recursive if and only if it is either the range of a nondecreasing total computable function or the empty set. The image of a computable set under a nondecreasing total computable function is computable.
6.2. Decision problem
A decision problem has only two possible outputs (yes or no) on any input. In computability theory and computational complexity theory, a decision problem is a problem that can be posed as a yes-no question of the input values. Decision problems typically appear in mathematical questions of decidability, that is, the question of the existence of an effective method to determine the existence of some object or its membership in a set; some of the most important problems in mathematics are undecidable. For example, the problem "given two numbers x and y, does x evenly divide y?" is a decision problem. The answer can be either 'yes' or 'no', and depends upon the values of x and y. A method for solving a decision problem, given in the form of an algorithm, is called a decision procedure for that problem. A decision procedure for the decision problem "given two numbers x and y, does x evenly divide y?" would give the steps for determining whether x evenly divides y, given x and y. One such algorithm is long division, taught to many school children. If the remainder is zero the answer produced is 'yes', otherwise it is 'no'. A decision problem which can be solved by an algorithm, such as this example, is called decidable. The field of computational complexity categorizes decidable decision problems by how difficult they are to solve. "Difficult", in this sense, is described in terms of the computational resources needed by the most efficient algorithm for a certain problem. The field of recursion theory, meanwhile, categorizes undecidable decision problems by Turing degree, which is a measure of the noncomputability inherent in any solution.
Definition A decision problem is any arbitrary yes-or-no question on an infinite set of inputs. Because of this, it is traditional to define the decision problem equivalently as: the set of possible inputs together with the set of inputs for which the problem returns yes. These inputs can be natural numbers, but may also be values of some other kind, such as strings over the binary alphabet {0,1} or over some other finite set of symbols. The subset of strings for which the problem returns "yes" is a formal language, and often decision problems are defined in this way as formal languages. Alternatively, using an encoding such as Gödel numberings, any string can be encoded as a natural number, via which a decision problem can be defined as a subset of the natural numbers.
Examples A classic example of a decidable decision problem is the set of prime numbers. It is possible to effectively decide whether a given natural number is prime by testing every possible nontrivial factor. Although much more efficient methods of primality testing are known, the existence of any effective method is enough to establish decidability.
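A minimal sketch of the trial-division decider just described (illustrative code, not from the source):

def is_prime(n):
    # Decide primality by testing every possible nontrivial factor.
    # Inefficient, but any effective method suffices for decidability.
    if n < 2:
        return False
    for d in range(2, n):
        if n % d == 0:
            return False
    return True

print([k for k in range(20) if is_prime(k)])  # [2, 3, 5, 7, 11, 13, 17, 19]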
6.3. Undecidable problem In computability theory and computational complexity theory, an undecidable problem is a decision problem for which it is known to be impossible to construct a single algorithm that always leads to a correct yes-or-no answer. The halting problem is an example: there is no algorithm that correctly determines whether arbitrary programs eventually halt when run. A decision problem is any arbitrary yes-or-no question on an infinite set of inputs. Because of this, it is traditional to define the decision problem equivalently as the set of inputs for which the problem returns yes. These inputs can be natural numbers, but may also be values of some other kind, such as strings of a formal language. Using some encoding, such as a Gödel numbering, the strings can be encoded as natural numbers. Thus, a decision problem informally phrased in terms of a formal language is also equivalent to a set of natural numbers. To keep the formal definition simple, it is phrased in terms of subsets of the natural numbers. Formally, a decision problem is a subset of the natural numbers. The corresponding informal problem is that of deciding whether a given number is in the set. A decision problem A is called decidable or effectively solvable if A is a recursive set. A problem is called partially decidable, semi-decidable, solvable, or provable if A is a recursively enumerable set. This means that there exists an algorithm that halts eventually when the answer is yes but may run forever if the answer is no. Partially decidable problems and any other problems that are not decidable are called undecidable.
Example: the halting problem in computability theory In computability theory, the halting problem is a decision problem which can be stated as follows: Given the description of an arbitrary program and a finite input, decide whether the program finishes running or will run forever. Alan Turing proved in 1936 that a general algorithm running on a Turing machine that solves the halting problem for all possible program-input pairs necessarily cannot exist. Hence, the halting problem is undecidable for Turing machines.
Relationship with Gödel's incompleteness theorem The concepts raised by Gödel's incompleteness theorems are very similar to those raised by the halting problem, and the proofs are quite similar. In fact, a weaker form of the First Incompleteness Theorem is an easy consequence of the undecidability of the halting problem. This weaker form differs from the standard statement of the incompleteness theorem by asserting that a complete, consistent and sound axiomatization of all statements about natural numbers is unachievable. The "sound" part is the weakening: it means that we require the axiomatic system in question to prove only true statements about natural numbers. It is important to observe that the statement of the standard form of Gödel's First Incompleteness Theorem is completely unconcerned with the question of truth, but only concerns the issue of whether it can be proven. The weaker form of the theorem can be proved from the undecidability of the halting problem as follows. Assume that we have a consistent and complete axiomatization of all true first-order logic statements about natural numbers. Then we can build an algorithm that enumerates all these statements. This means that there is an algorithm N(n) that, given a natural number n, computes a true first-order logic statement about natural numbers such that, for all the true statements, there is at least one n such that N(n) yields that statement. Now suppose we want to decide if the algorithm with representation a halts on input i. We know that this statement can be expressed with a first-order logic statement, say H(a, i). Since the axiomatization is complete it follows that either there is an n such that N(n) = H(a, i) or there is an n' such that N(n') = ¬ H(a, i). So if we iterate over all n until we either find H(a, i) or its negation, we will always halt. This means that this gives us an algorithm to decide the halting problem. Since we know that there cannot be such an algorithm, it follows that the assumption that there is a consistent and complete axiomatization of all true first-order logic statements about natural numbers must be false.
Examples of undecidable problems Undecidable problems can be related to different topics, such as logic, abstract machines or topology. Note that since there are uncountably many undecidable problems, any list, even one of infinite length, is necessarily incomplete.
Examples of undecidable statements There are two distinct senses of the word "undecidable" in contemporary use. The first of these is the sense used in relation to Gödel's theorems, that of a statement being neither provable nor refutable in a specified deductive system. The second sense is used in relation to computability theory and applies not to statements but to decision problems, which are countably infinite sets of questions each requiring a yes or no answer. Such a problem is said to be undecidable if there is no computable function that correctly answers every question in the problem set. The connection between these two is that if a decision problem is undecidable (in the recursion theoretical sense) then there is no consistent, effective formal system which proves for every question A in the problem either "the answer to A is yes" or "the answer to A is no". Because of the two meanings of the word undecidable, the term independent is sometimes used instead of undecidable for the "neither provable nor refutable" sense. The usage of "independent" is also ambiguous, however. It can mean just "not provable", leaving open whether an independent statement might be refuted. Undecidability of a statement in a particular deductive system does not, in and of itself, address the question of whether the truth value of the statement is well-defined, or whether it can be determined by other means. Undecidability only implies that the particular deductive system being considered does not prove the truth or falsity of the statement. Whether there exist so-called "absolutely undecidable" statements, whose truth value can never be known or is ill-specified, is a controversial point among various philosophical schools. One of the first problems suspected to be undecidable, in the second sense of the term, was the word problem for groups, first posed by Max Dehn in 1911, which asks if there is a finitely presented group for which no algorithm exists to determine whether two words are equivalent. This was shown to be the case in 1952. The combined work of Gödel and Paul Cohen has given two concrete examples of undecidable statements (in the first sense of the term): The continuum hypothesis can neither be proved nor refuted in ZFC (the standard axiomatization of set theory), and the axiom of choice can neither be proved nor refuted in ZF (which is all the ZFC axioms except the axiom of choice). These results do not require the incompleteness theorem. Gödel proved in 1940 that neither of these statements could be disproved in ZF or ZFC set theory. In the 1960s, Cohen proved that neither is provable from ZF, and the continuum hypothesis cannot be proven from ZFC. In 1970, Russian mathematician Yuri Matiyasevich showed that Hilbert's Tenth Problem, posed in 1900 as a challenge to the next century of mathematicians, cannot be solved. Hilbert's challenge sought an algorithm which finds all solutions of a Diophantine equation. A Diophantine equation is a more general case of Fermat's Last Theorem; we seek the integer roots of a polynomial in any number of variables with integer coefficients. Since we have only one equation but n variables, infinitely many solutions exist (and are easy to find) in the complex plane; however, the problem becomes impossible if solutions are constrained to integer values only. 
Matiyasevich showed this problem to be unsolvable by mapping a Diophantine equation to a recursively enumerable set and invoking Gödel's Incompleteness Theorem.[1] In 1936, Alan Turing proved that the halting problem—the question of whether or not a Turing machine halts on a given program—is undecidable, in the second sense of the term. This result was later generalized by Rice's theorem. In 1973, the Whitehead problem in group theory was shown to be undecidable, in the first sense of the term, in standard set theory. In 1977, Paris and Harrington proved that the Paris-Harrington principle, a version of the Ramsey theorem, is undecidable in the axiomatization of arithmetic given by the Peano axioms but can be proven to be true in the larger system of second-order arithmetic. Kruskal's tree theorem, which has applications in computer science, is also undecidable from the Peano axioms but provable in set theory. In fact, Kruskal's tree theorem (or its finite form) is undecidable in a much stronger system codifying the principles acceptable on the basis of a philosophy of mathematics called predicativism. Goodstein's theorem is a statement about the Ramsey theory of the natural numbers that Kirby and Paris showed is undecidable in Peano arithmetic. Gregory Chaitin produced undecidable statements in algorithmic information theory and proved another incompleteness theorem in that setting. Chaitin's theorem states that for any theory that can represent enough arithmetic, there is an upper bound c such that no specific number can be proven in that theory to have Kolmogorov complexity greater than c. While Gödel's theorem is related to the liar paradox, Chaitin's result is related to Berry's paradox.
In 2007, researchers Kurtz and Simon, building on earlier work by J.H. Conway in the 1970s, proved that a natural generalization of the Collatz problem is undecidable.[2]
6.3.1. Halting problem In computability theory, the halting problem is the problem of determining, from a description of an arbitrary computer program and an input, whether the program will finish running or continue to run forever. Alan Turing proved in 1936 that a general algorithm to solve the halting problem for all possible program-input pairs cannot exist. A key part of the proof was a mathematical definition of a computer and program, which became known as a Turing machine; the halting problem is undecidable over Turing machines. It is one of the first examples of a decision problem. Informally, for any program f that might determine if programs halt, a "pathological" program g called with an input can pass its own source and its input to f and then specifically do the opposite of what f predicts g will do. No f can exist that handles this case. Jack Copeland (2004) attributes the term halting problem to Martin Davis.[1] The halting problem is a decision problem about properties of computer programs on a fixed Turing-complete model of computation, i.e., all programs that can be written in some given programming language that is general enough to be equivalent to a Turing machine. The problem is to determine, given a program and an input to the program, whether the program will eventually halt when run with that input. In this abstract framework, there are no resource limitations on the amount of memory or time required for the program's execution; it can take arbitrarily long, and use an arbitrary amount of storage space, before halting. The question is simply whether the given program will ever halt on a particular input. For example, in pseudocode, the program

while (true) continue
does not halt; rather, it goes on forever in an infinite loop. On the other hand, the program print "Hello, world!"
does halt. While deciding whether these programs halt is simple, more complex programs prove problematic. One approach to the problem might be to run the program for some number of steps and check if it halts. But if the program does not halt, it is unknown whether the program will eventually halt or run forever. Turing proved no algorithm exists that always correctly decides whether, for a given arbitrary program and input, the program halts when run with that input. The essence of Turing's proof is that any such algorithm can be made to contradict itself and therefore cannot be correct.
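The bounded-simulation approach mentioned above can be sketched as follows; modeling a program as a Python generator that yields once per step is an illustrative assumption, not a standard representation:

def halts_within(program, max_steps):
    # Run the program for at most max_steps steps. Returns True if it
    # halted within the budget, or None for "don't know"; it can never
    # certify "doesn't halt".
    it = program()
    for _ in range(max_steps):
        try:
            next(it)          # execute one step
        except StopIteration:
            return True       # the program halted
    return None               # budget exhausted: no conclusion

def hello():                  # halts after one step
    yield

def loop_forever():           # never halts
    while True:
        yield

print(halts_within(hello, 100))         # True
print(halts_within(loop_forever, 100))  # None, not False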
Programming consequences Some infinite loops can be quite useful. For instance, event loops are typically coded as infinite loops.[2] However, most subroutines are intended to finish (halt).[3] In particular, in hard real-time computing, programmers attempt to write subroutines that are not only guaranteed to finish (halt), but are also guaranteed to finish before a given deadline.[4] Sometimes these programmers use some general-purpose (Turing-complete) programming language, but attempt to write in a restricted style—such as MISRA C or SPARK—that makes it easy to prove that the resulting subroutines finish before the given deadline.[citation needed] Other times these programmers apply the rule of least power—they deliberately use a computer language that is not quite fully Turing-complete, often a language that guarantees all subroutines finish, such as Coq.[citation needed]
Common pitfalls The difficulty in the halting problem lies in the requirement that the decision procedure must work for all programs and inputs. A particular program either halts on a given input or does not halt. Consider one algorithm that always answers "halts" and another that always answers "doesn't halt". For any specific program and input, one of these two algorithms answers correctly, even though nobody may know which
one. Yet neither algorithm solves the halting problem generally. There are programs (interpreters) that simulate the execution of whatever source code they are given. Such programs can demonstrate that a program does halt if this is the case: the interpreter itself will eventually halt its simulation, which shows that the original program halted. However, an interpreter will not halt if its input program does not halt, so this approach cannot solve the halting problem as stated; it does not successfully answer "doesn't halt" for programs that do not halt. The halting problem is theoretically decidable for linear bounded automata (LBAs) or deterministic machines with finite memory. A machine with finite memory has a finite number of states, and thus any deterministic program on it must eventually either halt or repeat a previous state:

...any finite-state machine, if left completely to itself, will fall eventually into a perfectly periodic repetitive pattern. The duration of this repeating pattern cannot exceed the number of internal states of the machine... (italics in original, Minsky 1967, p. 24)

Minsky warns us, however, that machines such as computers with, e.g., a million small parts, each with two states, will have at least 2^1,000,000 possible states:

This is a 1 followed by about three hundred thousand zeroes ... Even if such a machine were to operate at the frequencies of cosmic rays, the aeons of galactic evolution would be as nothing compared to the time of a journey through such a cycle (Minsky 1967, p. 25)

Minsky exhorts the reader to be suspicious; although a machine may be finite, and finite automata "have a number of theoretical limitations":

...the magnitudes involved should lead one to suspect that theorems and arguments based chiefly on the mere finiteness [of] the state diagram may not carry a great deal of significance. (Minsky p. 25)

It can also be decided automatically whether a nondeterministic machine with finite memory halts on none, some, or all of the possible sequences of nondeterministic decisions, by enumerating states after each possible decision.
Representation as a set The conventional representation of decision problems is the set of objects possessing the property in question. The halting set K := { (i, x) | program i halts when run on input x} represents the halting problem. This set is recursively enumerable, which means there is a computable function that lists all of the pairs (i, x) it contains (Moore and Mertens 2011, pp. 236–237). However, the complement of this set is not recursively enumerable (Moore and Mertens 2011, pp. 236–237). There are many equivalent formulations of the halting problem; any set whose Turing degree equals that of the halting problem is such a formulation. Examples of such sets include: { i | program i eventually halts when run with input 0 } { i | there is an input x such that program i eventually halts when run with input x }.
Proof concept The proof that the halting problem is not solvable is a proof by contradiction. To illustrate the concept of the proof, suppose that there exists a total computable function halts(f) that returns true if the subroutine f halts (when run with no inputs) and returns false otherwise. Now consider the following subroutine:

def g():
    if halts(g):
        loop_forever()
halts(g) must either return true or false, because halts was assumed to be total. If halts(g) returns true, then g will call loop_forever and never halt, which is a contradiction. If halts(g) returns false, then g will halt, because it will not call loop_forever; this is also a contradiction. Overall, halts(g) cannot return a truth value that is consistent with whether g halts. Therefore, the initial assumption that halts is a total computable function must be false. The method used in the proof is called diagonalization: g does the opposite of what halts says g should do.
The difference between this sketch and the actual proof is that in the actual proof, the computable function halts does not directly take a subroutine as an argument; instead it takes the source code of a program. The actual proof requires additional work to handle this issue. Moreover, the actual proof avoids the direct use of recursion shown in the definition of g.
Sketch of proof The concept above shows the general method of the proof; this section will present additional details. The overall goal is to show that there is no total computable function that decides whether an arbitrary program i halts on arbitrary input x; that is, the following function h is not computable (Penrose 1990, pp. 57–63):

h(i, x) = 1 if program i halts on input x
h(i, x) = 0 otherwise
Here program i refers to the ith program in an enumeration of all the programs of a fixed Turing-complete model of computation. The proof proceeds by directly establishing that no total computable function with two arguments can be the required function h. As in the sketch of the concept, given any total computable binary function f, the following partial function g is also computable by some program e:

g(i) = 0 if f(i, i) = 0
g(i) undefined otherwise
The verification that g is computable relies on the following constructs (or their equivalents):
- computable subprograms (the program that computes f is a subprogram in program e),
- duplication of values (program e computes the inputs i, i for f from the input i for g),
- conditional branching (program e selects between two results depending on the value it computes for f(i, i)),
- not producing a defined result (for example, by looping forever),
- returning a value of 0.
i        1   2   3   4   5   6
1        1   0   0   1   0   1
2        0   0   0   1   0   0
3        0   1   0   1   0   1
4        1   0   0   1   0   0
5        0   0   0   1   1   1
6        1   1   0   0   1   0

f(i,i)   1   0   0   1   1   0
g(i)     U   0   0   U   U   0

Possible values for a total computable function f arranged in a 2D array. The orange cells are the diagonal. The values of f(i,i) and g(i) are shown at the bottom; U indicates that the function g is undefined for a particular input value.
The following pseudocode illustrates a straightforward way to compute g:

procedure compute_g(i):
    if f(i, i) == 0 then
        return 0
    else
        loop forever
Because g is partial computable, there must be a program e that computes g, by the assumption that the model of computation is Turing-complete. This program is one of all the programs on which the halting function h is defined. The next step of the proof shows that h(e,e) will not have the same value as f(e,e). It follows from the definition of g that exactly one of the following two cases must hold:
- f(e,e) = 0 and so g(e) = 0. In this case h(e,e) = 1, because program e halts on input e.
- f(e,e) ≠ 0 and so g(e) is undefined. In this case h(e,e) = 0, because program e does not halt on input e.
In either case, f cannot be the same function as h. Because f was an arbitrary total computable function with two arguments, all such functions must differ from h. This proof is analogous to Cantor's diagonal argument. One may visualize a two-dimensional array with one column and one row for each natural number, as indicated in the table above. The value of f(i,j) is placed at column i, row j. Because f is assumed to be a total computable function, any element of the array can be calculated using f. The construction of the function g can be visualized using the main diagonal of this array. If the array has a 0 at position (i,i), then g(i) is 0. Otherwise, g(i) is undefined. The contradiction comes from the fact that there is some column e of the array corresponding to g itself. Now assume f is the halting function h. If g(e) is defined (g(e) = 0 in this case), then g(e) halts, so f(e,e) = 1; but g(e) = 0 only when f(e,e) = 0, contradicting f(e,e) = 1. Similarly, if g(e) is not defined, then f(e,e) = 0, which by g's construction gives g(e) = 0, contradicting the assumption that g(e) is undefined.
In both cases a contradiction arises. Therefore no total computable function f can be the halting function h.
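The diagonal construction can be made concrete with the 6 × 6 fragment of f from the table above; this is only an illustrative sketch, since a genuine f is defined on all pairs:

# Rows of the table above: f_table[i][j-1] plays the role of f(i, j).
f_table = {
    1: [1, 0, 0, 1, 0, 1],
    2: [0, 0, 0, 1, 0, 0],
    3: [0, 1, 0, 1, 0, 1],
    4: [1, 0, 0, 1, 0, 0],
    5: [0, 0, 0, 1, 1, 1],
    6: [1, 1, 0, 0, 1, 0],
}

def g(i):
    # g(i) = 0 when f(i, i) == 0; otherwise g "loops forever",
    # represented here by the marker "U" (undefined).
    return 0 if f_table[i][i - 1] == 0 else "U"

print([f_table[i][i - 1] for i in range(1, 7)])  # diagonal: [1, 0, 0, 1, 1, 0]
print([g(i) for i in range(1, 7)])               # ['U', 0, 0, 'U', 'U', 0]

At every index i, g is defined exactly where the diagonal entry f(i,i) is 0, so g disagrees with the diagonal everywhere; that is the disagreement the proof exploits.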
Computability theory The typical method of proving a problem to be undecidable is the technique of reduction. To do this, it is sufficient to show that if a solution to the new problem were found, it could be used to decide an undecidable problem by transforming instances of the undecidable problem into instances of the new problem. Since we already know that no method can decide the old problem, no method can decide the new problem either. Often the new problem is reduced to solving the halting problem. (Note: the same technique is used to demonstrate that a problem is NP-complete, only in this case, rather than demonstrating that there is no solution, it demonstrates there is no polynomial-time solution, assuming P ≠ NP.)

For example, one such consequence of the halting problem's undecidability is that there cannot be a general algorithm that decides whether a given statement about natural numbers is true or not. The reason for this is that the proposition stating that a certain program will halt given a certain input can be converted into an equivalent statement about natural numbers. If we had an algorithm that could find the truth value of every statement about natural numbers, it could certainly find the truth value of this one; but that would determine whether the original program halts, which is impossible, since the halting problem is undecidable.

Gregory Chaitin has defined a halting probability, represented by the symbol Ω, a type of real number that informally is said to represent the probability that a randomly produced program halts. These numbers have the same Turing degree as the halting problem. It is a normal and transcendental number which can be defined but cannot be completely computed. This means one can prove that there is no algorithm which produces the digits of Ω, although its first few digits can be calculated in simple cases.

While Turing's proof shows that there can be no general method or algorithm to determine whether algorithms halt, individual instances of that problem may very well be susceptible to attack. Given a specific algorithm, one can often show that it must halt for any input, and in fact computer scientists often do just that as part of a correctness proof. But each proof has to be developed specifically for the algorithm at hand; there is no mechanical, general way to determine whether algorithms on a Turing machine halt. However, there are some heuristics that can be used in an automated fashion to attempt to construct a proof, which succeed frequently on typical programs. This field of research is known as automated termination analysis.

Since the negative answer to the halting problem shows that there are problems that cannot be solved by a Turing machine, the Church–Turing thesis limits what can be accomplished by any machine that implements effective methods. However, not all machines conceivable to human imagination are subject to the Church–Turing thesis (e.g. oracle machines). It is an open question whether there can be actual deterministic physical processes that, in the long run, elude simulation by a Turing machine, and in particular whether any such hypothetical process could usefully be harnessed in the form of a calculating machine (a hypercomputer) that could solve the halting problem for a Turing machine amongst other things.
It is also an open question whether any such unknown physical processes are involved in the working of the human brain, and whether humans can solve the halting problem (Copeland 2004, p. 15).
Gödel's incompleteness theorems The concepts raised by Gödel's incompleteness theorems are very similar to those raised by the halting problem, and the proofs are quite similar. In fact, a weaker form of the First Incompleteness Theorem is an easy consequence of the undecidability of the halting problem. This weaker form differs from the standard statement of the incompleteness theorem by asserting that a complete, consistent and sound axiomatization of all statements about natural numbers is unachievable. The "sound" part is the weakening: it means that we require the axiomatic system in question to prove only true statements about natural numbers. The more general statement of the incompleteness theorems does not require a soundness assumption of this kind. The weaker form of the theorem can be proven from the undecidability of the halting problem as follows. Assume that we have a consistent and complete axiomatization of all true first-order logic statements about natural numbers. Then we can build an algorithm that enumerates all these statements i.e. an algorithm N that, given a natural number n, computes a true first-order logic statement about natural numbers, such that for all the true statements there is at least one n such that N(n) is equal to that statement. Now suppose we want to decide whether the algorithm with representation a halts on input i. By using Kleene's T predicate, we can express the statement "a halts on input i" as a statement H(a, i) in the language of arithmetic. Since the axiomatization is complete it follows that either there is an n such that N(n) = H(a, i) or there is an n' such that N(n') = ¬ H(a, i). So if we iterate over all n until we either find H(a, i) or its negation, we will always halt. This means that this gives us an algorithm to decide the halting problem. Since we know that there cannot be such an algorithm, it follows that the assumption
that there is a consistent and complete axiomatization of all true first-order logic statements about natural numbers must be false.
Generalization Many variants of the halting problem can be found in Computability textbooks (e.g., Sipser 2006, Davis 1958, Minsky 1967, Hopcroft and Ullman 1979, Börger 1989). Typically their undecidability follows by reduction from the standard halting problem. However, some of them have a higher degree of unsolvability. The next two examples are typical.
Halting on all inputs The universal halting problem, also known (in recursion theory) as totality, is the problem of determining whether a given computer program will halt for every input (the name totality comes from the equivalent question of whether the computed function is total). This problem is not only undecidable, as the halting problem is, but highly undecidable. In terms of the arithmetical hierarchy, it is Π⁰₂-complete (Börger 1989, p. 121). This means, in particular, that it cannot be decided even with an oracle for the halting problem.
Recognizing partial solutions There are many programs that, for some inputs, return a correct answer to the halting problem, while for other inputs they do not return an answer at all. However, the problem "given program p, is it a partial halting solver" (in the sense described) is at least as hard as the halting problem. To see this, assume that there is an algorithm PHSR ("partial halting solver recognizer") to do that. Then it can be used to solve the halting problem, as follows: to test whether input program x halts on y, construct a program p that on input (x, y) reports true and diverges on all other inputs. Then test p with PHSR. The above argument is a reduction of the halting problem to PHS recognition, and in the same manner, harder problems such as halting on all inputs can also be reduced, implying that PHS recognition is not only undecidable, but higher in the arithmetical hierarchy, specifically Π⁰₂-complete.
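The reduction can be sketched as follows; PHSR is the assumed recognizer (which, as argued above, cannot actually exist), so it is left as a parameter rather than presented as real code:

def make_p(x, y):
    # Build the program p described above: on input (x, y) it reports
    # True, which is a correct halting answer exactly when x really
    # halts on y; on every other input it diverges.
    def p(a, b):
        if (a, b) == (x, y):
            return True
        while True:    # diverge on all other inputs
            pass
    return p

def halts(x, y, PHSR):
    # p is a partial halting solver precisely when x halts on y, so a
    # working PHSR would decide the halting problem, a contradiction.
    return PHSR(make_p(x, y))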
6.4. P (complexity)
In computational complexity theory, P, also known as PTIME or DTIME(n^O(1)), is a fundamental complexity class. It contains all decision problems that can be solved by a deterministic Turing machine using a polynomial amount of computation time, or polynomial time. Cobham's thesis holds that P is the class of computational problems that are "efficiently solvable" or "tractable". This is inexact: in practice, some problems not known to be in P have practical solutions, and some that are in P do not, but this is a useful rule of thumb.

A language L is in P if and only if there exists a deterministic Turing machine M, such that:
- M runs for polynomial time on all inputs
- For all x in L, M outputs 1
- For all x not in L, M outputs 0

P can also be viewed as a uniform family of boolean circuits. A language L is in P if and only if there exists a polynomial-time uniform family of boolean circuits {C_n : n ∈ ℕ}, such that:
- For all n ∈ ℕ, C_n takes n bits as input and outputs 1 bit
- For all x in L, C_|x|(x) = 1
- For all x not in L, C_|x|(x) = 0

The circuit definition can be weakened to use only a logspace uniform family without changing the complexity class.
Notable problems in P P is known to contain many natural problems, including the decision versions of linear programming, calculating the greatest common divisor, and finding a maximum matching. In 2002, it was shown that the
problem of determining if a number is prime is in P.[1] The related class of function problems is FP. Several natural problems are complete for P, including st-connectivity (or reachability) on alternating graphs.[2] The article on P-complete problems lists further relevant problems in P.
Relationships to other classes A generalization of P is NP, which is the class of decision problems decidable by a non-deterministic Turing machine that runs in polynomial time. Equivalently, it is the class of decision problems where each "yes" instance has a polynomial size certificate, and certificates can be checked by a polynomial time deterministic Turing machine. The class of problems for which this is true for the "no" instances is called co-NP. P is trivially a subset of NP and of co-NP; most experts believe it is a proper subset,[3] although this (the P ≠ NP hypothesis) remains unproven. Another open problem is whether NP = co-NP (a negative answer would imply P ≠ NP). P is also known to be at least as large as L, the class of problems decidable in a logarithmic amount of memory space. A decider using O(log n) space cannot use more than 2^O(log n) = n^O(1) time, because this is the total number of possible configurations; thus, L is a subset of P. Another important problem is whether L = P. We do know that P = AL, the set of problems solvable in logarithmic memory by alternating Turing machines. P is also known to be no larger than PSPACE, the class of problems decidable in polynomial space. Again, whether P = PSPACE is an open problem. To summarize:

L ⊆ AL = P ⊆ NP ⊆ PSPACE ⊆ EXPTIME
Here, EXPTIME is the class of problems solvable in exponential time. Of all the classes shown above, only two strict containments are known: P is strictly contained in EXPTIME. Consequently, all EXPTIME-hard problems lie outside P, and at least one of the containments to the right of P above is strict (in fact, it is widely believed that all three are strict). L is strictly contained in PSPACE. The most difficult problems in P are P-complete problems. Another generalization of P is P/poly, or Nonuniform Polynomial-Time. If a problem is in P/poly, then it can be solved in deterministic polynomial time provided that an advice string is given that depends only on the length of the input. Unlike for NP, however, the polynomial-time machine doesn't need to detect fraudulent advice strings; it is not a verifier. P/poly is a large class containing nearly all practical problems, including all of BPP. If it contains NP, then the polynomial hierarchy collapses to the second level. On the other hand, it also contains some impractical problems, including some undecidable problems such as the unary version of any undecidable problem. In 1999, Jin-Yi Cai and D. Sivakumar, building on work by Mitsunori Ogihara, showed that if there exists a sparse language that is P-complete, then L = P.[4]
Properties Polynomial-time algorithms are closed under composition. Intuitively, this says that if one writes a function that is polynomial-time assuming that function calls are constant-time, and if those called functions themselves require polynomial time, then the entire algorithm takes polynomial time. One consequence of this is that P is low for itself. This is also one of the main reasons that P is considered to be a machine-independent class; any machine "feature", such as random access, that can be simulated in polynomial time can simply be composed with the main polynomial-time algorithm to reduce it to a polynomial-time algorithm on a more basic machine. Languages in P are also closed under reversal, intersection, union, concatenation, Kleene closure, inverse homomorphism, and complementation.[5]
6.5. R (complexity) In computational complexity theory, R is the class of decision problems solvable by a Turing machine.
Equivalent formulations
R is equal to the set of all total computable functions.
Relationship with other classes Since we can decide any problem for which there exists a recogniser and also a co-recogniser by simply interleaving them until one obtains a result, the class is equal to RE ∩ coRE.
6.6. RP (complexity)
In computational complexity theory, randomized polynomial time (RP) is the complexity class of problems for which a probabilistic Turing machine exists with these properties:
- It always runs in polynomial time in the input size.
- If the correct answer is NO, it always returns NO.
- If the correct answer is YES, then it returns YES with probability at least 1/2 (otherwise, it returns NO).
In other words, the algorithm is allowed to flip a truly random coin while it is running. The only case in which the algorithm can return YES is if the actual answer is YES; therefore if the algorithm terminates and produces YES, then the correct answer is definitely YES; however, the algorithm can terminate with NO regardless of the actual answer. That is, if the algorithm returns NO, it might be wrong.
RP algorithm (1 run)
                   Answer produced
Correct answer     Yes          No
Yes                ≥ 1/2        ≤ 1/2
No                 0            1

RP algorithm (n runs)
                   Answer produced
Correct answer     Yes          No
Yes                ≥ 1 − 2^−n   ≤ 2^−n
No                 0            1

co-RP algorithm (1 run)
                   Answer produced
Correct answer     Yes          No
Yes                1            0
No                 ≤ 1/2        ≥ 1/2

If the correct answer is YES and the algorithm is run n times with the result of each run statistically independent of the others, then it will return YES at least once with probability at least 1 − 2^−n. So if the algorithm is run 100 times, then the chance of it giving the wrong answer every time is lower than the chance that cosmic rays corrupted the memory of the computer running the algorithm.[1] In this sense, if a source of random numbers is available, most algorithms in RP are highly practical. The fraction 1/2 in the definition is arbitrary. The set RP will contain exactly the same problems, even if the 1/2 is replaced by any constant nonzero probability less than 1; here constant means independent of the input to the algorithm.
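A sketch of the amplification just described; rp_algorithm stands in for an arbitrary RP procedure, and its YES instances (odd numbers) are purely illustrative:

import random

def rp_algorithm(x):
    # Stand-in RP procedure: on YES instances (odd x, for illustration)
    # it answers YES with probability at least 1/2; on NO instances it
    # always answers NO.
    if x % 2 == 1:
        return random.random() < 0.5
    return False

def amplified(x, n=100):
    # Run the one-sided test n times; any single YES is trustworthy,
    # so the error probability on YES instances drops to at most 2**-n.
    return any(rp_algorithm(x) for _ in range(n))

print(amplified(7))  # True with probability at least 1 - 2**-100
print(amplified(8))  # always False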
Related complexity classes The definition of RP says that a YES answer is always right and that a NO answer might be wrong (because a question with the YES answer can sometimes be answered NO). In other words, while NO questions are always answered NO, you cannot trust the NO answer; it may be a mistaken answer to a YES question. The complexity class co-RP is similarly defined, except that NO is always right and YES might be wrong. In other words, it accepts all YES instances but can either accept or reject NO instances. The class BPP describes algorithms that can give incorrect answers on both YES and NO instances, and thus contains both RP and co-RP. The intersection of the sets RP and co-RP is called ZPP. Just as RP may be called R, some authors use the name co-R rather than co-RP.
Connection to P and NP P is a subset of RP, which is a subset of NP. Similarly, P is a subset of co-RP which is a subset of co-NP. It is not known whether these inclusions are strict. However, if the commonly believed conjecture P = BPP is true, then RP, co-RP, and P collapse (are all equal). Assuming in addition that P ≠ NP, this then implies that RP is strictly contained in NP. It is not known whether RP = co-RP, or whether RP is a subset of the intersection of NP and co-NP, though this would be implied by P = BPP.
A natural example of a problem in co-RP currently not known to be in P is Polynomial Identity Testing, the problem of deciding whether a given multivariate arithmetic expression over the integers is the zero-polynomial. For instance, x·x − y·y − (x + y)·(x − y) is the zero-polynomial while x·x + y·y is not. An alternative characterization of RP that is sometimes easier to use is the set of problems recognizable
by nondeterministic Turing machines where the machine accepts if and only if at least some constant fraction of the computation paths, independent of the input size, accept. NP on the other hand, needs only one accepting path, which could constitute an exponentially small fraction of the paths. This characterization makes the fact that RP is a subset of NP obvious.
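To illustrate why Polynomial Identity Testing has this one-sided flavor, here is a sketch in the Schwartz–Zippel style: evaluate the expression at random points, so that any nonzero value definitively certifies "not the zero polynomial" (the helper names and sampling bounds are illustrative assumptions):

import random

def probably_zero_poly(expr, num_vars, trials=50, bound=10**6):
    # One-sided randomized test: if expr is not the zero polynomial, a
    # random evaluation is nonzero with high probability, so a nonzero
    # value is a definite NO; all-zero samples are only probable evidence.
    for _ in range(trials):
        point = [random.randint(-bound, bound) for _ in range(num_vars)]
        if expr(*point) != 0:
            return False   # definitely not the zero polynomial
    return True            # zero at every sample: probably the zero polynomial

# The examples from the text above:
print(probably_zero_poly(lambda x, y: x*x - y*y - (x + y)*(x - y), 2))  # True
print(probably_zero_poly(lambda x, y: x*x + y*y, 2))                    # False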
6.7. Recursively enumerable set
In computability theory, traditionally called recursion theory, a set S of natural numbers is called recursively enumerable, computably enumerable, semidecidable, provable or Turing-recognizable if:
- There is an algorithm such that the set of input numbers for which the algorithm halts is exactly S.
Or, equivalently,
- There is an algorithm that enumerates the members of S. That means that its output is simply a list of the members of S: s1, s2, s3, ... . If necessary, this algorithm may run forever.
The first condition suggests why the term semidecidable is sometimes used; the second suggests why computably enumerable is used. The abbreviations r.e. and c.e. are often used, even in print, instead of the full phrase. In computational complexity theory, the complexity class containing all recursively enumerable sets is RE. In recursion theory, the lattice of r.e. sets under inclusion is denoted ℰ. A set S of natural numbers is called recursively enumerable if there is a partial recursive function whose domain is exactly S, meaning that the function is defined if and only if its input is a member of S.
Equivalent formulations The following are all equivalent properties of a set S of natural numbers:

Semidecidability:
- The set S is recursively enumerable. That is, S is the domain (co-range) of a partial recursive function.
- There is a partial recursive function f such that f(x) = 0 if x ∈ S, and f(x) is undefined if x ∉ S.

Enumerability:
- The set S is the range of a partial recursive function.
- The set S is the range of a total recursive function, or empty. If S is infinite, the function can be chosen to be injective.
- The set S is the range of a primitive recursive function, or empty. Even if S is infinite, repetition of values may be necessary in this case.

Diophantine:
- There is a polynomial p with integer coefficients and variables x, a, b, c, d, e, f, g, h, i ranging over the natural numbers such that x ∈ S if and only if there exist a, b, c, d, e, f, g, h, i with p(x, a, b, c, d, e, f, g, h, i) = 0.
- There is a polynomial from the integers to the integers such that the set S contains exactly the non-negative numbers in its range.

The equivalence of semidecidability and enumerability can be obtained by the technique of dovetailing (see the sketch below). The Diophantine characterizations of a recursively enumerable set, while not as straightforward or intuitive as the first definitions, were found by Yuri Matiyasevich as part of the negative solution to Hilbert's Tenth Problem. Diophantine sets predate recursion theory and are therefore historically the first way to describe these sets (although this equivalence was only remarked more than three decades after the introduction of recursively enumerable sets). The number of bound variables in the above definition of the Diophantine set is the best known so far; it might be that a lower number can be used to define all Diophantine sets.
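A minimal sketch of dovetailing; the accepts_in predicate is a toy stand-in for running the nth machine for a bounded number of steps, so that the sketch runs as-is:

def accepts_in(n, steps):
    # Toy stand-in: pretend machine n halts after exactly n + 1 steps
    # when n is even, and never halts when n is odd.
    return n % 2 == 0 and steps >= n + 1

def enumerate_S(stages):
    # Dovetail over all pairs (n, steps): stage k runs every machine
    # n <= k for k steps, so every halting computation is found
    # eventually, even though no single machine is run to completion
    # before the others start.
    seen = set()
    for k in range(stages):
        for n in range(k + 1):
            if n not in seen and accepts_in(n, k):
                seen.add(n)
                yield n   # n is enumerated into S

print(list(enumerate_S(10)))  # [0, 2, 4, 6, 8], in order of discovery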
Examples Every recursive set is recursively enumerable, but it is not true that every recursively enumerable set is recursive. For recursive sets, the algorithm must also say if an input is not in the set – this is not
required of recursively enumerable sets. A recursively enumerable language is a recursively enumerable subset of a formal language. The set of all provable sentences in an effectively presented axiomatic system is a recursively enumerable set. Matiyasevich's theorem states that every recursively enumerable set is a Diophantine set (the converse is trivially true). The simple sets are recursively enumerable but not recursive. The creative sets are recursively enumerable but not recursive. Any productive set is not recursively enumerable. Given a Gödel numbering φ of the computable functions, the set {⟨i, x⟩ : φ_i(x) is defined} (where ⟨i, x⟩ is the Cantor pairing function and "φ_i(x) is defined" indicates that the computation halts) is recursively enumerable (cf. the figure for a fixed x). This set encodes the halting problem as it describes the input parameters for which each Turing machine halts. Given a Gödel numbering φ of the computable functions, the set {⟨x, y, z⟩ : φ_x(y) = z} is recursively enumerable. This set encodes the problem of deciding a function value. Given a partial function f from the natural numbers into the natural numbers, f is a partial recursive function if and only if the graph of f, that is, the set of all pairs ⟨x, f(x)⟩ such that f(x) is defined, is recursively enumerable.

[Figure: Recursive enumeration of the set of all Turing machines halting on a fixed input. Simulate all Turing machines (enumerated on the vertical axis) step by step (horizontal axis), using the shown diagonalization scheduling. If a machine terminates, print its number; this way, the number of each terminating machine is eventually printed. In the example, the algorithm prints "9, 13, 4, 15, 12, 18, 6, 2, 8, 0, ...".]
Properties If A and B are recursively enumerable sets then A ∩ B, A ∪ B and A × B (with the ordered pair of natural numbers mapped to a single natural number with the Cantor pairing function) are recursively enumerable sets. The preimage of a recursively enumerable set under a partial recursive function is a recursively enumerable set. A set is recursively enumerable if and only if it is at level Σ⁰₁ of the arithmetical hierarchy. A set T is called co-recursively enumerable or co-r.e. if its complement ℕ \ T is recursively enumerable; equivalently, a set is co-r.e. if and only if it is at level Π⁰₁ of the arithmetical hierarchy. A set A is recursive (synonym: computable) if and only if both A and the complement of A are recursively enumerable. A set is recursive if and only if it is either the range of an increasing total recursive function or finite. Some pairs of recursively enumerable sets are effectively separable and some are not.
6.8. Rice's theorem In computability theory, Rice's theorem states that all non-trivial, semantic properties of programs are undecidable. A semantic property is one about the program's behavior (for instance, does the program terminate for all inputs), unlike a syntactic property (for instance, does the program contain an if-then-else statement). A property is non-trivial if it is neither true for every computable function, nor for no computable function. Rice's theorem can also be put in terms of functions: for any non-trivial property of partial functions, no general and effective method can decide whether an algorithm computes a partial function with that property. Here, a property of partial functions is called trivial if it holds for all partial computable functions or for none, and an effective decision method is called general if it decides correctly for every algorithm. The theorem is named after Henry Gordon Rice, who proved it in his doctoral dissertation of 1951 at Syracuse University. It is also known as the Rice–Myhill–Shapiro theorem after Rice, John Myhill, and Norman Shapiro. Another way of stating Rice's theorem that is more useful in computability theory follows.
Let S be a set of languages that is nontrivial, meaning 1. there exists a Turing machine that recognizes a language in S 2. there exists a Turing machine that recognizes a language not in S Then, it is undecidable to determine whether the language recognized by an arbitrary Turing machine lies in S. In practice, this means that there is no machine that can always decide whether the language of a given Turing machine has a particular nontrivial property. Special cases include the undecidability of whether a Turing machine accepts a particular string, whether a Turing machine recognizes a particular recognizable language, and whether the language recognized by a Turing machine could be recognized by a nontrivial simpler machine, such as a finite automaton. It is important to note that Rice's theorem does not say anything about those properties of machines or programs that are not also properties of functions and languages. For example, whether a machine runs for more than 100 steps on some input is a decidable property, even though it is non-trivial. Implementing exactly the same language, two different machines might require a different number of steps to recognize the same input. Similarly, whether a machine has more than 5 states is a decidable property of the machine, as the number of states can simply be counted. Where a property is of the kind that either of the two machines may or may not have it, while still implementing exactly the same language, the property is of the machines and not of the language, and Rice's Theorem does not apply. Using Rogers' characterization of acceptable programming systems, Rice's Theorem may essentially be generalized from Turing machines to most computer programming languages: there exists no automatic method that decides with generality non-trivial questions on the behavior of computer programs. As an example, consider the following variant of the halting problem. Let P be the following property of partial functions F of one argument: P(F) means that F is defined for the argument '1'. It is obviously non-trivial, since there are partial functions that are defined at 1, and others that are undefined at 1. The 1-halting problem is the problem of deciding of any algorithm whether it defines a function with this property, i.e., whether the algorithm halts on input 1. By Rice's theorem, the 1-halting problem is undecidable. Similarly the question of whether a Turing machine T terminates on an initially empty tape (rather than with an initial word w given as second argument in addition to a description of T, as in the full halting problem) is still undecidable.
Formal statement
Let φ be an admissible numbering of the computable functions, that is, a map from the natural numbers to the class P⁽¹⁾ of unary (partial) computable functions. Denote by φ_e the eth (partial) computable function.

We identify each property that a computable function may have with the subset of P⁽¹⁾ consisting of the functions with that property. Thus, given a set F ⊆ P⁽¹⁾, a computable function φ_e has property F if and only if φ_e ∈ F. For each property F ⊆ P⁽¹⁾ there is an associated decision problem D_F of determining, given e, whether φ_e ∈ F.

Rice's theorem states that the decision problem D_F is decidable (also called recursive or computable) if and only if F = ∅ or F = P⁽¹⁾.
Examples According to Rice's theorem, if there is at least one computable function in a particular class C of computable functions and another computable function not in C then the problem of deciding whether a particular program computes a function in C is undecidable. For example, Rice's theorem shows that each of the following sets of computable functions is undecidable:
- The class of computable functions that return 0 for every input, and its complement.
- The class of computable functions that return 0 for at least one input, and its complement.
- The class of computable functions that are constant, and its complement.
- The class of indices for computable functions that are total.[1]
- The class of indices for recursively enumerable sets that are cofinite.
- The class of indices for recursively enumerable sets that are recursive.
Proof by Kleene's recursion theorem
A corollary to Kleene's recursion theorem states that for every Gödel numbering φ of the computable functions and every computable function Q(x, y), there is an index e such that φ_e(y) returns Q(e, y). (In the following, we say that f(x) "returns" g(x) if either f(x) = g(x), or both f(x) and g(x) are undefined.) Intuitively, φ_e is a quine, a function that returns its own source code (Gödel number), except that rather than returning it directly, it passes its Gödel number to Q and returns the result.

Let F be a set of computable functions such that F ≠ ∅ and F ≠ P⁽¹⁾. Then there are computable functions f ∈ F and g ∉ F. Suppose that the set of indices x such that φ_x ∈ F is decidable; then there exists a function Q(x, y) that returns g(y) if φ_x ∈ F, and f(y) otherwise. By the corollary to the recursion theorem, there is an index e such that φ_e(y) returns Q(e, y). But then, if φ_e ∈ F, then φ_e is the same function as g, and therefore φ_e ∉ F; and if φ_e ∉ F, then φ_e is f, and therefore φ_e ∈ F. In both cases, we have a contradiction.
Proof by reduction to the halting problem Proof sketch Suppose, for concreteness, that we have an algorithm for examining a program p and determining infallibly whether p is an implementation of the squaring function, which takes an integer d and returns d². The proof works just as well if we have an algorithm for deciding any other nontrivial property of programs, and is given in general below. The claim is that we can convert our algorithm for identifying squaring programs into one that identifies functions that halt. We will describe an algorithm that takes inputs a and i and determines whether program a halts when given input i.

The algorithm for deciding this is conceptually simple: it constructs (the description of) a new program t taking an argument n, which (1) first executes program a on input i (both a and i being hard-coded into the definition of t), and (2) then returns the square of n. If a(i) runs forever, then t never gets to step (2), regardless of n. Then clearly, t is a function for computing squares if and only if step (1) terminates. Since we've assumed that we can infallibly identify programs for computing squares, we can determine whether t, which depends on a and i, is such a program; and we can do so for every a and i. Thus we have obtained a program that decides whether program a halts on input i. Note that our halting-decision algorithm never executes t, but only passes its description to the squaring-identification program, which by assumption always terminates; since the construction of the description of t can also be done in a way that always terminates, the halting decision cannot fail to halt either.

halts(a, i) {
    define t(n) {
        a(i)
        return n × n
    }
    return is_a_squaring_function(t)
}
This method doesn't depend specifically on being able to recognize functions that compute squares; as long as some program can do what we're trying to recognize, we can add a call to a to obtain our t. We could have had a method for recognizing programs for computing square roots, or programs for computing the monthly payroll, or programs that halt when given the input "Abraxas"; in each case, we would be able to solve the halting problem similarly.
Formal proof
If we have an algorithm that decides a non-trivial property, we can construct a Turing machine that decides the halting problem. For the formal proof, algorithms are presumed to define partial functions over strings and are themselves represented by strings. The partial function computed by the algorithm represented by a string a is denoted Fa. This proof proceeds by reductio ad absurdum: we assume that there is a non-trivial property that is decided by an algorithm, and then show that it follows that we can decide the halting problem, which is not possible, and therefore a contradiction.

Let us now assume that P(a) is an algorithm that decides some non-trivial property of Fa. Without loss of generality we may assume that P(no-halt) = "no", with no-halt being the representation of an algorithm that never halts. If this is not true, then this holds for the negation of the property. Since P decides a non-trivial property, it follows that there is a string b that represents an algorithm and P(b) = "yes". We can then define an algorithm H(a, i) as follows:
1. Construct a string t that represents an algorithm T(j) such that T first simulates the computation of Fa(i), then simulates the computation of Fb(j) and returns its result.
2. Return P(t).

We can now show that H decides the halting problem:
- Assume that the algorithm represented by a halts on input i. In this case Ft = Fb and, because P(b) = "yes" and the output of P(x) depends only on Fx, it follows that P(t) = "yes" and, therefore, H(a, i) = "yes".
- Assume that the algorithm represented by a does not halt on input i. In this case Ft = Fno-halt, i.e., the partial function that is never defined. Since P(no-halt) = "no" and the output of P(x) depends only on Fx, it follows that P(t) = "no" and, therefore, H(a, i) = "no".

Since the halting problem is known to be undecidable, this is a contradiction and the assumption that there is an algorithm P(a) that decides a non-trivial property for the function represented by a must be false.
Rice's theorem and index sets Rice's theorem can be succinctly stated in terms of index sets: Let 𝒞 be a class of partial recursive functions with index set C. Then C is recursive if and only if C = ∅ or C = ℕ, where ℕ is the set of natural numbers, including zero.
An analogue of Rice's theorem for recursive sets One can regard Rice's theorem as asserting the impossibility of effectively deciding for any recursively enumerable set whether it has a certain nontrivial property.[2] In this section, we give an analogue of Rice's theorem for recursive sets, instead of recursively enumerable sets.[3] Roughly speaking, the analogue says that if one can effectively determine for every recursive set whether it has a certain property, then only finitely many integers determine whether a recursive set has the property. This result is analogous to the original theorem of Rice, because both results assert that a property is "decidable" only if one can determine whether a set has that property by examining at most finitely many n (for no n, in the case of the original theorem) for membership in the set.

Let C be a class (called a simple game and thought of as a property) of recursive sets. If S is a recursive set, then for some e, the computable function φ_e is the characteristic function of S. We call e a characteristic index for S. (There are infinitely many such e.) Let's say the class C is computable if there is an algorithm (computable function) that decides, for any nonnegative integer e (not necessarily a characteristic index):
- if e is a characteristic index for a recursive set belonging to C, then the algorithm gives "yes";
- if e is a characteristic index for a recursive set not belonging to C, then the algorithm gives "no".

A set S ⊆ ℕ extends a string τ of 0's and 1's if, for every k < |τ| (the length of τ), the (k+1)th element of τ is 1 if k ∈ S, and is 0 otherwise. For example, the set {1, 3, 4} extends the string 01011. A string τ is winning determining if every recursive set extending τ belongs to C. A string τ is losing determining if no recursive set extending τ belongs to C. We can now state the following analogue of Rice's theorem (Kreisel, Lacombe, and Shoenfield, 1959,[4] Kumabe and Mihara, 2008[5]):

A class C of recursive sets is computable if and only if there are a recursively enumerable set T₀ of losing determining strings and a recursively enumerable set T₁ of winning determining strings such that every recursive set extends a string in T₀ ∪ T₁.

This result has been applied to foundational problems in computational social choice (more broadly, algorithmic game theory). For instance, Kumabe and Mihara (2008,[5] 2008[6]) apply this result to an investigation of the Nakamura numbers for simple games in cooperative game theory and social choice theory.