
Lecture 14, CFG Derivation Trees


  More theory: context free grammars. As shown below, what was a
  state in a DFA becomes a variable in a grammar; a grammar has no states.
  
  Given a grammar with the usual representation  G = (V, T, P, S)
  with variables V, terminal symbols T, set of productions P and
  the start symbol from V called S.

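  Each production has a single variable on the left and a string of
  variables and terminals (or epsilon) on the right. The | symbol is often
  used to list several productions for one variable on a single line,
  rather than on separate lines.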
    S -> aab
    T -> aac | aad | epsilon
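
  A minimal sketch of how the 4-tuple could be held in Python, using the
  two productions just shown. The names V, TERMINALS, P, START and
  is_well_formed are illustrative assumptions, not part of the lecture,
  and epsilon is written as the empty string.

```python
# Hypothetical encoding of G = (V, T, P, S); all names are illustrative only.
V = {"S", "T"}                      # variables
TERMINALS = {"a", "b", "c", "d"}    # terminal symbols (the set T in the text)
P = {                               # productions; "" stands for epsilon
    "S": ["aab"],
    "T": ["aac", "aad", ""],
}
START = "S"

def is_well_formed(variables, terminals, productions, start):
    """Check the basic conditions on the 4-tuple."""
    if start not in variables or variables & terminals:
        return False
    for left, alternatives in productions.items():
        if left not in variables:          # left side must be one variable
            return False
        for right in alternatives:         # right side: variables and terminals
            if any(sym not in variables | terminals for sym in right):
                return False
    return True

print(is_well_formed(V, TERMINALS, P, START))   # True
```

  Each symbol is a single character here, so a right side can be stored
  as a plain string.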
  
  A derivation tree is constructed with
  1) each tree vertex is a variable or terminal or epsilon
  2) the root vertex is S
  3) interior vertices are from V, leaf vertices are from T or epsilon
  4) an interior vertex A has children, in order, left to right,
     X1, X2, ... , Xk when there is a production in P of the
     form  A -> X1 X2 ... Xk
  5) a leaf can be epsilon only when there is a production A -> epsilon,
     and then the leaf's parent A has only this one child.

  Watch out! A grammar may have an unbounded number of derivation trees.
  It just depends on which production is expanded at each vertex.

  For any valid derivation tree, reading the leaves from left to right
  gives one string in the language defined by the grammar.
  There may be many derivation trees for a single string in the language.

  If the grammar is a CFG then a leftmost derivation tree exists
  for every string in the corresponding CFL. There may be more than one
  leftmost derivation tree for some string. See the example below and
  the ((()())()) example in the previous lecture.

  If the grammar is a CFG then a rightmost derivation tree exists
  for every string in the corresponding CFL. There may be more than one
  rightmost derivation tree for some string.

  The grammar is called "ambiguous" if some string in the language defined
  by the grammar has more than one leftmost (rightmost) derivation tree.

  The leftmost and rightmost derivations are usually distinct but might
  be the same.

  Given a grammar and a string in the language represented by the grammar,
  a leftmost derivation tree is constructed bottom up by finding a
  production in the grammar that has the leftmost character of the string
  (possibly more than one may have to be tried) and building the tree
  towards the root. Then work on the second character of the string.
  After much trial and error, you should get a derivation tree with a root S.
  We will get to the CYK algorithm that does the parsing in a few lectures.

  Examples: Construct a grammar for L = { x 0^n y 1^n z | n>=0 }
            Recognize that  0^n y 1^n is a base language, say B
            B -> y | 0B1    (The base y, the recursion 0B1 )
            Then, the language is completed  S -> xBz
            using the prefix, base language and suffix.
            (Note that x, y and z could be any strings not involving n)

            G = ( V, T, P, S ) where
            V = { B, S }     T = { x, y, z, 0, 1 }  S = S
            P =    S -> xBz
                   B -> y | 0B1
                                               *
  Now construct an arbitrary derivation for  S =>  x00y11z
                                               G

  A derivation always starts with the start variable, S.
  The "=>", "*" and "G" stand for "derivation", "any number of
  steps", and "over the grammar G" respectively.

  The intermediate terms, called sentential forms, may contain
  both variable and terminal symbols.
  
  Any variable, say B, can be replaced by the right side of any
  production of the form  B -> <right side>

  A leftmost derivation always replaces the leftmost variable
  in the sentential form. (In general there are many possible
  replacements; the process is nondeterministic.)

  One possible derivation using the grammar above is
       S => xBz => x0B1z => x00B11z => x00y11z
  The derivation must stop when the sentential form
  has only terminal symbols. (No more substitutions are possible.)
  The final string is in the language of the grammar. But this
  is a very poor way to generate all strings in the language!
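
  The derivation steps can be replayed in a few lines. This is only a
  sketch: it assumes upper-case letters are the variables, and the
  function name leftmost_derivation is made up for this example.

```python
# Assumption of this sketch: variables are exactly the upper-case letters.
def leftmost_derivation(start, choices):
    """Apply each chosen (variable, right side) to the leftmost variable."""
    form = start
    steps = [form]
    for variable, right in choices:
        # position of the leftmost variable in the sentential form
        pos = next(i for i, ch in enumerate(form) if ch.isupper())
        if form[pos] != variable:
            raise ValueError("leftmost variable is " + form[pos])
        form = form[:pos] + right + form[pos + 1:]
        steps.append(form)
    return steps

steps = leftmost_derivation(
    "S", [("S", "xBz"), ("B", "0B1"), ("B", "0B1"), ("B", "y")])
print(" => ".join(steps))   # S => xBz => x0B1z => x00B11z => x00y11z
```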

  A "derivation tree", sometimes called a "parse tree", uses the
  rules above: start with the start symbol as the root, then expand the
  tree by giving a variable vertex one child for each symbol on the
  right side of one of that variable's productions, and so on.

                               S
                             / | \
                           /   |   \
                         /     |     \
                       /       |       \
                     /         |         \
                    x          B          z
                             / | \
                           /   |   \
                         /     |     \
                       /       |       \
                      0        B        1
                             / | \
                           /   |   \
                          0    B    1
                               |
                               y

  Derivation ends  x  0   0    y     1   1  z   with all leaves
  terminal symbols, a string in the language generated by the grammar.
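
  Reading the leaves left to right is a short recursion. The sketch
  assumes a node is a pair (symbol, children); the helper names leaves
  and wrap are hypothetical.

```python
# A node is (symbol, children); leaves have an empty child list.
def leaves(node):
    """Concatenate the leaf symbols from left to right."""
    symbol, children = node
    if not children:
        return symbol
    return "".join(leaves(child) for child in children)

def wrap(inner):
    """Build a vertex for the production B -> 0B1 around an inner B."""
    return ("B", [("0", []), inner, ("1", [])])

base = ("B", [("y", [])])                                # B -> y
tree = ("S", [("x", []), wrap(wrap(base)), ("z", [])])   # S -> xBz
print(leaves(tree))   # x00y11z
```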
 
  More examples of grammars are:
  G(L) for L = { x a^n  y b^k z   k > n > 0 }
       note that there must be more b's than a's  thus
       B -> aybb | aBb | Bb
  G = ( V, T, P, S )  where
  V = { B, S }   T = { a, b, x, y, z }  S = S
  P =   S -> xBz     B -> aybb | aBb | Bb

  Incremental changes for "n > k > 0"     B -> aayb | aBb | aB
  Incremental changes for "n >= k >= 0"   B -> y | aBb | aB
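
  One way to gain confidence in a grammar like the k > n > 0 one above is
  to enumerate its short strings and test each against the intended
  condition. The sketch assumes upper-case letters are variables and that
  no production has an epsilon right side, so sentential forms never
  shrink. The function name generate is an assumption of this example.

```python
import re
from collections import deque

def generate(productions, start, max_len):
    """All terminal strings of length <= max_len derivable from start.
    Assumes upper-case letters are variables and no epsilon productions."""
    seen, found = {start}, set()
    queue = deque([start])
    while queue:
        form = queue.popleft()
        pos = next((i for i, ch in enumerate(form) if ch.isupper()), None)
        if pos is None:                  # only terminals left
            found.add(form)
            continue
        for right in productions[form[pos]]:
            new = form[:pos] + right + form[pos + 1:]
            if len(new) <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return found

P = {"S": ["xBz"], "B": ["aybb", "aBb", "Bb"]}
strings = generate(P, "S", 12)
for s in strings:
    m = re.fullmatch(r"x(a+)y(b+)z", s)
    assert m and len(m.group(2)) > len(m.group(1)) > 0   # k > n > 0
print(min(strings, key=len))   # xaybbz
```

  Swapping in one of the incremental variants of B checks that variant
  the same way, as long as the test condition is changed to match. The
  variant with B -> y is also fine here, since y is not epsilon.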

  Independent exponents do not cause a problem when nested,
  which is equivalent to nesting parentheses.
  G(L) for L = { a^i b^j c^j d^i e^k f^k  i>=0, j>=0, k>=0 }
                   |   |   |   |   |   |
                   |   +---+   |   +---+
                   +-----------+

  G = ( V, T , P, S )
  V = { I, J, K, S }  T = { a, b, c, d, e, f }  S = S
  P =   S -> IK
        I -> J | aId
        J -> epsilon | bJc
        K -> epsilon | eKf
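
  The nesting can be seen by writing one small function per variable,
  each mirroring its productions. This is a sketch; passing the exponents
  i, j, k as arguments is an assumption made to pick which production is
  used at each step.

```python
def J(j):                      # J -> epsilon | bJc
    return "" if j == 0 else "b" + J(j - 1) + "c"

def I(i, j):                   # I -> J | aId
    return J(j) if i == 0 else "a" + I(i - 1, j) + "d"

def K(k):                      # K -> epsilon | eKf
    return "" if k == 0 else "e" + K(k - 1) + "f"

def S(i, j, k):                # S -> IK
    return I(i, j) + K(k)

print(S(2, 1, 1))   # aabcddef
```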

 For L = { a^i b^j c^k | a relation tying all three exponents together,
 such as i=j=k>0 or 0<i<k<j }, G(L) can not be a context free grammar. Try it.
 This will be seen intuitively with push down automata and is provable
 with the pumping lemma for context free languages.


 How is a leftmost derivation tree found for some string?
 One process looks at the string left to right and
 runs the productions backwards.
 Here is an example; time starts at the top and moves down.

 Given G = (V, T, P, S)  V={S, E, I} T={a, b, c} S=S   P=
       I -> a | b | c
       E -> I | E+E | E*E
       S -> E                (a subset of grammar from book)

 Given a string   a + b * c
                  I
                  E
                  S            derived but not used
                      I
                      E
                 [E + E]
                    E
                    S          derived but not used
                          I
                          E
                   [E   * E]
                      E 
                      S        done! Have S and no more input.
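
 The trace above can be imitated by a greedy reducer that replaces
 E op E by E as soon as it appears. This is only a sketch for this small
 grammar, not a general parser: the name reduce_greedy and the nested
 tuple result are assumptions, and the steps I -> a and E -> I are left
 implicit.

```python
OPS = ("+", "*")

def reduce_greedy(tokens):
    """Run E -> E+E | E*E backwards, left to right; None if it fails."""
    stack = []
    for tok in tokens:
        if tok not in OPS and tok not in ("a", "b", "c"):
            return None
        stack.append(tok)      # an operand stands for I, then E
        if (len(stack) >= 3 and tok not in OPS
                and stack[-2] in OPS and stack[-3] not in OPS):
            right, op, left = stack.pop(), stack.pop(), stack.pop()
            stack.append((left, op, right))     # [E op E] becomes E
    if len(stack) == 1 and stack[0] not in OPS:
        return stack[0]                         # E, and then S -> E
    return None

print(reduce_greedy("a+b*c"))   # (('a', '+', 'b'), '*', 'c')
```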

  Leftmost derivation tree: just turn the trace upside down and delete the unused steps.

                      S
                      |
                      E
                    / | \
                   /  |  \
                  /   |   \
                 E    *    E
               / | \       |
              E  +  E      I
              |     |      |
              I     I      c
              |     |
              a     b

Check: Read leaves left to right, must be initial string, all in T.
       Interior nodes must be variables, all in V.
       Every vertical connection must be traceable to a production.
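
 This grammar is ambiguous in the sense defined earlier: a + b * c has a
 second tree with + at the top. A sketch that counts the derivation trees
 rooted at E for a string written without spaces (the function name
 count_E is an assumption):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def count_E(s):
    """Number of distinct derivation trees with root E and leaves s."""
    total = 1 if s in ("a", "b", "c") else 0     # E -> I, I -> a | b | c
    for i, ch in enumerate(s):
        if ch in "+*":                            # E -> E+E | E*E, split here
            total += count_E(s[:i]) * count_E(s[i + 1:])
    return total

print(count_E("a+b*c"))   # 2
```

 Since S -> E is the only production for S, the count for S is the same.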

