Cognitive Science > Statistical Natural Language Processing > 11

Jump to: navigation, search

Contents

[edit] Question 1: Joint Probability HMM

What is the joint probability of the state sequence and surface string shown below, according to a standard HMM?

q0 → q4 → q1 → q3 → qF
q4 → a
q1 → b
q3 → a


LaTex: Pr(w|M) = \sum_s \prod_{i=1}^{n+1} Pr(S_{i-1} \rightarrow S_{i}) \cdot Pr(w_i|s_i)
LaTex: Pr(aba|M) = Pr(q_{0} \rightarrow q_{4}) \cdot Pr(a|q_4) + Pr(q_{4} \rightarrow q_{1}) \cdot Pr(b|q_1) + Pr(q_{1} \rightarrow q_{3}) \cdot Pr(a|q_3) + Pr(q_{3} \rightarrow q_{f})

[edit] Question 2: Likelihood function

Why are local maxima of the likelihood function Pr(w |M) problematic for unsupervised training (of HMM and PCFG)? What strategy could be used to circumvent such problems?

Since this training method uses the gradient of the likelihood function for termination, the algorithm will terminate whenever the gradient will be zero. In a local maxima this is the case, although there are even further maxima that would yield much better training results.

Solutions for this Problem could be a re-run of the minimalization with a different initial point or a gradient descent method.

[edit] Question 3: Probability distribution

Consider a probability distribution for the formal language L containing all strings of length 4 over the alphabet LaTex: \Sigma = \{a,b\}. How many strings are there in L?
→ I What is the lowest entropy that such a probability distribution can have (in bits)? Give an example of a distribution with this entropy.
→ I What is the highest entropy that a probability distribution for L can have? Describe the distribution that achieves this entropy.

LaTex:  2^4 = 16 possible strings

entropy LaTex: H[C]= - \sum_\gamma Pr(c=\gamma) \cdot log_2 Pr(c=\gamma) average information provided by c

lowest entropy <=> one string is most probable others LaTex: \approx0 => 0 Bits

highest entropy <=> each string : LaTex: Pr(w)= \frac{1}{16}
LaTex: - 16 (\frac{1}{16} \cdot log_2(\frac{1}{16}))= 4 bits
example: LaTex:  Pr(aaaa) = 0.85  Pr(w)= 0.01| w \in L \setminus\{aaaa\}
=> LaTex:  H[C] \rightarrow 0 for Pr(aaaa) \rightarrow 1 \wedge Pr(w) \rightarrow 0 | w \in L \setminus\{aaaa\}

since we cannot calculate LaTex: log_2 0 we will not reach zero

[edit] Question 4: PCFG

How many analyses does the following string have? Draw a parse tree for its most plausible reading (according to your intuition).

He saw young girls and boys on the hill with the telescope.

Would a plain PCFG be able to disambiguate this sentence (i.e. to identify the reading that you consider most plausible)?2 If yes, why? If not, what extensions of the PCFG formalism might be needed?

16

No, advanced lexicalization technique would be needed to capture the dependency of "telescope" and "to see" (e.g. multiple lexicalization).

[edit] Question 5: Viterbi

Assume that you have calculated the forward probabilities Alphaj(omega) and backward probabilities Bethak (tau)for a given input string according to a HMM M, using the Viterbi algorithm.

→ What is the expected frequency3 of a state sequence sigma1 → sigma2 → sigma3 in an analysis of w, according to the model M? Note that the sequence may occur at any time step.

→ Give a general equation that calculates the expected frequency f' (sigma1 → sigma2 → sigma3) using the pre-computed forward and backward probabilities.

[edit] Question 6: EM

Forward-backward training (for HMM) and inside-outside training (for PCFG) are instances of a general algorithm for unsupervised training of probability models with hidden variables.

→ What is the name of this algorithm?
EM

→ Give a brief intuitive description how the training procedure works, in your own words.
E-Step & M-Step
E: Compute the expectation for the hidden variable, given an input and the current model
M: Recompute the parameters for a new model, by maximising the likelihood estimation for the new model, given the expectation of the hidden variable and the old model.

[edit] Question 7: PCFG

  • sth.
    • Pr(NP → boys) = LaTex: \frac {17}{45}
    • Pr(VP → V NP) = LaTex: \frac {3}{4}
    • Pr(S → NP VP) = 1
  • a) boys like dogs
    • Pr(boys like dogs) = Pr(S → NP VP)*Pr(VP → V NP)*Pr(NP → boys)*Pr(V → like)*Pr(NP → dogs)
    • LaTex: 1*\frac{3}{4}*\frac {17}{45}*0*\frac{1}{9} = 0

[edit] Question 8: Noisy Channel Model

s = intended word
t = input

language model as source (could be n-grams)
data-sparseness requires smoothing

[edit] Question 9: PCFG & HMM

a) Specify the rules of a PCFG that is equivalent to the bigram Markov model over LaTex:  \Sigma \{a, b, c\} . If possible, formulate your PCFG in normal form, i.e. avoid LaTex: \epsilon-productions and chain productions. (Hint: Your grammar might use the variables S, A, B and C.)

LaTex: S \rightarrow aA | bB | cC
LaTex: A \rightarrow aA | bB | cC | \bullet
LaTex: B \rightarrow aA | bB | cC | \bullet
LaTex: C \rightarrow aA | bB | cC | \bullet

b) Draw a parse tree for the string LaTex:  w = cbba \bullet according to your grammar.

c) Calculate the probability LaTex: Pr(w) = Pr(cbba\bullet) and compare the resulting equation to the bigram probability shown above. Align corresponding terms in the two equations in order to demonstrate that the MM and the PCFG are equivalent.

LaTex: Pr(cbba\bullet) = Pr(S \rightarrow cC) \cdot Pr(C \rightarrow bB) \cdot Pr(B \rightarrow bB) \cdot Pr(A \rightarrow aA) \cdot Pr(A \rightarrow \bullet)

LaTex: Pr(cbba\bullet) = Pr(c | \bullet) \cdot Pr(b | c) \cdot Pr(b | b) \cdot Pr(a | b) \cdot Pr(\bullet | a)

d) Construct a PCFG that is equivalent to the trigram Markov model. Specify the grammar rules of this PCFG and draw a parse tree for the string LaTex: w = cbba\bullet

LaTex: S \rightarrow a A_\bullet | b B_\bullet_ | c C_\bullet_
LaTex: A_\bullet \rightarrow aA_A | bB_A | cC_A | \bullet
LaTex: A_A \rightarrow aA_A | bB_A | cC_A | \bullet
LaTex: A_B \rightarrow aA_A | bB_A | cC_A | \bullet
LaTex: A_C \rightarrow aA_A | bB_A | cC_A | \bullet
LaTex: B_\bullet \rightarrow aA_B | bB_B | cC_B | \bullet
LaTex: B_A \rightarrow aA_B | bB_B | cC_B | \bullet
LaTex: B_B \rightarrow aA_B | bB_B | cC_B | \bullet
LaTex: B_C \rightarrow aA_B | bB_B | cC_B | \bullet
LaTex: C_\bullet \rightarrow aA_C | bB_C | cC_C | \bullet
LaTex: C_A \rightarrow aA_C | bB_C | cC_C | \bullet
LaTex: C_B \rightarrow aA_C | bB_C | cC_C | \bullet
LaTex: C_C \rightarrow aA_C | bB_C | cC_C | \bullet