Cognitive Science > Statistical Natural Language Processing > 11
Contents |
[edit] Question 1: Joint Probability HMM
What is the joint probability of the state sequence and surface string shown below, according to a standard
HMM?
q0 → q4 → q1 → q3 → qF
q4 → a
q1 → b
q3 → a
[edit] Question 2: Likelihood function
Why are local maxima of the likelihood function Pr(w |M) problematic for unsupervised training (of HMM and PCFG)? What strategy could be used to circumvent such problems?
Since this training method uses the gradient of the likelihood function for termination, the algorithm will terminate whenever the gradient will be zero. In a local maxima this is the case, although there are even further maxima that would yield much better training results.
Solutions for this Problem could be a re-run of the minimalization with a different initial point or a gradient descent method.
[edit] Question 3: Probability distribution
Consider a probability distribution for the formal language L containing all strings of length 4 over the alphabet . How many strings are there in L?
→ I What is the lowest entropy that such a probability distribution can have (in bits)? Give an example of a
distribution with this entropy.
→ I What is the highest entropy that a probability distribution for L can have? Describe the distribution that
achieves this entropy.
possible strings
entropy average information provided by c
lowest entropy <=> one string is most probable others 0 => 0 Bits
highest entropy <=> each string :
example:
=>
since we cannot calculate we will not reach zero
[edit] Question 4: PCFG
How many analyses does the following string have? Draw a parse tree for its most plausible reading (according
to your intuition).
He saw young girls and boys on the hill with the telescope.
Would a plain PCFG be able to disambiguate this sentence (i.e. to identify the reading that you consider most
plausible)?2 If yes, why? If not, what extensions of the PCFG formalism might be needed?
16
No, advanced lexicalization technique would be needed to capture the dependency of "telescope" and "to see" (e.g. multiple lexicalization).
[edit] Question 5: Viterbi
Assume that you have calculated the forward probabilities Alphaj(omega) and backward probabilities Bethak (tau)for a given input string according to a HMM M, using the Viterbi algorithm.
→ What is the expected frequency3 of a state sequence sigma1 → sigma2 → sigma3 in an analysis of w, according to the model M? Note that the sequence may occur at any time step.
→ Give a general equation that calculates the expected frequency f' (sigma1 → sigma2 → sigma3) using the pre-computed forward and backward probabilities.
[edit] Question 6: EM
Forward-backward training (for HMM) and inside-outside training (for PCFG) are instances of a general algorithm for unsupervised training of probability models with hidden variables.
→ What is the name of this algorithm?
EM
→ Give a brief intuitive description how the training procedure works, in your own words.
E-Step & M-Step
E: Compute the expectation for the hidden variable, given an input and the current model
M: Recompute the parameters for a new model, by maximising the likelihood estimation for the new model, given the expectation of the hidden variable and the old model.
[edit] Question 7: PCFG
- sth.
- Pr(NP → boys) =
- Pr(VP → V NP) =
- Pr(S → NP VP) = 1
- Pr(NP → boys) =
- a) boys like dogs
- Pr(boys like dogs) = Pr(S → NP VP)*Pr(VP → V NP)*Pr(NP → boys)*Pr(V → like)*Pr(NP → dogs)
[edit] Question 8: Noisy Channel Model
s = intended word
t = input
language model as source (could be n-grams)
data-sparseness requires smoothing
[edit] Question 9: PCFG & HMM
a) Specify the rules of a PCFG that is equivalent to the bigram Markov model over . If possible, formulate your PCFG in normal form, i.e. avoid
-productions and chain productions. (Hint: Your grammar might use the variables S, A, B and C.)
b) Draw a parse tree for the string according to your grammar.
c) Calculate the probability and compare the resulting equation to the bigram probability shown above. Align corresponding terms in the two equations in order to demonstrate that the MM and the PCFG are equivalent.
d) Construct a PCFG that is equivalent to the trigram Markov model. Specify the grammar rules of this PCFG and draw a parse tree for the string