CS228 Probabilistic Graphical Models Homework 3-Solved
1. (MAP and MPE) Show that marginal MAP assignments do not always match the MPE assignment (Most Probable Explanation). That is, construct a Bayes net in which the most likely joint configuration of all variables disagrees, for some variable, with the value that maximizes that variable's marginal probability (i.e., the most likely value after marginalizing out the remaining variables).
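One concrete construction (the specific numbers below are an illustrative choice, not the only valid one) is a two-node network A → B in which A = 1 is individually most likely, but A = 0 concentrates all of its mass on a single joint assignment. A minimal sketch, verifiable by enumeration:

```python
# A minimal sketch: two binary variables A -> B with illustrative CPTs.
import itertools

# P(A): A = 1 is individually most likely.
p_a = {0: 0.4, 1: 0.6}
# P(B | A): A = 0 forces B = 0, while A = 1 splits its mass.
p_b_given_a = {0: {0: 1.0, 1: 0.0}, 1: {0: 0.5, 1: 0.5}}

joint = {(a, b): p_a[a] * p_b_given_a[a][b]
         for a, b in itertools.product([0, 1], repeat=2)}

mpe = max(joint, key=joint.get)                       # most probable joint assignment
marg_a = {a: sum(joint[(a, b)] for b in [0, 1]) for a in [0, 1]}
map_a = max(marg_a, key=marg_a.get)                   # argmax of A's marginal

print(joint)    # {(0,0): 0.4, (0,1): 0.0, (1,0): 0.3, (1,1): 0.3}
print(mpe)      # (0, 0): the MPE sets A = 0 ...
print(map_a)    # ... while A's marginal argmax is 1
```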
2. (Variable Elimination) Suppose we wish to perform exact inference over a chain Markov random field X1 − X2 − ... − Xn. Assume that each variable Xi has |Val(Xi)| = d.
(a) Derive an O(n³d³) algorithm for computing the marginals P(Xi, Xj) over all n² variable pairs Xi, Xj.
(b) Since we are computing marginals for all variable pairs, we can store the computations done for previous pairs and reuse them to save time on the remaining pairs. The key recursive relationship that makes this work is the following equation (for i < j − 1):
P(Xi, Xj) = Σ_{Xj−1} P(Xi, Xj−1) P(Xj | Xj−1)    (1)
Prove that the equation above holds.
(c) Construct a dynamic programming algorithm that computes marginals over all n² variable pairs based on the recursive relation in (1), and achieves a running time that is asymptotically faster than O(n³d³). Describe the time complexity of your algorithm in terms of n and d. Note: make sure to clearly specify how each of the probabilities P you use is computed.
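For reference, here is a minimal sketch of one way such a dynamic program can look, assuming the chain is given as a list phi of d × d pairwise potentials (the names and input format are illustrative, not prescribed by the assignment). A forward-backward pass gives all adjacent marginals and conditionals in O(n d²); applying recursion (1) as one d × d matrix product per pair then costs O(n² d³) overall, which is asymptotically faster than O(n³d³):

```python
import numpy as np

def all_pairwise_marginals(phi):
    """phi: list of n-1 (d x d) potentials over (X_i, X_{i+1}), assumed strictly
    positive so the divisions below are safe. Returns {(i, j): P(X_i, X_j)}, i < j."""
    n, d = len(phi) + 1, phi[0].shape[0]

    # Forward/backward messages, O(n d^2): fwd[i], bwd[i] are vectors over Val(X_i).
    fwd = [np.ones(d)]
    for i in range(n - 1):
        fwd.append(fwd[-1] @ phi[i])
    bwd = [np.ones(d) for _ in range(n)]
    for i in range(n - 2, -1, -1):
        bwd[i] = phi[i] @ bwd[i + 1]

    # Node marginals, adjacent pairwise marginals, and transition matrices
    # T[i][a, b] = P(X_{i+1} = b | X_i = a), all normalized.
    node = [fwd[i] * bwd[i] for i in range(n)]
    node = [m / m.sum() for m in node]
    pair_adj = [np.outer(fwd[i], bwd[i + 1]) * phi[i] for i in range(n - 1)]
    pair_adj = [m / m.sum() for m in pair_adj]
    T = [pair_adj[i] / node[i][:, None] for i in range(n - 1)]

    # DP over recursion (1): P(X_i, X_j) = sum_{x_{j-1}} P(X_i, X_{j-1}) P(X_j | X_{j-1}).
    # One (d x d)(d x d) product per pair: O(n^2 d^3) total.
    marg = {}
    for i in range(n - 1):
        marg[(i, i + 1)] = pair_adj[i]
        for j in range(i + 2, n):
            marg[(i, j)] = marg[(i, j - 1)] @ T[j - 1]
    return marg
```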
3. (Clique tree calibration)
Suppose that we have a clique tree over a set of factors F with cliques C1,...,CN, which we have calibrated using sum-product message propagation so that we have all messages δi→j.
(a) If we modify a factor in some clique Ci, which message updates do we have to perform to recalibrate the tree?
(b) If we modify a factor in some clique Ci, but we just want the marginal over a single pre-specified variable Xk, which message updates do we have to perform?
4. (Importance Sampling)
Suppose we have a distribution P(X,E) over two sets of variables X and E. Our distribution is represented by a nasty Bayes Net with very dense connectivity, and our sets of variables X and E are spread arbitrarily throughout the network. In this problem our goal is to use the sampling methods we learned in class to estimate the posterior probability P(X = x | E = e). More specifically, we will use a tree-structured Bayes Net as the proposal distribution for use in the importance sampling algorithm.
(a) For a particular value of x and e, can we compute P(x | e) exactly, in a tractable way? Can we sample directly from the distribution P(X | e)? Can we compute the unnormalized posterior P̃(x | e) = P(x, e) exactly, in a tractable way? For each question, provide a Yes/No answer and a single-sentence explanation or description.
(b) Now, suppose your friendly TAs have given you a tree network (each variable besides the root has exactly one parent) that defines a distribution Q. They tell you that Q(X, E) is “close” to the distribution P(X, E) of the nasty network. You want to use the posterior in Q as your proposal distribution for importance sampling, which means you must perform the two steps of importance sampling:
i. Show how to sample from the posterior in Q. More specifically, describe an algorithm for drawing samples x[m] exactly from Q(X | E = e) using only tractable techniques. Your answer should be at the level of pseudocode, and should indicate the specific distribution each variable is sampled from, with an explanation of how you computed that distribution.
ii. Now you must reweight the samples according to the rules of importance sampling. You want your weighted samples to accurately represent the actual posterior in the original network, P(X | E = e). Show precisely how you determine the weights w[m] for the samples.
iii. Show the form of the final estimator P̂(X = x | E = e) for P(X = x | E = e), in terms of the samples from part i and the weights from part ii.
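Schematically, the pieces fit together as in the sketch below, where p_joint, sample_q, and q_posterior are hypothetical helpers (evaluating P(x, e) is a tractable product of CPDs even in the nasty net, while sampling from and evaluating Q(X | e) are tractable because Q is a tree); note that the unknown normalizer P(e) cancels in the ratio:

```python
# A schematic sketch of normalized importance sampling with the tree posterior
# Q(X | e) as proposal. All helper names are hypothetical placeholders.
import numpy as np

def estimate_posterior(x_query, e, M, p_joint, sample_q, q_posterior):
    samples = [sample_q(e) for _ in range(M)]          # x[m] ~ Q(X | e)
    # w[m] = P(x[m], e) / Q(x[m] | e); dividing by sum(w) below cancels P(e).
    weights = np.array([p_joint(x_m, e) / q_posterior(x_m, e) for x_m in samples])
    hits = np.array([float(x_m == x_query) for x_m in samples])
    return (weights * hits).sum() / weights.sum()      # estimate of P(x | e)
```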
Programming Assignment
In this programming assignment, you will design algorithms for reliable communication in the presence of noise. We will learn state-of-the-art techniques that some NASA missions use to communicate from deep space, and in the process help The Martian return home.
The Mars Rover is trying to communicate with mission headquarters on Earth. The message the Mars Rover wants to transmit is a binary sequence X ∈ B^N of N bits, representing an image. Here B = {0, 1}.
To deal with transmission noise, the Mars Rover appends N redundant bits to each message to allow for error correction (using an error-correcting code). The redundant bits are chosen using a clever scheme: the message is encoded as Y = GX through a special matrix G ∈ B^{2N×N}, a generator matrix that you can treat as a “constant” for the purposes of this assignment. G is chosen such that the first N bits of Y are equal to X, while the remaining N bits are redundant “parity checks” (here GX is computed modulo 2, so that Y ∈ B^{2N}). The encoded message Y ∈ B^{2N} obtained this way is special because it is a codeword: the Mars Rover and mission headquarters have agreed that all valid messages or codewords are such that HY = 0, where H ∈ B^{N×2N} is a pre-specified binary parity check matrix. The matrices G and H are paired, and chosen so that multiplying by G creates valid messages (codewords), and H can be used to check for errors in a received message (on Earth). Mathematically, HG = 0, so that HY = HGX = 0 for any input message X.
This codeword Y ∈ B^{2N} is then transmitted through deep space back to mission control on Earth, who receives it as the noisy Ỹ. The decoding process refers to the procedure of recovering the ground-truth codeword Y (and thus also X, the first N bits of Y) from the noisy version Ỹ. We will focus on error correcting codes based on highly sparse, low-density parity check (LDPC) matrices H, and use the sum-product variant of the loopy belief propagation (BP) algorithm to estimate partially corrupted message bits, and to bring our Martian safely back home.
To represent the problem using an undirected graphical model, there are two sets of factors you need to consider. The first are the unary factors associating Yi with Ỹi (messages that are similar to the one received are more likely). You also need to include the parity checks, which are factors defined on Y (assigning zero probability to messages that are not valid codewords) and which depend on H.
LDPC codes are specified by a binary parity check matrix H ∈ B^{N×2N}, whose columns correspond to codeword bits and rows to parity check constraints. We define Hij = 1 if parity check Pi depends on codeword bit Yj, and Hij = 0 otherwise. Valid codewords are those for which the sum of the bits connected to each parity check, as indicated by H, equals zero in modulo-2 addition (i.e., the number of “active” bits must be even). Equivalently, the modulo-2 product of the parity check matrix with the 2N-bit codeword vector must equal an N-bit vector of zeros. As illustrated in Fig. 1, we can visualize these parity check constraints via a corresponding factor graph. The parity check matrix H can then be thought of as an adjacency matrix, where rows correspond to factor (parity) nodes P, columns to variable (message bit) nodes Y, and ones to edges linking factors to variables.
(a) Implement code that, given an arbitrary parity check matrix H, constructs a corresponding factor graph. The parity check factors should evaluate to 1 if an even number of adjacent bits are active (equal 1), and 0 otherwise. For a given H matrix, define a small test case, and verify that your graphical model assigns zero probability to invalid codewords.
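A minimal sketch of one way to do this in Python/NumPy, assuming H is a 0/1 array (the toy H below is an illustrative test case, not the provided code):

```python
import numpy as np

def parity_factor_scopes(H):
    """Row i of H -> the variable indices in parity factor P_i's scope."""
    return [np.flatnonzero(row).tolist() for row in H]

def joint_parity_factor_value(H, y):
    """Product of all parity factors at assignment y: 1 iff y is a valid codeword,
    i.e. every parity check sees an even number of active bits."""
    return int(np.all((H @ y) % 2 == 0))

# Small illustrative test case: a toy 2 x 4 parity check matrix.
H = np.array([[1, 1, 0, 0],
              [0, 1, 1, 1]])
assert joint_parity_factor_value(H, np.array([1, 1, 0, 1])) == 1  # valid codeword
assert joint_parity_factor_value(H, np.array([1, 0, 0, 0])) == 0  # zero probability
```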
(b) Implement loopy belief propagation (sum-product) for the factor graphs you generate in part (a).
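One common way to implement sum-product for parity factors is in log-likelihood-ratio (LLR) form, where the check-factor message has a closed-form “tanh rule”. The sketch below assumes H as a 0/1 array and per-bit unary LLRs llr0[j] = log(phi_j(0) / phi_j(1)); the names and conventions are illustrative:

```python
import numpy as np

def loopy_bp(H, llr0, iters=50):
    checks = [np.flatnonzero(r) for r in H]      # bits adjacent to each check
    bits = [np.flatnonzero(c) for c in H.T]      # checks adjacent to each bit
    # Variable-to-check messages start at the unary LLRs (a constant per bit).
    m_vc = {(j, i): llr0[j] for i in range(H.shape[0]) for j in checks[i]}
    m_cv = {k: 0.0 for k in m_vc}
    for _ in range(iters):                       # parallel update schedule
        # Check-to-variable: "tanh rule" over the check's other neighbors.
        new_cv = {}
        for i in range(H.shape[0]):
            for j in checks[i]:
                t = np.prod([np.tanh(m_vc[(k, i)] / 2.0)
                             for k in checks[i] if k != j])
                new_cv[(j, i)] = 2.0 * np.arctanh(np.clip(t, -0.999999, 0.999999))
        m_cv = new_cv
        # Variable-to-check: unary LLR plus the bit's other incoming messages.
        m_vc = {(j, i): llr0[j] + sum(m_cv[(j, k)] for k in bits[j] if k != i)
                for (j, i) in m_vc}
    # Posterior LLR per bit and a hard decision (bit = 1 where P(1) > P(0)).
    post = np.array([llr0[j] + sum(m_cv[(j, k)] for k in bits[j])
                     for j in range(len(llr0))])
    return post, (post < 0).astype(int)
```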
(c) Load the N = 128-bit LDPC code provided in ldpc36-128.mat. To evaluate decoding performance, we assume that the all-zeros codeword Y is sent, which always satisfies any set of parity checks. Using the rand method, simulate the output Ỹ of a binary symmetric channel: each transmitted bit is flipped to its complement with error probability ε = 0.05, and equals the transmitted bit otherwise. Define unary factors for each variable node Yi which equal 1 − ε if that bit equals the “received” bit at the channel output, and ε otherwise. Run loopy belief propagation for 50 iterations of a parallel message update schedule (update all messages in each iteration using the BP equations, based on the messages from the previous iteration), initializing by setting all variable-to-factor messages to be constant. After the final iteration, plot the estimated posterior probability (conditioned on the received, noisy message) that each codeword bit equals one. If we decode by setting each bit to the maximum of its corresponding marginal, would we find the right codeword?
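The channel simulation and unary factors might be set up as follows (a sketch assuming ε = 0.05 and the loopy_bp sketch above; variable names are illustrative):

```python
import numpy as np

eps, two_n = 0.05, 256                    # N = 128, so the codeword has 2N = 256 bits
y = np.zeros(two_n, dtype=int)            # transmitted all-zeros codeword
y_noisy = (y + (np.random.rand(two_n) < eps)) % 2   # binary symmetric channel

# Unary factor phi_j(b) = 1 - eps if b == y_noisy[j] and eps otherwise,
# expressed as an LLR log(phi_j(0) / phi_j(1)):
llr0 = np.where(y_noisy == 0, 1.0, -1.0) * np.log((1 - eps) / eps)
# post, decoded = loopy_bp(H, llr0, iters=50)   # H from ldpc36-128.mat
# p1 = 1.0 / (1.0 + np.exp(post))               # posterior P(Y_j = 1) per bit
```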
(d) Repeat the experiment from part (c) for 10 random channel noise realizations with error probability ε = 0.06. For each trial, run sum-product for 50 iterations. After each iteration, estimate the codeword by taking the maximum of each bit's marginal distribution, and evaluate the Hamming distance (number of differing bits) between the estimated and true (all-zeros) codeword. On a single plot, display 10 curves showing Hamming distance versus iteration for each trial. Is BP a reliable decoding algorithm?
(e) Repeat part (c) with two higher error probabilities, ε = 0.08 and ε = 0.10. Discuss any qualitative differences in the behavior of the loopy BP decoder.
(f) Load the N = 1600-bit LDPC code provided in ldpc36-1600.mat. Using this, we will replicate the visual decoding demonstration from MacKay's Fig. 47.5. Start by converting a 40×40 binary image to a 1600-bit message vector; you may use the logo image we provide, or create your own. Encode the message using the provided generator matrix G, and add noise with error probability ε = 0.06 (flip each bit with that probability). For this input, plot images showing the output of the sum-product decoder after 0, 1, 2, 3, 5, 10, 20, and 30 iterations. The rem method may be useful for computing modulo-2 sums. You can use the reshape method to easily convert between images and rasterized message vectors.
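The encoding pipeline might look as follows (a sketch with placeholder stand-ins for the provided image and generator matrix; in NumPy, % plays the role of MATLAB's rem):

```python
import numpy as np

# Illustrative stand-ins; use the provided logo and the G from ldpc36-1600.mat.
img = np.zeros((40, 40), dtype=int)
G = np.eye(3200, 1600, dtype=int)         # placeholder with the right shape only

x = img.reshape(1600)                     # rasterize the image into a 1600-bit message
y = (G @ x) % 2                           # encode: 3200-bit codeword, modulo 2
y_noisy = (y + (np.random.rand(y.size) < 0.06)) % 2   # flip each bit w.p. 0.06

# After running BP on the noisy codeword, view the first N (systematic) bits:
# decoded_img = decoded[:1600].reshape(40, 40)
```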
(g) Repeat part (f) with a higher error probability of ε = 0.10, and discuss the differences.
Notes:
• This problem requires substantial computing time, so start early. You may choose not to use our provided classes or starter code.
• Error correcting codes are everywhere! This file is very likely stored on your computer in some memory (disk, RAM, ...) that uses an error correcting code. LDPC codes (like the one you implemented) are used, among other things, for deep-space communication and 10GBase-T Ethernet, and they are part of the Wi-Fi 802.11 standard. Loopy belief propagation is essentially a state-of-the-art decoding algorithm.