Bayes Nets¶

Bayes Nets take the idea of uncertainty and probability and marry it with efficient structures. They let us understand which uncertain variables influence which other uncertain variables.

Challenge Question¶

https://dl.dropbox.com/s/pvo18qlb1gekh1b/Screenshot%202017-02-24%2001.31.30.png

This requires creativity to connect O1 and O2.
We have to use G somehow.
We will use capital letters to indicate our variables.
We will use lower-case letters to indicate that a variable is true, and a - in front of the letter to indicate that it is not true.
I think the step-by-step illustration is not accurate.

https://dl.dropbox.com/s/zs0lzjj1yjppw7u/Screenshot%202017-02-24%2001.37.42.png

We sum over all the situations where o2 is true and o1 is true (the subtlety is that this sum also runs over both values of G),
and divide by the sum over all the situations where o1 is true. Here we go through every value of O2 and G.
Why we are doing this is not explained in this video.

We define the numerator

https://dl.dropbox.com/s/cz3atf9kxehtpyo/Screenshot%202017-02-24%2001.42.50.png

We define the denominator

https://dl.dropbox.com/s/smv3gpgs25fumh3/Screenshot%202017-02-24%2001.44.10.png

We calculated this result by summing up the results for all the relevant situations. We can also get the result by sampling, which can handle more complex networks.

Bayes Network¶

We care about diagnostic reasoning.

https://dl.dropbox.com/s/uxu1x138ciwkph3/Screenshot%202017-02-24%2002.25.44.png

How many parameters?

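For a two-node network A -> B with binary variables we need three parameters:
one conditional probability for the case where A is true, P(B|A);
one for the case where A is false, P(B|-A);
and one prior probability for A itself, P(A).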

https://dl.dropbox.com/s/zhexycql503lp27/Screenshot%202017-02-24%2002.27.40.png
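
As a rough check on the parameter count, the sketch below assumes every variable is binary, so a node with k parents needs 2^k numbers (one probability per combination of parent values). The function name `count_parameters` and the dict layout are illustrative choices, not something taken from the course.

```python
def count_parameters(parents):
    """Count the probabilities needed to specify a Bayes net of binary variables.

    `parents` maps each node name to the list of its parent names.
    A node with k parents needs 2**k numbers: one P(node = true | parent values)
    for every combination of parent values.
    """
    return sum(2 ** len(p) for p in parents.values())


# Two-node network A -> B: P(A), P(B | A), P(B | -A)
two_node = {"A": [], "B": ["A"]}
print(count_parameters(two_node))  # 3
```

The same function can be applied to larger networks by listing each node's parents.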

Computing Bayes Rule¶

We compute the unnormalized posterior P'(A|B) = P(B|A) * P(A), ditching the denominator P(B).

https://dl.dropbox.com/s/a3y7xt379zumi17/Screenshot%202017-02-24%2002.31.42.png

We calculate the normalizer indirectly from the terms themselves: P'(A|B) + P'(-A|B) equals P(B).

https://dl.dropbox.com/s/d1t91jrqma5l8op/Screenshot%202017-02-24%2002.33.07.png
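
A minimal sketch of this two-step computation, assuming a binary cause A and a binary observation B. The function name `posterior` and the example numbers are assumptions made for illustration only.

```python
def posterior(prior_a, p_b_given_a, p_b_given_not_a):
    """Return P(A | B) using unnormalized terms and an indirect normalizer."""
    unnorm_a = p_b_given_a * prior_a                  # P'(A | B)
    unnorm_not_a = p_b_given_not_a * (1.0 - prior_a)  # P'(-A | B)
    normalizer = unnorm_a + unnorm_not_a              # equals P(B)
    return unnorm_a / normalizer


# Illustrative numbers (assumed): P(A) = 0.3, P(B | A) = 0.8, P(B | -A) = 0.1
print(posterior(0.3, 0.8, 0.1))  # about 0.774
```

Note that P(B) is never looked up directly; it falls out as the sum of the two unnormalized terms.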

Two Test Cancer¶

https://dl.dropbox.com/s/tmirw03l9x2fppb/Screenshot%202017-02-24%2002.45.44.png

P(C| ++) = ?

Use the P' formula from above.

P'(C|++) = P(++|C) * P(C)
         = P(+|C) * P(+|C) * P(C)
         = 0.9 * 0.9 * 0.01

P'(-C|++) = P(++|-C) * P(-C)
          = P(+|-C) * P(+|-C) * P(-C)
          = 0.2 * 0.2 * 0.99

P(C| ++) = P'(C|++)
           --------------------
           P'(C|++) + P'(-C|++)

Calculating the result.

n1 =  0.9 * 0.9 * 0.01
d1 =  0.2 * 0.2 * 0.99

n1 / (n1 + d1)
0.169811320754717

https://dl.dropbox.com/s/i2e1s2e8v120scs/Screenshot%202017-02-24%2002.56.24.png
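
The same calculation can be sketched for any sequence of test results, assuming the tests are conditionally independent given C and using the numbers from this section (P(C) = 0.01, P(+|C) = 0.9, P(+|-C) = 0.2). The function name `cancer_posterior` and the string encoding of results are illustrative assumptions.

```python
def cancer_posterior(results, p_c=0.01, p_pos_given_c=0.9, p_pos_given_not_c=0.2):
    """Return P(C | results), where results is a string such as "++" or "+-"."""
    unnorm_c, unnorm_not_c = p_c, 1.0 - p_c
    for r in results:
        if r == "+":
            unnorm_c *= p_pos_given_c
            unnorm_not_c *= p_pos_given_not_c
        elif r == "-":
            unnorm_c *= 1.0 - p_pos_given_c
            unnorm_not_c *= 1.0 - p_pos_given_not_c
        else:
            raise ValueError(f"unexpected test result: {r!r}")
    # Normalize with the sum of the two unnormalized terms.
    return unnorm_c / (unnorm_c + unnorm_not_c)


print(cancer_posterior("++"))  # about 0.1698, matching the calculation above
print(cancer_posterior("+-"))
```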

Conditional Independence¶

https://dl.dropbox.com/s/6rxgvmxfphe8298/Screenshot%202017-02-24%2002.59.44.png

Conditional independence is a central idea in Bayes networks.

https://dl.dropbox.com/s/16dy6pv5faer4tv/Screenshot%202017-02-24%2003.01.37.png

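Given A, B and C are independent: once A is known, B tells us nothing more about C.
Without A, B and C are not independent. They both depend on A, so learning B changes our belief about A and therefore about C.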
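
A small numeric check of this, assuming the network is the two-test cancer example from these notes with A as the cancer variable C and B, C as the two tests T1, T2. The helper names `joint` and `prob` are illustrative; the probabilities are the ones used in the Two Test Cancer section.

```python
from itertools import product

P_C = 0.01                            # prior P(C)
P_POS = {True: 0.9, False: 0.2}       # P(+ | C) and P(+ | -C)


def joint(c, t1, t2):
    """P(C = c, T1 = t1, T2 = t2) for the network T1 <- C -> T2."""
    p = P_C if c else 1.0 - P_C
    for t in (t1, t2):
        p *= P_POS[c] if t else 1.0 - P_POS[c]
    return p


def prob(event, given=lambda c, t1, t2: True):
    """P(event | given), computed by summing the joint over all 8 worlds."""
    worlds = list(product([True, False], repeat=3))
    num = sum(joint(*w) for w in worlds if event(*w) and given(*w))
    den = sum(joint(*w) for w in worlds if given(*w))
    return num / den


p_t2 = prob(lambda c, t1, t2: t2)
p_t2_given_t1 = prob(lambda c, t1, t2: t2, lambda c, t1, t2: t1)
p_t2_given_c = prob(lambda c, t1, t2: t2, lambda c, t1, t2: c)
p_t2_given_c_t1 = prob(lambda c, t1, t2: t2, lambda c, t1, t2: c and t1)

print(p_t2, p_t2_given_t1)            # differ: the tests are dependent when C is unknown
print(p_t2_given_c, p_t2_given_c_t1)  # equal: the tests are independent given C
```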

Conditional Independence 2¶

Tricky again.
Apply Total Probability.

https://dl.dropbox.com/s/332s5ikar2v0zwq/Screenshot%202017-02-24%2003.20.48.png

https://dl.dropbox.com/s/7ygv4e7fuf4ak8s/Screenshot%202017-02-24%2003.24.27.png

Right here is the magic. How did we bring this in?
Why do we not have any denominator?
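
One possible reading of this step, assuming the quantity being computed is the probability of a second positive test given a first positive test in the two-test cancer network: the expansion is the law of total probability applied inside the condition +1, and conditional independence then drops +1 from the test terms. No denominator appears because this is total probability, not Bayes rule. The numbers below are worked out from the values in these notes (P(C) = 0.01, P(+|C) = 0.9, P(+|-C) = 0.2), not read off the screenshot, and \neg C stands for -C.

```latex
\begin{aligned}
P(C \mid +_1) &= \frac{0.9 \times 0.01}{0.9 \times 0.01 + 0.2 \times 0.99} \approx 0.0435 \\
P(+_2 \mid +_1) &= P(+_2 \mid +_1, C)\,P(C \mid +_1) + P(+_2 \mid +_1, \neg C)\,P(\neg C \mid +_1) \\
                &= P(+_2 \mid C)\,P(C \mid +_1) + P(+_2 \mid \neg C)\,P(\neg C \mid +_1) \\
                &\approx 0.9 \times 0.0435 + 0.2 \times 0.9565 \approx 0.230
\end{aligned}
```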

https://dl.dropbox.com/s/kns1stjd71zjbjw/Screenshot%202017-02-24%2004.09.18.png

A lot has happened here. This is short-circuiting.

https://dl.dropbox.com/s/55g9nnv0fyvcok6/Screenshot%202017-02-24%2004.16.23.png

https://dl.dropbox.com/s/asqdlqjzsmxnx2d/Screenshot%202017-02-24%2004.17.38.png

Compare¶

The same thing is approached in two different situations.

Absolute and Conditional¶

https://dl.dropbox.com/s/bbrqxphfi6nmomr/Screenshot%202017-02-24%2020.29.05.png

Confounding Cause¶

https://dl.dropbox.com/s/ejn4qwdu4isw3h1/Screenshot%202017-02-24%2008.50.54.png

Explaining Away¶

https://dl.dropbox.com/s/g1jiqnre3ia32d3/Screenshot%202017-02-24%2008.52.17.png

https://dl.dropbox.com/s/yeutvmix4hyq57f/Screenshot%202017-02-24%2008.53.30.png

Explaining Away 2¶

https://dl.dropbox.com/s/jxn9a02cutmwpcr/Screenshot%202017-02-24%2021.13.27.png

Explaining Away 3¶

https://dl.dropbox.com/s/a2k3gjkpfsh6f5g/Screenshot%202017-02-24%2021.19.44.png

Conditional Dependence¶

https://dl.dropbox.com/s/04ab2uph1r2vkzz/Screenshot%202017-02-24%2021.21.12.png

General Bayes Network¶

https://dl.dropbox.com/s/nbf2tor4yz0bbp5/Screenshot%202017-02-24%2021.22.38.png

https://dl.dropbox.com/s/vt82z3mdkplpufi/Screenshot%202017-02-24%2021.24.20.png

D Separation¶

https://dl.dropbox.com/s/xb21x38u6qc1lmx/Screenshot%202017-02-24%2021.25.32.png

Two variables are not independent if they are linked by an unknown variable.

https://dl.dropbox.com/s/uhzgjhwfc2vxoqi/Screenshot%202017-02-24%2021.26.33.png

D Separation¶

https://dl.dropbox.com/s/1d9cb70w42f99qq/Screenshot%202017-02-24%2021.28.08.png

Active triplets render the variables dependent.
Inactive triplets render them independent.
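
The standard rules for when a triplet is active can be sketched as a small function. This is a simplified illustration covering a single triplet only, not a full d-separation algorithm; the function name `triplet_is_active` and the string labels for the three triplet shapes are assumptions of this sketch.

```python
def triplet_is_active(kind, middle_observed, descendant_observed=False):
    """Return True if the path A - B - C through the middle node B is active.

    kind: "chain"          A -> B -> C
          "common_cause"   A <- B -> C
          "common_effect"  A -> B <- C
    middle_observed: whether B is in the evidence.
    descendant_observed: whether any descendant of B is in the evidence
        (only matters for "common_effect").
    """
    if kind in ("chain", "common_cause"):
        # Observing the middle node blocks the path.
        return not middle_observed
    if kind == "common_effect":
        # Observing the common effect (or one of its descendants) opens the path.
        return middle_observed or descendant_observed
    raise ValueError(f"unknown triplet kind: {kind!r}")


print(triplet_is_active("chain", middle_observed=False))          # True
print(triplet_is_active("common_cause", middle_observed=True))    # False
print(triplet_is_active("common_effect", middle_observed=False))  # False
print(triplet_is_active("common_effect", middle_observed=True))   # True
```

Two variables are d-separated when every path between them contains at least one inactive triplet.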

Conclusion¶

https://dl.dropbox.com/s/imppwbjtti4pkua/Screenshot%202017-02-24%2021.29.41.png

Probabilistic Inference¶

Probability Theory
Bayes Net
Independence
Inference

https://dl.dropbox.com/s/fmbg4knfrkdz5qs/Screenshot%202017-02-25%2005.52.20.png

What kind of questions can we ask?
Given some inputs, what are the outputs?
Evidence variables (known) and query variables (to find out).
Hidden variables (neither evidence nor query; we still have to compute over them).
In probabilistic inference, the output is a probability distribution over the query variables.

https://dl.dropbox.com/s/r09675e4drswgfd/Screenshot%202017-02-25%2005.55.57.png

Enumeration¶

Start by stating the problem
Using conditional probability

https://dl.dropbox.com/s/xbhakaxuezhxnep/Screenshot%202017-02-25%2005.59.12.png

https://dl.dropbox.com/s/6pyyuk13ymf4c01/Screenshot%202017-02-25%2006.01.44.png

https://dl.dropbox.com/s/w9lajc4h2wqvnmz/Screenshot%202017-02-25%2006.02.35.png

We denote each product of 5 numbers as a single term called f(e,a).
The final answer is then the sum of four terms, where each term is a product of 5 numbers.

https://dl.dropbox.com/s/6rqq7gv64ko5ywq/Screenshot%202017-02-25%2006.04.57.png

https://dl.dropbox.com/s/h1do4kipzng82t3/Screenshot%202017-02-25%2006.05.27.png
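
A minimal sketch of inference by enumeration, assuming the query is P(+b | +j, +m) on the burglary-alarm network and using the standard textbook values for its tables. Those values, and the helper names `bern`, `f`, and `joint_b_j_m`, are assumptions of this sketch and are not read from the screenshots.

```python
from itertools import product

# Standard textbook alarm-network values (assumed, not read from the screenshots).
P_B = 0.001
P_E = 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # P(+a | B, E)
P_J = {True: 0.90, False: 0.05}                      # P(+j | A)
P_M = {True: 0.70, False: 0.01}                      # P(+m | A)


def bern(p_true, value):
    """P(X = value) for a binary X with P(X = true) = p_true."""
    return p_true if value else 1.0 - p_true


def f(b, e, a, j=True, m=True):
    """One term of the sum: a product of five numbers."""
    return (bern(P_B, b) * bern(P_E, e) * bern(P_A[(b, e)], a)
            * bern(P_J[a], j) * bern(P_M[a], m))


def joint_b_j_m(b):
    """P(B = b, +j, +m): sum the hidden variables E and A out (four terms)."""
    return sum(f(b, e, a) for e, a in product([True, False], repeat=2))


# Normalize over both values of B to turn the joint into a conditional.
p_b_given_jm = joint_b_j_m(True) / (joint_b_j_m(True) + joint_b_j_m(False))
print(round(p_b_given_jm, 3))  # 0.284
```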

Speeding up Enumeration¶

https://dl.dropbox.com/s/h1kqmgznefudqzt/Screenshot%202017-02-25%2006.18.58.png

This reduces the cost of computing each row in the table.
The number of rows, however, stays the same.

Using dependence

https://dl.dropbox.com/s/ztn5wq66p08c6pq/Screenshot%202017-02-25%2006.23.33.png

Causal Direction¶

Inference on a Bayes network is easier when the network flows from causes to effects.

Variable Elimination¶

Inference over Bayes Nets is NP-hard in general.
Variable elimination requires algebra to manipulate the arrays that come out of the probabilistic terms.

https://dl.dropbox.com/s/q0ufdgn4h6ci0p4/Screenshot%202017-02-25%2006.35.05.png

We compute by marginalizing a variable out, which leaves a smaller network to deal with.

https://dl.dropbox.com/s/7zms1cwvz9l2ggc/Screenshot%202017-02-25%2006.38.29.png

We apply elimination, also called marginalization or summing out, to the table.

https://dl.dropbox.com/s/yij3e5xs0mib8gx/Screenshot%202017-02-25%2006.41.32.png

Variable Elimination - 2¶

We sum out the variables and find the distribution.

https://dl.dropbox.com/s/7tnknw21tihfz0j/Screenshot%202017-02-25%2006.43.37.png

Variable Elimination - 3¶

https://dl.dropbox.com/s/z706dpnoslrfxl1/Screenshot%202017-02-25%2006.46.06.png

Summing out and eliminating.
If we make a good choice of the order in which variables are eliminated, then variable elimination is going to be more efficient than enumerating.
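
The join and sum-out steps can be sketched on a three-variable chain R -> T -> L. The tables below are illustrative values assumed for this sketch, and the dictionary layout is just one convenient way to hold a factor.

```python
# Illustrative CPTs for a chain R -> T -> L (values assumed for this sketch).
P_R = {True: 0.1, False: 0.9}                              # P(R)
P_T_GIVEN_R = {(True, True): 0.8, (True, False): 0.2,      # key: (r, t)
               (False, True): 0.1, (False, False): 0.9}
P_L_GIVEN_T = {(True, True): 0.3, (True, False): 0.7,      # key: (t, lv)
               (False, True): 0.1, (False, False): 0.9}

VALUES = (True, False)

# Step 1: join P(R) with P(T | R) to get the factor P(R, T).
p_rt = {(r, t): P_R[r] * P_T_GIVEN_R[(r, t)] for r in VALUES for t in VALUES}

# Step 2: sum R out, leaving the smaller factor P(T).
p_t = {t: sum(p_rt[(r, t)] for r in VALUES) for t in VALUES}

# Step 3: join P(T) with P(L | T), then sum T out to get P(L).
p_tl = {(t, lv): p_t[t] * P_L_GIVEN_T[(t, lv)] for t in VALUES for lv in VALUES}
p_l = {lv: sum(p_tl[(t, lv)] for t in VALUES) for lv in VALUES}

print(p_t)  # P(+t) is about 0.17
print(p_l)  # P(+l) is about 0.134
```

After each sum-out step the remaining factor mentions one variable fewer, which is why the largest intermediate table here has four entries rather than eight.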

Approximate Inference¶

Sampling

https://dl.dropbox.com/s/uvfz2og3pbsbp33/Screenshot%202017-02-25%2006.51.24.png

With enough counts we can estimate the joint probability distribution.
Sampling has an advantage over elimination: we know a procedure for coming up with an approximate value.
Even without knowing the conditional probabilities, we can still do sampling,
because we can follow the process.
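
A minimal sketch of sampling from a network, assuming the textbook sprinkler network (Cloudy -> Sprinkler, Cloudy -> Rain, both -> WetGrass) with its usual table values. The network values and the function names `prior_sample` and `estimate_p_wet` are assumptions of this sketch, not taken from the screenshot.

```python
import random

# Textbook sprinkler network (assumed values).
P_C = 0.5
P_S = {True: 0.1, False: 0.5}                        # P(+s | C)
P_R = {True: 0.8, False: 0.2}                        # P(+r | C)
P_W = {(True, True): 0.99, (True, False): 0.90,
       (False, True): 0.90, (False, False): 0.0}     # P(+w | S, R)


def prior_sample(rng):
    """Sample every variable in topological order, parents before children."""
    c = rng.random() < P_C
    s = rng.random() < P_S[c]
    r = rng.random() < P_R[c]
    w = rng.random() < P_W[(s, r)]
    return c, s, r, w


def estimate_p_wet(n, seed=0):
    """Estimate P(+w) as the fraction of samples in which WetGrass is true."""
    rng = random.Random(seed)
    return sum(prior_sample(rng)[3] for _ in range(n)) / n


print(estimate_p_wet(20_000))  # close to the exact value 0.6471
```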

Sampling Exercise¶

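Sample each variable randomly.
Doubt: is it a weighted sample or a purely random sample? The video suggests that it is a weighted sample, that is, each value is drawn according to its conditional probability given the already-sampled parents.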

https://dl.dropbox.com/s/c34wjhd6p3heqvs/Screenshot%202017-02-25%2007.02.35.png

Approximate Inference 2¶

In the limit, the sampling estimate approaches the true probability.
This property is called consistency.
Sampling can be used to estimate the complete joint probability distribution.
Sampling can also be used for an individual variable.
What if we want to compute a conditional distribution?

https://dl.dropbox.com/s/dlvkzx2r6dudecx/Screenshot%202017-02-25%2007.13.39.png

Rejection Sampling¶

If the evidence is unlikely, you will reject a lot of samples.

https://dl.dropbox.com/s/i3qv2e1svcmecer/Screenshot%202017-02-25%2007.22.37.png
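
A sketch of rejection sampling, again assuming the textbook sprinkler network and its usual values; the query P(+r | +s) and the function names are illustrative choices. Samples that disagree with the evidence are simply thrown away.

```python
import random

# Textbook sprinkler network (assumed values).
P_C = 0.5
P_S = {True: 0.1, False: 0.5}                        # P(+s | C)
P_R = {True: 0.8, False: 0.2}                        # P(+r | C)
P_W = {(True, True): 0.99, (True, False): 0.90,
       (False, True): 0.90, (False, False): 0.0}     # P(+w | S, R)


def prior_sample(rng):
    """Sample every variable in topological order, parents before children."""
    c = rng.random() < P_C
    s = rng.random() < P_S[c]
    r = rng.random() < P_R[c]
    w = rng.random() < P_W[(s, r)]
    return c, s, r, w


def rejection_sample_rain_given_sprinkler(n, seed=0):
    """Estimate P(+r | +s); also return how many of the n samples were kept."""
    rng = random.Random(seed)
    kept = rainy = 0
    for _ in range(n):
        c, s, r, w = prior_sample(rng)
        if not s:          # sample contradicts the evidence +s, reject it
            continue
        kept += 1
        rainy += r
    return rainy / kept, kept


estimate, kept = rejection_sample_rain_given_sprinkler(20_000)
print(estimate, kept)  # estimate is close to the exact value 0.3
```

With P(+s) = 0.3 in this network, roughly 70% of the samples are discarded, which illustrates the cost of unlikely evidence.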

We introduce a new method called likelihood weighting so that we can keep every sample.
In likelihood weighting, we fix the evidence variables.

https://dl.dropbox.com/s/4osmw87r1l3u4ft/Screenshot%202017-02-25%2007.23.40.png

Likelihood Weighting¶

https://dl.dropbox.com/s/xjhlsqbshnp4mik/Screenshot%202017-02-25%2007.26.11.png

Each sample is weighted by the probability of the fixed evidence values given their parents.

https://dl.dropbox.com/s/cc4jr3zd3dwtly5/Screenshot%202017-02-25%2007.28.37.png

With these weights, likelihood weighting is consistent.
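
A sketch of likelihood weighting on the textbook sprinkler network (values assumed, as is the query P(+r | +s, +w)). Evidence variables are never sampled; each one multiplies the sample's weight by the probability of its fixed value given its parents.

```python
import random

# Textbook sprinkler network (assumed values).
P_C = 0.5
P_S = {True: 0.1, False: 0.5}                        # P(+s | C)
P_R = {True: 0.8, False: 0.2}                        # P(+r | C)
P_W = {(True, True): 0.99, (True, False): 0.90,
       (False, True): 0.90, (False, False): 0.0}     # P(+w | S, R)


def weighted_sample(rng):
    """Sample C and R; fix the evidence +s, +w and weight by its likelihood."""
    weight = 1.0
    c = rng.random() < P_C
    weight *= P_S[c]              # evidence: Sprinkler fixed to true
    r = rng.random() < P_R[c]
    weight *= P_W[(True, r)]      # evidence: WetGrass fixed to true
    return r, weight


def likelihood_weighting(n, seed=0):
    """Estimate P(+r | +s, +w) as a weighted fraction of samples."""
    rng = random.Random(seed)
    total = rainy = 0.0
    for _ in range(n):
        r, weight = weighted_sample(rng)
        total += weight
        if r:
            rainy += weight
    return rainy / total


print(likelihood_weighting(20_000))  # close to the exact value 0.3204
```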

Gibbs Sampling¶

Named after Josiah Gibbs, this technique takes all the evidence into account, not just upstream evidence.
It is a Markov Chain Monte Carlo (MCMC) method.
We have a set of variables, and we resample just one variable at a time, conditioned on all the others.
Select one non-evidence variable and resample it conditioned on all the other variables.

https://dl.dropbox.com/s/rnr442leqpjpuuu/Screenshot%202017-02-25%2007.34.54.png

We end up walking around the space of variable assignments.
The samples are dependent.
Adjacent samples are very similar.
The technique is still consistent.
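
A sketch of Gibbs sampling on the textbook sprinkler network (values assumed), estimating P(+r | +s, +w). The function name, the burn-in length, and the fixed sweep order over C and R are choices of this sketch; each variable is resampled from its distribution given the current values of all the others.

```python
import random

# Textbook sprinkler network (assumed values).
P_C = 0.5
P_S = {True: 0.1, False: 0.5}                        # P(+s | C)
P_R = {True: 0.8, False: 0.2}                        # P(+r | C)
P_W = {(True, True): 0.99, (True, False): 0.90,
       (False, True): 0.90, (False, False): 0.0}     # P(+w | S, R)


def bern(p_true, value):
    """P(X = value) for a binary X with P(X = true) = p_true."""
    return p_true if value else 1.0 - p_true


def gibbs_rain_given_s_w(n, burn_in=1_000, seed=0):
    """Estimate P(+r | +s, +w) by resampling C and R one at a time."""
    rng = random.Random(seed)
    c, r = True, True      # arbitrary start; evidence S and W stay fixed at true
    rainy = 0
    for step in range(n + burn_in):
        # Resample C given the rest: proportional to P(c) P(+s | c) P(r | c).
        w_true = P_C * P_S[True] * bern(P_R[True], r)
        w_false = (1.0 - P_C) * P_S[False] * bern(P_R[False], r)
        c = rng.random() < w_true / (w_true + w_false)
        # Resample R given the rest: proportional to P(r | c) P(+w | +s, r).
        w_true = P_R[c] * P_W[(True, True)]
        w_false = (1.0 - P_R[c]) * P_W[(True, False)]
        r = rng.random() < w_true / (w_true + w_false)
        if step >= burn_in:
            rainy += r
    return rainy / n


print(gibbs_rain_given_s_w(50_000))  # close to the exact value 0.3204
```

Successive states differ in at most the two resampled variables, so the samples are correlated, yet the long-run fraction still converges to the conditional probability.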