
Jürgen Hahn — 08 February 2012, 16:14

First of all, this is one of the best ML books I've seen so far!

On page 319, Eq. (15.6.5), I suppose something is missing, since the next sentence explains the summation over z, which does not occur in the equation itself. My version is also 21/11/11.

Luis Maeda — 07 February 2012, 17:08

"making a" is repeated in 13.2.1 'Utility and Loss' on page 288, my version is 21/11/11

alireza.BT — 22 November 2011, 13:22

There is an 'm' missing on page 564, just above equation 28.3.15. My current version is 211111.

Ed Wright — 18 November 2011, 02:29

demoBayesLogRegression calls BayesLogRegression, but BayesLogRegression is not in the toolbox folder.

db: thanks. Do let me know if you find any other problems.

Ed Wright — 16 November 2011, 17:06

demoBayesLogRegression is not in the toolbox but is in the book. My version of MATLAB does not like 200*w(l) to be zero in the plot on line 40 of demoParticleFilter; I inserted max(200*w(l),1/1000). In demoMixBernoulliDigits I needed the 'for' loop to be for d=3:2:9. The other digit files do not appear in the data folder.

db: thanks. Should be working now.

Ed Wright — 03 November 2011, 10:50

cliquedecomp.m needs betalog defined. Unable to mex cliquedecomp.c.

The following demos are listed in the book, but NOT in the software package: demoShortestPath, demoThinJT, demoLogReg, demoBayesLogRegression, demoMixBernoulliDigits, demoGMMclass, demoPolya, demoARtrain, demoHopfieldLatent.

The following are in the software package but NOT in the book: demoLSI2, demoLDS (empty), demoLDSI (empty), demoLearnDecMN, demoCars, demoMultGaussianMomentGaussianCanonical, demoMultnomialpXYgZ, DemoLDS.

db: Thanks -- hopefully should be fixed now. There are additional files in the toolbox that are not documented (yet) in the book; others are additional teaching material.

Ed Wright — 29 October 2011, 16:42

I know this may be a dumb question, but cliquedecomp.c wants stdbool.h and my Microsoft Visual C++ 2008 does not find it. I cannot find anything that will work. Can you help?

db: I don't know how to help with this I'm afraid, but I guess it should be easy to find a solution on a forum.

Dai Wei — 23 September 2011, 19:38

In formula (1.2.8), the number 312 should be 412.

Vikram Narayan — 20 September 2011, 12:51
Version: DRAFT August 27, 2011
Error : Typo
Page: 57
Paragraph : 1

Whilst a Markov network... giving p(x2,x2,x3,x4) = phi(x1,x2)...phi(x3,x4)/Z

  • The first x2 needs to be replaced by x1.

Mark Alen — 23 August 2011, 10:06

I know it might be too much to ask, but it would be amazing if you could compile it with a smaller paper size and fewer words per page for those who read the book on e-book readers like the Kindle. There are templates for the Kindle in LaTeX, so it is just a matter of recompiling it. I'd also love to donate to the project to make that Kindle version possible.

db:We tried this and I'm afraid it didn't work well enough to be useful. I think it's best to wait until the readers get more standardised and hopefully a little larger.

Rahul — 24 June 2011, 16:48

In section 20.3, I found the derivations in the E and M steps to be cryptic and hand-wavy (please excuse me if I am being rude). I think, as a student, it would make more sense if we went through the derivation step by step (if possible citing the appendix if some major maths needs to be reviewed). Having said all this, I must say this is an amazing book; the appendices are also extensive. Thanks,

Simon Zwieback — 24 June 2011, 13:42

HMM Gamma messages

When computing the term p(h_t|h_{t+1},v_t) in 23.2.22, it is not valid to say it is proportional to the joint posterior of h_{t+1} and h_t, as you are summing over h_{t+1} in 23.2.21.

cf. e.g. chapter 11 of An Introduction to Graphical Models by Jordan & Bishop

db: What might be a bit confusing is that the proportionality constant, whilst constant with respect to h_t, is dependent on h_{t+1}. I'll try to clarify this. This issue isn't really related to eq 23.2.21. Thanks anyway for the comment.
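For reference, the correction-smoother identity under discussion is, in standard HMM notation (a sketch, not the book's exact equations):

    p(h_t | v_{1:T}) = \sum_{h_{t+1}} p(h_t | h_{t+1}, v_{1:t}) \, p(h_{t+1} | v_{1:T})

    p(h_t | h_{t+1}, v_{1:t}) = \frac{ p(h_{t+1} | h_t) \, p(h_t | v_{1:t}) }{ \sum_{h_t} p(h_{t+1} | h_t) \, p(h_t | v_{1:t}) }

The normalisation in the second line is constant with respect to h_t but does depend on h_{t+1}, which is the point made in the reply above.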

Aleksei N. Sorokin — 17 June 2011, 07:55
Typo: p.8, formula (1.1.1). The summation should be over all "x in dom(x)" with THE FIRST X IN SANS-SERIF.

Thuraya — 03 June 2011, 12:32

Sorry, I have made a mistake!

p.481, line 14: "p(ht|st,v1:t) is a mixture with O(S^(t-1)) components". p(s1,h1|v1) is an indexed set of Gaussians, so p(h1|v1) is a mixture with S components. p(ht|st,v1:t) = sum(p(st,ht|st,v1:t)), so "p(ht|st,v1:t) is a mixture with O(S^t) components". Great book!

Thuraya — 03 June 2011, 09:15

p.481, line 14: "p(ht|st,v1:t) is a mixture with O(S^(t-1)) components". p(s1,h1|v1) is an indexed set of Gaussians, so p(h1|v1) is a mixture with S components. p(ht|st,v1:t) = sum(p(st,ht|st,v1:t))

"p(ht|st,v1:t) is a mixture with O(S^t) components"

Great book!

John B — 26 May 2011, 03:52

Minor typos in v 24052011 (indicated in asterisks below):

p. 11: "Intuitively, if x *is* conditionally independent..."

p. 18: "Since the dice are fair both ... p(sa = s*b*) = 1/6."

Great book!

Where might we find the solutions to the exercises?

Cheers

Ed Wright — 22 May 2011, 12:11

Some of the demo programs gave error messages. I also found that it is best to clear all plots and variables before running the demo programs.

demoNaiveBayes

    ??? Cell contents assignment to a non-cell array object.
    Error in ==> demoNaiveBayes at 29
    I added x=cell(2) and it gave no error message.

demoGibbsGauss

    ??? Subscripted assignment dimension mismatch.
    x(:,1) = zeros(2,1);
    I changed it to x = zeros(2,1) and it gave no error message.

demoLearnDecMN

    ??? Undefined function or method 'edges2adj'

demoLinearCRF

    ??? Undefined function or method 'CRFfeature' for 'double'.

demoSumprodGaussCanonLDS

    ??? Input argument "meanV" is undefined
    Error in ==> demoSumprodGaussCanonLDS at 46
    [f,F,g,G,Gp]=LDSsmooth(v,A,B,SigmaH,SigmaV,SigmaH,priormean)

demoIPFeff

    ??? Reference to non-existent field 'candedges'.
    Error in ==> makeThinJT at 4
    Error in ==> demoIPFeff at 12
    [A Atri]=makeThinJT(D,12,opts);

Your book looks like it will be a very useful resource!!

E Wright — 20 May 2011, 20:39

I find no demoThinJT.m or edges2adj.m

Tadej Janež — 19 May 2011, 11:13

First, thanks for providing a great book!

In chapter 3, the 'wet grass' example:

You should explain, why it is correct to go from Eq. 3.1.9 to Eq. 3.1.10 and omit summation over J in this particular case.

One could be misled into thinking you can always just simplify the numerator and denominator by "crossing out" the same terms (p(J|R) in this case). This is of course not true.

As it happens in this example, none of the other terms has J as a condition, so it is unnecessary to sum over J and R and it is enough to sum over R.

I think this should be made clearer to avoid wrong interpretations.

%db Thanks, I'll add a sentence on this.

Petri Myllymaki — 26 February 2011, 11:19

In Example 10.3. you say that the probability of the person being Scottish changes from 0.192 to 0.236 if using the uniform Dirichlet prior, but I believe in Example 10.2. the ML probability was actually 0.8076, i.e. 1-0.192?

[While I'm at this: are you sure you want to write "whiskey" when the example is about the Scottish and the English...?]

Mark — 20 February 2011, 14:42

Typo Page 61, Remark 4.1, Last Sentence: "In there is no..." should be "If there is no..."

Tim Zajic — 08 February 2011, 17:14

In the discussion of the Hammersley-Clifford Theorem it is mentioned that 'It is clear that for any decomposable G...' There has been no discussion of decomposability before this page.

Julien Gaugaz — 02 February 2011, 10:53

In Example 11.2, the paragraph after Equation 11.2.9 states "[...] q distribution that optimises L(q,theta) (E-Step) [...]". I guess it should mention "LB(q,theta)" instead of "L(q,theta)".

This is from the 121210.pdf version.

PS: Congratulations and many thanks for this excellent book!

Brian Vandenberg — 31 January 2011, 20:47

In the notation list, the 2nd line item for page 11: the word 'on' is missing at the end:

(...) Y conditioned [on] variables Z

sunqiang — 20 December 2010, 15:12

Maybe there is a typo: 121210.pdf (version), page 582 (606 of 644), 29.1.9 Computing the matrix inverse. When A = [[a,b];[c,d]], the inverse of A should be 1/(ad-bc) [[d,-b];[-c,a]], not 1/(ad-bc) [[d,-c];[-b,a]].
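For reference, the standard 2x2 inverse (written here in LaTeX rather than the book's layout) is

    \begin{pmatrix} a & b \\ c & d \end{pmatrix}^{-1} = \frac{1}{ad-bc} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix}

which agrees with the correction suggested above.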

btw: is there any chance for a "Large fontsize" pdf version for easy reading?

14 December 2010, 16:45

example 3.5 does not look right

db: thanks -- it should read A is independent of B (not A is independent of C)

Chris Bracegirdle — 03 December 2010, 15:30

I think the variables should be cleared up in example 11.1. Do you want x_c for the colour or c? Esp 11.1.12.

09 November 2010, 18:43

agreed. but the definition is stated in terms of "conditional independence statements".

db: Another way to say it is that for your first example, the conditional independence statement set is I(A,C). In the other case the conditional independence statement set is empty. In this sense they are not the same. (Dependence is the absence of independence). I agree though that this might be a bit unclear -- I'll try to add a sentence to clarify.

09 November 2010, 13:07

Question on: Definition 3.5 (Markov equivalence). "Two graphs are Markov equivalent if they both represent the same set of conditional independence statements."

A->B<-C and A->C->B<-A share the same set of conditional independence statements and yet do not share the same skeleton or set of immoralities, no?

db: The independence statements are different. In the first case A is (marginally) independent of C. In the second case A is (marginally) dependent on C.

Wei Wang — 09 November 2010, 02:35

A typo on page 62, the first line below equation 4.3.5: it should be "normalized parental components", not "mormalized...".

Alistair — 05 November 2010, 12:09

p156 Definition 8.4 should probably read "...more than one mode", instead of "...more than one node".

Excellent book!

Abu — 04 November 2010, 06:44

p69 - Section "Expressiveness of Graphical Models"; missing "of" in first sentence.

osdf — 02 November 2010, 15:15

p. 577, third line after A.1.3, 'they are _at_ right angels...'

osdf — 02 November 2010, 14:31

I think in A.1.14 is a typo, it should be [AB]_{ik} on the left hand side.

Amazing book!

Ron — 25 October 2010, 13:46

p28 should say "zero except" instead of "zero expect"

Jonathan Yedidia — 28 September 2010, 22:27

Great looking book! After equation 1.1.19 should be "...multiplied by a function of a..." rather than "...multiplied by a function of c..."

giorgos — 20 September 2010, 12:07

In the appendix, there is a section A.5 and a section A.7 but not a section A.6

Roy — 09 September 2010, 08:34

Hi. Perhaps I'm missing something, but where can I find the solutions to the exercises? The book says "Instructors seeking solutions to the exercises can find information at the website" (www.cs.ucl.ac.uk/staff/D.Barber/brml)...

yochju — 20 July 2010, 08:05

p550 typo in (A.1.52) after the first equal sign "=" y^T -> x^T

Laszlo Kozma — 19 July 2010, 11:58

Page 9, example 1, part 2: "In this case only a small number of people in the population eat hamburgers, and most of them get ill."

"most" is not correct, it is still less than 1%, just a larger fraction than in part 1. Also, the approximate value is unfortunate because this way the intuitive fact is missed that by decreasing p(H) 500 times, p(KJ|H) increased exactly 500 times.

Nicholas Pilkington — 02 June 2010, 00:21

Page 359, Gaussian Processes

For isotropic covariance functions, the covariance is defined as a function of the distance k(|d|).

k(|d|) should read |d|.

It may also be useful to specify that isotropic covariance functions are rotation invariant, in addition to the translation invariance of the stationary covariance functions just described.

Nicholas Pilkington — 01 June 2010, 23:41

Page 45, Section: Resolution of the paradox. The 'paradox' occurs because we are ...

The first inverted comma is the wrong way around.

Vinh — 31 May 2010, 09:23

I tried to run the file setup.m from the download package and realized the zip file doesn't contain the "data" directory. Can you add that directory to the zip file?

Leslie Kanthan — 26 May 2010, 14:46

On page 92, figure 6.1(b): there is an inconsistency in the clique graph representation of (a). The two circles encompass (abc), (bcd) and the box encompasses [bc] respectively. However, in the clique graph of 6.2.8, page 93, you have used the potentials inside that very same clique graph. So back to 6.1b: should it be PHI(abc) inside that circle, or just (abc)?

Roger De Souza Eremita — 26 May 2010, 14:44

X = V intersection W not union. p93

db: union is correct

Roger De Souza Eremita — 26 May 2010, 14:41

X intersection Y on p93

Jacob Ayres-Thomson — 26 May 2010, 14:37

10.2.9

on RHS of relational operator...

log p(c=0|x*)

should be:

log p(x*|c=0)

db: I can't find this error

Roger De Souza Eremita — 26 May 2010, 14:11

On page 93, where it describes how you first create the new separator, it gives the first update and then says "and the refine the W potential using", which should clearly be "and THEN refine the W potential using:".

Leslie Kanthan — 24 May 2010, 20:01

Albeit p(x1=1) = 1, the line states "We are asked to compute p(x5|x1=1) which is given by....". This sentence implies that the expression that follows p(x5|x1) is a general expression. So it should have p(x1=1) in it, even though this is = 1.

Leslie Kanthan — 24 May 2010, 19:52

(5.1.14) pg 71: p(x1=1) is not explicitly in that expression.

db: p(x1=1) is not required since we are considering a conditional distribution

Leslie Kanthan — 24 May 2010, 19:24

Page 61, 4.4.3 needs extra clarification, namely

"p(a,b,c) = PHI(a,b)PHI(a,c)PHI(b,c). The MN representation is given in fig g(4.8c)."

This is not true with the original definition you gave us. I know what you're trying to say, but the MN representation is defined as the multiplication of the cliques, and you define cliques as maximal by nature. Therefore p(a,b,c) = PHI(a,b)PHI(a,c)PHI(b,c) does NOT have Markov representation fig 4.8c, because if you look at fig 4.8c, one would state that the distribution is p(a,b,c) = PHI(a,b,c), since this is the ONLY clique (by your definition of maximality). For completeness' sake I suggest that the cliques are not defined as maximal; that then ALLOWS one to implement p(a,b,c) = PHI(a,b)PHI(a,c)PHI(b,c) with fig 4.8c as the MN diagram.

db: I think the original text is OK, but I'll try to clarify. PHI(a,b)PHI(a,c)PHI(b,c) does have the MN in fig 4.8c since it can be represented using the function PHI(a,b,c)=PHI(a,b)PHI(a,c)PHI(b,c)

Leslie Kanthan — 24 May 2010, 19:10

Hi, (5.1.8) page 70: it is not mathematically precise to use the proportional sign for each step beyond the first one. The P(d|a) ~ P(abcd) part is correct, but from that point onwards it should be EQUAL, not proportional.

DRAFT May 11, 2010

Robert Theed — 24 May 2010, 16:33

"and the refine the W potential using"

DRAFT May 11, 2010

page 93

zalczer vincent — 24 May 2010, 15:43

Really small typo: in 4.4 Expressiveness of Graphical Models, the MN of p(c|a,b)p(a)p(b) IS a single clique... I don't know if that was rectified, should look.

Yasaman Kalantar Motamedi — 23 May 2010, 13:41

Version 110510, Page 115 one line before the last one: whether or not is is raining->whether or not it is raining

Guangyan Song — 22 May 2010, 18:14

version 110510, page 57, chapter 4.2.2: "In this case, a GN satisfies the following independence relations:" I think this should be removed.

Roger De Souza Eremita — 22 May 2010, 17:22

Figure 6.8 "One verify that this satisfies the running intersection property." -pg 102

Roger De Souza Eremita — 22 May 2010, 16:15

Error in the new insert on page 7, chapter 1, in the pretext between the two green lines: "the workhorses of machine learning. Another strength of the language of probability is that it structures problems in a form that consistent for computer implementation." Missing "IS" in "form that IS consistent".

Qian Chen — 15 May 2010, 19:23

26th Apr version, page 97 figure 6.7: from (c) to (e), after eliminating node d, the path between d and c should be pink, not black.

Lu Wang — 15 May 2010, 05:01
pg56: typo: Definition 4.7 B pass though(through) S

Laszlo Kozma — 14 May 2010, 12:53

pg.3.

                                       "Often these methods and are not necessarily directed to ..."

an extra "and"

Simon Zwieback — 11 May 2010, 08:51

Chapter 6:

pg 91: typo: apply to different (difference) inferences

pg 93: notational inconsistency: in eqns (6.2.7) & (6.2.8) as well as the figure, the separator potential after the entire absorption procedure is denoted with one star; in the subsequent formulae, however, with two.

pg 94: typo: eqn (6.2.15): lower case s in penultimate part

pg 101: Figure 6.7: a) the link between c and d is black in all subfigures. b) why are the nodes f and i simplicial in (c)? c and j are both neighbours of f but not adjacent.

pg 102: Why does greedy elimination give a reversed perfect elimination ordering? In Fig 6.7 (e), eliminating k first introduces additional edges. Eliminating in the forward direction (beginning with a) is possible (in general, all the missing edges have been added during the triangulation).

pg 104: incomplete sentence: ...it is unlikely that a general purpose algorithm that could consistently ... second that

Ben — 09 May 2010, 21:56

A.3.1:

"Geometrically this means that the function f (x) is always always increasing (never non-decreasing). "

This seems to say that "always always increasing" is the same as "never non-decreasing" which makes little sense to me as surely every strictly increasing function is also always non-decreasing, no?

Benjamin Schwehn — 09 May 2010, 20:42

I think in A.1.24 the first matrix should be:

a_22 a_23 a_32 a_33

not

a_12 a_23 a_32 a_33

Seb — 07 May 2010, 15:03

Page 353: The second sentence is grammatically incorrect:

In Gaussian Processes uses this to motivate a prediction method that does not necessarily correspond to any 'parametric' model of the data.

Bo Wang — 07 May 2010, 05:19

26 Apr Version: The same error appears on page 58, 4.2.4 'Separation', at the beginning of the 2nd line: "without passing though Z. ..." should be replaced by "...through Z."

Bo Wang — 07 May 2010, 04:25

26 Apr Version: Page 56, Global Markov property, Definition 4.7, 2nd line: you wrote "...to any member of B passes though S. ..." Should "though" be replaced by "through"?

Bo Wang — 06 May 2010, 21:21

I have found the same error that Qian Chen found on page 39.

Also, on page 41, Example 3.4 (3), "...on the path between t and f,...": it's not wrong, but I think you mean "on the path between b and f".

Qian Chen — 04 May 2010, 16:29

26 Apr Version: Page 58, 4.2.4 Moralisation: there is a typo, 'link-'; it should be 'link' or, to my understanding, maybe 'line'.

Qian Chen — 02 May 2010, 19:32

26 Apr Version: Page 39 Figure 3.7 At the end of the second line should be 'a and e are not d-connected...' and at the end of this paragraph should be 'Hence a and e are d-separated'.

Qian Chen — 02 May 2010, 16:59

oh, that should be: 'the denominator should have p(B)'

Qian Chen — 02 May 2010, 16:54

26 Apr Version: Page 35, 3.2.30: The numerator should have a factor p(B=tr), and so the numerator should have p(B)

Simon Zwieback — 29 April 2010, 14:44

pg 141: a sailer as (0,1,0,0)... - sailor
pg 152: This can be obtained by view the disbribution - viewing
pg 156: (8.7.47) exponential function isn't displayed correctly
pg 164: Exercise 8.23: I think the averages should all be taken with respect to p(x|theta)

Haithem Jarraya — 28 April 2010, 11:52

Example 4.1, page 57 (Boltzmann machine): reads "link been nodes i and j for wij"; change to "link BETWEEN nodes i and j for wij".

Jacob Ayres-Thomson — 24 April 2010, 19:19

10 April 2010 edition - Page 79

first major paragraph:

reads -

"product of N transitions"

change to -

"product of N-1 transitions"

reads -

the most probable path can be read off from the sequence of states up to the first time the chain hits the absorbing state

change to

the most probable path can be read off from the sequence of states conferring the highest total transition probability, which hits the absorbing state within N-1 steps.

Note: this latter change would tally with figure 5.5. The first time the chain hits the absorbing state is not necessarily the highest probability path.

Jacob Ayres-Thomson — 24 April 2010, 19:10

10 April 2010 edition - Page 79

first major paragraph:

"....correspond to the most probable state on the product of N transitions"

change: N to N-1

...if state 'a' has zero probability of reaching state 'b' in N-1 transitions then it cannot reach b at all.

Jacob Ayres-Thomson — 24 April 2010, 18:43

10 April 2010 edition - Page 78

(5.2.17)

The summation on the right hand side of the equality should be over m_c; currently it is over m_b.

Robert Theed — 07 April 2010, 12:22

p38 DRAFT March 28, 2010

p(A, B, C) = p(A|c)p(B|C)p(C) (3.3.27)

small c should be big C

Vaisha Bernard — 25 March 2010, 08:48

26 February version, p. 12: in example 3 there is a double "if".

p. 31 Mr. Holmes speakes on the phone to Mrs Nosy, who is completely deaf. How can she use the phone? :)

Good point! db

Simon Zwieback — 24 March 2010, 17:53

09 March Notation list: bracket missing in definition of sigmoid

pg 232: 'In full generality for a set...': the sentence is incomplete and the algorithm reference is wrong

pg 413: eq 23.1.11: the second product symbol in the middle part is not necessary

	last sentence: I think the argument of the indicator function should be [v_{t-1}=j, v_t = i]

pg 425: for example, we might associated ..., associate

pg 431: One possible parameterisation of for a linear ...

pg 437: You might want to add that the transition matrix is assumed to be diagonalisable

pg 438: eq 24.1.8.: the cross term should be 2 a <v nu> = 2 a {<(v-<v>)nu + <<v>nu>} = 0

p 442: This (latent) LDS can be represented as a belief networkin ...

MG — 22 March 2010, 21:58

page V: "...dynamical system may be written in conducted of operations on Gaussian potentials."

the word 'conducted' doesn't seem quite right

Chris Bracegirdle — 22 March 2010, 20:33

Rereading chapter 23 (2nd March version).

First para do you need to constrain the continuous time model to R+? "For the case... v_t is discrete" should be "are" not is as data is a plural

Under eqn 23.1.6 the brackets are inconsistent referring to the exercises. After that I wonder if it might be clearer to say x1=x1 (if you see what I mean) and similar for x2=x2 below.

Eqn 23.1.9 maybe use i' on the bottom to avoid confusion

"A crude search engine" maybe delete "then", add "see" before URL

Where is fig 23.2 referenced?

just under eqn 23.1.10 it should be theta_i|j, i.e. add the subscript. In 23.1.11 and 23.1.12 I would add a comma between i and j in the product/summation

Above 23.1.14 maybe say "sequence" not "datapoint". Just after eqn "clustering can then be"? Maybe helpful to motivate the EM algorithm.

Around 23.1.20 and 23.1.21 some 'n's are missing for the vs.

example 96 (gene clustering) Markov Models -> Markov models, maximum likelihood -> Maximum Likelihood. after 23.1.25, again "data was" should be "data were"

Definition 110 "selectes onE" "distributionS"

Example 97, here you use "arg max" as opposed to "argmax"

Section 23.2.6 maybe explain what ht is.

Example 98 "A with"

Section 23.3 "generateed", "number of hidden H states" -> "hidden states H"

Section 23.3.1 again maybe motivate the use of EM. Parameter initialisation, "local maxima" should be maximum, singular

Under eqn 23.3.17 "this is a dependent on"

section 23.3.4 "we might associated", "make a mode", "if the data is"->are

eqn 23.3.21 should be log p(c^n); just under, don't need the second "above"; "gradient of the is"

Just above eqn 23.4.12 "all sequenced"

section 23.5.3 BioInformatics

section 23.5.4 they aren't subscripts; "emission distributionS"; "first first"

fin.

Simon Zwieback — 19 March 2010, 15:40

P 77-78. Bucket Elimination: the numbering is wrong: it's referred to as A 1 in the box but A 11 in the text.

P 22. Powers of A. Don't know if it's relevant, but the existence of paths can already be determined from A^(N-1). After N-1 steps, N nodes have been visited, so the N+1st introduces a cycle. Also follows from Cayley-Hamilton.
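A small illustration of this point (a sketch only, assuming A is the N x N adjacency matrix; not code from the book's toolbox):

    % Reachability from powers of the adjacency matrix A:
    % a path from i to j (i ~= j) exists iff the (i,j) entry of
    % A + A^2 + ... + A^(N-1) is positive, since any path has length <= N-1.
    N = size(A,1);
    R = zeros(N); Apow = eye(N);
    for k = 1:N-1
        Apow = Apow*A;   % (i,j) entry counts walks of length k from i to j
        R = R + Apow;
    end
    reachable = R > 0;   % logical reachability matrix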

Chris Bracegirdle — 16 March 2010, 16:59

Bracket missing in eqn 25.4.9

Simon Zwieback — 10 March 2010, 11:20
Nitpicking:

page 14: the box spills beyond the margins
page 16: Other potential utilities - setstate.m: typo in the description
page 57: isn't it just the contrapositive statement rather than an application of inverse modus ponens?

Tom Nett — 05 March 2010, 15:41

P.499 in section 27.3 in the first paragraph it reads "Am important and widespread...", which I believe should be "An important and...".

Joshua Fasching — 03 March 2010, 17:25

p500 sect27.3 under Evidence heading

deal -> dealt

Matt Sperrin — 02 March 2010, 11:03

p263, sect13.5: 'how can we assess if' -> 'how can we assess whether'

p273, bottom bullet, affect -> effect

p287 'bag of words bag of words'

Matt Sperrin — 01 March 2010, 16:58

p259, first bullet point - 'affect' should be 'effect'

Matt Sperrin — 01 March 2010, 16:56

Page 142, Def 61. You define the 'delta function', then refer to it as the 'Dirac delta' later in the definition.

Legend to figure 7.6a - 'effect' should be 'affect'

p261 sect 13.2.5: 'mapping from h to x' - I think should be 'mapping from h to c'

Roger De Souza Eremita — 20 February 2010, 13:29

Small ambiguity page 12, regarding Aristotle reasoning. Trees = T and Fruit = F but it may confuse the naive reader, when you immediately write p(T = tr, F = tr), confusing T = True and F=False with Tree and Fruit!!

C Drysdale — 01 February 2010, 03:16

In the 22 January 2010 text version, middle of page 38 (3.3.5), the book states "Define the skeleton of a graph as its undirected version with the directions on the arrows removed." Shouldn't "undirected" be "directed" instead?

C Drysdale — 01 February 2010, 02:00

In the 22 January 2010 text version, top of page 34 (Fig 3.5), the book states "(d) with x3 -> x1." I think it should say "(d) with x2 -> x1."

C Drysdale — 24 January 2010, 03:51

In the 22 January 2010 text version, bottom of page 6, the contraction "who's" should be replaced by the possessive "whose".

Timothy Scully — 22 January 2010, 19:09

p. 44 Example 15, should FD be bad in the first case.

Hayden — 22 January 2010, 10:14

The book is well written and presents the topics in an accessible and intuitive way. However, I'm disappointed that there are no answers for the exercises. Introducing some would greatly improve the value of this book. Thanks

Stavros Korokithakis — 16 January 2010, 15:27

Page 217, "(those who's states would nominally be known but are missing for a particular datapoint)." should be "whose".

Bernardino Romera Paredes — 12 January 2010, 19:53

Ed 7-Jan, exercise 142. I think that, in formula 11.9.9, a summation of i and j is needed in the exponent.

zalczer vincent — 08 January 2010, 18:34

'comma', sorry.

zalczer vincent — 08 January 2010, 18:22

11.4 A failure case for EM

I am not sure the minus in the delta function makes sense (is it a Dirac function?); is that not a comma instead?

Simon Zwieback — 07 January 2010, 12:25

Chapter 11.2.1: Footnote: missing 'the' or 'a'
Chapter 11.2.3: 'the EM algorithm can be compactly stated as in algorithm(11).': should be algorithm(9)
Chapter 11.5: 'the algorithm is given in algorithm(7)'
              'Under this we arrive at algorithm(9).'
              The numbers are wrong here as well.

Chris Bracegirdle — 05 January 2010, 16:36

Example 86. should be "a mixture model" or "mixture models"

section 20.2.2 H essential typeS

eqn 20.2.15 just wondered why we use sigma h=j' not just sigma h.

footnote missing verb "is"

eqn 20.3.4 LHS why not h=i?

eqn 20.3.5 should the 2pi not be reciprocal

just above section 20.3.2. "find a sensible solutions"

Symmetry breaking - "responsible for explains"

eqn 20.4.3 why don't we need the sum over h?

section 20.5 "the the"

just above eqn 20.5.6 should it be p(z|pi)?

just below eqn 20.5.12 "to emit over"?

Example 89 "most probably latent" probable?

section 20.6 seems a bit confused with 20.5.2.

Chris Bracegirdle — 04 January 2010, 19:01

Definition 90 - is it positive definite or positive semi-definite?

Just under eqn 19.2.12 lambda l not lambda d

Just under eqn 19.5.12 I'm a little confused here, didn't the definition say we would use covariance and kernel interchangeably?

Fig 19.5 no need for (a)

Section 19.6.1 this is the sigmoid function (has it got 2 names?)

19.7 further reading "over recently years"

Chris Bracegirdle — 04 January 2010, 17:35

18.1.7 should I already know what B is? dim w?

Fig 18.3 second sentence doesn't make sense.

18.1.5 "no insignificant" double negative

Just before eqn 18.2.23 "the the"

Chris Bracegirdle — 30 December 2009, 10:26

Chapter 17 - 21/12:
Equation 17.1.3: I think both should be negative.
Section 17.2.3: you refer to fig 17.2.2 - I think it should be fig 17.4.
Also in the text of fig 17.6, A curse of dimensionality: 16^10 is not equal to 10^12. Should it be approx? Or greater?
"an alternative is" shouldn't have the word "is".

Amir Babaeian — 30 December 2009, 10:05

Version 30 Dec, Page 131, formula 7.7.29: P(dt|x2, πt) should be replaced with P(dt|xt, πt)

Amir Babaeian — 30 December 2009, 10:04

Version 30 Dec Page 128 formula 7.7.4 P(x2|x1) should be replaced with P(x2|x1, d1)

Amir Babaeian — 30 December 2009, 10:04

Version 30 Dec Page 127, second Line after section 7.7 there is an additional to. “In order to to do we …”

Amir Babaeian — 30 December 2009, 10:03

Version 30 Dec Page 122; formulas 7.4.22 and 7.4.23: µ3(x2, x2, d1) should be replaced with µ3(x1, x2, d1), also in formula 7.4.25 for P3(x2, x2, d1) and µ3(x2, x2, d1).

Chris Bracegirdle — 29 December 2009, 22:28

Equation 16.3.8 should be W=B tilde inverse W tilde.

Chris Bracegirdle — 29 December 2009, 19:27

Example 65 - your text says fig 15.3 has 10, 50 and 100 dimensions but the figure text says it has 100, 30 and 5.

Chris Bracegirdle — 29 December 2009, 19:23

Could be wrong here but just above example 65 you have "subspace spanned by the M largest eigenvectors". My understanding is this should read "eigenvectors corresponding to the M largest eigenvalues".

Chris Bracegirdle — 29 December 2009, 18:36

Section 14.2 "dissimilarity dissimilarity". Maybe worth adding an underfitting vs overfitting graph? q.v. Hastie Fig2.4

Chris Bracegirdle — 29 December 2009, 18:26

Section 14.1.1 closet -> closest. Maybe mention in 14.1 that this is a supervised problem?

Chris Bracegirdle — 29 December 2009, 16:04

Page 270 (21/12). Equation 13.4.3. Is this right? I'm confused by the name "TrueNegative" for c=false t=false. Also you have "We call oa={...} for classifier A and similarly for oa={...} for classifier B" should be ob={ob(n)...} for classifier B

Chris Bracegirdle — 29 December 2009, 14:41

I think it would be helpful to be more verbose about equation 12.4.8. It isn't obvious to me where this comes from.

Chris Bracegirdle — 21 December 2009, 16:50

Page 288 Equation 12.4.4 I don't think there should be the last term p(K)

Yasaman Kalantar Motamedi — 14 December 2009, 08:51

version 10Dec
p242, line 4 >> see exercise(130) -> see exercise(129)
p260, part: classification boundary >> we classifying -> we classify
p263, part: Text Classification, line 7 >> is are -> is

Simon Zwieback — 13 December 2009, 17:37

Please ignore the previous post. I didn't see 10.3.15

Simon Zwieback — 13 December 2009, 17:35

I think the numerator of the fraction in Equation (10.3.14) should read: Z(uhatprime i (c*))

Ke Zhou — 11 December 2009, 12:57

version 10Dec, p160, Figure 7.13: Influence Diagram for the 'Chest Clinic' Decision example.

No explanation of variable s; should add "s=Smoking".

Ke Zhou — 11 December 2009, 11:43

version 10Dec,p247 below equation 10.4.3

It should be "By adding a term" instead of "By adding and subtracting a term", because we only add a constant term to derive the optimal setting of the KL divergence.

Ke Zhou — 10 December 2009, 17:46

version 10Dec,p247 below equation 10.4.3

“By adding and subtracting a term <log p(xi|xpa(i)>p(xi|xpa(i))” should be “By adding and subtracting a term <log p(xi|xpa(i)>p(xi,xpa(i))”

rowland sillito — 10 December 2009, 11:30

Page 193 (KL divergence) "The KL divergence is widely used and it is therefore instructure to understand..."

"...instructive to understand..."

Simon Zwieback — 09 December 2009, 18:41

Example 42, page 224 In Fig. 9.13 (c), there's a link from Node 2 to 1. According to the text, however, the algorithm was provided with 'the' correct ancestral order. Is it only a partial order or is it '2,1,3,4,...'?

Page 247 eq. (10.4.2) Is pa(i) = i supposed to represent the case of a node having no parent? after eq. (10.4.3.) I think it should read: ... and we may add and subtract [sum over i]<log p(xi|xpa(i))> p(xi,xpa(i))

Plugging this solution into equation (10.4.4): i think it's (10.4.3)

Page 248 First sentence: For two variables xi and xj and distribution p, the , definition(85) can be written...

zalczer vincent — 09 December 2009, 15:53

oops, that's page 241, sorry (5 December version)

zalczer vincent — 09 December 2009, 15:51

I think that on page 241 (10.2.1), "For each class of the two classes" should in fact be "For each attribute of the two classes"?

Robert Theed — 09 December 2009, 11:32

Exercise 132.

"that minimises KL(p(c, x)|q(x, c))"

could be

"that minimises KL(p(x, c)|q(x, c))"

not strictly needed but would be more consistent with following lines.

Robert Theed — 09 December 2009, 11:27

p. 247

"Algorithm 5 Chow-Liu Tress" should be "Algorithm 5 Chow-Liu Trees".

Guangyan Song — 08 December 2009, 22:13

Version Dec 5th P143 Figure 7.8(c) the separator between Clique {f,d3,h} and {d3,h,k} should be: {d3,h}

Ching-Fu Lin — 08 December 2009, 18:40

P.247 In the paragraph above equation (10.4.4): Since p(x) is fixed, the first term is constant and we "may a term" log p(xi|xpa(i)) that depend...

It should be corrected as "may include/add a term"

Tomasz Kacprzak — 08 December 2009, 12:20

page 243 formula 10.2.17 is: p(x_i = s) = theta^i_s(c) I think it should be: p(x_i = s given c) = theta^i_s(c) as this is a likelihood term.

Tomasz Kacprzak — 07 December 2009, 20:06

page 241 051109 formula 10.2.9 is:

 ... > log p(c=0 | x*) ...

I think it should be:

 ... > log p(x* | c=0) ...

Timothy Scully — 07 December 2009, 12:35

demoMDP

line 30: Utility is spelt incorrectly
line 41: Criteria is spelt incorrectly

Guangyan Song — 05 December 2009, 21:15

Code for demoDecAsia:

line 2:clear pot porder;

Should be: clear pot partialorder;

Tomasz Kacprzak — 04 December 2009, 15:38

Code for demoDecAsia:

% backtrack decision: after observing asia=no, testxray=no: 59 disp('Evidence : asia=no, testxray=no. New JT decision potential:')

should be: takexray instead of testxray

zalczer vincent — 03 December 2009, 16:47

Page 201, just a little repetition: "we require a model a model of the probabilistic interaction"

Simon Zwieback — 30 November 2009, 12:24

Chapter 9.1.2 is called 'Loss functions, utility and decisions' but loss functions are not mentioned.

Equation 9.1.23: c instead of 1/c

Chris Bracegirdle — 26 November 2009, 13:44

Page 134 "We begin by writing those variables X0 [...] then variables X1 [...] revealed before D1 [...] ordering:"

X0 < D1 < X1.

Don't we need X1 < D1 according to the text?

Chris Bracegirdle — 26 November 2009, 13:39

Page 133 "We use a dashed link [...]. For this reason we use a dashed link"

Simon Zwieback — 23 November 2009, 23:10

page 135, typo. "If were" --> "If we're"

Le Li — 20 November 2009, 01:33

page 83, the line before "backtracking": x3* = argmax phi(x2*,x3)gamma(x2*), where gamma(x2*) should be gamma(x3).

zalczer vincent — 19 November 2009, 16:19

Page 114, in the greedy elimination example: "(d): f and i are now simplical and are eliminated." is false. This doesn't change the final result, but the next step doesn't add links because they have already been added ("(e): We eliminate g and h since this adds only single extra links." is false also).

zalczer vincent — 19 November 2009, 15:05

Page 119: end of the page "contains" and not "containa".

zalczer vincent — 19 November 2009, 14:16

Page 106, in equation 6.3.15, a Phi(W) is missing

Ed Challis — 19 November 2009, 13:41

Penultimate sentence of section 28.6.2 "The Kikuchi method approximates the entropy based on overlaps OF OVERLAPS of the clusters."

Ke Zhou — 19 November 2009, 02:47

Page 118, equation (6.7.8): P(b) should be P(c) since Psi(b,c) = P(b|c)P(c)

Jacob Ayres-Thomson — 19 November 2009, 01:08

Relation in 3.3.8: Surely the term on the left hand side should be divided by P(z).

The term on the left is currently equivalent to P(x,y,z). Dividing this side of the equation by P(z) would give P(x,y|z), which is the term we should be comparing to the right hand side, P(x|z)*P(y|z), in order to verify the state of conditional in/dependence. When divided by P(z) the same relation demonstrated still holds, but without the P(z) the demonstration loses its intended meaning.
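For reference, the identity the comment relies on is the standard one (written here in generic notation):

    p(x, y | z) = \frac{p(x, y, z)}{p(z)},

and conditional independence of x and y given z is the statement p(x, y | z) = p(x | z) p(y | z), which is why the division by p(z) matters for the comparison.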

Jacob Ayres-Thomson — 19 November 2009, 01:00

Section 3.3.1, The Impact of Collisions, in the last paragraph: "Here A and B are unconditionally independant" should read "Here A and C are unconditionally independant".

Jacob Ayres-Thomson — 19 November 2009, 00:49

Page 108, operation 6.4.12 and operation 6.4.13: "All Phi(x1,x4) outside of summation terms should read Phi(x2,x4)" This follows from the previously defined relation in 6.4.11 and enables the solution.

Bo Wang — 18 November 2009, 20:50

On page 106, on the 2nd last line, it says "See, for example, fig(6.3.2)...". "fig(6.3.2)" should be replaced by "fig(6.2)".

Ed Challis — 18 November 2009, 19:18

Exercise 69: "Consider a fly FLITTING at random between neighbouring rooms in a large house."

Bo Wang — 18 November 2009, 16:16

On page 65, it's written that "c and Vc(X c) is a real function defined over the variables in the clique indexed by c". You may want to delete "c and" at the beginning of this sentence?

Ed Challis — 17 November 2009, 18:14

Section 28.4.1. Second sentence following equation 28.4.2 "This gives A undirected distribution WHICH connection geometry defined by the weights w."

Ed Challis — 13 November 2009, 10:14

Section 18.2.2. 4th sentence "This is a general recipe for combining model predictions, where each model is weighted its posterior probability" should be "weighted BY its"

Ed Challis — 13 November 2009, 10:09

18.2.1. sentence following equation 18.2.7: "where alpha, THE plays the role of the inverse variance."

Zalczer Vincent — 12 November 2009, 19:51

Um, after some more reading, it seems that on page 89 it's written "the due to the summation" instead of "due to the summation".

Zalczer Vincent — 12 November 2009, 19:16

I do think that on page 88, in equation 5.2.24, the sum should only go to (T-1). I also fixed the mix-up on page 30 in the definition of the indices of the adjacency matrices, in a mail I sent you.

Forgot to put the remark here last time ^_^

Chris Bracegirdle — 12 November 2009, 15:56

Page 62 formula 4.2.3 should there not be a normalisation constant?

Chris Bracegirdle — 12 November 2009, 15:50

Page 61 first paragraph of chapter 4 should be "unified" not "unifired"

Ed Challis — 12 November 2009, 15:15

First sentence following equation 19.2.8 "Using the marginal likelihood, one can learn any free (hyper)parameters the covariance function by maximising the marginal likelihood" missing "of" after parameters

Ed Challis — 12 November 2009, 15:06

Section 19.1.3. Last sentence of second paragraph: "In general, THE we would expect the correlation between yi and yj to decrease the further apart xi and xj are."

Chris Bracegirdle — 12 November 2009, 14:13

Example 10 - burglar. I think it would help to clarify in the intro paragraph that R=1 means there is news on the radio of an earthquake not that the radio is on.

Chris Bracegirdle — 12 November 2009, 14:02

Page 30 second paragraph under heading "Adjacency matrix powers" first sentence doesn't make sense, "If we include..."

Chris Bracegirdle — 12 November 2009, 10:47

page 8 subjective probability second paragraph "probability that the use will like" should be user not use

Ed Challis — 09 November 2009, 14:24

Section 9.1.4. First sentence: "The result of a coin tossing experiment is NH = 2 heads and NT = 8 tails in a coin tossing experiment" you repeat "coin tossing experiment"

Ed Challis — 06 November 2009, 12:48

Appendix A.1.1. First sentence: I think that it would be good to say that the vector is a column vector.

A.3.2. Typo in the hessian equation A.3.10. The element in the nth row and 1st column should have the order of the partials flipped.

Roger De Souza Eremita — 04 November 2009, 12:26

pg 3) nov 2 newest version, "In particular, Computer Science students are familiar the concept of algorithms as core." should be familiar "with" the concept.

Ed Challis — 01 November 2009, 17:18

Section 11.2.1. Sentence following equation 11.2.1 you have written p(h|vmtheta) instead of p(h|v,theta).

First sentence following equation 11.2.3 "...to the fully observed case, expect that the terms..." except instead of expect right?

Roger De Souza Eremita — 31 October 2009, 18:12

pg (46) "...Formally, the concepts defi ne d-separation and d-connection are closely related[269]..." - Should use 'which' or 'that' to make sense, e.g "Formally, the concepts which defi ne d-separation and d-connection are closely related[269]"

Roger De Souza Eremita — 31 October 2009, 18:01

Pg 44 - "...The usual assumption is that each virtual evidence acts independently from other virtual evidences..." - would be better to write 'other virtual evidence', rather than 'other virtual evidences'.

Kieran O'Neill — 27 October 2009, 14:15

Page 36 (3.2.1) "Is it due to overnight rain or did she *forgot* to turn o ff the sprinkler last night?". The word "forgot" should be "forget" or else maybe "had she forgotten".

Bernardino Romera Paredes — 25 October 2009, 22:38

Page 46 (3.3.1) "Consider the BN: A->B<-C. Here A and B are unconditionally independent." I think it should be "Consider the BN: A->B<-C. Here A and C are unconditionally independent."

Amir Babaeian — 24 October 2009, 15:02

Page 36 first sentence after eq (3.2.15). 'Jack's grass is wet' not 'jack's grass it wet'.

Ed Challis — 21 October 2009, 14:35

Section 11.4 first sentence after Algorithm 8 "For fixed q(theta) if we optimize minimise..."

Ed Challis — 21 October 2009, 10:40

Section 11.2.1 the sentence after eq 11.2.3 is missing something like a because.

Alessio Tomasino — 21 October 2009, 07:51

p.37. The sentence 'Jack's grass is wet is influenced only directly by whether or not it has been raining' would imply that:

1. Jack doesn't have a sprinkler, or
2. Jack never leaves the sprinkler on, or
3. Jack's sprinkler is out of order.

However, the above assumptions are invalidated by the sentence below: 'sometimes Jack leaves his own sprinkler on too'. The two scenarios for Jack and Tracey should be symmetric (grass is wet because of rain or sprinkler for both), but this is not the case. Also check the graph at the beginning of the same page: there is no link between Jack and the sprinkler, but surely if the sprinkler is on, Jack's grass will get wet as a direct result of this.

Ed Challis — 13 October 2009, 15:54

p.35. Section 3.2.1. The last sentence of the conditional independence paragraph: "...sprinkler on too), p(T=1|R=1,S)=1,.."

Shouldn't that probability be p(T=1|R=1,S=0)=1?

db: I think it's OK. p(T=1|R=1,S)=1 means that Tracey's grass is wet if it rains, regardless of whether or not she left the sprinkler on.

Timothy Scully — 12 October 2009, 17:41

In section 2.6.1 Edge list there are two mistakes in the first sentence. Simply is spelt simplu, and "lists vertex-vertex pairs are in the graph" should read "lists vertex-vertex pairs that are in the graph"

Simon Zwieback — 09 October 2009, 17:48

p.27: "Directed graphs corresponds to upper triangular adjacency matrices" - apart from the typo, I think this is incorrect, as you can have arcs from A to B and B to A in one graph.

I meant that a DAG always can be represented as an upper triangular matrix given an ancestral ordering -- I'll clarify this though, thanks. db

"For a directed graph this means that a path is sequence of nodes" - missing "a"

Also, DAGs are mentioned before they have been introduced.

Ed Challis — 09 October 2009, 15:35

pg. 54, ex 22: the 2nd and 3rd rows where the p(gauge | b,f) is listed are missing true or false, e.g. "p(g = | b = good)"

Ed Challis — 09 October 2009, 14:58

p. 43 Section 3.3.1. paragraph 2: "...if there is a non-collider" z which is conditioned ON ALONG the path between..."

p. 46 Section 3.3.4 last sentence before eq 3.3.9 "...or will impose some additional independence restriction TRUE that is not implied by the DAG."

eq. 3.3.9 missing the factor p(h)

pg 55 ex 26 is a repeat of ex 7.

pg 67 ex 32 ii) "is is" instead of "is it"

pg 73 last sentence before remark 3: "...and non-normalised potentials when passing MASSING messages..."

pg 167: eqns 8.1.15 and 8.1.16 are using subscripts to number the data points; everywhere else in this section superscripts are used.

Stavros Korokithakis — 08 October 2009, 21:27

Exercise 3 of chapter one has a small inconsistency in the wording: "Box 1 contains three red and 5 white balls". Either "three" and "five" or "3 and 5".

Timothy Scully — 08 October 2009, 19:24

In the zip file, orderpot.m has a space after the 'm'. This means that MATLAB doesn't recognize it. Tested on Linux; not sure if it is ignored on Windows or not.

Stavros Korokithakis — 08 October 2009, 17:55

Generally, Inspector Clouseau would be more likely to infer that the cat was the murderer. He would be more suitable for comic relief than probabilistic reasoning, methinks.

y doron — 07 October 2009, 14:38
Pedantry: p.24 It's 'Monty Hall' or 'Monte Halperin' not 'Monte Hall'.

Jurgen Van Gael — 03 October 2009, 09:37
p223: NB (as Naive Bayes shortcut) hasn't been introduced before.
p223: 2nd line below 10.2.1 has one closing bracket too many
p229: last sentence "comes from the using" -> "comes from using"

p230, eq 10.4.2 and following: would it make sense to call the parents of i "pa(i)" instead of "j(i)"? p230, eq 10.4.3 and following: summation over D (as in 10.4.2 and Algorithm 5) instead of over N?

db: I agree -- thanks for the suggestion

Daniel Nee — 26 August 2009, 15:03

Version 05/08/09. Page 172, equation (7.6.33). Tiny mistake on ML solution for Sigma. Think the derivative should be -(1/2)M + (N/2)Sigma

MLover — 16 July 2009, 13:43

This is an extremely well written book! Wondering whether the author has used metapost to produce some of the figures in this book. If it is the case, could you kindly share the original code, so that others can reuse them for academic purpose. Thanks [db : thanks for the feedback. I plan to release all the figures and tikz code for them in the next few weeks.]

D Unwin — 15 May 2009, 14:32

Section 22.1.1 4th para: 'A numerical approximation is begin with some vector...'

Section 22.1.2 1st para: '... how might me cluster them...'

A. Papangelis — 19 March 2009, 11:45

(*)p. 432, figure 23.5 (a) - Shouldn't you have a factor for h1 alone?

tomas m — 18 March 2009, 23:03

(*)p. 91 End of example 91, backtracking. Should have c*=argmax(b=f|c)p(c)

tomas m — 18 March 2009, 22:54

(***)p. 81, definition 28, last sentence. Do you mean that a DAG (not undirected graph) can be multiply connected but acyclic?

p.86 missing 'compute' before equation p(a,b,c) in the middle of the page

p.89 second bullet pt. on top. Marginal variables X_{f} were denoted X^{f} in Definition29 on previous page

tomas m — 18 March 2009, 21:52

(**)v170309 : p.49 after definition... 'We may write this in... (typo)

p.51 3rd line 'two arrows do not...'

tomas m — 18 March 2009, 21:36

(****)v170309:
p.40 In 'Making a model': S=1 means she has forgotten (typo)
p.45 last sentence: 'could' should come after the vector
p.46 2 lines after Jeffrey's rule: should the 'new joint distribution' be p2(x,y|y~)? p2(x|y) looks like the old conditional

tomas m — 18 March 2009, 21:17

(*)v170309 : p.35 between code fragments, a typo (use)

tomas m — 18 March 2009, 21:06

(*)v170309 : p.24, there's a typo in the equation after the clouseau example(knife)

Hersh Asthana — 18 March 2009, 18:18

(*)v170309 – page 119 – Typo. It should be fig(5.2) not fig(5.2.2)

Hersh Asthana — 18 March 2009, 18:18

(*)v170309 – page 117 – Definition 32 – I think diagram is missing a star just before (V)?

db: I think the diagram is OK, but I need to make it clearer that W is absorbing from V through S

Hersh Asthana — 18 March 2009, 18:17

(**)v170309 – page 117 – Definition 31 says that Clique Graph and Junction Graph are synonymous. But on page 122 Definition 36 it is defined as different from the Clique Graph

Hersh Asthana — 18 March 2009, 18:17

(*)v170309 - Nitpicking – page 118 – Absorption and Marginal Consistency section: JTA should be Junction Tree Algorithm (JTA) as this is the first time this acronym is used.

Malcolm Reynolds — 14 March 2009, 20:16

(**)Page 200, halfway down - when you factorise the posterior, surely the term p(theta_c|V_c) should actually be p(theta_c|V) because you need all the a and s data to evaluate the conditional prob of c?

Malcolm Reynolds — 14 March 2009, 19:59

(**)Just noticed, still on page 198's definition of c - (1-theta) should be raised to the power of Nt, not 1-Nt.

Malcolm Reynolds — 14 March 2009, 19:56

(*)Page 198 - the definition of the normaliser c is an integral. This integral needs a "dtheta" although I guess it's pretty obvious from the context what is being integrated over.

Malcolm Reynolds — 14 March 2009, 19:50

(**)In example 29, the differentials by theta1 and theta2 seem incorrect. Surely the exponentials in the numerators should be exp(-theta1^2 - theta2^2(x2^n-x3^n)^2), as differentiating never changes the exponent? I think the minus sign in "- 2 theta2 I[x_1^n = 0] ..." should be a plus as well.
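For reference, the chain-rule fact being appealed to here is

    \frac{d}{d\theta} e^{f(\theta)} = f'(\theta) \, e^{f(\theta)},

so differentiation multiplies by f'(\theta) but leaves the exponent itself unchanged.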

Malcolm Reynolds — 14 March 2009, 16:34

(**)The equation below the red "discount factor" text, section 6.5: Although it doesn't make any difference to the limit, I believe the fraction on the right should be (1-gamma^T)/(1-gamma), rather than (1-gamma^{T+1})/(1-gamma).
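For reference, the standard geometric sums are

    \sum_{t=0}^{T-1} \gamma^t = \frac{1 - \gamma^T}{1 - \gamma}, \qquad \sum_{t=0}^{T} \gamma^t = \frac{1 - \gamma^{T+1}}{1 - \gamma},

so which form is correct depends on the summation limits used in the book's equation.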

Malcolm Reynolds — 14 March 2009, 16:09

Still on Figure 6.6, since knowledge about the hidden state is uncertain, maybe each decision should receive a directed link from all previous observed variables? ie, v1->d1, v1->d2, v2->d2, v1->d3, v2->d3, v3->d3.

db : These links are implicitly assumed under the no-forgetting principle (otherwise influence diagrams get really messy). I'll point this out again though in the text.

Malcolm Reynolds — 14 March 2009, 16:06

Figure 6.6: When compared with figure 6.5, surely directed links are needed from v1 to d1, v2 to d2 etc, so that decisions are made in the right order and with all the knowledge currently available?

Malcolm Reynolds — 14 March 2009, 15:22

In the second "Should I do a PhD" example, for the equation U(S|P,E) shouldn't the probability of I be "P(I|S,P,E)" instead of "P(I|S,P)"?

db : I think P(I|S,P) is OK. I'll try to make it clearer that this example is not an extension of the previous scenario, just different. If I include a dependency on E as well writing out the tables is too complex.

db — 09 March 2009, 20:49

(*)090309: missing section reference on page 159

Malcolm Reynolds — 09 March 2009, 15:17

(*)Figure 5.2 - the arrow for absorption 6 should point from D to A, not A to D.

Malcolm Reynolds — 08 March 2009, 22:17

(*)Definition 30 - if x^* holds the maximal state, surely its equation should use argmax and not max?

Malcolm Reynolds — 08 March 2009, 18:24

(**)Section 3.5 - "A cliquo matrix containing only two cliques .." is somewhat ambiguous - it could be read as meaning either "there are two cliques" (as in C above) or "each clique contains two nodes" (as in C_inc below), with the latter being what I presume it's supposed to mean.

Malcolm Reynolds — 08 March 2009, 18:12

(*)Section 3.4.1, second bullet point - presumably "variables or factors" instead of "variables or functions"?

Malcolm Reynolds — 08 March 2009, 15:34

(*)Section 3.2.1 - "Consider the Markov Network in fig(3.1)" - probably a good idea to specify that it's the network in fig (3.1) (a) but this is fairly obvious in context of what follows..

Malcolm Reynolds — 08 March 2009, 15:33

(*)Section 3.2.1 - "Phi(1,2,3) = phi(x_1,x_2,x_4)" probably wants to be "phi(x_1,x_2,x_3)"

Malcolm Reynolds — 08 March 2009, 15:19

(**)Section 2.4.2 - "... and constrasts an observational ('see') inference p(x|do(y)) with a causal ('make' or 'do') inference p(x|y)" - surely the formulas need to be switched around here?

Hersh Asthana — 04 March 2009, 17:58

Version 260209 - Some miscellaneous stuff:

(*)Page 43. Conditional Probability Table is formatted in red. Presumably this indicates the first occurrence/definition of the term. However, CPTs have already been used on page 39 and in the description of figure 2.2.

(*)Page 84. Second paragraph. “with factors defined by the 'phi' above...”. What does that mean?

(*)Page 84 Last line. f1(a,b)f2(b,c,d)... Then on the next page, the commas disappear in the joint probability formula.

(*)Page 91. Probability formula shouldn't have a comma as it is not a list but a product.

Hersh Asthana — 04 March 2009, 17:57

(*)Version 260209 – Page 95. Last line. "The algorithm is given in algorithm(11)". This should be algorithm 1? I think there is a numbering error every time it says "algorithm(nn)".

Hersh Asthana — 04 March 2009, 17:57

(*)Version 260209 – Pg 83. The term "Markov Chain" is used on this page and on many other pages, but it is never defined. The closest is on pg 397.

Hersh Asthana — 04 March 2009, 17:56

(**)Version 260209 – Page 68. Description of figure 3.4: It says “Summing over the state of variable in this DAG...”. Should this be “Summing over variable d”? Also variables in the figure are all in lowercase, but the description of the figure uses both lowercase and uppercase indiscriminately.

Hersh Asthana — 04 March 2009, 17:55

(*)Version 260209 – Pg 49. First paragraph. Fig 2.6(e) is mentioned. Figure 2.6 does not have (e).

db — 03 March 2009, 20:10

(****)The examples in chapter 6 are numerically incorrect.

The expected utilities (section 6.3.2) for the PhD scenario are:

    education=do phd   260174.000000
    education=no phd   240000.024400

and for the section following, the Startup scenario:

    education=do phd   260186.500000
    education=no phd   310000.018650

Thanks to Yi-Ming Lu.

dan b — 26 February 2009, 21:30

(*)typo, p314: "we desires a function" should be "we desire a function"

Kwabena Anin — 11 February 2009, 20:03

(**)Also regarding diagram 6.1: you have the utility of no rain and not at party as -50, but this should be 50.

Kwabena Anin — 11 February 2009, 19:47

(*)In section 6.3, you have in the second paragraph "in the above example, the BN trivially consists of a single node".

This is wrong; there are many nodes but a single utility node. I think it should say "in the above example, the BN trivially consists of a single utility node" instead.

Kwabena Anin — 27 January 2009, 16:42

(**)In example 31, p(c=1|a=1,s=0, V) is not 2/3 but a half.

Similarly, p(c=1|a=0,s=1, V) is again not 2/3, but 1/2.

I think you have the two the wrong way round.

Kwabena Anin — 27 January 2009, 11:27

(*)In chapter 8.1.1, there is a typo; the constant k is found by the equation:

1/k = 0.0014 (= 6.46 x 10^{-4} + 7.81 x 10^{-4} + 8.19 x 10^{-8})

Vryonides — 26 January 2009, 00:12

... which affects the ensuing calcs. I thought b* should be f but need to double check.

Vryonides — 26 January 2009, 00:09

(**)Max-product example 20: in the calculation of gamma(a), the probabilities are wrong. p(a=t|b=f)gamma(b=f) = 0.7 * 0.54 = 0.378 should be 0.2 * 0.54; similarly p(a=f|b=f) a few lines down should be 0.8, not 0.3.

T Majersky — 16 January 2009, 00:23

(*)In the book and in the demos, demoMaxProd.m at the top has a wrong description (=> %max product)

Daniel Nee — 11 January 2009, 16:21

Sorry should be: U(no party,rain) = -50, U(no party,no rain) = -50

Daniel Nee — 11 January 2009, 16:18
Version: 111208

(*)Page 60. Definition 15 uses the term cliques, but the definition of a clique is not given until definition 16. Would make sense to swap the ordering of the definitions.

(**)Page 133. Stated the utilities U(no party,rain) = −50, U(no party,no rain) = −50. Then in the calculation of U(no party) you have used -50 and +50.
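
As a side note on the page 133 calculation: if both stated utilities really are -50, the expected utility of not holding the party cannot depend on the rain probability. A minimal sketch, with a hypothetical p(rain):

  % expected utility of "no party"; p_rain = 0.6 is a hypothetical value
  p_rain = 0.6;
  U_no_party = p_rain*(-50) + (1-p_rain)*(-50)   % = -50 whatever p_rain is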

Neha Murarka — 06 January 2009, 23:57

(*)On the same page 244, I just wanted to add to what I wrote. The heights in fig 14.1b all need to be interchanged: the 0.1's need to be 0.5's, the 0.5's need to be 0.8's and the 0.8's need to be 0.1's.

Neha Murarka — 06 January 2009, 23:52

On page 244, I noticed that in fig 14.1b, the 0.5 should have the height of the 0.8 and vice versa, because in the calculation the posterior of 0.8 is 0.0001. Also, I noticed that when we normalise these posteriors, the k cancels and is unnecessary. So how is it that here k = 0.0014? It gives me the same answer without using k.
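
The constant does indeed cancel under normalisation. A minimal sketch, reusing the three unnormalised values quoted in the 8.1.1 comment above purely for illustration:

  % the normalisation constant k is redundant once you normalise
  unnorm = [6.46e-4 7.81e-4 8.19e-8];   % likelihood x prior for each hypothesis
  k = 1/sum(unnorm);                     % explicit constant, 1/k ~ 0.0014
  posterior_via_k  = k*unnorm            % multiply by k ...
  posterior_direct = unnorm/sum(unnorm)  % ... or just normalise: identical result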

Alexandros Papangelis — 04 January 2009, 10:34

(*)On p. 251, example 30 p(x1 = 0 | x2, x) = 1 - p(x1 = 1 | x2, x3) should be p(x1 = 0 | x2, x3) = 1 - p(x1 = 1 | x2, x3)

Chuck Norris — 30 December 2008, 12:41

On page 122: "Then we claim that the effect of running the JTA is to produce on the cliques, the joint marginals p(a = 1; b; c = 1), p(a = 1; b; c = 1) and p(a = 1; b; c = 1)..."

I think one or more p(a = 1; b; c = 1) are missing!

Chuck Norris — 28 December 2008, 12:40

p.116 Example 25: potential(b,e) is missing from the text, but is in the figure!

Alexandros Papangelis — 27 December 2008, 15:51

(*)p 133 - below the X -> D image, I think you need a full stop in the sentence between "X" and "As"

(*)p 134 - "We begin by writing those variables X0 whose states known (evidential variables)." I think you are missing an "are"

(*)p 138 - u(x(t + 1) = i), x(t) = j, d(t) = k) - the parentheses don't match.

(*)p 144 - "A very simple procedure is just to iteration equation (6.5.1) until convergence,"

(*)"As discussed in section(6.4.3), the optimal decision depends on all the history of a part decisions and observations."

(*)p 146 - "where this depends only on messages compute locally."

Chuck Norris — 27 December 2008, 13:27

Figure 5.4: A Junction Tree. This clique graph satisfies the running intersection property that for any two nodes which contain a variable a, the path linking the two nodes also contains the variable a.

The path from node cdf to dce contains variable d nowhere!

T Majersky — 15 December 2008, 11:54

Also, about the included matlab code in the book: perhaps it would be much more readable if it was formatted with some simple colour segmentation - at least green for comments and bold for loop control statements?

T Majersky — 15 December 2008, 11:50

v 111208 :

(***)Sect. 15.1.1. The 5th sentence starts with 'it'. Also, in the next sentence 'they' should be 'he/she'. The last word on the page, 'computational', should be 'computationally'.

(*)Sect. 15.1.3. After the marginal likelihood equation there is a typo - 'tere' instead of 'there'; also 'techniques' appears twice in the sentence.

(*)Sect. 15.2.2. 3rd sentence 'a tables' should be 'the table' ? Section Shared Parameters... '..need to identify..'

(*)Sect.15.2.3. 2nd sentence 'distribution over all...'

Hersh Asthana — 25 November 2008, 22:25

(*)Version 191108. Pg. 89. Bucket Elimination algorithm. The second line of the while loop is missing a "from" as the third-to-last word of the sentence.

Hersh Asthana — 25 November 2008, 22:21

(*)Version 191108. Pg. 73. Exercise 27. Typo: x condind y | (x, u) should be x condind y | (z, u)

Hersh Asthana — 25 November 2008, 22:19

(*)Version 191108. Pg. 67. Description of Figure 3.5. Last sentence seems to imply that chain components (a,e,d,f)... are for Figure 3.5 (b).

Hersh Asthana — 25 November 2008, 22:13

(*)Version 191108. Pg. 45. Footnote reads 'see www....d-sep.html' for details and some nice demos. The demo applet links on this page are broken and demos cannot be run. Maybe you'd like to remove the link or reword the footnote?

Hersh Asthana — 25 November 2008, 22:09

(*)Version 191108. Pg. 35. Last line of last paragraph (just above 'Inference') reads 'prior belief that the sprinkler ...'. This is redundant as the information is already given in the second sentence of the paragraph ' ... p(S) = (0.1,0.9).'

Hersh Asthana — 25 November 2008, 22:05

(*)Version 191108. Pg. 31. Nitpicking but why the hyphen in East-Berlin?

Hersh Asthana — 25 November 2008, 22:02

(*)Version 191108. Pg. 27 Function orderpotfields is repeated

Hersh Asthana — 25 November 2008, 22:00

(**)Sorry trying again... ignore previous post: Version 191108. Pg. 15 Third line reads 0.88, 0.04 and 0.08 respectively. Methinks it should be 0.88, 0.08 and 0.04 respectively. Also the second and third rows of vector immediately below should be swapped.

Hersh Asthana — 25 November 2008, 21:59

Version 191108. Pg. 15 Third line reads 0.88, 0.4 and 0.08 respectively. Methinks it should be 0.88, 0.8 and 0.04 respectively. Also the second and third rows of vector immediately below should be swapped.

Kwabena Anin — 18 November 2008, 15:14

(*)On page 80 in example 22, the end nodes of fig 4.1a are a,b and e, not e,b and c as in the notes.

Kwabena Anin — 11 November 2008, 17:18

(*)Further to my comment below, I still think that the distribution of fig. 4.5 is wrong.

Dominik Beste — 11 November 2008, 10:24

(*)In example 19 on page 65, the probabilities (0.56, 0.32, 0.12) should be for 6 time steps, not 5.

Kwabena Anin — 11 November 2008, 10:18

(*)I could be wrong but I think that the distribution of fig. 4.5 is wrong. I think that it should be p(a|b)p(b|c,d)p(c)p(d|e)p(e). It seems you have got the d's and e's mixed up. Also I think the resulting fig 4.5 is incorrect: should f5 not be attached to vertex e?

Alexandros Papangelis — 04 November 2008, 18:45

On p87, top: (*)"To see this, we observe that when W absorbs from V though the separator S only the potentials of W and S are changed."

On p88, top: (*)At the Square of Sums, the third Sum should be over x2, not 2

On p95 at the Remark: (*)"In the case that (discrete) variables have different numbers of states, a more refined versions is..."

Malcolm Reynolds — 04 November 2008, 15:21

(**)The matrix just after the heading "3.5 Describing Neighbours" does not accurately represent the given adjacency list - the list should have (1,3) and (3,1) added if it is to match the matrix.
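
For what it's worth, the usual convention when listing neighbours of an undirected graph is to include each edge in both directions (or to symmetrise when building the matrix). A minimal sketch with a hypothetical 3-node edge list, not the book's graph:

  % build a symmetric adjacency matrix from an undirected edge list
  edges = [1 2; 2 3; 1 3];    % hypothetical example edges as (i,j) pairs
  N = 3; A = zeros(N);
  for e = 1:size(edges,1)
      A(edges(e,1), edges(e,2)) = 1;
      A(edges(e,2), edges(e,1)) = 1;   % both (i,j) and (j,i) are needed
  end
  A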

Neha Murarka — 04 November 2008, 14:32

(*)On pg. 31 (small error), the following sentence is missing, I think, a 'be': 'Once the graphical structure is defined, the entries of the conditional probability tables (CPTs) p(xi|pa (xi)) can expressed.'

Daniel Nee — 04 November 2008, 14:11

(*)Page 49 - Markov Networks - Markov properties. The second line of the equation, p(4|1,2,3,5,6,7), is a repetition.

(*)Page 51 - Markov Networks - Markov properties. Figure 3.2 (c) caption is incorrect. The distribution should be p_c(x_1,x_2,x_3,x_4,x_5,x_6) = phi(x_1,x_2,x_4)*phi(x_2,x_3,x_4)*phi(x_3,x_5)*phi(x_3,x_6)/Z_c.

(*)Page 52 - Markov Networks - Figure 3.4 caption. Refers to image as Left: ... Middle:..., should be Left:..., Right:... or (a): ..., (b) ...

(*)Page 53 - Markov Networks - Expressiveness of Graphical Models. Definition 22. "Due to Inverse Modus Ponens, exercise (5)", I can't seem to find any exercise that relates to this.

Neha Murarka — 04 November 2008, 13:52

On page 31 in definition 10, it says that p(A,B|C) does not equal p(A|C)p(B|C). In the diagram though it says p(C|A,B) because the arrows go from A and B to C. Maybe I have understood the definition wrong. I'm not too sure. [I think the text is OK, and you misunderstand the meaning ;-) db]

Alexandros Papangelis — 04 November 2008, 13:22

Sorry, p53 is probably correct...

Alexandros Papangelis — 04 November 2008, 13:19

On p47, top : (*)"Reasoning in this system them corresponds to performing probabilistic inference."

On p49, above 3.2.1: (*)"For the case in which are clique potentials are strictly positive..."

On p50, Def 19: (*)"For a disjoint subsets of variables, ..."

On p53, bottom: (*)"The Hammersley-Clifford theorem (below) specifies what the functional form of the any such joint distribution ..."

On p54, 3.2.6: (*)"A fundamental question is : if a distribution is positive and and satisfies"

below (3.2.1): (*)"over the variables in the clique c. equation (3.2.1) is equivalent" 'equation' to 'Equation' ?

On p56, top: (*)"...is determined implicity..."

On p64, ex18: (*)In the double Sums, shouldn't p(b|v) be p(b|c) ?

On p65, ex19, middle: (*)"Since the graph of the distribution is a simple chain, we can easily distribution the summation over terms."

On p76, bottom: "f(x1, x2, x2, x4) = phi(x1, x2)phi(x2, x3)phi(x3, x4)" [db : Not quite sure what you mean is wrong here]

Malcolm Reynolds — 04 November 2008, 13:08

(**)Sorry, previous post should have been "0.3 * 0.3 = 0.9 is wrong"

Malcolm Reynolds — 04 November 2008, 12:56

(*)Equation 4.5.6 - not all of the b's have been substituted for x_1 which invalidates the given factorisation into phi functions.

Example 22 - 0.3 * 0.3 = 9 is wrong, which invalidates the rest of the reasoning.

db — 04 November 2008, 00:28

yes, I agree - thanks

Neha Murarka — 04 November 2008, 00:09

On page 16, shouldn't there be a sum (over m) on the rhs of sum(over m) p(B,m|K) = p(B,m,K)/p(K)? That is, sum(over m)[p(B,m,K)/p(K)].
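
For reference, written out in full (standard marginalisation, nothing specific to the book's example):

  \sum_m p(B,m|K) = \sum_m \frac{p(B,m,K)}{p(K)} = \frac{p(B,K)}{p(K)} = p(B|K)

so the sum has to appear on both sides.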

Alexandros Papangelis — 03 November 2008, 21:09

(*)On p40, 2nd paragraph:

"... in the structure of the graph marginalised graph ..."

and on p41, 1st paragraph:

"According to the table for males, the answer is no, since more males recovered when then were not given the drug than when they were."

The 'then' should be 'they' I think

Alexandros Papangelis — 03 November 2008, 20:26

(*)On p37, Figure 2.6, there is no (e)...

Alexandros Papangelis — 03 November 2008, 20:24

(*)On p36, 2.3.1:

"In g(2.6)(c) that they are dependent"

I think 'that' is wrong

Alexandros Papangelis — 03 November 2008, 16:39

(*)On p31, last paragraph:

"Once the graphical structure is defined, the entries of the conditional probability tables (CPTs) p(xi | pa (xi)) can expressed."

I think you are missing a 'be'

and 4 lines up:

(*)"It might be that there are additional CI independence statements ..."

I think independence shouldn't be there (if CI means Conditional Independence)

Alexandros Papangelis — 03 November 2008, 12:48

(*)On p22, Notes you have a double 'the':

"The interpretation of probability is highly contentious, as is the the Bayesian viewpoint."

Alexandros Papangelis — 03 November 2008, 12:12

(*)On p18 :

"This example is interesting since we are not required to make a full probabilistic model in this case since, thanks to the limiting nature of the probabilities."

the second 'since' ?

Alexandros Papangelis — 03 November 2008, 11:52

(*)And another one on p17:

"Hence, our complete model is ... where are the terms on the right are explicitly defined"

Isn't the first 'are' wrong?

Alexandros Papangelis — 03 November 2008, 11:05

(*)On page 14, paragraph 5, you say "... we cannot say much about this unless we had many carried out many experiments with this new coin."

Shouldn't it be "... we cannot say much about this unless we had carried out many experiments with this new coin." ?

Umesh Telang — 31 October 2008, 23:36

(*)Shouldn't the generative model referred to in the first line on p 14 (30 Oct 2008), be p(D|theta) and not p (theta|D)?