Student Number: ________

Semester 2 Assessment, 2019
School of Mathematics and Statistics
MAST90083 Computational Statistics and Data Mining

Writing time: 3 hours
Reading time: 15 minutes
This is NOT an open book exam
This paper consists of 3 pages (including this page)

Authorised Materials
• Mobile phones, smart watches and internet or communication devices are forbidden.
• No handwritten or printed materials may be brought into the exam venue.
• This is a closed book exam.
• No calculators of any kind may be brought into the examination.

Instructions to Students
• You must NOT remove this question paper at the conclusion of the examination.

Instructions to Invigilators
• Students must NOT remove this question paper at the conclusion of the examination.

This paper must NOT be held in the Baillieu Library

Question 1. Suppose we have a model p(x, z | θ), where x is the observed dataset and z are the latent variables.

(a) Suppose that q(z) is a distribution over z. Explain why

    F(q, θ) = E_q[log p(x, z | θ) − log q(z)]

is a lower bound on log p(x | θ).

(b) Show that F(q, θ) can be decomposed as

    F(q, θ) = −KL(q(z) || p(z | x, θ)) + log p(x | θ),

where for any two distributions q and p, KL(q || p) = −E_q[log (p(z)/q(z))] is the Kullback–Leibler (KL) divergence.

(c) Describe the EM algorithm in terms of F(q, θ).

(d) Note that the KL divergence is always non-negative, and is zero if and only if p = q. Conclude that the optimal q maximising F is q(z) = p(z | x, θ).

[10 + 10 + 5 + 5 = 30 marks]
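The decomposition in parts (a), (b) and (d) can be checked numerically. The following sketch (not part of the exam; the joint probabilities are made-up illustrative numbers) verifies, for a toy discrete latent variable with three states, that F(q, θ) = log p(x | θ) − KL(q || p(z | x, θ)), that F is a lower bound on log p(x | θ), and that the bound is tight when q is the posterior:

```python
import numpy as np

# Toy model: z takes 3 values; these p(x, z) values at a fixed observed x
# are arbitrary illustrative numbers.
joint = np.array([0.10, 0.25, 0.05])        # p(x, z) for z = 0, 1, 2
log_px = np.log(joint.sum())                # log p(x) = log sum_z p(x, z)
posterior = joint / joint.sum()             # p(z | x)

q = np.array([0.5, 0.3, 0.2])               # an arbitrary distribution over z

F = np.sum(q * (np.log(joint) - np.log(q)))          # E_q[log p(x,z) - log q(z)]
kl = np.sum(q * (np.log(q) - np.log(posterior)))     # KL(q || p(z|x))

assert np.isclose(F, log_px - kl)   # decomposition from part (b)
assert F <= log_px                  # lower bound, part (a)

# With q equal to the posterior, KL = 0 and the bound is tight (part (d)):
F_opt = np.sum(posterior * (np.log(joint) - np.log(posterior)))
assert np.isclose(F_opt, log_px)
```

This is exactly the mechanism the EM algorithm exploits: the E-step sets q to the posterior (making the bound tight), and the M-step maximises F over θ with q fixed.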
Question 2. Let {(x_i, y_i)}_{i=1}^n be our dataset, with x_i ∈ R^p and y_i ∈ R. Classic linear regression can be posed as empirical risk minimisation, where the model predicts y using the class of functions f(x) = w^T x, parametrised by a vector w ∈ R^p, under the squared loss; that is, we minimise

    R_n(w) = Σ_{i=1}^n (y_i − w^T x_i)^2.

(a) Show that the optimal parameter vector is

    ŵ_n = (X^T X)^{−1} X^T Y,

where X is the n × p matrix with i-th row x_i^T, and Y is the n × 1 column vector with i-th entry y_i.

(b) Consider regularising the empirical risk by incorporating an ℓ2 penalty; that is, find the w minimising

    Σ_{i=1}^n (y_i − w^T x_i)^2 + λ ||w||^2.

Show that the optimal parameter is given by the ridge regression estimator

    ŵ_n^ridge = (X^T X + λI)^{−1} X^T Y.

(c) Suppose we now wish to introduce nonlinearities into the model by transforming x to φ(x). Let Φ be the matrix with i-th row φ(x_i)^T.

(i) Show that the optimal parameters would be given by

    ŵ_n^kernel = (Φ^T Φ + λI)^{−1} Φ^T Y.

(ii) Express the predicted y values on the training set, Φ ŵ_n^kernel, only in terms of Y and the Gram matrix K = Φ Φ^T, with K_ij = φ(x_i)^T φ(x_j) = k(x_i, x_j), where k is some kernel function. (This is known as the kernel trick.) Hint: you will find the following matrix inversion formula useful:

    (Φ^T Φ + λI)^{−1} Φ^T = Φ^T (Φ Φ^T + λI)^{−1}.

(iii) Compute an expression for the value y* predicted by the model at an unseen test vector x*.

[5 + 5 + 5 + 10 + 5 = 30 marks]

Total marks = 60

End of Exam
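The ridge estimator of part (b) and the kernelised form of part (c) give identical fitted values when k is the linear kernel. The sketch below (not part of the exam; data are random placeholders) checks this equivalence, which rests on the push-through identity in the hint:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, lam = 50, 3, 0.7
X = rng.normal(size=(n, p))     # rows are x_i^T
y = rng.normal(size=n)

# Ridge estimator from part (b): (X^T X + lambda I)^{-1} X^T y
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Kernel form from part (c)(ii) with the linear kernel, K_ij = x_i^T x_j:
# fitted values are K (K + lambda I)^{-1} y
K = X @ X.T
alpha = np.linalg.solve(K + lam * np.eye(n), y)
pred_kernel = K @ alpha

assert np.allclose(X @ w_ridge, pred_kernel)   # same fitted values

# Part (c)(iii): prediction at a test point x* is k(x*)^T (K + lambda I)^{-1} y,
# where k(x*) has entries k(x_i, x*) = x_i^T x* for the linear kernel.
x_star = rng.normal(size=p)
assert np.isclose(x_star @ w_ridge, (X @ x_star) @ alpha)
```

Note the computational trade-off the kernel trick buys: the primal solve inverts a p × p matrix, while the dual solve inverts an n × n matrix but never needs φ(x) explicitly.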