STA 141A
Fall 2019
Homework 4
Due: December 5 (Thursday), 11:59 pm

Submit the assignment electronically through Canvas. The electronic submission must be a zip folder (with extension .zip, .7z, etc.) containing two files: (i) your answers (.pdf file); (ii) the R code used (.R file). Alternatively, the assignment can be submitted as an R Markdown file (.Rmd).

Honor Code: “The codes and results derived by using these codes constitute my own work. I have consulted the following resources regarding this assignment:” (ADD: names of persons or web resources, if any, excluding the instructor, TAs, and materials posted on Canvas.)

Problem Statement: The goal is to compare k-means clustering and hierarchical clustering methods in a real-data clustering problem.

1. The data set customers_data.csv contains eight variables measured on 440 instances:
(a) CHANNEL: customer’s channel, either Horeca (Hotel/Restaurant/Café) or Retail;
(b) REGION: customer’s region, one of Lisbon, Oporto or Other;
(c) FRESH: annual spending (in US dollars) on fresh products;
(d) MILK: annual spending (in US dollars) on milk products;
(e) GROCERY: annual spending (in US dollars) on grocery products;
(f) FROZEN: annual spending (in US dollars) on frozen products;
(g) DETERGENTS_PAPER: annual spending (in US dollars) on detergents and paper products;
(h) DELICATESSEN: annual spending (in US dollars) on delicatessen products.
• Import customers_data.csv as a data frame with header and produce a summary of its variables. (5 points)
• Extract the variables FRESH and FROZEN and store them in a separate data frame, called customers_2. (5 points)
• From this new data frame, provide a scatter-plot matrix of FRESH and FROZEN via the function ggpairs from the package GGally. (5 points)

2. Estimate a k-means clustering partition on FRESH and FROZEN. For k = 1, ..., 10, run the following procedure 100 times: (5 points)
• randomly draw 80% of the observations in customers_2 and use them as the training data set. The remaining observations constitute the test data set; (5 points)
• run the function kmeans on the training data set with k centers, 20 random starts and 100 maximum iterations; (5 points)
• use the estimated centers to allocate each observation of the test data set to a specific group and derive the corresponding vector of assignments; (10 points)
• calculate the deviance within the estimated groups in the test data set. (10 points)
Then, for each k, average the within-group deviance over the 100 runs. (5 points)
Finally:
• plot this average against the number of clusters and choose the optimal number of clusters using the elbow criterion; (5 points)
• re-apply kmeans with the selected number of clusters, 20 random starts and 100 maximum iterations, and derive the estimated cluster memberships; (5 points)
• provide a scatter-plot matrix of FRESH and FROZEN conditional on the estimated cluster memberships via the function ggpairs from the package GGally, and comment on the shape of FRESH and FROZEN across groups. (5+5 points)

3. • Estimate a hierarchical partition on FRESH and FROZEN using complete linkage and the number of clusters selected above. (5 points)
• Derive the estimated cluster memberships. (5 points)
• Provide a scatter-plot matrix of FRESH and FROZEN conditional on the estimated cluster memberships via the function ggpairs from the package GGally. (5 points)
• Comment on the shape of FRESH and FROZEN across the estimated groups and compare this outcome to the k-means one. (5+5 points)
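
The resampling procedure in Problem 2 can be sketched in base R as follows. This is an illustration under stated assumptions, not a reference solution. The data frame below is synthetic and only stands in for customers_2, so its numbers say nothing about the real data. The helper names assign_to_centers and test_deviance are hypothetical. The "deviance within estimated groups" is read here as the sum of squared Euclidean distances from each test observation to the training center it is assigned to, which is one possible interpretation of the wording.

```r
# Synthetic stand-in for customers_2 (assumption: the real data frame would be
# built from customers_data.csv; these values are made up for illustration).
set.seed(1)
customers_2 <- data.frame(
  FRESH  = c(rnorm(150, 3000, 800), rnorm(150, 12000, 2000), rnorm(140, 30000, 5000)),
  FROZEN = c(rnorm(150, 1000, 300), rnorm(150, 6000, 1000), rnorm(140, 2500, 600))
)

# Hypothetical helper: assign each row of x to its nearest center
# (squared Euclidean distance) and return the assignments and distances.
assign_to_centers <- function(x, centers) {
  x <- as.matrix(x)
  centers <- as.matrix(centers)
  d2 <- sapply(seq_len(nrow(centers)), function(j) {
    rowSums(sweep(x, 2, centers[j, ], "-")^2)
  })
  d2 <- matrix(d2, nrow = nrow(x))  # keep an n x k matrix even when k = 1
  list(cluster = unname(apply(d2, 1, which.min)), d2 = d2)
}

# Hypothetical helper: within-group deviance on held-out data, taken here as
# the sum of squared distances to the assigned training centers.
test_deviance <- function(test, centers) {
  a <- assign_to_centers(test, centers)
  sum(a$d2[cbind(seq_along(a$cluster), a$cluster)])
}

ks <- 1:10
n_runs <- 100
n <- nrow(customers_2)

# For each k: repeat the 80/20 split, fit kmeans on the training part,
# score the test part, and average the test deviance over the runs.
avg_dev <- sapply(ks, function(k) {
  mean(replicate(n_runs, {
    idx <- sample(n, size = round(0.8 * n))
    fit <- kmeans(customers_2[idx, ], centers = k, nstart = 20, iter.max = 100)
    test_deviance(customers_2[-idx, ], fit$centers)
  }))
})

# Elbow plot: average test deviance against the number of clusters.
plot(ks, avg_dev, type = "b",
     xlab = "Number of clusters k", ylab = "Average test deviance")
```

The elbow is then chosen by eye from the plot. With the real data, the synthetic block would be replaced by something like read.csv("customers_data.csv", header = TRUE) followed by selecting the FRESH and FROZEN columns.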