Data Mining with Rattle and R

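Publisher: Springer
Author: Graham Williams
Pages: 396
Publication date: 2011-8-4
Price: GBP 49.99
Binding: Paperback
ISBN: 9781441998897
Tags:
  • R
  • Data mining
  • Rattle
  • Programming
  • Mining
  • Computer science
  • Computer technology
  • Methodology
  • R language
  • Machine learning
  • Statistical learning
  • Data analysis
  • Business intelligence
  • Data science
  • Visualization
  • Predictive modeling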

Description

Data mining is the art and science of intelligent data analysis. By building knowledge from information, data mining adds considerable value to the ever-increasing stores of electronic data that abound today. In performing data mining, many decisions need to be made regarding the choice of methodology, data, tools, and algorithms.

Throughout this book, the reader is introduced to the basic concepts and some of the more popular algorithms of data mining. With a focus on the hands-on, end-to-end process for data mining, Williams guides the reader through various capabilities of the easy-to-use, free, and open-source Rattle Data Mining Software, built on the sophisticated R Statistical Software. The focus on doing data mining rather than just reading about data mining is refreshing.

The book covers data understanding, data preparation, data refinement, model building, model evaluation, and practical deployment. The reader will learn to rapidly deliver a data mining project using software easily installed for free from the Internet. Coupling Rattle with R delivers a very sophisticated data mining environment with all the power, and more, of the many commercial offerings.

About the Author

Dr Graham Williams is Senior Director of Analytics with the Australian Taxation Office, and previously Principal Computer Scientist for Data Mining with CSIRO. He is also Visiting Professor and Senior International Scientist with the Shenzhen Institutes of Advanced Analytics of the Chinese Academy of Sciences, Adjunct Professor, Data Mining, Fraud Prevention, Security, University of Canberra, and Adjunct Professor, Australian National University. Graham regularly teaches data mining courses and is author of the freely available, open source data mining system, Rattle. He has been involved in many data mining projects for clients from government and industry over his long career. His research developments included ensemble learning (1980s) and hot spots discovery (1990s). He is actively involved in the international artificial intelligence and data mining research communities, particularly as chair of the Pacific Asia Knowledge Discovery and Data Mining conference series and founder and co-chair of the Australasian Data Mining conference series. Graham has edited a number of books and authored many academic and industry papers and reports. His current focus is on making data mining technology readily accessible, ensuring research, innovation and discovery are repeatable and available, and encouraging the free and open sharing of knowledge.

Table of Contents

Contents
Preface
I Explorations
1 Introduction
1.1 Data Mining Beginnings
1.2 The Data Mining Team
1.3 Agile Data Mining
1.4 The Data Mining Process
1.5 A Typical Journey
1.6 Insights for Data Mining
1.7 Documenting Data Mining
1.8 Tools for Data Mining: R
1.9 Tools for Data Mining: Rattle
1.10 Why R and Rattle?
1.11 Privacy
1.12 Resources
2 Getting Started
2.1 Starting R
2.2 Quitting Rattle and R
2.3 First Contact
2.4 Loading a Dataset
2.5 Building a Model
2.6 Understanding Our Data
2.7 Evaluating the Model: Confusion Matrix
2.8 Interacting with Rattle
2.9 Interacting with R
2.10 Summary
2.11 Command Summary
3 Working with Data
3.1 Data Nomenclature
3.2 Sourcing Data for Mining
3.3 Data Quality
3.4 Data Matching
3.5 Data Warehousing
3.6 Interacting with Data Using R
3.7 Documenting the Data
3.8 Summary
3.9 Command Summary
4 Loading Data
4.1 CSV Data
4.2 ARFF Data
4.3 ODBC Sourced Data
4.4 R Dataset: Other Data Sources
4.5 R Data
4.6 Library
4.7 Data Options
4.8 Command Summary
5 Exploring Data
5.1 Summarising Data
5.1.1 Basic Summaries
5.1.2 Detailed Numeric Summaries
5.1.3 Distribution
5.1.4 Skewness
5.1.5 Kurtosis
5.1.6 Missing Values
5.2 Visualising Distributions
5.2.1 Box Plot
5.2.2 Histogram
5.2.3 Cumulative Distribution Plot
5.2.4 Benford's Law
5.2.5 Bar Plot
5.2.6 Dot Plot
5.2.7 Mosaic Plot
5.2.8 Pairs and Scatter Plots
5.2.9 Plots with Groups
5.3 Correlation Analysis
5.3.1 Correlation Plot
5.3.2 Missing Value Correlations
5.3.3 Hierarchical Correlation
5.4 Command Summary
6 Interactive Graphics
6.1 Latticist
6.2 GGobi
6.3 Command Summary
7 Transforming Data
7.1 Data Issues
7.2 Transforming Data
7.3 Rescaling Data
7.4 Imputation
7.5 Recoding
7.6 Cleanup
7.7 Command Summary
II Building Models
8 Descriptive and Predictive Analytics
8.1 Model Nomenclature
8.2 A Framework for Modelling
8.3 Descriptive Analytics
8.4 Predictive Analytics
8.5 Model Builders
9 Cluster Analysis
9.1 Knowledge Representation
9.2 Search Heuristic
9.3 Measures
9.4 Tutorial Example
9.5 Discussion
9.6 Command Summary
10 Association Analysis
10.1 Knowledge Representation
10.2 Search Heuristic
10.3 Measures
10.4 Tutorial Example
10.5 Command Summary
11 Decision Trees
11.1 Knowledge Representation
11.2 Algorithm
11.3 Measures
11.4 Tutorial Example
11.5 Tuning Parameters
11.6 Discussion
11.7 Summary
11.8 Command Summary
12 Random Forests
12.1 Overview
12.2 Knowledge Representation
12.3 Algorithm
12.4 Tutorial Example
12.5 Tuning Parameters
12.6 Discussion
12.7 Summary
12.8 Command Summary
13 Boosting
13.1 Knowledge Representation
13.2 Algorithm
13.3 Tutorial Example
13.4 Tuning Parameters
13.5 Discussion
13.6 Summary
13.7 Command Summary
14 Support Vector Machines
14.1 Knowledge Representation
14.2 Algorithm
14.3 Tutorial Example
14.4 Tuning Parameters
14.5 Command Summary
III Delivering Performance
15 Model Performance Evaluation
15.1 The Evaluate Tab: Evaluation Datasets
15.2 Measure of Performance
15.3 Confusion Matrix
15.4 Risk Charts
15.5 ROC Charts
15.6 Other Charts
15.7 Scoring
16 Deployment
16.1 Deploying an R Model
16.2 Converting to PMML
16.3 Command Summary
IV Appendices
A Installing Rattle
B Sample Datasets
B.1 Weather
B.1.1 Obtaining Data
B.1.2 Data Preprocessing
B.1.3 Data Cleaning
B.1.4 Missing Values
B.1.5 Data Transforms
B.1.6 Using the Data
B.2 Audit
B.2.1 The Adult Survey Dataset
B.2.2 From Survey to Audit
B.2.3 Generating Targets
B.2.4 Finalising the Data
B.2.5 Using the Data
B.3 Command Summary
References
Index

User Reviews

After finishing two UIUC courses on Coursera, I count that as having read this book. Rattle really is a gift for the lazy.

neat as a toolkit

Walks through each stage of the data mining process using the visual Rattle tool; works well as an introductory tutorial for learning R!
