Part 1. Getting Started
Chapter 1. Theory
Agile Big Data
Big Words Defined
Agile Big Data Teams
Recognizing the Opportunity and the Problem
The Agile Big Data Process
Code Review and Pair Programming
Agile Environments: Development Productivity
Collaboration Space
Private Space
Personal Space
Expressing Ideas Clearly with Large-Format Prints
Chapter 2. Data
Email
Working with Raw Data
Raw Email
Structured Versus Semistructured Data
SQL
NoSQL
Serialization
Extracting and Exposing Features in Evolving Schemas
Data Pipelines
Data Perspectives
Social Networks
Time Series
Natural Language
Probability
Summary
Chapter 3. Agile Development Tools
Scalability = Simplicity
Agile Big Data Processing
Setting Up a Virtual Environment for Python
Serializing Events with Avro
Using Avro in Python
Collecting Data
Processing Data with Pig
Installing Pig
Publishing Data with MongoDB
Installing MongoDB
Installing MongoDB's Java Driver
Installing mongo-hadoop
Pushing Data to MongoDB from Pig
Searching Data with ElasticSearch
Installation
Integrating ElasticSearch and Pig with Wonderdog
Reflecting on the Workflow
Lightweight Web Applications
Python and Flask
Presenting Data
Installing Bootstrap
Enabling Bootstrap
Visualizing Data with d3.js and nvd3.js
Summary
Chapter 4. In the Cloud
Introduction
GitHub
dotCloud
dotCloud Echo Service
Python Worker Service
Amazon Web Services
Simple Storage Service
Elastic MapReduce
MongoDB as a Service
Instrumentation
Google Analytics
Mortar Data
Part 2. Climbing the Pyramid
Chapter 5. Collecting and Displaying Data
Integrating the Software Stack
Collecting and Serializing the Inbox
Processing and Publishing Email Data
Displaying Emails in the Browser
Serving Email Data with Flask and pymongo
Rendering HTML5 Pages with Jinja2
Agile Checkpoint
Listing Emails
Displaying Emails with MongoDB
Anatomy of a Data Presentation
Searching Emails
Building an Index with Pig, ElasticSearch, and Wonderdog
Searching Email Data on a Web Page
Conclusion
Chapter 6. Visualizing Data with Charts
Good Charts
Extracting Entities: Email Addresses
Extracting Emails
Visualizing Time
Conclusion
Chapter 7. Exploring Data with Reports
Adding Links to Data
Extracting Keywords from Emails with TF-IDF
Summary
Chapter 8. Making Predictions
Predicting Email Reply Rates
Personalization
Summary
Chapter 9. Driving Actions
Properties of Good Emails
Better Predictions with Naive Bayes
P(Reply | From ∩ To)
P(Reply | Token)
Real-Time Predictions
Logging Events
Summary
Index