Translator's Preface
Foreword
Preface
Chapter 1 Introduction to Apache Hadoop and Apache HBase
The Distributed File System HDFS
HDFS Data Formats
Processing Data in HDFS
Apache HBase
Summary
References
Chapter 2 Processing Streaming Data with Apache Flume
Why We Need Flume
Is Flume a Good Fit?
Inside a Flume Agent
Configuring Flume Agents
Communication Between Flume Agents
Complex Flows
Replicating Data to Different Destinations
Dynamic Routing
Flume's No-Data-Loss Guarantee, Channels, and Transactions
Transactions in Flume Channels
Agent Failure and Data Loss
The Importance of Batching
What About Duplicates?
Running a Flume Agent
Summary
References
Chapter 3 Sources
Lifecycle of a Source
Sink-to-Source Communication
Avro Source
Thrift Source
Failure Handling in RPC Sources
HTTP Source
Writing Handlers for the HTTP Source*
Spooling Directory Source
Reading Custom Formats Using Deserializers*
Spooling Directory Source Performance
Syslog Source
Exec Source
JMS Source
Converting JMS Messages to Flume Events*
Writing Custom Sources*
Event-Driven Sources and Pollable Sources
Summary
References
Chapter 4 Channels
Transaction Workflow
Channels Bundled with Flume
Memory Channel
File Channel
Summary
References
Chapter 5 Sinks
Lifecycle of a Sink
Optimizing Sink Performance
Writing to HDFS: HDFS Sink
Understanding Buckets
Configuring the HDFS Sink
Controlling the Data Format Using Serializers*
HBase Sink
Converting Flume Events to HBase Puts and Increments Using Serializers*
RPC Sink
Avro Sink
Thrift Sink
Morphline Solr Sink
Elastic Search Sink
Customizing the Data Format*
Other Sinks: Null Sink, Rolling File Sink, and Logger Sink
Writing Custom Sinks*
Summary
References
Chapter 6 Interceptors, Channel Selectors, Sink Groups, and Sink Processors
Interceptors
Timestamp Interceptor
Host Interceptor
Static Interceptor
Regex Filtering Interceptor
Morphline Interceptor
UUID Interceptor
Writing Interceptors*
Channel Selectors
Replicating Channel Selector
Multiplexing Channel Selector
Custom Channel Selectors*
Sink Groups and Sink Processors
Load-Balancing Sink Processor
Failover Sink Processor
Summary
References
Chapter 7 Sending Data to Flume*
Building Flume Events
Flume Client SDK
Creating Flume RPC Clients
RPC Client Interface
Configuration Parameters Common to All RPC Clients
Default RPC Client
Load-Balancing RPC Client
Failover RPC Client
Thrift RPC Client
Embedded Agent
Configuring an Embedded Agent
log4j Appender
Load-Balancing log4j Appender
Summary
References
Chapter 8 Planning, Deploying, and Monitoring Flume
Planning a Flume Deployment
Repair Time
How Much Capacity Do My Flume Channels Need?
How Many Tiers?
Sending Data over Cross-Data-Center Links
Tier Sharding
Deploying Flume
Deploying Custom Code
Monitoring Flume
Reporting Metrics from Custom Components
Summary
References
Index