Translator's Preface
Foreword
Preface
Chapter 1 Introducing Apache Hadoop and Apache HBase
HDFS, the Distributed File System
HDFS Data Formats
Processing Data on HDFS
Apache HBase
Summary
References
Chapter 2 Processing Streaming Data with Apache Flume
Why We Need Flume
Is Flume a Good Fit?
Inside a Flume Agent
Configuring Flume Agents
Communication Between Flume Agents
Complex Flows
Replicating Data to Different Destinations
Dynamic Routing
Flume's No-Data-Loss Guarantee, Channels, and Transactions
Transactions in Flume Channels
Agent Failure and Data Loss
The Importance of Batching
What About Duplicates?
Running a Flume Agent
Summary
References
Chapter 3 Sources
Lifecycle of a Source
Sink-to-Source Communication
Avro Source
Thrift Source
Failure Handling in RPC Sources
HTTP Source
Writing Handlers for the HTTP Source*
Spooling Directory Source
Reading Custom Formats Using Deserializers*
Spooling Directory Source Performance
Syslog Source
Exec Source
JMS Source
Converting JMS Messages to Flume Events*
Writing Custom Sources*
Event-Driven Sources and Pollable Sources
Summary
References
Chapter 4 Channels
Transaction Workflow
Channels Bundled with Flume
Memory Channel
File Channel
Summary
References
Chapter 5 Sinks
Lifecycle of a Sink
Optimizing Sink Performance
Writing to HDFS: The HDFS Sink
Understanding Buckets
Configuring the HDFS Sink
Controlling the Data Format Using Serializers*
HBase Sink
Converting Flume Events to HBase Puts and Increments Using Serializers*
RPC Sinks
Avro Sink
Thrift Sink
Morphline Solr Sink
Elastic Search Sink
Customizing the Data Format*
Other Sinks: Null Sink, Rolling File Sink, and Logger Sink
Writing Custom Sinks*
Summary
References
Chapter 6 Interceptors, Channel Selectors, Sink Groups, and Sink Processors
Interceptors
Timestamp Interceptor
Host Interceptor
Static Interceptor
Regex Filtering Interceptor
Morphline Interceptor
UUID Interceptor
Writing Interceptors*
Channel Selectors
Replicating Channel Selector
Multiplexing Channel Selector
Custom Channel Selectors*
Sink Groups and Sink Processors
Load-Balancing Sink Processor
Failover Sink Processor
Summary
References
Chapter 7 Sending Data to Flume*
Building Flume Events
Flume Client SDK
Creating Flume RPC Clients
RPC Client Interface
Configuration Parameters Common to All RPC Clients
Default RPC Client
Load-Balancing RPC Client
Failover RPC Client
Thrift RPC Client
Embedded Agent
Configuring an Embedded Agent
log4j Appender
Load-Balancing log4j Appender
Summary
References
Chapter 8 Planning, Deploying, and Monitoring Flume
Planning a Flume Deployment
Repair Time
How Much Capacity Do My Flume Channels Need?
How Many Tiers?
Sending Data over Cross-Data-Center Links
Sharding Tiers
Deploying Flume
Deploying Custom Code
Monitoring Flume
Reporting Metrics from Custom Components
Summary
References
Index