Hadoop: The Definitive Guide · Book tags: Hadoop, big data, BigData, computers, distributed systems, hadoop, machine learning, O'Reilly
Published 2024-11-22
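Very comprehensive. The real value is in the first two parts, especially the MapReduce part. The later chapters on clusters and the various related projects can just be skimmed, since they don't go into much detail; when you actually need those tools, the Apache documentation is enough.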
Great.
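I read Parts 1 and 2, which gave me a basic understanding of Hadoop. Next I need to consolidate it with real projects. Related technologies such as Hive, HBase, and Spark are also on my list to learn.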
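My No. 4 book of 2016. It makes difficult material accessible, and the underlying principles are explained very thoroughly. The core is the two parts on Hadoop Fundamentals and MapReduce, but the later Related Projects chapters are also concise and hit the key points. For example, the Flume chapter covers some important details that even the official Flume tutorial doesn't mention.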
Tom White has been an Apache Hadoop committer since February 2007, and is a member of the Apache Software Foundation. He works for Cloudera, a company set up to offer Hadoop support and training. Previously he was an independent Hadoop consultant, working with companies to set up, use, and extend Hadoop. He has written numerous articles for O'Reilly, java.net and IBM's developerWorks, and has spoken at several conferences, including at ApacheCon 2008 on Hadoop. Tom has a Bachelor's degree in Mathematics from the University of Cambridge and a Master's in Philosophy of Science from the University of Leeds, UK.
Get ready to unlock the power of your data. With the fourth edition of this comprehensive guide, you’ll learn how to build and maintain reliable, scalable, distributed systems with Apache Hadoop. This book is ideal for programmers looking to analyze datasets of any size, and for administrators who want to set up and run Hadoop clusters.
Using Hadoop 2 exclusively, author Tom White presents new chapters on YARN and several Hadoop-related projects such as Parquet, Flume, Crunch, and Spark. You’ll learn about recent changes to Hadoop, and explore new case studies on Hadoop’s role in healthcare systems and genomics data processing.
Learn fundamental components such as MapReduce, HDFS, and YARN
Explore MapReduce in depth, including steps for developing applications with it
Set up and maintain a Hadoop cluster running HDFS and MapReduce on YARN
Learn two data formats: Avro for data serialization and Parquet for nested data
Use data ingestion tools such as Flume (for streaming data) and Sqoop (for bulk data transfer)
Understand how high-level data processing tools like Pig, Hive, Crunch, and Spark work with Hadoop
Learn the HBase distributed database and the ZooKeeper distributed configuration service
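A very good Hadoop tutorial, far more detailed than the Apache and Yahoo! web guides. Many Hadoop implementation details I couldn't figure out on my own can be found in this book.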
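Chinese edition, p. 412: "So in theory, anything can be represented in binary form and then converted into a long-integer string, or a data structure can be serialized directly, to serve as the key." Original, p. 460: "..., so theoretically anything can serve as row key, from strings to binary representations of long or even serialized ..."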
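The original sentence's point is that an HBase row key is just a byte array, so a string, the binary form of a long, or a serialized structure can all serve as one. The following is a minimal sketch using only the JDK (no HBase client). The class and method names are hypothetical, and unsigned byte-by-byte comparison (`Arrays.compareUnsigned`, Java 9+) is assumed here as a stand-in for how HBase orders row keys. It contrasts encoding a number as 8 big-endian bytes with encoding it as a decimal string.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper class, for illustration only.
class RowKeyDemo {
    // Binary representation of a long: 8 bytes, big-endian (ByteBuffer's default byte order).
    static byte[] toRowKey(long value) {
        return ByteBuffer.allocate(Long.BYTES).putLong(value).array();
    }

    // Decode an 8-byte big-endian key back into a long.
    static long fromRowKey(byte[] key) {
        return ByteBuffer.wrap(key).getLong();
    }

    // String representation: the UTF-8 bytes of the text.
    static byte[] toRowKey(String value) {
        return value.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Binary longs keep numeric order under unsigned byte comparison: 9 sorts before 10.
        System.out.println(Arrays.compareUnsigned(toRowKey(9L), toRowKey(10L)) < 0);   // true
        // Decimal strings compare character by character: "10" sorts before "9".
        System.out.println(Arrays.compareUnsigned(toRowKey("10"), toRowKey("9")) < 0); // true
        // The binary encoding round-trips.
        System.out.println(fromRowKey(toRowKey(42L)));                                 // 42
    }
}
```

This big-endian encoding preserves numeric order only for non-negative values. In two's complement, negative longs have their top bit set, so under unsigned comparison they sort after all the positive ones.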
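Strictly speaking, I didn't read all of it. I read it mainly for technology selection, since we were considering upgrading our persistence-layer architecture and improving system scalability. I studied the first few chapters carefully and came away with a reasonable understanding of the models, mechanisms, and use cases of Hadoop, MapReduce, and HDFS. I only skimmed the later chapters and the other projects in the ecosystem to get an overview. Overall it's fairly good, at least judging from the chapters I read...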