Tapping into Unstructured Data


Author: Inmon, William H. / Nesavich, Anthony

Pages: 264

Publication date: 2007-11

Price: $56.49

ISBN: 9780132360296

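Book tags:

Data science
Unstructured data
Data mining
Machine learning
Natural language processing
Text analytics
Big data
Information retrieval
Artificial intelligence
Data analysis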


Description

"The authors, the best minds on the topic, are breaking new ground. They show how every organization can realize the benefits of a system that can search and present complex ideas or data from what has been a mostly untapped source of raw data." (Randy Chalfant, CTO, Sun Microsystems)

The Definitive Guide to Unstructured Data Management and Analysis, from the World's Leading Information Management Expert

A wealth of invaluable information exists in unstructured textual form, but organizations have found it difficult or impossible to access and use it. This is changing rapidly: new approaches finally make it possible to glean useful knowledge from virtually any collection of unstructured data. William H. Inmon, the father of data warehousing, and Anthony Nesavich introduce the next data revolution: unstructured data management.

Inmon and Nesavich cover all you need to know to make unstructured data work for your organization. You'll learn how to bring it into your existing structured data environment, leverage existing analytical infrastructure, and implement textual analytic processing technologies to solve new problems and uncover new opportunities.

Inmon and Nesavich introduce breakthrough techniques covered in no other book, including the powerful role of textual integration, new ways to integrate textual data into data warehouses, and new SQL techniques for reading and analyzing text. They also present five chapter-length, real-world case studies demonstrating unstructured data at work in medical research, insurance, chemical manufacturing, contracting, and beyond.

This book will be indispensable to every business and technical professional trying to make sense of a large body of unstructured text: managers, database designers, data modelers, DBAs, researchers, and end users alike.
Coverage includes:

* What unstructured data is, and how it differs from structured data
* First-generation technology for handling unstructured data, from search engines to ECM, and its limitations
* Integrating text so it can be analyzed with a common, colloquial vocabulary: integration engines, ontologies, glossaries, and taxonomies
* Processing semistructured data: uncovering patterns, words, identifiers, and conflicts
* Novel processing opportunities that arise when text is freed from context
* Architecture and unstructured data: Data Warehousing 2.0
* Building unstructured relational databases and linking them to structured data
* Visualizations and Self-Organizing Maps (SOMs), including Compudigm and Raptor solutions
* Capturing knowledge from spreadsheet data and email
* Implementing and managing metadata: data models, data quality, and more

William H. Inmon is founder, president, and CTO of Inmon Data Systems. He is the father of the data warehouse concept, the corporate information factory, and the government information factory. Inmon has written 47 books on data warehouse, database, and information technology management, as well as more than 750 articles for trade journals such as Data Management Review, Byte, Datamation, and ComputerWorld. His b-eye-network.com newsletter currently reaches 55,000 people.

Anthony Nesavich worked at Inmon Data Systems, where he developed multiple reports that successfully query unstructured data.
Contents:

Preface
1. Unstructured Textual Data in the Organization
2. The Environments of Structured Data and Unstructured Data
3. First Generation Textual Analytics
4. Integrating Unstructured Text into the Structured Environment
5. Semistructured Data
6. Architecture and Textual Analytics
7. The Unstructured Database
8. Analyzing a Combination of Unstructured Data and Structured Data
9. Analyzing Text Through Visualization
10. Spreadsheets and Email
11. Metadata in Unstructured Data
12. A Methodology for Textual Analytics
13. Merging Unstructured Databases into the Data Warehouse
14. Using SQL to Analyze Text
15. Case Study: Textual Analytics in Medical Research
16. Case Study: A Database for Harmful Chemicals
17. Case Study: Managing Contracts Through an Unstructured Database
18. Case Study: Creating a Corporate Taxonomy (Glossary)
19. Case Study: Insurance Claims
Glossary
Index

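Book introduction: Information Alchemy in the Digital Age: From Data Deluge to Business Insight

In this age of information overload, data is pouring in at unprecedented speed and scale, like a river that never stops flowing. Yet not all river water is drinkable: it carries a great deal of silt and impurities, along with valuable minerals that have yet to be extracted. We are living in a new economic paradigm driven by information, in which the key to success is no longer simply owning data but understanding and using it efficiently and precisely.

Information Alchemy in the Digital Age does not focus on any particular technical school or toolset. Instead, it explores a more fundamental capability: a systematic thinking framework for finding high-value signals in massive, heterogeneous, unstructured information and turning those signals into actionable business strategy and a driver of innovation.

The book aims to give executives, strategic planners, senior data science practitioners, and anyone eager to get ahead in data-driven decision-making a solid, actionable theoretical foundation and practical guidance. Rather than piling up technical jargon, it focuses on how insight is produced: how raw, seemingly chaotic input is turned into far-reaching business output through careful analysis and rigorous validation.

Part 1: The Nature of Information and the Traps of Cognition

The information age has brought unprecedented convenience, but it has also created new blind spots. This part begins by dissecting the structural features of the modern information environment. It examines how the traditional boundary between structured and unstructured data is blurring, and the challenge this poses to traditional data governance models.

1. A paradigm shift: from databases to knowledge graphs. The book details the limitations of traditional relational databases and how they constrain the capture of complex relationships and context. It then introduces the knowledge graph as a powerful organizing tool that models complex real-world interactions through connected nodes and edges, enabling a higher level of semantic understanding. This is not merely a change in how data is stored but an upgrade in how we understand the world.

2. Noise, bias, and the echo-chamber effect. Information quality is the lifeline that determines the value of an insight. The book analyzes the systemic bias hidden in data collection, cleaning, and labeling, and examines how algorithmic recommendation systems can inadvertently build information echo chambers that entrench or even amplify decision-makers' existing beliefs. Readers learn how to design anti-fragile information intake mechanisms that actively seek out contrary evidence to keep decisions robust.

3. The return of context. In a sea of data, an isolated fact means nothing. This part stresses the central place of context in the information value chain: a transaction record, a social media comment, or a patent document reveals its true meaning only when placed in a specific temporal, spatial, and cultural setting. The book offers a framework for assessing how complete an item's context is.

Part 2: Signal Capture and Cross-Modal Analysis

This part turns to practice, exploring how to extract key business signals efficiently from seemingly chaotic input. The emphasis is on forms of information that traditional statistical methods struggle to quantify directly.

4. Deep text mining and sentiment mapping. Going beyond basic keyword extraction, the book examines how advanced natural language processing can reveal the intent and implicit sentiment behind a text. This includes topic modeling and sentiment polarity analysis of unstructured text such as lengthy reports, customer feedback emails, and legal documents, in order to build a real-time map of market sentiment.

5. Encoding visual and audio data for insight. With the spread of the Internet of Things and video surveillance, image and video data are becoming a new gold mine. This chapter explores how to turn non-text information into computable feature vectors, for example by analyzing changes in foot-traffic density in retail stores, detecting visual anomalies in the supply chain, or converting meeting recordings into structured lists of action items, so that cross-modal information (text, images, time-series data) can be cross-validated.

6. The art of fusing heterogeneous data. Real business intelligence comes from the collision of different information sources. The book details technical approaches to multi-source data fusion, including data standardization, time-series alignment, and the use of probabilistic models to handle differences in reliability across sources. It focuses on combining traditional financial statement data with informal industry reports and patent filing trends to predict industry turning points.

Part 3: Validating Insights and Turning Them into Strategy

Collecting and analyzing information is only the first step. The ultimate measure of successful information alchemy is whether these signals can be validated and turned into business decisions that can actually be executed.

7. Hypothesis-driven signal validation. Any insight derived from data is first of all a falsifiable hypothesis. This part presents an iterative validation process based on the scientific method: building a minimum viable model (MVM), running A/B tests, and designing control groups, to test whether an observed correlation reflects a genuine causal relationship rather than chance.

8. Modeling complex systems and simulating scenarios. The business environment is a highly complex dynamic system. The book discusses how to feed the key signals extracted earlier into simulations built with system dynamics and agent-based modeling, letting decision-makers rehearse the possible long-term knock-on effects of different strategic interventions without taking on real market risk.

9. From insight to narrative: building influence. Even the deepest insight loses much of its value if it cannot be communicated effectively to the executive team and the board. The final part of the book focuses on data storytelling, offering a structured method for turning complex analytical results into clear, engaging, and persuasive business narratives that drive decisions on organizational change and resource allocation.

Conclusion: A Continuously Learning Ecosystem

Information Alchemy in the Digital Age advocates an organizational culture of continuous iteration. The data environment never stands still, so the process of refining information must also be a living, self-correcting ecosystem. The framework in this book is meant to help organizations gain command of their information flows and turn the disorderly digital flood into precise fuel for future growth. It is a guide to how to think about information, not merely a how-to manual.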