Symbolic Data Analysis: Conceptual Statistics and Data Mining

Lynne Billard, Edwin Diday

Description

With the advent of computers, very large datasets have become routine. Standard statistical methods lack the power and flexibility to analyse these efficiently and extract the required knowledge. An alternative approach is to summarize a large dataset so that the resulting summary dataset is of a manageable size yet retains as much of the knowledge in the original dataset as possible. One consequence is that the data may no longer take the form of single values, but may instead be lists, intervals, distributions, and so on. The summarized data have their own internal structure, which must be taken into account in any analysis.
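As a rough illustration of the kind of summary described above, the following Python sketch aggregates hypothetical individual-level records by group, so that each group is described by an interval (range of ages), a list (distinct occupations), and a relative-frequency distribution (occupation proportions). It is not taken from the book or from the SODAS software; the records, field names, and the `summarize` function are hypothetical and chosen only to show how single values become symbolic values.

```python
from collections import Counter, defaultdict

# Hypothetical individual-level records: (group, age, occupation).
records = [
    ("north", 34, "teacher"),
    ("north", 51, "farmer"),
    ("north", 29, "teacher"),
    ("south", 45, "nurse"),
    ("south", 38, "farmer"),
]


def summarize(rows):
    """Aggregate rows by group into simple symbolic descriptions (illustrative only)."""
    groups = defaultdict(list)
    for group, age, occupation in rows:
        groups[group].append((age, occupation))

    summary = {}
    for group, members in groups.items():
        ages = [age for age, _ in members]
        counts = Counter(occ for _, occ in members)
        n = len(members)
        summary[group] = {
            # Interval-valued: the observed range of ages in the group.
            "age": (min(ages), max(ages)),
            # Multi-valued: the distinct occupations, sorted for stable output.
            "occupation": sorted(counts),
            # Modal multi-valued: relative frequency of each occupation.
            "occupation_dist": {occ: c / n for occ, c in sorted(counts.items())},
        }
    return summary


for group, desc in summarize(records).items():
    print(group, desc)
```

Five individual records collapse into two group-level descriptions, and each description is a structured value (interval, list, distribution) rather than a single number, which is why the summarized data need methods that respect that internal structure.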

This text presents a unified account of symbolic data: how they arise and how they are structured. The reader is introduced to symbolic analytic methods, described within the consistent statistical framework needed to carry out such a summary and the subsequent analysis.

  • Presents a detailed overview of the methods and applications of symbolic data analysis.
  • Includes numerous real examples taken from a variety of application areas, ranging from health and the social sciences to economics and computing.
  • Features exercises at the end of each chapter, enabling the reader to develop their understanding of the theory.
  • Provides a supplementary website featuring datasets, further material, and links to download the SODAS software, developed exclusively for symbolic data analysis.

Primarily aimed at statisticians and data analysts, Symbolic Data Analysis is also ideal for scientists from a range of disciplines, including computer science, health and the social sciences, who work on problems involving large volumes of data. It also contains much that will be of use to graduate students taking courses in statistical data analysis.

Table of Contents

1. Introduction.

References.

2. Symbolic Data.

2.1 Symbolic and Classical Data.

2.1.1 Types of Data.

2.1.2 Dependencies in the Data.

2.2 Categories, Concepts and Symbolic Objects.

2.2.1 Preliminaries.

2.2.2 Descriptions, Assertions, Extents.

2.2.3 Concepts of Concepts.

2.2.4 Some Philosophical Aspects.

2.2.5 Fuzzy, Imprecise, and Conjunctive Data.

2.3 Comparison of Symbolic and Classical Analysis.

Exercises.

References.

Tables.

Figures.

3. Basic Descriptive Statistics: One Variate.

3.1 Some Preliminaries.

3.2 Multi-valued Variables.

3.3 Interval-valued Variables.

3.4 Multi-valued Modal Variables.

3.5 Interval-valued Modal Variables.

Exercises.

References.

Tables.

Figures.

4. Descriptive Statistics: Two or More Variates.

4.1 Multi-valued Variables.

4.2 Interval-valued Variables.

4.3 Modal Multi-valued Variables.

4.4 Modal Interval-valued Variables.

4.5 Baseball Interval-valued Dataset.

4.5.1 The Data: Actual and Virtual Datasets.

4.5.2 Joint Histograms.

4.5.3 Guiding Principles.

4.6 Measures of Dependence.

4.6.1 Moment Dependence.

4.6.2 Spearman’s rho and copulas.

Exercises.

References.

Tables.

Figures.

5. Principal Component Analysis.

5.1 Vertices Method.

5.2 Centers Method.

5.3 Comparison of the Methods.

Exercises.

References.

Tables.

Figures.

6. Regression Analysis.

6.1 Classical Multiple Regression Model.

6.2 Multi-valued Variables.

6.2.1 Single Dependent Variable.

6.2.2 Multi-valued Dependent Variable.

6.3 Interval-valued Variables.

6.4 Histogram-valued Variables.

6.5 Taxonomy Variables.

6.6 Hierarchical Variables.

Exercises.

References.

Tables.

Figures.

7. Cluster Analysis.

7.1 Dissimilarity and Distance Measures.

7.1.1 Basic Definitions.

7.1.2 Multi-valued Variables.

7.1.3 Interval-valued Variables.

7.1.4 Mixed-valued Variables.

7.2 Clustering Structures.

7.2.1 Types of Clusters: Definitions.

7.2.2 Construction of Clusters: Building Algorithms.

7.3 Partitions.

7.4 Hierarchy-Divisive Clustering.

7.4.1 Some Basics.

7.4.2 Multi-valued Variables.

7.4.3 Interval-valued Variables.

7.5 Hierarchy-Pyramid Clusters.

7.5.1 Some Basics.

7.5.2 Comparison of Hierarchy and Pyramid Structures.

7.5.3 Construction of Pyramids.

Exercises.

References.

Tables.

Figures.

Data Index.

Author Index.

Subject Index.
