
Location and association measures for interval-valued data based on Mallows' distance

Source: arXiv
Abstract

The growing demand to analyse large and complex datasets has spurred the development of Symbolic Data Analysis as a promising approach to address contemporary data challenges. Amongst these, interval-valued data introduces new theoretical and methodological questions that remain open. In this paper, we generalise measures of location and association for interval-valued random variables using Mallows' distance. Departing from restrictive assumptions such as uniform distributions over microdata, our proposal extends the barycentre approach to any absolutely continuous distribution with finite second moment. A key contribution is the derivation of explicit formulas for Mallows' distance in p-dimensional interval spaces. These formulas decompose into components for centres, ranges, and a novel cross-term that captures their interaction. This decomposition leads to a new theoretical symbolic covariance matrix that explicitly accounts for the dependence between centres and ranges - a relation often obscured in current definitions of symbolic covariance. Theoretical developments are supported by empirical studies on diverse real-world datasets, each reflecting different degrees of information about the underlying microdata. These applications highlight both the flexibility of the proposed methodology and the interpretability of its results.
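As a point of reference for the uniform-microdata case that the abstract says the paper moves beyond, here is a minimal univariate sketch. It is not the paper's p-dimensional formula. It assumes each interval is written as a centre `c` and a half-range `r`, so that its quantile function is `c + r * (2t - 1)`, and it takes Mallows' distance to be the L2 distance between quantile functions. Under these assumptions the squared distance splits into a centre term and a range term, and the cross-term vanishes by symmetry. The function names are hypothetical and chosen only for illustration.

```python
import numpy as np


def mallows_sq_uniform(c1, r1, c2, r2):
    """Closed form under the uniform assumption (r is the half-range):
    (c1 - c2)^2 + (r1 - r2)^2 / 3."""
    return (c1 - c2) ** 2 + (r1 - r2) ** 2 / 3.0


def mallows_sq_numeric(c1, r1, c2, r2, n=200_001):
    """Numerical check: integrate the squared difference of the two
    quantile functions over (0, 1) with a midpoint rule."""
    t = (np.arange(n) + 0.5) / n          # midpoints of n equal sub-intervals
    q1 = c1 + r1 * (2.0 * t - 1.0)        # quantile function of interval 1
    q2 = c2 + r2 * (2.0 * t - 1.0)        # quantile function of interval 2
    return float(np.mean((q1 - q2) ** 2))


# Illustrative intervals [-1, 3] and [3.5, 4.5], given as (centre, half-range).
closed = mallows_sq_uniform(1.0, 2.0, 4.0, 0.5)
numeric = mallows_sq_numeric(1.0, 2.0, 4.0, 0.5)
```

With a non-symmetric distribution inside the intervals, the mean of the standardised variable on [-1, 1] is no longer zero, so a term proportional to the product of the centre difference and the range difference survives in this univariate setting. That is one way to read the cross-term the abstract describes.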

Diogo Pinheiro, M. Rosário Oliveira, Lina Oliveira

Mathematics

Diogo Pinheiro, M. Rosário Oliveira, Lina Oliveira. Location and association measures for interval-valued data based on Mallows' distance [EB/OL]. (2025-08-08) [2025-08-24]. https://arxiv.org/abs/2407.05105.
