Texture: Structured Exploration of Text Datasets

Source: arXiv
Abstract

Exploratory analysis of a text corpus is essential for assessing data quality and developing meaningful hypotheses. Text analysis relies on understanding documents through structured attributes that span multiple granularities, such as words, phrases, sentences, topics, or clusters. However, current text visualization tools typically adopt a fixed representation tailored to specific tasks or domains, requiring users to switch tools as their analytical goals change. To address this limitation, we present Texture, a general-purpose interactive text exploration tool. Texture introduces a configurable data schema for representing text documents enriched with descriptive attributes. These attributes can occur at any level of granularity in the text and may take multiple values; they include document-level attributes, multi-valued attributes (e.g., topics), fine-grained span-level attributes (e.g., words), and vector embeddings. The system then combines existing interactive methods for text exploration into a single interface that provides attribute overview visualizations, supports cross-filtering attribute charts to explore subsets, uses embeddings for a dataset overview and similar-instance search, and contextualizes filters in the actual documents. We evaluated Texture through a two-part user study with 10 participants from varied domains, each of whom analyzed their own dataset first in a baseline session and then with Texture. Texture was able to represent all of the previously derived dataset attributes, enabled participants to iterate more quickly during their exploratory analysis and to discover new insights about their data. Our findings contribute to the design of scalable, interactive, and flexible exploration systems that improve users' ability to make sense of text data.
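The abstract describes documents that carry document-level, multi-valued, span-level, and embedding attributes, explored through cross-filtered charts and similar-instance search. The following is a minimal Python sketch of those ideas under our own assumptions: the `Document` and `Span` classes and the `matches`, `cross_filter_counts`, and `most_similar` functions are hypothetical names invented for illustration, not Texture's actual schema or API. Cross-filtering here follows the common convention that a chart's counts apply every active filter except its own, and brute-force cosine similarity stands in for whatever search method the system really uses.

```python
from collections import Counter
from dataclasses import dataclass, field
from math import sqrt
from typing import Dict, List, Optional

# Hypothetical record layout for illustration only; not Texture's actual schema.


@dataclass
class Span:
    start: int  # character offset where the span begins (inclusive)
    end: int    # character offset where the span ends (exclusive)
    attr: str   # attribute name, e.g. "word"
    value: str  # attribute value, e.g. the normalized token


@dataclass
class Document:
    doc_id: str
    text: str
    doc_attrs: Dict[str, str] = field(default_factory=dict)          # one value per document
    multi_attrs: Dict[str, List[str]] = field(default_factory=dict)  # several values, e.g. topics
    spans: List[Span] = field(default_factory=list)                  # span-level attributes
    embedding: Optional[List[float]] = None                          # dense vector for the document


def matches(doc: Document, filters: Dict[str, str]) -> bool:
    """Return True if the document satisfies every (attribute, value) filter."""
    for attr, value in filters.items():
        if attr in doc.doc_attrs:
            ok = doc.doc_attrs[attr] == value
        elif attr in doc.multi_attrs:
            ok = value in doc.multi_attrs[attr]
        else:
            # Otherwise treat the attribute as span-level: any matching span counts.
            ok = any(s.attr == attr and s.value == value for s in doc.spans)
        if not ok:
            return False
    return True


def cross_filter_counts(docs: List[Document], attr: str, filters: Dict[str, str]) -> Counter:
    """Per-value document counts for `attr`, applying every filter except the one on `attr`."""
    others = {k: v for k, v in filters.items() if k != attr}
    counts: Counter = Counter()
    for doc in docs:
        if not matches(doc, others):
            continue
        if attr in doc.doc_attrs:
            counts[doc.doc_attrs[attr]] += 1
        elif attr in doc.multi_attrs:
            counts.update(set(doc.multi_attrs[attr]))  # count each document once per value
        else:
            counts.update({s.value for s in doc.spans if s.attr == attr})
    return counts


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar(docs: List[Document], query: Document, k: int = 3) -> List[str]:
    """Brute-force nearest neighbours of `query` by cosine similarity of embeddings."""
    if query.embedding is None:
        return []
    scored = [
        (cosine(query.embedding, d.embedding), d.doc_id)
        for d in docs
        if d is not query and d.embedding is not None
    ]
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:k]]


# Toy corpus (made-up data for illustration).
docs = [
    Document("a", "The battery died fast.", {"source": "review"},
             {"topic": ["battery"]}, [Span(4, 11, "word", "battery")], [1.0, 0.0]),
    Document("b", "Great screen, weak battery.", {"source": "review"},
             {"topic": ["screen", "battery"]}, [Span(6, 12, "word", "screen")], [0.8, 0.6]),
    Document("c", "Shipping was slow.", {"source": "support"},
             {"topic": ["shipping"]}, [], [0.0, 1.0]),
]

print(cross_filter_counts(docs, "topic", {"source": "review"}))
print(most_similar(docs, docs[0], k=2))
```

Excluding a chart's own selection from its counts lets a user still see the alternative values of that attribute while the other filters narrow the subset; a system at scale would likely precompute indexes instead of rescanning every document for each chart.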

Will Epperson, Arpit Mathur, Adam Perer, Dominik Moritz

Subjects: Computing technology, computer technology

Will Epperson,Arpit Mathur,Adam Perer,Dominik Moritz.Texture: Structured Exploration of Text Datasets[EB/OL].(2025-04-23)[2025-05-12].https://arxiv.org/abs/2504.16898.

评论