InterAxis: Steering Scatterplot Axes via Observation-Level Interaction

Abstract—Scatterplots are effective visualization techniques for multidimensional data that use two (or three) axes to visualize each data
item as a point at its corresponding x and y Cartesian coordinates. Typically, each axis is bound to a single data attribute. Interactive exploration occurs by changing the data attributes bound to each of these axes. When scatterplots are used to visualize
the outputs of dimension reduction techniques, the x and y axes are combinations of the true, high-dimensional data. For these
spatializations, the axes present usability challenges in terms of interpretability and interactivity. That is, understanding the axes
and interacting with them to make adjustments can be challenging. In this paper, we present InterAxis, a visual analytics technique
to properly interpret, define, and change an axis in a user-driven manner. Users are given the ability to define and modify axes by
dragging data items to either side of the x or y axes, from which the system computes a linear combination of data attributes and binds
it to the axis. Further, users can directly tune the positive and negative contribution to these complex axes by using the visualization
of data attributes that correspond to each axis. We describe the details of our technique and demonstrate the intended usage through
two scenarios.

Index Terms—Scatterplots, user interaction, model steering

Scatterplots are commonly utilized in visualizing relationships between two individual data attributes. The use of two orthogonal
axes mapped to data attributes produces a Cartesian space where data
objects can be charted. A basic strategy to form these axes in multidimensional data visualization is to assign each axis an individual
feature or dimension originally given in a dataset. For example, plotting temperature over time on the y and x axes, respectively, generates a chart that can be used for understanding the relationship between
these two data attributes. However, this has a severe scalability issue because two-dimensional (2D) scatterplots can represent only two
features out of many at any given point of time.

Instead, an alternative strategy that better handles this scalability issue is dimension reduction, which uses multiple original features
to represent each axis. Dimension reduction [21] is a popular technique used to transform high-dimensional data into lower-dimensional
views (typically, 2D scatterplots). While a variety of approaches exist,
their fundamental functionality is similar: to solve for distances between data points in a lower-dimensional space that closely represent
the true distances between the points in the high-dimensional space. The
approaches differ in how they derive distance metrics from the
data.

In the visual and perceptual understanding of a scatterplot, the interpretation of its axes plays a crucial role. That is, understanding what
it means to have large/small values along the x or y axis significantly
helps the users’ reasoning process about why the relationships among
data items are close/remote in a scatterplot. In the case of traditional
scatterplots where each axis is directly mapped to a particular data
attribute (without any dimension reduction), this process is straightforward. However, this is not often the case when it comes to the axis
of a 2D scatterplot generated by dimension reduction. One of the primary reasons is that only a limited set of dimension reduction methods
provide the interpretability of the axes of a scatterplot. Such methods include traditional methods such as principal component analysis
(PCA) [27] and linear discriminant analysis [23], which form an axis
(or a reduced dimension) explicitly as a linear combination of the original data attributes. Through this linear combination representation of
the original attributes, one can interpret the contribution of each original attribute to the axis. On the other hand, many other dimension
reduction methods form each axis implicitly in terms of the original
attributes, and thus they do not provide users with its clear meaning.
Most advanced non-linear dimension reduction methods such as manifold learning [33] correspond to this case. Even worse, some other
popular methods such as multidimensional scaling (MDS) [31] and
force-directed graph layout [22] are rotation invariant, which
means that the axes are not defined at all. Thus, communicating with
users about the meaning of the axes resulting from dimension reduction techniques is an open challenge.
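
To make the idea of an axis as an explicit linear combination concrete, the following minimal sketch approximates the first principal component of a tiny made-up dataset with power iteration and prints the weight that each original attribute receives. The data values, the attribute names, and the helper `first_principal_axis` are illustrative assumptions and do not come from the paper.

```python
import math

def first_principal_axis(rows, iterations=200):
    """Approximate the first principal direction with power iteration."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix of the attributes.
    cov = [[sum(c[j] * c[k] for c in centered) / (n - 1) for k in range(d)]
           for j in range(d)]
    # Start from the first coordinate axis; repeated multiplication by the
    # covariance matrix converges to the direction of largest variance.
    v = [1.0] + [0.0] * (d - 1)
    for _ in range(iterations):
        w = [sum(cov[j][k] * v[k] for k in range(d)) for j in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

# Hypothetical, already-scaled data: (horsepower, weight, mpg) per car.
attributes = ["horsepower", "weight", "mpg"]
rows = [
    (0.1, 0.2, 0.9),
    (0.3, 0.3, 0.7),
    (0.5, 0.6, 0.5),
    (0.8, 0.7, 0.2),
    (0.9, 0.9, 0.1),
]

axis = first_principal_axis(rows)
for name, loading in zip(attributes, axis):
    # The sign and size of each loading show how the attribute shapes the axis.
    print(f"{name}: {loading:+.3f}")
```

In this toy data, horsepower and weight rise together while mpg falls, so the first two loadings share one sign and the third has the opposite sign. That is the kind of reading a linear axis allows and an implicit non-linear axis does not.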

Another issue with the scatterplot generated by dimension reduction lies in the lack of interactivity. Forming the axes via dimension
reduction does not typically allow human intervention. In other words,
most of the dimension reduction methods are performed in a fully automated manner on the basis of their own pre-defined mathematical
criteria, and thus, diverse user needs and task goals are not considered
in this process. For instance, the PCA criterion, which maximally preserves the total variance of data, may not align well with the goal of
a user’s task. While MDS attempts to preserve all pairwise distances
with equal weights, one may want to focus on a subset of data points,
e.g., a local region in a scatterplot, at a time.

Motivated by these challenges, we propose a novel interactive
knowledge specification method for multidimensional data visualization, which is an alternative to the purely automatic process of generating a scatterplot via dimension reduction. The proposed method interactively forms an axis, thereby generating a corresponding scatterplot
in a user-driven manner. The key novelty of the proposed method lies
in the direct and seamless incorporation of user-selected data items for
characterizing the axis during the data exploration process. Our technique enables users to create and modify the axes by dragging data
objects to the high and low locations on both the x and y axes. The
proposed method defines the meaning of an axis accordingly in the
form of a linear combination of original data features, similar to the
output of linear dimension reduction methods. Such a user-driven linear combination of data attributes is visualized on each axis, showing
the positive or negative contribution of each attribute to the axis. Finally, users can continually refine the axes by dragging additional data
points to the axes, or by directly adjusting the contribution of the data
attributes as part of the linear combination.

The primary contributions of this work include the following:
• a visual analytics technique for directly creating, modifying, and
visualizing complicated axes formed by a linear combination of
data attributes
• a user interaction technique enabling seamless interactivity via
both data objects and data attributes to steer the meaning of the
axes
• a visual analytics technique to help users discover and weigh data
attributes

The rest of this paper is organized as follows: Section 2 discusses related work. Section 3 describes our proof-of-concept visual analytics
system along with how the proposed interaction techniques are performed from the perspectives of both the front end and the back end,
followed by a discussion about our design rationale. Section 4 presents
several usage scenarios showcasing the advantages of the proposed interaction techniques. Section 5 presents in-depth discussions about the
limitations of our interaction techniques as well as potential directions
for improving them. Finally, Section 6 concludes the paper and outlines
future work.

2.1 Multiattribute Data Visualization

Fig. 2. A scatterplot generated by Tableau [41]. Users can interactively explore data by selecting and changing the bindings between
data attributes and axes.

The design space for visualization techniques for representing multiattribute data is large [28]. For example, the existing techniques include iconic displays [6], transforming displays based on geometric
characteristics [13], and stacked visual representations [32]. Among
these many techniques, one commonly used technique is the scatterplot [12, 20, 45], owing to the visual simplicity and cultural familiarity
of such charts [43]. Scatterplots (such as the one shown in Fig. 2) represent data on a Cartesian plane defined by the two graphical axes (the
x and the y axes). Three-dimensional scatterplots are also an available
option, but their use in information visualization is limited given the
perceptual and visual challenges [38, 47]. Systems that enable users to
generate scatterplots include Tableau [41], GGobi [40], Matlab [34],
Spotfire [1], and Microsoft Excel [19]. One basic user interaction supported by scatterplots is to select and change the mapping of the axes
to data attributes (Fig. 2).

Fig. 3. A scatterplot matrix (adapted from [15]) showing all individual
pairwise feature scatterplots of an 8-dimensional dataset

Fig. 4. A Galaxy View generated by IN-SPIRE [48] showing a scatterplot of documents (dots)

As datasets grow more complex, the number of data attributes to select from often increases as well. This leads to situations where
directly selecting one out of hundreds or thousands of data attributes
is less than optimal. As such, different types of techniques exist
to show more combinations of data attributes simultaneously. For example, multiple scatterplots can be arranged into a single view called a scatterplot matrix [12]. A scatterplot matrix (such as the example
shown in Fig. 3, adapted from [15]) binds data attributes to rows and
columns so that each cell in the matrix represents a single scatterplot. As such, users do not have to individually bind data attributes to
the axes and interactively choose among the potentially large number
of choices.
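
A quick count shows how many panels such a matrix holds for an 8-dimensional dataset like the one in Fig. 3. The attribute names in this sketch are hypothetical placeholders; only the count matters.

```python
from itertools import combinations

# Hypothetical attribute names for an 8-dimensional dataset.
attributes = ["mpg", "cylinders", "displacement", "horsepower",
              "weight", "acceleration", "year", "price"]

# Each unordered pair of distinct attributes yields one distinct scatterplot.
pairs = list(combinations(attributes, 2))
d = len(attributes)

print(f"{d} attributes -> {len(pairs)} distinct pairwise scatterplots")
print(f"a full {d} x {d} matrix has {d * d} cells, {d * (d - 1)} off the diagonal")
```

The number of distinct pairs is d(d-1)/2, so the panel count grows quadratically with the number of attributes.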

2.2 Applications of Dimension Reduction in Information Visualization

When using dimension reduction for visualization purposes, the goal
is to provide a low-dimensional view, typically a 2D scatterplot, in
which the original high-dimensional distances between data
points are maximally preserved. These
views often show spatial clusters or groups of data representing coherent contents. Widely used dimension reduction methods
for visualization include PCA [27], MDS [31], self-organizing map
(SOM) [29], and generative topographic mapping (GTM) [3]. Recently, t-distributed stochastic neighbor embedding [46] has been proposed as a dimension reduction method that is particularly suitable for generating 2D scatterplots that can reveal meaningful insights
about data such as clusters and outliers.

To date, these methods have been actively adopted in visual analytics systems. For example, IN-SPIRE [48], a well-known visual analytics system for document analysis, provides a Galaxy View (as shown
in Fig. 4) that visualizes text corpora spatially by showing the pairwise similarity between documents as their distance in a 2D space.
As a result, groups and clusters emerge, which can be perceived as
the sets of similar documents, based on the geographic "near=similar"
metaphor [39]. More recently, a visual analytics system applicable to
more general high-dimensional data types including documents and
images has been proposed, allowing a user to explore the diverse aspects of data by applying various dimension reduction methods to generate different scatterplot visualizations [9].

Other kinds of high-dimensional data have also been visualized in
the form of a scatterplot based on dimension reduction, including education performance data, census data [18], wine characteristics [5],
facial images [8], and text documents [7].

2.3 Interactivity for Dimension Reduction in Information Visualization

In general, the axes created via dimension reduction techniques are defined by linear or non-linear combinations of original data dimensions.
This complexity can lead to trust and interpretation challenges for domain experts exploring their data visually [10]. For example, users
may question whether their interpretation of a pattern is trustworthy or
if it is just an artifact of a dimension reduction technique. More fundamentally, using only two dimensions to represent considerably higher-dimensional data inevitably involves significant information loss and
distortion. To overcome these issues, various user interactions have
been employed in numerous visual analytics systems.

One approach to user interaction is via direct manipulation of dimension reduction model parameters. For example, Jeong et al.
presented iPCA, a visual analytics application that visualizes high-dimensional data in a 2D scatterplot using PCA [26]. They utilize
graphical controls (e.g., sliders) to enable users to directly manipulate
the weight on the principal components used in PCA. As a result, the
adjustments by the user generate a new projection (i.e., a new scatterplot). Similar interaction guidelines have been used by other applications, such as a text visualization system called STREAMIT [2].

A different set of techniques for incorporating user interactions into
such visual analytics systems also exists. Semantic interaction techniques function by inferring model updates based on direct interactions performed in the visualization [16, 17]. For example, Endert et
al. have shown how directly manipulating the position of points in a
2D scatterplot can be used for inferring the parameters of PCA, MDS,
and GTM [18]. These inferences can also be used for exporting the
specification of distance functions computed in the dimension reduction step so that they can be reused, shared, or simply saved [5].

Beyond manipulating data items to interact with scatterplots, researchers have studied interaction techniques that manipulate features or dimensions. Yi et al. have presented a technique called Dust
& Magnet that allows users to additionally place features or dimensions on top of a scatterplot themselves to see which data items have
large values of these features or dimensions [49]. For text analysis, the
VIBE system allows users to perform similar interactions with keywords [35]. In addition, Turkay et al. proposed a technique using
dual scatterplots one of which shows data items while the other shows
features [44]. By providing brushing and linking as well as filtering
operations on both data items and features in these dual scatterplots,
users can check major patterns as well as outliers among data items
and among features.

The technique proposed in this paper follows a similar idea of interacting with both data items and features, but its main novelty
compared with the existing work lies in the capability
of directly defining and interpreting the axes of the 2D scatterplot by
assigning data items of interest to the axes. In this respect, our
work is related to PivotSlice, a technique recently proposed by Zhao
et al. that allows faceted browsing of high-dimensional data [50], as
it allows users to specify data attributes on axes of the scatterplot by
directly dragging the attribute to the axis. However, our technique enables users to drag data objects (instead of data attributes) to the axis.
Further, the proposed technique does not divide the scatterplot into a
multifaceted view.

Furthermore, a technique called flexible linked axes [11] relates to our work in a different way. That is, this technique
is a different type of interaction that allows users to draw axes on a canvas, where scatterplots can be generated between any two neighboring
axes. However, the main goal of this technique is fundamentally different from ours in that it attempts to flexibly coordinate and place
multiple scatterplots on a large canvas, while our focus is on improving a single scatterplot for better supporting the interactive exploration
of data based on a more sophisticated, user-driven axis specification.
Further, Kondo and Collins have shown how directly interacting with
visualizations can be used for revealing temporal trends and relationships between data items [30]. Their work allowed users to manipulate
the position of data points in a scatterplot to reveal the temporal trends
in data, again enabling interactions directly on the data items in a scatterplot to parameterize a data model.

3 PROPOSED TECHNIQUE
To realize the proposed interaction technique, we built a proof-of-concept visual analytics system. In this section, we describe (1) the
overall design of the proposed visual analytics system, (2) the proposed interaction to steer the axis in a user-driven manner, (3) the underlying mathematical details to support the proposed user interaction,
(4) the design rationale, and (5) the implementation details of the proposed system.

3.1 System Design
As shown in Fig. 1, which uses the well-known Car dataset of 387 data items with 18 attributes, the proposed system mainly
contains three panels: (1) the scatterplot view (Fig. 1(A)), (2) the
axis interaction panel to support the proposed interaction capabilities
(Fig. 1(B-D)), and (3) the data detail view (Fig. 1(E)).

The user interaction technique presented in this paper fosters a visual data exploration process grounded in the principles of semantic
interaction techniques [16, 17]. That is, the system interprets the analytical reasoning of exploratory user interactions to steer the underlying data model. The generic workflow supported by our user interaction technique is as follows:

  1. The user observes two data points that define the difference between the two semantic groupings (e.g., “nice cars” and “bad
    cars”).
  2. The user drags one data item to each side of the axis.
  3. InterAxis computes the weighting of data attributes that supports
    these higher-level groupings (Eq. 1). The weights are displayed
    in the bar chart below the axis.
  4. The scatterplot updates to reflect the newly defined axis, where
    data items are placed according to the similarity on either side of
    the axis (Eq. 2).
  5. The user can refine the semantic grouping by adding/removing
    data points or directly modifying the weighting in the visualization below the axes.
  6. The user can save the axis for future use and continue to explore
    the visualization iteratively by using the same interaction concept
    based on different semantic groupings.
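
Steps 2 to 4 of the workflow above can be sketched for the simplest case of one item per side. The sketch assumes that the attribute weights are the normalized difference between the high-end and the low-end item vectors and that positions are inner products with those weights. The function names, car names, and values are hypothetical.

```python
import math

def steer_axis(high_item, low_item):
    """Weights for an axis steered by one item per side.

    Assumption: normalized difference of the two item vectors.
    The two items must differ, otherwise the norm is zero.
    """
    diff = [h - l for h, l in zip(high_item, low_item)]
    norm = math.sqrt(sum(d * d for d in diff))
    return [d / norm for d in diff]

def position(item, weights):
    """Coordinate of an item along the steered axis (inner product)."""
    return sum(w * v for w, v in zip(weights, item))

# Hypothetical preprocessed items: (horsepower, mpg, price), each in [0, 1].
cars = {
    "nice car": [0.9, 0.4, 0.8],
    "bad car": [0.2, 0.5, 0.3],
    "other car": [0.6, 0.3, 0.5],
}

# Step 2: "nice car" goes to the high end, "bad car" to the low end.
weights = steer_axis(cars["nice car"], cars["bad car"])
# Step 4: every item is placed along the newly defined axis.
positions = {name: position(vec, weights) for name, vec in cars.items()}

for name in sorted(positions, key=positions.get):
    print(f"{name}: {positions[name]:.3f}")
```

Items that resemble the high-end example land further along the axis than items that resemble the low-end example, which is the behavior step 4 describes.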

The scatterplot view provides a 2D overview of the data. By default,
the first and the second features of data, e.g., Retail Price and HP
(Horsepower), are assigned to the x and the y axes, respectively, but
this initial view can be set up by using a dimension reduction method
such as PCA [27] to provide another starting point. Data points are represented as semi-transparent circles so that regions with overlapped
data points can be highlighted. The scatterplot view supports zoom
and pan via mouse wheel operations on a white space (to zoom on
both axes simultaneously) or over a particular axis (to zoom only on
this axis). Hovering over or clicking on a data point, one can check the
full details (or the original high-dimensional information) of the data
item in the data detail view (Fig. 1(E)).

The axis interaction panel consists of three parts: two drop zones (the high-end
and the low-end of each axis), into which the user drags data points in
order to steer the axis (Fig. 1(B)); an interactive bar chart (Fig. 1(C));
and a sub-panel (Fig. 1(D)) containing buttons to save the current axis
for further use or to clear the data points currently assigned to the axis,
along with a combo box to change the axis back to one of the original
features or the previously defined axes. The bars in the interactive
bar chart represent the contributions/weights of attributes to the corresponding axis. The longer a bar is, the more strongly its corresponding attribute contributes to the axis. The bars are color-coded
by the signs of their weights: positive contributions in blue and negative contributions in red. Data points that are high on the positively
weighted (blue-colored) attributes will be placed on the high-end side
of the axis. Data points that are high on the negatively weighted attributes will be placed on the low-end side of the axis. For example,
in Fig. 1(C), sedans tend to be on the left side of the scatterplot, while
sports cars and cars with rear-wheel drive (RWD) tend to be on the
right side. The magnitude of a weight indicates how strongly an attribute contributes, and its sign indicates
at which end of the axis the data points with high values of that attribute will be
placed.
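
The bar encoding described above can be sketched as a small mapping from weights to bar properties. The attribute names loosely follow the Fig. 1(C) example, while the weight values and the `bar_encoding` helper are made up for illustration.

```python
def bar_encoding(attributes, weights):
    """Map each attribute weight to a bar: length from magnitude, color from sign."""
    bars = []
    for name, w in zip(attributes, weights):
        # Blue for positive and red for negative, as in the text.
        # Treating a zero weight as blue is an arbitrary choice of this sketch.
        color = "blue" if w >= 0 else "red"
        bars.append({"attribute": name, "length": abs(w), "color": color})
    return bars

# Hypothetical weights: sedans pull left (negative), sports cars and RWD pull right.
attributes = ["Sedan", "Sports Car", "RWD", "HP"]
weights = [-0.6, 0.5, 0.4, 0.1]

bars = bar_encoding(attributes, weights)
for bar in bars:
    print(f"{bar['attribute']:<10} {bar['color']:<4} {'#' * round(bar['length'] * 20)}")
```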

Fig. 1. An overview of the proposed visual analytics system, InterAxis, showing a car dataset, which includes 387 data items with
18 attributes. The proposed system contains three panels: (A) the scatterplot view to provide a two-dimensional overview of data,
(B-D) the axis interaction panel to support the proposed interaction capabilities, and (E) the data detail view to show the original
high-dimensional information of the data items of interest. The axis interaction panel (B-D) consists of (B) two drop zones (the
high-end and the low-end of each axis), which a user drags data points into in order to steer the axis, (C) an interactive bar chart,
and a sub-panel containing buttons to save the current axis for future use (D, middle) or to clear the data points currently assigned
to the axis (D, right) and a combo box to change the axis back to one among the original features or the previously created axes
via our interaction (D, left).

3.2 Interactive Axis Steering
The proposed method provides two types of interactions: (1) data-level
axis steering and (2) attribute-level axis manipulation. Data-level axis
steering is prompted by dragging a data point from the scatterplot into
one of the two drop zones at the high and the low ends of the axis. Attribute-level axis manipulation is prompted by directly adjusting the bars in
the interactive bar chart.

The main idea of the proposed interaction for steering the axis in
a user-driven manner lies in an intuitive process of incorporating data
items seamlessly while exploring data in a scatterplot. For example,
when a user finds data points that he likes (or dislikes) in the scatterplot, he can drag them to the high-end (or the low-end) drop zone of
an axis (Fig. 1(B)). Accordingly, a new axis is formed by reflecting
these choices of data items, which will then update the scatterplot on
the basis of the newly formed axis. The technical details about how
we form a new axis will be described in the next section.

How the axis is formed from this process is summarized and visualized as a bar chart (Fig. 1(C)) so that a user can get an idea about
how much a particular original feature or dimension is emphasized or
de-emphasized. Given such a bar chart, a user can further refine the
meaning of an axis by directly manipulating the length of each bar
via drag-and-drop operations on the tip of the bar (attribute-level axis
manipulation).

The entire interaction process can be dynamic and iterative. That is,
a user can additionally assign new data items to an axis or remove data
items that were already assigned to an axis. Furthermore, the above-described direct manipulation on the bar chart can be performed at
any moment during such an interactive exploration of the bar chart.
Finally, a user can save the current definition of an axis, which is then
registered as a new entry in the combo box (Fig. 1(D, left)) so that the
user can later restore the axis to a previously saved one.
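
The iterative refinement and saving described above can be sketched with plain lists and a dictionary. The sketch assumes that a bar drag overwrites one weight and that the vector is renormalized afterwards; the text does not confirm either detail. The dictionary `saved_axes` is a hypothetical stand-in for the combo box entries.

```python
import math

def normalize(weights):
    """Scale a weight vector to unit Euclidean norm."""
    norm = math.sqrt(sum(w * w for w in weights))
    return [w / norm for w in weights]

def set_weight(weights, index, new_value):
    """Attribute-level manipulation: overwrite one weight, then renormalize.

    Returns a new list so that previously saved axes are left untouched.
    """
    updated = list(weights)
    updated[index] = new_value
    return normalize(updated)

saved_axes = {}  # hypothetical stand-in for the combo box entries

# An axis obtained from data-level steering (values made up).
weights = normalize([0.7, -0.1, 0.5])
saved_axes["nice vs. bad"] = weights

# The user drags the second bar further into the negative direction.
weights = set_weight(weights, 1, -0.8)
saved_axes["nice vs. bad, mpg de-emphasized"] = weights

for name, axis in saved_axes.items():
    print(name, [round(w, 3) for w in axis])
```

Returning a fresh list from `set_weight` keeps the earlier saved axis intact, so the user can switch back to it later.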

3.3 Underlying Techniques
In this section, we describe the underlying technique for the proposed
user interaction of forming the axis via data items. For the sake of
brevity, we consider only the x axis (the horizontal axis) in a scatterplot, but the following description can be generalized to the y axis in
the same manner.

Data preprocessing. As will be discussed later, the underlying
model to define the axis is based on a linear combination of the original dimensions. To this end, we adopt data preprocessing steps used in linear regression models [14]. For a categorical variable with c different categories, we use dummy encoding, which converts it to a c-dimensional indicator vector where the value of each dimension is 1
if a data item is in the category of the corresponding dimension and
0 otherwise. Next, we scale and translate each dimension (including
both indicator and numerical variables) so that its values lie exactly in
the range from 0 to 1.
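
The two preprocessing steps can be sketched in a few lines. The column names, values, and helper functions are hypothetical, and mapping a constant column to all zeros is an assumption of this sketch, since the text does not cover that case.

```python
def dummy_encode(values):
    """Convert a categorical column into c indicator columns, one per category."""
    categories = sorted(set(values))
    encoded = [[1.0 if v == c else 0.0 for c in categories] for v in values]
    return categories, encoded

def min_max_scale(column):
    """Scale and translate a numerical column so its values span [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # Assumption: a constant column carries no information, so map it to 0.
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]

# Hypothetical raw columns for four cars.
body_type = ["sedan", "sports", "sedan", "suv"]
horsepower = [150.0, 300.0, 200.0, 250.0]

categories, indicators = dummy_encode(body_type)
scaled_hp = min_max_scale(horsepower)

# Each row is the preprocessed high-dimensional vector of one data item.
rows = [ind + [hp] for ind, hp in zip(indicators, scaled_hp)]
print(categories + ["horsepower"])
for row in rows:
    print(row)
```

Indicator columns that contain both values already lie in the range from 0 to 1, so only the numerical column needs rescaling in this example.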

Linear transformation. Assuming that such data preprocessing is
done, we denote the set of high-dimensional vectors of data items that
the user assigned (via drag-and-drop) to the high-end of the x axis
as $A_{x,h} = \{a^{x,h}_1, \dots, a^{x,h}_{n_{x,h}}\}$ and the set of those dragged into
the low-end side of the x axis as $A_{x,l} = \{a^{x,l}_1, \dots, a^{x,l}_{n_{x,l}}\}$, where
$n_{x,h}$ and $n_{x,l}$ represent the total number of points assigned to the
high-end and the low-end of the x axis, respectively. Now, we define
the linear transformation vector for the x axis as follows:
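
A form of Eq. 1 that is consistent with the interpretation given in this section is the difference between the mean vectors of the two groups, so that features that are large among the high-end items and small among the low-end items receive large positive weights. The exact expression is reconstructed here as an assumption. The symbols $a^{x,h}_i$ and $a^{x,l}_i$ denote the preprocessed vectors of the items dropped on the high-end and the low-end of the x axis, and $n_{x,h}$ and $n_{x,l}$ are their counts.

```latex
\begin{equation}
T_x = \frac{1}{n_{x,h}} \sum_{i=1}^{n_{x,h}} a^{x,h}_i
      \;-\; \frac{1}{n_{x,l}} \sum_{i=1}^{n_{x,l}} a^{x,l}_i
\tag{1}
\end{equation}
```

Under this form, dragging a single item to each side reduces $T_x$ to the difference between the two item vectors.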

This is then further scaled to have a unit Euclidean norm.
One can define the linear transformation vector $T_y$ for the y axis
in the same manner. Every data item is mapped to the x axis (and
the y axis) via the transformation $T_x$ (and $T_y$). That is, the i-th data
item whose high-dimensional vector is represented as $a_i$ is mapped to
a point in our 2D scatterplot so that its 2D coordinates are represented
as follows:
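
Since each transformation vector acts as a set of attribute weights, the mapping in Eq. 2 amounts to two inner products. The notation below is a reconstruction from the description above, in which $T_x$ and $T_y$ are the unit-norm transformation vectors and $a_i$ is the preprocessed vector of the i-th data item; the coordinate names $x_i$ and $y_i$ are introduced here for illustration.

```latex
\begin{equation}
(x_i,\; y_i) = \left( T_x^{\top} a_i,\; T_y^{\top} a_i \right)
\tag{2}
\end{equation}
```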

Owing to the easy interpretability of this linear model, one can understand the meaning of this transformation in a straightforward manner. That is, the resulting x axis basically emphasizes the features
or dimensions that have large values on the high-dimensional vectors
contained in $A_{x,h}$ but have low values on those in $A_{x,l}$. On the other
hand, it de-emphasizes the features that have low values on the vectors
contained in $A_{x,h}$ but have high values on those in $A_{x,l}$. In this manner,
as a data item has larger (or smaller) values on these emphasized dimensions and smaller (or larger) values on the de-emphasized dimensions,
its x coordinate will have a higher (or lower) value, appearing more on
the right (or left) side of the x axis. The notations used in this section
are summarized in Table 1.
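
The following sketch puts the pieces of this section together for groups of items on both axes. It assumes that the transformation vector is the mean of the high-end vectors minus the mean of the low-end vectors, scaled to unit norm, and that coordinates are inner products with $T_x$ and $T_y$. All function names and data values are hypothetical.

```python
import math

def mean_vector(vectors):
    """Component-wise mean of equally long vectors."""
    n = len(vectors)
    return [sum(v[j] for v in vectors) / n for j in range(len(vectors[0]))]

def axis_transform(high_items, low_items):
    """Assumed Eq. 1: mean of high-end items minus mean of low-end items,
    scaled to unit Euclidean norm."""
    diff = [h - l for h, l in zip(mean_vector(high_items), mean_vector(low_items))]
    norm = math.sqrt(sum(x * x for x in diff))
    if norm == 0.0:
        raise ValueError("high-end and low-end groups have identical means")
    return [x / norm for x in diff]

def project(item, t_x, t_y):
    """Eq. 2: 2D coordinates as inner products with T_x and T_y."""
    x = sum(w * v for w, v in zip(t_x, item))
    y = sum(w * v for w, v in zip(t_y, item))
    return x, y

# Hypothetical preprocessed items with three attributes in [0, 1].
high_x = [[0.9, 0.2, 0.5], [0.7, 0.4, 0.5]]
low_x = [[0.1, 0.8, 0.5], [0.3, 0.6, 0.5]]
high_y = [[0.5, 0.5, 0.9]]
low_y = [[0.5, 0.5, 0.1]]

t_x = axis_transform(high_x, low_x)
t_y = axis_transform(high_y, low_y)

for item in high_x + low_x:
    print(item, project(item, t_x, t_y))
```

In this example the first attribute is emphasized on the x axis, the second is de-emphasized, and the third receives zero weight because both groups agree on it. This matches the reading of the axis given in the paragraph above.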
