Data Abstraction & Data Transformation

Introduction

To create visualizations, we have to start with some data. In this article, we will focus on what data is and the role it plays in information visualization.

What is Data?

Data is factual information, such as measurements or statistics, used as a basis for reasoning, discussion, or calculation. As a general concept, data refers to existing information or knowledge that is represented or coded in a form suitable for use or processing.

I mentioned the information visualization pipeline in the previous article. The process of creating a visualization from data is visual encoding, but before that there are two more stages.

There are typically three main stages of data processing:

  • Data Collection - We observe a phenomenon and collect some data about it.
  • Data Transformation - We transform the data into a format or configuration suitable for visualization.
  • Visual Encoding - We transform the data into a visual form.
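
The three stages above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the temperature readings, the function names `average_by_day` and `encode_as_bars`, and the `scale` parameter are all hypothetical and not part of any particular library.

```python
from collections import defaultdict

# Stage 1: data collection (hypothetical temperature readings per day).
readings = [
    ("Mon", 18.0), ("Mon", 22.0),
    ("Tue", 20.0), ("Tue", 24.0),
]

# Stage 2: data transformation (aggregate raw readings into one value per day).
def average_by_day(rows):
    grouped = defaultdict(list)
    for day, value in rows:
        grouped[day].append(value)
    return {day: sum(vals) / len(vals) for day, vals in grouped.items()}

# Stage 3: visual encoding (map each value to a bar length in characters).
def encode_as_bars(averages, scale=0.5):
    return {day: "#" * round(avg * scale) for day, avg in averages.items()}

averages = average_by_day(readings)
bars = encode_as_bars(averages)
for day, bar in bars.items():
    print(f"{day} {bar}")
```

Here the bar length is the visual channel that carries the quantity, so a longer bar means a higher average.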

Data Abstraction

  • It is the idea that we can describe data in ways that help us choose among the available and appropriate encoding methods.
  • It is a way to recognize common structures in data coming from very different domains.

Dataset Types

A dataset can be described as a collection of items, each having several attributes. We generally work with two types of datasets:

  • Tables

    • Collection of rows and columns.
    • Every row represents one item, that is, one object of the dataset.
    • Every column represents one attribute of these objects.
  • Networks and Trees

    • Collection of nodes and links.
    • Attributes can be associated with nodes and links
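
As a rough sketch, both dataset types can be written down with plain Python structures. The people, ages, cities, and link weights below are made-up sample data, and the helpers `column` and `neighbors` are hypothetical names chosen for this example.

```python
# A table: each row (a dict) is one item, each key is one attribute.
table = [
    {"name": "Ana", "age": 31, "city": "Lisbon"},
    {"name": "Ben", "age": 27, "city": "Oslo"},
]

# A network: nodes and links, both of which can carry attributes.
nodes = {
    "Ana": {"age": 31},
    "Ben": {"age": 27},
    "Cat": {"age": 45},
}
links = [
    {"source": "Ana", "target": "Ben", "weight": 3},
    {"source": "Ben", "target": "Cat", "weight": 1},
]

def column(rows, attribute):
    """Return one attribute (column) across all items (rows)."""
    return [row[attribute] for row in rows]

def neighbors(node, edges):
    """Return the nodes directly linked to `node`, in either direction."""
    result = set()
    for edge in edges:
        if edge["source"] == node:
            result.add(edge["target"])
        elif edge["target"] == node:
            result.add(edge["source"])
    return result

print(column(table, "age"))
print(neighbors("Ben", links))
```

A tree could be stored the same way as a network, with the extra constraint that every node except the root has exactly one parent.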

Attribute Types

There are three main types of attributes:

  • Categorical Attributes

    • Contain values that describe categories
    • Don't have a particular order
    • Example: Hair color (White, Black, Blonde, Brown)
  • Ordinal Attributes

    • Contain values that describe categories
    • Have an inherent order
    • Example: Economic status (Low, Medium, High)
  • Quantitative Attributes

    • Contain values that represent measured quantities
    • Example: Weight and height of a person
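
To make the distinction concrete, here is a minimal heuristic sketch that guesses an attribute's type from its values. The `AttributeType` enum and `infer_attribute_type` function are hypothetical names, and the rule itself is an assumption for illustration: real data often needs human judgment (for example, numeric ZIP codes are categorical, not quantitative).

```python
from enum import Enum

class AttributeType(Enum):
    CATEGORICAL = "categorical"
    ORDINAL = "ordinal"
    QUANTITATIVE = "quantitative"

def infer_attribute_type(values, known_order=None):
    """Guess the attribute type from sample values (heuristic only)."""
    # Numbers (but not booleans) are treated as measured quantities.
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
        return AttributeType.QUANTITATIVE
    # Categories with a caller-supplied order are treated as ordinal.
    if known_order is not None and set(values) <= set(known_order):
        return AttributeType.ORDINAL
    # Everything else is a plain category with no order.
    return AttributeType.CATEGORICAL

hair = ["White", "Black", "Blonde", "Brown"]
status = ["Low", "High", "Medium"]
weight_kg = [62.5, 80, 71.2]

print(infer_attribute_type(hair))
print(infer_attribute_type(status, known_order=["Low", "Medium", "High"]))
print(infer_attribute_type(weight_kg))
```

Note that the order of an ordinal attribute cannot be recovered from the values alone, which is why the sketch asks the caller to supply it.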

Attribute Semantics

Attribute semantics refers to what an attribute's values mean. Identifying it helps us interpret the data correctly.

  • Spatial Semantics

    • The attribute describes spatial or geographical characteristics.
    • Example: Latitude and longitude of different places
  • Temporal Semantics

    • The attribute describes something related to time.
    • Example: Timestamp
  • Sequential Attribute

    • The values follow a sequence, such as user IDs.
  • Diverging Attribute

    • The quantity has an identifiable zero value: elements above it are positive, and elements below it are negative.
  • Cyclic Attribute

    • The values repeat in a cycle, such as the months recurring over many years.
  • Hierarchical Attribute

    • A category has sub-categories, and together they form a hierarchy.
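
Two of these semantics change how values should be compared, which a short sketch can show. The functions `cyclic_distance` and `diverging_side` are hypothetical helpers written for this example, assuming months numbered 1 to 12 and a midpoint of zero.

```python
def cyclic_distance(a, b, period=12):
    """Shortest distance between two positions on a cycle of length `period`."""
    diff = abs(a - b) % period
    return min(diff, period - diff)

def diverging_side(value, midpoint=0.0):
    """Classify a value relative to the midpoint of a diverging attribute."""
    if value > midpoint:
        return "positive"
    if value < midpoint:
        return "negative"
    return "neutral"

# December (12) and January (1) are one month apart on the cycle,
# even though their numeric difference is 11.
print(cyclic_distance(12, 1))

# A profit/loss figure diverges around zero.
print(diverging_side(-3.5))
print(diverging_side(2.0))
```

Treating a cyclic attribute as a plain number would place December and January at opposite ends of an axis, which is the kind of mistake that identifying the semantics helps avoid.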

Why is it useful to identify attribute types?

If we know what kind of attribute we are dealing with, it gives us guidance in selecting appropriate visual representations.
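
One way to picture this guidance is a simple lookup from attribute type to candidate visual channels. The table below is an illustrative assumption based on commonly cited rules of thumb, not a list taken from this article, and `suggest_channels` is a hypothetical helper.

```python
# Illustrative rules of thumb only; real choices depend on the task and data.
SUGGESTED_CHANNELS = {
    "categorical": ["color hue", "shape", "spatial region"],
    "ordinal": ["position", "color lightness", "size"],
    "quantitative": ["position", "length", "area"],
}

def suggest_channels(attribute_type):
    """Return candidate visual channels for an attribute type."""
    try:
        return SUGGESTED_CHANNELS[attribute_type]
    except KeyError:
        raise ValueError(f"unknown attribute type: {attribute_type!r}") from None

print(suggest_channels("categorical"))
print(suggest_channels("quantitative"))
```

For instance, hair color (categorical) might be shown with distinct hues, while weight (quantitative) might be shown as position along an axis or as bar length.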

We can conclude that data abstraction is a very important activity. Even when data comes from different domains and describes very different phenomena, it has common characteristics that can be identified, and these characteristics help decide which visual representations are appropriate for your data.

Follow me @ParthS0007 for more tech and blogging content :)