Introduction

This article covers fundamental graphs and data transformation. The previous article discussed data and data abstraction; here, we go from data to visual representation.

How to visualize?

It is a two-step process:

  • Step 1: Select and Transform

  • Step 2: Choose or Design appropriate representation

We will start with step 2, because it is easier to first study a number of predefined fundamental graphs, what kind of data they can accommodate, and what kind of information they can communicate. After that, we will look at how the original data sometimes needs to be transformed before it can be visualized with these graphs.

Fundamental Graphs

Attribute types: Categorical (C), Ordinal (O), Quantitative (Q), Temporal (T), Spatial (S)

  • Bar Chart

    • It allows visualizing how a quantity distributes across a set of categories
    • C/O + Q
  • Line Chart

    • It displays how a quantity changes with another quantity, which is usually time.
    • T + Q
    • Alternate: Area Chart
  • Scatter Plot

    • It displays how a quantity relates to another quantity.
    • Q + Q
    • Alternate: Slope Chart
  • Matrix Plot

    • It displays how a quantity distributes across two categories.
    • C/O + C/O + Q
    • Alternate: Stacked Bar chart, Bar Graph
  • Symbol Map

    • It displays how a quantity distributes across two spatial coordinates.
    • S + Q
    • Alternate: Bar Graph
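
The list above can be read as a lookup from attribute types to a graph. Here is a minimal sketch of that idea. The table name `CHART_FOR_TYPES` and the function `suggest_chart` are hypothetical names made up for this illustration, and the table only covers the combinations listed above.

```python
# Hypothetical lookup table built from the list above.
# Keys are alphabetically sorted tuples of attribute-type codes:
# C = categorical, O = ordinal, Q = quantitative, T = temporal, S = spatial.
CHART_FOR_TYPES = {
    ("C", "Q"): "bar chart",
    ("O", "Q"): "bar chart",
    ("Q", "T"): "line chart",
    ("Q", "Q"): "scatter plot",
    ("C", "C", "Q"): "matrix plot",
    ("C", "O", "Q"): "matrix plot",
    ("O", "O", "Q"): "matrix plot",
    ("Q", "S"): "symbol map",
}


def suggest_chart(attribute_types):
    """Return the fundamental graph for a combination of types, or None."""
    # Sorting makes the lookup independent of the order of the attributes.
    key = tuple(sorted(attribute_types))
    return CHART_FOR_TYPES.get(key)


print(suggest_chart(["Q", "C"]))       # one category + one quantity
print(suggest_chart(["C", "Q", "O"]))  # two categories + one quantity
```

A combination that is not in the table returns `None`, which is a hint that the data needs a transformation or a different design.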

More than two attributes:

  • Stacked Bar Chart

    • We have as many bars as there are categories in the first categorical attribute, and as many segments within each bar as there are values in the other categorical attribute.

    • It works well when the main question is about proportions, that is, what share each value takes within each category. This is sometimes called part-to-whole information.

  • Grouped Bar Chart

    • The same bar graph is repeated once for each category of the other categorical attribute.

    • It is better when the goal is to compare individual values with one another.
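
To make the difference concrete, here is a rough sketch of how the two layouts could be computed from the same data. The `sales` data and all function names are made up for this example, and the sketch assumes every bar has a non-zero total. No plotting library is used; the functions only compute where each bar or segment would go.

```python
# Hypothetical data: first categorical attribute (region),
# second categorical attribute (channel), and a quantity.
sales = {
    "North": {"Online": 30, "Store": 70},
    "South": {"Online": 50, "Store": 50},
}


def stacked_segments(values):
    """For each bar, list (label, bottom, height) of its stacked segments."""
    layout = {}
    for category, parts in values.items():
        bottom = 0
        segments = []
        for label, height in parts.items():
            segments.append((label, bottom, height))
            bottom += height  # the next segment starts where this one ends
        layout[category] = segments
    return layout


def part_to_whole(values):
    """Share of each segment within its own bar (assumes non-zero totals)."""
    shares = {}
    for category, parts in values.items():
        total = sum(parts.values())
        shares[category] = {label: v / total for label, v in parts.items()}
    return shares


def grouped_positions(values, bar_width=0.2):
    """x position of each bar when bars sit side by side in one cluster per category."""
    positions = {}
    for index, (category, parts) in enumerate(values.items()):
        positions[category] = {
            label: index + offset * bar_width
            for offset, label in enumerate(parts)
        }
    return positions


print(stacked_segments(sales))
print(part_to_whole(sales))
print(grouped_positions(sales))
```

In the stacked layout only the first segment of each bar starts at zero, which is why individual values are harder to compare there. In the grouped layout every bar starts at zero, so values can be compared directly.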

Faceting and Small multiples

  • Faceting

    • Select one categorical/ordinal attribute
    • Create as many subsets of the data as the attribute has values
    • Create one plot for each value
  • Small multiples - The result of splitting one bigger plot into several smaller plots
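
The faceting steps can be sketched in a few lines. The function name `facet` and the sample records are hypothetical; each resulting subset would then be drawn as its own small plot.

```python
from collections import defaultdict


def facet(records, attribute):
    """Split records into one subset per value of the chosen attribute."""
    groups = defaultdict(list)
    for record in records:
        groups[record[attribute]].append(record)
    return dict(groups)


# Hypothetical sample data.
records = [
    {"region": "North", "sales": 10},
    {"region": "South", "sales": 7},
    {"region": "North", "sales": 4},
]

for value, subset in facet(records, "region").items():
    # One plot would be created per subset; here we only print it.
    print(value, subset)
```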

Now we go back to the first step of data visualization: select and transform.

What is Selection?

Designing a new visual representation always requires choosing which attributes it will use. This step is called selection.

Typically, we have more attributes than we need to visualize. So, first, we have to figure out which attributes to select to create the visualization we need. Many visualizations also require an intermediate step, which is typically aggregation or another transformation.

  • Aggregation - Common aggregation functions that are used are the sum, the maximum, the minimum, the average, the median, and the standard deviation, but there may be situations where we may need to calculate some other type of information.

  • Useful Transformations:

    • Transformations of attributes that encode time and date, such as aggregation by day, week, month, or year.

    • Transformation related to spatial data

    • Transformation of a quantitative attribute into an ordinal attribute (Binning)

      • Taking quantities and binning them into several categories and then sorting them according to their values.
    • Rescaling/Re-expressing a given quantitative attribute (Normalization)

      • If the attribute has a given minimum and maximum value, one can represent the same range using a different scale.
    • Transforming quantitative values into percentages
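
Three of these transformations are easy to sketch in plain Python: binning, rescaling, and conversion to percentages. The function names, bin edges, and labels below are assumptions made for illustration, not a fixed recipe.

```python
def bin_value(value, edges, labels):
    """Map a quantity to an ordinal label.

    edges holds the exclusive upper bound of every bin except the last,
    so labels must have one more entry than edges.
    """
    for edge, label in zip(edges, labels):
        if value < edge:
            return label
    return labels[-1]


def rescale(values, new_min=0.0, new_max=1.0):
    """Re-express values on a new scale, keeping their relative distances."""
    old_min, old_max = min(values), max(values)
    span = old_max - old_min
    if span == 0:
        # All values are equal, so they all map to the lower bound.
        return [new_min for _ in values]
    return [new_min + (v - old_min) / span * (new_max - new_min) for v in values]


def to_percentages(values):
    """Express each value as a percentage of the total (assumes a non-zero total)."""
    total = sum(values)
    return [100 * v / total for v in values]


ages = [15, 18, 70]
print([bin_value(a, [18, 65], ["child", "adult", "senior"]) for a in ages])
print(rescale([10, 20, 30]))
print(to_percentages([1, 1, 2]))
```

Because the bin labels are listed in order, the result of `bin_value` is an ordinal attribute and can be used wherever the graphs above accept O.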

Creating the right, effective visual representation for a given problem is not only about finding the right graphical format, but also finding the right information. It's rarely the case that we can take the original data and represent it as it is. We need some intermediary transformations.
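
As a small illustration of such an intermediary transformation, the sketch below aggregates daily records by month, which gives a T + Q table ready for a line chart. The function name `aggregate_by_month` and the sample data are hypothetical.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def aggregate_by_month(records, func=sum):
    """Group (date, value) pairs by (year, month) and aggregate each group."""
    groups = defaultdict(list)
    for day, value in records:
        groups[(day.year, day.month)].append(value)
    # Sorting the keys keeps the months in chronological order.
    return {month: func(values) for month, values in sorted(groups.items())}


# Hypothetical daily measurements.
daily = [
    (date(2021, 1, 5), 10),
    (date(2021, 1, 20), 30),
    (date(2021, 2, 1), 5),
]

print(aggregate_by_month(daily))        # monthly totals
print(aggregate_by_month(daily, mean))  # monthly averages
print(aggregate_by_month(daily, max))   # monthly maxima
```

Passing a different function, such as `statistics.median` or `min`, changes the aggregation without changing the grouping.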

Data transformation is a crucial step in visualization design. Visualization design is never only about finding the right graphical representation, but also finding the right data transformation.

So far, we have covered the process of data abstraction and how to choose a graph that is appropriate for a given type of data.

Now, we will discuss which individual components can be used to create a visualization that fits the type of data we have and the goal we have. This concept of graphical components, and of the mapping between data and graphical components, matters mainly for two reasons.

  • We can better understand the visualization if we know how to visually encode and decode the representation, so it is a useful evaluation tool.

  • It's helpful in designing and re-designing visualization.

Visual Encoding

These are the rules that a person implements in a computer program to transform data into a graphical representation.

  • Visual Marks

    • Graphical elements representing data items
    • Points, Lines, Bars, Areas
  • Visual Channels

    • Encode properties of data-items
    • Position, Size, Angle & Slope, Color, Texture & Shape
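
Here is a minimal sketch of what such encoding rules could look like in a program. Each data item becomes a point mark, and three quantitative attributes are mapped to the position and size channels. The function names, the pixel ranges, and the sample rows are assumptions for this example, and the sketch assumes every attribute has at least two distinct values.

```python
def linear_scale(domain, output_range):
    """Return a function mapping values from the data domain to a visual range."""
    d0, d1 = domain
    r0, r1 = output_range

    def scale(value):
        return r0 + (value - d0) / (d1 - d0) * (r1 - r0)

    return scale


def encode_points(rows, x_attr, y_attr, size_attr, width=400, height=300):
    """Encode each row as a point mark with x, y, and size channels."""
    xs = [row[x_attr] for row in rows]
    ys = [row[y_attr] for row in rows]
    sizes = [row[size_attr] for row in rows]
    x_scale = linear_scale((min(xs), max(xs)), (0, width))
    # Screen coordinates grow downward, so the y range is flipped.
    y_scale = linear_scale((min(ys), max(ys)), (height, 0))
    size_scale = linear_scale((min(sizes), max(sizes)), (4, 20))
    return [
        {
            "mark": "point",
            "x": x_scale(row[x_attr]),
            "y": y_scale(row[y_attr]),
            "size": size_scale(row[size_attr]),
        }
        for row in rows
    ]


# Hypothetical data items.
rows = [
    {"gdp": 1, "life": 50, "pop": 10},
    {"gdp": 3, "life": 80, "pop": 30},
]
for mark in encode_points(rows, "gdp", "life", "pop"):
    print(mark)
```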

Visual Decoding

  • It is the reverse of visual encoding: starting from an observed visualization, we try to work out its graphical components and the mapping rules behind it.

  • Step 1 -> Identify graphical components explicitly (Visual Marks)

  • Step 2 -> Identify the mapping rules, that is, which data items the marks represent

  • Step 3 -> Identify Visual channels

How good is a visual encoding?

The quality of a visual representation is judged by two principles:

Expressiveness Principle

  • It states that "the visual representation should represent all and only the relationships that exist in the data."

  • It means that the visual representation should represent the information that is present in the data and, even more importantly, it shouldn't convey information that is not contained in the data.
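
One way to picture this principle is a simple check on attribute type versus channel. The sketch below assumes, as a simplification that is not stated in this article, that size, angle, and slope suggest an order or magnitude, while shape and texture do not. The names `MAGNITUDE_CHANNELS`, `IDENTITY_CHANNELS`, and `expressiveness_issue` are made up for this example.

```python
# Assumption for this sketch: these channels suggest order or magnitude...
MAGNITUDE_CHANNELS = {"size", "angle", "slope"}
# ...and these channels only suggest "same" or "different".
IDENTITY_CHANNELS = {"shape", "texture"}


def expressiveness_issue(attribute_type, channel):
    """Return a description of the problem, or None if the pairing looks fine.

    attribute_type is "C" (categorical), "O" (ordinal), or "Q" (quantitative).
    """
    if attribute_type == "C" and channel in MAGNITUDE_CHANNELS:
        return "channel suggests an order that is not in the data"
    if attribute_type in {"O", "Q"} and channel in IDENTITY_CHANNELS:
        return "channel hides an order that is in the data"
    return None


print(expressiveness_issue("C", "size"))   # shows something the data lacks
print(expressiveness_issue("Q", "shape"))  # drops something the data has
print(expressiveness_issue("Q", "size"))   # no issue under these assumptions
```

The first case is the more serious one in the sense described above: the picture conveys information that the data does not contain.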

Effectiveness Principle

  • It states that the importance of the displayed information should match the effectiveness of the channel that encodes it.

  • Use the more effective channels for the more important information.
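
In code, the principle boils down to pairing attributes sorted by importance with channels sorted by effectiveness. The ranking used below is an assumed, simplified ordering chosen for illustration, and the attribute names are hypothetical.

```python
# Assumed ranking for this sketch, from most to least effective.
CHANNELS_BY_EFFECTIVENESS = ("position", "size", "angle", "color", "shape")


def assign_channels(attributes_by_importance, channels=CHANNELS_BY_EFFECTIVENESS):
    """Give the most important attribute the most effective channel, and so on."""
    if len(attributes_by_importance) > len(channels):
        raise ValueError("more attributes than available channels")
    return dict(zip(attributes_by_importance, channels))


# Hypothetical attributes, listed from most to least important.
print(assign_channels(["price", "rating", "brand"]))
```

The importance order has to come from the question the visualization should answer; the code cannot decide that for us.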

[Figure: ranking of visual channels by effectiveness]

Contextual Components

They are really important for a visualization and help in interpreting it.

  • Legends, Labels and Annotations

    • Legends and Labels enable the interpretation of the graphical elements.
    • Annotations guide attention and explain patterns of interest.
  • Axis, Grids and Reference lines

    • These enable value reading and comparison.

This concludes the article. We introduced a series of fundamental graphs, described how to use them effectively, and showed how to transform data into the shape needed to convey certain types of information. We also discussed the individual graphical components that one can use to build a visualization, and the encoding rules that turn data into visual representations.

Did you find this post useful?

I would be really grateful if you let me know by sharing it on Twitter!

Follow me @ParthS0007 for more tech and blogging content :)