Colors and Data

1. Overview

Visualizations are tools that help people understand difficult ideas. A poor choice of color, however, can hide important information and cause confusion, while a well-chosen color palette helps a graphic send the right message. Color does more than make a chart look better: it makes the data the chart shows easier to understand.

Based on the types of data, the colors used for data visualization can be put into three groups: categorical colors, sequential colors, and diverging colors.

2. Types of Data

There are two main types of data:

  • Qualitative (categorical) data: fields that contain qualitative information are called dimensions.
  • Quantitative (numerical) data: always a number that can be measured; such a field is called a metric.

Building on these two types, data can be classified into four primary levels of measurement: nominal, ordinal, interval, and ratio.

Types of data, with examples:

Qualitative, nominal

Categories without order. These are discrete, unique categories that have no inherent order. These variables are also called factors.

  • Languages: English, Chinese, Vietnamese, etc.
  • Relationship status: married, single, etc.
  • Mode of transportation: bus, train, car, etc.

Qualitative, ordinal

Categories with order. These are discrete, unique categories that have an order. These variables are also called ordered factors.

  • Economic status: poor, middle income, wealthy
  • Likert scales: very satisfied, satisfied, neutral, dissatisfied, very dissatisfied

Quantitative, discrete, nominal

Numbers are assigned to characteristics arbitrarily, purely to classify the data and without any regard to order.

  • Gender: females are assigned the number 1, males are assigned the number 2

Quantitative, discrete, ordinal

Numbers are purposefully assigned to data that have a sense of rank or order, but the magnitude of the difference between those numbers is not known or cannot be measured.

  • Grade score: can range from as low as 0 to as high as 20

Quantitative, continuous, interval

Numbers have units of equal magnitude as well as rank order, on a scale without an absolute zero. Scales of this type can have an arbitrarily assigned "zero", but it does not correspond to an absence of the measured variable.

  • Temperature on the Fahrenheit scale: -10, 0, +10, +20, +30

Quantitative, continuous, ratio

Numbers have units of equal magnitude as well as rank order, on a scale with an absolute zero.

  • Distance (from zero miles/km upwards)
  • Time intervals
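
The difference between interval and ratio scales can be made concrete with a small calculation. The sketch below is only an illustration: the function name and the two sample readings are made up for this example. It shows that a difference between two Fahrenheit readings is meaningful, while their ratio is not, because 0 on the Fahrenheit scale is an arbitrary zero. Converting to Kelvin, which has an absolute zero, gives a ratio that can be interpreted.

```python
def fahrenheit_to_kelvin(deg_f: float) -> float:
    """Convert an interval-scale reading (Fahrenheit) to a ratio-scale one (Kelvin)."""
    return (deg_f - 32.0) * 5.0 / 9.0 + 273.15


# Two illustrative readings on the Fahrenheit (interval) scale.
cold_f, warm_f = 10.0, 20.0

# Differences are meaningful on an interval scale.
difference_f = warm_f - cold_f

# Ratios are not: zero Fahrenheit is arbitrary, so this "2x" has no physical meaning.
naive_ratio = warm_f / cold_f

# Kelvin has an absolute zero, so the ratio of the same two readings is meaningful.
true_ratio = fahrenheit_to_kelvin(warm_f) / fahrenheit_to_kelvin(cold_f)

print(f"difference in Fahrenheit: {difference_f}")
print(f"naive ratio in Fahrenheit: {naive_ratio:.2f}")
print(f"ratio in Kelvin: {true_ratio:.3f}")
```

The warmer reading is about 2% higher on the absolute scale, not "twice as hot", which is why ratios should only be read off ratio-scale data.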

3. Colors and Data

3.1. Categorical Colors

A qualitative palette is used when the variable is categorical. Categorical variables have distinct labels but no inherent order.

Categorical colors are optimized for maximum differentiation. Use these for a nominal scale. Do not use them for ordinal, interval, or ratio scales.

Some examples include country, languages, and gender. Each possible value of the variable is given a color from a qualitative palette.

In a qualitative palette, the color for each group needs to be clearly distinct from the others. Try to keep the palette to a maximum of six well-differentiated colors so that the chart does not become confusing.
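
As a minimal sketch of this rule, the snippet below gives each distinct category its own color and refuses to go beyond six. The hex values, the palette name, and the function name are illustrative assumptions, not a recommended or official palette.

```python
# Hypothetical six-color qualitative palette; the hex values are illustrative only.
QUALITATIVE_PALETTE = [
    "#1f77b4", "#ff7f0e", "#2ca02c", "#d62728", "#9467bd", "#8c564b",
]


def assign_categorical_colors(categories, palette=QUALITATIVE_PALETTE):
    """Map each distinct category to its own color, in first-seen order."""
    unique = list(dict.fromkeys(categories))  # de-duplicate, keep order
    if len(unique) > len(palette):
        raise ValueError(
            f"{len(unique)} categories but only {len(palette)} colors; "
            "consider merging small categories before coloring"
        )
    return dict(zip(unique, palette))


colors = assign_categorical_colors(["English", "Chinese", "English", "Vietnamese"])
for category, color in colors.items():
    print(category, color)
```

Raising an error is one possible design choice; another would be to group the smallest categories into a single "Other" bucket before assigning colors.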

3.2. Sequential Colors

A sequential palette can be used when the variable to be colored is numeric or has naturally ordered values.

Sequential colors are optimized for numeric meaning. Use them for ordinal, interval, and ratio scales. Do not use them for categorical data: applying these colors to dimensions can make it harder to read the numbers and cause people to misunderstand the visualization.
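
One simple way to build a sequential scale is to interpolate between a light and a dark color according to where a value falls in its range. The sketch below assumes linear interpolation in RGB space and two illustrative endpoint colors; all function names are hypothetical. Linear RGB blending is not perceptually uniform, so treat this as a demonstration of the idea rather than a production color scale.

```python
def hex_to_rgb(color):
    """Parse '#rrggbb' into an (r, g, b) tuple of ints."""
    color = color.lstrip("#")
    return tuple(int(color[i:i + 2], 16) for i in (0, 2, 4))


def rgb_to_hex(rgb):
    """Format an (r, g, b) tuple of ints as '#rrggbb'."""
    return "#{:02x}{:02x}{:02x}".format(*rgb)


def sequential_color(value, vmin, vmax, light="#f7fbff", dark="#08306b"):
    """Map a number to a color between `light` (low) and `dark` (high)."""
    if vmax <= vmin:
        raise ValueError("vmax must be greater than vmin")
    # Position of the value within the range, clamped to [0, 1].
    t = min(1.0, max(0.0, (value - vmin) / (vmax - vmin)))
    start, end = hex_to_rgb(light), hex_to_rgb(dark)
    return rgb_to_hex(tuple(round(a + (b - a) * t) for a, b in zip(start, end)))


for sales in (0, 25, 50, 75, 100):
    print(sales, sequential_color(sales, vmin=0, vmax=100))
```

Larger values get progressively darker colors, so the ordering of the data is visible in the ordering of the colors.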

3.3. Diverging Colors

If our numeric variable has a meaningful central value, such as zero, we can use a diverging palette. A diverging palette is made up of two sequential palettes that share an endpoint at the central value: colors on one side of the center represent values larger than the center, and colors on the other side represent values smaller than it.
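
The "two sequential palettes sharing an endpoint" idea can be sketched as follows. The three RGB colors, the function names, and the choice to scale each side of the center independently are assumptions made for this example.

```python
def lerp_rgb(start, end, t):
    """Linearly interpolate between two (r, g, b) tuples; t is in [0, 1]."""
    return tuple(round(a + (b - a) * t) for a, b in zip(start, end))


def diverging_color(value, vmin, center, vmax,
                    low=(33, 102, 172), mid=(247, 247, 247), high=(178, 24, 43)):
    """Map a number to a color on a diverging scale around `center`."""
    if not vmin < center < vmax:
        raise ValueError("expected vmin < center < vmax")
    value = min(vmax, max(vmin, value))  # clamp to the data range
    if value <= center:
        # Lower sequential ramp: mid at the center, `low` at vmin.
        return lerp_rgb(mid, low, (center - value) / (center - vmin))
    # Upper sequential ramp: mid at the center, `high` at vmax.
    return lerp_rgb(mid, high, (value - center) / (vmax - center))


for change in (-10, -5, 0, 15, 30):
    print(change, diverging_color(change, vmin=-10, center=0, vmax=30))
```

Because each side is scaled on its own, the central value always maps to the neutral middle color even when the range is asymmetric. If equal distances from the center should get equal color intensity, use a symmetric range instead.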

Diverging colors are designed to be balanced around a central midpoint. Use them for ordinal, interval, and ratio scales, especially when there is a meaningful middle value. Do not use them with categorical data.
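
The guidance in this section can be summarized as a small lookup. The function below is a hypothetical helper written for this article, not part of any library. It encodes only the rules stated above: categorical colors for nominal scales, and sequential or diverging colors for ordinal, interval, and ratio scales depending on whether a meaningful midpoint exists.

```python
def recommend_palette(level: str, has_meaningful_midpoint: bool = False) -> str:
    """Suggest a palette family for a level of measurement."""
    level = level.lower()
    if level == "nominal":
        # No inherent order: maximize differentiation between groups.
        return "categorical"
    if level in {"ordinal", "interval", "ratio"}:
        # Ordered or numeric: diverging only if a central value makes sense.
        return "diverging" if has_meaningful_midpoint else "sequential"
    raise ValueError(f"unknown level of measurement: {level!r}")


print(recommend_palette("nominal"))
print(recommend_palette("ratio"))
print(recommend_palette("interval", has_meaningful_midpoint=True))
```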

4. More about Data | Inferential statistics

Statistical approaches are classified into two types: descriptive statistics and inferential statistics. Inferential tests are in turn either parametric or non-parametric: parametric tests are used to examine quantitative (rather than qualitative) information, whereas non-parametric tests are more typically employed for qualitative, non-numerical data.

Inferential statistics: definition, characteristics, type of data, and examples

Nonparametric statistical tests

Definition: statistical tests or methods used when the data being studied come from a sample or population that does not follow a normal distribution.

Characteristics:

  • Assumes the patient population being studied is not normally distributed (e.g., as seen with outliers)
  • The usual central measure is the median

Type of data:

  • Nominal
  • Ordinal

Examples:

  • Mann-Whitney test (assumes 2 independent, i.e. unrelated, groups being studied)
  • Kruskal-Wallis test (assumes more than 2 independent groups being studied/compared)
  • Spearman (correlation test)

Parametric statistical analysis

Definition: statistical tests or methods used when the data being studied come from a sample or population that is normally distributed.

Characteristics:

  • Assumes the patient population being studied is normally distributed
  • Assumes the variance is homogeneous
  • The usual central measure is the mean

Type of data:

  • Interval
  • Ratio

Examples:

  • T-test (assumes 2 independent, i.e. unrelated, groups being studied)
  • One-way ANOVA (assumes more than 2 independent groups being studied/compared)
  • Pearson (correlation test)
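
The different central measures listed for the two families can be illustrated with a small, made-up sample. The numbers below are hypothetical (lengths of stay in days, with one outlier) and use only Python's standard `statistics` module.

```python
from statistics import mean, median

# Hypothetical sample: length of stay in days; the last value is an outlier.
stays = [2, 3, 3, 4, 5, 6, 40]

print("mean:", mean(stays))      # pulled upward by the single outlier
print("median:", median(stays))  # stays close to the bulk of the data

# Dropping the outlier barely moves the median but shifts the mean a lot.
without_outlier = stays[:-1]
print("mean without outlier:", round(mean(without_outlier), 2))
print("median without outlier:", median(without_outlier))
```

A single extreme value more than doubles the mean while the median hardly changes, which is why the median is the usual central measure when the data are not normally distributed.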
