Colors and Data
- Author: Mai TIEU Khoi (@tieukhoimai)
1. Overview
Visualizations help people understand difficult ideas. A poor choice of color can hide important information and cause confusion, while a well-chosen color palette helps a graphic send the right message. Color not only makes a chart look better, it also makes the data it shows easier to understand.
Based on the types of data, the colors used for data visualization can be put into three groups: categorical colors, sequential colors, and diverging colors.
2. Types of Data
There are two main types of data:
- Qualitative (categorical) data. Fields that contain qualitative information are called dimensions.
- Quantitative (numerical) data. It is always a number that can be measured, and such a field is called a metric.
Based on these, data can be classified into four primary levels of measurement: nominal, ordinal, interval, and ratio.
Type of data | Subtype | Level of measurement | Description | Example |
---|---|---|---|---|
Qualitative | | Nominal | Categories without order. These are discrete and unique categories that have no inherent order. These variables are also called factors. | |
Qualitative | | Ordinal | Categories with order. These are discrete and unique categories with an order. These variables are also called ordered factors. | |
Quantitative | Discrete | Nominal | Numbers are assigned to characteristics arbitrarily, for the purpose of data classification and without any regard to order. | Gender |
Quantitative | Discrete | Ordinal | Numbers are purposefully assigned to data that have a sense of rank or order, but the magnitude of the difference between those numbers is not known or cannot be measured. | Grade scores, which can range from as low as 0 to as high as 20. |
Quantitative | Continuous | Interval | | Temperature on the Fahrenheit scale: -10, 0, +10, +20, +30. |
Quantitative | Continuous | Ratio | | |
3. Colors and Data
3.1. Categorical Colors
A qualitative palette is used when the variable is categorical. Categorical variables have distinct labels but no inherent order.
Categorical colors are optimized for maximum differentiation. Use them for nominal scales. Do not use them for ordinal, interval, or ratio scales.
Examples include country, language, and gender. Each possible value of the variable is assigned a color from the qualitative palette.
In a qualitative palette, the colors for each group need to be clearly distinct, and you should try to keep the palette to a maximum of six optimized colors so that the chart does not become confusing.
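As a rough illustration of the idea, the sketch below assigns one color per category by spacing hues evenly around the color wheel and refuses to go past six categories. The function name `categorical_palette`, the saturation and brightness defaults, and the even-hue strategy are all assumptions made for this example; professionally designed palettes are also tuned for perceptual distance and color-blind readers, which this sketch does not attempt.

```python
import colorsys

MAX_CATEGORIES = 6  # the upper limit suggested in the text above


def categorical_palette(labels, saturation=0.65, value=0.85):
    """Map each unique label to a hex color with an evenly spaced hue.

    Hypothetical helper: even hue spacing is a simple stand-in for a
    hand-tuned categorical palette.
    """
    unique = list(dict.fromkeys(labels))  # de-duplicate, keep first-seen order
    if len(unique) > MAX_CATEGORIES:
        raise ValueError(
            f"{len(unique)} categories; keep a categorical palette to "
            f"{MAX_CATEGORIES} colors or fewer"
        )
    palette = {}
    for i, label in enumerate(unique):
        hue = i / len(unique)  # spread hues evenly over [0, 1)
        r, g, b = colorsys.hsv_to_rgb(hue, saturation, value)
        palette[label] = "#{:02x}{:02x}{:02x}".format(
            round(r * 255), round(g * 255), round(b * 255)
        )
    return palette


countries = ["France", "Japan", "Brazil", "France"]
print(categorical_palette(countries))
```

The order of the assigned colors carries no meaning here, which matches the nominal scale: swapping two countries' colors would not change what the chart says.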
3.2. Sequential Colors
A sequential palette can be used when the variable to be colored is numeric or has values that are naturally ordered.
Sequential colors are optimized for numeric meaning. Use them for ordinal, interval, and ratio scales. Do not use them for categorical data: using these colors for dimensions can make it harder to see the numbers and cause people to misunderstand visualizations.
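A minimal sketch of a sequential scale is shown below: a value is normalized to the range 0 to 1 and used to blend between a light color and a dark color. The two endpoint colors and the names `lerp_color`, `to_hex`, and `sequential_color` are assumptions chosen for illustration, and plain linear interpolation in RGB is a simplification, since it is not perceptually uniform.

```python
LIGHT = (222, 235, 247)  # assumed color for the lowest value
DARK = (8, 48, 107)      # assumed color for the highest value


def lerp_color(start, end, t):
    """Linearly interpolate between two RGB tuples, with t in [0, 1]."""
    return tuple(round(s + (e - s) * t) for s, e in zip(start, end))


def to_hex(rgb):
    """Format an (r, g, b) tuple of 0-255 integers as a hex string."""
    return "#{:02x}{:02x}{:02x}".format(*rgb)


def sequential_color(value, vmin, vmax):
    """Map a number in [vmin, vmax] to a color between LIGHT and DARK."""
    if vmax <= vmin:
        raise ValueError("vmax must be greater than vmin")
    t = (value - vmin) / (vmax - vmin)
    t = min(1.0, max(0.0, t))  # clamp values that fall outside the range
    return to_hex(lerp_color(LIGHT, DARK, t))


for score in (0, 5, 10, 15, 20):
    print(score, sequential_color(score, vmin=0, vmax=20))
```

Because the color changes in one direction only, a darker mark always means a larger value, which is the ordering a reader expects from ordinal, interval, or ratio data.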
3.3. Diverging Colors
If a numeric variable has a meaningful central value, such as zero, we can use a diverging palette. A diverging palette is made up of two sequential palettes that share an endpoint at the central value. Values larger than the center are assigned colors on one side of the palette, while values smaller than the center are assigned colors on the other side.
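The construction described above can be sketched as two color ramps that both start at a neutral center color. The three anchor colors and the names `diverging_color` and `max_distance` are assumptions for this example, not values taken from any particular design system.

```python
NEGATIVE = (178, 24, 43)   # assumed color for values far below the center
NEUTRAL = (247, 247, 247)  # assumed color at the central value
POSITIVE = (33, 102, 172)  # assumed color for values far above the center


def _lerp(start, end, t):
    """Linearly interpolate between two RGB tuples, with t in [0, 1]."""
    return tuple(round(s + (e - s) * t) for s, e in zip(start, end))


def diverging_color(value, center, max_distance):
    """Map a number to a color on a two-sided ramp around `center`.

    `max_distance` is the distance from the center at which either ramp
    reaches full intensity; using the same distance on both sides keeps
    the palette balanced.
    """
    if max_distance <= 0:
        raise ValueError("max_distance must be positive")
    t = min(1.0, abs(value - center) / max_distance)
    end = POSITIVE if value >= center else NEGATIVE
    return "#{:02x}{:02x}{:02x}".format(*_lerp(NEUTRAL, end, t))


for change in (-10, -5, 0, 5, 10):
    print(change, diverging_color(change, center=0, max_distance=10))
```

Values equally far above and below the center get equally intense colors, so neither side of the scale looks more important than the other.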
Diverging colors are designed to be balanced around a central midpoint. Use them for ordinal and ratio scales, especially when there is a meaningful middle value. They also work for interval scales. Do not use them with categorical data.
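The recommendations in sections 3.1 to 3.3 can be summarized as a small lookup from level of measurement to palette family. This is only one possible reading of those rules: the function name `palette_family` and the `has_meaningful_midpoint` flag are hypothetical, and treating the midpoint as the deciding factor between sequential and diverging palettes is an assumption of this sketch.

```python
LEVELS = {"nominal", "ordinal", "interval", "ratio"}


def palette_family(level, has_meaningful_midpoint=False):
    """Suggest a palette family for a level of measurement.

    Hypothetical helper encoding the rules of thumb from this section:
    nominal data gets categorical colors; ordered or numeric data gets
    sequential colors, or diverging colors when a central value matters.
    """
    level = level.lower()
    if level not in LEVELS:
        raise ValueError(f"unknown level of measurement: {level!r}")
    if level == "nominal":
        return "categorical"
    if has_meaningful_midpoint:
        return "diverging"
    return "sequential"


print(palette_family("nominal"))                                 # categorical
print(palette_family("ratio"))                                   # sequential
print(palette_family("interval", has_meaningful_midpoint=True))  # diverging
```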
4. More about Data | Inferential statistics
Statistical approaches are classified into two types: descriptive statistics and inferential statistics. Inferential tests are in turn either parametric or non-parametric. Parametric tests are used to examine quantitative (rather than qualitative) information, whereas non-parametric tests are more typically employed for qualitative, non-numerical data.
Inferential statistics | Definition | Characteristics | Type of data | Example |
---|---|---|---|---|
Nonparametric Statistical Tests | Statistical tests or methods used when the data being studied come from a sample or population that does not follow a normal distribution. | | | |
Parametric Statistical Analysis | Statistical tests or methods used when the data being studied come from a sample or population that is normally distributed. | | | |
5. References
- Wilke, C. O. (2019). Fundamentals of data visualization: a primer on making info
- Gaddis, M. L., & Gaddis, G. M. (1990). Introduction to biostatistics: part 1, basic concepts. Annals of emergency medicine, 19(1), 86-89.
- EBMC. (n.d.). Nominal Data
- CareerFoundry. (2021, May 7). What is Ratio Data?
- EBMC. (n.d.). Nonparametric Statistical Analysis.
- EBMC. (n.d.). Parametric Statistical Analysis.
- Adobe Spectrum. (n.d.). Data Visualization Fundamentals.
- Adobe Spectrum. (n.d.). Color for Data Visualization.