Data visualization
Data visualization is viewed by many disciplines as a modern equivalent of visual communication. It involves the creation and study of the visual representation of data, meaning “information that has been abstracted in some schematic form, including attributes or variables for the units of information”.
A primary goal of data visualization is to communicate information clearly and efficiently via statistical graphics, plots and information graphics. Numerical data may be encoded using dots, lines, or bars, to visually communicate a quantitative message. Effective visualization helps users analyze and reason about data and evidence. It makes complex data more accessible, understandable and usable. Users may have particular analytical tasks, such as making comparisons or understanding causality, and the design principle of the graphic (i.e., showing comparisons or showing causality) follows the task. Tables are generally used where users will look up a specific measurement, while charts of various types are used to show patterns or relationships in the data for one or more variables.
Sometimes data visualization crosses the line from science into art. It is viewed as a branch of descriptive statistics by some, but also as a grounded theory development tool by others. The rate at which data is generated has increased. Data created by Internet activity and by an expanding number of sensors in the environment, such as satellites, is referred to as “Big Data”. Processing, analyzing and communicating this data presents a variety of ethical and analytical challenges for data visualization. The field of data science, and practitioners called data scientists, have emerged to help address these challenges.
Science is not the only field that depends on data visualization:
- Marketing: selling a product with the support of data.
- Journalism: conveying a large amount of information easily in a world where new events occur every day.
- Business: convincing stakeholders.
- Science: communicating results or performing exploratory data analysis.
- Services: supporting human-computer interaction by communicating data in an accessible way.
A good visualization should:
- show the data
- focus attention on the data rather than on the visualization technique, which means avoiding unnecessary elements
- present many numbers in a small space in a way that is easy to understand
- make large data sets coherent
- encourage the eye to compare different pieces of data
- reveal the data at several levels of detail, from a broad overview to the fine structure
- serve a reasonably clear purpose: description, exploration, tabulation or decoration
- be closely integrated with the statistical and verbal descriptions of a data set
The main types of information addressed in visualization problems are:
- Time-series: A single variable is captured over a period of time. A line chart may be used to demonstrate the trend. Take care with the axis scales, since a truncated or distorted axis can exaggerate or hide the trend.
- Ranking: Categorical subdivisions are ranked in ascending or descending order. A bar chart may be used to show the comparison across the variable values.
- Part-to-whole: Categorical subdivisions are measured as a ratio to the whole. A pie chart or bar chart can show the comparison of ratios.
- Deviation: Categorical subdivisions are compared against a reference, such as a comparison of actual vs. budget expenses for several departments of a business for a given time period. A bar chart can show comparison of the actual versus the reference amount.
- Frequency distribution: Shows the number of observations of a particular variable for a given interval. A histogram, a type of bar chart, may be used for this analysis. A boxplot helps visualize key statistics about the distribution, such as the median, quartiles and outliers.
- Correlation: Comparison between observations represented by two variables (X,Y) to determine if they tend to move in the same or opposite directions or if they have a clear pattern. A scatter plot is typically used.
- Nominal comparison: Comparing categorical subdivisions in no particular order, such as the sales volume by product code. A bar chart may be used for this comparison.
- Geographic or geospatial: Comparison of a variable across a map or layout. A cartogram is a typical choice.
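Before any chart is drawn, most of these message types reduce to a small computation on the data. The following is a minimal sketch in Python, using only the standard library, of that preparation step for five of the types above. All data values and function names are hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import sqrt

# Hypothetical sample data, invented for illustration only.
sales = {"A": 120, "B": 300, "C": 180}    # actual units per product code
budget = {"A": 100, "B": 320, "C": 180}   # reference units per product code
ages = [21, 25, 25, 31, 38, 42, 47, 53]   # one continuous variable
hours = [1, 2, 3, 4, 5]                   # X observations
scores = [52, 55, 61, 70, 72]             # Y observations


def ranking(values):
    """Ranking: categories sorted by value, largest first."""
    return sorted(values.items(), key=lambda kv: kv[1], reverse=True)


def part_to_whole(values):
    """Part-to-whole: each category as a ratio of the total."""
    total = sum(values.values())
    return {k: v / total for k, v in values.items()}


def deviation(actual, reference):
    """Deviation: actual minus reference, per category."""
    return {k: actual[k] - reference[k] for k in actual}


def frequency(values, width):
    """Frequency distribution: observations per interval [lo, lo + width)."""
    counts = Counter((v // width) * width for v in values)
    return dict(sorted(counts.items()))


def pearson(xs, ys):
    """Correlation: Pearson coefficient between two variables."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


print(ranking(sales))        # bar chart, bars ordered by value
print(part_to_whole(sales))  # pie chart or bar chart of ratios
print(deviation(sales, budget))  # bar chart of actual vs. reference
print(frequency(ages, 10))   # histogram with intervals of width 10
print(pearson(hours, scores))    # scatter plot; a value near 1 means X and Y rise together
```

Each function returns plain Python structures, so the result can be passed to whatever plotting tool is in use.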
The main static visualizations are:
- Bar chart: the X axis holds the values of a categorical variable and the Y axis a continuous variable.
- Histogram: the X axis holds a continuous variable divided into intervals (bins) and the Y axis the number of observations in each bin.
- Scatter plot: plots points in 2D or 3D space. All variables are continuous.
- Network: plots the relationships between elements, which can reveal structural patterns such as communities.
- Streamgraph: shows a separate time series for each category, stacked together. It is useful for stream data.
- Treemap: a rectangle subdivided into smaller rectangles. It is used for hierarchical data or proportions.
- Gantt chart: shows processes as bars along a time axis. It is useful for scheduling and planning tasks.
- Heat map: a 2D grid in which each cell is colored according to its value. It resembles a histogram or bar chart, but for two independent variables.
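The difference between a bar chart, a histogram and a heat map is easier to see side by side. The sketch below renders all three as plain text with the Python standard library. It is an illustration under simplifying assumptions, not a substitute for a plotting library: the function names, the sample data and the shade characters are hypothetical, and character shades stand in for colors.

```python
from collections import Counter


def bar_chart(counts, mark="#"):
    """Bar chart: one bar per category; the bar length encodes the value."""
    width = max(len(str(k)) for k in counts)
    return [f"{str(k).ljust(width)} | {mark * v}" for k, v in counts.items()]


def histogram(values, bin_width):
    """Histogram: bin a continuous variable, then draw one bar per bin."""
    bins = Counter(int(v // bin_width) * bin_width for v in values)
    labelled = {f"{lo}-{lo + bin_width}": n for lo, n in sorted(bins.items())}
    return bar_chart(labelled)


def heat_map(pairs, rows, cols, shades=" .:*#"):
    """Heat map: count (row, col) pairs and map each count to a shade."""
    counts = Counter(pairs)
    peak = max(counts.values())
    lines = []
    for r in rows:
        cells = ""
        for c in cols:
            # Scale the count to an index into the shade string.
            level = counts[(r, c)] * (len(shades) - 1) // peak
            cells += shades[level]
        lines.append(f"{r} {cells}")
    return lines


# Hypothetical data, invented for illustration only.
drinks = {"tea": 3, "coffee": 5}
waits = [1.5, 2.0, 7.2, 8.8, 9.1]
visits = [("Mon", "am"), ("Mon", "am"), ("Mon", "pm"), ("Tue", "pm")]

print("\n".join(bar_chart(drinks)))
print("\n".join(histogram(waits, 5)))
print("\n".join(heat_map(visits, ["Mon", "Tue"], ["am", "pm"])))
```

The histogram reuses the bar chart once the continuous values have been binned, which is the sense in which a histogram is a type of bar chart. The heat map adds a second variable, so the value is encoded as a shade instead of a bar length.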
See also
Material
Books
- Post, F. H., Nielson, G., & Bonneau, G. P. (Eds.). (2012). Data visualization: the state of the art (Vol. 713). Springer Science & Business Media.
- Lieberman, L. (2010). The Book of Trees. Orca Book Publishers.