Making Sense of Data through Visualization

Produced by the Department of Energy’s Pacific Northwest National Laboratory, this visualization presents an example of a data-analysis process. The diagram packs in a great deal of information, much of it expressed in jargon that takes a while to decipher, but one unifying image captures the essence of the process and provides a clear framework for interpretation. You can go as deeply into the details as you want, but the overall structure is clear.

To get a better sense of how the diagram works, view the larger version of the image at Wikimedia.

The example concerns interpreting data to help identify insider cybersecurity attacks. The diagram integrates several types of information to give a concise but comprehensive overview of a complicated process. It also mirrors, step for step, the process of visual thinking that Dan Roam describes in his book, The Back of the Napkin.

As described in the diagram, the process consists of four steps:

  • Assembling a flow of data from multiple sources, including email, traffic within the firewall, and traffic coming from outside the firewall.

  • Observing the data to identify discrete “states” such as instant messaging, websites, file sizes, etc.

  • Processing these states to identify actions or events, such as disregarding data policies, harvesting proprietary data, and engaging in suspicious communications.

  • Assessing these actions to construct patterns of suspicious behavior that point to an attempted security breach by an insider.
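The four steps form a pipeline in which each stage condenses the output of the stage before it: raw data becomes states, states become actions, and actions become patterns. The toy Python sketch below shows only that shape. The record fields, state names, action rules, and threshold are hypothetical illustrations chosen for this example; they are not taken from the PNNL diagram and do not represent PNNL's actual method.

```python
from dataclasses import dataclass


# Hypothetical raw observation from one data source (not PNNL's schema).
@dataclass
class Record:
    user: str
    source: str    # e.g. "email", "internal", "external"
    kind: str      # e.g. "instant_message", "website", "file_transfer"
    size_mb: float


def assemble(*sources):
    """Step 1: merge records from several sources into a single flow."""
    return [record for source in sources for record in source]


def observe(records):
    """Step 2: reduce raw records to discrete (user, state) pairs."""
    states = []
    for r in records:
        # Hypothetical rule: very large transfers get their own state.
        if r.kind == "file_transfer" and r.size_mb > 100:
            states.append((r.user, "large_file"))
        else:
            states.append((r.user, r.kind))
    return states


# Hypothetical mapping from states to the kinds of actions the diagram names.
ACTION_RULES = {
    "large_file": "harvesting_data",
    "usb_copy": "disregarding_policy",
    "external_email": "suspicious_communication",
}


def infer_actions(states):
    """Step 3: map states to actions; states with no rule are dropped."""
    return [(user, ACTION_RULES[s]) for user, s in states if s in ACTION_RULES]


def assess(actions, threshold=2):
    """Step 4: flag users whose distinct actions reach the threshold."""
    by_user = {}
    for user, action in actions:
        by_user.setdefault(user, set()).add(action)
    return sorted(u for u, acts in by_user.items() if len(acts) >= threshold)


email = [Record("alice", "email", "external_email", 0.1)]
internal = [
    Record("alice", "internal", "file_transfer", 500.0),
    Record("bob", "internal", "website", 0.0),
]
flagged = assess(infer_actions(observe(assemble(email, internal))))
print(flagged)  # ['alice']
```

Here "alice" is flagged because two distinct hypothetical actions accumulate against her, while "bob" produces only an ordinary state that maps to no action. Keeping each stage a separate function echoes the diagram's point that each step has a distinct product.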

Stripped of this context, the underlying steps are similar to those Roam uses to capture visual thinking. His everyday language couldn’t be simpler: look, see, imagine, show.

  • Look: What is out there?

  • See: What categories and patterns emerge?

  • Imagine: What are the connections that add new meaning?

  • Show: Here’s what I think it all means.

The last step involves finding an effective way to present conclusions that tie everything together. In this diagram, the PNNL staff used a series of simple boxes and arrows to show the basic flow of the process steps.

The relatively complicated text conveys three levels of meaning: descriptions of the product of each step, in the top row of boxes; a brief description of the action performed during each step, between the two rows of boxes; and, finally, the types of information studied or inferred in each step.

The staff also added a flow of images that brings the process to life in an easy-to-understand pattern. Fitting the pieces of a puzzle together is one of the most overworked of all visual metaphors, but here it looks fresh and effective.

The PNNL site also offers several downloadable technical papers that explain the theory behind the visualization process and its application in a training context.