Big Graph Drawing: Metrics and Methods
NII Shonan Meeting:
@ Shonan Village Center,January 12-15, 2015
NII Shonan Meeting Report (ISSN 2186-7437):No.2015-1
- Takayuki Itoh, Ochanomizu University, Japan
- Karsten Klein, Monash University, Australia
- Giuseppe Liotta, University of Perugia, Italy
Description of the meeting
Graphs provide a versatile model for data from a large variety of application domains, including for example biology, finance, information security, telecommunication, software engineering, and social sciences. Graph visualization helps scientists and engineers to understand critical issues in these domains. However, the depth of understanding depends on the quality of the drawing.
For example, Figure 1a shows a graph of read-overlaps for RNA sequencing data with more than one hundred thousand edges, drawn using a standard graph layout method. Unfortunately, the poor layout in Figure 1a shows no insight. On the other hand, using a good layout of the same graph, such as in Figure 1b, a biologist can infer the structure of the sequence; this can lead to a better understanding of the species under investigation.
More generally, good visualization can facilitate efficient visual analysis of the data to detect patterns and trends, and can also be used for the presentation of the underlying information. Important aspects for the development of graph drawing methods are the efficiency and accuracy of the algorithms, and the quality, i.e. readability, of the resulting pictures.
(a)?? Bad layout????????????????????? ?(b) Improved layout
For a successful application of visualization approaches in practice these approaches need to meet the particular requirements for the intended use-cases. Quality metrics need to be developed to measure and optimize the graph drawing quality for the task at hand. These metrics need to cover both how accurate the drawing represents the data as well as how well the drawing can be perceived by the human user.
A number of metrics were developed and validated in the 1980s and 1990s: these classical metrics include the number of edge crossings and the number of edge bends. A number of graph drawing methods based on these metrics were also developed; fundamentally, these methods are algorithms that optimise the metrics. Examples include the well-known Topology-Shape-Metrics approach developed by Tamassia, Di Battista, and Batini. At that time, the size of the data sets was modest, and these classical approaches were successful at their time.
Since the 1990s, the size of relevant data sets has grown exponentially. In biotechnology, sequencing machines produce one terabase of data in a single sequencing run. In the finance arena, algorithmic trading has reached hundreds of thousands of trades per second. Data from social networks, wireless networks, and the web itself continue to grow at a rate that outstrips current methodologies. Human understanding of this data has clear social and economic benefits; this has led to a strong demand for visualization of huge graphs.
The problem is that the classical drawing methods and quality metrics developed in the 1990s do not apply to the visualization of huge graphs. Methods that are currently used are mainly energy-based methods that ignore the classical metrics. Their application is however motivated by some of the requirements from the application areas, including the need for fast layout computation for big and dynamic graphs. While reasonable progress has been made to fit such requirements by providing very fast layout methods, capable of laying out graphs with millions of nodes and edges, the development of corresponding metrics, the evaluation of their efficiency for use in application areas regarding tasks semantics, and the incorporation into efficient visualization methods has not been driven in a similar pace. The number of crossings has for example been the well-accepted main quality criterion in graph drawing for two decades, and only a few years ago a study by Huang, Hong, and Eades revealed that the influence of crossings on the readability can be decreased to a large extent by optimizing the crossing angle. This example however also shows the success of the research on and evaluation of quality metrics, as the results immediately stimulated research on the theory of crossing angle optimization, creating a whole new subfield, and subsequently also leading to new practical realizations of layout approaches with improved readability.
While the research in the graph drawing field focuses mainly on the geometry of the drawing, successful approaches in practice usually need to cover more. Data in practice is not given as just a graph structure, but either as a graph with additional data annotations or as a multidimensional data set from which a graph has to be modeled first. In both cases the additional information that is given beyond the graph’s structure, like domain specific semantics, needs to be visualized in an appropriate way to allow the user to gain insight. Although different domain areas have their own types of data and work flows, there is also an overlap in the tasks for network analysis, including for example the search for important actors, identification of their role, or detection of interaction patterns in biological and social networks. Requirements stemming from these tasks need to be taken into account by experts in visualization research, and usually a close collaboration with domain experts is required during the development phase of new visualization approaches. Also network theory plays an important role for the development of new visualization approaches, as methods and metrics that are used for the analysis of large graphs often are based on results from this field. In addition, for big graphs suitable interaction and navigation methods need to be developed to enable the user to explore the data.
These tasks are main topics of research in the visualization community, which has a symbiotic relationship with the graph drawing community for graph visualization purposes.
2 ??Topics and Aims of the Seminar
The research on big graph visualization touches several research fields, including network visualization, network analysis, graph theory, computational geometry, algorithm engineering, computational complexity, and main application areas like bioinformatics, social sciences, information security, and software engineering. We will bring together experts in the fields to investigate research questions and potential solutions as stated below.
The practical value of current graph drawing approaches degrades with big data sets, and new metrics and methods are needed to tackle the related problems, also taking into account the semantics of the data. The study of planarity-based concepts is an important part of graph drawing research, but for big graphs these concepts are much less useful than for graphs of up to medium size. The workshop will study the theory related with the visual exploration of networks that are well beyond planarity. Optimizing classical metrics like the number of crossings and bends is infeasible for big graphs due to the complexity of the underlying problems, and would most probably be insufficient to create readable visualizations that help to gain insight in the data. One aspect of the work at the workshop will thus be to develop new metrics that are tailored for big graphs. In addition, we will investigate the complexity of the corresponding optimization problems and also computational methods to efficiently optimize the metrics. A further aspect that is lacking both theoretical foundation as well as practical methods is the combination of layout constraints, e.g. given by the application or the user, with big graph drawing methods. As a starting point, the seminar investigates suitable constraint models and the potential for efficient extensions of constraint-based drawing methods.
Appropriate interaction techniques, which e.g. might use graph aggregation, can facilitate exploratory analysis, but require appropriate aggregation methods and layouts. These techniques need to be intuitive for domain experts to facilitate exploration of big networks within the expert’s workflow, and have to preserve both relevant regions of interest as well as global patterns. The seminar will collect the requirements, investigate the theoretical aspects behind them, and make a first step towards the development of aggregation approaches, and corresponding visualization and interaction concepts. Concepts to allow a consistent combination of global and local layout while preserving the user’s mental map are a further topic of research, as existing big graph layout methods show global structure better than local structure. An important task in practice where useful approaches are still missing is big graph comparison. Many of the corresponding problems like maximum common subgraph or (subgraph) isomorphism are computationally hard, and can also not directly be applied to big graphs from practice as the similarity might not lie in a one-to-one matching but global or local patterns. The seminar will look into better suited models, corresponding algorithms, and visualization methods.
Often the geometry of the drawing is considered independently of other visualization aspects, i.e. graph drawing experts develop efficient methods to either optimize classical metrics or to satisfy their readability criteria, which might differ from the criteria in application areas and are often evaluated only by visual inspection. Visualization experts in turn use such methods then as black boxes within their visualization approaches. Combining the efforts has the potential to push both theory and practice, as on the one hand the requirements from practical visualization motivates fundamental research and defines constraints on the acceptable results, and on the other hand visualization can greatly profit from those results. With that in mind, the aims of the seminar are as follows:
- To bring together experts from the graph drawing and information visualization communities from Europe, Asia, the Americas, and Australia that are involved in layout evaluation and large data visualization, and domain experts from application areas that require big graph drawing.
- To investigate the challenges of today’s big data graphs for graph drawing, including layout methods, interaction, and computational aspects.
- To investigate how the evaluation of drawing approaches can be done for such big graphs, including layout quality and interaction efficiency, and to study and compare the performance of existing metrics and methods for real-world instances.
- To create new quality metrics and discuss new methods for big graphs, i.e. fast methods that optimize the metrics and are suitable for use in practice, and new interaction concepts.
- To formulate guidelines for the visualization of large graphs, state the pitfalls, and the problems with current metrics and methods.
- To define and document corresponding research challenges for future research.