Seminars

NO.029 Compact Data Structures for Big Data

Shonan Village Center

September 27 - 30, 2013 (Check-in: September 26, 2013)

Organizers

  • Kunihiko Sadakane
    • National Institute of Informatics, Japan
  • Wing-Kin Sung
    • National University of Singapore, Singapore

Overview

Big Data refers to structured and unstructured datasets whose size is on the order of billions or trillions of items. Because of their diversity and size, such data are difficult to store, search, and analyze. This meeting therefore focuses on algorithms and data structures for the efficient manipulation of Big Data. In particular, the meeting is devoted to compact data structures for managing Big Data.

Typical examples of Big Data are genomic sequences and gene expression data, Web and social networking service (SNS) data, sensor data in intelligent transport systems, and so on. Traditional data structures do not scale to such data, so new data structures must be designed to handle them.

Although the amount of data is exploding, the amount of underlying information inside the data may not grow at the same rate. Many big datasets are observed to be highly redundant. On the Web, many pages are copies of others. In global positioning systems (GPS), position data change continuously and can be compressed using differential encoding. In genomics, although different individuals have different genomes, the individual genomes are highly similar. We can therefore compress such data by identifying the similar parts.

Once the data are compressed, the next issues are how to access and search them efficiently. Traditional data structures are not designed to handle compressed data, and they may not handle Big Data well because their size exceeds the available memory or because search time grows with the size of the data. To address these problems, researchers have been developing compact data structures, also called compressed or succinct data structures. They are much smaller than standard data structures while, in theory, offering the same access time to the data. However, the actual performance of such compact data structures for storing Big Data is unknown or unsatisfactory.
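As an illustration of how such redundancy can be exploited, the following minimal Python sketch shows differential (delta) encoding of a slowly changing sequence, as mentioned above for GPS data. The sample readings and helper names are hypothetical and serve only to demonstrate the idea, not any particular system discussed at the meeting.

    # A minimal sketch of differential (delta) encoding.
    # Sample readings and names are hypothetical illustrations only.

    def delta_encode(values):
        """Store the first value, then only the difference to the previous one."""
        deltas = [values[0]]
        for prev, cur in zip(values, values[1:]):
            deltas.append(cur - prev)
        return deltas

    def delta_decode(deltas):
        """Reconstruct the original sequence by cumulative summation."""
        values = [deltas[0]]
        for d in deltas[1:]:
            values.append(values[-1] + d)
        return values

    if __name__ == "__main__":
        # GPS-like readings that change slowly; the deltas are small numbers
        # that can be stored in far fewer bits than the raw values.
        readings = [139_700_123, 139_700_130, 139_700_128, 139_700_140]
        deltas = delta_encode(readings)
        assert delta_decode(deltas) == readings
        print(deltas)  # [139700123, 7, -2, 12]

After the first value, only small differences remain, which is exactly the kind of redundancy that compressed and succinct data structures aim to exploit while still supporting efficient access and search.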

The aim of this workshop is to bring together researchers active in the area of compact data structures to exchange ideas for handling Big Data. We will discuss methods for compressing and storing Big Data, as well as how to design time- and space-efficient data structures for them. Through discussion and knowledge sharing, we hope to promote collaboration and further improve data structures for Big Data.


Report

No-029.pdf