Compact Data Structures for Big Data

NII Shonan Meeting:

@ Shonan Village Center, September 27-30, 2013

NII Shonan Meeting Report (ISSN 2186-7437): No. 2013-10

Organizers

  • Kunihiko Sadakane (National Institute of Informatics), Japan
  • Wing-Kin Sung (National University of Singapore), Singapore

Overview

Big Data refers to structured and unstructured datasets whose size is on the order of
billions or trillions of items. Because of their diversity and size, such datasets are
difficult to store, search, and analyze. This meeting therefore focuses on algorithms and
data structures for efficient manipulation of Big Data; in particular, it is devoted to
compact data structures for managing Big Data.

Typical examples of Big Data include genomic sequences and gene expression data, Web and
SNS data, and sensor data from intelligent transport systems. Traditional data structures
do not scale to such data, so new data structures must be designed to handle them.

Although the amount of data is exploding, the amount of underlying information inside
the data may not grow at the same rate: many big datasets are highly redundant. On the
Web, many pages are copies of others. In global positioning systems (GPS), position data
change only gradually over time, so they can be compressed using differential encoding.
In genomics, although different individuals have different genomes, the individual
genomes are highly similar. We can therefore compress such data by identifying the
similar parts. Once the data are compressed, the next issue is how to access and search
them efficiently. Traditional data structures are not designed to handle compressed
data, and they may not cope with Big Data well: either the data structures exceed the
available memory, or search time grows because of their size. To address these problems,
researchers have been developing compact data structures, also called compressed or
succinct data structures. They are much smaller than standard data structures while, in
theory, offering the same access time to the data. However, the actual performance of
such compact data structures on Big Data is unknown or unsatisfactory.
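
The differential (delta) encoding mentioned above can be illustrated with a small
sketch. The following Python fragment is an illustrative example, not taken from the
report; the sample position values are made up, and a real GPS trace would additionally
use a variable-length code so that small deltas occupy only a few bits each.

    def delta_encode(values):
        # Store the first value, then only the difference to the previous value.
        # Consecutive position readings change little, so the deltas are small.
        if not values:
            return []
        encoded = [values[0]]
        for prev, cur in zip(values, values[1:]):
            encoded.append(cur - prev)
        return encoded

    def delta_decode(encoded):
        # Rebuild the original sequence by accumulating the deltas.
        values = []
        total = 0
        for d in encoded:
            total += d
            values.append(total)
        return values

    # Hypothetical latitude readings, scaled to integers (one per second).
    positions = [356812345, 356812347, 356812350, 356812351, 356812355]
    encoded = delta_encode(positions)      # [356812345, 2, 3, 1, 4]
    assert delta_decode(encoded) == positions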

The aim of this workshop is to bring together researchers active in the area of compact
data structures to exchange ideas for handling Big Data. We will discuss methods for
compressing and storing Big Data, as well as how to design time- and space-efficient
data structures for them. Through discussion and knowledge sharing, we hope to promote
collaborations and further improve data structures for Big Data.
