Intelligent Information Processing – Chances of Crowdsourcing


NII Shonan Meeting Seminar 034

Meeting Notes

In this section, we are gradually collecting results and discussion notes from the meeting.

We’ll start with the introduction slides presented by Tilo and Christoph: Introduction Slides


Currently, a variety of platforms like Amazon’s Mechanical Turk, CrowdFlower, or SamaSource offer frameworks with different degrees of sophistication where (usually relatively simple) cognitive tasks can be dynamically posed to a large and readily available workforce. This ability to cheaply distribute simple jobs via the Web allows for new modes of labor and information processing. In fact, the “knowledge society” has already brought profound changes to business processes in today’s economy. This is especially true for the basic question of what work people do and where they do it.

Here, the ubiquity of sophisticated mobile devices and communication services allows for almost unlimited flexibility and freedom in negotiating and outsourcing short-term work contracts and in delivering results. Currently, mobile crowdsourcing by smartphone users is a hot research area. In any case, the industrialized world is clearly moving from the traditional production of goods or processing of raw materials towards the provisioning of services, and the flexibility with respect to the place where such services are physically provided has increased dramatically. Still, although services could in principle be offered from virtually anywhere in the world, typical constraints like the local cost of labor or easy access to an educated workforce remain valid. Crowdsourcing promises to break with these traditional work models by offering a dynamic global information-processing workforce which is available 24/7 with close to no overhead. This shift paves the way for approaching large-scale information tasks which were previously infeasible for both algorithmic and traditional human-based approaches.

The central challenge in the current knowledge society is to deal efficiently and intelligently with an overwhelming amount of information, a daunting task for computer systems and humans alike. To this end, the data management and data mining communities consider a wide variety of operators, algorithms, and workflows.

For some information-heavy areas such as customer relationship management, where everyday services like ordering procedures, customer data management, complaint handling, etc. have to be performed, out-sourcing the work to specialized workers has become a commonly accepted solution for increasing efficiency. Although such services do not produce anything in the traditional material sense, they are critical for company goals like efficient sales handling, customer satisfaction and retention, etc. Whereas such tasks used to be done on-site, nowadays ‘call centers’ all over the world centrally provide such services at considerably reduced costs for a large number of customers. These services are quite basic and require little formal education to provide. On an educationally higher level, business intelligence services can serve as a good example: extracting relevant information from company data and using it to recognize or design value-adding areas like new products, promising customer segments, or better business processes for a company is a profitable business. Indeed, ‘infopreneur’ is a term coined for the growing number of persons whose primary business is gathering and selling electronic information. However, this current form of out-sourcing information-centric tasks is still quite static (i.e. a fixed team of specialists is contracted for a larger task). In contrast, crowdsourcing as understood in this proposal dynamically assigns small intelligence tasks to workers from a large pool in a demand-driven fashion. The advantages are obvious: if at creation time each process can be effectively broken down into manageable tasks and a viable time plan, it can be fulfilled very efficiently. The main factor is elasticity: peaks and slumps in activity can be handled dynamically, and missing expertise or competences can be contracted on demand. Thus, the efficiency of the overall process is hard to beat.

The main purpose of this Shonan meeting is to bring together researchers from the fields of data management, information processing, HCI, and mobile computing to discuss the technical challenges, possible societal impact, and promising industrial applications of on-demand crowdsourcing techniques for large-scale information management challenges. The seminar puts a clear focus on operations in data management and data processing workflows. Indeed, there are many open questions to discuss: How can operators/workflows benefit from crowdsourcing? Can the resulting quality be controlled? Which workers should be selected? How can expected response times be determined? How should privacy risks be dealt with?

As stated above, special attention should be paid to crowd-sourceable operators for applications in data and information management, information organization, and information access. In recent years, algorithms aimed at these tasks have attracted a lot of attention, and indeed, methods have grown quite powerful even over huge and largely unstructured information repositories like the Web. Applications are almost limitless, ranging from basic information extraction over knowledge management to complex business intelligence.

However, with more complex information processing, retrieval, or mining capabilities, the algorithms’ complexity, susceptibility to errors, and danger of overspecialization also increase. Since most failings can be traced back to limited cognitive abilities, missing contextual knowledge, or heuristics gone wrong, the idea of direct human supervision and intervention at processing time is currently pursued in many domains. But the quality of the work delivered by workers also raises concerns: today’s platforms are facing spam, and individual workers’ work quality, skill, and reliability have to be measured for effective quality control. While simple methods like gold questions or majority vote may work well for spam detection, more complex quality assessment needs new and more powerful models. Ranking schemes based on reputation mechanisms already play a vital role in Web platforms where matchings or transactions between anonymous parties are brokered. Hence, their applicability to crowd-sourcing scenarios should be discussed.
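
To make the two simple quality-control methods mentioned above more concrete, the following minimal sketch first filters out workers who perform poorly on gold questions and then aggregates the remaining answers by majority vote. The data layout, the function names, the worker names, and the 0.7 accuracy threshold are illustrative assumptions and do not correspond to the API of any particular platform.

```python
from collections import Counter, defaultdict

def worker_accuracy(answers, gold):
    """Return each worker's share of correct answers on gold questions.

    answers: list of (worker, task, label) tuples (assumed layout).
    gold: dict mapping a gold task to its known correct label.
    """
    correct = defaultdict(int)
    seen = defaultdict(int)
    for worker, task, label in answers:
        if task in gold:
            seen[worker] += 1
            correct[worker] += int(label == gold[task])
    return {w: correct[w] / seen[w] for w in seen}

def majority_vote(answers, gold, min_accuracy=0.7):
    """Aggregate labels per non-gold task, ignoring untrusted workers."""
    accuracy = worker_accuracy(answers, gold)
    votes = defaultdict(Counter)
    for worker, task, label in answers:
        # Assumption: workers who never saw a gold question are trusted.
        trusted = accuracy.get(worker, 1.0) >= min_accuracy
        if task not in gold and trusted:
            votes[task][label] += 1
    # Ties are not handled specially in this sketch.
    return {task: counts.most_common(1)[0][0] for task, counts in votes.items()}

# Toy data: "g1" is a gold question, "t1" and "t2" are real tasks.
answers = [
    ("ann", "g1", "cat"), ("ann", "t1", "cat"), ("ann", "t2", "dog"),
    ("bob", "g1", "cat"), ("bob", "t1", "cat"), ("bob", "t2", "dog"),
    ("spam", "g1", "dog"), ("spam", "t1", "dog"), ("spam", "t2", "cat"),
]
gold = {"g1": "cat"}
result = majority_vote(answers, gold)
```

In this toy run, the worker who fails the gold question is excluded before voting. Such a filter only catches careless or random answers; it does not replace the more powerful quality models called for above.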

In fact, the need for human assistance in bridging the final semantic gap for today’s information processing has already given rise to information systems that rely on hybrid architectures. Such hybrid architectures transparently combine the efficiency of current algorithms with the cognitive power and flexibility of humans.

Here, generally two design directions are popular:

  • Using human input for improving the steps performed by information processing algorithms by providing training samples, answering questions about ambiguous results, or by providing relevance feedback.
  • Involving humans directly in the information processing workflow by explicitly out-sourcing some of the required tasks or operators within the process.
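
As a rough illustration of such a hybrid architecture, the sketch below lets an algorithm label items on its own when it is confident and hands the remaining items to human workers. The answers collected from humans are also returned so that they could later serve as training samples. The functions `classify` and `ask_crowd`, the confidence threshold of 0.8, and the toy stand-ins are hypothetical placeholders, not calls into a real crowdsourcing platform.

```python
def hybrid_label(items, classify, ask_crowd, threshold=0.8):
    """Label items automatically where possible, otherwise ask humans.

    classify: callable, item -> (label, confidence between 0 and 1).
    ask_crowd: callable, item -> label; stands in for posting a task
               to a crowdsourcing platform (hypothetical).
    """
    results = {}
    crowd_labeled = []  # human answers, reusable as training samples
    for item in items:
        label, confidence = classify(item)
        if confidence < threshold:
            # The algorithm is unsure: out-source this item to the crowd.
            label = ask_crowd(item)
            crowd_labeled.append((item, label))
        results[item] = label
    return results, crowd_labeled

# Toy stand-ins for a sentiment classifier and for human workers.
def toy_classifier(text):
    if "great" in text:
        return "positive", 0.95
    if "awful" in text:
        return "negative", 0.90
    return "positive", 0.50  # low confidence, e.g. possible sarcasm

def toy_crowd(text):
    return "negative"  # pretend the workers recognized the sarcasm

reviews = ["great phone", "awful battery", "oh sure, just what I needed"]
labels, feedback = hybrid_label(reviews, toy_classifier, toy_crowd)
```

Only the third review is sent to the crowd in this example, which keeps human effort limited to the cases the algorithm cannot handle.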

Both general approaches are still very new, and no established research community has yet developed for crowd-assisted information processing algorithms. This Shonan meeting can provide a significant stimulus to the research community in order to advance this still new field of interest.

Topics of Interest:

The meeting is primarily intended to focus on topics and problems related to information and knowledge processing. In this area, there are many tasks for which basic algorithmic approaches exist but fall short because they often cannot correctly grasp the semantics of the data they operate on. Here, we envision crowd-sourcing techniques running in parallel in a hybrid system and supplementing the algorithms when necessary. In particular, operators and algorithms from the following areas shall be discussed with their potential synergy with crowd-sourcing in mind:

  • Complex database operators like cognitive comparison and similarity functions, as for example sorting or joining images, ambiguous labels, descriptions, etc.
  • Information and knowledge mining tasks, as for example entity and relation detection, entity reconciliation, or improving typical extraction patterns
  • Improving data or knowledge representation, as for example schema matching, ontology cleaning, or data cleaning
  • Sensor data stream processing (e.g., energy efficient stream join, uncertain stream processing)
  • Obtaining cognitive meta-data from natural-language, as for example sentiment or emotion analysis, intention detection, sarcasm detection, etc.
  • Semantic querying and retrieval, as for example question answering techniques or semantically-aware information retrieval algorithms
  • Privacy issues, especially for mobile participants (e.g., location, trajectory, POI)
  • Ethics of crowd-computing: discussions and insights on how the large-scale application of crowd-sourcing affects both workers and information management systems from an ethical perspective
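
To illustrate the first topic in the list above, a crowd-powered sort operator can be sketched as an ordinary sort whose comparison function is answered by several workers and decided by majority. The judges below are simulated with plain functions, and the names `crowd_compare` and `crowd_sort` as well as the hidden quality scores are assumptions made only for this example.

```python
from functools import cmp_to_key

def crowd_compare(a, b, judges):
    """Return -1, 0, or 1 according to the majority of pairwise judgments.

    Each judge is a callable (a, b) -> -1 if a should come first, else 1.
    """
    score = sum(judge(a, b) for judge in judges)
    return (score > 0) - (score < 0)

def crowd_sort(items, judges):
    """Sort items using crowd judgments as the comparison function."""
    key = cmp_to_key(lambda a, b: crowd_compare(a, b, judges))
    return sorted(items, key=key)

# Simulated workers: two careful judges and one who always answers 1.
quality = {"blurry": 1, "ok": 2, "sharp": 3}  # hidden ground truth

def careful_judge(a, b):
    return -1 if quality[a] < quality[b] else 1

def careless_judge(a, b):
    return 1

images = ["sharp", "blurry", "ok"]
ranking = crowd_sort(images, [careful_judge, careful_judge, careless_judge])
```

A real operator would also have to limit the number of comparisons that are posed, since each one costs money and time.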


Wolf-Tilo Balke, Technische Universität Braunschweig, Germany

Takahiro Hara, University of Osaka, Japan

Seung-won Hwang, POSTECH University, South Korea

Christoph Lofi, NII Tokyo, Japan


Preliminary schedule (more details will follow):

Arrival Day (Sunday, 17th November)

15:00-18:30 Check-In in Shonan Center
19:00-20:30 Welcome Dinner
21:00- Free Time

Day 1 (Monday, 18th November)

In one of the early seminar sessions, we will have an introduction round with all participants / organizers introducing themselves.
Please prepare a brief presentation of roughly 10 minutes for that purpose, introducing yourself and your research interests. Please explain how your research currently involves crowd-sourcing for intelligent information processing, or how crowd-sourcing techniques might help to address problems currently found in your research area. These introductions should raise interesting problems or provide insights which can be used to start further discussions.

07:30-09:00 Breakfast
09:00-09:10 Shonan Introduction by Staff
09:10-12:00 Seminar Session with one Coffee Break
– Opening briefing from organizers
– Position talks from participants
12:00-14:00 Lunch with Photo Shooting
14:00-18:00 Seminar Session with one Coffee Break
– Position talks from participants (continued)
– Discussion to categorize the issues addressed by the participants
18:30-19:30 Dinner
19:30- Free Time

Day 2 (Tuesday, 19th November)

07:30-09:00 Breakfast
09:00-12:00 Seminar Session with one Coffee Break
– Break-out Sessions to discuss important issues and find new research directions on the target topic
12:00-13:30 Lunch
13:30-18:00 Seminar Session with one Coffee Break
– Break-out Sessions (continued)
– Prepare for group presentation
18:30-19:30 Dinner
19:30- Free Time

Day 3 (Wednesday, 20th November)

07:30-09:00 Breakfast
09:00-12:00 Seminar Session with one Coffee Break
– Presentation from each group and discussion
12:00-13:30 Lunch
13:30-18:00 Excursion to Kamakura
19:00-21:30 Banquet Dinner
21:30- Free Time

Day 4 (Thursday, 21st November)

07:30-09:00 Breakfast
09:00-12:00 Seminar Session with one Coffee Break
– Idea marketplace and future collaborations
– Final organizer presentation and wrap up
12:00-13:30 Lunch



During our excursion on the 3rd seminar day (which unfortunately incurs an additional fee of 5500 Yen per person, but it will definitely be worth it!), we will visit the northern parts of Kamakura. Kamakura used to be the de facto capital of Japan as the seat of the Shogunate and the Regency in the Kamakura Period (roughly from the late 12th century until 1333). Therefore, Kamakura features many impressive temples and shrines, and can be seen as a “little Kyoto”. Also, there are many forests around Kamakura, and mid-November is said to be the nicest and most impressive time to visit because of the autumn leaves.

We will visit the impressive Kenchoji Mountain Temple, one of the Zen head temples.

We will have tea at the Tsurugaoka Hachimangu Shrine afterwards.


After the excursion, we will go to a Japanese restaurant for the banquet.



Accommodation is provided directly by the Shonan Village Center; no external hotels are necessary during the seminar. Details on how to reserve one of the rooms have been sent out by email. Please use the personalized link in that mail to book your room.

Traveling to Shonan Village

There are several routes from the Tokyo area or Narita airport. In general, first go by train to either Zushi (JR, Japan Railways) or Shin-Zushi (Keikyu Railways), and then take a bus or a taxi. For visitors from abroad, we recommend sharing a taxi from Zushi or Shin-Zushi because it is simply easier. Train routes and schedules can be searched with the train route finders below.

More information is available on the Shonan Village Home Page.

You may use one of the following train schedule searches (Google Maps usually also works fine).

Jorudan Train Route Finder.
For the destination, enter “Zushi” or “Shin-Zushi”. As the Tokyo area is served by several railway companies, you will see several choices. Please pick the one that seems most convenient.

Narita Airport Access Planner. You can search for train schedules from/to Narita Airport.

Recommended routes from Tokyo/Airports to Zushi

Note: The following may not be the best routes, but they are usually quite good.

From Tokyo Narita Airport.
Take the JR Narita Express to “Ofuna” or “Yokohama”, and then take the JR Yokosuka Line to Zushi. Please note that the Narita Express is comparatively expensive, but also quite fast and easy.

From Tokyo Haneda Airport.
Take Keikyu Haneda Airport Line. Change at Keikyu Kamata station to Keikyu Line, and then change at Kanazawa-Hakkei to Keikyu Zushi Line and get off at Shin-Zushi terminal.

From Tokyo Jimbocho area (where NII is located if you happen to visit it).
Either go to Tokyo station by taxi (which costs around 1,500 yen) and then take the JR Yokosuka Line to Zushi,
or go to Shibuya station by Tokyo Metro and then take the JR Shonan-Shinjuku Line to Zushi.

From Zushi or Shin-Zushi to Shonan Village Center

You can take a bus (which leaves once every 30 to 60 minutes and takes 30 min from Zushi to Shonan Village Center) or a taxi, which costs 2,500 to 3,000 yen and takes about 20 min.
As bus drivers are not likely to speak English, we recommend sharing a taxi (or finding a companion who speaks Japanese if you want to take the bus). Here are some Japanese messages to show to a taxi driver.

Bus Time Table


Wolf-Tilo Balke, Technische Universität Braunschweig, Germany
Seung-won Hwang, POSTECH University, South Korea
Takahiro Hara, University of Osaka, Japan
Christoph Lofi, National Institute of Informatics, Japan
Yukino Baba, University of Tokyo, Japan
Sozo Inoue, Kyushu Institute of Technology, Japan
Koichi Kise, Osaka Prefecture University, Japan
Kai Kunze, Osaka Prefecture University, Japan
Kinda El Maarry, Technische Universität Braunschweig, Germany
Yoshifumi Masunaga, Ochanomizu University, Japan
Shigeo Matsubara, Kyoto University, Japan
Atsuyuki Morishima, University of Tsukuba, Japan
Stephan Sigg, Technische Universität Braunschweig, Germany
Yoshito Tobe, Aoyama Gakuin University, Japan
Sanjay Madria, Missouri University of Science and Technology, United States
Jiyin He, Centrum Wiskunde en Informatica, Netherlands
Xuan Zhou, Renmin University of China, China
Koji Zettsu, NICT, Japan
Nestor Alvaro, National Institute of Informatics, Japan
Hyunsouk Cho, POSTECH, South Korea
Victor Muntés-Mulero, CA Technologies, Spain