NO.034 Intelligent Information Processing – Chances of Crowdsourcing

Shonan Village Center

November 18 - 21, 2013 (Check-in: November 17, 2013)


  • Wolf-Tilo Balke
    • Technische Universität Braunschweig, Germany
  • Seung-won Hwang
    • POSTECH University, South Korea
  • Takahiro Hara
    • University of Osaka, Japan
  • Christoph Lofi
    • National Institute of Informatics, Japan


Currently, a variety of platforms like Amazon’s Mechanical Turk, CrowdFlower, or Samasource offer frameworks with different degrees of sophistication in which (usually relatively simple) cognitive tasks can be dynamically posed to a large and readily available workforce. This ability to cheaply distribute simple jobs via the Web enables new modes of labor and information processing. In fact, the “knowledge society” has already brought profound changes to business processes in today’s economy. This is especially true for the basic question of what people work on and where they work.

Here, the ubiquity of sophisticated mobile devices and communication services allows for almost unlimited flexibility and freedom in negotiating and outsourcing short-term work contracts and delivering results. Mobile crowdsourcing by smartphone users is currently an active research area. More generally, the industrialized world is clearly transitioning from the traditional production of goods or processing of raw materials towards the provisioning of services, and the flexibility with respect to where such services are physically provided has increased dramatically. Still, although services could in principle be offered from virtually anywhere in the world, typical constraints like the local cost of labor or easy access to an educated workforce remain valid. Crowdsourcing promises to break with these traditional work models by offering a dynamic, global information-processing workforce that is available 24/7 with close to no overhead. This shift paves the way for approaching large-scale information tasks that were previously infeasible for both algorithmic and traditional human-based approaches.

The central challenge in the current knowledge society is to deal efficiently and intelligently with an overwhelming amount of information, a daunting task for computer systems and humans alike. To this end, the data management and data mining communities consider a wide variety of operators, algorithms, and workflows.

In some information-heavy areas, for example customer relationship management, where everyday services like ordering procedures, customer data management, and complaint handling have to be performed, outsourcing the work to specialized workers has become a commonly accepted way of increasing efficiency. Although such services do not produce anything in the traditional material sense, they are critical for company goals like efficient sales handling, customer satisfaction, and customer retention. Whereas such tasks used to be done on-site, nowadays ‘call centers’ all over the world centrally provide such services for a large number of customers at considerably reduced cost. These services are quite basic and require little education to provide. At an educationally higher level, business intelligence services serve as a good example: extracting relevant information from company data and using it to recognize or design value-adding areas like new products, promising customer segments, or better business processes is a profitable business. Indeed, ‘infopreneur’ is a term coined for the growing number of persons whose primary business is gathering and selling electronic information. However, this current form of outsourcing information-centric tasks is still quite static (i.e., a fixed team of specialists is contracted for a larger task). In contrast, crowdsourcing as understood in this proposal dynamically assigns small intelligence tasks to workers from a large pool in a demand-driven fashion. The advantages are obvious: if at creation time each process can be effectively broken down into manageable tasks with a viable time plan, it can be fulfilled very efficiently. The main factor is elasticity: peaks and slumps in activity can be handled dynamically, and missing expertise or competences can be contracted as needed. Thus, the efficiency of the overall process is hard to beat.

The main purpose of this Shonan meeting is to bring together researchers from the fields of data management, information processing, HCI, and mobile computing to discuss the technical challenges, possible societal impact, and promising industrial applications of on-demand crowdsourcing techniques for vast information management challenges. The seminar puts a clear focus on operations in data management and data processing workflows. Indeed, there are many open questions to discuss: How can operators and workflows benefit from crowdsourcing? Can the resulting quality be controlled? Which workers should be selected? How can expected response times be determined? How should privacy risks be handled?

As stated above, special attention should be paid to crowdsourceable operators for applications in data and information management, information organization, and information access. In recent years, algorithms aimed at these tasks have attracted a lot of attention, and indeed, methods have grown quite powerful even over huge and largely unstructured information repositories like the Web. Applications are almost limitless, ranging from basic information extraction through knowledge management to complex business intelligence.

However, as information processing, retrieval, and mining capabilities grow more complex, so do the algorithms’ complexity, their susceptibility to errors, and the danger of overspecialization. Since most failings can be traced back to limited cognitive abilities, missing contextual knowledge, or heuristics gone wrong, the idea of direct human supervision and intervention at processing time is currently pursued in many domains. But the quality of the work delivered by workers also raises concerns: today’s platforms face spam, and individual workers’ work quality, skill, and reliability have to be measured for effective quality control. While simple methods like gold questions (tasks with known answers) or majority votes may work well for spam detection, more complex quality assessment needs new and more powerful models. Ranking schemes based on reputation mechanisms already play a vital role in Web platforms where matches or transactions between anonymous parties are brokered. Hence, their applicability to crowdsourcing scenarios should be discussed.
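To make the simple quality-control methods mentioned above more concrete, the following is a minimal Python sketch, not a method prescribed by this proposal. It assumes that answers arrive as hypothetical (worker, question, label) tuples and that a few questions have known gold answers. Workers whose accuracy on the gold questions falls below an illustrative threshold (min_accuracy = 0.7, an assumption) are treated as likely spammers, and the remaining answers are aggregated by plain majority vote.

```python
from collections import Counter, defaultdict


def gold_accuracy(answers, gold):
    """Fraction of gold questions each worker answered correctly.

    answers: list of (worker, question, label) tuples (hypothetical format)
    gold: dict mapping a gold question to its known correct label
    """
    correct = defaultdict(int)
    seen = defaultdict(int)
    for worker, question, label in answers:
        if question in gold:
            seen[worker] += 1
            correct[worker] += int(label == gold[question])
    return {w: correct[w] / seen[w] for w in seen}


def filtered_majority_vote(answers, gold, min_accuracy=0.7):
    """Majority vote over non-gold questions, ignoring workers whose gold
    accuracy is below min_accuracy. Workers who answered no gold question
    are ignored as well; ties are not handled specially."""
    accuracy = gold_accuracy(answers, gold)
    votes = defaultdict(Counter)
    for worker, question, label in answers:
        if question in gold or accuracy.get(worker, 0.0) < min_accuracy:
            continue
        votes[question][label] += 1
    # Pick the label with the most votes for each question.
    return {q: counts.most_common(1)[0][0] for q, counts in votes.items()}


answers = [
    ("alice", "g1", "cat"), ("alice", "q1", "dog"), ("alice", "q2", "cat"),
    ("bob", "g1", "cat"), ("bob", "q1", "dog"), ("bob", "q2", "dog"),
    ("carol", "g1", "cat"), ("carol", "q1", "dog"), ("carol", "q2", "cat"),
    ("spam", "g1", "dog"), ("spam", "q1", "cat"), ("spam", "q2", "dog"),
]
gold = {"g1": "cat"}
print(filtered_majority_vote(answers, gold))  # {'q1': 'dog', 'q2': 'cat'}
```

In this toy example, the spam worker fails the gold question and is excluded; without that filter, q2 would end in a 2:2 tie. A reputation-based variant could weight each vote by the worker's estimated accuracy instead of discarding workers outright.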

In fact, the need for human assistance in bridging the final semantic gap in today’s information processing has already given rise to information systems that rely on hybrid architectures. Such hybrid architectures transparently combine the efficiency of current algorithms with the cognitive power and flexibility of humans.

Here, two general design directions are popular:

  • Using human input to improve the steps performed by information processing algorithms, by providing training samples, answering questions about ambiguous results, or providing relevance feedback.
  • Involving humans directly in the information processing itself, explicitly outsourcing some of the required tasks or operators within the process.
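A minimal Python sketch of how both design directions can meet in one component: an algorithm labels items itself when it is confident and outsources the remaining, ambiguous items to human workers, whose answers could also be fed back as training samples. The confidence threshold, the classifier, and the `ask_crowd` oracle are hypothetical stand-ins; a real system would post tasks to a crowdsourcing platform and collect answers asynchronously.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical interfaces: an algorithm returns (label, confidence in [0, 1]);
# a crowd oracle stands in for posting a task to a crowdsourcing platform.
Classifier = Callable[[str], Tuple[str, float]]
CrowdOracle = Callable[[str], str]


def hybrid_label(items: List[str], classify: Classifier, ask_crowd: CrowdOracle,
                 threshold: float = 0.8) -> Tuple[Dict[str, str], List[str]]:
    """Keep confident algorithmic labels; outsource the rest to the crowd.

    Returns all labels plus the items that were answered by humans, which
    could later be reused as training samples for the algorithm.
    """
    labels: Dict[str, str] = {}
    outsourced: List[str] = []
    for item in items:
        label, confidence = classify(item)
        if confidence >= threshold:
            labels[item] = label
        else:
            labels[item] = ask_crowd(item)
            outsourced.append(item)
    return labels, outsourced


def toy_classifier(text: str) -> Tuple[str, float]:
    # Keyword heuristic that is unsure about anything it does not recognize.
    if "great" in text:
        return "positive", 0.95
    if "awful" in text:
        return "negative", 0.9
    return "positive", 0.5


def toy_crowd(text: str) -> str:
    # Simulated human judgment that recognizes sarcasm.
    return "negative" if "yeah, right" in text else "neutral"


reviews = ["great phone", "awful battery", "yeah, right, best phone ever"]
labels, outsourced = hybrid_label(reviews, toy_classifier, toy_crowd)
print(labels)
print(outsourced)  # ['yeah, right, best phone ever']
```

Only the sarcastic review, where the keyword heuristic is unsure, reaches the (simulated) crowd; the threshold trades crowd cost against the risk of keeping wrong algorithmic labels.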

Both general approaches are still very new, and no established research community has yet developed for crowd-assisted information processing algorithms. This Shonan meeting can provide a significant stimulus to the research community in order to advance this still new field of interest.

2.1. Topics of Interest:

The meeting is primarily intended to focus on topics and problems related to information and knowledge processing. In this area, there are many tasks for which basic algorithmic approaches exist but fall short, because they often cannot correctly grasp the semantics of the data they operate on. Here, we envision crowdsourcing techniques running in parallel in a hybrid system and supplementing the algorithms when necessary. In particular, operators and algorithms from the following areas shall be discussed with their potential synergy with crowdsourcing in mind:

  • Complex database operators like cognitive comparison and similarity functions, as for example sorting or joining images, ambiguous labels, descriptions, etc.
  • Information and knowledge mining tasks, as for example entity and relation detection, entity reconciliation, or improving typical extraction patterns
  • Improving data or knowledge representation, as for example schema matching, ontology cleaning, or data cleaning
  • Sensor data stream processing (e.g., energy-efficient stream joins, uncertain stream processing)
  • Obtaining cognitive meta-data from natural language, as for example sentiment or emotion analysis, intention detection, sarcasm detection, etc.
  • Semantic querying and retrieval, as for example question answering techniques or semantically-aware information retrieval algorithms
  • Privacy issues, especially for mobile participants (e.g., location, trajectory, POI)
  • Ethics of crowd computing: discussions and insights on how the large-scale application of crowdsourcing affects both workers and information management systems from an ethical perspective
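As an illustration of the cognitive comparison operators listed above (e.g., sorting images), here is a minimal Python sketch of a crowd-powered sort under assumptions not taken from this proposal: each pairwise comparison is posed to a small, odd-sized group of workers and decided by majority vote, and the workers are simulated by hypothetical functions (`careful`, `contrarian`). Each comparison costs one question per worker, so practical designs would try to keep the number of comparisons low.

```python
from functools import cmp_to_key
from typing import Callable, List, Sequence

# Hypothetical worker interface: answers "should a be ranked before b?"
Worker = Callable[[str, str], bool]


def crowd_comparator(workers: Sequence[Worker]) -> Callable[[str, str], int]:
    """Comparator that decides each pair by majority vote among the workers.

    An odd number of workers avoids ties. Majority judgments can be
    intransitive; sorted() still terminates, but the order may be imperfect.
    """
    def compare(a: str, b: str) -> int:
        if a == b:
            return 0
        yes_votes = sum(1 for worker in workers if worker(a, b))
        return -1 if 2 * yes_votes > len(workers) else 1
    return compare


def crowd_sort(items: List[str], workers: Sequence[Worker]) -> List[str]:
    return sorted(items, key=cmp_to_key(crowd_comparator(workers)))


# Simulated workers judging photo quality (scores are hidden from the system).
quality = {"photo_a": 3, "photo_b": 9, "photo_c": 5}


def careful(a: str, b: str) -> bool:
    return quality[a] > quality[b]  # better photo first


def contrarian(a: str, b: str) -> bool:
    return not careful(a, b)  # a spammer who always answers the opposite


print(crowd_sort(list(quality), [careful, careful, contrarian]))
# ['photo_b', 'photo_c', 'photo_a']
```

With two careful workers outvoting one contrarian, every pairwise decision is correct and the photos come out best first; with only the contrarian, the order would be reversed, which is why quality control and worker selection matter for such operators.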