NO.170 Privacy, Ethics, and Legislation for Speech Communication

Shonan Village Center

March 23 - 27, 2020 (Check-in: March 22, 2020)


  • Stephan Sigg
    • Aalto University, Finland
  • Andreas Nautsch
    • EURECOM, France
  • Junichi Yamagishi
    • National Institute of Informatics, Japan


Advances in voice interface technology enable unprecedented services from networked virtual assistants. Examples include directional audio reception, energy-efficient collaborative processing, unified multi-device user interfaces, context-adaptive optimization of smart assistants, and capability-based task allocation.

Together with advances in artificial intelligence, the recent cornucopia of digital voice-based smart assistants therefore promises exceptional potential for ubiquitous and continuous perception and for proactive user assistance tailored to the context at hand.

However, this vision raises not only technical challenges: privacy concerns and the moral-psychological, legal and interface-design aspects of such intelligent machines also need to be taken into account. For instance, eavesdropping by third parties could lead to misuse by state, commercial and criminal actors. The use of this technology could lead to unfair profiling decisions made by algorithms, and thereby to discrimination against individuals and the denial of crucial services. Further, the scope for autonomy of these technologies has not yet been defined. Future versions could have unforeseen abilities, risks and morally relevant capacities. With this interdisciplinary meeting, we intend to derive holistic foundations for addressing these challenges, in order to open new business opportunities and to lay the conceptual basis for scientific exploitation in this novel research domain.

By taking positions on Privacy, Ethics, and Legislation for Speech Communication, the proposed Shonan meeting enriches the interdisciplinary dialogue, informs the opinion papers on which new legislation is based, and outlines better designs of speech technology. Transparency and mutual understanding between technology experts and experts in privacy legislation allow for appropriate proposals to tenders in areas where privacy risks are greater than commonly understood, e.g. speech communication in health care, or involving the speech of elderly people or children. Clear yet concise Data Protection Impact Assessments (DPIAs) are in high demand; however, only an interdisciplinary effort can provide solutions. The proposed Shonan meeting will supply such an effort with global impact, since privacy legislation and technology solutions are a rising market not only in Europe but also in the US and other countries.

This meeting will establish the conceptual, moral and technical foundations for intuitive privacy mechanisms, well-founded ethical boundaries, privacy-preserving user interfaces that are intuitive and implicit, and proper legislation that establishes a fair balance of power between all involved actors.

Ethics and Privacy

Speech communication for virtual assistants has been demonstrated in products such as Siri, Alexa and Cortana. It can be integrated into smart homes, mobile equipment and robotic devices. This raises questions regarding moral relationships between humans and such assistants. Examples are autonomous cars, nursing robots, or robot prostitution. Furthermore, it is important to understand how society and expectations of social interaction are changed by such technology.

Virtual assistants that exploit speech audio are bound to overhear all ongoing communication and might need to upload large parts of the overheard audio to a remote server for processing. Naturally, this scenario raises several questions: what content or information is privacy-critical and shall thus not be processed, how wide the recording range of the system should be, and which part of the speech shall be recorded by which device when multiple devices from different vendors populate the same space.

It is first necessary to distinguish between the different types of information demanding protection. Taxonomy classes could comprise sensitive, personal, (legally) non-personal but protection-worthy, and unprotectable data derivable from speech. To define these classes and their relations in a manner digestible for non-experts, communication models may be useful tools.
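As a purely illustrative sketch of how the four classes named above could be represented for discussion between communities, the following Python fragment encodes them as an ordered enumeration. The ordering of the classes, the rule that unknown attributes default to the strictest class, and the placeholder attribute names are all assumptions made for this example; none of them is prescribed by the meeting or by any legislation.

```python
from enum import IntEnum


class ProtectionClass(IntEnum):
    """The four classes named in the text. The numeric ordering
    (higher = stricter) is an assumption made for illustration."""
    UNPROTECTABLE = 0
    NON_PERSONAL_PROTECTION_WORTHY = 1
    PERSONAL = 2
    SENSITIVE = 3


def strictest_class(attributes, mapping):
    """Return the strictest class among the given speech-derived attributes.

    Attributes missing from the mapping are treated as SENSITIVE, a
    conservative default chosen for this sketch only.
    """
    classes = [mapping.get(a, ProtectionClass.SENSITIVE) for a in attributes]
    return max(classes, default=ProtectionClass.UNPROTECTABLE)


# Hypothetical mapping with placeholder attribute names. Which real
# attribute belongs to which class is exactly the open question.
EXAMPLE_MAPPING = {
    "attribute_a": ProtectionClass.PERSONAL,
    "attribute_b": ProtectionClass.NON_PERSONAL_PROTECTION_WORTHY,
}

print(strictest_class(["attribute_a", "attribute_b"], EXAMPLE_MAPPING).name)
```

An ordered enumeration is only one possible design; the relations between the classes may well turn out not to be a simple ranking.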


Since virtual assistants need to overhear all communication to be able to spot specific trigger words, specific legislation is required that provides guidelines as to which content may be uploaded to a remote server for processing, in which data format, for how long and with whom it may be shared.
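To make the four dimensions of such guidelines concrete (which content, in which data format, for how long, and shared with whom), they could be written down as a machine-checkable policy. The following Python sketch is a hypothetical illustration only: the class `UploadPolicy`, its field names, and all example values (content labels, formats, the 30-day period, the recipient name) are assumptions and do not reflect any existing law or product.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class UploadPolicy:
    """Hypothetical policy covering the four dimensions named in the text."""
    allowed_content: frozenset     # which content may be uploaded
    allowed_formats: frozenset     # in which data format
    max_retention: timedelta       # for how long it may be kept
    allowed_recipients: frozenset  # with whom it may be shared

    def may_upload(self, content, data_format):
        # Both the content type and the data format must be permitted.
        return (content in self.allowed_content
                and data_format in self.allowed_formats)

    def may_share_with(self, recipient):
        return recipient in self.allowed_recipients

    def must_delete(self, recorded_at, now):
        # True once the retention period has been exceeded.
        return now - recorded_at > self.max_retention


# Example values are placeholders chosen for illustration only.
EXAMPLE_POLICY = UploadPolicy(
    allowed_content=frozenset({"post_trigger_command"}),
    allowed_formats=frozenset({"transcript"}),
    max_retention=timedelta(days=30),
    allowed_recipients=frozenset({"service_provider"}),
)

print(EXAMPLE_POLICY.may_upload("background_conversation", "raw_audio"))
```

Expressing the policy as data rather than as prose is a design choice of this sketch; it makes the terms on which legal and technical experts must agree explicit, but it does not settle what the values should be.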

The goal of the Shonan seminar is to advance the dialogue between legal scholarship and research in speech communication technology, towards a better understanding of the mutual needs of both communities. Legal and technology experts differ in their methodology and use terminology with different nuances (the same word can have a different meaning depending on the context). The expected outcome of the proposed Shonan seminar is therefore to produce policy papers (opinions) that will eventually lead to better-informed legislation and better-designed products and services.

User interface and case studies

Virtual assistants currently available on the market are largely opaque regarding when and which information is recorded, and whether it is shared with a remote server or with other devices for processing. It is a challenge for user interface designers to develop user interfaces that foster trust in the technology and put the user in control of the data that is recorded and processed.

As a means of managing the almost limitless variability in speech data applications (e.g., in smart homes, health care, social media, eLearning platforms), taxonomy classes for use cases need to be defined in order to facilitate the dialogue between the legal and technical communities. Class relations might be based on whether senders and recipients in a communication are peers and on how information flows between them. Only then can the requirements for safeguards be determined.

Audio capture and speech signal processing

The manner in which speech is captured (single or multiple microphones), together with the sensor configuration and placement (distance from speakers, location, single- or multi-room), influences the potential for privacy intrusion (e.g., the number of persons from whom speech is captured). Class relations could distinguish between unwittingly and consensually captured speech (on one's own devices or on those of others).

Encryption and data protection

Safeguards such as encryption should be designed according to the specific use case and DPIA. Safeguards can either enhance existing technology (privacy as an add-on) or follow privacy-by-design principles, and they can also be used for de-identification. Solutions can be classified according to the attributes of the underlying techniques, e.g., cryptographic technologies, security proofs, resource demands and assumptions. Cryptographic technology is needed that meets the (real-time) demands of speech technology; novel encryption techniques operating on waveforms are needed to keep inference in speech processing computationally feasible.

Anticipated outcome

Bringing together experts from these fields, we aim to kick-start and inspire collaboration initiatives in this novel research domain. As a direct outcome of the meeting, we target a technical report and a position paper containing recommendations for the technical and social communities. Specifically, the position paper shall constitute a research roadmap for emerging special interest groups, e.g. within the International Speech Communication Association (ISCA).