PRIVACY-PRESERVING AUTOMATIC SPEAKER DIARIZATION

Francisco Teixeira¹, Alberto Abad¹, Bhiksha Raj²,³, Isabel Trancoso¹

¹INESC-ID/IST, University of Lisbon, Portugal; ²LTI, Carnegie Mellon University, USA; ³Mohammed bin Zayed University of AI, UAE

ABSTRACT

Automatic Speaker Diarization (ASD) is an enabling technology with numerous applications, which deals with recordings of multiple speakers, raising special concerns in terms of privacy. In fact, in remote settings, where recordings are shared with a server, clients relinquish not only the privacy of their conversation, but also of all the information that can be inferred from their voices. However, to the best of our knowledge, the development of privacy-preserving ASD systems has been overlooked thus far. In this work, we tackle this problem using a combination of two cryptographic techniques, Secure Multiparty Computation (SMC) and Secure Modular Hashing, and apply them to the two main steps of a cascaded ASD system: speaker embedding extraction and agglomerative hierarchical clustering. Our system is able to achieve a reasonable trade-off between performance and efficiency, presenting real-time factors of 1.1 and 1.6, for two different SMC security settings.

Index Terms: Automatic Speaker Diarization, Privacy, Secure Multiparty Computation, Secure Modular Hashing

1. INTRODUCTION

Automatic Speaker Diarization (ASD) is an enabling technology for many speech-based applications. When combined with Automatic Speech Recognition systems, ASD can provide additional context to the reader, make transcripts clearer, or even be used to perform speaker adaptation. On its own, ASD may also allow users to search for and filter segments that correspond to specific speakers or, in the case of audio diarization, specific audio events. This filtering may be particularly important in multi-speaker audio streams where the target is a single speaker. In security applications, this speaker may be a potentially blacklisted criminal. In clinical interviews, it may be the patient. In language acquisition recordings, it may be the child whose linguistic skills are being assessed. The list of potential ASD scenarios is very extensive, ranging from courtrooms to meetings, sociolinguistic interviews and broadcast news, among others [1, 2].

When dealing with large amounts of speech data, when ASD is used as part of a larger system, or even due to a lack of computational resources, it may be useful to delegate this task to an external service. However, this setting creates a major privacy challenge: the server will have direct access to the user's data. This means that the voices present in the recording, and what is being said, will be available to the server, giving it a very large repository of potentially sensitive information [3], which the speakers may want to keep private, or may even need to keep private for legal reasons (e.g., to follow privacy regulations such as the EU's GDPR [4]) [5]. The alternative of having the diarization process run on the user's device is also unattractive, as it would require the service provider to share their model with the user. Considering that ASD models require large amounts of data and high levels of expertise to be developed, sharing them with users could cost the service provider the value that the model holds. In cascaded ASD models, this is particularly true for the speaker embedding extraction model [6].

This work was supported by Portuguese national funds through Fundação para a Ciência e a Tecnologia (FCT), with references UIDB/50021/2020 and CMU/TIC/0069/2019, as well as by the Portuguese Recovery and Resilience Plan (RRP) through project C645008882-00000055 (Responsible.AI).

The above makes this mostly unexplored problem, with the notable exception of [7], particularly interesting. In this work, we build on our previous contribution on the privacy-preserving extraction of x-vector embeddings using Secure Multiparty Computation (SMC) [6] and extend it to the setting of ASD.

Specifically, we propose a system that performs the extraction of speaker embeddings and the clustering step in a privacy-preserving way, by leveraging two cryptographic techniques: SMC and Secure Modular Hashing (SMH). The combination of these techniques allows us to protect the service provider's model, particularly the speaker embedding extraction model, while at the same time keeping the speakers' data hidden from the server.

The remainder of this document is organised as follows: in Section 2 we provide the necessary background on SMC and SMH; Section 3 describes the ASD baseline model and our privacy-preserving system; in Section 4 we describe the experimental setup; in Section 5 we present and discuss the results obtained; finally, Section 6 presents our conclusions and topics for future work.

2. CRYPTOGRAPHIC BACKGROUND

2.1. Secure Multiparty Computation

Secure Multiparty Computation (SMC) is an umbrella term for protocols designed to allow several parties to jointly and securely compute functions over their data, while keeping all inputs private. SMC protocols are usually built over some form of Secret Sharing [8, 9] or Garbled Circuits [10, 11], and are often combined with cryptographic primitives like Homomorphic Encryption (HE) or Oblivious Transfers (OTs) to perform specific functionalities [12]. Our privacy-preserving approach will heavily rely on SMC, and particularly on Secret Sharing, which is briefly described below.¹

2.1.1. Secret Sharing

Secret Sharing is a basic primitive for SMC protocols. It allows data owners to represent and share their data with other parties such that each party participating in the computation will only have access to a random-looking share (here denoted as ⟨·⟩) of the original value. Considering an additive arithmetic secret sharing scheme in the n-party case, a value x shared among several parties by a dealer is

¹ For a more in-depth introduction to SMC, we direct readers to [12].
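
As a rough illustration of the additive arithmetic secret sharing idea introduced in Section 2.1.1, the Python sketch below shows a dealer splitting a value into n random-looking shares that sum back to the original value modulo a fixed ring size. This is a textbook construction and not the protocol implementation used in this work: the function names `share`, `reconstruct` and `add_shared`, as well as the modulus 2^64, are assumptions made only for the example.

```python
import secrets

# Assumed ring size for the illustration; real SMC frameworks choose their own.
MODULUS = 2**64


def share(x, n_parties, modulus=MODULUS):
    """Dealer side: split x into n additive shares that sum to x mod modulus."""
    # The first n-1 shares are uniformly random ring elements.
    shares = [secrets.randbelow(modulus) for _ in range(n_parties - 1)]
    # The last share is fixed so that all shares add up to x.
    last = (x - sum(shares)) % modulus
    return shares + [last]


def reconstruct(shares, modulus=MODULUS):
    """Recover the secret; this requires every party's share."""
    return sum(shares) % modulus


def add_shared(a_shares, b_shares, modulus=MODULUS):
    """Each party adds its own two shares locally, with no communication."""
    return [(a + b) % modulus for a, b in zip(a_shares, b_shares)]


if __name__ == "__main__":
    x_shares = share(42, n_parties=3)
    y_shares = share(100, n_parties=3)
    # The sum is computed on shares only and revealed on reconstruction.
    print(reconstruct(add_shared(x_shares, y_shares)))
```

Under this scheme, any subset of fewer than n shares is uniformly distributed and reveals nothing about the secret, while additions can be carried out locally by each party. Multiplications need interaction between the parties and are left out of this sketch.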

arXiv:2210.14995v2 [eess.AS] 18 Apr 2023
