PRIVACY-PRESERVING AUTOMATIC SPEAKER DIARIZATION
Francisco Teixeira¹, Alberto Abad¹, Bhiksha Raj²,³, Isabel Trancoso¹
¹INESC-ID/IST, University of Lisbon, Portugal, ²LTI, Carnegie Mellon University, USA,
³Mohammed bin Zayed University of AI, UAE
ABSTRACT
Automatic Speaker Diarization (ASD) is an enabling technology
with numerous applications, which deals with recordings of multi-
ple speakers, raising special concerns in terms of privacy. In fact, in
remote settings, where recordings are shared with a server, clients
relinquish not only the privacy of their conversation, but also of all
the information that can be inferred from their voices. However, to
the best of our knowledge, the development of privacy-preserving
ASD systems has been overlooked thus far. In this work, we tackle
this problem using a combination of two cryptographic techniques,
Secure Multiparty Computation (SMC) and Secure Modular Hash-
ing, and apply them to the two main steps of a cascaded ASD
system: speaker embedding extraction and agglomerative hierarchi-
cal clustering. Our system is able to achieve a reasonable trade-off
between performance and efficiency, presenting real-time factors of
1.1 and 1.6, for two different SMC security settings.
Index Terms: Automatic Speaker Diarization, Privacy, Secure
Multiparty Computation, Secure Modular Hashing
1. INTRODUCTION
Automatic Speaker Diarization (ASD) is an enabling technology for
many speech-based applications. When combined with Automatic
Speech Recognition systems, ASD can provide additional context
to the reader, make transcripts clearer or even be used to perform
speaker adaptation. On its own, ASD may also allow users to search
for and filter segments that correspond to specific speakers, or, in the
case of audio diarization, specific audio events. This filtering may
be particularly important in multi-speaker audio streams, where the
target is a single speaker. In security applications, this speaker may
be a potential blacklisted criminal. In clinical interviews, it may be
the patient. In language acquisition recordings it may be the child
whose linguistic skills are being assessed. The list of potential ASD
scenarios is very extensive, ranging from courtrooms, to meetings,
sociolinguistic interviews and broadcast news, among others [1, 2].
When dealing with large amounts of speech data, when ASD is
used as part of a larger system, or even due to the lack of computa-
tional resources, it may be useful to delegate this task to an external
service. However, this setting creates a major privacy challenge: the
server will have direct access to the user’s data. This means that
the voices present in the recording, as well as what is being said, will be
available to the server, giving it a very large repository of potentially
sensitive information [3], which the speakers may want to keep private,
or may even need to keep private for legal reasons (e.g., to follow
privacy regulations such as the EU’s GDPR [4]) [5]. The alternative of
having the diarization process run on the user’s device is also
unattractive, as it would require the service provider to share
their model with the user. Considering that ASD models require
large amounts of data and high levels of expertise to be developed,
sharing them with users would make the service provider potentially
lose the value that the model holds. In cascaded ASD models, this is
particularly true for the speaker embedding extraction model [6].
This work was supported by Portuguese national funds through
Fundação para a Ciência e a Tecnologia (FCT), with references
UIDB/50021/2020 and CMU/TIC/0069/2019, as well as by the Portuguese
Recovery and Resilience Plan (RRP) through project C645008882-00000055
(Responsible.AI).
The considerations above make this (mostly) unexplored problem, with the
notable exception of [7], particularly interesting. In this work we
build on our previous contribution on the privacy-preserving extrac-
tion of x-vector embeddings using Secure Multiparty Computation
(SMC) [6] and extend it to the setting of ASD.
Specifically, we propose a system that performs the extraction of
speaker embeddings and the clustering step in a privacy-preserving
way, by leveraging two cryptographic techniques: SMC and Se-
cure Modular Hashing (SMH). The combination of these techniques
allows us to protect the service provider’s model, particularly the
speaker embedding extraction model, while at the same time keep-
ing the speakers’ data hidden from the server.
The remainder of this document is organised as follows: in Sec-
tion 2 we provide the necessary background on SMC and SMH; Sec-
tion 3 describes the ASD baseline model, and our privacy-preserving
system; in Section 4 we describe the experimental setup; in Sec-
tion 5 we present and discuss the results obtained; finally, Section 6
presents our conclusions and topics for future work.
2. CRYPTOGRAPHIC BACKGROUND
2.1. Secure Multiparty Computation
Secure Multiparty Computation (SMC) is an umbrella term for pro-
tocols designed to allow several parties to jointly and securely com-
pute functions over their data, while keeping all inputs private. SMC
protocols are usually built over some form of Secret Sharing [8, 9]
or Garbled Circuits [10, 11], and are often combined with crypto-
graphic primitives like Homomorphic Encryption (HE) or Oblivi-
ous Transfers (OTs) to perform specific functionalities [12]. Our
privacy-preserving approach will heavily rely on SMC, and particularly
on Secret Sharing, which is briefly described below.¹
2.1.1. Secret Sharing
Secret Sharing is a basic primitive for SMC protocols. It allows data
owners to represent and share their data with other parties such that
each party participating in the computation will only have access to
a random-looking share (here denoted as ⟨·⟩) of the original value.
Considering an additive arithmetic secret sharing scheme in the n-party
case, a value x shared among several parties by a dealer is
¹ For a more in-depth introduction to SMC we direct readers to [12].
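As an illustration of the additive arithmetic secret sharing idea introduced in Section 2.1.1, the following is a minimal Python sketch, not the authors' implementation. It assumes the common construction in which a dealer draws all but one share uniformly at random and sets the last share so that the shares sum to the secret modulo a fixed ring size. The ring size 2^64 and the names `share`, `reconstruct` and `add_shared` are hypothetical choices made for this example only.

```python
import secrets

# Hypothetical ring size, chosen only for illustration.
MODULUS = 2**64


def share(x, n_parties, modulus=MODULUS):
    """Dealer side: split x into n_parties additive shares.

    The first n_parties - 1 shares are uniformly random ring elements;
    the last one is fixed so that all shares sum to x modulo `modulus`.
    """
    if n_parties < 2:
        raise ValueError("need at least two parties")
    shares = [secrets.randbelow(modulus) for _ in range(n_parties - 1)]
    last = (x - sum(shares)) % modulus
    return shares + [last]


def reconstruct(shares, modulus=MODULUS):
    """Recover the secret by summing all shares modulo `modulus`."""
    return sum(shares) % modulus


def add_shared(a_shares, b_shares, modulus=MODULUS):
    """Each party adds its own two shares locally (no communication)."""
    return [(a + b) % modulus for a, b in zip(a_shares, b_shares)]


# Three parties hold shares of two secret values and add them share-wise.
x_shares = share(42, 3)
y_shares = share(100, 3)
z_shares = add_shared(x_shares, y_shares)
total = reconstruct(z_shares)
print(total)  # 142
```

In this sketch any subset of fewer than all shares is uniformly distributed and therefore reveals nothing about the secret, while additions can be carried out locally by each party. Multiplications of shared values require interaction between the parties and are not covered here.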
arXiv:2210.14995v2 [eess.AS] 18 Apr 2023