(1) We study the network usage at the client side of three popular video-conferencing platforms and correlate it with video and audio quality to understand whether and how the two are related.
(2) We conduct this study using Google Meet, which is widely used in the education domain, and Microsoft Teams and Zoom, which are commonly used in both the corporate and education spaces.
(3) We quantitatively measure network usage and video-audio quality. Since video-audio quality is also a subjective metric, we additionally measure the perceived quality qualitatively through a user-experience study.
(4) We use bandwidth, download payloads, upload payloads, and IPAT (inter-packet arrival times) to measure network characteristics.
(5) We quantitatively measure video characteristics in terms of PSNR (Peak Signal-to-Noise Ratio) and SSIM (Structural Similarity Index Measure).
(6) We measure the energy in different audio frequencies, the bitrate, and the number of channels to study the audio characteristics.
(7) We measure these metrics for the three apps on wired broadband over WiFi and on 4G mobile Internet connections. These two varieties of connections are the most widely used for online classes during the COVID-19 pandemic.
(8) In our measurements, the bandwidth of wired broadband on optical fiber is roughly 150 Mbps, while that of 4G mobile Internet is about 11 Mbps. We want to see how the platforms behave in terms of their network usage and video-audio quality when presented with different backhaul networks.
(9) We vary the platforms' microphone and camera settings, as these settings result in different payloads on the network.
Organization of the Paper. The paper is organized as follows. Section 2 describes the metrics for quantitative measurement of network usage, video quality, and audio quality. All of these count towards the Quality of Service (QoS) class of metrics. We also explain the survey we conduct to measure the quality of video and audio qualitatively, which counts towards the Quality of Experience (QoE) class of metrics. In Section 3, we analyze the performance of the three apps using these metrics on wired broadband and 4G mobile Internet connections. We compare our work with the existing body of work in Section 4. Finally, we conclude in Section 5 and discuss future work in Section 6.
2 MEASURING THE PERFORMANCE
In this section, we describe each of the metrics we use and
discuss the setup used to collect values of those metrics.
2.1 Measuring Quantitative Performance
We measure the apps' performance in terms of their network usage at the client ends, both the transmitter and the receiver of the video. An advantage of measuring at the client end is that no special access is required at the server: any user can measure performance without needing any special access to the apps.
The Upload Payload is the total payload in the packets sent from the video source to the server. The Download Payload is the total payload in the packets sent from the server to the video receiver. The IPAT is the time difference between two successive packets at the receiver. To compare the network performance of the apps, we measure the Upload Payload at the transmitter end of the video, the Download Payload at the receiver end of the video, and the standard deviation in IPAT. We also analyze CPU utilization, memory usage, and battery consumption for the two different networks.
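As an illustrative sketch (not the exact scripts used in this work), the three network metrics can be computed with Pandas from a packet trace exported to CSV. The column names here ('time' in epoch seconds, 'length' in bytes, and 'direction' as 'up'/'down') are assumptions of the sketch, not an actual export schema:

import pandas as pd

# Compute the three network metrics from a packet trace exported to CSV.
# Column names are assumptions of this sketch, not the paper's schema.
pkts = pd.read_csv("session.csv")

# Upload Payload: total bytes in packets sent from the video source to the server.
upload_payload = pkts.loc[pkts["direction"] == "up", "length"].sum()

# Download Payload: total bytes in packets sent from the server to the receiver.
download_payload = pkts.loc[pkts["direction"] == "down", "length"].sum()

# IPAT: time difference between successive packets at the receiver,
# summarized here by its standard deviation.
recv = pkts[pkts["direction"] == "down"].sort_values("time")
ipat_std = recv["time"].diff().dropna().std()

print(upload_payload, download_payload, ipat_std)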
We perform the measurements over a session for each app, lasting fifteen minutes. Over the session, we play a recorded video of a university lecture, which mimics the scenario of streaming live or recorded classes and meetings, for which these apps are heavily used. In total, we perform the session under twelve different combinations, depending on whether the microphone and camera are switched ON or OFF. We tabulate these combinations in Table 1. These twelve test combinations give us all the possible configurations of the state of the apps and the accessories. While the played video contains the speaker and the slides, the camera transmits the video at the receiver's end. Since we need at least one speaker for the video conference, the speaker's video is transmitted in all twelve combinations. We use Wireshark to capture sent and received packets, and the NumPy and Pandas Python libraries to compute the network metrics from the Wireshark packet captures. We use a Python script to measure the resource consumption of the conferencing apps' processes; the script uses the psutil [2] library to capture resource-consumption characteristics.
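A minimal psutil-based sketch of such a script is shown below; the substring matching on process names is an assumption for illustration, since the actual process names differ per app and platform:

import time
import psutil

# Sample CPU and memory usage of a conferencing app's processes.
def sample_resources(name_substr, interval=1.0, num_samples=900):
    procs = [p for p in psutil.process_iter(["name"])
             if name_substr.lower() in (p.info["name"] or "").lower()]
    for p in procs:
        p.cpu_percent(None)  # prime the per-process CPU counters
    readings = []
    for _ in range(num_samples):
        time.sleep(interval)
        live = [p for p in procs if p.is_running()]
        cpu = sum(p.cpu_percent(None) for p in live)
        mem = sum(p.memory_info().rss for p in live)  # resident set size, bytes
        readings.append((time.time(), cpu, mem))
    return readings

# A fifteen-minute session at one-second granularity:
# readings = sample_resources("zoom", interval=1.0, num_samples=900)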
We record the sessions using the apps' recording feature to measure the video quality and audio quality against the local copy of the video and audio. There are multiple techniques available to evaluate video quality [24]. We use PSNR and SSIM to compare the video quality. PSNR is a quantitative video-quality metric that gives us the inverse of the error between the original and the recorded frames; a higher PSNR indicates better quality. SSIM is a more complex quantitative metric that considers perceptual quality [25]. Its value lies between zero and one, the latter implying that the two frames are identical. We use the YUV color encoding to calculate the SSIM and PSNR values: the 'Y' component depicts the brightness, 'U' the blue projection, and 'V' the red projection [1].
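For illustration, both metrics can be computed per frame on the 'Y' (luma) channel with OpenCV and scikit-image; this sketch is not the paper's actual pipeline, the file names are placeholders, and the frames are assumed to be time-aligned and of identical resolution:

import cv2
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

# Per-frame PSNR and SSIM between the original video and the recording,
# computed on the 'Y' (luma) channel of the YUV encoding.
orig = cv2.VideoCapture("original.mp4")
rec = cv2.VideoCapture("recording.mp4")

psnr_vals, ssim_vals = [], []
while True:
    ok_o, frame_o = orig.read()
    ok_r, frame_r = rec.read()
    if not (ok_o and ok_r):
        break
    y_o = cv2.cvtColor(frame_o, cv2.COLOR_BGR2YUV)[:, :, 0]
    y_r = cv2.cvtColor(frame_r, cv2.COLOR_BGR2YUV)[:, :, 0]
    psnr_vals.append(peak_signal_noise_ratio(y_o, y_r))
    ssim_vals.append(structural_similarity(y_o, y_r))

print(sum(psnr_vals) / len(psnr_vals),
      sum(ssim_vals) / len(ssim_vals))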
]. We use Spek [3] to compare the audio quality, which gives us the energy distribution for different audible frequencies.
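Spek itself is a GUI spectrum analyzer; as a rough programmatic analogue (an assumption of this sketch, not the paper's pipeline), the same energy distribution can be computed with SciPy from the session audio exported to WAV:

from scipy.io import wavfile
from scipy.signal import spectrogram

# Energy distribution across audible frequencies, analogous to Spek's view.
# "recorded.wav" is a placeholder file name.
rate, samples = wavfile.read("recorded.wav")
if samples.ndim > 1:
    samples = samples.mean(axis=1)  # mix stereo down to mono

freqs, times, power = spectrogram(samples, fs=rate, nperseg=1024)
energy = power.sum(axis=1)  # total energy per frequency bin over the session

for f, e in zip(freqs[::64], energy[::64]):  # coarse summary of the bins
    print(f"{f:8.1f} Hz  {e:.3e}")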