
2 BACKGROUND
2.1 The Chukchi Language
The Chukotko-Kamchatkan family of languages is conventionally divided into two branches. The northern branch is
referred to as the Chukotian branch (or “Luorevetlan”, based on the Chukchi ethnonym) and consists of Chukchi,
Koryak, Alutor and Kerek (now extinct). The second branch, Itelmen, contains the language Western Itelmen, which
itself consists of two dialects: Khajrjusovo and Sedanka [5]. The language of focus for this paper is Chukchi, a
polysynthetic language spoken primarily within the Chukotka Autonomous Okrug, in the easternmost part of Siberia.
Chukchi is an endangered indigenous language with fewer than 10,000 speakers at present; most speakers are
bilingual, with Russian as their primary language, and fewer than 100 speakers do not speak Russian at all. Recordings
and written materials in the language are difficult to come by, and the language is not taught in schools. Its decreasing
use in everyday life, together with the prominence of Russian within the community, demonstrates the need for an
automatic speech recognition system, so that we may provide more accessibility to such an endangered and very
low-resource language and its community.
2.2 What is a Low-Resource Language?
In the field of NLP, research tends to focus on languages for which data and native speakers are easily accessible
and which are relatively well known. These are referred to as high-resource languages, and as such they produce a
large quantity of data. On the other hand, low-resource languages (occasionally referred to as LRLs) are usually
“...less studied, resource scarce, less computerized, less privileged, less commonly taught, or low density...” [8] and
are therefore not prioritized in NLP research. This, however, is one of the major motivating factors for our project.
Chukchi is an extremely low-resource language: most of the up-to-date information regarding the language and its
speakers is most easily accessed from a detailed Wikipedia article¹. This low-resourcedness is what inspired the
project, as Chukchi is an endangered language and one that is not particularly accessible in terms of media, education,
and history. By creating a new automatic speech recognition system, we not only provide accessibility for this
language but also create new opportunities for the same achievements in other low-resource languages.
2.2.1 What is an ASR System?
Modern automatic speech recognition systems are typically made up of three components: a lexicon, an acoustic
model, and a language model². The lexicon contains the information that an ASR system needs in order to interpret
its input at the base level, such as the phonetic transcription codes used for the target language’s phonemes. For
English, ARPABET³ and TIMIT⁴ are the most commonly used codes and transcriptions, developed by the Defense
Advanced Research Projects Agency (DARPA).
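To make the lexicon component concrete, the sketch below shows a pronunciation dictionary as a simple mapping
from words to ARPABET phoneme sequences. This is an illustrative Python sketch, not the format of any particular
toolkit: the helper function and its out-of-vocabulary handling are our own assumptions, though the ARPABET
transcriptions themselves are standard.

    # A minimal pronunciation lexicon: orthographic words mapped to
    # ARPABET phoneme sequences (standard transcriptions, no stress marks).
    LEXICON = {
        "speech": ["S", "P", "IY", "CH"],
        "language": ["L", "AE", "NG", "G", "W", "AH", "JH"],
        "model": ["M", "AA", "D", "AH", "L"],
    }

    def phonemes_for(word):
        """Return the phoneme sequence for a word; unknown words would
        normally fall back to a grapheme-to-phoneme model instead."""
        if word.lower() not in LEXICON:
            raise KeyError(f"{word!r} is out of vocabulary")
        return LEXICON[word.lower()]

    print(phonemes_for("speech"))  # ['S', 'P', 'IY', 'CH']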
The second component of an ASR system is the acoustic model, which is responsible for forging the relationship
between the phonemes of a language (such as the ones provided in the lexicon) and an audio signal. This mapping is
learned from transcripts paired with their respective audio files⁵: the acoustic model builds statistical representations
of the feature-vector sequences that correspond to a particular phoneme (or sound unit) and classifies them [11]. This
allows the system to recognize and distinguish that sound unit from the other phonemes it may encounter in both
training data and experimental data.
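As an illustration of the feature-vector sequences an acoustic model operates on, the sketch below extracts MFCC
features from a recording using the librosa library. The file name is a placeholder, and the frame settings are common
ASR defaults rather than necessarily the configuration used in this work.

    import librosa

    # Load a waveform at 16 kHz, a common sampling rate for speech corpora.
    # "utterance.wav" is a placeholder file name.
    signal, sr = librosa.load("utterance.wav", sr=16000)

    # 13 MFCCs per 25 ms frame with a 10 ms hop: each column of the result
    # is one feature vector, and the acoustic model learns which phoneme
    # (or sound unit) such vectors most likely belong to.
    mfcc = librosa.feature.mfcc(
        y=signal, sr=sr, n_mfcc=13,
        n_fft=400, hop_length=160,  # 25 ms window, 10 ms hop at 16 kHz
    )
    print(mfcc.shape)  # (13, number_of_frames)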
Finally, there is the language model, which provides context by capturing how the language behaves in its naturally
occurring form. This is where training on text comes in: a trained language model assigns higher scores to word
sequences that are likely in the language, which lets the system prefer coherent, comprehensible transcriptions over
acoustically similar but implausible ones. With the help of all of these aforementioned components, the system as a
whole is designed to predict naturally occurring speech.
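To illustrate the role the language model plays, the toy sketch below estimates bigram probabilities from a tiny
invented corpus and scores two candidate word sequences; real systems train on far larger text collections and use
smoothing rather than the small floor value shown here.

    from collections import Counter

    # Tiny invented training corpus; real language models use much more text.
    corpus = "the cat sat on the mat the cat saw the rat".split()
    bigrams = Counter(zip(corpus, corpus[1:]))
    unigrams = Counter(corpus)

    def bigram_prob(w1, w2):
        """Maximum-likelihood P(w2 | w1), with a small floor standing in
        for proper smoothing of unseen word pairs."""
        count = bigrams[(w1, w2)]
        return count / unigrams[w1] if count else 1e-6

    def sequence_score(words):
        """Multiply bigram probabilities over a candidate word sequence."""
        score = 1.0
        for w1, w2 in zip(words, words[1:]):
            score *= bigram_prob(w1, w2)
        return score

    # The model prefers word orders it has seen in the training text.
    print(sequence_score("the cat sat".split()))  # 0.25
    print(sequence_score("sat the cat".split()))  # 5e-07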
¹ Link to the article in question: https://en.wikipedia.org/wiki/Chukchi_language#:~:text=Chukchi%20%2F%CB%88t%CA%83%CA%8Ak,mainly%20in%20Chukotka%20Autonomous%20Okrug.&text=In%20the%20UNESCO%20Red%20Book,the%20list%20of%20endangered%20languages
² Information about the basic components of an automatic speech recognition system is widely available; one of the more easily understandable sources can be found here: https://voximplant.com/blog/what-is-automatic-speech-recognition
³ https://en.wikipedia.org/wiki/ARPABET
⁴ https://en.wikipedia.org/wiki/TIMIT
⁵ Microsoft conducts extensive research on acoustic models; more information, as well as links to other sources and publications, can be found here: https://www.microsoft.com/en-us/research/project/acoustic-modeling/