We presented a deep neural network based speech de-identification method that maps vocoder features of human speech to those of a generic TTS engine with minimal loss of sound quality, as demonstrated on the TIMIT data set. The novelty of our scheme is that de-identification is based on speech-text sample pairs, which are widely available in the speech processing community. In the resulting signal, the identity of the speaker is concealed, as confirmed by our perceptual listening experiments.

^{2}https://people.inf.elte.hu/foauaai/deidentification

Table 2: Results of the perceptual listening experiments. We report the average and the standard deviation of the identification accuracy.

| Task | # of subj. | # of samp. | Accuracy (mean ± std) | Random choice |
|---|---|---|---|---|
| A-not-B | 22 | 20 | 0.56 ± 0.15 | 0.5 |
| Female/Male (2AFC) | | 15 | 0.51 ± 0.15 | 0.5 |
| # of Speakers (6AFC) | | 6 | 0.18 | 0.16 |

A limitation of our technique is that the dynamics of the original speaker are inherited due to the application of DTW. We hypothesize that this problem may be alleviated by applying DTW in the loss function of the deep network. We leave such studies to future work.

Our technique enables privacy-aware speech recognition for adults. The proposed method is lightweight and can be used for collecting de-identified databases when the privacy of the user is important, for example in cloud-based speech services or in medical records. That our method requires only speech-transcript sample pairs is a promising aspect for deep learning, which requires large, high-quality databases.


### Toolset for Supporting the Research of Lattice Based Number Expansions

### Péter Hudoba^{ab} and Attila Kovács^{ac}

**Abstract**

The world of generalized number systems contains many challenging areas. Computer experiments often support the theoretical research. In this paper we introduce a toolset that helps to analyze some properties of lattice based number expansions. The toolset is able to (1) analyze the expansions, (2) decide the number system property, and (3) classify and visualize the periodic points.

The toolset is implemented in Python and published together with a database that stores many special expansions and can also store custom properties such as the signature, operator eigenvalues, etc. Researchers can connect to the server and request or upload data, or perform experiments on them.

We present an introductory usage of the toolset and detail some results that have been observed with it. The toolset can be downloaded from http://numsys.infodomain.

**1** **Introduction**

The generalization of positional number representations to a wide range of digit sets or to higher dimensions is a fascinating story. Grünwald (1885) investigated negative-based systems, while Kempner (1936), Knuth (1960), Khmelnik (1964), and Penney (1965) investigated complex-based systems. From the 70's Kátai, B. Kovács, Környei, Pethő (the "Hungarian school") and Gilbert systematically examined the radix extensions in *algebraic number fields*. In the 90's the *topological aspects* of radix representations were studied by Bandt, Indlekofer, Járai, Kátai, Lagarias, Wang, Vince, and later by Akiyama, Thuswaldner and others. The canonical number representation was generalized to *arbitrary polynomial systems* by Pethő (1989), and investigated later extensively by many authors (incl. Akiyama, Brunotte, Kovács, Pethő, Rao, Scheicher, Thuswaldner). The number system concept in *general lattices* was investigated first by Vince (1993). The study of the *algorithmic aspects* of canonical (polynomial) systems was initiated by Brunotte (2001), and for general lattices by the second author (2000). Recently, a special type of radix systems (SRS) has been studied at length by Thuswaldner and his co-workers (the "Austrian school").

*a*Eötvös Loránd University, Budapest, Hungary

*b*E-mail: peter.hudoba@inf.elte.hu, ORCID: 0000-0001-5810-4193

*c*E-mail: attila@inf.elte.hu, ORCID: 0000-0002-1858-7618

DOI: 10.14232/actacyb.289524

**2** **Preliminaries**

Let $\Lambda$ be a lattice in $\mathbb{R}^n$ and let $M : \Lambda \to \Lambda$ be a linear operator such that $\det(M) \neq 0$. Let furthermore $0 \in D \subseteq \Lambda$ be a finite subset. Lattices can be seen as finitely generated free Abelian groups. They have many significant applications in pure mathematics (Lie algebras, number theory and group theory) and in applied mathematics (coding theory, and cryptography because of the conjectured computational hardness of several lattice problems), and they are used in various ways in the physical sciences. We note that the number system research in general lattices also comprises the orders.

**Definition 1.** *The triple $(\Lambda, M, D)$ is called a* number system *(GNS) if every element $x$ of $\Lambda$ has a unique, finite representation of the form*

$$x = \sum_{i=0}^{L} M^i d_i,$$

*where $d_i \in D$ and $L \in \mathbb{Z}$ ($L + 1$ is the length of the expansion).*

Here $M$ is called the *base* and $D$ is the *digit set*. It is easy to see that similarity preserves the number system property, i.e., if $M_1$ and $M_2$ are similar via the matrix $Q$ then $(\Lambda, M_1, D)$ is a number system if and only if $(Q\Lambda, M_2, QD)$ is a number system. If we change the basis in $\Lambda$ a similar integer matrix can be obtained; hence, there is no loss of generality in assuming that $M$ is integral acting on the lattice $\mathbb{Z}^n$. If two elements of $\Lambda$ are in the same coset of the factor group $\Lambda / M\Lambda$ then they are said to be *congruent* modulo $M$. The following theorem gives necessary conditions for the number system property.

**Theorem 1.** *If $(\Lambda, M, D)$ is a number system, then (1) $D$ must be a full residue system modulo $M$, (2) $M$ must be expansive, and (3) $\det(I_n - M) \neq \pm 1$ (unit condition). If a system fulfils the first two conditions then it is called a radix system.*

We note that the theorem in this form was first stated in [9], but it was well known and used much earlier by Kátai and Vince. The full residue system property can be decided easily using the Smith normal form [8]. Algorithms that calculate the eigenvalues of $M$ exactly in a finite number of steps exist only for a few special classes of matrices. For general matrices, iterative algorithms produce approximate solutions. In polynomial systems, where $M$ is the companion of a monic integer polynomial $f$, deciding the Schur or Hurwitz stability of $f$ is computationally equivalent to the expansivity check. Verification of the unit condition is trivial.
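The three necessary conditions can be illustrated with a minimal Python sketch. It is not the toolset's API: all function names are hypothetical, the code is restricted to $2 \times 2$ integer matrices, the residue check uses the adjugate of $M$ instead of the Smith normal form, and the expansivity check relies on floating-point eigenvalues, so borderline cases would need an exact method.

```python
import cmath

def det2(M):
    """Determinant of a 2x2 integer matrix given as nested lists."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def adj_apply(M, v):
    """Return adj(M) * v, where adj(M) = det(M) * M^{-1} is an integer matrix."""
    (a, b), (c, d) = M
    return (d * v[0] - b * v[1], -c * v[0] + a * v[1])

def congruent(M, u, v):
    """u and v are congruent modulo M iff M^{-1}(u - v) is an integer vector."""
    t = det2(M)
    w = adj_apply(M, (u[0] - v[0], u[1] - v[1]))
    return w[0] % t == 0 and w[1] % t == 0

def is_full_residue_system(M, digits):
    """|D| = |det M| and the digits are pairwise incongruent modulo M."""
    digits = list(digits)
    t = det2(M)
    if t == 0 or len(digits) != abs(t):
        return False
    return not any(congruent(M, digits[i], digits[j])
                   for i in range(len(digits)) for j in range(i))

def is_expansive(M):
    """Both eigenvalues (roots of x^2 - tr*x + det) have modulus > 1."""
    tr, t = M[0][0] + M[1][1], det2(M)
    root = cmath.sqrt(tr * tr - 4 * t)
    return all(abs(lam) > 1 for lam in ((tr + root) / 2, (tr - root) / 2))

def satisfies_unit_condition(M):
    """det(I - M) must differ from +1 and -1."""
    i_minus_m = [[1 - M[0][0], -M[0][1]], [-M[1][0], 1 - M[1][1]]]
    return abs(det2(i_minus_m)) != 1

def necessary_conditions(M, digits):
    """Return the three checks of Theorem 1 as a tuple of booleans."""
    return (is_full_residue_system(M, digits),
            is_expansive(M),
            satisfies_unit_condition(M))

# Multiplication by -1+i on Z[i] in the basis {1, i}, with digits {0, 1}.
M = [[-1, -1], [1, -1]]
D = [(0, 0), (1, 0)]
print(necessary_conditions(M, D))
```

For the base $-1+i$ example all three checks pass, while for $M = 2I$ with digits $\{0,1\}^2$ the unit condition fails because $\det(I - M) = 1$.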

Define $\varphi : \Lambda \to \Lambda$ by $x \mapsto M^{-1}(x - d)$, where $d$ is the unique element of $D$ satisfying $x \equiv d \pmod{M}$. Since $M^{-1}$ is contractive and $D$ is finite, there exist a norm $\|\cdot\|$ on $\Lambda$ and a constant $C$ such that the orbit of every $x \in \Lambda$ eventually enters the finite set $S = \{x \in \Lambda \mid \|x\| < C\}$ after repeated application of $\varphi$. This means that the sequence $x, \varphi(x), \varphi^2(x), \ldots$ is eventually periodic for all $x \in \Lambda$. Clearly, $(\Lambda, M, D)$ is a number system iff for every $x \in \Lambda$ the orbit of $x$ eventually reaches 0. A point $p$ is called *periodic* if $\varphi^k(p) = p$ for some $k > 0$. The orbit of a periodic point $p$ is a *cycle*. The set of all periodic points is denoted by $\mathcal{P}$. The *signature* $[l_1, l_2, \ldots, l_\omega]$ of a radix system is a finite sequence of non-negative integers such that the periodic structure $\mathcal{P}$ consists of $l_i$ cycles with period length $i$ ($1 \le i \le \omega$). Clearly, the signature of a number system is $Sig = [1]$.
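The map $\varphi$ doubles as a digit-extraction step: the digit $d$ chosen at each application is the next digit of the expansion. The following sketch, again for $2 \times 2$ integer matrices and with hypothetical function names, computes the expansion of a point and evaluates it back with Horner's scheme. The `max_steps` guard is an assumption added for safety and is not part of the definition.

```python
def phi(M, digits, x):
    """Return (phi(x), d), where d is the digit congruent to x modulo M."""
    (a, b), (c, e) = M
    t = a * e - b * c                      # det(M), assumed nonzero
    for d in digits:
        v0, v1 = x[0] - d[0], x[1] - d[1]
        # adj(M) * (x - d); divisible by det(M) iff x and d are congruent
        w0, w1 = e * v0 - b * v1, -c * v0 + a * v1
        if w0 % t == 0 and w1 % t == 0:
            return (w0 // t, w1 // t), d
    raise ValueError("digits are not a full residue system modulo M")

def expand(M, digits, x, max_steps=1000):
    """Digits d_0, d_1, ... of x, or None if the orbit enters a nonzero cycle."""
    seq, seen = [], set()
    while x != (0, 0):
        if x in seen or len(seq) >= max_steps:
            return None
        seen.add(x)
        x, d = phi(M, digits, x)
        seq.append(d)
    return seq

def evaluate(M, seq):
    """Horner evaluation of sum_i M^i d_i for the digit sequence seq."""
    x = (0, 0)
    for d in reversed(seq):
        x = (M[0][0] * x[0] + M[0][1] * x[1] + d[0],
             M[1][0] * x[0] + M[1][1] * x[1] + d[1])
    return x

M = [[-1, -1], [1, -1]]                    # base -1+i in the basis {1, i}
D = [(0, 0), (1, 0)]
digits_of_two = expand(M, D, (2, 0))
print(digits_of_two, evaluate(M, digits_of_two))
```

In the base $-1+i$ example the point $(2, 0)$ has the digit sequence $0, 0, 1, 1$ (least significant first), which matches $2 = (-1+i)^2 + (-1+i)^3$.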

The following problem classes are in the mainstream of the research.

• For a given $(\Lambda, M, D)$ the *decision problem* asks if the triple forms a number system or not.

• For a given $(\Lambda, M, D)$ the *classification problem* means finding all cycles (*witnesses*).

• The *parametrization problem* means finding parametrized families of number systems.

• The *construction problem* aims at constructing a digit set $D$ to $M$ for which $(\Lambda, M, D)$ is a number system. More generally, the aim is to construct a digit set $D$ to $M$ such that $(\Lambda, M, D)$ satisfies a given signature.

We note that the algorithmic complexity of the decision and classification problems is still unknown.

The *fundamental domain* or set of "fractions" in $(\Lambda, M, D)$ can be defined as

$$H = \left\{ \sum_{i=1}^{\infty} M^{-i} d_i : d_i \in D \right\} \subseteq \mathbb{R}^n.$$

**Theorem 2.** *(a) $H$ is compact. (b) $x \in \mathcal{P} \Leftrightarrow x \in -H$.*

The compactness was proved by many authors (see e.g. Vince [15]). The $\Rightarrow$ part of (b) was proved in [9]. The other direction is obvious as well, otherwise 0 would have at least two different representations.
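The $\Rightarrow$ direction can be made plausible by a short computation. The following is only a sketch in the notation above, not a reproduction of the proof in [9]; the symbols $d_0, \ldots, d_{k-1}$ (the digits selected along the cycle of $p$), $s$ and $d'_i$ are introduced here for illustration. Suppose $\varphi^k(p) = p$. Unrolling $x = d + M\varphi(x)$ along the cycle and using that $M^{-1}$ is contractive gives

$$
\begin{aligned}
p &= d_0 + M\varphi(p) = \cdots = \underbrace{\sum_{j=0}^{k-1} M^{j} d_j}_{s} + M^{k} p, \\
p &= (I - M^{k})^{-1} s = -\sum_{m=1}^{\infty} M^{-mk} s = -\sum_{i=1}^{\infty} M^{-i} d'_i, \qquad d'_i = d_{(-i) \bmod k}.
\end{aligned}
$$

The last sum has the form required in the definition of $H$ with a periodic digit sequence, hence $p \in -H$.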

The theorem means that in order to determine the periodic points it is enough to localize the lattice points in $-H$. There are two different approaches to this problem: the IFS-method (see [8, 10]), and the covering method (see [8, 4]), which was optimized by the authors [6]. The idea of the latter is that we can put the compact set $-H$ into a box $B$ in which the integer elements are easily enumerable. Then we can compute the pairs $(x, \varphi(x))$ for all $x \in B$, and finally we determine the cycles by applying one of the cycle finding methods.
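The covering idea can be sketched in a few lines of Python. This is a simplified illustration, not the optimized method of [6]: it handles $2 \times 2$ integer matrices only, the function names are hypothetical, and the box is given by a user-supplied `radius` that is assumed to be large enough for the box $[-\text{radius}, \text{radius}]^2$ to contain $-H$. Orbits that leave the box are discarded, since a periodic orbit never leaves $-H$.

```python
from itertools import product

def phi(M, digits, x):
    """Apply phi(x) = M^{-1}(x - d) for the digit d congruent to x modulo M."""
    (a, b), (c, e) = M
    t = a * e - b * c                      # det(M), assumed nonzero
    for d in digits:
        v0, v1 = x[0] - d[0], x[1] - d[1]
        w0, w1 = e * v0 - b * v1, -c * v0 + a * v1   # adj(M) * (x - d)
        if w0 % t == 0 and w1 % t == 0:
            return (w0 // t, w1 // t)
    raise ValueError("digits are not a full residue system modulo M")

def find_cycles(M, digits, radius):
    """Return all cycles of phi, assuming the box covers -H."""
    box = set(product(range(-radius, radius + 1), repeat=2))
    done, cycles = set(), []
    for start in sorted(box):
        path, index, x = [], {}, start
        # Follow the orbit until it leaves the box, meets an already
        # processed point, or closes on itself.
        while x in box and x not in done and x not in index:
            index[x] = len(path)
            path.append(x)
            x = phi(M, digits, x)
        if x in index:                     # the path closed a new cycle
            cycles.append(path[index[x]:])
        done.update(path)
    return cycles

def signature(cycles):
    """sig[i-1] is the number of cycles with period length i."""
    sig = [0] * max(len(c) for c in cycles)
    for c in cycles:
        sig[len(c) - 1] += 1
    return sig

M = [[2, 0], [0, 2]]
D = [(0, 0), (1, 0), (0, 1), (1, 1)]
print(signature(find_cycles(M, D, radius=2)))
```

For $M = 2I$ with digits $\{0,1\}^2$ the map $\varphi$ halves each coordinate (rounding down), so the only periodic points are the four fixed points with coordinates in $\{-1, 0\}$.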

There are other algorithms for solving the decision/classification problems. Based on the method of Vince [15], Brunotte [2] described a digit-propagation algorithm for polynomial systems with canonical digits. Later, his method was generalized for arbitrary operators and digit sets [4]. The shortcoming of this method is the sequential nature of the digit propagation; however, there is an algorithmic attempt to overcome this disadvantage [14].

Let $f(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_{n-1} x^{n-1} + x^n$ be a monic integer polynomial. Let us denote the factor ring $\mathbb{Z}[x]/(f)$ by $\Lambda_f$. Then $\Lambda_f$ is a lattice and all the problems regarding number expansions in $\Lambda_f$ can be formulated in $\mathbb{Z}^n$, where $M$ is the companion of $f$. If $f$ is irreducible then $\Lambda_f$ is isomorphic to $\mathbb{Z}[\theta]$ where $f(\theta) = 0$ in an appropriate extension of $\mathbb{Q}$. Hence, if the digit set $D$ is restricted to be a set of non-negative numbers $D = \{0, 1, \ldots, |a_0| - 1\}$, we get a straightforward generalization of the traditional number systems in $\mathbb{Z}$. In this case the digit set is called *canonical*. If the radix system $(\Lambda_f, \theta, D)$ satisfies the unique representation property of Definition 1 with some canonical digit set $D$ then it is called a *canonical number system* (CNS). The notion of canonical digit sets can be extended to form a *j-canonical* set $D_j = \{0, e_j, \ldots, (|a_0| - 1) e_j\} \subset \mathbb{Z}^n$ ($e_j$ is the $j$th unit vector) [8]. There exists a canonical number system in $O_K$, the ring of integers of the algebraic number field $K$, if and only if there is a power integral basis in $O_K$ [12]. We note that canonical digit sets may or may not exist in different lattices and canonicity depends on the chosen basis. The *symmetric digit set* is formed by those integer multiples of $e_j$ which are closest to the origin, and
was introduced by Kátai [7]. The *adjoint digit set* consists of those lattice points which are in $\det(M)\left[-\tfrac{1}{2}, \tfrac{1}{2}\right)$. The *dense digit set*, in which each digit has the smallest norm in its residue class, was introduced and used extensively by the second author. We note that the adjoint set is a dense one in a special basis.
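As an illustration of the polynomial case, the companion matrix and the $j$-canonical digit set can be built directly from the coefficients of $f$. The sketch below uses hypothetical function names and assumes one common companion-matrix convention (multiplication by $x$ in the basis $1, x, \ldots, x^{n-1}$, with the reduced column on the right); other texts use the transpose.

```python
def companion(coeffs):
    """Companion matrix of f = a_0 + a_1 x + ... + a_{n-1} x^{n-1} + x^n.

    coeffs = [a_0, ..., a_{n-1}]; column i holds x * x^i in the basis
    1, x, ..., x^{n-1} of Z[x]/(f).
    """
    n = len(coeffs)
    M = [[0] * n for _ in range(n)]
    for i in range(1, n):
        M[i][i - 1] = 1                    # x * x^{i-1} = x^i
    for i in range(n):
        M[i][n - 1] = -coeffs[i]           # x * x^{n-1} reduced modulo f
    return M

def j_canonical_digits(coeffs, j=1):
    """D_j = {0, e_j, ..., (|a_0| - 1) e_j}; j = 1 gives the canonical set."""
    n = len(coeffs)
    return [tuple(k if i == j - 1 else 0 for i in range(n))
            for k in range(abs(coeffs[0]))]

# f(x) = x^2 + 2x + 2, whose roots are -1 +/- i
print(companion([2, 2]))
print(j_canonical_digits([2, 2]))
print(j_canonical_digits([2, 2], j=2))
```

With this convention $|\det M| = |a_0|$, so the $j$-canonical set has exactly as many elements as a full residue system modulo $M$ requires.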