BLIND SOURCE SEPARATION USING HELLINGER DIVERGENCE AND COPULAS



Introduction
Blind Source Separation (BSS) is the technique used to extract sources from observations of their mixtures, without knowledge of the original signals or of the mixing process. The signal processing and machine learning communities have widely explored the challenges of BSS during the last three decades. It was first used for the cocktail party problem, where the aim was to separate the speech signals of the individual speakers. It was then exploited in other scientific fields such as signal processing, image processing, medical signal processing, artificial neural networks, statistics, information theory, speech recognition systems, and telecommunications.
BSS is an ill-posed problem; therefore, various assumptions on the sources have been made in the literature to enable the separation of the observed mixtures. For a linear and static mixing environment, Independent Component Analysis (ICA) [9] is used. It assumes the sources to be mutually independent and non-Gaussian. Under these assumptions, the source signals can be estimated by optimizing a cost function. Numerous variants of ICA have been introduced in the literature, for example maximizing the likelihood [5], minimizing the mutual information [13,29], minimizing divergence-based criteria [14,18], and exploiting second- or higher-order statistics [7,28]. A good overview of the problem can be found in [10].
In [17,18,22] a new BSS algorithm was proposed to overcome the drawbacks of ICA techniques. This algorithm uses copulas to accurately model the dependency structure between the source components, hence dropping the mutual independence assumption. In this paper we build on this copula framework and adopt the Hellinger divergence between copula densities as the cost function to minimize, owing to its efficiency and robustness, which improve the results even for noisy data [4,23]; moreover, one of its main characteristics is its rapid convergence compared to other divergences.
This paper is organized as follows. Section "Blind Source Separation" gives an overview of the BSS principle and model, followed by a review of copulas in section "What are copulas?". In section "Hellinger divergence and copula" we present our cost function, the Hellinger divergence between copula densities. We then introduce our new approach, detailing separately the independent and the dependent cases, in section "The proposed approach". Section "Simulation results" compares our approach with various other methods, illustrating its superiority. Finally, we conclude the paper and give some directions for further research.

Blind Source Separation
The linear BSS problem states that the $p$ unknown source components $s(t) \in \mathbb{R}^p$ are blindly mixed together through a matrix $A$ containing the mixing coefficients. In vector notation,
$$x(t) = A\,s(t) + n(t),$$
where $x(t) \in \mathbb{R}^p$ are the observations and $n(t)$ is an additive noise. In our work we consider the determined case, where the number of sources equals the number of observations, and assume that the additive noise is removed by a pre-processing technique [14]. The BSS problem is then the following: having only the observed signals $x(t)$, and no prior knowledge about the mixing process, search for the optimum $p \times p$ un-mixing matrix $B$, which gives the recovered sources
$$y(t) = B\,x(t),$$
where $y(t)$ are the estimated sources, which will be close to the wanted sources $s(t)$ if the un-mixing matrix $B$ is as close as possible to $A^{-1}$.
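To fix ideas, the following minimal sketch (ours; the uniform sources are illustrative, though the matrix coincides with the one used in section "Simulation results") instantiates the determined, noise-free model:

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = 3, 3000                       # determined case: as many observations as sources

s = rng.uniform(-1, 1, size=(p, n))  # illustrative independent uniform sources
A = np.array([[1.0, 0.7, 0.7],
              [0.7, 1.0, 0.7],
              [0.7, 0.7, 1.0]])      # mixing matrix
x = A @ s                            # observations: x(t) = A s(t), noise omitted

# With B = A^{-1} the sources are recovered exactly; BSS must find such a B
# from x alone, which is only possible up to scale and permutation.
B = np.linalg.inv(A)
y = B @ x
assert np.allclose(y, s)
```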

What are copulas?
Copulas have become very popular recently as a method to model the dependency structure of random variables. A copula can be defined as the function that connects univariate marginal distributions to a joint multivariate distribution function with a specific form of dependency. Sklar's theorem [32], the fundamental theorem for copulas, affirms the copula function's existence; it takes the form
$$F(z_1,\dots,z_p) = C_Z\big(F_1(z_1),\dots,F_p(z_p)\big), \qquad (3.1)$$
where $F$ is a $p$-dimensional distribution function with marginals $F_1,\dots,F_p$, and $C_Z(\cdot)$ is the copula function, which is itself a joint distribution function on $[0,1]^p$ with uniform margins. We have the following: if $F_1,\dots,F_p$ are all continuous, then $C_Z(\cdot)$ is unique. In the opposite direction, consider a copula $C_Z(\cdot)$ and univariate distribution functions $F_1,\dots,F_p$. Then $F$ as defined in (3.1) is a joint multivariate distribution function with marginals $F_1,\dots,F_p$.
For the case where the components of a random vector $Z := (Z_1,\dots,Z_p)^\top \in \mathbb{R}^p$ are statistically independent, we have the copula of independence, denoted $C_\Pi(\cdot)$, of the form
$$C_\Pi(u_1,\dots,u_p) = \prod_{i=1}^{p} u_i. \qquad (3.2)$$
If the copula has a density, then it is obtained as
$$c_Z(u_1,\dots,u_p) = \frac{\partial^p C_Z(u_1,\dots,u_p)}{\partial u_1 \cdots \partial u_p}.$$
Using this formula, the density of the copula of independence is $c_\Pi(u_1,\dots,u_p) = 1$. For the random vector $Z := (Z_1,\dots,Z_p)^\top$, let $f_Z(\cdot)$ be its probability density, if it exists, and $f_1(\cdot),\dots,f_p(\cdot)$ the marginal probability densities of $Z_1,\dots,Z_p$, respectively. After some uncomplicated computations we obtain the relation
$$f_Z(z_1,\dots,z_p) = c_Z\big(F_1(z_1),\dots,F_p(z_p)\big)\prod_{i=1}^{p} f_i(z_i). \qquad (3.3)$$
Numerous copula models have been proposed in the literature. The class of semi-parametric copula models is the most popular for modeling and estimating the dependency structure. In this class the parametric copula $C(\cdot;\theta)$ is indexed by a parameter $\theta \in \Theta \subset \mathbb{R}^d$, for some $d \ge 1$, with non-parametric margins.
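To illustrate Sklar's theorem numerically, the following sketch (ours) draws a sample whose dependence is a Gaussian copula and whose margins are chosen freely; the marginal transforms in the second step leave the copula untouched:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
rho = 0.6
cov = [[1.0, rho], [rho, 1.0]]

# Step 1: correlated standard normals -> uniforms via the normal CDF;
# (u1, u2) is then a sample from the Gaussian copula with parameter rho.
g = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=5000)
u = stats.norm.cdf(g)

# Step 2: impose arbitrary continuous margins via inverse CDFs.
z1 = stats.expon.ppf(u[:, 0])       # exponential margin
z2 = stats.beta.ppf(u[:, 1], 2, 5)  # Beta(2, 5) margin

# Rank correlation depends on the copula only, not on the margins.
print(stats.spearmanr(z1, z2)[0])
```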
In Table 1 we recall a description of three copula models, Clayton [8], Ali-Mikhail-Haq (AMH) [1], and Frank [15], which are used in the simulation study of section "Simulation results". We provide the respective parameter space $\Theta$ for each model and the parameter value corresponding to the independence of margins, denoted $\theta_0$, in other words $C(\cdot;\theta_0) = C_\Pi(\cdot)$. For a better understanding of the widely used semi-parametric copulas, one may refer to [21,25].
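For the reader's convenience, we also recall the standard bivariate forms of these three families (the multivariate versions and the exact parameter spaces listed in Table 1 may differ slightly):
$$\text{Clayton:}\quad C(u,v;\theta) = \big(u^{-\theta} + v^{-\theta} - 1\big)^{-1/\theta}, \qquad \theta \in (0,\infty),\ \theta_0 \to 0;$$
$$\text{AMH:}\quad C(u,v;\theta) = \frac{uv}{1-\theta(1-u)(1-v)}, \qquad \theta \in [-1,1),\ \theta_0 = 0;$$
$$\text{Frank:}\quad C(u,v;\theta) = -\frac{1}{\theta}\log\!\left(1+\frac{(e^{-\theta u}-1)(e^{-\theta v}-1)}{e^{-\theta}-1}\right), \qquad \theta \in \mathbb{R}\setminus\{0\},\ \theta_0 \to 0.$$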
In the following lines, we briefly outline a copula model selection procedure and the method of estimating the parameter $\theta$ from the data. For a random vector $Z \in \mathbb{R}^p$, let us assume that a training sample of $Z$ is available, that is, we dispose of i.i.d. realizations $Z(1),\dots,Z(n)$ of $Z$.
The objective is to select from the data the "best" copula model for the dependence structure of the components of $Z$, among a list of candidate models, and to estimate the parameter $\theta$ of the selected model.
Let $\{C_k(\cdot;\theta_k);\ \theta_k \in \Theta_k,\ k = 1,\dots,K\}$ be a list of candidate copula models. Model selection can be done using the Bayesian information criterion (BIC) [30], resulting from the semiparametric log-likelihood; see, e.g., [16] and [33]. Denote by $c_k(\cdot;\theta_k)$ the density of the copula $C_k(\cdot;\theta_k)$, for all $k$.
The BIC of a given model $k$ is defined by
$$BIC(k) := -2\sum_{t=1}^{n}\log c_k\big(\widehat{F}_1(Z_1(t)),\dots,\widehat{F}_p(Z_p(t));\widehat{\theta}_k\big) + d_k\log n,$$
where $d_k$ is the dimension of $\theta_k$ and $\widehat{F}_1,\dots,\widehat{F}_p$ are estimates of the marginal distribution functions. The ideal model is the one which minimizes the BIC values, namely, the copula density model with index
$$\widehat{k} := \arg\min_{k\in\{1,\dots,K\}} BIC(k).$$
Denote, simply, by $\{c(\cdot;\theta);\ \theta \in \Theta \subset \mathbb{R}^d\}$ a model selected according to the above procedure. The parameter $\theta$ of the copula model in question can be estimated by maximizing the semi-parametric log-likelihood
$$\widehat{\theta} := \arg\max_{\theta\in\Theta}\sum_{t=1}^{n}\log c\big(\widehat{F}_1(Z_1(t)),\dots,\widehat{F}_p(Z_p(t));\theta\big).$$
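A minimal sketch of this selection step (ours; bivariate candidates, empirical margins rescaled by $n/(n+1)$ to stay inside $(0,1)$, and the one-dimensional parameter profiled on a grid instead of fully maximized):

```python
import numpy as np

def pseudo_obs(z):
    """Empirical-CDF (rank) transform of an (n, p) sample, rescaled into (0, 1)."""
    n = z.shape[0]
    ranks = np.argsort(np.argsort(z, axis=0), axis=0) + 1
    return ranks / (n + 1.0)

def clayton_logpdf(u, theta):
    """Log-density of the bivariate Clayton copula, theta > 0."""
    u1, u2 = u[:, 0], u[:, 1]
    return (np.log1p(theta) - (theta + 1) * (np.log(u1) + np.log(u2))
            - (2 + 1 / theta) * np.log(u1**-theta + u2**-theta - 1))

def frank_logpdf(u, theta):
    """Log-density of the bivariate Frank copula, theta != 0 (use theta > 0 here)."""
    u1, u2 = u[:, 0], u[:, 1]
    num = theta * (1 - np.exp(-theta)) * np.exp(-theta * (u1 + u2))
    den = (1 - np.exp(-theta)
           - (1 - np.exp(-theta * u1)) * (1 - np.exp(-theta * u2)))**2
    return np.log(num / den)

def bic(logpdf, u, grid):
    """Profile the 1-d parameter over a grid and return the resulting BIC."""
    n = u.shape[0]
    best_ll = max(logpdf(u, th).sum() for th in grid)
    return -2 * best_ll + 1 * np.log(n)   # d_k = 1 parameter per model

# Usage: u = pseudo_obs(z) for an observed sample z, then pick the candidate
# (clayton_logpdf, frank_logpdf, ...) with the smallest bic(logpdf, u, grid).
```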

Hellinger divergence and copula
One of the important issues in many applications of probability theory is finding an appropriate measure of distance between two probability distributions, and a number of divergence measures have been studied for this purpose. In this paper, we single out the Hellinger divergence [11] because it improves on maximum likelihood in terms of efficiency and robustness for noisy data. It also converges faster than other divergences; see, e.g., [4,20].
The Hellinger distance, denoted $H$, between two probability density functions is defined through
$$H(P,Q) := \frac{1}{2}\int_{\mathbb{R}^p}\Big(\sqrt{f(z)}-\sqrt{g(z)}\Big)^2\,dz, \qquad (4.1)$$
where $f$ and $g$ denote the densities of two probabilities $P$ and $Q$ on $\mathbb{R}^p$, and $Q$ is absolutely continuous with respect to $P$. Note that the function $Q \mapsto H(P,Q)$ is convex and non-negative, for any given probability $P$. Furthermore, we have the following fundamental property, which was proved in [12]: the Hellinger distance $H$ between the joint density $f_Z(\cdot)$ of the random vector $Z := (Z_1,\dots,Z_p)^\top \in \mathbb{R}^p$, $p \ge 1$, and the product of the marginal densities $f_i$ of the components $Z_i$, $i \in \{1,\dots,p\}$, is given by
$$H\Big(\prod_{i=1}^{p} f_i,\; f_Z\Big) = 1 - \mathbb{E}\left[\sqrt{\frac{\prod_{i=1}^{p} f_i(Z_i)}{f_Z(Z)}}\,\right], \qquad (4.2)$$
where $\mathbb{E}$ is the mathematical expectation.
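As a sanity check of the normalization in (4.1), consider the worked example of two univariate Gaussians with common variance, for which the integral has the well-known closed form
$$H\big(\mathcal{N}(\mu_1,\sigma^2),\,\mathcal{N}(\mu_2,\sigma^2)\big) = 1 - \int\sqrt{f\,g} = 1 - \exp\!\left(-\frac{(\mu_1-\mu_2)^2}{8\sigma^2}\right),$$
so that $H = 0$ iff $\mu_1 = \mu_2$, and $H \to 1$ as the two densities separate.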
Note that $H\big(\prod_{i=1}^{p} f_i, f_Z\big)$ is non-negative and reaches its minimum value zero only when the components of the random vector $Z$ are statistically independent, in other words,
$$H\Big(\prod_{i=1}^{p} f_i,\; f_Z\Big) = 0 \iff f_Z(\cdot) = \prod_{i=1}^{p} f_i(\cdot).$$
From equation (4.2) and using formula (3.3), the Hellinger distance becomes
$$H\Big(\prod_{i=1}^{p} f_i,\; f_Z\Big) = 1 - \mathbb{E}\left[\sqrt{\frac{1}{c_Z\big(F_1(Z_1),\dots,F_p(Z_p)\big)}}\,\right] =: H(c_\Pi, c_Z). \qquad (4.3)$$
This last equation implies that the Hellinger distance between the product of the marginal densities and the joint density of the random vector $Z$ can also be defined as the Hellinger distance between the copula density of independence $c_\Pi$ and the copula density $c_Z$ of the random vector $Z$.

The proposed approach

A separation procedure for independent sources.
As shown in the previous section, the Hellinger distance $H(c_\Pi, c_Y)$ between the copula density of independence and the copula density of the recovered vector $Y$ is always non-negative and achieves its minimum, zero, only if the components of $Y$ are statistically independent, that is, when the un-mixing matrix satisfies $B = DPA^{-1}$, where $D$ and $P$ are a diagonal and a permutation matrix, respectively.
For a successful separation, the idea is to minimize an estimate $\widehat{H}(c_\Pi, c_Y)$ constructed from the data $x(1),\dots,x(n)$. The separation matrix is therefore calculated as
$$\widehat{B} := \arg\min_{B} \widehat{H}(c_\Pi, c_Y), \qquad (5.2)$$
which results in the approximated components $\widehat{y}(t) = \widehat{B}\,x(t)$, $t = 1,\dots,n$. In view of equation (4.3), we introduce the following estimate of the distance $H(c_\Pi, c_Y)$:
$$\widehat{H}(c_\Pi, c_Y) := 1 - \frac{1}{n}\sum_{t=1}^{n}\sqrt{\frac{1}{\widehat{c}_Y\big(\widehat{F}_1(y_1(t)),\dots,\widehat{F}_p(y_p(t))\big)}}, \qquad (5.3)$$
where the kernel estimate of the copula density $c_Y(\cdot)$ is of the form
$$\widehat{c}_Y(u_1,\dots,u_p) := \frac{1}{n h^p}\sum_{t=1}^{n}\prod_{i=1}^{p} k\!\left(\frac{u_i - \widehat{F}_i(y_i(t))}{h}\right),$$
with $\widehat{F}_i(\cdot)$, $i = 1,\dots,p$, the smoothed estimates of the marginal distribution functions $F_i(\cdot)$ of the random variables $Y_i$. For any real value $z \in \mathbb{R}$, $\widehat{F}_i(z)$ is defined by
$$\widehat{F}_i(z) := \frac{1}{n}\sum_{t=1}^{n} K\!\left(\frac{z - y_i(t)}{h}\right),$$
where $k(\cdot)$ is a symmetric and centered probability density, $K(\cdot)$ is the primitive of the kernel $k(\cdot)$, and we choose $k(\cdot)$ to be the standard Gaussian density in this study. A kernel choice $k(\cdot)$ that better copes with the boundary effect can be used, following [26], to approximate the copula density.
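The estimators above can be condensed into a few lines (a sketch under our assumptions: Gaussian kernel, a single fixed bandwidth $h$ for both margins and copula density, and no boundary correction, so it is only indicative near the edges of $[0,1]^p$):

```python
import numpy as np
from scipy.stats import norm

def smoothed_margins(y, h):
    """Smoothed marginal CDF estimates F_i evaluated at the sample itself.
    y: (p, n) array of recovered signals; returns U of shape (n, p) in (0, 1).
    O(n^2) memory, acceptable for a sketch."""
    diffs = y[:, :, None] - y[:, None, :]        # (p, n, n): y_i(t) - y_i(s)
    return norm.cdf(diffs / h).mean(axis=2).T    # K = Gaussian CDF, averaged over s

def copula_density_at(u, U, h):
    """Product-Gaussian-kernel estimate of the copula density at points u (m, p),
    built from pseudo-observations U (n, p)."""
    k = norm.pdf((u[:, None, :] - U[None, :, :]) / h)   # (m, n, p)
    return k.prod(axis=2).mean(axis=1) / h**U.shape[1]

def hellinger_hat(y, h=0.05):
    """Estimate (5.3): H-hat(c_Pi, c_Y) = 1 - (1/n) sum_t sqrt(1 / c_Y(U_t))."""
    U = smoothed_margins(y, h)
    c = copula_density_at(U, U, h)
    return 1.0 - np.mean(1.0 / np.sqrt(c))
```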
The un-mixing matrix is written as $B = B(\theta)$; hence the estimated sources take the form $y(t) = B(\theta)\,x(t)$, $t = 1,\dots,n$. Accordingly, the estimate $\widehat{H}(c_\Pi, c_Y)$ is a function of the parameter vector $\theta$, which can be minimized using a gradient descent algorithm:
$$\widehat{\theta} := \arg\min_{\theta} \widehat{H}(c_\Pi, c_Y).$$
The un-mixing matrix is then estimated by $\widehat{B} = B(\widehat{\theta})$, which results in the approximated source signals $\widehat{y}(t) = B(\widehat{\theta})\,x(t)$, $t = 1,\dots,n$. The gradient in $\theta$ of $\widehat{H}(c_\Pi, c_Y)$, given in (5.12), is obtained by differentiating the estimates defined above, applying the chain rule through $y(t) = B(\theta)\,x(t)$, $t = 1,\dots,n$.
Algorithm 1 below sums up the proposed approach for the separation of independent source components.

Algorithm 1. The separation algorithm for independent source components.
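A runnable sketch of Algorithm 1's outer loop follows (our simplifications: after spatial whitening, $B$ is parametrized by Givens rotation angles as described in the Conclusion, and the analytic gradient (5.12) is replaced by finite differences):

```python
import numpy as np
# reuses hellinger_hat from the previous sketch

def whiten(x):
    """Spatial whitening of the (p, n) observations."""
    x = x - x.mean(axis=1, keepdims=True)
    d, E = np.linalg.eigh(np.cov(x))
    W = E @ np.diag(d ** -0.5) @ E.T
    return W @ x, W

def givens(theta, p=3):
    """Product of the three plane (Givens) rotations for p = 3."""
    B = np.eye(p)
    for k, (i, j) in enumerate([(0, 1), (0, 2), (1, 2)]):
        G = np.eye(p)
        c, s = np.cos(theta[k]), np.sin(theta[k])
        G[i, i], G[j, j], G[i, j], G[j, i] = c, c, -s, s
        B = G @ B
    return B

def separate(x, steps=200, mu=0.1, eps=1e-3):
    """Gradient descent on theta -> H-hat(c_Pi, c_{B(theta) x_whitened})."""
    xw, _ = whiten(x)
    theta = np.zeros(3)
    for _ in range(steps):
        grad = np.zeros(3)
        for k in range(3):          # central finite-difference gradient
            t1, t2 = theta.copy(), theta.copy()
            t1[k] += eps; t2[k] -= eps
            grad[k] = (hellinger_hat(givens(t1) @ xw)
                       - hellinger_hat(givens(t2) @ xw)) / (2 * eps)
        theta -= mu * grad          # step 0.1, as in the simulations
    return givens(theta) @ xw, givens(theta)
```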

A separation procedure for dependent sources.
In this section we tackle the case of dependent source components; hence we cannot use the independence copula density as in the previous section. Denote by $c_S(\cdot)$ the unknown semi-parametric copula density of the sources. We assume that it belongs to a set of $K$ candidate semi-parametric models, say,
$$\big\{c_k(\cdot;\theta_k);\ \theta_k \in \Theta_k \subset \mathbb{R}\big\}, \quad k = 1,\dots,K. \qquad (5.15)$$
Table 1 gives some examples of semi-parametric copula density models. Each semi-parametric model $c_k$, for $k = 1,\dots,K$, satisfies the following identifiability condition: for any regular matrix $G$, if the copula density of $GS$ belongs to $\{c_k(\cdot;\theta_k);\ \theta_k \in \Theta_k \subset \mathbb{R}\}$, then $G = DP$, where $D$ is diagonal and $P$ is a permutation. To get the objective function for dependent sources, all we have to do is replace the copula density of independence in (5.3) by the semi-parametric copula density $c_k(\cdot;\theta_k)$ [22]. The new objective function, $\widehat{H}\big(c_k(\cdot;\theta_k), c_Y\big)$, is non-negative and achieves its minimum value zero iff $B = A^{-1}$ (up to scale and permutation indeterminacies). Therefore, we estimate the demixing matrix by $\widehat{B} = B(\widehat{\theta})$, where
$$\big(\widehat{\theta}, \widehat{\theta}_k\big) := \arg\min_{\theta,\,\theta_k} \widehat{H}\big(c_k(\cdot;\theta_k),\, c_Y\big).$$
The copula density and the marginal distribution function estimates are defined as before. The solution $\widehat{B}$ can be computed by a gradient descent algorithm, with respect to both $\theta$ and $\theta_k$, applied to the criterion function $(\theta,\theta_k) \mapsto \widehat{H}\big(c_k(\cdot;\theta_k), c_Y\big)$ for each model; we then choose the solution minimizing the criterion over all considered models. The calculation of the gradient of the Hellinger divergence is the same as in the case of independence (5.12). Algorithm 2 summarizes the presented method.
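In code, the only change with respect to the independent case is the criterion: the constant independence density $c_\Pi = 1$ is replaced by a candidate parametric density. A hypothetical Clayton plug-in (ours; bivariate for brevity, with the copula parameter profiled on a grid rather than jointly descended):

```python
import numpy as np
# reuses smoothed_margins and copula_density_at from the earlier sketch

def clayton_density(U, theta):
    """Bivariate Clayton copula density at pseudo-observations U (n, 2).
    Shown bivariate for brevity; the experiments use p = 3 sources."""
    u1, u2 = U[:, 0], U[:, 1]
    return ((1 + theta) * (u1 * u2) ** (-theta - 1)
            * (u1 ** -theta + u2 ** -theta - 1) ** (-2 - 1 / theta))

def hellinger_hat_dep(y, theta, h=0.05):
    """Estimate of H(c_theta, c_Y): replaces c_Pi = 1 of the independent case."""
    U = smoothed_margins(y, h)
    c_y = copula_density_at(U, U, h)
    return 1.0 - np.mean(np.sqrt(clayton_density(U, theta) / c_y))

def best_over_theta(y, grid=np.linspace(0.1, 5.0, 25)):
    """Profile the copula parameter on a grid; returns (theta, criterion)."""
    vals = [hellinger_hat_dep(y, th) for th in grid]
    i = int(np.argmin(vals))
    return grid[i], vals[i]
```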
Simulation results

In all instances of our experiments the number of samples is $n = 3000$. The matrix used to mix the source components is $A := [1\ \ 0.7\ \ 0.7;\ 0.7\ \ 1\ \ 0.7;\ 0.7\ \ 0.7\ \ 1]$, and the gradient descent step is set to $0.1$. All simulations are repeated 80 times, and the accuracy of the estimated sources is measured by the signal-to-noise-ratio criterion, defined by
$$\mathrm{SNR}_i := 10\log_{10}\left(\frac{\sum_{t=1}^{n} s_i(t)^2}{\sum_{t=1}^{n}\big(s_i(t)-\widehat{y}_i(t)\big)^2}\right), \quad i = 1,2,3. \qquad (6.1)$$
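Under our reading of (6.1) as the per-source output SNR, it can be computed as follows (the alignment step is ours, needed because BSS recovers sources only up to scale and permutation):

```python
import numpy as np

def output_snr(s, y):
    """SNR_i = 10 log10( sum s_i^2 / sum (s_i - y_i)^2 ), after greedily
    matching each true source to its most correlated estimate and rescaling."""
    p = s.shape[0]
    snrs = []
    for i in range(p):
        corr = [abs(np.corrcoef(s[i], y[j])[0, 1]) for j in range(p)]
        j = int(np.argmax(corr))
        scale = (s[i] @ y[j]) / (y[j] @ y[j])   # least-squares scale fix
        err = s[i] - scale * y[j]
        snrs.append(10 * np.log10((s[i] @ s[i]) / (err @ err)))
    return np.array(snrs)
```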

Independent source components
We consider in this experiment three mixed signals from two types of sample sources:
- uniform i.i.d. with independent components (see Fig. 1a).
From Figures 1a and 1b, we observe that for both independent samples the SNR is close to 45 dB, which is highly satisfactory for this classical case. On the other hand, Figures 2a and 2b present the criterion value versus iterations. We can see that the separation is achieved when our criterion converges to its minimum value 0.
Table 2 shows the SNR values of the sources for our approach and for other methods. The proposed method achieves the separation with similar accuracy, with a slight improvement in the independent source components case and faster convergence compared to the recent alpha-divergence method [27] with $\alpha = 0.5$.

Dependent source components
Within this subsection we demonstrate the ability of the proposed approach (Algorithm 2 for dependent sources) to successfully separate mixtures of three dependent signals. We deal with instantaneous mixtures of three kinds of sample sources, namely i.i.d. sources (with uniform marginals) whose dependent components are generated from the AMH copula with $\theta = 0.75$, and from the Clayton and Frank copulas. In Figures 3a and 3b, we show the SNRs for dependent sources from the Clayton and Frank copulas. From the simulation results it is noticeable that the proposed approach can separate mixtures of dependent source components with good performance.
Moreover, Figures 4a and 4b show the criterion value versus iterations for the Clayton and Frank copulas. We can see that the separation is achieved when our criterion converges to its minimum value 0.

Noisy source components
In this subsection we test the accuracy of our approach on noisy data. We work with the same source signals and conditions as above, with white Gaussian noise added to the observed signals at a level of $-25$ dB.
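A small sketch of how such noisy observations can be produced (assuming the $-25$ figure is a per-channel noise-to-signal level in dB; this reading is ours):

```python
import numpy as np

def add_noise(x, level_db=-25.0, rng=None):
    """Add white Gaussian noise at level_db relative to each channel's power."""
    rng = rng or np.random.default_rng()
    sig_power = np.mean(x ** 2, axis=1, keepdims=True)
    noise_power = sig_power * 10 ** (level_db / 10)
    return x + rng.normal(scale=np.sqrt(noise_power), size=x.shape)
```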
Figure 5a illustrates the SNR of the independent sources; it can be seen that the proposed approach is able to separate noisy independent sources with good performance, and Figure 5b shows that when the separation is achieved our criterion converges to its minimum 0. Figure 6a showcases the SNR of the dependent sources from the Clayton copula; the proposed approach is able to separate even noisy dependent sources. Moreover, Figure 6b shows that the criterion in this case also converges to its minimum 0.
Table 4 presents the output SNR values of the estimated sources using our approach and the other methods. The approaches are comparable, with our method ahead, in the case of noise-contaminated independent source components. Moreover, our approach is able to separate even noisy mixtures of dependent source components with higher accuracy.

Conclusion
We have presented a new BSS algorithm that is able to separate instantaneous linear mixtures of both independent and dependent source components. Our approach proceeds in two steps: first a normalization stage with spatial whitening, and then the application of Givens rotations minimizing the estimate of the Hellinger distance. This divergence performs better in the presence of noise and also converges faster than the usual Kullback-Leibler divergence, as illustrated in section "Simulation results" for $3 \times 3$ mixtures, where the efficiency and accuracy of the proposed algorithms are evaluated through the signal-to-noise-ratio criterion. It should be noted that our algorithms are more time-consuming than the classic ones, since we estimate both the copula density of the vector and the marginal distribution function of each component; however, they give better results, especially in the dependent noisy source components case.

Table 2. Output SNRs for independent source components.

Table 3. Output SNRs for dependent source components.

Table 4. Output SNRs for independent and dependent noisy source components.