### abstract ###
In this contribution, we propose a generic online (also sometimes called adaptive or recursive) version of the Expectation-Maximisation (EM) algorithm applicable to latent variable models of independent observations
Compared to the algorithm of  CITATION , this approach is more directly connected to the usual EM algorithm and does not rely on integration with respect to the complete data distribution
The resulting algorithm is usually simpler and is shown to achieve convergence to the stationary points of the Kullback-Leibler divergence between the marginal distribution of the observation and the model distribution at the optimal rate, i.e., that of the maximum likelihood estimator
In addition, the proposed approach is also suitable for conditional (or regression) models, as illustrated in the case of the mixture of linear regressions model
Keywords: Latent data models, Expectation-Maximisation, adaptive algorithms, online estimation, stochastic approximation, Polyak-Ruppert averaging, mixture of regressions
### introduction ###
The EM (Expectation-Maximisation) algorithm  CITATION  is a popular tool for maximum-likelihood (or maximum a posteriori) estimation
The common strand to problems where this approach is applicable is a notion of  incomplete data , which includes the conventional sense of missing data but is much broader than that
The EM algorithm demonstrates its strength in situations where some hypothetical experiment yields  complete  data that are related to the parameters more conveniently than the measurements are
Problems where the EM algorithm has proven to be useful include, among many others, mixtures of densities  CITATION  and censored data models  CITATION
The EM algorithm has several appealing properties
Because it relies on complete data computations, it is generally simple to implement: at each iteration,  (i)  the so-called  E-step  only involves taking expectation over the conditional distribution of the latent data given the observations and  (ii)  the  M-step  is analogous to complete data weighted maximum-likelihood estimation
Moreover,  (iii)  the EM algorithm is naturally an ascent algorithm, in the sense that it never decreases the (observed) likelihood from one iteration to the next
Finally, under some mild additional conditions,  (iv)  the EM algorithm may be shown to converge to a  stationary point  (i.e., a point where the gradient vanishes) of the log-likelihood  CITATION
Note that convergence to the maximum likelihood estimator cannot in general be guaranteed due to the possible presence of multiple stationary points
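Properties (i) to (iii) can be made concrete with a minimal sketch of batch EM. The model (a two-component Gaussian mixture with unit variances), the function names, the starting values and the simulated data are all assumptions chosen for illustration and are not taken from the text.

```python
import math
import random


def normal_pdf(y, mean):
    """Density of N(mean, 1) at y."""
    return math.exp(-0.5 * (y - mean) ** 2) / math.sqrt(2.0 * math.pi)


def log_likelihood(data, w, mu0, mu1):
    """Observed-data log-likelihood of the two-component mixture."""
    return sum(
        math.log((1.0 - w) * normal_pdf(y, mu0) + w * normal_pdf(y, mu1))
        for y in data
    )


def em_step(data, w, mu0, mu1):
    """One batch EM iteration; note that it needs the whole data set."""
    # E-step: posterior probability that each point came from component 1.
    post = [
        w * normal_pdf(y, mu1)
        / ((1.0 - w) * normal_pdf(y, mu0) + w * normal_pdf(y, mu1))
        for y in data
    ]
    # M-step: weighted complete-data maximum-likelihood estimates.
    n1 = sum(post)
    n0 = len(data) - n1
    new_w = n1 / len(data)
    new_mu1 = sum(p * y for p, y in zip(post, data)) / n1
    new_mu0 = sum((1.0 - p) * y for p, y in zip(post, data)) / n0
    return new_w, new_mu0, new_mu1


def run_em(data, w=0.5, mu0=-1.0, mu1=1.0, n_iter=50):
    """Run EM and record the log-likelihood after every iteration."""
    trace = [log_likelihood(data, w, mu0, mu1)]
    for _ in range(n_iter):
        w, mu0, mu1 = em_step(data, w, mu0, mu1)
        trace.append(log_likelihood(data, w, mu0, mu1))
    return (w, mu0, mu1), trace


# Simulated data (assumed): weight 0.3 on N(3, 1), weight 0.7 on N(-2, 1).
rng = random.Random(0)
data = [rng.gauss(3.0, 1.0) if rng.random() < 0.3 else rng.gauss(-2.0, 1.0)
        for _ in range(500)]
(w_hat, mu0_hat, mu1_hat), trace = run_em(data)
```

The recorded `trace` should be non-decreasing up to rounding error, which is the ascent property; every call to `em_step` sweeps the full data set, which is the cost that the online variants discussed next aim to avoid.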
When processing large data sets or data streams, however, the EM algorithm becomes impractical due to the requirement that the whole data set be available at each iteration of the algorithm
For this reason, there has been a strong interest in online variants of the EM algorithm that make it possible to estimate the parameters of a latent data model without storing the data
In this work, we consider online algorithms for latent data models with independent observations
The dominant approach (see also Section~ below) to online EM-like estimation follows the method proposed by  CITATION  which consists in using a  stochastic approximation algorithm, where the parameters are updated after each new observation using the gradient of the incomplete data likelihood weighted by the complete data Fisher information matrix
This approach has been used, with some variations, in many different applications (see, e.g.,  CITATION ); a proof of convergence was given by  CITATION
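As a rough illustration of this gradient-based scheme, the sketch below updates a single parameter, the weight w of a two-component Gaussian mixture whose means are taken as known, using the score of the observed-data likelihood scaled by the inverse complete-data Fisher information, which for this parameter is w(1 - w). The model, the step-size sequence (n + 1)^(-0.6), the function names and the simulated stream are assumptions made for this example only.

```python
import math
import random


def normal_pdf(y, mean):
    """Density of N(mean, 1) at y."""
    return math.exp(-0.5 * (y - mean) ** 2) / math.sqrt(2.0 * math.pi)


def online_gradient_weight(stream, mu0, mu1, w=0.5, exponent=0.6):
    """Scaled stochastic-gradient estimate of the mixture weight w."""
    for n, y in enumerate(stream, start=1):
        f0, f1 = normal_pdf(y, mu0), normal_pdf(y, mu1)
        g = (1.0 - w) * f0 + w * f1      # observed-data density at y
        score = (f1 - f0) / g            # derivative of log g(y; w) in w
        inv_info = w * (1.0 - w)         # inverse complete-data information
        gamma = (n + 1) ** (-exponent)   # assumed step size, always below 1
        w += gamma * inv_info * score    # one update per observation
    return w


# Simulated stream (assumed): weight 0.3 on N(3, 1), weight 0.7 on N(-2, 1).
rng = random.Random(1)
stream = [rng.gauss(3.0, 1.0) if rng.random() < 0.3 else rng.gauss(-2.0, 1.0)
          for _ in range(20000)]
w_online = online_gradient_weight(stream, mu0=-2.0, mu1=3.0)
```

In this one-parameter case the scaled gradient step simplifies algebraically to `w + gamma * (posterior - w)`, where `posterior` is the conditional probability of the second component given `y`, so the iterate stays inside (0, 1) whenever the step size is below one. For richer parameterisations the information matrix has to be evaluated and inverted at every step.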
In this contribution, we propose a new online EM algorithm that sticks more closely to the principles of the original (batch-mode) EM algorithm
In particular, each iteration of the proposed algorithm is decomposed into two steps: the first is a stochastic approximation version of the E-step, aimed at incorporating the information brought by the newly available observation, and the second consists of the maximisation program that appears in the M-step of the traditional EM algorithm
In addition, the proposed algorithm does not rely on the complete data information matrix, which has two important consequences: firstly, from a practical point of view, the evaluation and inversion of the information matrix is no longer required; secondly, the convergence of the procedure does not rely on the implicit assumption that the model is  well-specified , that is, that the data under consideration is actually generated by the model, for some unknown value of the parameter
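One possible reading of this two-step iteration is sketched below, under the assumption that the complete-data model is summarised by a few expected sufficient statistics. The stochastic approximation E-step moves running averages of those statistics towards their conditional expectation given the new observation, and the M-step recomputes the parameters from the running averages. The Gaussian mixture with unit variances, the step sizes, the initialisation and all names are illustrative assumptions rather than the paper's specification.

```python
import math
import random


def normal_pdf(y, mean):
    """Density of N(mean, 1) at y."""
    return math.exp(-0.5 * (y - mean) ** 2) / math.sqrt(2.0 * math.pi)


def online_em(stream, w=0.5, mu0=-1.0, mu1=1.0, exponent=0.6):
    """Online EM sketch for a two-component unit-variance Gaussian mixture."""
    # Running estimates of E[z], E[z*y] and E[(1-z)*y], where z is the
    # latent indicator of component 1; initialised to match the start values.
    s_z, s_zy, s_cy = w, w * mu1, (1.0 - w) * mu0
    for n, y in enumerate(stream, start=1):
        f0, f1 = normal_pdf(y, mu0), normal_pdf(y, mu1)
        p = w * f1 / ((1.0 - w) * f0 + w * f1)  # P(z = 1 | y) at current values
        gamma = (n + 1) ** (-exponent)          # assumed step size, below 1
        # Stochastic approximation E-step: only the new observation is used.
        s_z += gamma * (p - s_z)
        s_zy += gamma * (p * y - s_zy)
        s_cy += gamma * ((1.0 - p) * y - s_cy)
        # M-step: same closed-form maximisation as in batch EM.
        w, mu1, mu0 = s_z, s_zy / s_z, s_cy / (1.0 - s_z)
    return w, mu0, mu1


# Simulated stream (assumed): weight 0.3 on N(3, 1), weight 0.7 on N(-2, 1).
rng = random.Random(2)
stream = [rng.gauss(3.0, 1.0) if rng.random() < 0.3 else rng.gauss(-2.0, 1.0)
          for _ in range(20000)]
w_hat, mu0_hat, mu1_hat = online_em(stream)
```

Each observation is processed once and then discarded, and no gradient or information matrix appears in the update; only the three running statistics need to be stored.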
As a consequence, and in contrast to previous work, we provide an analysis of the proposed algorithm also for the case where the observations are not assumed to follow the fitted statistical model
This consideration is particularly relevant in the case of conditional (or regression) missing data models, a simple case of which is used as an illustration of the proposed online EM algorithm
Finally, it is shown that, with the additional use of Polyak-Ruppert averaging, the proposed approach converges to the stationary points of the limiting normalised log-likelihood criterion (i.e., the Kullback-Leibler divergence between the marginal density of the observations and the model pdf) at a rate which is optimal
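The averaging device itself is simple to state: alongside the raw stochastic approximation iterate, one maintains the running mean of all iterates so far and reports that mean as the estimate. The sketch below shows this on a deliberately trivial problem, estimating the mean of a stream, which is an assumption chosen for brevity and not the model treated in the paper; the step-size exponent and names are likewise illustrative.

```python
import random


def sa_mean_with_averaging(stream, exponent=0.6, theta=0.0):
    """Stochastic approximation of a mean, plus its Polyak-Ruppert average."""
    theta_bar = 0.0
    for n, y in enumerate(stream, start=1):
        gamma = n ** (-exponent)              # slowly decreasing step size
        theta += gamma * (y - theta)          # raw iterate
        theta_bar += (theta - theta_bar) / n  # running mean of the iterates
    return theta, theta_bar


# Simulated stream (assumed): observations drawn from N(1.5, 1).
rng = random.Random(3)
stream = [rng.gauss(1.5, 1.0) for _ in range(20000)]
theta_raw, theta_avg = sa_mean_with_averaging(stream)
```

The raw iterate uses step sizes that decrease more slowly than 1/n and therefore stays comparatively noisy; averaging the iterates is what smooths this noise out, and the same running-mean line could be appended to any online parameter update.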
The paper is organised as follows: In Section~, we review the basics of the EM and associated algorithms and introduce the proposed approach
The connections with other existing methods are discussed at the end of Section~ and a simple example of application is described in Section~
Convergence results are stated in Section~, first in terms of consistency (Section~) and then of convergence rate (Section~), with the corresponding proofs given in Appendix~
Finally, in Section~, the performance of this approach is illustrated in the context of a mixture of linear regressions
