### abstract ###
In probabilistic grammatical inference, a common goal is to infer a good approximation of an unknown distribution SYMBOL called a stochastic language.
The estimate of SYMBOL is sought in some class of probabilistic models, such as probabilistic automata (PA).
In this paper, we focus on probabilistic models based on multiplicity automata (MA).
The stochastic languages generated by MA are called rational stochastic languages; they strictly include the stochastic languages generated by PA and admit a very concise canonical representation.
Although this class is not recursively enumerable, it is efficiently identifiable in the limit by the algorithm DEES, introduced by the authors in a previous paper.
However, the identification is not proper: before the algorithm converges, DEES can produce MA that do not define stochastic languages.
Nevertheless, it is possible to use these MA to define stochastic languages.
We show that they belong to a broader class of rational series, which we call pseudo-stochastic rational languages.
The aim of this paper is twofold.
First, we provide a theoretical study of pseudo-stochastic rational languages, the languages output by DEES, showing, for example, that this class is decidable in polynomial time.
Second, we have carried out extensive experiments to compare DEES with classical inference algorithms such as ALERGIA and MDI.
They show that DEES outperforms them in most cases.
Keywords: pseudo-stochastic rational languages, multiplicity automata, probabilistic grammatical inference.
### introduction ###
In probabilistic grammatical inference, we often consider stochastic languages, which define distributions over SYMBOL, the set of all possible words over an alphabet SYMBOL.
In general, we consider an unknown distribution SYMBOL, and the goal is to find a good approximation of it from a finite sample of words drawn independently according to SYMBOL.
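The raw material of any such inference is the empirical distribution of the sample. The sketch below, which assumes words are encoded as Python strings over a character alphabet, computes relative word frequencies and the fraction of sampled words starting with a given prefix (an estimate of the probability that a drawn word begins with that prefix). The function names and the toy sample are hypothetical and only illustrate the kind of statistics an inference algorithm can start from.

```python
from collections import Counter


def empirical_distribution(sample):
    """Relative frequency of each distinct word in a finite sample."""
    counts = Counter(sample)
    n = len(sample)
    return {w: c / n for w, c in counts.items()}


def empirical_prefix_probability(sample, prefix):
    """Fraction of sampled words that start with `prefix`."""
    return sum(1 for w in sample if w.startswith(prefix)) / len(sample)


# Hypothetical toy sample over the alphabet {a, b}; "" is the empty word.
sample = ["", "a", "a", "ab", "b", "ab", "a", ""]
p_hat = empirical_distribution(sample)
print(p_hat["a"])                                 # 0.375
print(empirical_prefix_probability(sample, "a"))  # 0.625
```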
The class of probabilistic automata (PA) is often used to model such distributions.
This class has the same expressiveness as Hidden Markov Models and is identifiable in the limit~CITATION.
However, there exists no efficient algorithm for identifying PA.
This can be explained by the fact that these automata have no canonical representation, which makes it difficult to correctly identify the structure of the target.
One solution is to focus on subclasses of PA, such as probabilistic deterministic automata~CITATION, but at the price of a significant loss of expressiveness.
Another solution consists in considering the class of multiplicity automata (MA).
These models admit a canonical representation, which offers good opportunities from a machine learning point of view.
MA define functions that compute rational series with values in SYMBOL~CITATION.
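As an illustration, an MA can be stored as a linear representation: an initial row vector, one square matrix per letter, and a final column vector, the value of the series on a word being the product of these along the word. The sketch below assumes this standard representation, r(w) = iota^T M_{x1} ... M_{xn} tau; the class name `MultiplicityAutomaton` and the numeric weights are hypothetical, chosen so that this particular automaton is also a PA (nonnegative weights, and for each state the outgoing and termination weights sum to 1). A general MA may use arbitrary weights in the chosen semiring, including negative ones.

```python
import numpy as np


class MultiplicityAutomaton:
    """Linear representation (iota, {M_x}, tau) with r(w) = iota^T M_{x1} ... M_{xn} tau."""

    def __init__(self, iota, transitions, tau):
        self.iota = np.asarray(iota, dtype=float)
        self.transitions = {x: np.asarray(m, dtype=float) for x, m in transitions.items()}
        self.tau = np.asarray(tau, dtype=float)

    def __call__(self, word):
        row = self.iota
        for symbol in word:
            row = row @ self.transitions[symbol]  # multiply left to right along the word
        return float(row @ self.tau)


# Hypothetical 2-state automaton over {a, b}.
ma = MultiplicityAutomaton(
    iota=[1.0, 0.0],
    transitions={"a": [[0.2, 0.3], [0.0, 0.4]],
                 "b": [[0.1, 0.1], [0.2, 0.0]]},
    tau=[0.3, 0.4],
)
print(ma(""), ma("a"), ma("ab"))  # approximately 0.3 0.18 0.032
```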
MA are a strict generalization of PA, and the stochastic languages generated by PA are special cases of rational stochastic languages.
Let us denote by SYMBOL the class of rational stochastic languages computed by MA with parameters in SYMBOL, where SYMBOL.
When SYMBOL or SYMBOL, SYMBOL is exactly the class of stochastic languages generated by PA with parameters in SYMBOL.
But when SYMBOL or SYMBOL, we obtain strictly larger classes.
This provides several advantages. Elements of SYMBOL have a minimal normal representation, so elements of SYMBOL may have significantly smaller representations in SYMBOL. The parameters of these minimal representations are directly related to the probabilities of some natural events of the form SYMBOL, which can be efficiently estimated from stochastic samples. Lastly, when SYMBOL is a field, rational series over SYMBOL form a vector space, so efficient linear algebra techniques can be used to deal with rational stochastic languages.
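One classical way to see the link between minimal representations and linear algebra is the Hankel matrix H[u, v] = r(uv), indexed by prefixes u and suffixes v: over a field, the dimension of a minimal linear representation of r equals the rank of H, and finite blocks of H give lower bounds that reach this rank once they are large enough. The sketch below, which is not necessarily how DEES proceeds, builds such a block for a hypothetical 2-state MA and computes its rank; the helper names are ours, and an explicit tolerance is passed to `np.linalg.matrix_rank` to avoid counting floating-point noise.

```python
import itertools

import numpy as np


def words_up_to(alphabet, max_len):
    """All words over `alphabet` of length <= max_len, shortest first."""
    for n in range(max_len + 1):
        for letters in itertools.product(alphabet, repeat=n):
            yield "".join(letters)


def hankel_block(series, alphabet, max_len):
    """Finite block H[u, v] = r(uv) for prefixes u and suffixes v of length <= max_len."""
    words = list(words_up_to(alphabet, max_len))
    return np.array([[series(u + v) for v in words] for u in words])


# Hypothetical 2-state MA used only for illustration.
IOTA = np.array([1.0, 0.0])
M = {"a": np.array([[0.2, 0.3], [0.0, 0.4]]),
     "b": np.array([[0.1, 0.1], [0.2, 0.0]])}
TAU = np.array([0.3, 0.4])


def r(word):
    row = IOTA
    for x in word:
        row = row @ M[x]
    return float(row @ TAU)


H = hankel_block(r, "ab", max_len=2)
print(H.shape, np.linalg.matrix_rank(H, tol=1e-9))  # (7, 7) 2
```

Here the rank equals the number of states, so this 2-state representation is already minimal; a redundant representation of the same series would yield the same rank with more states.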
However, the class SYMBOL has a serious drawback: there is no recursively enumerable set of MA that generates exactly this class~CITATION.
As a consequence, no proper identification algorithm can exist: applying a proper identification algorithm to an enumeration of samples of SYMBOL would yield an enumeration of the class of rational stochastic languages over SYMBOL.
In spite of this result, there exists an efficient algorithm, DEES, that identifies SYMBOL in the limit.
But before reaching the target, DEES can produce MA that do not define stochastic languages.
However, it has been shown in~CITATION that, with probability one, for any rational stochastic language SYMBOL, if DEES is given as input a sufficiently large sample SYMBOL drawn according to SYMBOL, DEES outputs a rational series such that SYMBOL converges absolutely to 1.
Moreover, SYMBOL converges to 0 as the size of SYMBOL increases.
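Absolute convergence can be probed numerically, though only approximately, by truncating the sums at a maximal word length. The following sketch is an illustration under assumptions: it uses a hypothetical one-state MA with a negative weight, r(w) = 0.75 * 0.5^{|w|_a} * (-0.25)^{|w|_b}, whose values sum to 1 even though some of them are negative, so it does not define a stochastic language. The helper name `truncated_sums` is ours.

```python
import itertools

# Hypothetical one-state MA: r(w) = 0.75 * 0.5**(#a in w) * (-0.25)**(#b in w).
WEIGHTS = {"a": 0.5, "b": -0.25}
TAU = 0.75


def r(word):
    value = TAU
    for x in word:
        value *= WEIGHTS[x]
    return value


def truncated_sums(series, alphabet, max_len):
    """Return (sum of r(w), sum of |r(w)|) over all words with |w| <= max_len."""
    signed, absolute = 0.0, 0.0
    for n in range(max_len + 1):
        for letters in itertools.product(alphabet, repeat=n):
            value = series("".join(letters))
            signed += value
            absolute += abs(value)
    return signed, absolute


for n in (2, 6, 12):
    print(n, truncated_sums(r, "ab", n))
```

For this series the signed partial sums equal 1 - 0.25^{n+1} and the absolute partial sums equal 3(1 - 0.75^{n+1}), so the sum converges absolutely to 1 while the sum of absolute values tends to 3. Truncation only provides evidence of convergence, not a proof.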
We show that these MA belong to a broader class of rational series, which we call pseudo-stochastic rational languages.
A pseudo-stochastic rational language SYMBOL has the property that SYMBOL is defined for any word SYMBOL and that SYMBOL.
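For an MA, a natural quantity of this kind is the prefix sum r(wΣ*) = Σ_u r(wu), which linear algebra gives in closed form: with M = Σ_x M_x, r(wΣ*) = iota^T M_w (I - M)^{-1} tau. The sketch below assumes the spectral radius of M is below 1, a sufficient condition for the Neumann series Σ_k M^k to converge to (I - M)^{-1}; the weights and the function name `prefix_sum` are hypothetical.

```python
import numpy as np

# Hypothetical 2-state MA in linear representation r(w) = iota^T M_w tau.
IOTA = np.array([1.0, 0.0])
M = {"a": np.array([[0.2, 0.3], [0.0, 0.4]]),
     "b": np.array([[0.1, 0.1], [0.2, 0.0]])}
TAU = np.array([0.3, 0.4])


def prefix_sum(word):
    """r(w Sigma^*) = iota^T M_w (I - M)^{-1} tau, with M = sum_x M_x.

    Assumes the spectral radius of M is < 1, so sum_k M^k = (I - M)^{-1}.
    """
    m_total = sum(M.values())
    if max(abs(np.linalg.eigvals(m_total))) >= 1:
        raise ValueError("spectral radius of sum_x M_x must be < 1")
    row = IOTA
    for x in word:
        row = row @ M[x]
    # Solve (I - M) z = tau rather than forming the inverse explicitly.
    z = np.linalg.solve(np.eye(len(TAU)) - m_total, TAU)
    return float(row @ z)


print(prefix_sum(""), prefix_sum("a"))  # approximately 1.0 and 0.5
```

The identity r(wΣ*) = r(w) + Σ_x r(wxΣ*) gives a cheap consistency check on such computations.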
A stochastic language SYMBOL can be associated with SYMBOL in such a way that SYMBOL when the sum SYMBOL is absolutely convergent.
As a first consequence, SYMBOL when SYMBOL is a stochastic language.
As a second consequence, for any rational stochastic language SYMBOL, if DEES is given as input samples of increasing size drawn according to SYMBOL, DEES outputs pseudo-stochastic rational languages SYMBOL such that SYMBOL converges to 0 as the size of SYMBOL increases.
The aim of this paper is twofold: to provide a theoretical study of the class of pseudo-stochastic rational languages, and to present a series of experiments comparing the performance of DEES with two classical inference algorithms, ALERGIA~CITATION and MDI~CITATION.
We show that the class of pseudo-stochastic rational languages is decidable in polynomial time.
We provide an algorithm that can be used to compute SYMBOL from any MA that computes SYMBOL.
We also show how SYMBOL can be simulated using such an automaton.
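Simulation can be illustrated with a generic sequential sampler that extends a prefix one letter at a time. This is a sketch of a standard technique, not necessarily the procedure of the paper: it assumes an MA with nonnegative weights that defines a stochastic language, and uses prefix sums r(wΣ*) = iota^T M_w (I - M)^{-1} tau with M = Σ_x M_x. At prefix w, the sampler stops with probability r(w)/r(wΣ*) and otherwise appends letter x with probability r(wxΣ*)/r(wΣ*). The weights and the function name `sample_word` are hypothetical.

```python
import random

import numpy as np

# Hypothetical 2-state MA assumed to define a stochastic language (all weights >= 0 here).
IOTA = np.array([1.0, 0.0])
M = {"a": np.array([[0.2, 0.3], [0.0, 0.4]]),
     "b": np.array([[0.1, 0.1], [0.2, 0.0]])}
TAU = np.array([0.3, 0.4])
# Z = (I - sum_x M_x)^{-1} tau, so that r(w Sigma^*) = iota^T M_w Z.
Z = np.linalg.solve(np.eye(2) - sum(M.values()), TAU)


def sample_word(rng):
    """Draw one word by extending the current prefix one letter at a time."""
    word, row = "", IOTA
    while True:
        mass = row @ Z                                # r(w Sigma^*)
        options = [(None, row @ TAU)]                 # stop at w, weight r(w)
        options += [(x, row @ M[x] @ Z) for x in M]   # append x, weight r(wx Sigma^*)
        symbols, weights = zip(*options)
        choice = rng.choices(symbols, weights=[w / mass for w in weights])[0]
        if choice is None:
            return word
        word, row = word + choice, row @ M[choice]


rng = random.Random(0)
print([sample_word(rng) for _ in range(5)])
```

With these weights, about 30% of the draws should be the empty word and about half should start with a. For MA with negative parameters, these ratios can fall outside [0, 1], so this naive sampler does not apply directly.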
We show that there exist pseudo-stochastic rational languages SYMBOL such that SYMBOL is not rational.
Finally, we show that it is undecidable whether two pseudo-stochastic rational languages define the same stochastic language.
We have carried out extensive experiments, which show that DEES outperforms ALERGIA and MDI in most cases.
These results were expected, since ALERGIA and MDI do not have the same theoretical expressiveness and since DEES aims at producing a minimal representation of the target in the set of MA, which can be significantly smaller than the smallest equivalent probabilistic deterministic automaton (PDA), if one exists.
The paper is organized as follows.
In Section 2, we introduce some background on multiplicity automata, rational series and stochastic languages, and present the algorithm DEES.
Section 3 deals with our study of pseudo-stochastic rational languages.
Our experiments are detailed in Section 4.
