### abstract ###
The goal of the present paper is to provide a systematic and comprehensive study of  rational stochastic languages  over a semiring  SYMBOL
A rational stochastic language is a probability distribution over a free monoid  SYMBOL  which is rational over  SYMBOL , that is, which can be generated by a multiplicity automaton with parameters in  SYMBOL
We study the relations between the classes of rational stochastic languages  SYMBOL
We define the notion of  residual  of a stochastic language and we use it to investigate properties of several subclasses of rational stochastic languages
Lastly, we study the representation of rational stochastic languages by means of multiplicity automata
### introduction ###
In probabilistic grammatical inference, data often arise in the form of a finite sequence of words  SYMBOL  over some predefined alphabet  SYMBOL
These words are assumed to be independently drawn according to a fixed but unknown probability distribution over  SYMBOL
Probability distributions over free monoids  SYMBOL  are called  stochastic languages
A usual goal in grammatical inference is to try to infer an approximation of this distribution in some class of probabilistic models, such as  probabilistic automata
A probabilistic automaton (PA) is composed of a  structure , which is a finite automaton (NFA), and  parameters  associated with states and transitions, which represent the probability for a state to be initial, terminal or the probability for a transition to be chosen
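As an illustration of how such parameters determine the probability of a word, here is a minimal sketch of the usual forward computation. The function name, the data layout and the two-state automaton are hypothetical choices made for this example, not objects taken from the paper.

```python
from fractions import Fraction as F

def word_probability(initial, final, transitions, word):
    """Probability that the automaton generates `word` (forward computation).

    initial[q]           : probability that state q is initial
    final[q]             : probability of terminating in state q
    transitions[a][q][r] : probability of moving from q to r while emitting a
    """
    n = len(initial)
    forward = list(initial)
    for letter in word:
        matrix = transitions[letter]
        # Propagate the current state weights through the letter's matrix.
        forward = [sum(forward[q] * matrix[q][r] for q in range(n))
                   for r in range(n)]
    return sum(forward[q] * final[q] for q in range(n))

# Hypothetical two-state automaton over {a, b}. For each state, the
# termination probability plus all outgoing probabilities sum to 1.
INITIAL = [F(1), F(0)]
FINAL = [F(1, 4), F(1, 2)]
TRANSITIONS = {
    "a": [[F(1, 4), F(1, 4)], [F(0), F(1, 4)]],
    "b": [[F(0), F(1, 4)], [F(1, 4), F(0)]],
}

print(word_probability(INITIAL, FINAL, TRANSITIONS, "a"))  # 3/16
```

Exact rational arithmetic is used so that the values can be compared without rounding error.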
It can easily be shown that probabilistic automata have the same expressivity as Hidden Markov Models (HMM), which are heavily used in statistical inference  CITATION
Given the structure  SYMBOL  of a probabilistic automaton and a sequence of words  SYMBOL , computing parameters for  SYMBOL  which maximize the likelihood of  SYMBOL  is NP-hard  CITATION
In practical cases, however, algorithms based on the EM (Expectation-Maximization) method  CITATION  can be used to compute approximate values
On the other hand, inferring a probabilistic automaton (structure and parameters) from a sequence of words is a wide open field of research
Most results obtained so far only deal with restricted subclasses of PA, such as Probabilistic Deterministic Automata (PDA), i.e. probabilistic automata whose structure is deterministic (DFA), or Probabilistic Residual Automata (PRA), i.e. probabilistic automata whose structure is a residual finite state automaton (RFSA)  CITATION
Moreover, stochastic languages are particular cases of  formal power series , and probabilistic automata are particular cases of  multiplicity automata , notions which have been extensively studied in the field of formal language theory  CITATION
Therefore, stochastic languages which can be generated by multiplicity automata are special cases of  rational languages
We call them  rational stochastic languages
The goal of the present paper is to provide a systematic and comprehensive study of  rational stochastic languages  so as to bring out properties that could be useful for grammatical inference
Indeed, considering the objects to infer as special cases of rational languages makes it possible to use the powerful theoretical tools that have been developed in that field and hence to answer many questions that naturally arise when working with them: is it possible to decide within polynomial time whether two probabilistic automata generate the same stochastic language? Does allowing negative coefficients in probabilistic automata extend the class of generated stochastic languages? Can a rational stochastic language which takes all its values in  SYMBOL  always be generated by a multiplicity automaton with coefficients in  SYMBOL ? And so forth
Also, studying  rational stochastic languages  in their own right, as objects of language theory, helps to bring out notions and properties which are important from a grammatical inference perspective: for example, we show that the notion of residual language (or derivative), so important for grammatical inference  CITATION , has a natural counterpart for stochastic languages  CITATION , which can be used to express many properties of classes of stochastic languages
Formal power series  take their values in a semiring  SYMBOL : let us denote by  SYMBOL  the set of all formal power series
Here, we only consider semirings  SYMBOL ,  SYMBOL ,  SYMBOL  and  SYMBOL
For any such semiring  SYMBOL , we define the set  SYMBOL  of rational stochastic languages as the set of stochastic languages over  SYMBOL  which are rational languages over  SYMBOL
For any two distinct semirings  SYMBOL  and  SYMBOL , the corresponding sets of rational stochastic languages are distinct
We show that  SYMBOL  is a Fatou extension of  SYMBOL  for stochastic languages, which means that any rational stochastic language over  SYMBOL  which takes its values in  SYMBOL  is also rational over  SYMBOL
However,  SYMBOL  is not a Fatou extension of  SYMBOL  for stochastic languages: there exists a rational stochastic language over  SYMBOL  which takes its values in  SYMBOL  and which is not rational over  SYMBOL
For any stochastic language  SYMBOL  over  SYMBOL  and any word  SYMBOL  such that  SYMBOL , let us define the residual language  SYMBOL  of  SYMBOL  with respect to  SYMBOL  by  SYMBOL : residual languages clearly are stochastic languages
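The symbols of this definition are not legible in the present text, so the following sketch relies on an assumption: it uses the convention, common in the literature, that the residual of p with respect to u assigns to each word w the value p(uw) divided by the total probability of the words that start with u. The helper `residual` and the finite-support distribution `P` are hypothetical and serve only as an illustration.

```python
from fractions import Fraction as F

def residual(p, u):
    """Residual of a finite-support distribution `p` (dict word -> probability)
    with respect to the prefix `u`, under the assumed convention
    residual(w) = p(u + w) / p(words starting with u)."""
    prefix_mass = sum(prob for word, prob in p.items() if word.startswith(u))
    if prefix_mass == 0:
        raise ValueError("residual undefined: no word of positive probability starts with u")
    return {word[len(u):]: prob / prefix_mass
            for word, prob in p.items() if word.startswith(u)}

# Hypothetical distribution over words on the alphabet {a, b}.
P = {"": F(1, 4), "a": F(1, 4), "ab": F(1, 2)}

res = residual(P, "a")
print(res)                # {'': Fraction(1, 3), 'b': Fraction(2, 3)}
print(sum(res.values()))  # 1
```

Under this convention the values of the residual sum to 1 by construction, which is consistent with the remark that residual languages are themselves stochastic languages.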
We show that the residual languages of a rational stochastic language  SYMBOL  over  SYMBOL  are also rational over  SYMBOL
The residual subsemimodule  SYMBOL  of  SYMBOL  spanned by the residual languages of any stochastic language  SYMBOL  may be used to express the rationality of  SYMBOL :  SYMBOL  is rational iff  SYMBOL  is included in a finitely generated subsemimodule of  SYMBOL
But when  SYMBOL  is positive, i.e.  SYMBOL  or  SYMBOL , it may happen that  SYMBOL  itself is not finitely generated
We study the properties of two subclasses of  SYMBOL : the set  SYMBOL  composed of rational stochastic languages over  SYMBOL  whose residual subsemimodule is finitely generated and the set  SYMBOL  composed of rational stochastic languages over  SYMBOL  which have finitely many residual languages
We show that for any of these two classes,  SYMBOL  is a Fatou extension of  SYMBOL : any stochastic language of  SYMBOL  (resp. of  SYMBOL ) which takes its values in  SYMBOL  is an element of  SYMBOL  (resp. of  SYMBOL )
We also show that for any element  SYMBOL  of  SYMBOL , there exists a unique minimal subset of residual languages of  SYMBOL  which generates  SYMBOL
Then, we study the representation of rational stochastic languages by means of multiplicity automata
We first show that the set of multiplicity automata with parameters in  SYMBOL  which generate stochastic languages is not recursive
Moreover, it contains no recursively enumerable subset capable of generating the whole set of rational stochastic languages over  SYMBOL
A stochastic language  SYMBOL  is a formal series which has two properties: (i)  SYMBOL  for any word  SYMBOL , (ii)  SYMBOL
We show that the undecidability comes from the first requirement, since the second one can be decided within polynomial time
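One standard way to test the second requirement, not necessarily the procedure used in the paper, is to note that when the sum over all words converges, it equals the initial vector times the solution x of (I - M) x = final, where M is the sum of the transition matrices. The sketch below assumes that I - M is invertible and that the series converges. The function names and the example automaton are hypothetical.

```python
from fractions import Fraction as F

def solve(matrix, rhs):
    """Solve matrix * x = rhs exactly by Gauss-Jordan elimination.
    Assumes the matrix is invertible (otherwise `next` raises StopIteration)."""
    n = len(rhs)
    rows = [[F(v) for v in matrix[i]] + [F(rhs[i])] for i in range(n)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        lead = rows[col][col]
        rows[col] = [v / lead for v in rows[col]]
        for r in range(n):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [v - factor * p for v, p in zip(rows[r], rows[col])]
    return [rows[i][n] for i in range(n)]

def total_mass(initial, final, transitions):
    """Sum of the series over all words, assuming convergence."""
    n = len(initial)
    # M = sum of the transition matrices over all letters.
    summed = [[sum(m[q][r] for m in transitions.values()) for r in range(n)]
              for q in range(n)]
    system = [[(1 if q == r else 0) - summed[q][r] for r in range(n)]
              for q in range(n)]
    x = solve(system, final)
    return sum(initial[q] * x[q] for q in range(n))

# Hypothetical two-state automaton over {a, b}.
INITIAL = [F(1), F(0)]
FINAL = [F(1, 4), F(1, 2)]
TRANSITIONS = {
    "a": [[F(1, 4), F(1, 4)], [F(0), F(1, 4)]],
    "b": [[F(0), F(1, 4)], [F(1, 4), F(0)]],
}

print(total_mass(INITIAL, FINAL, TRANSITIONS))  # 1
```

The computation amounts to solving one linear system whose size is the number of states, which is why this check is cheap compared with the non-negativity requirement.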
We show that the set of stochastic languages which can be generated by probabilistic automata with parameters in  SYMBOL  (resp.  SYMBOL ) exactly coincides with  SYMBOL  (resp.  SYMBOL )
A probabilistic automaton  SYMBOL  is called a Probabilistic Residual Automaton (PRA) if the stochastic languages associated with its states are residual languages of the stochastic language  SYMBOL  generated by  SYMBOL
We show that the set of stochastic languages that can be generated by probabilistic residual automata with parameters in  SYMBOL  (resp.  SYMBOL ) exactly coincides with  SYMBOL  (resp.  SYMBOL )
We do not know whether the class of PRA is decidable
However, we describe two decidable subclasses of PRA capable of generating  SYMBOL  when  SYMBOL  or  SYMBOL : the class of  SYMBOL -reduced PRA and the class of prefixial PRA
The first one provides minimal representations in the class of PRA, but we show that the membership problem is PSPACE-complete
The second one produces more cumbersome representations, but the membership problem is polynomial
Finally, we show that the set of stochastic languages that can be generated by probabilistic deterministic automata with parameters in  SYMBOL  (resp.  SYMBOL ) exactly coincides with  SYMBOL , which is also equal to  SYMBOL  (resp.  SYMBOL , which is also equal to  SYMBOL )
We recall some properties on rational series, stochastic languages and multiplicity automata in Section~
We define and study rational stochastic languages in Section~
The relations between the classes of rational stochastic languages are studied in Subsection~
Properties of the residual languages of rational stochastic languages are studied in Subsection~
A characterisation of rational stochastic languages in terms of stable subsemimodule is given in Subsection~
Classes  SYMBOL  and  SYMBOL  are defined and studied in Subsection~
The representation of rational stochastic languages by means of multiplicity automata is given in Section~
