lea module
- class Alea(vs, ps, normalization=True, prob_type=-1)
Bases: Lea
An Alea instance is defined by given value-probability pairs, that is, an explicit probability mass function (pmf). The probabilities can be expressed as any object with arithmetic semantics; the main candidates are floats, fractions or symbolic expressions. An Alea instance is an “elementary pex”, as defined in the paper on the Statues algorithm (see http://arxiv.org/abs/1806.09997 ).
- P()
- returns the probability that self is True; the probability is expressed in the probability type used in self, possibly downcast for convenience (Fraction -> ExtFraction, Decimal -> ExtDecimal); raises an exception if some value in the distribution is not Boolean (note that this is NOT the case with self.p(True))
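The notion of an explicit pmf built from value-probability pairs, with optional normalization, and the documented P() behaviour can be sketched in plain Python as follows. This is a minimal illustration assuming a pmf is a {value: probability} dict with Fraction probabilities; make_pmf and prob_true are hypothetical helper names, not part of lea's API.

```python
from fractions import Fraction


def make_pmf(vs, ps, normalization=True):
    """Hypothetical helper (not lea's API): build a {value: probability} dict
    from parallel value/probability sequences, merging duplicate values."""
    pmf = {}
    for v, p in zip(vs, ps):
        pmf[v] = pmf.get(v, 0) + p
    if normalization:
        # Divide each probability by the total so that they sum to 1.
        total = sum(pmf.values())
        pmf = {v: p / total for v, p in pmf.items()}
    return pmf


def prob_true(pmf):
    """Hypothetical helper mimicking the documented P() behaviour: every value
    must be a bool, otherwise an exception is raised."""
    for v in pmf:
        if not isinstance(v, bool):
            raise ValueError(f"non-Boolean value: {v!r}")
    return pmf.get(True, 0)


# Weights 3 and 1 are normalized to probabilities 3/4 and 1/4.
flip = make_pmf([True, False], [Fraction(3), Fraction(1)])
print(prob_true(flip))  # 3/4
```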
- Pf()
- returns the probability that self is True; the probability is expressed as a float between 0.0 and 1.0; raises an exception if the probability type is not convertible to float; raises an exception if some value in the distribution is not Boolean (this is NOT the case with self.p(True))
- as_float(nb_decimals=6)
- returns a string representation of probability distribution self; it contains one line per distinct value, separated by a newline character; each line contains the string representation of a value with its probability expressed as a decimal with given nb_decimals digits; if an order relationship is defined on values, then the values are sorted by increasing order, otherwise, an arbitrary order is used
- as_pct(nb_decimals=2)
- returns a string representation of probability distribution self; it contains one line per distinct value, separated by a newline character; each line contains the string representation of a value with its probability expressed as a percentage with given nb_decimals digits; if an order relationship is defined on values, then the values are sorted by increasing order, otherwise, an arbitrary order is used
- as_string(kind=None, nb_decimals=None, chart_size=None, tabular=None, one_line=None)
- returns a string representation of probability distribution self; by default, it contains one line per distinct value, separated by a newline character; each line contains the string representation of a value with its probability in a format depending on given kind, which is either None (default) or a string among ‘/’, ‘.’, ‘%’, ‘-’, ‘/-’, ‘.-’, ‘%-’; the probabilities are displayed as:
  - if kind is None: as they are stored,
  - if kind[0] is ‘/’: rational numbers “n/d” or “0” or “1”,
  - if kind[0] is ‘.’: decimals with given nb_decimals digits,
  - if kind[0] is ‘%’: percentage decimals with given nb_decimals digits,
  - if kind[0] is ‘-’: histogram bar made up of repeated ‘-’, such that a bar length of chart_size represents a probability 1;
  if kind[1] is ‘-’, the histogram bars with ‘-’ are appended after the numerical representation of probabilities; if the probability distribution has been created with ordered=True, then the values are ordered in the order of their definition; otherwise, if an order relationship is defined on values, then the values are sorted by increasing order; otherwise, an arbitrary order is used; if tabular is True and if values are tuples of the same length, then these are represented in a tabular format (fixed column width); in the specific case of named tuples, a header line is prepended with the field names; if one_line is True, then the values and probabilities are put on one single line, separated by commas; if some arguments are None or not specified, then they take the default values specified by a call to the set_display_options function
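To make the kind codes concrete, here is a minimal sketch of how ‘/’, ‘.’, ‘%’ and ‘-’ (with an optional trailing ‘-’) map a probability to text. format_pmf is a hypothetical function for illustration only; lea's exact spacing, alignment, tabular and one_line handling are not reproduced.

```python
from fractions import Fraction


def format_pmf(pmf, kind="/", nb_decimals=2, chart_size=20):
    """Hypothetical sketch of the 'kind' codes (not lea's implementation):
    one 'value : probability' line per value, values sorted increasingly."""
    lines = []
    for v in sorted(pmf):
        p = pmf[v]
        bar = "-" * round(float(p) * chart_size)  # chart_size chars = prob. 1
        if kind[0] == "/":
            text = str(Fraction(p))  # "n/d", "0" or "1"
        elif kind[0] == ".":
            text = f"{float(p):.{nb_decimals}f}"
        elif kind[0] == "%":
            text = f"{float(p) * 100:.{nb_decimals}f} %"
        elif kind[0] == "-":
            text = bar
        else:
            raise ValueError(f"unknown kind: {kind!r}")
        if kind[1:] == "-":  # e.g. '%-': append the bar after the number
            text += " " + bar
        lines.append(f"{v} : {text}")
    return "\n".join(lines)


pmf = {1: Fraction(1, 4), 2: Fraction(3, 4)}
print(format_pmf(pmf, "/"))
print(format_pmf(pmf, "%-", nb_decimals=1, chart_size=8))
```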
- cdf_dict()
- returns, after evaluation of the probability distribution self, the cumulative distribution function of self, as an OrderedDict with {v: P(x<=v)} pairs; the sequence follows the order defined on values
- cdf_tuple()
- returns, after evaluation of the probability distribution self, the cumulative distribution function of self, as a tuple with tuples (v,P(x<=v)); the sequence follows the order defined on values
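The cumulative pairs returned by cdf_tuple and cdf_dict amount to a running sum over sorted values. A minimal sketch, assuming a {value: probability} dict; cdf_tuple and cdf_dict below are hypothetical stand-ins named after the methods, not lea's code:

```python
from collections import OrderedDict
from fractions import Fraction
from itertools import accumulate


def cdf_tuple(pmf):
    """Hypothetical sketch: ((v, P(x <= v)), ...) in increasing value order."""
    values = sorted(pmf)
    return tuple(zip(values, accumulate(pmf[v] for v in values)))


def cdf_dict(pmf):
    """Same pairs, as an OrderedDict {v: P(x <= v)}."""
    return OrderedDict(cdf_tuple(pmf))


die = {v: Fraction(1, 6) for v in range(1, 7)}
print(cdf_tuple(die)[2])  # (3, Fraction(1, 2))
print(cdf_dict(die)[6])   # 1
```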
- static create_prob_obj(arg)
- static method, returns a probability object corresponding to the given arg: if arg is not a string, then it is returned as-is; if arg is a string, then an attempt is made to interpret it as ExtDecimal, ExtFraction or sympy symbol, in that order; the object of the first successful type is returned; note: ExtDecimal allows for ‘%’ suffixes
- static create_prob_symbol(arg)
- static method; if given arg is a string, then returns a sympy Symbol, having arg as name, possibly embedded in parentheses if arg is not a valid identifier; otherwise, returns arg as-is (which could incidentally be a sympy Symbol)
- cross_entropy(lea1)
- returns the cross-entropy between self and given lea1; the logarithm base is 2; requires: all values of lea1’s support have a non-null probability in self; the cross-entropy between two probability distributions p and q, over the same underlying set of events, measures the average number of bits needed to identify an event drawn from the set when the coding scheme used for the set is optimized for an estimated probability distribution q, rather than the true distribution p (see https://en.wikipedia.org/wiki/Cross-entropy ); see also Lea.entropy; notes:
  - the cross-entropy is non-commutative,
  - the cross-entropy should always be greater than or equal to the entropy of the first argument, the equality being reached if both arguments have the same pmf; this is guaranteed by the implementation, even in case of rounding errors,
  - if self is interpreted as frequencies of observed data having N as total number of samples, then the cross-entropy is linked to the (negative) log-likelihood by log-likelihood = - N * cross-entropy, using logarithm in base 2 (for other bases, use the right factor)
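The definition H(p, q) = -Σ p(x)·log2 q(x), its non-commutativity, the bound H(p, q) ≥ H(p) and the log-likelihood relation can be checked numerically with the sketch below. It assumes float {value: probability} dicts; entropy and cross_entropy here are hypothetical stand-alone functions, not lea's methods.

```python
import math


def entropy(p):
    """Entropy in bits of a {value: probability} dict."""
    return -sum(pv * math.log2(pv) for pv in p.values() if pv > 0)


def cross_entropy(p, q):
    """H(p, q) = -sum over x of p(x) * log2(q(x)), in bits; in this sketch,
    every value with p(x) > 0 must have q(x) > 0, otherwise a KeyError
    (missing value) or ValueError (log2 of 0) is raised."""
    return -sum(pv * math.log2(q[v]) for v, pv in p.items() if pv > 0)


p = {"a": 0.5, "b": 0.5}
q = {"a": 0.25, "b": 0.75}
print(entropy(p), cross_entropy(p, q))  # H(p, q) > H(p): q is a worse code for p
print(cross_entropy(q, p))              # differs from H(p, q): not commutative

# Link to log-likelihood: counts of observed data, N samples, model q.
counts = {"a": 5, "b": 5}
n = sum(counts.values())
freqs = {v: c / n for v, c in counts.items()}
log_lik = sum(c * math.log2(q[v]) for v, c in counts.items())
print(math.isclose(log_lik, -n * cross_entropy(freqs, q)))  # True
```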
- cumul()
- returns a list with the probabilities p that self <= value; there is one element more than the number of values; the first element is 0, then the sequence follows the order defined on values: if an order relationship is defined on values, then the elements follow their increasing order; otherwise, an arbitrary order is used, fixed from call to call; implementation note: the returned list is cached
- draw_sorted_with_replacement(n)
- returns a new Alea instance representing the probability distribution of drawing n elements from self WITH replacement, whatever the order of drawing these elements; the returned values are tuples with n elements sorted by increasing order; assumes that n >= 0; note: the efficient combinatorial algorithm is due to Paul Moore
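The semantics of an unordered draw with replacement can be checked by brute force: enumerate every ordered draw, sort each tuple and add up the probabilities. The sketch below is exponential in n and is NOT the efficient combinatorial algorithm credited above; draw_sorted_with_replacement here is a hypothetical stand-alone function working on a {value: probability} dict.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product


def draw_sorted_with_replacement(pmf, n):
    """Brute-force sketch: enumerate all ordered n-draws with replacement,
    sort each tuple and accumulate probabilities (exponential in n)."""
    result = defaultdict(Fraction)
    for draw in product(pmf, repeat=n):
        p = Fraction(1)
        for v in draw:
            p *= pmf[v]  # independent draws: probabilities multiply
        result[tuple(sorted(draw))] += p
    return dict(result)


coin = {"H": Fraction(1, 2), "T": Fraction(1, 2)}
print(draw_sorted_with_replacement(coin, 2))
# {('H', 'H'): Fraction(1, 4), ('H', 'T'): Fraction(1, 2), ('T', 'T'): Fraction(1, 4)}
```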
- draw_sorted_without_replacement(n)
- returns a new Alea instance representing the probability distribution of drawing n elements from self WITHOUT replacement, whatever the order of drawing these elements; the returned values are tuples with n elements sorted by increasing order; assumes that 0 <= n <= number of values of self; note: if the probability distribution of self is uniform, then the result is produced in an efficient way, thanks to the combinatorial algorithm of Paul Moore
- draw_with_replacement(n)
- returns a new Alea instance representing the probability distribution of drawing n elements from self WITH replacement, taking the order of drawing into account; the returned values are tuples with n elements put in the order of their drawing; assumes that n >= 0
- draw_without_replacement(n)
- returns a new Alea instance representing the probability distribution of drawing n elements from self WITHOUT replacement, taking the order of drawing into account; the returned values are tuples with n elements put in the order of their drawing; assumes that n >= 0; requires: n <= number of values of self, otherwise, an exception is raised
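An ordered draw without replacement can be modelled by picking each element among the remaining values, with probabilities renormalized over what is left. The sketch below enumerates all permutations to compute this; draw_without_replacement is a hypothetical stand-alone function that assumes all probabilities are positive, and it is not claimed to be lea's algorithm.

```python
from fractions import Fraction
from itertools import permutations


def draw_without_replacement(pmf, n):
    """Sketch of ordered drawing without replacement: at each step, a value is
    drawn among the remaining ones, with probabilities renormalized over them
    (assumes all probabilities are positive)."""
    if n > len(pmf):
        raise ValueError("n exceeds the number of values")
    result = {}
    for draw in permutations(pmf, n):
        p = Fraction(1)
        remaining = Fraction(1)  # total probability of values not yet drawn
        for v in draw:
            p *= pmf[v] / remaining
            remaining -= pmf[v]
        result[draw] = p
    return result


urn = {"r": Fraction(1, 2), "g": Fraction(1, 4), "b": Fraction(1, 4)}
d = draw_without_replacement(urn, 2)
print(d[("r", "g")])  # 1/4, i.e. 1/2 * (1/4) / (1/2)
print(d[("g", "r")])  # 1/6, i.e. 1/4 * (1/2) / (3/4)
```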
- entropy()
- returns the entropy of self in bits; the entropy of a random variable quantifies the average level of uncertainty or information associated with the variable’s potential states or possible outcomes; it measures the expected amount of information needed to describe the state of the variable, considering the distribution of probabilities across all potential states (see https://en.wikipedia.org/wiki/Entropy_(information_theory) ); note: this is equivalent to the mean of information, taken on each value of self (see Lea.information_of); the returned type is a float or a sympy expression: if all probabilities are (convertible to) float, then the entropy is returned as a float; if any probability is a sympy expression, then the entropy is returned as a sympy expression; raises an exception if some probabilities are neither convertible to float nor a sympy expression
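The note that entropy is the mean of the information of each value can be illustrated directly: information_of(v) is -log2 P(v), and the entropy weights it by P(v). A minimal float-only sketch (no sympy support), assuming a {value: probability} dict; the function names mirror the methods but are hypothetical stand-alone helpers.

```python
import math
from fractions import Fraction


def information_of(pmf, val):
    """-log2(P(X == val)) in bits; impossible values raise an exception."""
    p = pmf.get(val, 0)
    if p == 0:
        raise ValueError(f"impossible value: {val!r}")
    return -math.log2(float(p))


def entropy(pmf):
    """Entropy as the probability-weighted mean of information_of."""
    return sum(float(p) * information_of(pmf, v) for v, p in pmf.items() if p > 0)


die = {v: Fraction(1, 6) for v in range(1, 7)}
print(entropy(die))  # about 2.585, i.e. log2(6)
```

With a Boolean pmf, information_of(pmf, True) corresponds to what the information() method describes.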
- static fast_extremum(cumul_func, *alea_args)
- cumul_func is the cumul function that determines whether max or min is used: respectively, Alea.p_cumul or Alea.p_inv_cumul; note: the method uses an efficient algorithm (linear complexity) due to Nicky van Foreest
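The reason a cumulative function suffices for extrema is that, for independent variables, P(max <= v) is the product of the individual P(Xi <= v); differencing this CDF gives the pmf of the maximum (the minimum works symmetrically with P(X >= v)). The sketch below is a straightforward version of that idea, not the optimized algorithm credited above; p_cumul and max_of_independent are hypothetical stand-alone helpers.

```python
from fractions import Fraction
from functools import reduce
from operator import mul


def p_cumul(pmf, val):
    """P(X <= val); val does not need to be in the support."""
    return sum((p for v, p in pmf.items() if v <= val), Fraction(0))


def max_of_independent(*pmfs):
    """Sketch: P(max <= v) = product of P(Xi <= v) for independent Xi; the pmf
    of the maximum is obtained by differencing this CDF over the sorted union
    of all supports."""
    values = sorted(set().union(*pmfs))
    result = {}
    previous = Fraction(0)
    for v in values:
        cdf = reduce(mul, (p_cumul(pmf, v) for pmf in pmfs), Fraction(1))
        if cdf != previous:
            result[v] = cdf - previous
        previous = cdf
    return result


die = {v: Fraction(1, 6) for v in range(1, 7)}
m = max_of_independent(die, die)
print(m[6])  # 11/36
```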
- histo(size=100)
- returns a string representation of probability distribution self; it contains one line per distinct value, separated by a newline character; each line contains the string representation of a value with its probability expressed as a histogram bar made up of repeated ‘-’, such that a bar length of given size represents a probability 1; if an order relationship is defined on values, then the values are sorted by increasing order, otherwise, an arbitrary order is used
- information()
- returns the information of self being true, expressed in bits, assuming that self is a Boolean probability distribution; raises an exception if self is certainly false, i.e. P(self) == 0; the returned type is a float or a sympy expression: if all probabilities are (convertible to) float, then the information is returned as a float; if any probability is a sympy expression, then the information is returned as a sympy expression; raises an exception if some probabilities are neither convertible to float nor a sympy expression; see also Lea.entropy
- information_of(val)
- returns a float number representing the information of given val, expressed in bits, i.e. -log2(P(self == val)), assuming that the probability of val is (convertible to) float; if the probability of val is a sympy expression, then the returned object is the information of val as a sympy expression; raises an exception if given val is impossible, i.e. absent from self’s support or having null probability; raises an exception if the probability of given val is neither convertible to float nor a sympy expression
- internal(with_names=False, full=False, _indent='', _refs=None, _client_name_by_obj_dict=None)
- returns a string representing the inner definition of self; if the same lea child appears multiple times, it is expanded only on the first occurrence, the other ones being marked with a reference id; this method is useful to display the “graphical model” or directed acyclic graph (DAG) that models the interdependencies between Lea instances, which are otherwise hidden; the present method overloads Lea.internal, implementing the end of recursion since, by definition, Alea instances model independent random events, namely the DAG’s leaves; if full is False (default), then only the first element of Alea instances is displayed, otherwise, all elements are displayed; the other arguments are used only for recursive calls made in the Lea.internal method and can be ignored for normal usage; if there is some active evidence context, the returned string shows, instead of self’s internals, the internals of a new Ilea instance embedding self and conditioned by the evidence context
- inv_cumul()
- returns a tuple with the probabilities p that self >= value; there is one element more than the number of values; the sequence follows the order defined on values, the last element being 0: if an order relationship is defined on values, then the elements follow their increasing order; otherwise, an arbitrary order is used, fixed from call to call; note: the returned sequence is cached
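The layout of the cumul and inv_cumul sequences can be sketched as follows, assuming a {value: probability} dict with orderable values. The sketch places the extra 0 of inv_cumul at the end, which is what “P(X >= v) in increasing value order” implies; cumul and inv_cumul here are hypothetical stand-alone functions, not lea's cached methods.

```python
from fractions import Fraction


def cumul(pmf):
    """[0, P(X <= v1), P(X <= v2), ...] for values v1 < v2 < ...: one element
    more than the number of values, starting with 0."""
    result = [Fraction(0)]
    for v in sorted(pmf):
        result.append(result[-1] + pmf[v])
    return result


def inv_cumul(pmf):
    """[P(X >= v1), P(X >= v2), ..., 0]: one element more than the number of
    values, the extra element being the trailing 0."""
    result = [Fraction(0)]
    for v in sorted(pmf, reverse=True):
        result.append(result[-1] + pmf[v])
    return result[::-1]


pmf = {1: Fraction(1, 2), 2: Fraction(1, 4), 3: Fraction(1, 4)}
print(cumul(pmf))      # [Fraction(0, 1), Fraction(1, 2), Fraction(3, 4), Fraction(1, 1)]
print(inv_cumul(pmf))  # [Fraction(1, 1), Fraction(1, 2), Fraction(1, 4), Fraction(0, 1)]
```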
- is_bindable(v)
- see Lea.is_bindable
- is_uniform()
- returns True if the probability distribution is uniform, False otherwise
- mean()
- returns the mean value of the probability distribution, which is the probability-weighted sum of the values; requires:
  1) the values can be subtracted from each other,
  2) the differences of values can be multiplied by integers,
  3) the differences of values multiplied by integers can be added to the values,
  4) the sum of values calculated in 3) can be divided by a float or an integer;
  if any of these conditions is not met, then the result depends on the value class implementation (likely, exception raising)
- mean_f()
- same as Alea.mean method but with conversion to float or simplification of symbolic expression
- mode()
- returns a tuple with the value(s) of the probability distribution having the highest probability
- new(n=None, prob_type=-1, sorting=False, normalization=False)
- returns a new Alea instance, which represents the same probability distribution as self but for another event, independent from the event represented by self:
  - if n is not None, then a tuple containing n new independent Alea instances is returned;
  - if prob_type is -1, then the returned Alea instance is a shallow copy of self (values and probabilities data are shared), otherwise, the returned Alea instance has shared values data but has new probabilities converted according to prob_type (see lea.set_prob_type);
  - sorting allows sorting the values of the returned Alea instance (see Alea.pmf method);
  - normalization (default: False): if True, then each probability is divided by the sum of all probabilities;
  note: the present method overloads Lea.new to be more efficient
- norm_entropy()
- returns the normalized entropy of self (aka “efficiency”); it is calculated as the Lea.entropy divided by the logarithm (base 2) of the number of distinct values in self’s support; note: it is the complement of the “relative redundancy” Alea.rel_redundancy; the returned type is a float or a sympy expression: if all probabilities are (convertible to) float, then the normalized entropy is returned as a float between 0.0 and 1.0; if any probability is a sympy expression, then the normalized entropy is returned as a sympy expression; raises an exception if some probabilities are neither convertible to float nor a sympy expression; see also Lea.entropy
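Normalized entropy divides the entropy by its maximum, log2 of the number of values, so a uniform pmf scores 1.0 and the relative redundancy is the complement. A float-only sketch, assuming the support is the set of values with positive probability and has at least two elements (a single-value support makes log2(1) = 0 and this sketch raises ZeroDivisionError); the functions are hypothetical stand-alone helpers.

```python
import math


def norm_entropy(pmf):
    """Entropy divided by log2 of the support size (the 'efficiency')."""
    support = [p for p in pmf.values() if p > 0]
    h = -sum(p * math.log2(p) for p in support)
    return h / math.log2(len(support))


def rel_redundancy(pmf):
    """Complement of the normalized entropy."""
    return 1 - norm_entropy(pmf)


uniform = {"a": 0.25, "b": 0.25, "c": 0.25, "d": 0.25}
skewed = {"a": 0.5, "b": 0.5 / 3, "c": 0.5 / 3, "d": 0.5 / 3}
print(norm_entropy(uniform))  # 1.0
print(norm_entropy(skewed), rel_redundancy(skewed))
```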
- observe(v)
- see Lea.observe
- p_cumul(val)
- returns, as an integer, the probability that self <= val; note that it is not required that val is in the support of self
- p_inv_cumul(val)
- returns, as an integer, the probability that self >= val; note that it is not required that val is in the support of self
- p_sum()
- returns the sum of all probabilities of self; the result is expressed in the probability type used in self, possibly downcast for convenience (Fraction -> ExtFraction, Decimal -> ExtDecimal); note: the result is supposed to be 1 (expressed in some type) BUT it could be different:
  - due to float rounding errors,
  - due to an explicit normalization=False argument
- plot(title=None, fname=None, savefig_args={}, **bar_args)
- produces a matplotlib bar chart representing the probability distribution self with the given title (if not None); the bar chart may be customized by using named arguments bar_args, which are relayed to the matplotlib.pyplot.bar function (see doc in http://matplotlib.org/api/pyplot_api.html ):
  - if fname is None, then the chart is displayed on screen, in a matplotlib window; the previous chart, if any, is erased;
  - otherwise, the chart is saved in a file specified by given fname as specified by matplotlib.pyplot.savefig; the file format may be customized by using the savefig_args argument, which is a dictionary relayed to the matplotlib.pyplot.savefig function and containing named arguments expected by this function;
  example: flip.plot(fname='flip.png', savefig_args=dict(bbox_inches='tight'), color='green');
  requires: the matplotlib package is installed; otherwise, an exception is raised
- pmf_dict()
- returns, after evaluation of the probability distribution self, the probability mass function of self, as an OrderedDict with {v: P(v)} pairs; the sequence follows the order defined on values
- pmf_tuple()
- returns, after evaluation of the probability distribution self, the probability mass function of self, as a tuple with tuples (v,P(v)); the sequence follows the order defined on values
- ps()
- returns a tuple with the probabilities of self; the sequence follows the increasing order defined on values; if order is undefined (e.g. complex numbers), then the order is arbitrary but fixed from call to call
- random_draw(n=None, sorted=False)
- if n is None, then returns a tuple with all the values of the distribution, in a random order respecting the probabilities (the higher the probability of a value, the more likely the value will be at the beginning of the sequence); if n > 0, then only n different values will be drawn; if sorted is True, then the returned tuple is sorted; note: this uses the pseudo-random number generator of Python’s random module; for deterministic results from run to run, the random.seed function should be called before calling the present method
- random_val()
- returns a random value among the values of self, according to their probabilities; note: this uses the pseudo-random number generator of Python’s random module; for deterministic results from run to run, the random.seed function should be called before calling the present method
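Both random methods rely on Python's random module, so seeding it makes results repeatable. The sketch below uses random.choices for one weighted draw and, for random_draw, draws distinct values one after the other among the remaining ones; whether lea's random_draw uses exactly this sequential scheme is an assumption, and both functions are hypothetical stand-ins that assume positive weights.

```python
import random


def random_val(pmf, rng=random):
    """One value drawn according to the probabilities (used as weights)."""
    values = list(pmf)
    return rng.choices(values, weights=[pmf[v] for v in values])[0]


def random_draw(pmf, n=None, rng=random):
    """Draws distinct values one by one, each draw made among the remaining
    values with probability proportional to their weights; n=None draws all."""
    remaining = dict(pmf)
    count = len(remaining) if n is None else n
    result = []
    for _ in range(count):
        v = random_val(remaining, rng)
        result.append(v)
        del remaining[v]
    return tuple(result)


random.seed(42)  # fixes the pseudo-random sequence from run to run
pmf = {"a": 0.7, "b": 0.2, "c": 0.1}
print(random_val(pmf))
print(random_draw(pmf))
```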
- rel_redundancy()
- returns the relative redundancy of self; note: it is the complement of the normalized entropy (aka “efficiency”) Alea.norm_entropy; if all probabilities are (convertible to) float, then the rel_redundancy is returned as a float between 0.0 and 1.0; if any probability is a sympy expression, then the rel_redundancy is returned as a sympy expression; raises an exception if some probabilities are neither convertible to float nor a sympy expression
- std()
- returns the standard deviation of the probability distribution; requires: the requirements of the Alea.var method are met
- std_f()
- same as Alea.std method but with conversion to float or simplification of symbolic expression
- support()
- returns a tuple with the values of self; the sequence follows the increasing order defined on values; if order is undefined (e.g. complex numbers), then the order is arbitrary but fixed from call to call
- var()
- returns the variance of the probability distribution; requires:
  1) the requirements of the Alea.mean method are met,
  2) the mean value can be subtracted from the values,
  3) the differences between values and the mean value can be squared;
  if any of these conditions is not met, then the result depends on the value implementation (likely, exception raising)
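For numeric values, mean, variance, standard deviation and mode reduce to the textbook formulas sketched below. This sketch computes the mean directly as Σ p·v, which is simpler and stricter than the differences-based requirements listed for Alea.mean (it needs values that can be multiplied by probabilities); the functions are hypothetical stand-alone helpers.

```python
import math
from fractions import Fraction


def mean(pmf):
    """Probability-weighted sum of the values."""
    return sum(p * v for v, p in pmf.items())


def var(pmf):
    """Expected squared difference between the values and the mean."""
    m = mean(pmf)
    return sum(p * (v - m) ** 2 for v, p in pmf.items())


def std(pmf):
    return math.sqrt(float(var(pmf)))


def mode(pmf):
    """Tuple of the value(s) having the highest probability."""
    top = max(pmf.values())
    return tuple(v for v in sorted(pmf) if pmf[v] == top)


die = {v: Fraction(1, 6) for v in range(1, 7)}
print(mean(die))  # 7/2
print(var(die))   # 35/12
print(mode(die))  # (1, 2, 3, 4, 5, 6)
```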
- class Blea(*ileas)
Bases: Lea
Each Ilea instance represents a given distribution <Vi, p(Vi|C)>, assuming a given condition C is verified, in the sense of a conditional probability. The set of conditions shall form a partition of the “certain true”, i.e.:
  - ORing all conditions shall give a “certain true” distribution,
  - ANDing all conditions pairwise shall give “certain false” distributions.
A Blea instance is a “mixture pex”, as defined in the paper on the Statues algorithm (see http://arxiv.org/abs/1806.09997 ).
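Because the conditions form a partition, a mixture follows the law of total probability: P(v) = Σ P(Ci)·P(v | Ci). The sketch simplifies the conditions to known probabilities P(Ci) paired with conditional pmfs; in lea, the conditions are Boolean Lea instances that may depend on other variables, which this sketch does not model. mixture and the rain/wet numbers are hypothetical.

```python
from fractions import Fraction


def mixture(branches):
    """branches: list of (P(C_i), {v: P(v | C_i)}) pairs, where the conditions
    C_i must form a partition (their probabilities sum to 1).
    Returns {v: sum over i of P(C_i) * P(v | C_i)}."""
    if sum(pc for pc, _ in branches) != 1:
        raise ValueError("conditions must form a partition")
    result = {}
    for pc, cond_pmf in branches:
        for v, p in cond_pmf.items():
            result[v] = result.get(v, 0) + pc * p
    return result


# Hypothetical example: rain with probability 1/4; the ground is wet with
# probability 9/10 if it rains and 1/10 otherwise.
wet = mixture([
    (Fraction(1, 4), {True: Fraction(9, 10), False: Fraction(1, 10)}),
    (Fraction(3, 4), {True: Fraction(1, 10), False: Fraction(9, 10)}),
])
print(wet[True])  # 3/10
```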
- class Clea(*args)
Bases: Lea
If the n events are independent, then P(<v1, …, vn>) = P1(v1) x … x Pn(vn). A Clea instance is a “table pex”, as defined in the paper on the Statues algorithm (see http://arxiv.org/abs/1806.09997 ).
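For independent events, the joint distribution is obtained by multiplying the marginal probabilities of each combination of values, as stated by the formula above. joint_independent is a hypothetical stand-alone helper working on {value: probability} dicts; it does not handle dependent arguments, which lea resolves through its DAG.

```python
from fractions import Fraction
from itertools import product


def joint_independent(*pmfs):
    """Sketch of P(<v1, ..., vn>) = P1(v1) x ... x Pn(vn) for independent events."""
    result = {}
    for combo in product(*(pmf.items() for pmf in pmfs)):
        values = tuple(v for v, _ in combo)
        p = Fraction(1)
        for _, pv in combo:
            p *= pv
        result[values] = p
    return result


coin = {"H": Fraction(1, 2), "T": Fraction(1, 2)}
bit = {0: Fraction(1, 4), 1: Fraction(3, 4)}
j = joint_independent(coin, bit)
print(j[("T", 1)])  # 3/8
```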
- class Dlea(clea1)
Bases: Lea
The arguments are coerced to Lea instances. A Dlea instance is a special case of “functional pex”, as defined in the paper on the Statues algorithm (see http://arxiv.org/abs/1806.09997 ).
- class EvidenceCtx(*conditions, bindings=None)
Bases: object
EvidenceCtx is a class whose instances represent global conditions that can be activated or deactivated. At any given time, all active EvidenceCtx instances define implicit conditions that are enforced on all calculated probabilities; these then automatically become conditional probabilities (see Ilea class).
An EvidenceCtx is defined by a sequence of conditions (Boolean Lea instances) and a bindings dictionary for binding Alea instances to given values (“observations”). Both data are optional. Semantically, these are combined together with a conjunction (AND); the bindings {x: v} are equivalent to conditions (x == v) but these can be treated faster, especially if x’s domain is large.
An EvidenceCtx can be used as a Python context manager, using the ‘with’ keyword, making the activation/deactivation automatic.
- activate()
- activate the evidence context
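The automatic activation/deactivation behaviour follows Python's context manager protocol: __enter__ activates and __exit__ deactivates, even if an exception occurs inside the block. EvidenceSketch below is a hypothetical stand-in, not lea's EvidenceCtx; its conditions are placeholder strings and it only records which contexts are active.

```python
class EvidenceSketch:
    """Hypothetical stand-in for an evidence context: instances register
    themselves in a shared list while active."""

    active = []  # class-level list shared by all instances, on purpose

    def __init__(self, *conditions, bindings=None):
        self.conditions = conditions
        self.bindings = dict(bindings or {})

    def activate(self):
        EvidenceSketch.active.append(self)

    def deactivate(self):
        EvidenceSketch.active.remove(self)

    def __enter__(self):
        self.activate()
        return self

    def __exit__(self, exc_type, exc, tb):
        self.deactivate()
        return False  # do not swallow exceptions raised in the block


with EvidenceSketch("x > 2", bindings={"y": 1}) as ctx:
    print(len(EvidenceSketch.active))  # 1
print(len(EvidenceSketch.active))      # 0
```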
- deactivate()
- deactivate the evidence context
- class Flea(f, clea_args)
Bases: Lea
A Flea instance is a “functional pex”, as defined in the paper on the Statues algorithm (see http://arxiv.org/abs/1806.09997 ).
- class Flea1(f, lea_arg)
Bases: Lea
The function is applied to all values of the argument. This results in a new probability distribution for all the values returned by the function. A Flea1 instance is a “functional pex”, as defined in the paper on the Statues algorithm (see http://arxiv.org/abs/1806.09997 ).
- class Flea2(f, arg1, arg2)
Bases: Lea
The function is applied to all elements of the joint of the arguments. This results in a new probability distribution for all the values returned by the function. A Flea2 instance is a “functional pex”, as defined in the paper on the Statues algorithm (see http://arxiv.org/abs/1806.09997 ).
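Applying a function to a distribution means mapping every value and merging the probabilities of equal results. The sketch below shows a one-argument and a two-argument variant. Unlike lea, apply2 assumes its two arguments are independent: passing the same pmf twice models two separate dice, whereas in lea the same variable appearing twice is bound to the same value by the Statues algorithm. apply1 and apply2 are hypothetical helpers.

```python
from collections import defaultdict
from fractions import Fraction


def apply1(f, pmf):
    """Flea1-like sketch: apply f to every value; equal results merge."""
    result = defaultdict(Fraction)
    for v, p in pmf.items():
        result[f(v)] += p
    return dict(result)


def apply2(f, pmf1, pmf2):
    """Flea2-like sketch, assuming independent arguments, so that their joint
    is the product of the marginals."""
    result = defaultdict(Fraction)
    for v1, p1 in pmf1.items():
        for v2, p2 in pmf2.items():
            result[f(v1, v2)] += p1 * p2
    return dict(result)


die = {v: Fraction(1, 6) for v in range(1, 7)}
print(apply1(lambda v: v % 2 == 0, die))        # {False: Fraction(1, 2), True: Fraction(1, 2)}
print(apply2(lambda a, b: a + b, die, die)[7])  # 1/6
```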
- class Flea2a(f, arg1, arg2, absorber)
Bases: Flea2
The function is applied to all elements of the joint of the arguments. This results in a new probability distribution for all the values returned by the function. A Flea2a instance is a “functional pex”, as defined in the paper on the Statues algorithm (see http://arxiv.org/abs/1806.09997 ).
- class Glea(clea_func_and_args)
Bases: Lea
Glea is a Lea subclass whose instances are defined by a Lea instance having functions as values, applied to a given sequence of arguments. The arguments are coerced to Lea instances. All functions are applied to all elements of the cartesian product of all arguments (see Clea class). This results in a new probability distribution for all the values returned by calls to all the functions. A Glea instance is a “multi-functional pex”, as defined in the paper on the Statues algorithm (see http://arxiv.org/abs/1806.09997 ).
- class Ilea(lea1, cond_leas)
Bases: Lea
Besides explicit creation, Ilea instances can be created automatically to enforce global conditions expressed in “evidence contexts” (see EvidenceCtx class). An Ilea instance is a “conditional pex”, as defined in the paper on the Statues algorithm (see http://arxiv.org/abs/1806.09997 ).
- gen_one_random_mc(nb_subsamples=1)
- generates one random value from the current probability distribution, WITHOUT precalculating the exact probability distribution (unlike the Lea.random method); this obeys the “binding” mechanism, so if the same variable is referred to multiple times in a given expression, then the same value will be yielded at each occurrence; before yielding the random value v, this value v is bound to the current instance; if the current calculation requires getting a random value on the current instance again, then the bound value is yielded; the instance is rebound to a new value at each iteration, as soon as the execution is resumed after the yield; the instance is unbound at the end; the actual random value is yielded by the _gen_one_random_mc method, which is implemented in each Lea subclass; note: nb_subsamples is not used in the Lea.gen_one_random_mc method; it is used in the overloaded Ilea.gen_one_random_mc method
- lr()
- returns a float giving the likelihood ratio (LR) of an ‘evidence’ E, which is self’s unconditional probability distribution, for a given ‘hypothesis’ H, which is self’s condition; it is calculated as P(E | H) / P(E | not H); both E and H must be Boolean probability distributions, otherwise, an exception is raised; an exception is raised also if H is certainly true or certainly false
- class Lea
Bases: object
Lea is an abstract class representing discrete probability distributions.
Each instance of concrete Lea subclasses (called simply a “Lea instance” in the following) represents a discrete probability distribution, which associates each value of a set of values with the probability that such value occurs.
Lea instances can be combined in arithmetic expressions resulting in new Lea instances, by obeying the following rules:
  - Lea instances can be added, subtracted, multiplied and divided together, through +, -, *, /, // operators; the resulting distribution’s values and probabilities are determined by combination of the operands’ values into sums of probability products (an operation known as ‘convolution’, for the addition case),
  - other supported binary arithmetic operators are power (**), modulo (%) and the divmod function,
  - unary operators +, - and the abs function are supported also,
  - Python’s operator precedence rules, with parenthesis overrides, are fully respected,
  - any object X, which is not a Lea instance, involved as argument of an expression containing a Lea instance, is coerced to a Lea instance having X as sole value, with probability 1 (i.e. occurrence of X is certain),
  - Lea instances can be compared together, through ==, !=, <, <=, >, >= operators; the resulting distribution is a Boolean probability distribution, giving the probability of a True result and the complementary probability of a False result,
  - Boolean probability distributions can be negated through the ~ operator, and they can be combined together with AND, OR, XOR through &, |, ^ operators, respectively.
WARNING: Python’s ‘and’, ‘or’ and ‘not’ operators shall NOT be used on Lea instances; doing so will raise an exception because such operators are supposed to work on Python’s bool values, not on Boolean probability distributions; for any Lea instances lea1, lea2, you should then replace “lea1 and lea2” by “lea1 & lea2”, “lea1 or lea2” by “lea1 | lea2”, and “not lea1” by “~lea1”.
WARNING: in Boolean expressions involving arithmetic comparisons, parentheses shall be used, e.g. (lea1 < lea2) & (lea2 < lea3).
WARNING: chained comparison expressions (“a < lea2 < b”) shall NOT be used; they raise an exception (reason: they have the same limitation as Python’s ‘and’ operator).
Lea instances can be used to generate random values, respecting the given probabilities. There are two Lea methods for this purpose:
  - Lea.random: calculates the exact probability distribution, then takes random values,
  - Lea.random_mc: takes random values from atomic probability distributions, then makes the required calculations (Monte-Carlo algorithm); this method is suited for complex distributions, when calculation of the exact probability distribution is intractable; this could be used to provide an estimation of the probability distribution (see Lea.estimate_mc method).
There are 13 concrete subclasses of the Lea class. Each Lea subclass represents a “definition” of discrete probability distribution, with its own data or with references to other Lea instances to be combined together through a given operation.
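The WARNING about ‘and’, ‘or’ and ‘not’ comes from how Python evaluates them: these keywords call bool() on an operand, whereas &, | and ~ dispatch to overloadable methods. The sketch below illustrates this with a hypothetical BoolDist class holding only P(True) and assuming independent operands; it raises TypeError from __bool__, while the exception type lea itself raises is not specified here.

```python
from fractions import Fraction


class BoolDist:
    """Hypothetical minimal Boolean distribution storing P(True); operands of
    & and | are assumed independent."""

    def __init__(self, p_true):
        self.p_true = Fraction(p_true)

    def __and__(self, other):
        return BoolDist(self.p_true * other.p_true)

    def __or__(self, other):
        return BoolDist(self.p_true + other.p_true - self.p_true * other.p_true)

    def __invert__(self):
        return BoolDist(1 - self.p_true)

    def __bool__(self):
        # 'and', 'or', 'not' and chained comparisons all end up here.
        raise TypeError("use &, | and ~ on distributions, not and/or/not")


a = BoolDist(Fraction(1, 2))
b = BoolDist(Fraction(1, 4))
print((a & b).p_true)   # 1/8
print((a | ~b).p_true)  # 7/8
try:
    a and b  # 'and' calls bool(a), which raises
except TypeError as err:
    print("TypeError:", err)
```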
Each Lea subclass defines what the <value, probability> pairs are or how they can be generated (see the _gen_vp method implemented in each Lea subclass). The Lea class acts as a facade, by providing different constructors (static methods) to instantiate these subclasses, so it is usually not needed to instantiate Lea subclasses explicitly. Here is an overview of these subclasses, with their relationships. We indicate the equivalent type of “p-expression” (“pex” for short) as defined in the paper on the Statues algorithm (see reference below).
  - Alea (elementary pex) defines a probability mass function (“pmf”) by extension, i.e. explicit <value, P(value)> pairs,
  - Olea (elementary pex) defines a binomial probability distribution,
  - Plea (elementary pex) defines a Poisson probability distribution,
  - Glea (multi-functional pex) applies n-ary functions present in a given Lea instance to a given sequence of n Lea instances,
  - Ilea (conditional pex) filters the values of a given Lea instance according to a given Lea instance representing a Boolean condition (conditional probabilities),
Instances of Lea subclasses other than Alea represent probability distributions obtained by operations done on existing Lea instance(s). Any such instance forms a directed acyclic graph (DAG) structure, having other Lea instances as nodes and Alea instances as leaves. This uses “lazy evaluation”: actual <value, probability> pairs are calculated only at the time they are required (e.g. display, query of the probability of a given value, etc.); then, these are aggregated in a new Alea instance. This Alea instance is then cached, as an attribute of the queried Lea instance, for speeding up subsequent queries.
Tlea, Slea and Blea may be used to define Bayesian networks. The Tlea class is the closest to the CPT concept since it stores the table in a dictionary.
Slea allows defining a CPT by means of a function, which could be more compact to store than an explicit table; it may be useful in particular for noisy-or and noisy-max models.
Short design notes:
  - Lea uses the “template method” design pattern: the Lea base abstract class calls the following methods, which are implemented in each Lea subclass: _get_lea_children, _clone_by_type, _gen_vp, _gen_one_random_mc and _em_step,
  - except for the aforementioned Lea.estimate_mc method, Lea performs EXACT calculation of probability distributions,
  - it implements an original algorithm, called the “Statues” algorithm, by reference to the game of the same name; it uses a variable binding mechanism that relies on Python’s generators. To learn more, you may read the paper “Probabilistic inference using generators - the Statues algorithm”, freely available on http://arxiv.org/abs/1806.09997 ; the heart of the algorithm is implemented in the Lea._gen_bound_vp method (aka GENATOMS in the paper) and <X>lea._gen_vp methods implemented in Lea’s <X>lea subclasses (aka GENATOMSBYTYPE in the paper); the final collection and condensation is done by the Lea.calc method (aka MARG in the paper), which uses the llea.pmf method
- P()
- evaluates the probability distribution self, then returns the probability that self is True; the probability is expressed in the probability type used in self, possibly downcast for convenience (Fraction -> ExtFraction, Decimal -> ExtDecimal); raises an exception if some value in the distribution is not Boolean (note that this is NOT the case with self.p(True))
- Pf()
- evaluates the probability distribution self, then returns the probability that self is True; the probability is expressed as a float between 0.0 and 1.0; raises an exception if the probability type is not convertible to float; raises an exception if some value in the distribution is not Boolean (this is NOT the case with self.p(True))
- as_depending_on(*inputs, _check=True)
- requires: instances given in inputs cannot refer to the self instance, whether directly or indirectly; notes:
  - any provided inputs instance X is assumed to be used in the definition of self; otherwise, X has no impact on the result,
  - the CPT is calculated once and for all by the present method,
  - the returned instance has no dependency on self but only dependencies on the given inputs instances,
  - the method is especially useful when self is an Ilea instance, i.e. X.given(*constraints).as_depending_on(*inputs), for expressing a decision process based on inputs and constraints (see Lea.such_that method)
- as_float(nb_decimals=6)
- returns, after evaluation of the probability distribution self, a string representation of it; it contains one line per distinct value, separated by a newline character; each line contains the string representation of a value with its probability expressed as a decimal with given nb_decimals digits; if an order relationship is defined on values, then the values are sorted by increasing order; otherwise, an arbitrary order is used
- as_joint(*attr_names, create_vars=False)
- returns a new Flea instance by building named tuples from self, which is supposed to have n-tuples as values, using the n given attr_names; note: this is useful to access fields of a joint probability distribution by names instead of indices; if create_vars is True (default: False), then variables with the attribute names are created in the dictionary passed to the last call to lea.declare_namespace, typically by the call “lea.declare_namespace(globals())”
- as_pct(nb_decimals=1)
- returns, after evaluation of the probability distribution self, a string representation of it; it contains one line per distinct value, separated by a newline character; each line contains the string representation of a value with its probability expressed as a percentage with the given nb_decimals digits; if an order relationship is defined on values, then the values are sorted by increasing order; otherwise, an arbitrary order is used
- as_string(kind=None, nb_decimals=None, chart_size=None, tabular=None, one_line=None)
- returns, after evaluation of the probability distribution self, a string representation of it; by default, it contains one line per distinct value, separated by a newline character; each line contains the string representation of a value with its probability in a format depending on the given kind, which is either None (default) or a string among ‘/’, ‘.’, ‘%’, ‘-’, ‘/-’, ‘.-’, ‘%-’; the probabilities are displayed as:
  - if kind is None: as they are stored,
  - if kind[0] is ‘/’: rational numbers “n/d” or “0” or “1”,
  - if kind[0] is ‘.’: decimals with the given nb_decimals digits,
  - if kind[0] is ‘%’: percentage decimals with the given nb_decimals digits,
  - if kind[0] is ‘-’: a histogram bar made up of repeated ‘-’, such that a bar length of chart_size represents a probability 1;
  if kind[1] is ‘-’, then histogram bars made of ‘-’ are appended after the numerical representation of probabilities; if the probability distribution has been created with ordered=True, then the values are ordered in the order of their definition; otherwise, if an order relationship is defined on values, then the values are sorted by increasing order; otherwise, an arbitrary order is used; if tabular is True and if the values are tuples of the same length, then these are represented in a tabular format (fixed column width); in the specific case of named tuples, a header line with the field names is prepended; if one_line is True, then the values and probabilities are put on one single line, separated by commas; if some arguments are None or not specified, then they take the default values specified by the last call to the set_display_options function
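The display kinds listed above can be mimicked in plain Python to see what each one produces. This is a minimal sketch, not Lea's formatter: the helper format_pmf is hypothetical, the pmf is a plain dict, and the exact line layout differs from Lea's output.

```python
from fractions import Fraction

def format_pmf(pmf, kind="/", nb_decimals=2, chart_size=20):
    """Hypothetical helper mimicking the '/', '.', '%' and '-' kinds described above."""
    lines = []
    for value in sorted(pmf):                      # values sorted by increasing order
        p = pmf[value]
        if kind[0] == "/":
            text = str(p)                          # Fraction prints as "n/d", or "0"/"1"
        elif kind[0] == ".":
            text = f"{float(p):.{nb_decimals}f}"
        elif kind[0] == "%":
            text = f"{float(p) * 100:.{nb_decimals}f} %"
        elif kind[0] == "-":
            text = "-" * round(p * chart_size)     # chart_size dashes stand for probability 1
        else:
            raise ValueError(f"unsupported kind: {kind!r}")
        if len(kind) > 1 and kind[1] == "-":       # append a histogram bar after the number
            text += " " + "-" * round(p * chart_size)
        lines.append(f"{value!r} : {text}")
    return "\n".join(lines)

pmf = {"H": Fraction(1, 4), "T": Fraction(3, 4)}
print(format_pmf(pmf, "%-", nb_decimals=1, chart_size=8))
```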
- assign(**value_by_name_dict)
- returns a new Alea instance, equivalent to self, where the probabilities have been converted by applying subs(value_by_name_dict) on them; this is useful for assigning variables, e.g. p=…, q=…, when probabilities are expressed as sympy expressions using true variable identifiers, i.e. without whitespaces, commas, etc. (see the doc of sympy's subs method); note: for using other options of sympy's subs method, see the Lea.subs instance method; requires: all of self's probabilities have a ‘subs’ method available
- build_bn_from_joint(*bn_definition)
- returns a named tuple of Lea instances (Alea or Tlea) representing a Bayesian network with variables stored in attributes A1, … , An, assuming that self is a Lea joint probability distribution having, as values, named tuples with the same set of attributes A1, … , An (such a Lea instance is returned by the Lea.as_joint method, for instance); each argument of the given bn_definition represents a dependency relationship from a set of given variables to one given variable; this is expressed as a tuple (src_var_names, tgt_var_name) where src_var_names is a sequence of attribute names (strings) identifying the ‘from’ variables and tgt_var_name is the attribute name (string) identifying the ‘to’ variable; the method builds up the ‘to’ variable of the BN as a CPT calculated from each combination of ‘from’ variables in the joint probability distribution: for each such combination C, the distribution of the ‘to’ variable is calculated by marginalization on the joint probability distribution given the C condition; possible missing combinations are covered by an ‘else’ clause on the CPT, defined as a uniform distribution over the values of the ‘to’ variable found in the other clauses (principle of indifference); the variables that are never referred to as a ‘to’ variable are considered as independent in the BN and are calculated by unconditional marginalization on the joint probability distribution; if a variable appears as the ‘to’ variable of more than one relationship, then an exception is raised (error)
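The CPT construction described above (one clause per ‘from’ combination, obtained by marginalizing the joint, plus a uniform ‘else’ clause) can be sketched on a plain dict keyed by named tuples. The helper cpt_from_joint, the Row type and the numbers are hypothetical illustrations, not Lea's code.

```python
from collections import defaultdict, namedtuple
from fractions import Fraction

Row = namedtuple("Row", "rain wet")  # hypothetical joint-table attributes

def cpt_from_joint(joint, src_var_names, tgt_var_name):
    """Hypothetical helper: CPT of tgt given src variables, by marginalizing a joint pmf."""
    acc = defaultdict(lambda: defaultdict(Fraction))   # combination C -> {t: P(C, t)}
    for row, p in joint.items():
        combo = tuple(getattr(row, n) for n in src_var_names)
        acc[combo][getattr(row, tgt_var_name)] += p
    cpt = {}
    for combo, dist in acc.items():
        total = sum(dist.values())                     # P(C)
        cpt[combo] = {t: q / total for t, q in dist.items()}
    # 'else' clause for unseen combinations: uniform over the 'to' values found above
    tgt_values = {t for dist in cpt.values() for t in dist}
    else_dist = {t: Fraction(1, len(tgt_values)) for t in tgt_values}
    return cpt, else_dist

joint = {
    Row(rain=True, wet=True): Fraction(3, 10),
    Row(rain=True, wet=False): Fraction(1, 10),
    Row(rain=False, wet=True): Fraction(1, 10),
    Row(rain=False, wet=False): Fraction(5, 10),
}
cpt, else_dist = cpt_from_joint(joint, ("rain",), "wet")
print(cpt[(True,)])  # wet is True with probability 3/4 when it rains
```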
- calc(prob_type=-1, sorting=True, normalization=True, bindings=None, memoization=True, algo='EXACT', optimize=True, nb_samples=None, nb_subsamples=None, nb_tries=None, exact_vars=None, debug=False)
- returns a new Alea instance representing the distribution after it has been evaluated; the first three arguments allow customizing the returned Alea instance:
  * prob_type (default: -1): if -1, then the probability type is the same as self's; otherwise, the probability type is defined using prob_type (see lea.set_prob_type);
  * normalization (default: True): if True, then each probability is divided by the sum of all probabilities before being stored; this division is essential to get correct results in case of conditional probabilities; setting normalization=False is useful 1) to speed up the calculation if the caller guarantees that the probabilities sum to 1, or 2) to get non-normalized probabilities of a subset of a given probability distribution;
  the two following arguments change the problem statement:
  * bindings (default: None): if not None, it is required to be a dictionary {a1: v1, a2: v2, …} associating some Alea instances a1, a2, … to specific values v1, v2, … of their respective domains; these Alea instances are then temporarily bound for calculating the resulting pmf; this offers an optimization over the self.given(a1==v1, a2==v2, …) construct: the latter gives the same result but requires browsing the whole v1, v2, … domains, evaluating the given equalities; specifying the bindings argument requires that the keys are all unbound Alea instances and that the bound values are in the expected domains of the associated keys;
  * memoization (default: True): if False, then no binding is performed by the algorithm, hence referential consistency is no longer respected; this option returns WRONG results in all constructions referring multiple times to the same instances (e.g. conditional probability and Bayesian reasoning); it has no real use, except demonstrating ad absurdum the importance of memoization and referential consistency in the Statues algorithm; note that this option offers NO speedup when evaluating expressions not requiring referential consistency: such cases are already detected and optimized by the calculation preprocessing (see Lea._init_calc);
  the last arguments specify the evaluation algorithm and related options:
  * algo (default: 'EXACT'): four algorithms are available:
    - EXACT: calculates the exact probability distribution using the Statues algorithm; for this choice, the arguments nb_samples, nb_subsamples, nb_tries and exact_vars shall not be used;
    - MCRS (Monte-Carlo Rejection Sampling): calculates an approximate probability distribution following the MC rejection sampling algorithm on nb_samples random samples; if self is an Ilea instance, i.e. evaluating a conditional probability x.given(e), then the algorithm may be sped up by specifying the nb_subsamples argument, which shall be a divisor of nb_samples; each time that the condition e is satisfied, nb_subsamples random samples are taken on the conditioned part x instead of a single one; specifying nb_subsamples is especially valuable if the condition has a small probability;
    - MCLW (Monte-Carlo Likelihood Weighting): requires that self is an Ilea instance, i.e. a conditional probability x.given(e); this algorithm calculates an approximate probability distribution by first making an exact evaluation of the condition e using the Statues algorithm; then, for each binding that verifies the condition with some probability p, it makes nb_subsamples random samples on the conditioned part x, assigning a weight p to these samples; this algorithm is especially valuable if the condition has a small probability while its exact evaluation is tractable; it also accepts an optional exact_vars argument, to include given variables in the exact evaluation, beyond those already referred to in the condition (see MCEV);
    - MCEV (Monte-Carlo Exact Variables): calculates an approximate probability distribution by first making an exact evaluation of the variables given in exact_vars using the Statues algorithm; for each binding found with some probability p, it makes nb_subsamples random samples on the remaining (unbound) variables, assigning a weight p to these samples; the MCEV algorithm cannot handle expressions under condition, i.e. x.given(e); MCLW shall be used instead;
  * optimize (default: True): considered only if algo='EXACT'; if True, then independent sub-DAGs are searched in the DAG rooted by self; if such independent sub-DAGs are found, then their roots are evaluated using the EXACT algorithm and replaced by the resulting Alea instances; for some DAGs presenting inner tree patterns, this divide-and-conquer process may save a lot of calculations; setting optimize=False allows getting the behavior of Lea versions prior to 3.4.0 and highlighting the effect of non-optimization;
  * nb_samples (default: None): number of random samples made for the MCRS algorithm;
  * nb_subsamples (default: None): only for the MCRS and MCLW algorithms and if self is an Ilea instance, i.e. a conditional probability x.given(e); it specifies the number of random samples made on x for each binding verifying the condition e; for MCRS, nb_subsamples is optional and, if specified, it shall be a divisor of nb_samples; for MCLW, nb_subsamples is mandatory;
  * nb_tries (default: None): if not None, defines the maximum number of trials in case a random value is incompatible with a condition; this happens only if the current Lea instance is an Ilea instance x.given(e) or is referring to such an instance; for the MCLW algorithm on x.given(e), it only applies to x, should it refer to Ilea instances, since e is evaluated using the exact algorithm; if a condition cannot be satisfied after nb_tries tries, then an error exception is raised; WARNING: if nb_tries is None, any infeasible condition shall cause an infinite loop;
  * exact_vars (default: None): only for the MCEV algorithm: an iterable giving the variables referred to in self that shall be evaluated using the exact algorithm, the other ones being subject to random sampling;
  * debug (default: False): displays a debug trace on standard output;
  on choosing the right algorithm and options: EXACT is the default algorithm; it is the recommended algorithm for all tractable problems; in particular, it allows working with probability fractions and symbols; for intractable problems, the three other algorithms offer fallback options:
  - the MCRS algorithm with the sole nb_samples argument is the easiest option; choosing the value of nb_samples is a trade-off between result accuracy and execution time; if the evaluated expression contains conditions having low probabilities, then the MCRS algorithm may be inefficient: as a rejection sampling algorithm, it may spend most of its processing time finding bindings that satisfy the condition; to improve efficiency, the nb_subsamples argument can be used: this allows making multiple random samples each time the condition is met, instead of a single one; the samples are generated until the condition has been satisfied n times, with n = nb_samples/nb_subsamples; for a given nb_samples, increasing nb_subsamples shall speed up the calculation; however, the result accuracy may tend to decrease if the condition is not visited enough (e.g. choosing nb_subsamples=nb_samples will satisfy the condition with only one binding);
  - the MCLW algorithm, with its mandatory nb_subsamples argument, is the best choice if the evaluation of the conditioned part is intractable while the condition is tractable (whatever its probability); every binding verifying the condition is covered and, for each one, nb_subsamples random samples are generated, weighted by the binding's probability;
  - the MCEV algorithm is suited for intractable problems from which a set of variables v1, …, vn can be evaluated in a reasonable time or, in other words, if joint(v1, …, vn) is tractable; if this set of variables is specified in the exact_vars argument, then all their value combinations are browsed systematically by the exact algorithm while random sampling is done for the other variables
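To contrast EXACT with MCRS and nb_subsamples, here is a toy problem in plain Python: two fair dice a and b, the conditioned part x = a + b and the condition a == 6. This is an illustrative sketch of rejection sampling under those assumptions, not Lea's implementation; in Lea, the corresponding call would take the form x.given(e).calc(algo='MCRS', nb_samples=…, nb_subsamples=…).

```python
import random
from collections import Counter
from fractions import Fraction

def exact_conditional():
    """Exact pmf of x = a + b given a == 6, for two fair dice a and b."""
    joint = Counter()
    for a in range(1, 7):
        for b in range(1, 7):
            if a == 6:                            # keep only bindings verifying the condition
                joint[a + b] += Fraction(1, 36)
    total = sum(joint.values())                   # P(condition), used for normalization
    return {v: p / total for v, p in sorted(joint.items())}

def mc_rejection(nb_samples, nb_subsamples=1, rng=random):
    """Rejection sampling estimate of the same pmf; nb_subsamples must divide nb_samples."""
    if nb_samples % nb_subsamples != 0:
        raise ValueError("nb_subsamples shall be a divisor of nb_samples")
    counts = Counter()
    hits = 0
    while hits < nb_samples // nb_subsamples:     # condition must be met n times
        a = rng.randint(1, 6)
        if a != 6:                                # condition not met: reject and retry
            continue
        hits += 1
        for _ in range(nb_subsamples):            # resample only b, keeping a's binding
            counts[a + rng.randint(1, 6)] += 1
    return {v: n / nb_samples for v, n in sorted(counts.items())}

random.seed(0)  # deterministic run from run to run
print(exact_conditional())
print(mc_rejection(6000, nb_subsamples=10))
```

With nb_subsamples=10, the rare condition only has to be met 600 times instead of 6000; the estimate stays close to the exact uniform 1/6 here because b is independent of the condition.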
- cdf_dict()
- evaluates the probability distribution self, then returns the cumulative distribution function of self, as an OrderedDict with {v: P(x<=v)} pairs; the sequence follows the order defined on values
- cdf_tuple()
- evaluates the probability distribution self, then returns the cumulative distribution function of self, as a tuple of (v, P(x<=v)) tuples; the sequence follows the order defined on values
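The cumulative distribution is a running sum of probabilities over the ordered values. A minimal plain-Python sketch on a dict-based pmf (the helpers below are hypothetical stand-ins for cdf_dict and cdf_tuple):

```python
from collections import OrderedDict
from fractions import Fraction
from itertools import accumulate

def cdf_dict(pmf):
    """OrderedDict {v: P(x <= v)}, following the increasing order of values."""
    values = sorted(pmf)
    return OrderedDict(zip(values, accumulate(pmf[v] for v in values)))

def cdf_tuple(pmf):
    """Same content as cdf_dict, as a tuple of (v, P(x <= v)) pairs."""
    return tuple(cdf_dict(pmf).items())

die = {v: Fraction(1, 6) for v in range(1, 7)}
print(cdf_dict(die)[3])   # 1/2
```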
- clone(shared=(), n=None)
- returns a deep copy of the current Lea, without any value binding; if n is not None, then a tuple containing n new instances is returned; all Lea instances are cloned, except the instances present in the given iterable shared, which are shared between the cloned and the original instances
- cond_entropy(other)
- returns the conditional entropy of self given other, expressed in bits; note that this value is also known as the equivocation of self about other; the returned type is a float or a sympy expression (see Lea.entropy); the conditional entropy should always be non-negative; this is guaranteed by the implementation, even in case of rounding errors
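The conditional entropy can be obtained from the chain rule H(X | Y) = H(X, Y) - H(Y). The sketch below applies it to an explicit joint pmf in plain Python; it is one textbook way to compute the quantity, not necessarily how Lea computes it, and the clamp at zero only mirrors the non-negativity guarantee stated above.

```python
from math import log2

def entropy(pmf):
    """Entropy in bits of a pmf given as {value: probability}."""
    return -sum(p * log2(p) for p in pmf.values() if p > 0)

def cond_entropy(joint):
    """H(X | Y) = H(X, Y) - H(Y), from a joint pmf {(x, y): probability}."""
    marg_y = {}
    for (_, y), p in joint.items():
        marg_y[y] = marg_y.get(y, 0.0) + p
    # Clamp tiny negative rounding errors, keeping the result non-negative
    return max(0.0, entropy(joint) - entropy(marg_y))

# X is a fair bit; Y is a copy of X, flipped with probability 1/4
joint = {(0, 0): 3/8, (0, 1): 1/8, (1, 0): 1/8, (1, 1): 3/8}
print(cond_entropy(joint))   # about 0.811 bits
```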
- cov(lea1)
- returns the covariance between self and the given lea1 probability distribution; requires, for self and lea1:
  1) the requirements of the Lea.mean method are met,
  2) the values can be subtracted from the mean value,
  3) the differences between values and the mean value can be multiplied together;
  if any of these conditions is not met, then the result depends on the value implementation (likely, exception raising)
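The covariance is E[(X - E[X]) (Y - E[Y])], which uses exactly the operations required above (subtraction from the mean, multiplication of differences). Lea works on two possibly dependent Lea instances; this plain-Python sketch assumes instead that their joint pmf is given explicitly as a dict.

```python
from fractions import Fraction

def cov(joint):
    """Covariance of X and Y from a joint pmf {(x, y): probability}."""
    mean_x = sum(p * x for (x, _), p in joint.items())
    mean_y = sum(p * y for (_, y), p in joint.items())
    return sum(p * (x - mean_x) * (y - mean_y) for (x, y), p in joint.items())

# d is a fair die; X = d and Y = 1 if d is even, else 0
joint = {(d, int(d % 2 == 0)): Fraction(1, 6) for d in range(1, 7)}
print(cov(joint))   # 1/4
```

As a sanity check, the covariance of the die with itself is its variance, 35/12.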
- cov_f(lea1)
- same as the Lea.cov method, but with conversion to float or simplification of symbolic expression
- cross_entropy(lea1)
- evaluates the distribution, then returns the cross-entropy between self and the given lea1; the logarithm base is 2; requires: all values of lea1's support have a non-null probability in self; notes:
  - the cross-entropy is non-commutative,
  - the cross-entropy should always be greater than or equal to the entropy of the first argument, the equality being reached if both arguments have the same pmf; this is guaranteed by the implementation, even in case of rounding errors,
  - if self is interpreted as frequencies of observed data having N as total number of samples, then the cross-entropy is linked to the (negative) log-likelihood by: log-likelihood = - N * cross-entropy, using logarithms in base 2 (for other bases, use the right factor)
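The link between cross-entropy and log-likelihood stated above can be checked numerically. This sketch uses the textbook definition H(p, q) = -Σ p(v) log2 q(v), with p the observed frequencies and q a model; which of self and lea1 plays p or q in Lea's method is not asserted here.

```python
from math import isclose, log2

def cross_entropy(p, q):
    """H(p, q) = -sum_v p(v) * log2(q(v)); requires q(v) > 0 wherever p(v) > 0."""
    return -sum(pv * log2(q[v]) for v, pv in p.items() if pv > 0)

counts = {"H": 7, "T": 3}                       # observed data, N = 10
n = sum(counts.values())
freqs = {v: c / n for v, c in counts.items()}   # empirical pmf
model = {"H": 0.5, "T": 0.5}

log_likelihood = sum(c * log2(model[v]) for v, c in counts.items())
print(log_likelihood, -n * cross_entropy(freqs, model))   # equal, up to rounding
```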
- cumul()
- evaluates the distribution, then returns a tuple with the probabilities p that self <= value; the sequence follows the order defined on values (if an order relationship is defined on values, then the tuple follows their increasing order; otherwise, an arbitrary order is used, fixed from call to call); note: the returned value is cached
- draw(n, sorted=False, replacement=False)
- returns, after evaluation of the probability distribution self, a new Alea instance representing the probability distribution of drawing n elements from self; the returned values are tuples with n elements:
  * if sorted is True, then the order of drawing is irrelevant (i.e. all permutations are considered equivalent) and the elements of each tuple are sorted by increasing order; otherwise, the order of elements of each tuple follows the order of the drawing;
  * if replacement is True, then the drawing is made WITH replacement, so the same element may occur several times in each tuple; otherwise, the drawing is made WITHOUT replacement, so an element can only occur once in each tuple; in this last case, it requires 0 <= n <= number of values of self, otherwise an exception is raised;
  note: if the order of drawing is irrelevant, it is strongly advised to use sorted=True because the processing can be far more efficient thanks to a combinatorial algorithm proposed by Paul Moore; however, this algorithm is NOT used if replacement is False AND the probability distribution is NOT uniform
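Drawing without replacement from a non-uniform distribution can be computed exactly by brute force: each drawn value is chosen among the values still in the urn, with renormalized probabilities. This plain-Python sketch enumerates all ordered draws; it is NOT the combinatorial algorithm mentioned above, and draw_without_replacement is a hypothetical helper.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import permutations

def draw_without_replacement(pmf, n, sorted_=False):
    """Hypothetical helper: exact pmf of n draws without replacement (brute force)."""
    if not 0 <= n <= len(pmf):
        raise ValueError("requires 0 <= n <= number of values")
    result = defaultdict(Fraction)
    for seq in permutations(pmf, n):
        p = Fraction(1)
        remaining = Fraction(1)                 # probability mass still in the urn
        for v in seq:
            p *= pmf[v] / remaining             # draw v among the remaining values
            remaining -= pmf[v]
        key = tuple(sorted(seq)) if sorted_ else seq
        result[key] += p                        # sorted_: permutations are merged
    return dict(result)

pmf = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4)}
print(draw_without_replacement(pmf, 2, sorted_=True))
```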
- em_step(model_lea, cond_lea, obs_pmf_tuple, conversion_dict)
- returns a revised version of self, with parameters tuned to match a given observed sample; this executes one step of the Expectation-Maximization (EM) algorithm; the arguments are:
  - model_lea: model in which self occurs; it shall match the observed data (see obs_pmf_tuple below),
  - cond_lea: condition involving variables of model_lea, to be verified in the returned instance,
  - obs_pmf_tuple: tuple containing the frequencies of the observed data in the form of tuples (vi, f(vi)); the set of vi values shall be a subset of the support of model_lea; in case of multiple observed variables, the vi may be tuples giving the jointly observed values,
  - conversion_dict: dictionary associating the variables of model_lea already converted by the present function with their conversion; if self is not yet present in it as a key, then it is added, with the instance to be returned;
  the returned object has the same type and the same DAG structure as self; only the internal parameters may be different; if self is in conversion_dict, then the associated instance is returned without any further treatment; the method calls _em_step, defined on each Lea subclass, which implements the treatments required for the EM step on the specific instance, possibly calling em_step recursively on self's child nodes
- entropy()
- evaluates the probability distribution self, then returns the entropy of self in bits; the entropy of a random variable quantifies the average level of uncertainty or information associated with the variable's possible outcomes; it measures the expected amount of information needed to describe the state of the variable, considering the distribution of probabilities across all possible states (see https://en.wikipedia.org/wiki/Entropy_(information_theory) ); note: this is equivalent to the mean of the information, taken on each value of self (see Lea.information_of); the returned type is a float or a sympy expression: if all probabilities are (convertible to) float, then the entropy is returned as a float; if any probability is a sympy expression, then the entropy is returned as a sympy expression; raises an exception if some probabilities are neither convertible to float nor a sympy expression
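The note above (entropy as the mean of the information of each value) translates directly into plain Python. Both helpers are hypothetical float-only sketches on dict-based pmfs; the sympy branch described above is not covered.

```python
from math import log2

def information_of(pmf, val):
    """Information of val in bits, i.e. -log2(P(x == val)); val must be possible."""
    p = pmf.get(val, 0)
    if p <= 0:
        raise ValueError(f"{val!r} is impossible")
    return -log2(p)

def entropy(pmf):
    """Entropy in bits: mean of the information, weighted by each value's probability."""
    return sum(p * information_of(pmf, v) for v, p in pmf.items() if p > 0)

coin = {"H": 0.5, "T": 0.5}
die = {v: 1 / 6 for v in range(1, 7)}
print(entropy(coin))   # 1.0
```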
- equiv(other)
- returns True iff self and other represent the same probability distribution, i.e. they have the same probability for each of their values; returns False otherwise; the probabilities are compared strictly (see the Lea.equiv_f method for comparisons tolerant to rounding errors)
- equiv_f(other, rel_tol=1e-09, abs_tol=0.0)
- returns True iff self and other represent the same probability distribution, i.e. they have the same probability for each of their values; returns False otherwise; the probabilities are compared using the math.isclose function, in order to be tolerant to rounding errors
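The difference between strict and tolerant comparison shows up as soon as float probabilities come from arithmetic. A plain-Python sketch on dict-based pmfs; treating a value missing from one pmf as probability 0 is an assumption of this sketch.

```python
import math

def equiv(p, q):
    """Strict comparison of two pmfs given as dicts {value: probability}."""
    return p == q

def equiv_f(p, q, rel_tol=1e-09, abs_tol=0.0):
    """Tolerant comparison: each probability is compared with math.isclose."""
    support = set(p) | set(q)
    return all(math.isclose(p.get(v, 0.0), q.get(v, 0.0), rel_tol=rel_tol, abs_tol=abs_tol)
               for v in support)

p = {"a": 0.1 + 0.2, "b": 0.7}
q = {"a": 0.3, "b": 0.7}
print(equiv(p, q), equiv_f(p, q))   # False True
```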
- estimate_mc(nb_samples, nb_tries=None)
- convenience method equivalent to calc(algo='MCRS', nb_samples=nb_samples, nb_tries=nb_tries); returns an Alea instance, which is an approximate probability distribution following the MC rejection sampling algorithm on nb_samples random samples; the method is suited for complex distributions, when the calculation of the exact probability distribution is intractable; the larger the value of nb_samples, the better the returned estimation; if a condition cannot be satisfied after nb_tries tries, then an error exception is raised; WARNING: if nb_tries is None, any infeasible condition shall cause an infinite loop; note: this uses the pseudo-random number generator of Python's random module; to get deterministic results from run to run, the random.seed method should be called before calling the present method
- flat()
- free(check=True)
- unbinds self, i.e. removes the value previously bound to it by Lea.observe; if check is True, then requires that self is bound
- gen_em_steps(obs_lea, fixed_vars=())
- generates an infinite sequence of steps of the Expectation-Maximization (EM) algorithm, yielding revised versions of a probabilistic model, with parameters tuned to match a given observed sample; this algorithm allows hidden variables in the model (i.e. absent from the observed sample); the arguments are:
  - obs_lea: Lea instance giving the frequencies of the observed data; the support of obs_lea shall be a subset of the support of self; in case of n observed variables, self and obs_lea can be defined as joint tables, having tuples of size n as support,
  - fixed_vars (default: empty tuple): an iterable giving the variables that shall NOT be revised by the algorithm; in the returned model, these variables keep their initial parameters unchanged;
  the yielded object is a dictionary md associating self and the inner variables of self to the revised variables after each EM step; for any variable v present in self, md[v] has the same type and the same DAG structure as v; only the internal parameters may be different; the EM algorithm is iterative, supposedly converging to a Lea instance maximizing the likelihood of obs_lea; the caller is expected to stop the iterations when some criteria are satisfied (see the Lea.learn_by_em method for an example)
- gen_lea_descendants()
- gen_one_random_mc(nb_subsamples=1)
- generates one random value from the current probability distribution, WITHOUT precalculating the exact probability distribution (contrary to the Lea.random method); this obeys the “binding” mechanism, so if the same variable is referred to multiple times in a given expression, then the same value is yielded at each occurrence; before yielding the random value v, this value v is bound to the current instance; if the current calculation requires getting a random value on the current instance again, then the bound value is yielded; the instance is rebound to a new value at each iteration, as soon as the execution is resumed after the yield; the instance is unbound at the end; the actual random value is yielded by the _gen_one_random_mc method, which is implemented in each Lea subclass; note: nb_subsamples is not used in the Lea.gen_one_random_mc method; it is used in the overloaded Ilea.gen_one_random_mc method
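The binding mechanism described above is what keeps sampled expressions referentially consistent: a variable referred to twice yields the same value twice. This is a toy sketch with a hypothetical Var class built on Python generators, not Lea's classes: a value is bound before being yielded, reused while bound, and unbound when the generator finishes.

```python
import random

class Var:
    """Hypothetical toy random variable that supports temporary binding."""

    def __init__(self, values):
        self.values = values
        self.bound = None                      # currently bound value, if any

    def gen_one_random(self, rng):
        if self.bound is not None:             # already bound in this calculation:
            yield self.bound                   # reuse the same value
            return
        self.bound = rng.choice(self.values)   # bind before yielding
        try:
            yield self.bound
        finally:
            self.bound = None                  # unbind at the end

def sample_difference(x, y, rng):
    """One random value of x - y, respecting bindings (x - x is always 0)."""
    results = []
    for vx in x.gen_one_random(rng):
        for vy in y.gen_one_random(rng):
            results.append(vx - vy)
    return results[0]

rng = random.Random(0)
x, y = Var([1, 2, 3]), Var([1, 2, 3])
print(sample_difference(x, x, rng), sample_difference(x, y, rng))
```

Without the binding check, x - x would be sampled as the difference of two independent draws, which is exactly the kind of wrong result that the memoization option of calc warns about.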
- gen_random_mc(nb_samples, nb_subsamples=1, nb_tries=None)
- generates nb_samples random values from the current probability distribution, without precalculating the exact probability distribution (contrary to the Lea.random method); nb_tries, if not None, defines the maximum number of trials in case a random value is incompatible with a condition; this happens only if the conditioned part is itself an Ilea instance x.given(e) or is referring to such an instance; nb_subsamples (default: 1) may be greater than 1 only if self is an Ilea instance, i.e. a conditional probability x.given(e); it specifies the number of random samples made on x for each binding verifying the condition e; nb_subsamples shall be a divisor of nb_samples
- get_alea(sorting=True)
- returns an Alea instance representing the distribution after it has been evaluated, considering explicit bindings and evidence contexts, if any; in most simple cases, the newly created Alea is cached: the evaluation occurs only at the first call
- get_certain_value()
- returns the value having probability 1; requires: such a value exists (i.e. self.is_certain() returns True)
- get_inner_lea_set()
- returns a set containing all the Lea instances in the tree having the root self, including self itself; this calls the _get_lea_children() method implemented in Lea's subclasses
- get_leaves_set(restricted=False)
- returns a set containing all the leaves in the DAG having the root self; this calls the _get_lea_children() methods implemented in Lea's subclasses
- given(*evidences)
- returns a new Ilea instance representing self constrained by the given evidences, each of which is expected to be a Lea instance with Boolean values; assuming that the following assignments are made, starting from a Lea instance lea1:
  1) ilea1 = lea1.given(*evidences),
  2) alea1 = ilea1.calc();
  then, the values present in alea1 are those of lea1, and only those, compatible with the AND of the given evidences; the probabilities of alea1 are conditional probabilities that verify, for any value v: P(alea1==v) == P((lea1==v) & evidences) / P(evidences), provided that the evidences are ANDed (with the & operator); if the given evidences are infeasible, i.e. if P(evidences) = 0, then an exception is raised at the time ilea1.calc() is called; notes:
  - the Lea.calc method calculates the resulting probability distribution in one single pass (contrary to what the formula of conditional probability above may suggest): the division occurs as an effect of the normalization of the yielded probabilities; for special explanatory needs, this division can be skipped by calling ilea1.calc(normalization=False);
  - see also the Lea.such_that method
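The one-pass calculation described in the notes (keep the bindings compatible with the evidence, accumulate their probabilities, then normalize) can be sketched in plain Python on an explicit joint pmf over bindings. The helper given below is hypothetical and only mirrors the formula P(v | e) = P(v & e) / P(e); normalization=False returns the non-normalized probabilities P(v & e).

```python
from fractions import Fraction

def given(joint_pmf, x, evidence, normalization=True):
    """Hypothetical sketch: pmf of x(binding) over the bindings verifying evidence(binding)."""
    result = {}
    for binding, p in joint_pmf.items():
        if evidence(binding):                      # keep only compatible bindings
            v = x(binding)
            result[v] = result.get(v, 0) + p       # accumulates P(x == v & evidence)
    if normalization:
        total = sum(result.values())               # P(evidence)
        if total == 0:
            raise ValueError("infeasible evidence")
        result = {v: p / total for v, p in result.items()}
    return result

# two fair dice; a binding is a pair (d1, d2)
dice = {(d1, d2): Fraction(1, 36) for d1 in range(1, 7) for d2 in range(1, 7)}
# distribution of d1 given that d1 + d2 == 4: each of 1, 2, 3 gets probability 1/3
print(given(dice, lambda b: b[0], lambda b: b[0] + b[1] == 4))
```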
- given_prob(*evidences, p)
- requires: if p is a number, the conjunction of the given evidences is:
  - feasible if p > 0,
  - infeasible if p = 0,
  - certain if p = 1;
  otherwise, an exception is raised; note: the argument p shall be passed by keyword: p=…; note: since the returned object is an Alea instance, it is independent of self and of any of its child Lea instances; in particular, x.given_prob(e, p=1) is not fully the same as x.given(e), and x.given_prob(e, p=0) is not fully the same as x.given(~e); the equivalence is complete, however, if get_alea() or calc(…) is called on the result of the given(…) call
- histo(size=100)
- returns, after evaluation of the probability distribution self, a string representation of it; it contains one line per distinct value, separated by a newline character; each line contains the string representation of a value with its probability expressed as a histogram bar made up of repeated ‘-’, such that a bar length of the given size represents a probability 1; if an order relationship is defined on values, then the values are sorted by increasing order; otherwise, an arbitrary order is used
- information()
- evaluates the probability distribution self, then returns the information of self being true, expressed in bits, assuming that self is a Boolean probability distribution; raises an exception if self is certainly false, i.e. P(self) == 0; the returned type is a float or a sympy expression: if all probabilities are (convertible to) float, then the information is returned as a float; if any probability is a sympy expression, then the information is returned as a sympy expression; raises an exception if some probabilities are neither convertible to float nor a sympy expression; see also Lea.entropy
- information_of(val)
- evaluates the probability distribution self, then returns a float number representing the information of the given val, expressed in bits, i.e. -log2(P(self == val)), assuming that the probability of val is (convertible to) float; if the probability of val is a sympy expression, then the returned object is the information of val as a sympy expression; raises an exception if the given val is impossible, i.e. absent from self's support or having null probability; raises an exception if the probability of the given val is neither convertible to float nor a sympy expression
- internal(with_names=False, full=False, _indent='', _refs=None, name_by_obj_dict=None)
- returns a string representing the inner definition of self, with child Lea instances expanded recursively down to the Alea leaves; if the same child appears multiple times, then it is expanded only at its first occurrence, the other ones being marked with a reference id followed by a ‘*’ suffix; this method is useful to display the “graphical model” or directed acyclic graph (DAG) that models the interdependencies between Lea instances, which are otherwise hidden; if with_names is False (default), then self and its children are named based on their Python ids; otherwise, they are tentatively named by inverted look-up in the dictionary {name: obj} passed to the lea.declare_namespace function; if full is False (default), then only the first element of Alea instances is displayed; otherwise, all elements are displayed; the other arguments are used only for recursive calls and can be ignored for normal usage; if there is some active evidence context, the returned string shows, instead of self's internals, the internals of a new Ilea instance embedding self and conditioned by the evidence context; note: this method is overloaded in Alea.internal, so as to end the recursion
- inv_cumul()
- evaluates the distribution, then returns a tuple with the probabilities p that self >= value; the sequence follows the order defined on values (if an order relationship is defined on values, then the tuple follows their increasing order; otherwise, an arbitrary order is used, fixed from call to call); note: the returned value is cached
- is_any_of(*values)
- returns a Boolean probability distribution indicating the probability that self is any of the values passed as arguments; if some of the given values are Lea instances, then these are treated consistently with self and the other values
- is_bindable(v)
- is_bound()
- returns True iff self is currently bound
- is_certain()
- returns True iff there is only one possible value, having probability 1, False otherwise
- is_defined_using(other)
- returns True iff self is defined using other, whether directly or indirectly, i.e. iff other is self or one of its descendants in the DAG; this uses the current evidence context, if any; notes: if other is self, then True is returned; otherwise, x.is_defined_using(y) => not y.is_defined_using(x), since the ‘A’ in DAG means ‘Acyclic’; also, x.is_defined_using(y) => x.is_dependent_of(y)
- is_dependent_of(other)
- returns True iff self and other potentially have some dependency, i.e. if they share some Alea instance(s); this uses the current evidence context, if any; note: the method is commutative: x.is_dependent_of(y) == y.is_dependent_of(x)
- is_feasible()
- returns True iff the value True has a non-null probability, False otherwise; raises an exception if some value is not Boolean
- is_none_of(*values)
- returns a Boolean probability distribution indicating the probability that self is none of the values passed as arguments; if some of the given values are Lea instances, then these are treated consistently with self and the other values
- is_true()
- returns True iff the value True has probability 1, False otherwise; raises an exception if some value is not Boolean
- is_uniform()
- returns, after evaluation of the probability distribution self, True if the probability distribution is uniform, False otherwise
- kl_divergence(lea1)
- evaluates the distribution, then returns the Kullback-Leibler divergence between self and the given lea1; the logarithm base is 2; requires: all values of lea1's support have a non-null probability in self; notes:
  - the KL divergence is also known as “relative entropy”,
  - the KL divergence is non-commutative,
  - the KL divergence should always be non-negative; it is null if both arguments have the same pmf; this is guaranteed by the implementation, even in case of rounding errors
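The notes above (non-negative, null for identical pmfs, non-commutative) can be checked with the textbook definition D(p || q) = Σ p(v) log2(p(v) / q(v)). This plain-Python sketch does not assert which of self and lea1 plays p or q in Lea's method.

```python
from math import log2

def kl_divergence(p, q):
    """D(p || q) = sum_v p(v) * log2(p(v) / q(v)); requires q(v) > 0 wherever p(v) > 0."""
    return sum(pv * log2(pv / q[v]) for v, pv in p.items() if pv > 0)

p = {"H": 0.5, "T": 0.5}
q = {"H": 0.9, "T": 0.1}
print(kl_divergence(p, q), kl_divergence(q, p))   # two different positive values
```

The divergence also equals the cross-entropy minus the entropy of the first argument, which is why minimizing either one is the same goal when the first argument is fixed.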
- learn_by_em(obs_lea, fixed_vars=(), nb_steps=None, max_kld=None, max_delta_kld=None)
- returns a revised version of a probabilistic model, with parameters tuned to match a given observed sample; this uses the Expectation-Maximization (EM) algorithm, which allows hidden variables in the model (i.e. absent from the observed sample); the first two arguments are:
  - obs_lea: Lea instance giving the frequencies of the observed data; the support of obs_lea shall be a subset of the support of self; in case of n observed variables, self and obs_lea can be defined as joint tables, having tuples of size n as support,
  - fixed_vars (default: empty tuple): an iterable giving the variables that shall NOT be revised by the algorithm, if any; in the returned model, these variables keep their initial parameters unchanged;
  the returned object is a dictionary md associating self and the inner variables of self to the revised variables after the EM steps; for any variable v present in self, md[v] has the same type and the same DAG structure as v; only the internal parameters may be different; the algorithm is iterative, supposedly converging to a Lea instance maximizing the likelihood of obs_lea; this is equivalently stated as maximizing the log-likelihood, minimizing the cross-entropy or minimizing the Kullback-Leibler divergence; the exit condition can be specified in three different ways, defined by the last three arguments (at least one of them shall not be None):
  - nb_steps (int): the maximum number of iterations of the EM algorithm,
  - max_kld (float): the EM algorithm halts as soon as the Kullback-Leibler divergence is lower than or equal to this number; this indicates the required degree of fit; the smaller the value, the longer the execution,
  - max_delta_kld (float): the EM algorithm halts as soon as the absolute difference between the cross-entropies calculated on two consecutive iterations is lower than or equal to this number; this is a convergence criterion; the smaller the value, the longer the execution;
  if more than one of these arguments is not None, then the EM algorithm halts as soon as any of the halting conditions is fulfilled
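To make the EM loop and its halting conditions concrete, here is a hand-written EM for a classic toy model: each observation is a pair of flips made with one of two biased coins, the choice of coin being hidden. The model, the parameters (pi, ta, tb) and all helper names are hypothetical; Lea performs EM generically on the model's DAG, which this sketch does not attempt. The loop stops after nb_steps or when the cross-entropy changes by at most max_delta, in the spirit of nb_steps and max_delta_kld.

```python
from math import log2

def pair_prob(o, t):
    """P(o | coin with heads probability t) for a pair of flips o, e.g. ("H", "T")."""
    p = 1.0
    for x in o:
        p *= t if x == "H" else 1 - t
    return p

def model_pmf(pi, ta, tb):
    # Mixture: coin A is used with probability pi, coin B otherwise (hidden choice)
    pairs = [(x1, x2) for x1 in "HT" for x2 in "HT"]
    return {o: pi * pair_prob(o, ta) + (1 - pi) * pair_prob(o, tb) for o in pairs}

def cross_entropy(obs, model):
    return -sum(f * log2(model[o]) for o, f in obs.items() if f > 0)

def em_step(obs, pi, ta, tb):
    w_a = heads_a = flips_a = heads_b = flips_b = 0.0
    for o, f in obs.items():
        # E-step: posterior probability r that the hidden coin was A, given o
        pa = pi * pair_prob(o, ta)
        pb = (1 - pi) * pair_prob(o, tb)
        r = pa / (pa + pb)
        nb_heads = o.count("H")
        w_a += f * r
        heads_a += f * r * nb_heads
        flips_a += f * r * len(o)
        heads_b += f * (1 - r) * nb_heads
        flips_b += f * (1 - r) * len(o)
    # M-step: re-estimate the parameters from the expected counts
    total = sum(obs.values())
    return w_a / total, heads_a / flips_a, heads_b / flips_b

def learn_by_em(obs, params, nb_steps=100, max_delta=1e-9):
    history = [cross_entropy(obs, model_pmf(*params))]
    for _ in range(nb_steps):
        params = em_step(obs, *params)
        history.append(cross_entropy(obs, model_pmf(*params)))
        if abs(history[-2] - history[-1]) <= max_delta:   # convergence criterion
            break
    return params, history

obs = {("H", "H"): 0.45, ("H", "T"): 0.1, ("T", "H"): 0.1, ("T", "T"): 0.35}
params, history = learn_by_em(obs, (0.5, 0.6, 0.4))
print(params, history[0], history[-1])
```

Each EM step can only keep or increase the likelihood, so the recorded cross-entropies never increase (up to rounding).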
- lr(*hyp_leas)
- returns a float giving the likelihood ratio (LR) of an ‘evidence’ E, which is self, for a given ‘hypothesis’ H, which is the AND of the given hyp_leas arguments; it is calculated as P(E | H) / P(E | not H); both E and H must be Boolean probability distributions, otherwise an exception is raised; an exception is also raised if H is certainly true or certainly false
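To make the formula concrete, here is a minimal plain-Python sketch (not Lea's code) computing P(E | H) / P(E | not H) from a joint pmf over (E, H) Boolean pairs. The helper name `likelihood_ratio` and the numbers are hypothetical.

```python
from fractions import Fraction


def likelihood_ratio(joint):
    """Hypothetical helper: LR = P(E | H) / P(E | not H).

    joint maps (e, h) Boolean pairs to probabilities summing to 1.
    """
    p_h = sum(p for (e, h), p in joint.items() if h)
    if p_h == 0 or p_h == 1:
        # mirrors the documented error: H certainly true or certainly false
        raise ValueError("H is certainly true or certainly false")
    p_e_and_h = sum(p for (e, h), p in joint.items() if e and h)
    p_e_and_not_h = sum(p for (e, h), p in joint.items() if e and not h)
    return float((p_e_and_h / p_h) / (p_e_and_not_h / (1 - p_h)))


# Hypothetical numbers: P(H) = 1/100, P(E | H) = 9/10, P(E | not H) = 1/20
joint = {
    (True, True): Fraction(9, 1000),
    (False, True): Fraction(1, 1000),
    (True, False): Fraction(99, 2000),
    (False, False): Fraction(1881, 2000),
}
print(likelihood_ratio(joint))  # → 18.0
```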
- map(f, *args)
- returns a new Flea instance representing the distribution obtained by applying the given function f, taking the values of the self distribution as first argument and the optional given args as following arguments; requires: f is an n-ary function with 1 <= n = len(args)+1; note: f can also be a Lea instance, with functions as values
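The effect of map on a pmf can be sketched in plain Python, assuming the pmf is a dict of value to probability (a simplification: Lea returns a lazily evaluated Flea, whereas this hypothetical `map_pmf` computes the result eagerly). Values that f sends to the same result have their probabilities added.

```python
from collections import defaultdict
from fractions import Fraction


def map_pmf(pmf, f, *args):
    """Hypothetical helper: pmf of f(v, *args) where v follows pmf."""
    result = defaultdict(int)
    for v, p in pmf.items():
        # distinct values of v mapped to the same result add up their probabilities
        result[f(v, *args)] += p
    return dict(result)


die = {v: Fraction(1, 6) for v in range(1, 7)}
parity = map_pmf(die, lambda v, m: v % m, 2)  # f is 2-ary: len(args) + 1 == 2
```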
- map_seq(f, *args)
- returns a new Flea instance representing the distribution obtained by applying the given function f on each element of each value of the self distribution; the optional given args are added as f’s following arguments; requires: f is an n-ary function with 1 <= n = len(args)+1; requires: self’s values are sequences; the values of the returned distribution are tuples; note: f can also be a Lea instance, with functions as values
- mean()
- evaluates the probability distribution self, then returns the mean value of the probability distribution, which is the probability-weighted sum of the values; requires:
  1) the values can be subtracted from each other,
  2) the differences of values can be multiplied by integers,
  3) the differences of values multiplied by integers can be added to the values,
  4) the sum of values calculated in 3) can be divided by a float or an integer;
  if any of these conditions is not met, then the result depends on the value class implementation (likely, an exception is raised)
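The requirements above allow a mean for values such as dates, which cannot be added to each other but whose differences can be scaled and summed. The sketch below shows one way to compute a weighted mean from value differences only, in the spirit of those requirements. It is not necessarily Lea's exact sequence of operations. `mean_via_differences` is hypothetical, and it assumes integer weights (e.g. counts).

```python
from datetime import date


def mean_via_differences(pmf):
    """Hypothetical helper: weighted mean using value differences only.

    pmf maps values to integer weights (e.g. counts); the first value v0 is
    used as a reference, so values are never added to each other directly.
    """
    v0 = next(iter(pmf))
    total = sum(pmf.values())
    zero = v0 - v0  # 0 for numbers, timedelta(0) for dates
    # differences of values, multiplied by integer weights, are summed
    weighted = sum(((v - v0) * w for v, w in pmf.items()), zero)
    # the weighted difference is divided by the total weight, then added to v0
    return v0 + weighted / total


print(mean_via_differences({1: 1, 2: 1, 3: 2}))
print(mean_via_differences({date(2024, 1, 1): 1, date(2024, 1, 3): 1}))
```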
- mean_f()
- evaluates the probability distribution self, then behaves as the Alea.mean method, but with conversion to float or simplification of the symbolic expression
- merge(*lea_args)
- returns a new Blea instance representing the merge of self and the given lea_args, i.e. P(v) = (P1(v) + … + Pn(v)) / n, where P(v) is the probability of value v in the merge result and Pi(v) is the probability of value v in ((self,)+lea_args)[i-1]
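The merge formula can be checked with a small plain-Python sketch on dict-based pmfs (hypothetical `merge_pmfs`, not Lea's Blea machinery). A value missing from one of the distributions simply contributes 0 for that distribution.

```python
from fractions import Fraction


def merge_pmfs(*pmfs):
    """Hypothetical helper: P(v) = (P1(v) + ... + Pn(v)) / n over the union of supports."""
    n = len(pmfs)
    support = {v for pmf in pmfs for v in pmf}
    # a value missing from some Pi contributes probability 0 for that Pi
    return {v: sum(pmf.get(v, 0) for pmf in pmfs) / n for v in support}


fair_coin = {"H": Fraction(1, 2), "T": Fraction(1, 2)}
two_headed = {"H": Fraction(1)}
merged = merge_pmfs(fair_coin, two_headed)  # H: 3/4, T: 1/4
```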
- mode()
- evaluates the probability distribution self, then returns a tuple with the value(s) of the probability distribution having the highest probability
- nb_cases(bindings=None, memoization=True)
- returns the number of atomic cases evaluated to build the exact probability distribution; this provides a measure of the complexity of the probability distribution; * bindings argument: see Lea.calc method; * memoization argument: see Lea.calc method
- new(n=None, prob_type=-1, sorting=True, normalization=True)
- returns a new Alea instance representing the distribution after it has been evaluated; if self is an Alea, then it returns a copy of itself representing an independent event; the probability type used in the returned instance depends on the given prob_type; * if n is not None, then a tuple containing n new independent Alea instances is returned; * if prob_type is -1, then the probability type is the same as self’s, otherwise, the probability type is defined using prob_type (see lea.set_prob_type); * normalization (default: True): if True, then each probability is divided by the sum of all probabilities; note: the present method is overloaded in Alea.new, to be more efficient
- norm_entropy()
- evaluates the probability distribution self, then returns the normalized entropy of self (aka “efficiency”); it is calculated as Lea.entropy divided by the base-2 logarithm of the number of distinct values in self’s support; note: it is the complement of the “relative redundancy” Alea.redundancy; the returned type is a float or a sympy expression: if all probabilities are (convertible to) float, then the normalized entropy is returned as a float between 0.0 and 1.0; if any probability is a sympy expression, then the normalized entropy is returned as a sympy expression; raises an exception if some probabilities are neither convertible to float nor a sympy expression; see also Lea.entropy
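The definition above, and its complement (relative redundancy), can be sketched in plain Python for float pmfs. The helper names are hypothetical. The sketch assumes at least two distinct values, because log2(1) == 0 would make the division undefined.

```python
import math


def entropy_bits(pmf):
    """Shannon entropy in bits of a pmf given as {value: probability}."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)


def norm_entropy(pmf):
    """Hypothetical helper: entropy / log2(number of distinct values).

    Assumes at least two distinct values (log2(1) == 0 would divide by zero).
    """
    return entropy_bits(pmf) / math.log2(len(pmf))


def redundancy(pmf):
    """Hypothetical helper: relative redundancy, 1 - normalized entropy."""
    return 1.0 - norm_entropy(pmf)


uniform4 = {v: 0.25 for v in "abcd"}
skewed = {"a": 0.5, "b": 0.25, "c": 0.25}
```

A uniform distribution reaches the maximal efficiency 1.0 (zero redundancy). Any skew lowers the efficiency.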
- observe(v)
- (re)binds self with the given value v; if queried, self will yield v as its sole value, with probability 1, until it is rebound (Lea.observe) or freed (Lea.free); requires: v is present in the domain of self
- p(val)
- returns the probability of given value val
- p_sum()
- evaluates the probability distribution self, then returns the sum of all probabilities of self; the result is expressed in the probability type used in self, possibly downcast for convenience (Fraction -> ExtFraction, Decimal -> ExtDecimal); note: the result is supposed to be 1 (expressed in some type) BUT it could be different:
  - due to float rounding errors,
  - due to an explicit normalization=False argument
- plot(title=None, fname=None, savefig_args={}, **bar_args)
- produces, after evaluation of the probability distribution self, a matplotlib bar chart representing it with the given title (if not None); the bar chart may be customized by using the named arguments bar_args, which are relayed to the matplotlib.pyplot.bar function (see doc in http://matplotlib.org/api/pyplot_api.html ); * if fname is None, then the chart is displayed on screen, in a matplotlib window; the previous chart, if any, is erased; * otherwise, the chart is saved in a file specified by the given fname, as specified by matplotlib.pyplot.savefig; the file format may be customized by using the savefig_args argument, which is a dictionary relayed to the matplotlib.pyplot.savefig function and containing named arguments expected by this function; example: flip.plot(fname='flip.png', savefig_args=dict(bbox_inches='tight'), color='green'); the method requires the matplotlib package; an exception is raised if it is not installed
- pmf_dict()
- evaluates the probability distribution self, then returns the probability mass function of self, as an OrderedDict with {v: P(v)} pairs; the sequence follows the order defined on values
- pmf_tuple()
- evaluates the probability distribution self, then returns the probability mass function of self, as a tuple of (v, P(v)) tuples; the sequence follows the order defined on values
- random(n=None)
- evaluates the distribution, then, if n is None, returns a random value with the probability given by the distribution; otherwise, returns a tuple of n such random values; note: this uses the pseudo-random number generator of Python’s random module; to get deterministic results from run to run, the random.seed method should be called before calling the present method
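The note about random.seed can be illustrated with the standard random module alone. The hypothetical `sample` helper below mimics the documented n=None / n=int convention on a dict-based pmf using `random.choices`. Whether Lea draws values this way internally is an assumption. The point is only that reseeding reproduces the same draws.

```python
import random


def sample(pmf, n=None):
    """Hypothetical helper mirroring the n=None / n=int convention above."""
    values = list(pmf)
    weights = list(pmf.values())
    draws = random.choices(values, weights=weights, k=1 if n is None else n)
    return draws[0] if n is None else tuple(draws)


pmf = {"H": 0.7, "T": 0.3}

random.seed(1234)  # fix the generator state...
run1 = sample(pmf, 10)
random.seed(1234)  # ...and reset it to the same state
run2 = sample(pmf, 10)  # identical to run1
```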
- random_draw(n=None, sorted=False)
- evaluates the distribution, then, if n is None, returns a tuple with all the values of the distribution, in a random order respecting the probabilities (the higher the probability of a value, the more likely the value is to appear near the beginning of the sequence); if n > 0, then returns only n different drawn values; if sorted is True, then the returned tuple is sorted; note: this uses the pseudo-random number generator of Python’s random module; to get deterministic results from run to run, the random.seed method should be called before calling the present method
- random_iter()
- evaluates the distribution, then generates an infinite sequence of random values among the values of self, according to their probabilities; note: this uses the pseudo-random number generator of Python’s random module; to get deterministic results from run to run, the random.seed method should be called before calling the present method
- random_mc(nb_samples=None, nb_tries=None)
- if nb_samples is None, returns a random value with the probability given by the distribution, without precalculating the exact probability distribution (unlike the Lea.random method); otherwise, returns a tuple of nb_samples such random values; WARNING: if nb_tries is None, any infeasible condition will cause an infinite loop; note: this uses the pseudo-random number generator of Python’s random module; to get deterministic results from run to run, the random.seed method should be called before calling the present method
- redundancy()
- evaluates the probability distribution self, then returns the relative redundancy of self; note: it is the complement of the normalized entropy (aka “efficiency”) Alea.norm_entropy; if all probabilities are (convertible to) float, then the relative redundancy is returned as a float between 0.0 and 1.0; if any probability is a sympy expression, then the relative redundancy is returned as a sympy expression; raises an exception if some probabilities are neither convertible to float nor a sympy expression
- reset()
- erases the Alea cache, so as to force the recalculation at the next call to get_alea(); note: there is no need to call this method, except for freeing memory or for cleaning up after hacking private attributes of Lea instances assumed immutable
- revised_with_cpt(*clauses)
- returns an instance of Blea representing the conditional probability table (e.g. a node in a Bayesian network) from the given clauses; each clause is a tuple (condition, result) where condition is a Boolean or a Lea Boolean distribution and result is a value or Lea distribution representing the result assuming that condition is true; requires: the conditions from all clauses shall be mutually exclusive; requires: no clause contains None as condition; the ‘else’ clause is calculated so that the returned Blea, if no condition is given, is self
- sort_by(*ordering_leas)
- returns an Alea instance representing the same probability distribution as self but having values ordered according to the given ordering_leas; requires: self doesn’t contain duplicate values, otherwise an exception is raised; note that it is NOT required that all ordering_leas appear in self
- std()
- evaluates the probability distribution self, then returns the standard deviation of the probability distribution; requires: the requirements of the Alea.var method are met
- std_f()
- evaluates the probability distribution self, then behaves as the Alea.std method, but with conversion to float or simplification of the symbolic expression
- subs(*args, **kwargs)
- returns a new Alea instance, equivalent to self, where probabilities have been converted by applying subs(*args) on them; this is useful for substituting variables when probabilities are expressed as sympy expressions (see the doc of sympy’s subs method); requires: all of self’s probabilities have a ‘subs’ method available
- such_that(*constraints, inputs=None)
- returns an instance of Tlea representing a conditional probability table (CPT), where the given inputs is an iterable of Lea instances giving the conditioning variables, and associating each value combination V of inputs with the values of self corresponding to V binding and conditioned by the given constraints; each constraint is expected to be a Lea instance with Boolean values; assuming that the following assignments are made, starting from a Lea instance lea1:
  1) tlea1 = lea1.such_that(*constraints, inputs=inputs)
  2) alea1 = tlea1.calc()
  then the values present in alea1 are those of self, and only those, compatible with the AND of the given constraints; if the inputs argument is None, then the actual inputs are tentatively determined as a tuple with all direct children of constraints, except self; if self is present as a descendant of any of the inputs, then an exception is raised, with a message asking to provide an explicit inputs argument; provided that inputs is valid, the CPT is then defined as follows:
  - conditioning variable: joint(*inputs) conditioned by constraints,
  - decision table: dictionary with the conditioning variable’s values as keys and Alea instances as values, corresponding to self bound to inputs and conditioned by constraints;
  the probabilities of alea1 verify, for any value v:
  P(alea1==v) == Sum ( P(inputs==i) P((self==v).given(*constraints, inputs==i)) )
  where i browses the values of joint(*inputs).given(*constraints); requires: instances given in inputs cannot refer to the self instance, whether directly or indirectly; notes:
  - unlike the Lea.given method, the instance returned by Lea.such_that is independent of self,
  - the constraints are used only locally, to build the CPT; unlike with the Lea.given method, they are not “contaminating”,
  - if the given constraints are infeasible, i.e. if P(constraints) = 0, then an exception is raised at the time the Lea.such_that method is called (with the Lea.given method, such an exception is raised when the Lea.calc method is called)
- support()
- evaluates the probability distribution self, then returns a tuple with the values of self; the sequence follows the increasing order defined on values; if order is undefined (e.g. complex numbers), then the order is arbitrary but fixed from call to call
- switch(lea_dict, default_lea=[], prior_lea=[])
- if default_lea is given, then this defines the Lea instance associated with the value(s) of self missing in lea_dict; all the dictionary’s values and default_lea (if defined) are coerced to Alea instances; requires: default_lea and prior_lea shall not be defined together; requires: if prior_lea is provided, a solution shall exist for default_lea; note: self is typically a joint of variables (a Clea instance), with lea_dict’s keys being the tuples found in this joint’s values
- switch_func(f)
- times(n, op=<built-in function add>, normalization=True)
- returns, after evaluation of the probability distribution self, a new Alea instance representing the current distribution operated n times with itself, through the given binary operator op; if n = 1, then a copy of self is returned; requires: n is strictly positive, otherwise an exception is raised; if normalization is True (default), then each probability is divided by the sum of all probabilities; note that the implementation uses a fast dichotomous algorithm, instead of a naive approach that scales up badly as n grows; warning: since the returned distribution is evaluated by repeating independent events that remain hidden, no referential consistency is possible after “times” is called; in particular, the call chain xxx.times(…).given(…) is useless, whereas the opposite xxx.given(…).times(…) is perfectly sensible
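A dichotomous approach to "operated n times with itself" can be sketched as binary exponentiation over the combination of independent distributions. Roughly log2(n) squarings plus a few extra combinations replace n - 1 successive combinations. This is a hedged plain-Python illustration, not Lea's code. It assumes op is associative (true for the default addition), and `combine`/`times` are hypothetical helpers working on dict-based pmfs.

```python
import operator
from collections import defaultdict
from fractions import Fraction


def combine(pmf1, pmf2, op):
    """Distribution of op(x, y) for independent x ~ pmf1 and y ~ pmf2."""
    result = defaultdict(int)
    for x, px in pmf1.items():
        for y, py in pmf2.items():
            result[op(x, y)] += px * py
    return dict(result)


def times(pmf, n, op=operator.add):
    """Hypothetical helper: pmf operated n times with itself via binary exponentiation.

    Assumes op is associative, so partial results can be grouped freely.
    """
    if n <= 0:
        raise ValueError("n shall be strictly positive")
    result = None
    base = pmf  # base covers 1, 2, 4, ... independent copies
    while n:
        if n & 1:  # this power of two is part of n: fold it into the result
            result = base if result is None else combine(result, base, op)
        n >>= 1
        if n:
            base = combine(base, base, op)  # double the number of copies
    return dict(result)


die = {v: Fraction(1, 6) for v in range(1, 7)}
two_dice = times(die, 2)    # sum of two dice: P(7) = 1/6
three_dice = times(die, 3)  # sum of three dice: P(3) = 1/216
```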
- times_tuple(n)
- returns a new Alea instance with tuples of length n, containing the joint of self with itself repeated n times; note: equivalent to self.draw(n, sorted=False, replacement=True)
- var()
- evaluates the probability distribution self, then returns the variance of the probability distribution; requires:
  1) the requirements of the Alea.mean method are met,
  2) the mean value can be subtracted from the values,
  3) the differences between the values and the mean value can be squared;
  if any of these conditions is not met, then the result depends on the value implementation (likely, an exception is raised)
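For numeric values, the variance described above is the probability-weighted mean of squared differences from the mean, and the standard deviation is its square root. A minimal plain-Python sketch with exact fractions follows. The `variance` helper is hypothetical and is not Lea's implementation.

```python
import math
from fractions import Fraction


def variance(pmf):
    """Hypothetical helper: E[(X - mean)^2] for a pmf with numeric values."""
    mean = sum(v * p for v, p in pmf.items())
    # subtract the mean from each value, square the difference, weight it
    return sum(p * (v - mean) ** 2 for v, p in pmf.items())


die = {v: Fraction(1, 6) for v in range(1, 7)}
var = variance(die)   # exact: Fraction(35, 12)
std = math.sqrt(var)  # float, about 1.7078
```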
- exception LeaError
Bases:
Exception
Exception representing any violation of requirements of Lea methods
- P(*bool_leas)
- returns the probability that the given bool_leas are all True; the probability is expressed in the probability type used in bool_leas, possibly downcast for convenience (Fraction -> ExtFraction, Decimal -> ExtDecimal); the bool_leas arguments are coerced to Lea instances, so that Python’s Boolean values are handled consistently, in particular P(True) == 1 and P(False) == 0; requires: there is at least one argument; requires: all values in all bool_leas are Boolean (note that this is NOT the case with lea1.p(True))
- Pf(*bool_leas)
- returns the probability that the given bool_leas are all True; the probability is expressed as a float between 0.0 and 1.0; the bool_leas arguments are coerced to Lea instances, so that Python’s Boolean values are handled consistently, in particular P(True) == 1.0 and P(False) == 0.0; requires: there is at least one argument; requires: all values in all bool_leas are Boolean (note that this is NOT the case with lea1.p(True)); requires: the resulting probability is convertible to float type
- class Rlea(lea_of_leas)
Bases:
Lea
A Rlea instance is a “mixture pex”, as defined in the paper on the Statues algorithm (see http://arxiv.org/abs/1806.09997 ).
- class Slea(lea_c, f)
Bases:
Lea
A Slea instance is similar to a “table pex”, as defined in the paper on the Statues algorithm (see http://arxiv.org/abs/1806.09997 ), if the explicit CPT lookup table is replaced by a function.
- class Tlea(lea_c, lea_dict, default_lea=[])
Bases:
Lea
A Tlea instance is a “table pex”, as defined in the paper on the Statues algorithm (see http://arxiv.org/abs/1806.09997 ).
- add_evidence(*conditions, bindings=None)
- adds the given conditions as a new evidence context; these conditions are Boolean Lea instances or Booleans, which are coerced to Lea instances
- all_decreasing(*args)
- returns a new Flea2 instance that yields True iff all yielded values in args are in non-strict decreasing order or if args has 0 or 1 element; the evaluation is done from left to right, yielding False as soon as some yielded value is not in decreasing order (short-circuit evaluation)
- all_different(*args)
- returns a new Dlea instance that yields True iff all args are pairwise different or if args has 0 or 1 element
- all_equal(*args)
- returns a new Flea2 instance that yields True iff
  - all values in args are equal,
  - or args has 0 or 1 element;
  the evaluation is done from left to right, yielding False as soon as some yielded value differs (short-circuit evaluation)
- all_false(*args)
- returns a new Flea2 instance that yields True iff
  - all elements of args yield False,
  - or args is empty;
  the evaluation is done from left to right, yielding False as soon as some arg yields True (short-circuit evaluation)
- all_increasing(*args)
- returns a new Flea2 instance that yields True iff all yielded values in args are in non-strict increasing order or if args has 0 or 1 element; the evaluation is done from left to right, yielding False as soon as some yielded value is not in increasing order (short-circuit evaluation)
- all_pairwise_verify(args, op)
- returns a new Flea2 instance that yields True iff
  - op(arg1, arg2) yields True for all pairs of subsequent values (arg1, arg2) in iterable args,
  - or args has 0 or 1 element;
  the evaluation is done from left to right, yielding False as soon as some op(arg1, arg2) yields False (short-circuit evaluation); requires: op is a 2-ary Boolean function
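On ordinary values (not Lea instances), the rule above reduces to the following plain-Python analogue. It returns a bool rather than a Flea2 distribution, and it is only an illustration of the documented semantics. With operator.le, operator.lt, operator.ge and operator.gt it matches the documented behavior of the non-strict/strict increasing and decreasing checks.

```python
import operator


def all_pairwise_verify(args, op):
    """Plain-value analogue: True iff op(a, b) holds for every pair of
    subsequent elements; vacuously True for 0 or 1 element.
    """
    args = list(args)
    # all() stops at the first False, scanning pairs from left to right
    return all(op(a, b) for a, b in zip(args, args[1:]))


print(all_pairwise_verify([1, 2, 2, 5], operator.le))  # non-strict increasing
print(all_pairwise_verify([1, 2, 2, 5], operator.lt))  # strict increasing fails
```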
- all_strict_decreasing(*args)
- returns a new Flea2 instance that yields True iff all yielded values in args are in strict decreasing order or if args has 0 or 1 element; the evaluation is done from left to right, yielding False as soon as some yielded value is not in decreasing order (short-circuit evaluation)
- all_strict_increasing(*args)
- returns a new Flea2 instance that yields True iff all yielded values in args are in strict increasing order or if args has 0 or 1 element; the evaluation is done from left to right, yielding False as soon as some yielded value is not in increasing order (short-circuit evaluation)
- all_true(*args)
- returns a new Flea2 instance that yields True iff
  - all elements of args yield True,
  - or args is empty;
  the evaluation is done from left to right, yielding False as soon as some arg yields False (short-circuit evaluation)
- all_verify(arg1s, op, arg2)
- returns a new Flea2 instance that yields True iff
  - op(arg1, arg2) yields True for all arg1 in iterable arg1s,
  - or arg1s has 0 or 1 element;
  the evaluation is done from left to right, yielding False as soon as some op(arg1, arg2) yields False (short-circuit evaluation); requires: op is a 2-ary Boolean function
- any_false(*args)
- returns a new Flea2 instance that yields True iff
  - any element of args yields False,
  - or args is empty;
  the evaluation is done from left to right, yielding True as soon as some arg yields False (short-circuit evaluation)
- any_true(*args)
- returns a new Flea2 instance that yields True iff
  - any element of args yields True,
  - or args is empty;
  the evaluation is done from left to right, yielding True as soon as some arg yields True (short-circuit evaluation)
- any_verify(arg1s, op, arg2)
- returns a new Flea2 instance that yields True iff
  - op(arg1, arg2) yields True for some arg1 in iterable arg1s,
  - or arg1s has 0 or 1 element;
  the evaluation is done from left to right, yielding True as soon as some op(arg1, arg2) yields True (short-circuit evaluation); requires: op is a 2-ary Boolean function
- bernoulli(p, prob_type=None)
- returns an Alea instance representing a Bernoulli distribution giving 1 with probability p and 0 with probability 1-p; prob_type argument allows converting the given probability p:
  - -1: no conversion,
  - None (default): default conversion, as set by lea.set_prob_type,
  - other: see lea.get_prob_type
- binom(n, p, prob_type=None)
- returns an Olea instance representing a binomial distribution giving the number of successes among n independent experiments, each having probability p of success; prob_type argument allows converting the given probability p:
  - -1: no conversion,
  - None (default): default conversion, as set by lea.set_prob_type,
  - other: see lea.get_prob_type;
  note: the binom method generalizes the bernoulli method: binom(1, p) is the same as bernoulli(p)
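The binomial probabilities follow the standard formula P(k) = C(n, k) p^k (1-p)^(n-k). The plain-Python sketch below builds them with exact fractions and shows that n = 1 gives the Bernoulli case. `binom_pmf` is a hypothetical helper, not the Olea class.

```python
from fractions import Fraction
from math import comb


def binom_pmf(n, p):
    """Hypothetical helper: P(k successes) = C(n, k) p^k (1-p)^(n-k), k = 0..n."""
    return {k: comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}


three_flips = binom_pmf(3, Fraction(1, 2))  # 1/8, 3/8, 3/8, 1/8
one_trial = binom_pmf(1, Fraction(1, 3))    # Bernoulli: 0 -> 2/3, 1 -> 1/3
```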
- clear_evidence()
- removes all evidence contexts
- coerce(value, prob_type=-1)
- returns a Lea instance corresponding to the given value; if prob_type is -1, then the returned Alea instance has integer 1 as probability; otherwise, the returned Alea instance has probability 1 converted according to prob_type (see lea.set_prob_type)
- cpt(*clauses, prior_lea=None, auto_else=False, check=True)
- returns an instance of Blea representing the conditional probability table (e.g. a node in a Bayesian network) from the given clauses; the conditions from all clauses shall be mutually exclusive; if a clause contains None as condition, then it is considered as an ‘else’ condition; the method supports three optional named arguments: ‘prior_lea’, ‘auto_else’ and ‘check’; ‘prior_lea’ and ‘auto_else’ are mutually exclusive, and they require the absence of an ‘else’ clause (otherwise, an exception is raised);
  * if the prior_lea argument is specified, then the ‘else’ clause is calculated so that prior_lea is returned for the unconditional case;
  * if the auto_else argument is specified as True, then the ‘else’ clause is calculated so that a uniform probability distribution is returned for the condition cases not covered in the given clauses (principle of indifference); the values are retrieved from the results found in the given clauses;
  * if the check argument is specified as False, then NO checks are made on the given clauses (see below); this can significantly improve performance when the set of clauses or variables becomes large;
  by default (check argument absent or set to True), checks are made on clause conditions to ensure that they form a partition:
  1) the clause conditions shall be mutually disjoint, i.e. no subset of conditions shall be true together,
  2) if ‘else’ is missing and not calculated through ‘prior_lea’ nor ‘auto_else’, then the clause conditions shall cover all possible cases, i.e. ORing them shall be certainly true;
  an exception is raised if any of these conditions is not verified
- declare_namespace(obj_by_name)
- declares a {name: obj} dictionary, as a prerequisite for calling the Lea.as_joint(…, create_vars=True) method and the Lea.internal(with_names=True, …) method; this dictionary can be withdrawn by passing None as argument
- dist_l1(lea1, lea2)
- returns the L1 distance between the pmfs of the given (coerced) Lea instances; note: assuming that the Lea instances are normalized, the result is between 0 (iff lea1 and lea2 have the same pmf) and 2 (iff lea1 and lea2 have disjoint supports)
- dist_l2(lea1, lea2)
- returns the L2 distance between the pmfs of the given (coerced) Lea instances; note: assuming that the Lea instances are normalized, the result is between 0 (iff lea1 and lea2 have the same pmf) and sqrt(2) (iff lea1 and lea2 have disjoint singleton supports)
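Both distances compare the pmfs value by value over the union of the supports, treating a missing value as probability 0. The plain-dict sketch below (hypothetical `l1_distance` / `l2_distance`, not the Lea functions) reproduces the bounds stated above.

```python
import math


def l1_distance(pmf1, pmf2):
    """Hypothetical helper: sum of |P1(v) - P2(v)| over the union of supports."""
    support = set(pmf1) | set(pmf2)
    return sum(abs(pmf1.get(v, 0) - pmf2.get(v, 0)) for v in support)


def l2_distance(pmf1, pmf2):
    """Hypothetical helper: Euclidean distance between the two pmfs."""
    support = set(pmf1) | set(pmf2)
    return math.sqrt(sum((pmf1.get(v, 0) - pmf2.get(v, 0)) ** 2 for v in support))


only_a = {"a": 1.0}  # disjoint singleton supports: maximal distances
only_b = {"b": 1.0}
fair = {"H": 0.5, "T": 0.5}
```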
- event(p, prob_type=None)
- returns an Alea instance representing a Boolean probability distribution giving True with probability p and False with probability 1-p; prob_type argument allows converting the given probability p:
  * -1: no conversion,
  * None (default): default conversion, as set by lea.set_prob_type,
  * other: see lea.get_prob_type
- evidence
alias of
EvidenceCtx
- func_wrapper(f)
- returns a wrapper function on the given function f, mimicking f with Lea instances as arguments; the returned wrapper function has the same number of arguments as f and expects for argument #i
  - either an object of the type expected by f for argument #i,
  - or a Lea instance with values of that type;
  the returned wrapper function, when called, returns a Lea instance having values of the type returned by f; note: Lea.func_wrapper can be used as a function decorator
- get_active_conditions()
- returns a tuple with active EvidenceCtx instances
- get_prob_type(prob_type)
- returns the class or function associated with the given code; this class or function is applied to convert each probability given in an Alea constructor method; if prob_type is -1, then None is returned; if prob_type is a callable object, then it is returned as-is; if prob_type is None, then the current prob_type configured by lea.set_prob_type is returned; otherwise, the given prob_type is a code interpreted as follows:
  - ‘f’ -> float (instance of Python’s float),
  - ‘d’ -> decimal (instance of Python’s decimal.Decimal),
  - ‘r’ -> rational (instance of Python’s fractions.Fraction),
  - ‘s’ -> symbolic (instance of a sympy Symbol), see the Alea.create_prob_symbol method,
  - ‘x’ -> any: if the probability is given as a string, then its type is determined from it (decimal, rational or symbol) and it is converted into that type; otherwise, the object is taken as-is, see the Alea.create_prob_obj method;
  requires: prob_type is -1 or None or a callable or one of the codes given above
- has_evidence()
- returns True iff there is at least one active evidence context
- if_(cond_lea, then_lea, else_lea=[], prior_lea=[])
- returns an instance of Tlea representing the conditional probability table (CPT) giving:
  - then_lea if cond_lea is true,
  - else_lea otherwise;
  if else_lea is defined, this method is equivalent to cond_lea.switch({True: then_lea, False: else_lea}); if else_lea is undefined, then prior_lea provides the Lea instance representing the prior (unconditional) probabilities; this is then used to calculate the missing else_lea; requires: either the else_lea or the prior_lea argument shall be provided; requires: if prior_lea is provided, a solution shall exist for else_lea
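When prior_lea is given, the missing else distribution is the one that satisfies the law of total probability: P(X=x) = P(c)·P(X=x | c) + P(not c)·P(X=x | not c). Solving for the else case gives P(X=x | not c) = (P(X=x) - P(c)·P(X=x | c)) / P(not c). "A solution shall exist" then means that none of these values is negative. The sketch below works on plain dicts with a numeric P(c). It is a hedged illustration of that derivation, not Lea's implementation, and `solve_else` is hypothetical.

```python
from fractions import Fraction


def solve_else(p_cond, then_pmf, prior_pmf):
    """Hypothetical helper: else pmf such that
    p_cond * then + (1 - p_cond) * else == prior, value by value.
    """
    p_not = 1 - p_cond
    if p_not == 0:
        # this sketch simply rejects a condition that is certainly true
        raise ValueError("the else case never occurs")
    else_pmf = {}
    for v in set(prior_pmf) | set(then_pmf):
        q = (prior_pmf.get(v, 0) - p_cond * then_pmf.get(v, 0)) / p_not
        if q < 0:
            raise ValueError("no solution exists for the else distribution")
        else_pmf[v] = q
    return else_pmf


then_pmf = {True: Fraction(9, 10), False: Fraction(1, 10)}
prior_pmf = {True: Fraction(1, 2), False: Fraction(1, 2)}
else_pmf = solve_else(Fraction(1, 2), then_pmf, prior_pmf)  # True: 1/10, False: 9/10
```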
- interval(from_val, to_val, prob_type=None)
- returns an Alea instance representing a uniform probability distribution over all the integers in the interval [from_val, to_val] (inclusive!); the given prob_type, if not None, allows using a probability type different from the default one (float or any one set by lea.set_prob_type)
- joint(*args)
- note: if the n events are independent, then P(<v1, … , vn>) = P1(v1) x … x Pn(vn)
- joint_entropy(*args)
- returns the joint entropy of the arguments, expressed in bits; the returned type is a float or a sympy expression (see Lea.entropy)
- max_of(*args, fast=False)
- returns a Lea instance giving the probabilities of having the maximum value of each combination of the given args; if the optional argument fast is False (default), then it returns a Flea2 instance or, if there is only one argument in args, this argument unchanged; the returned distribution keeps dependencies with args, but the calculation could be prohibitively slow (exponential complexity); if the optional argument fast is True, then it returns an Alea instance; the method uses an efficient algorithm (linear complexity), due to Nicky van Foreest; however, unlike most Lea methods, the returned distribution loses any dependency with the given args; this could be important if some args appear in the same expression as Lea.max_of(…) but outside it, e.g. conditional probability expressions; requires at least one argument in args
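Why the fast variant must drop dependencies: it treats the arguments as independent, and for independent variables P(max <= x) is the product of the individual P(Xi <= x). The sketch below uses only that CDF-product idea. It is not the exact algorithm attributed to Nicky van Foreest, and, as written, it is not optimized to linear complexity. `max_of_independent` is a hypothetical helper on dict-based pmfs.

```python
from fractions import Fraction


def max_of_independent(*pmfs):
    """Hypothetical helper: pmf of max(X1, ..., Xn) for independent Xi.

    Uses P(max <= x) = prod_i P(Xi <= x), then takes differences of this CDF.
    """
    support = sorted({v for pmf in pmfs for v in pmf})
    result = {}
    prev_cdf = 0
    for x in support:
        cdf = 1
        for pmf in pmfs:
            cdf *= sum(p for v, p in pmf.items() if v <= x)
        if cdf != prev_cdf:  # skip values whose probability would be 0
            result[x] = cdf - prev_cdf
        prev_cdf = cdf
    return result


die = {v: Fraction(1, 6) for v in range(1, 7)}
max_two_dice = max_of_independent(die, die)  # P(6) = 1 - (5/6)**2 = 11/36
```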
- min_of(*args, fast=False)
- returns a Lea instance giving the probabilities of having the minimum value of each combination of the given args; if the optional argument fast is False (default), then it returns a Flea2 instance or, if there is only one argument in args, this argument unchanged; the returned distribution keeps dependencies with args, but the calculation could be prohibitively slow (exponential complexity); if the optional argument fast is True, then it returns an Alea instance; the method uses an efficient algorithm (linear complexity), due to Nicky van Foreest; however, unlike most Lea methods, the returned distribution loses any dependency with the given args; this could be important if some args appear in the same expression as Lea.min_of(…) but outside it, e.g. conditional probability expressions; requires at least one argument in args
- mutual_information(lea1, lea2)
- returns the mutual information between the given arguments, expressed in bits; the returned type is a float or a sympy expression (see Lea.entropy)
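For reference, mutual information in bits can be computed from a joint pmf as the sum of P(x, y)·log2(P(x, y) / (P(x)·P(y))). The plain-Python sketch below (hypothetical `mutual_information`, dict-based, float probabilities) gives 0 for independent variables and 1 bit for a fair bit copied to a second variable.

```python
import math
from collections import defaultdict


def mutual_information(joint):
    """Hypothetical helper: I(X; Y) in bits from a joint pmf {(x, y): p}."""
    px, py = defaultdict(float), defaultdict(float)
    for (x, y), p in joint.items():
        px[x] += p  # marginal of X
        py[y] += p  # marginal of Y
    return sum(p * math.log2(p / (px[x] * py[y]))
               for (x, y), p in joint.items() if p > 0)


independent = {(a, b): 0.25 for a in (0, 1) for b in (0, 1)}
copied_bit = {(0, 0): 0.5, (1, 1): 0.5}
```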
- pmf(arg, prob_type=None, ordered=False, sorting=None, normalization=True, check=None)
- returns an Alea instance representing a probability distribution for a probability mass function specified by the given arg, which is:
  - either a dictionary { v1: p1, … , vn: pn },
  - or an iterable of pairs (v1, p1), … , (vn, pn);
  pi is the probability of occurrence of vi or a number proportional to it (see the normalization argument below); in the iterable case, if the same value v occurs multiple times, then the associated p are summed together;
  * prob_type argument allows converting the given probabilities:
    - -1: no conversion,
    - None (default): default conversion, as set by lea.set_prob_type,
    - other: see lea.get_prob_type;
  the method admits three other optional Boolean arguments:
  * ordered (default: False): if ordered is True, then the values for displaying the distribution or getting the values will follow the given order (requires: arg is an iterable or a collections.OrderedDict);
  * sorting (default: not ordered): if True, then the values for displaying the distribution or getting the values will be sorted if possible (i.e. no exception on sort); otherwise, the order of values is unspecified unless ordered=True;
  * normalization (default: True): if True, then each given pi is divided by the sum of all pi before being stored (in such a case, the pi need not be true probabilities; they could be simple counters, for example);
  requires: all the given values vi are hashable; requires: arg is not empty; requires: ordered and sorting are not set to True together
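The accumulation and normalization rules above can be sketched in plain Python: weights of a repeated value are summed, then each weight is divided by the total, so simple counters work as input. `build_pmf` is a hypothetical helper. It ignores prob_type, ordering and sorting, and it is not Lea's implementation.

```python
from collections import defaultdict


def build_pmf(arg, normalization=True):
    """Hypothetical helper: arg is a dict {v: p} or an iterable of (v, p) pairs."""
    pairs = arg.items() if isinstance(arg, dict) else arg
    weights = defaultdict(int)
    for v, p in pairs:
        weights[v] += p  # a value occurring several times has its weights summed
    if not weights:
        raise ValueError("arg shall not be empty")
    if normalization:
        total = sum(weights.values())
        return {v: p / total for v, p in weights.items()}
    return dict(weights)


counts = [("a", 2), ("b", 1), ("a", 1)]  # counters, not probabilities
print(build_pmf(counts))
```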
- poisson(mean, precision=1e-20)
- returns a Plea instance representing a Poisson probability distribution having the given mean; the distribution is approximated by the finite set of values that have probability > precision (i.e. low/high values with too small probabilities are dropped); the probabilities are stored as float, whatever the current probability type configured
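The truncation rule (keep only values with probability > precision) can be sketched as follows. Probabilities are computed in log space to avoid overflowing mean**k and k!, and the scan stops once k is past the mean and the probability has fallen to precision, since Poisson probabilities only decrease beyond the mode. This is a hedged illustration assuming mean > 0. It is not Lea's code, and `poisson_pmf` is hypothetical.

```python
import math


def poisson_pmf(mean, precision=1e-20):
    """Hypothetical helper: {k: P(k)} keeping only P(k) > precision (mean > 0)."""
    result = {}
    k = 0
    while True:
        # log P(k) = k*log(mean) - mean - log(k!), computed with lgamma
        p = math.exp(k * math.log(mean) - mean - math.lgamma(k + 1))
        if p > precision:
            result[k] = p
        elif k > mean:
            break  # past the mode, probabilities only decrease from here
        k += 1
    return result


small = poisson_pmf(3.0)
large = poisson_pmf(1000.0)  # low values such as k = 0 are dropped
```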
- pop_evidence()
- removes the last added evidence context and returns it; requires: there is at least one evidence context
- read_bif_file(filename, create_vars=False)
- reads the file with the given filename and parses it as a BIF file (Bayesian Interchange Format); returns a dictionary giving the created Lea instances by their names; if create_vars is True (default: False), then these entries are created/updated in the dictionary passed to the last call to lea.declare_namespace, typically by the call “lea.declare_namespace(globals())”; requires: filename refers to a readable valid BIF file; note: the BIF parsing is simplistic: it can probably parse all BIF files found in http://www.bnlearn.com/bnrepository but it is not guaranteed to be able to parse any valid BIF file; in particular, BIF comments are not handled; if the parsing fails, then an exception is raised
- read_csv_file(csv_file, col_names=None, dialect='excel', prob_type=None, sorting=True, ordered=False, normalization=True, create_vars=False, **fmtparams)¶
- returns an Alea instance representing the joint probability distribution of the data read in the given CSV file; if csv_file is a string, then it is interpreted as a filename; otherwise, csv_file is interpreted as a file object ready to be read; the dialect and fmtparams arguments follow the same semantics as those of Python’s csv.reader function, which supports different CSV formats; see doc in http://docs.python.org/3/library/csv.html ;
* if col_names is None, then the fields found in the first row read from the CSV file provide information on the attributes: each field is made up of a name, which shall be a valid identifier, followed by an optional 3-character type code among
  - {b} -> Boolean
  - {i} -> integer
  - {f} -> float
  - {s} -> string
  - {#} -> count
  if the type code is missing for a given field, then the string type is assumed for this field; for example, using the comma delimiter (default), the first row in the CSV file could be: name,age{i},height{f},married{b};
* if col_names is not None, then col_names shall be a sequence of strings giving attribute information as described above, for example: (‘name’, ‘age{i}’, ‘height{f}’, ‘married{b}’); it is then assumed that there is NO header row in the CSV file;
the type code defines the conversion to be applied to the fields read on the data lines; if the read value is empty, then it is converted to Python’s None, except if the type is string, in which case the value is the empty string; if the read value is not empty and cannot be parsed for the expected type, then an exception is raised; for the Boolean type, the following values (case insensitive):
  - ‘1’, ‘t’, ‘true’, ‘y’, ‘yes’ are interpreted as Python’s True,
  - ‘0’, ‘f’, ‘false’, ‘n’, ‘no’ are interpreted as Python’s False;
the {#} code identifies a field that provides a count for the row, representing the probability of the row or its frequency as a positive integer; such a field is NOT included as an attribute of the joint distribution; it is useful for defining a non-uniform probability distribution, as an alternative to repeating the same row multiple times; if create_vars is True (default: False), then variables named after the column names are created in the dictionary passed to the last call to lea.declare_namespace, typically by the call “lea.declare_namespace(globals())”
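The header type codes and the {#} count field can be illustrated with the standard csv module. This is a minimal sketch, not Lea's parser: `to_bool`, `parse_field` and `read_rows_sketch` are hypothetical helpers, and the result is a Counter of row tuples (normalizing these counts, as in the pmf rules, would give the joint distribution):

```python
import csv
import io
from collections import Counter

def to_bool(raw):
    """Boolean conversion following the case-insensitive rules above."""
    low = raw.lower()
    if low in ("1", "t", "true", "y", "yes"):
        return True
    if low in ("0", "f", "false", "n", "no"):
        return False
    raise ValueError(f"cannot parse {raw!r} as Boolean")

CONVERTERS = {"b": to_bool, "i": int, "f": float, "s": str}

def parse_field(field):
    """'age{i}' -> ('age', 'i'); a field without type code is a string."""
    if len(field) > 3 and field[-3] == "{" and field[-1] == "}":
        return field[:-3], field[-2]
    return field, "s"

def read_rows_sketch(text):
    """Hypothetical helper: Counter mapping each row tuple to its total count."""
    reader = csv.reader(io.StringIO(text))
    header = [parse_field(f) for f in next(reader)]
    counts = Counter()
    for row in reader:
        values, weight = [], 1
        for (_name, code), raw in zip(header, row):
            if code == "#":
                weight = int(raw)  # count field: weights the row, not an attribute
            elif raw == "":
                values.append("" if code == "s" else None)
            else:
                values.append(CONVERTERS[code](raw))
        counts[tuple(values)] += weight
    return counts

data = "name,age{i},married{b},n{#}\nAlice,30,yes,2\nBob,,No,1\nAlice,30,Y,1\n"
print(read_rows_sketch(data))
# Counter({('Alice', 30, True): 3, ('Bob', None, False): 1})
```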
- read_pandas_df(dataframe, index_col_name=None, create_vars=False, **kwargs)¶
- returns an Alea instance representing the joint probability distribution from the given pandas dataframe; the attribute names of the distribution are those of the columns of the given dataframe; the first field in each item of the dataframe is assumed to be the index; its treatment depends on the given index_col_name: if index_col_name is None, then this index field is ignored; otherwise, it is put in the joint distribution with index_col_name as attribute name; if create_vars is True (default: False), then variables named after the dataframe’s attribute names are created in the dictionary passed to the last call to lea.declare_namespace, typically by the call “lea.declare_namespace(globals())”
- reduce_all(op, args, absorber=None, special=None)¶
- returns a new Flea2 instance that joins the given iterable args with the given function op, from left to right; requires: op is a 2-ary function accepting all elements of all args as arguments; if args is empty, then
  - if the given special is None, an exception is raised,
  - otherwise, the given special is returned;
if absorber is not None, then it is considered as a “left-absorber” value (i.e. op(absorber, x) = absorber); this activates a more efficient algorithm that prunes the tree search as soon as the absorber is met
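The left-to-right join and the absorber pruning can be sketched on explicit distributions. This is an illustration under stated assumptions, not Lea's algorithm: `reduce_all_sketch` is a hypothetical helper, each element of `dists` is an independent dict {value: probability} summing to 1, and the result is a plain dict rather than a Flea2 instance:

```python
from fractions import Fraction

def reduce_all_sketch(op, dists, absorber=None, special=None):
    """Hypothetical helper: distribution of op(...op(op(x1, x2), x3)..., xn) for
    independent dicts {value: probability}, folded from left to right."""
    if not dists:
        if special is None:
            raise ValueError("args is empty and no special value given")
        return special
    result = {}

    def explore(i, acc, prob):
        # stop at the end, or as soon as acc is the left-absorber:
        # op(absorber, x) == absorber, so later dists cannot change acc, and
        # the pruned branches sum to prob because each dist sums to 1
        if i == len(dists) or (absorber is not None and acc == absorber):
            result[acc] = result.get(acc, 0) + prob
            return
        for v, p in dists[i].items():
            explore(i + 1, op(acc, v), prob * p)

    for v, p in dists[0].items():
        explore(1, v, p)
    return result

coin = {False: Fraction(1, 2), True: Fraction(1, 2)}
print(reduce_all_sketch(lambda a, b: a or b, [coin] * 3, absorber=True))
# {False: Fraction(1, 8), True: Fraction(7, 8)}
```

With `or` and absorber True, every branch whose partial result is already True is cut short, yet the final distribution is the same as without pruning.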
- set_display_options(kind=-1, nb_decimals=-1, chart_size=-1, tabular=-1, one_line=-1)¶
- sets default option values used for displaying probability distributions; see the Alea.as_string method for a definition of each argument; if some arguments are -1 or are not specified in the call, then these remain unchanged; if some arguments are None, then the corresponding options are reset to the following default values:
  * kind = None
  * nb_decimals = 6
  * chart_size = 100
  * tabular = True
  * one_line = False
- set_prob_type(prob_type)¶
- changes the representation of probability values for newly created Lea instances, according to the given prob_type; if prob_type is a callable object, then it is used as such; otherwise, the given prob_type is a code interpreted as follows:
  - ‘f’ -> float (instance of Python’s float) - default,
  - ‘d’ -> decimal (instance of Python’s decimal.Decimal),
  - ‘r’ -> rational (instance of Python’s fractions.Fraction),
  - ‘s’ -> symbolic (instance of a sympy Symbol) - see Alea.create_prob_symbol method,
  - ‘x’ -> any: if the probability is given as a string, then its type is determined from it (decimal, rational or symbol) and it is converted accordingly; otherwise, the object is taken as-is - see Alea.create_prob_obj method;
requires: prob_type is a callable or one of the codes given above
- vals(*values, prob_type=None, ordered=False, sorting=None, normalization=True, check=True)¶
- returns an Alea instance representing a distribution for the given values, so that each value occurrence is taken as equiprobable; if each value occurs exactly once, then the probability distribution is uniform, i.e. the probability of each value is equal to 1 / #values; otherwise, the probability of each value is equal to its relative frequency in the sequence; the optional arguments are: prob_type, ordered, sorting, normalization, check: see lea.pmf function; requires: values argument has at least one element
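The relative-frequency rule can be checked with a few lines of plain Python; `vals_sketch` is a hypothetical helper returning a dict of exact fractions, not Lea's implementation:

```python
from collections import Counter
from fractions import Fraction

def vals_sketch(*values):
    """Hypothetical helper: each occurrence is equiprobable, so the probability
    of a value is its number of occurrences divided by len(values)."""
    if not values:
        raise ValueError("at least one value is required")
    n = len(values)
    return {v: Fraction(c, n) for v, c in Counter(values).items()}

print(vals_sketch(*"SHDC"))   # uniform: each suit letter gets Fraction(1, 4)
print(vals_sketch(1, 2, 2, 3))
# {1: Fraction(1, 4), 2: Fraction(1, 2), 3: Fraction(1, 4)}
```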
lea.leaf submodule¶
Helper functions and objects for getting dice and card decks as Alea instances
- dice(nb_dice, nb_faces=6, prob_type=None)¶
- returns an Alea instance representing the total value obtained by throwing nb_dice independent fair dice with faces marked from 1 to nb_faces
- dice_seq(nb_dice, nb_faces=6, sorted=True, prob_type=None)¶
- returns an Alea instance representing the individual results obtained by throwing nb_dice independent fair dice with faces marked from 1 to nb_faces (each value is a tuple with nb_dice elements);
* if sorted is True (default), then the combinations of dice which are the same apart from order are considered equal; the particular value used is the one ordered from smallest to largest value;
* if sorted is False, then all nb_faces**nb_dice combinations are produced, with equal probabilities
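Both dice functions can be mimicked by brute-force enumeration, which also shows why the unsorted case has nb_faces**nb_dice outcomes. `dice_sketch` and `dice_seq_sketch` are hypothetical helpers; enumeration grows exponentially with nb_dice, and nothing here implies Lea computes the distributions this way:

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def dice_sketch(nb_dice, nb_faces=6):
    """Distribution of the total of nb_dice fair dice, by brute-force enumeration."""
    outcomes = list(product(range(1, nb_faces + 1), repeat=nb_dice))
    totals = Counter(sum(o) for o in outcomes)
    return {t: Fraction(c, len(outcomes)) for t, c in sorted(totals.items())}

def dice_seq_sketch(nb_dice, nb_faces=6, sorted_=True):
    """Distribution of the tuple of individual results; sorted_ plays the role
    of the sorted argument (renamed to keep the builtin sorted usable)."""
    outcomes = list(product(range(1, nb_faces + 1), repeat=nb_dice))
    if sorted_:
        # tuples equal apart from order are merged under their ascending form
        outcomes = [tuple(sorted(o)) for o in outcomes]
    counts = Counter(outcomes)
    return {o: Fraction(c, len(outcomes)) for o, c in counts.items()}

print(dice_sketch(2)[7])                                    # 1/6
print(len(dice_seq_sketch(3, nb_faces=2, sorted_=False)))  # 8, i.e. nb_faces**nb_dice
print(len(dice_seq_sketch(2)))                              # 21 distinct sorted pairs
```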
- die(nb_faces=6, prob_type=None)¶
- returns an Alea instance representing the value obtained by throwing a fair die with faces marked from 1 to nb_faces
lea.leaf also defines the following objects, using fractions for probabilities:
- D6 represents the value obtained by throwing a fair die with 6 faces; it is defined by lea.leaf.die(6, prob_type='r')
- flip represents a True/False random variable with 50-50 probabilities; it is defined by lea.event('1/2', prob_type='r')
- card_suite is a random one-character symbol representing a card suit among Spades, Hearts, Diamonds and Clubs; it is defined by lea.vals(*"SHDC", prob_type='r')
- card_rank is a random one-character symbol representing a card rank among Ace, 2, 3, 4, 5, 6, 7, 8, 9, 10, Jack, Queen and King; it is defined by lea.vals(*"A23456789TJQK", prob_type='r')
- card is a random two-character symbol representing a card, with a rank and a suit, chosen from a standard deck of 52 cards; it is defined by card_rank + card_suite
Note: by these definitions, card is interdependent with card_suite and card_rank, so these can be used to calculate conditional probabilities.
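The dependency between card and its parts can be seen by enumerating the 52 equiprobable (rank, suite) pairs in plain Python. This is a sketch of the underlying probabilities only, not Lea's API; `joint`, `prob` and `is_heart` are hypothetical names:

```python
from fractions import Fraction
from itertools import product

# plain-Python mirror of the joint distribution: 13 ranks x 4 suits, all equiprobable
RANKS, SUITES = "A23456789TJQK", "SHDC"
joint = {(r, s): Fraction(1, 52) for r, s in product(RANKS, SUITES)}

def prob(event, given=lambda r, s: True):
    """P(event | given), both being predicates on (rank, suite)."""
    den = sum(p for (r, s), p in joint.items() if given(r, s))
    num = sum(p for (r, s), p in joint.items() if given(r, s) and event(r, s))
    return num / den

def is_heart(r, s):
    return s == "H"

print(prob(is_heart))                                    # 1/4
print(prob(is_heart, given=lambda r, s: r + s == "AH"))  # 1: card fixes the suite
print(prob(is_heart, given=lambda r, s: r in "JQK"))     # 1/4: rank and suite independent
```

Conditioning on card changes the suite probability, while conditioning on the rank alone does not, since card_rank and card_suite are independent.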
lea.markov submodule¶
Primitives for defining Markov chains
- class Chain(next_state_lea_per_state)¶
Bases: object
A markov.Chain instance represents a Markov chain, with a given set of states and given probabilities of transition from state to state. Two instance attributes are public:
- states: tuple containing the states of the Markov chain, without any probability,
- state: markov.StateAlea instance representing an equiprobable distribution of the states of the Markov chain; it can be used to express a condition to pass to the markov.Chain.state_given or markov.Chain.next_state_given methods
- absorbing_mc_info(as_array=False)¶
- returns a tuple (is_absorbing, transient_states, absorbing_states, q_matrix, r_matrix, n_matrix) where:
  * is_absorbing is a Boolean telling whether the markov.Chain is absorbing,
  * transient_states is a tuple containing the t transient states,
  * absorbing_states is a tuple containing the r absorbing states,
  * q_matrix is the t x t probability matrix from transient to transient states,
  * r_matrix is the t x r probability matrix from transient to absorbing states,
  * n_matrix is the t x t fundamental matrix, defined as inv(I - q_matrix), calculated only if as_array is True (None otherwise);
notes:
  - t > 0 iff the Markov chain is absorbing,
  - the returned states in transient_states and absorbing_states are ordered as defined in the ‘states’ attribute of the MC; the q_matrix and r_matrix matrices follow the same order,
  - if as_array is False (default), then the matrices are returned as a tuple of tuples; otherwise, they are returned as numpy arrays (an exception is raised if numpy is not installed)
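To make the fundamental matrix concrete, the sketch below computes inv(I - Q) with exact fractions on a small made-up chain (gambler's ruin on states 0 to 3, with 0 and 3 absorbing). The hand-written Gauss-Jordan `inverse` keeps the example free of numpy; it is illustrative, not what Lea uses. The row sums of N (expected steps before absorption) and the product N R (absorption probabilities) are standard uses of the fundamental matrix, not values returned by absorbing_mc_info:

```python
from fractions import Fraction

def inverse(m):
    """Gauss-Jordan inversion of a square matrix with exact Fraction arithmetic;
    raises StopIteration if the matrix is singular."""
    n = len(m)
    # augment m with the identity matrix: [m | I]
    a = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(m)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if a[r][col] != 0)
        a[col], a[pivot] = a[pivot], a[col]
        pv = a[col][col]
        a[col] = [x / pv for x in a[col]]
        for r in range(n):
            if r != col and a[r][col] != 0:
                f = a[r][col]
                a[r] = [x - f * y for x, y in zip(a[r], a[col])]
    return [row[n:] for row in a]

half = Fraction(1, 2)
# toy chain (gambler's ruin on states 0..3): 0 and 3 absorbing, 1 and 2 transient
q_matrix = [[0, half], [half, 0]]   # transient (1, 2) -> transient (1, 2)
r_matrix = [[half, 0], [0, half]]   # transient (1, 2) -> absorbing (0, 3)
t = len(q_matrix)
i_minus_q = [[int(i == j) - q_matrix[i][j] for j in range(t)] for i in range(t)]
n_matrix = inverse(i_minus_q)       # fundamental matrix inv(I - Q)
expected_steps = [sum(row) for row in n_matrix]        # steps before absorption
absorption = [[sum(n_matrix[i][k] * r_matrix[k][j] for k in range(t))
               for j in range(len(r_matrix[0]))] for i in range(t)]
print(expected_steps)  # [Fraction(2, 1), Fraction(2, 1)]
```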
- classmethod from_matrix(states, *trans_probs_per_state)¶
- class method returning a new markov.Chain (or subclass) instance from given arguments;states is a sequence of objects representing states (typically strings);trans_probs_per_state arguments contain the transition probabilities;there is one such argument per state, it is a tuple (state, trans_probs) where trans_probs is the sequence of probabilities of transition from state to each declared state, in the order of their declarations
- classmethod from_seq(state_seq)¶
- class method returning a new markov.Chain (or subclass) instance from given sequence of state objects; the probabilities of state transitions are set according to transition frequencies in the given sequence;if last state of state_seq does not occur elsewhere in state_seq, then this state is defined arbitrarily as an absorbing state (i.e. its next state is itself with probability 1)
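The frequency rule, including the arbitrary absorbing last state, can be sketched in plain Python. `transitions_from_seq` is a hypothetical helper returning nested dicts of exact fractions instead of a markov.Chain:

```python
from collections import Counter, defaultdict
from fractions import Fraction

def transitions_from_seq(state_seq):
    """Hypothetical helper: {state: {next_state: probability}} estimated from
    transition frequencies in state_seq."""
    seq = list(state_seq)
    counts = defaultdict(Counter)
    for cur, nxt in zip(seq, seq[1:]):
        counts[cur][nxt] += 1
    last = seq[-1]
    if last not in seq[:-1]:
        # last state never seen before: made absorbing (next state is itself)
        counts[last][last] = 1
    return {s: {t: Fraction(c, sum(cs.values())) for t, c in cs.items()}
            for s, cs in counts.items()}

# A -> B always; B -> A or C with 1/2 each; C (only at the end) becomes absorbing
print(transitions_from_seq("ABABC"))
```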
- get_state(state_lea)¶
- returns a markov.StateAlea instance corresponding to the probability distribution given in state_lea;if state_lea is not a Lea instance, then it is assumed to be a certain state
- get_states()¶
- returns a tuple containing one markov.StateAlea instance per state declared in the markov.Chain, in the order of their declaration; each instance represents a certain, unique, state
- matrix(from_states=None, to_states=None, as_array=False)¶
- returns the probability matrix of transition from given iterable from_states to given iterable to_states;if from_states or to_states is None (default), then it is replaced by all states of markov.Chain (so, without arguments, the full transition matrix is returned);if as_array is False (default), then the matrix is returned as a tuple of tuples; otherwise, it is returned as a numpy array (an exception is raised if numpy is not installed)
- next_state(from_state=None, n=1, keeps_dependency=True)¶
- returns the markov.StateLea instance obtained after n transitions from an initial state defined by the given from_state, which is either a given certain state (coerced to a Lea instance) or a Lea instance giving the probability distribution of states; if from_state is None, then the initial state is the uniform probability distribution of the declared states;
* if keeps_dependency is True (default) and from_state is None or a Lea instance, then a markov.StateTlea instance is returned, keeping the dependency with from_state;
* otherwise, a markov.StateAlea instance is returned, losing the dependency with from_state;
the returned probability distribution is the same in both cases, but setting keeps_dependency=False can be useful to get a result when a stack overflow occurs for a large n (the markov.StateTlea instance actually stores a DAG with depth close to n); requires: n >= 1
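The probability distribution returned after n transitions amounts to n vector-matrix products, which the sketch below spells out on a made-up two-state chain. `next_state_dist` is a hypothetical helper; it models only the resulting distribution, not the dependency tracking that keeps_dependency controls, and being iterative it builds no DAG:

```python
from fractions import Fraction

def next_state_dist(trans, dist, n=1):
    """Distribution after n transitions: repeated vector-matrix product.
    trans maps state -> {next_state: probability}; dist maps state -> probability."""
    if n < 1:
        raise ValueError("requires n >= 1")
    for _ in range(n):
        new = {}
        for s, p in dist.items():
            for t, q in trans[s].items():
                new[t] = new.get(t, 0) + p * q
        dist = new
    return dist

weather = {
    "sun": {"sun": Fraction(9, 10), "rain": Fraction(1, 10)},
    "rain": {"sun": Fraction(1, 2), "rain": Fraction(1, 2)},
}
uniform = {s: Fraction(1, len(weather)) for s in weather}  # like from_state=None
print(next_state_dist(weather, uniform, n=2))
# {'sun': Fraction(39, 50), 'rain': Fraction(11, 50)}
```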
- next_state_given(cond_lea, n=1, keeps_dependency=True)¶
- returns the markov.StateLea instance obtained after n transitions from initial state defined by the state distribution verifying the given cond_lea, a Lea instance expressing a condition using the ‘state’ instance attribute;the returned instance is either a markov.StateTlea or a markov.StateAlea depending on given keeps_dependency argument -> see markov.Chain.next_state method for the meaning of this argument;requires: n >= 1
- reachable_states(from_state, _cur_reachable_states=None)¶
- returns a tuple containing the states that can be reached starting from the given from_state; the returned states are ordered as defined in the ‘states’ attribute of the MC
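Reachability is a graph search over the transitions with non-zero probability. A minimal breadth-first sketch follows; `reachable_states_sketch` and its arguments are hypothetical, and it counts a state as reachable only after one or more transitions (so from_state itself is included only if some path leads back to it), which may differ from what Lea's method does:

```python
from collections import deque

def reachable_states_sketch(trans, states, from_state):
    """Hypothetical helper: states reachable from from_state in one or more
    transitions of non-zero probability, ordered as in states."""
    seen = set()
    queue = deque([from_state])
    while queue:
        s = queue.popleft()
        for t, p in trans[s].items():
            if p > 0 and t not in seen:
                seen.add(t)
                queue.append(t)
    return tuple(s for s in states if s in seen)

states = ("a", "b", "c")
trans = {"a": {"b": 1.0}, "b": {"b": 1.0}, "c": {"a": 0.5, "b": 0.5}}
print(reachable_states_sketch(trans, states, "c"))  # ('a', 'b')
```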
- state_given(*cond_leas)¶
- returns the markov.StateIlea instance verifying the given cond_leas, these being Lea instances expressing conditions using the ‘state’ instance attribute
- class StateAlea(state_lea, chain)¶
- A markov.StateAlea instance represents a fixed probability distribution of states, for a given markov.Chain (see superclasses Alea and markov.StateLea)
- class StateIlea(lea1, cond_leas, chain)¶
- A markov.StateIlea instance represents a probability distribution of states, defined by another distribution under a given condition, for a given markov.Chain (see superclasses Ilea and markov.StateLea)
- class StateLea(chain)¶
Bases: object
markov.StateLea is an abstract mixin class storing a markov.Chain instance.
- gen_random_seq()¶
- generates an infinite sequence of random state objects, starting from self and obeying the transition probabilities defined in the chain
- next_state(n=1, keeps_dependency=True)¶
- returns the markov.StateLea instance obtained after n transitions from initial state self; the returned instance is either a markov.StateTlea or a markov.StateAlea depending on the given keeps_dependency argument -> see markov.Chain.next_state method for the meaning of this argument; requires: n >= 1
- random_seq(n)¶
- returns a tuple containing n state objects representing a random sequence starting from self and obeying the transition probabilities defined in the chain; note: this uses the pseudo-random number generator of Python’s random module; to get deterministic results from run to run, random.seed should be called before calling the present method
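The seeding advice can be illustrated with the standard random module. In this sketch, `random_seq_sketch` is a hypothetical helper that assumes a certain start state and includes it as the first element of the sequence; how Lea handles the first element is not stated above:

```python
import random

def random_seq_sketch(trans, start, n):
    """Hypothetical helper: tuple of n states beginning with the certain state
    start, each next state drawn with the transition probabilities."""
    seq = [start]
    while len(seq) < n:
        nexts = trans[seq[-1]]
        # random.choices draws from the module's global generator
        seq.append(random.choices(list(nexts), weights=list(nexts.values()))[0])
    return tuple(seq[:n])

weather = {"sun": {"sun": 0.9, "rain": 0.1}, "rain": {"sun": 0.5, "rain": 0.5}}
random.seed(42)                  # fixing the seed makes the run reproducible
first = random_seq_sketch(weather, "sun", 10)
random.seed(42)
assert random_seq_sketch(weather, "sun", 10) == first
```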
- class StateTlea(lea_c, lea_dict, chain)¶
- A markov.StateTlea instance represents a probability distribution of states, defined by a conditional probability table, for a given Markov chain (see superclasses Tlea and markov.StateLea)