sumu.gadget module
The module implements the algorithm Gadget as first detailed in [1].
- sumu.gadget.DBUG(msg)
- class sumu.gadget.Defaults
Bases:
object
- class sumu.gadget.EmptyDataScore(**kwargs)
Bases:
object
- candidate_score_array(C)
- clear_cache()
- complement_psets_and_scores(v, C, d)
- local(v, pset)
- class sumu.gadget.Gadget(*, data, validate_params=True, is_prerun=False, run_mode={}, mcmc={}, score={}, structure_prior={}, constraints={}, candidate_parent_algorithm={}, metropolis_coupling={}, catastrophic_cancellation={}, pruning_tolerance=None, scoresum_tolerance=None, candidate_parents_path=None, candidate_parents=None, logging={})
Bases:
object
Class implementing the Gadget pipeline for MCMC sampling from the structure posterior of DAG models. The user interface consists of:
Constructor for setting all the parameters.
sample()
method which runs the MCMC chain and returns the sampled DAGs along with meta data.
All the constructor arguments are keyword arguments, i.e., the data argument should be given as
data=data
, etc. Only the data argument is required; other arguments have some more or less sensible defaults.There are many parameters that can be adjusted. To make managing the parameters easier, most of them are grouped into dictionary objects around some common theme.
In this documentation nested parameters within the dictionaries are referenced as outer:inner (e.g.,
silent
inlogging={'silent': True}
, corresponds to logging:silent) or as:outer
inner
Some of the parameters have the keys name and params (i.e.,
foo={'name': 'bar', 'params': {'baz': 'qux'}}
), params being a dictionary the structure of which depends on the value of name. When describing such a parameter the following style is used:foo
Default:
bar
name:
bar
General description of
bar
.params
baz: Description of the parameter baz.
To only adjust some parameter within a dictionary argument while keeping the others at default, it suffices to set the one parameter. For example, to set the number of candidate parents constraints:K to some value \(K\) but let the maximum size of arbitrary parent sets constraints:d be determined automatically you should construct the constraints parameter as
>>> Gadget(data=data, constraints={"K": K}).
The following describes all Gadget constructor parameters.
data
Data to run the MCMC simulation on. Accepts any valid constructor argument for a
Data
object.run_mode
Which mode to run Gadget in.
Default:
budget
.name:
normal
MCMC simulation is run for a predetermined number of iteration steps.
params
n_target_chain_iters: For how many iterations to run the target chain (i.e., the unheated chain; see metropolis_coupling).
name:
budget
MCMC simulation is run until a given time budget is used up. A fraction of the time budget is allocated for precomputations, with the remainder being used on the Markov chain simulation itself.
There are three precomputation phases:
Selecting candidate parents.
Building score sum data structures given the candidate parents.
Computing scores of the parent sets for which we allow nodes outside of the candidates.
The parameters principally governing the time use for each part are, correspondingly, the number of nodes to add at the final step of the greedy candidate selection algorithm (candidate_parent_algorithm:params:K_f), the number of candidate parents (constraints:K), and the maximum size of the parent sets for which nodes outside the chosen candidates are permitted (constraints:d). In budget mode, these parameters are set automatically, so as to use a predetermined fraction of the budget on each precomputation phase.
With big time budgets the automatically selected parameters might result in infeasibly large memory requirements. To avoid this there is an additional memory budget parameter to cap constraints:K and constraints:d.
If any of the parameters constraints:K, constraints:d or candidate_parent_algorithm:params:K_f are explicitly set, that parameter will not be programmatically adjusted. The parameters constraints:min_K and constraints:min_d can be used to set the minimum values for constraints:K and constraints:d, respectively.
params
t: Time budget in seconds.
Default: Roughly sensible amount of time as a function of data shape.
mem: Memory budget in megabytes.
Default: Amount of memory available.
t_share: Dictionary with the keys
C
,K
, andd
corresponding to the above precomputation phases 1-3, respectively, with float values determining the fraction of the overall budget to use on the particular phase.Default:
{'C': 1 / 9, 'K': 1 / 9, 'd': 1 / 9}
name:
anytime
If ran in this mode the first
CTRL-C
after callingsample()
stops the burn-in phase and starts sampling DAGs. The secondCTRL-C
stops the sampling. DAG sampling first accumulates up to 2 \(\cdot\) mcmc:n_dags - 1 DAGs with thinning 1 (i.e., a DAG is sampled for each sampled root-partition), then each time the number of DAGs reaches 2 \(\cdot\) mcmc:n_dags the thinning is doubled and every 2nd already sampled DAG is deleted.
mcmc
General Markov Chain Monte Carlo arguments.
initial_rootpartition
The root-partition to initialize the MCMC chain(s) with. If not set the chains will start from a random state.
The root-partition format is a list partitioning integers 0..n to sets.
Default: Not set.
n_independent: Number of independent chains to run. For each independent chain there will be metropolis_coupling:M coupled chains run in logical parallel. The DAGs are sampled evenly from each unheated chain (see metropolis_coupling).
Values greater than 1 are mostly useful for analyzing mixing.
Default: 1
burnin: The fraction of run_mode:params:n_target_chain_iters (run_mode
normal
), or the fraction of time budget remaining after precomputations (run_modebudget
) to use on burn-in.Default: 0.5
n_dags: The (approximate) number of DAGs to sample.
Default: 10 000
move_weights: Dictionary with the keys
R_split_merge
,R_swap_node_pair
andDAG_edge_reversal
. The values are integer weights specifying the unnormalized probability distribution from which the type of each proposed move is sampled.Note that when employing Metropolis coupling the edge reversal move is only available for the unheated chain (in heated chains its weight will be 0), and that the state swap move is proposed uniformly at random for any two adjacent chains always after each chain has first progressed by one step (i.e., its proposal probability cannot be adjusted).
Default:
{'R_split_merge': 1, 'R_swap_node_pair': 1, 'DAG_edge_reversal': 2}
score
The score to use.
Default:
bdeu
for discrete andbge
for continuous data.name:
bdeu
Bayesian Dirichlet equivalent uniform.
params
ess: Equivalent sample size.
Default: 10
name:
bge
Bayesian Gaussian equivalent.
structure_prior
The modular structure prior to use. Modular structure priors are composed as a product of local factors, i.e., the prior for graph \(G\) factorizes as
\[P(G) \propto \prod_{i=1}^{n}\rho_i(\mathit{pa}(i)),\]where \(\rho_i\) are node specific factors, and \(\mathit{pa}(i)\) are the parents of node \(i\) in \(G\).
Two types of factors are implemented, dubbed
fair
andunif
. See [2].Default:
fair
.name:
fair
Balances the probabilities of different indegrees (i.e., of \(|\mathit{pa}(i)|\)), with the factors taking the form
\[\rho_i(S) = 1\big/\binom{n-1}{|S|}.\]name:
unif
Uniform over different graphs, i.e., the factors are simply
\[\rho_i(S) = 1.\]
constraints
Constraints on the explored DAG space.
K: Number of candidate parents per node.
Default: \(\min(n-1,16)\), where \(n\) is the number of nodes (run_mode
normal
oranytime
), or parameter not set (run_modebudget
).K_min: Sets the minimum level for K if using run_mode
budget
.Default: 1 (run_mode
budget
), or parameter not set (run_modenormal
oranytime
).d: Maximum size of parent sets for which nodes outside of the candidates are allowed.
Default: \(\min(n-1, 3)\), where \(n\) is the number of nodes (run_mode
normal
oranytime
) or parameter not set (run_modebudget
).d_min: Sets the minimum level for d if using run_mode
budget
.Default: 2 (run_mode
budget
), or parameter not set (run_modenormal
oranytime
).max_id: Maximum size of parent sets that are subsets of the candidate parents. Set to -1 for unlimited. There should be no reason to limit the size.
Default: -1
candidate_parent_algorithm
Algorithm to use for finding the candidate parents \(C = C_1C_2 \ldots C_n\), where \(C_i\) is the set of candidates for node \(i\).
Implemented algorithms are
greedy
,random
andoptimal
.Default:
greedy
name:
greedy
Iteratively, add a best node to the initially empty \(C_i\) (given the already added ones), until \(|C_i|=K-K_f\). Finally add the \(K_f\) best nodes.
params
association_measure: The measure for goodness of a candidate parent \(j\). One of
score
: \(\max_{S \subseteq C_i} \pi_i(S\cup\{j\})\)gain
: \(\max_{S \subseteq C_i} \pi_i(S\cup\{j\}) - \pi_i(S)\),
where \(\pi_i(S)\) is the local score of the parent set \(S\) for node \(i\).
Default:
gain
.K_f: The number of nodes to add at the final step of the algorithm. Higher values result in faster computation.
name:
random
Select the candidates uniformly at random.
name:
optimal
Select the candidates so as to maximize the posterior probability that \(\mathit{pa}(i) \subseteq C_i\). Only feasible up to a couple of dozen variables.
metropolis_coupling
Metropolis coupling is implemented by running \(M\) “heated” chains in parallel with their stationary distributions proportional to the \(\beta_i\text{th}\) power of the posterior, with some appropriate inverse temperatures \(1 = \beta_1 > \beta_2 > \ldots > \beta_M \geq 0\). Periodically, a swap of states between two adjacent chains is proposed and accepted with a certain probability.
Two modes available:
static
andadaptive
. To not use Metropolis coupling usestatic
with params:M set to 1.Default:
adaptive
.name:
static
Metropolis coupling with fixed number of chains and temperatures. The lowest inverse temperature is exactly zero, corresponding to the uniform distribution.
params
M: The number of coupled chains to run. Set to 1 to disable Metropolis coupling.
Default: 16
heating: The functional form of the sequence of inverse temperatures. One of:
linear
: \(\beta_i = (M - i) \big/ (M - 1)\)quadratic
: \(\beta_i = 1- ((i-1) \big/ (M - 1))^2\)sigmoid
: \(\beta_1=1,\beta_{i:i\not\in \{1,M\}} = \frac{1}{1+e^{(M-1)/2-(M-i)}},\beta_M=0\)
Default:
linear
sliding_window: The number of the most recent swap proposals from which a local swap probability is computed.
Default: 1000
name:
adaptive
The number of chains is initialized to \(M=2\) (params:M). The inverse temperatures are defined as
\[\beta_k = 1 / (1 + \sum_{k`=1}^{k-1}{\delta_{k`}}),\]where \(\delta_k, k = 1,...,M-1\) are auxiliary temperature delta variables with an initial value \(\delta_1 = 0.5\) (params:delta_init) yielding \(\beta_1 = 1\) and \(\beta_2 \approx 0.67\). Then, the following process to adjust \(M\) and \(\beta_1,\beta_2,\ldots,\beta_M\) is repeated indefinitely:
1. Each of the chains are sampled for a specific number of iterations (params:update_n), with the number multiplied on each cycle by a slowdown factor (params:slowdown). Then the empirical probability \(p_{s}(k)\) for a proposed swap between chains \(k\) and \(k+1\) to have been accepted is computed. In addition to computing the probability given the full history of the chain it is also computed for a specific number of the most recent proposals (params:sliding_window), with the latter probablity denoted by \(p_{sw}(k)\).
2. The temperature delta variables for \(k=1,...,M-1\) are updated as
\[\delta_k := \delta_k (1 + \frac{p_{sw}(k) - p_{t}}{s}),\]where \(p_{t}\) (params:p_target) is the target swap probability, and \(s\) is a smoothing factor (params:smoothing).
3. Finally, denoting the number of chains for which \(p_{s}(k) > 0.9\) by \(r\), either
a new chain is added (if \(r = 0\)),
the number of chains remains unchanged (if \(r = 1\)),
a chain is deleted (if \(r > 1\) and \(M\) > 2).
On addition the new chain takes the index position \(M+1\), its state is initialized to that of chain \(M\) and \(t_M\) is set to value \(2(M+1)\). On deletion the chain at the index position \(M\) is removed. Here \(M\) refers to the number of chains prior to the addition or removal.
params
M: The initial number of coupled chains to run.
Default: 2
p_target: Target local swap probability
Default: 0.234 [3]
delta_init: The initial temperature difference between adjacent chains.
Default: 0.5
sliding_window: The number of most recent swap proposals from which the local swap probability is computed.
Default: 1000
update_n: Temperatures are updated every nth iteration, with n set by this parameter. update_n is multiplied by slowdown on every temperature update event.
Default: 100
smoothing: A parameter with effect on the magnitude of change in temperature update. See description above.
Default: 1
slowdown: A parameter with effect on the frequency of temperature updates. See description above and update_n.
Default: 1
catastrophic_cancellation: Catastrophic cancellation occurs when a score sum \(\tau_i(U,T)\) computed as \(\tau_i(U) - \tau_i(U \setminus T)\) evaluates to zero due to numerical reasons.
tolerance: how small should the absolute difference between two log score sums be in order for the subtraction to be determined to lead to catastrofic cancellation.
Default: \(2^{-32}\).
cache_size: Maximum amount of score sums that cannot be computed through subtraction to be stored separately. If there is a lot of catastrofic cancellations, setting this value high can have a big impact on memory use.
Default: \(10^7\)
pruning_tolerance: Allowed relative error for a root-partition node score sum. Setting this to a positive value allows some candidate parent sets to be pruned, expediting parent set sampling.
Default: 0.001.
scoresum_tolerance: Tolerated relative error when computing score sums from individual parent set local scores.
Default: 0.01.
candidate_parents_path
Alternatively to candidate_parent_algorithm the candidate parents can be precomputed and read from a file, with the path given under this parameter.
In the expected format row numbers determine the nodes, while \(K\) space separated numbers on the rows identify the candidate parents.
candidate_parents
Alternatively to candidate_parent_algorithm the candidate parents can be precomputed and read from a Python dictionary given under this parameter.
In the expected format integer keys determine the nodes, while the values are tuples with \(K\) integers identifying the parents.
logging: Parameters determining the logging output.
silent: Whether to suppress logging output or not.
Default:
False
.period: Interval in seconds for printing more statistics.
Default: 15.
verbose_prefix: If set, more verbose output is created in files at
<working directory>/prefix
. For example prefixx/y
would create files with names starting withy
in directoryx
.Default:
None
.overwrite: If
True
files inverbose_prefix
are overwritten if they exist.Default:
False
.
- precompute()
- sample()
- class sumu.gadget.GadgetLogger(gadget)
Bases:
Logger
Stuff for printing stuff.
- finalize()
- h(title, secnum=True)
- last_root_partition_scores()
- move_probs(stats)
- periodic_stats()
- plot_score_trace()
- progress(stats)
- run_stats()
- class sumu.gadget.GadgetParameters(*, data, validate_params=True, is_prerun=False, run_mode={}, mcmc={}, score={}, structure_prior={}, constraints={}, candidate_parent_algorithm={}, metropolis_coupling={}, catastrophic_cancellation={}, pruning_tolerance=None, scoresum_tolerance=None, candidate_parents_path=None, candidate_parents=None, logging={})
Bases:
object
- adjust_to_time_budget_K(budget, K_min=1)
- adjust_to_time_budget_d(budget, d_min=0)
- left()
- static mem_estimate(n, K, d, n_cc, a=443.449129, b=3.22309337e-05, c=0.000498862512)
- pred_time_use_K(K)
- pred_time_use_d(d)
- prerun(K_max)
- class sumu.gadget.LocalScore(*, data, score=None, prior={'name': 'fair'}, maxid=-1)
Bases:
object
Class for computing local scores given input data.
Implemented scores are BDeu and BGe. The scores by default use the “fair” modular structure prior [2].
- all_scores_dict(C=None)
- candidate_scores(C=None)
- clear_cache()
- complement_psets_and_scores(v, C, d)
- local(v, pset)
Local score for input node v and pset, with score function self.scoref.
This is the “safe” version, raising error if queried with invalid input. The unsafe self._local will just segfault.
- score_dag(dag)
- class sumu.gadget.Logger(*, logfile, mode='a', overwrite=False)
Bases:
object
- br(n=1)
- dict(data)
- fraction(n, d)
- numpy(array, fmt='%.2f')
- silent()
- unlink()
- class sumu.gadget.Score(*, C, c_r_score, c_c_score)
Bases:
object
- sample_DAG(R)
- sample_pset(v, U, T={})
- score_rootpartition(R)
- sum(v, U, T={})
Returns the sum of scores for node v over the parent sets that 1. are subsets of U; 2. and, if T is not empty, have at least one member in T.
The sum is computed over first the scores restricted to candidate parents (self.C), and then the result is augmented by scores complementary to those restricted to the candidate parents, until some predefined level of error.
- Parameters:
v (int) – Label of the node whose local scores are summed.
U (set) – Parent sets of scores to be summed are the subsets of U.
T (set) – Parent sets must have at least one member in T (if T is not empty).
- Returns:
Sum of scores (float).