sumu.gadget module

The module implements the Gadget algorithm, first detailed in [1].

sumu.gadget.DBUG(msg)
class sumu.gadget.Defaults

Bases: object

class sumu.gadget.EmptyDataScore(**kwargs)

Bases: object

candidate_score_array(C)
clear_cache()
complement_psets_and_scores(v, C, d)
local(v, pset)
class sumu.gadget.Gadget(*, data, validate_params=True, is_prerun=False, run_mode={}, mcmc={}, score={}, structure_prior={}, constraints={}, candidate_parent_algorithm={}, metropolis_coupling={}, catastrophic_cancellation={}, pruning_tolerance=None, scoresum_tolerance=None, candidate_parents_path=None, candidate_parents=None, logging={})

Bases: object

Class implementing the Gadget pipeline for MCMC sampling from the structure posterior of DAG models. The user interface consists of:

  1. Constructor for setting all the parameters.

  2. sample() method which runs the MCMC chain and returns the sampled DAGs along with meta data.

All the constructor arguments are keyword arguments, i.e., the data argument should be given as data=data, etc. Only the data argument is required; the other arguments have sensible defaults.

There are many parameters that can be adjusted. To make managing the parameters easier, most of them are grouped into dictionary objects around some common theme.

In this documentation nested parameters within the dictionaries are referenced as outer:inner (e.g., silent in logging={'silent': True} corresponds to logging:silent) or as:

  • outer

    • inner

Some of the parameters have the keys name and params (e.g., foo={'name': 'bar', 'params': {'baz': 'qux'}}), where params is a dictionary whose structure depends on the value of name. When describing such a parameter the following style is used:

  • foo

    Default: bar

    • name: bar

      General description of bar.

      params

      • baz: Description of the parameter baz.

To adjust only some parameters within a dictionary argument while keeping the others at their defaults, it suffices to set those parameters. For example, to set the number of candidate parents constraints:K to some value \(K\) but let the maximum size of arbitrary parent sets constraints:d be determined automatically, construct the constraints parameter as

>>> Gadget(data=data, constraints={"K": K})
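
A minimal end-to-end sketch, assuming the Data constructor accepts a NumPy array of discrete observations (one row per sample) and that sample() returns the DAGs and their meta data as a pair; all values here are arbitrary:

>>> import numpy as np
>>> import sumu
>>> data = np.random.randint(3, size=(500, 8))  # 500 samples, 8 variables
>>> g = sumu.gadget.Gadget(data=data, constraints={"K": 7})
>>> dags, meta = g.sample()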

The following describes all Gadget constructor parameters.

  • data

    Data to run the MCMC simulation on. Accepts any valid constructor argument for a Data object.

  • run_mode

    Which mode to run Gadget in.

    Default: budget.

    • name: normal

      MCMC simulation is run for a predetermined number of iteration steps.

      params

      • n_target_chain_iters: For how many iterations to run the target chain (i.e., the unheated chain; see metropolis_coupling).

    • name: budget

      MCMC simulation is run until a given time budget is used up. A fraction of the time budget is allocated for precomputations, with the remainder being used on the Markov chain simulation itself.

      There are three precomputation phases:

      1. Selecting candidate parents.

      2. Building score sum data structures given the candidate parents.

      3. Computing scores of the parent sets for which we allow nodes outside of the candidates.

      The parameters principally governing the time use for each part are, correspondingly, the number of nodes to add at the final step of the greedy candidate selection algorithm (candidate_parent_algorithm:params:K_f), the number of candidate parents (constraints:K), and the maximum size of the parent sets for which nodes outside the chosen candidates are permitted (constraints:d). In budget mode, these parameters are set automatically, so as to use a predetermined fraction of the budget on each precomputation phase.

      With large time budgets the automatically selected parameters might result in infeasibly large memory requirements. To avoid this, an additional memory budget parameter caps constraints:K and constraints:d.

      If any of the parameters constraints:K, constraints:d or candidate_parent_algorithm:params:K_f is explicitly set, that parameter will not be programmatically adjusted. The parameters constraints:K_min and constraints:d_min can be used to set the minimum values for constraints:K and constraints:d, respectively.

      params

      • t: Time budget in seconds.

        Default: Roughly sensible amount of time as a function of data shape.

      • mem: Memory budget in megabytes.

        Default: Amount of memory available.

      • t_share: Dictionary with the keys C, K, and d corresponding to the above precomputation phases 1-3, respectively, with float values determining the fraction of the overall budget to use on the particular phase.

        Default: {'C': 1 / 9, 'K': 1 / 9, 'd': 1 / 9}

    • name: anytime

      If run in this mode, the first CTRL-C after calling sample() stops the burn-in phase and starts sampling DAGs. The second CTRL-C stops the sampling. DAG sampling first accumulates up to 2 \(\cdot\) mcmc:n_dags - 1 DAGs with thinning 1 (i.e., a DAG is sampled for each sampled root-partition); then, each time the number of DAGs reaches 2 \(\cdot\) mcmc:n_dags, the thinning is doubled and every 2nd already sampled DAG is deleted.
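
    For illustration, sketches of selecting each mode; the parameter values here are arbitrary:

    >>> Gadget(data=data,
    ...        run_mode={"name": "normal",
    ...                  "params": {"n_target_chain_iters": 200000}})
    >>> Gadget(data=data,
    ...        run_mode={"name": "budget",
    ...                  "params": {"t": 600, "mem": 4000}})
    >>> Gadget(data=data, run_mode={"name": "anytime"})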

  • mcmc

    General Markov Chain Monte Carlo arguments.

    • initial_rootpartition

      The root-partition to initialize the MCMC chain(s) with. If not set the chains will start from a random state.

      The root-partition format is a list of sets partitioning the integer node labels 0..n-1.

      Default: Not set.

    • n_independent: Number of independent chains to run. For each independent chain there will be metropolis_coupling:M coupled chains run in logical parallel. The DAGs are sampled evenly from each unheated chain (see metropolis_coupling).

      Values greater than 1 are mostly useful for analyzing mixing.

      Default: 1

    • burnin: The fraction of run_mode:params:n_target_chain_iters (run_mode normal), or the fraction of time budget remaining after precomputations (run_mode budget) to use on burn-in.

      Default: 0.5

    • n_dags: The (approximate) number of DAGs to sample.

      Default: 10 000

    • move_weights: Dictionary with the keys R_split_merge, R_swap_node_pair and DAG_edge_reversal. The values are integer weights specifying the unnormalized probability distribution from which the type of each proposed move is sampled.

      Note that when employing Metropolis coupling, the edge reversal move is only available to the unheated chain (in heated chains its weight will be 0), and the state swap move is proposed uniformly at random between two adjacent chains after each chain has first progressed by one step (i.e., its proposal probability cannot be adjusted).

      Default: {'R_split_merge': 1, 'R_swap_node_pair': 1, 'DAG_edge_reversal': 2}
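
    For example, to sample more DAGs than the default and disable the edge reversal move (hypothetical values, assuming a weight of 0 is accepted):

    >>> Gadget(data=data,
    ...        mcmc={"n_dags": 20000,
    ...              "move_weights": {"R_split_merge": 1,
    ...                               "R_swap_node_pair": 1,
    ...                               "DAG_edge_reversal": 0}})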

  • score

    The score to use.

    Default: bdeu for discrete and bge for continuous data.

    • name: bdeu

      Bayesian Dirichlet equivalent uniform.

      params

      • ess: Equivalent sample size.

        Default: 10

    • name: bge

      Bayesian Gaussian equivalent.

  • structure_prior

    The modular structure prior to use. Modular structure priors are composed as a product of local factors, i.e., the prior for graph \(G\) factorizes as

    \[P(G) \propto \prod_{i=1}^{n}\rho_i(\mathit{pa}(i)),\]

    where \(\rho_i\) are node specific factors, and \(\mathit{pa}(i)\) are the parents of node \(i\) in \(G\).

    Two types of factors are implemented, dubbed fair and unif. See [2].

    Default: fair.

    • name: fair

      Balances the probabilities of different indegrees (i.e., of \(|\mathit{pa}(i)|\)), with the factors taking the form

      \[\rho_i(S) = 1\big/\binom{n-1}{|S|}.\]
    • name: unif

      Uniform over different graphs, i.e., the factors are simply

      \[\rho_i(S) = 1.\]
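
    For intuition, a small sketch (not part of the sumu API) of the fair factor in log space:

    >>> from math import comb, log
    >>> def log_fair_factor(n, indegree):
    ...     # log rho_i(S) = -log C(n - 1, |S|)
    ...     return -log(comb(n - 1, indegree))
    >>> # With n = 10 nodes, indegree 3 is downweighted by a factor 1/84
    >>> print(log_fair_factor(10, 0), log_fair_factor(10, 3))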
  • constraints

    Constraints on the explored DAG space.

    • K: Number of candidate parents per node.

      Default: \(\min(n-1,16)\), where \(n\) is the number of nodes (run_mode normal or anytime), or parameter not set (run_mode budget).

    • K_min: Sets the minimum level for K if using run_mode budget.

      Default: 1 (run_mode budget), or parameter not set (run_mode normal or anytime).

    • d: Maximum size of parent sets for which nodes outside of the candidates are allowed.

      Default: \(\min(n-1, 3)\), where \(n\) is the number of nodes (run_mode normal or anytime), or parameter not set (run_mode budget).

    • d_min: Sets the minimum level for d if using run_mode budget.

      Default: 2 (run_mode budget), or parameter not set (run_mode normal or anytime).

    • max_id: Maximum size of parent sets that are subsets of the candidate parents. Set to -1 for unlimited. There should be no reason to limit the size.

      Default: -1
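
    For example, to fix the number of candidate parents while letting run_mode budget choose d, with a floor of 3 (hypothetical values, assuming the data has more than 12 variables):

    >>> Gadget(data=data,
    ...        run_mode={"name": "budget", "params": {"t": 600}},
    ...        constraints={"K": 12, "d_min": 3})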

  • candidate_parent_algorithm

    Algorithm to use for finding the candidate parents \(C = C_1C_2 \ldots C_n\), where \(C_i\) is the set of candidates for node \(i\).

    Implemented algorithms are greedy, random and optimal.

    Default: greedy

    • name: greedy

      Iteratively add the best node to the initially empty \(C_i\) (given the nodes added so far), until \(|C_i|=K-K_f\). Finally, add the \(K_f\) best remaining nodes.

      params

      • association_measure: The measure for goodness of a candidate parent \(j\). One of

        • score: \(\max_{S \subseteq C_i} \pi_i(S\cup\{j\})\)

        • gain: \(\max_{S \subseteq C_i} \pi_i(S\cup\{j\}) - \pi_i(S)\),

        where \(\pi_i(S)\) is the local score of the parent set \(S\) for node \(i\).

        Default: gain.

      • K_f: The number of nodes to add at the final step of the algorithm. Higher values result in faster computation.

    • name: random

      Select the candidates uniformly at random.

    • name: optimal

      Select the candidates so as to maximize the posterior probability that \(\mathit{pa}(i) \subseteq C_i\). Only feasible up to a couple of dozen variables.
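
    For example, a hypothetical greedy configuration trading some candidate quality for speed via a larger K_f:

    >>> Gadget(data=data,
    ...        candidate_parent_algorithm={"name": "greedy",
    ...                                    "params": {"K_f": 4}})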

  • metropolis_coupling

    Metropolis coupling is implemented by running \(M\) “heated” chains in parallel with their stationary distributions proportional to the \(\beta_i\text{th}\) power of the posterior, with some appropriate inverse temperatures \(1 = \beta_1 > \beta_2 > \ldots > \beta_M \geq 0\). Periodically, a swap of states between two adjacent chains is proposed and accepted with a certain probability.

    Two modes are available: static and adaptive. To disable Metropolis coupling, use static with params:M set to 1.

    Default: adaptive.

    • name: static

      Metropolis coupling with fixed number of chains and temperatures. The lowest inverse temperature is exactly zero, corresponding to the uniform distribution.

      params

      • M: The number of coupled chains to run. Set to 1 to disable Metropolis coupling.

        Default: 16

      • heating: The functional form of the sequence of inverse temperatures. One of:

        • linear: \(\beta_i = (M - i) \big/ (M - 1)\)

        • quadratic: \(\beta_i = 1 - ((i-1) \big/ (M - 1))^2\)

        • sigmoid: \(\beta_1 = 1\), \(\beta_i = \frac{1}{1+e^{(M-1)/2-(M-i)}}\) for \(1 < i < M\), \(\beta_M = 0\)

        Default: linear

      • sliding_window: The number of the most recent swap proposals from which a local swap probability is computed.

        Default: 1000

    • name: adaptive

      The number of chains is initialized to \(M=2\) (params:M). The inverse temperatures are defined as

      \[\beta_k = 1 \big/ \Big(1 + \sum_{k'=1}^{k-1} \delta_{k'}\Big),\]

      where \(\delta_k, k = 1,...,M-1\) are auxiliary temperature delta variables with an initial value \(\delta_1 = 0.5\) (params:delta_init) yielding \(\beta_1 = 1\) and \(\beta_2 \approx 0.67\). Then, the following process to adjust \(M\) and \(\beta_1,\beta_2,\ldots,\beta_M\) is repeated indefinitely:

      1. Each of the chains is sampled for a specific number of iterations (params:update_n), with the number multiplied on each cycle by a slowdown factor (params:slowdown). Then the empirical probability \(p_{s}(k)\) of a proposed swap between chains \(k\) and \(k+1\) having been accepted is computed. In addition to computing the probability given the full history of the chain, it is also computed over a specific number of the most recent proposals (params:sliding_window), with the latter probability denoted by \(p_{sw}(k)\).

      2. The temperature delta variables for \(k=1,...,M-1\) are updated as

      \[\delta_k := \delta_k \Big(1 + \frac{p_{sw}(k) - p_{t}}{s}\Big),\]

      where \(p_{t}\) (params:p_target) is the target swap probability, and \(s\) is a smoothing factor (params:smoothing).

      3. Finally, denoting the number of chains for which \(p_{s}(k) > 0.9\) by \(r\), either

      • a new chain is added (if \(r = 0\)),

      • the number of chains remains unchanged (if \(r = 1\)),

      • a chain is deleted (if \(r > 1\) and \(M > 2\)).

      On addition, the new chain takes the index position \(M+1\), its state is initialized to that of chain \(M\), and \(t_M\) is set to the value \(2(M+1)\). On deletion, the chain at index position \(M\) is removed. Here \(M\) refers to the number of chains prior to the addition or removal.

      params

      • M: The initial number of coupled chains to run.

        Default: 2

      • p_target: Target local swap probability.

        Default: 0.234 [3]

      • delta_init: The initial temperature difference between adjacent chains.

        Default: 0.5

      • sliding_window: The number of the most recent swap proposals from which the local swap probability is computed.

        Default: 1000

      • update_n: Temperatures are updated every nth iteration, with n set by this parameter. update_n is multiplied by slowdown on every temperature update event.

        Default: 100

      • smoothing: A parameter affecting the magnitude of each temperature update. See the description above.

        Default: 1

      • slowdown: A parameter affecting the frequency of temperature updates. See the description above and update_n.

        Default: 1
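
    The static heating schedules are easy to reproduce; the following sketch (not sumu code) computes the inverse temperature sequences exactly as defined above:

    >>> import math
    >>> def inverse_temperatures(M, heating="linear"):
    ...     if heating == "linear":
    ...         return [(M - i) / (M - 1) for i in range(1, M + 1)]
    ...     if heating == "quadratic":
    ...         return [1 - ((i - 1) / (M - 1)) ** 2 for i in range(1, M + 1)]
    ...     if heating == "sigmoid":
    ...         return ([1.0]
    ...                 + [1 / (1 + math.exp((M - 1) / 2 - (M - i)))
    ...                    for i in range(2, M)]
    ...                 + [0.0])
    >>> [round(b, 3) for b in inverse_temperatures(4, "quadratic")]
    [1.0, 0.889, 0.556, 0.0]

    To disable coupling altogether, pass metropolis_coupling={'name': 'static', 'params': {'M': 1}}, as noted above.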

  • catastrophic_cancellation: Catastrophic cancellation occurs when a score sum \(\tau_i(U,T)\) computed as \(\tau_i(U) - \tau_i(U \setminus T)\) evaluates to zero due to numerical reasons.

    • tolerance: How small the absolute difference between two log score sums must be for the subtraction to be deemed to result in catastrophic cancellation.

      Default: \(2^{-32}\).

    • cache_size: Maximum number of score sums (those that cannot be computed through subtraction) to store separately. If there are many catastrophic cancellations, setting this value high can have a big impact on memory use.

      Default: \(10^7\)
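
    A minimal numerical sketch (not sumu's internal code) of the phenomenon, computing \(\tau_i(U,T)\) as \(\tau_i(U) - \tau_i(U \setminus T)\) from two log score sums a and b:

    >>> import numpy as np
    >>> a = -100.0                                  # log score sum for U
    >>> for delta in [1.0, 2.0 ** -33, 2.0 ** -53]:
    ...     b = a - delta                           # log score sum for U \ T
    ...     naive = np.exp(a) - np.exp(b)           # digits cancel as delta -> 0
    ...     stable = -np.expm1(-delta) * np.exp(a)  # accurate reference via expm1
    ...     print(naive, stable)

    With \(|a - b| = 1\) the two values agree; below the default tolerance \(2^{-32}\) the naive difference has lost most of its significant digits, and below machine epsilon it evaluates to exactly 0.0. Sums flagged by the tolerance check are therefore stored separately (see cache_size).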

  • pruning_tolerance: Allowed relative error for a root-partition node score sum. Setting this to a positive value allows some candidate parent sets to be pruned, expediting parent set sampling.

    Default: 0.001.

  • scoresum_tolerance: Tolerated relative error when computing score sums from individual parent set local scores.

    Default: 0.01.

  • candidate_parents_path

    Alternatively to candidate_parent_algorithm the candidate parents can be precomputed and read from a file, with the path given under this parameter.

    In the expected format, the row number determines the node, while \(K\) space-separated numbers on each row identify its candidate parents.

  • candidate_parents

    Alternatively to candidate_parent_algorithm the candidate parents can be precomputed and read from a Python dictionary given under this parameter.

    In the expected format, the integer keys determine the nodes, while the values are tuples of \(K\) integers identifying the candidate parents.
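
    For example, hypothetical candidates for \(n = 4\) nodes with \(K = 2\) (the file format of candidate_parents_path encodes the same information with one row per node, e.g. a first row 1 2):

    >>> candidate_parents = {0: (1, 2), 1: (0, 3), 2: (0, 1), 3: (1, 2)}
    >>> Gadget(data=data, candidate_parents=candidate_parents)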

  • logging: Parameters determining the logging output.

    • silent: Whether to suppress logging output or not.

      Default: False.

    • period: Interval in seconds for printing more statistics.

      Default: 15.

    • verbose_prefix: If set, more verbose output is written to files at <working directory>/prefix. For example, the prefix x/y would create files with names starting with y in the directory x.

      Default: None.

    • overwrite: If True files in verbose_prefix are overwritten if they exist.

      Default: False.

precompute()
sample()
class sumu.gadget.GadgetLogger(gadget)

Bases: Logger

Facilities for printing and logging Gadget's progress and statistics.

finalize()
h(title, secnum=True)
last_root_partition_scores()
move_probs(stats)
periodic_stats()
plot_score_trace()
progress(stats)
run_stats()
class sumu.gadget.GadgetParameters(*, data, validate_params=True, is_prerun=False, run_mode={}, mcmc={}, score={}, structure_prior={}, constraints={}, candidate_parent_algorithm={}, metropolis_coupling={}, catastrophic_cancellation={}, pruning_tolerance=None, scoresum_tolerance=None, candidate_parents_path=None, candidate_parents=None, logging={})

Bases: object

adjust_to_time_budget_K(budget, K_min=1)
adjust_to_time_budget_d(budget, d_min=0)
left()
static mem_estimate(n, K, d, n_cc, a=443.449129, b=3.22309337e-05, c=0.000498862512)
pred_time_use_K(K)
pred_time_use_d(d)
prerun(K_max)
class sumu.gadget.LocalScore(*, data, score=None, prior={'name': 'fair'}, maxid=-1)

Bases: object

Class for computing local scores given input data.

Implemented scores are BDeu and BGe. The scores by default use the “fair” modular structure prior [2].

all_scores_dict(C=None)
candidate_scores(C=None)
clear_cache()
complement_psets_and_scores(v, C, d)
local(v, pset)

Local score for input node v and pset, with score function self.scoref.

This is the “safe” version, raising an error if queried with invalid input. The unsafe self._local will just segfault.

score_dag(dag)
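
A hedged usage sketch: the data array is hypothetical, the score dictionary is assumed to take the same format as for Gadget, and passing the parent set as a NumPy integer array is an assumption about the expected pset type:

>>> import numpy as np
>>> import sumu
>>> data = np.random.randint(2, size=(1000, 5))
>>> ls = sumu.gadget.LocalScore(data=data,
...                             score={"name": "bdeu", "params": {"ess": 10}})
>>> ls.local(0, np.array([1, 2]))  # log local score of node 0 with parents {1, 2}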
class sumu.gadget.Logger(*, logfile, mode='a', overwrite=False)

Bases: object

br(n=1)
dict(data)
fraction(n, d)
numpy(array, fmt='%.2f')
silent()
class sumu.gadget.Score(*, C, c_r_score, c_c_score)

Bases: object

sample_DAG(R)
sample_pset(v, U, T={})
score_rootpartition(R)
sum(v, U, T={})

Returns the sum of scores for node v over the parent sets that (1) are subsets of U, and (2), if T is not empty, have at least one member in T.

The sum is first computed over the scores restricted to the candidate parents (self.C); the result is then augmented by the complementary scores, i.e., those of parent sets not confined to the candidates, up to some predefined level of error.

Parameters:
  • v (int) – Label of the node whose local scores are summed.

  • U (set) – Parent sets of scores to be summed are the subsets of U.

  • T (set) – Parent sets must have at least one member in T (if T is not empty).

Returns:

Sum of scores (float).