| Title: | Handling Missing Data in Stochastic Block Models |
|---|---|
| Description: | When a network is partially observed (here, NAs in the adjacency matrix rather than 1 or 0, due to missing information between node pairs), it is possible to account for the underlying process that generates those NAs. 'missSBM', presented in 'Barbillon, Chiquet and Tabouy' (2022) <doi:10.18637/jss.v101.i12>, fits the popular stochastic block model to network data sampled under various missing data conditions, as described in 'Tabouy, Barbillon and Chiquet' (2019) <doi:10.1080/01621459.2018.1562934>. |
| Authors: | Julien Chiquet [aut, cre] (ORCID: <https://orcid.org/0000-0002-3629-3429>), Pierre Barbillon [aut] (ORCID: <https://orcid.org/0000-0002-7766-7693>), Timothée Tabouy [aut], Jean-Benoist Léger [ctb] (provided C++ implementation of K-means), François Gindraud [ctb] (provided C++ interface to NLopt), großBM team [ctb] |
| Maintainer: | Julien Chiquet <[email protected]> |
| License: | GPL-3 |
| Version: | 1.0.5 |
| Built: | 2026-05-12 09:24:36 UTC |
| Source: | https://github.com/grosssbm/misssbm |
Class for defining a block dyad sampler
missSBM::networkSampling -> missSBM::networkSampler -> missSBM::dyadSampler -> blockDyadSampler
df: the number of parameters of this sampling
new()
constructor for networkSampling
blockDyadSampler$new(parameters = NA, nbNodes = NA, directed = FALSE, clusters = NA)
parameters: the vector of parameters associated to the sampling at play
nbNodes: number of nodes in the network
directed: logical, directed network or not
clusters: a vector of class memberships
clone()
The objects of this class are cloneable with this method.
blockDyadSampler$clone(deep = FALSE)
deep: Whether to make a deep clone.
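The deep argument follows the general cloning semantics of R6 rather than anything specific to this class. The sketch below illustrates the difference with two toy classes, Inner and Outer, which are hypothetical names and not part of missSBM; it assumes the R6 package is installed. A shallow clone shares any R6 object stored in a field, whereas a deep clone copies it.

```r
library(R6)

# Toy classes for illustration only (not part of missSBM)
Inner <- R6Class("Inner", public = list(value = 1))
Outer <- R6Class("Outer",
  public = list(
    inner = NULL,
    initialize = function() {
      self$inner <- Inner$new()
    }
  )
)

original <- Outer$new()
shallow  <- original$clone()             # deep = FALSE: field 'inner' is shared
deep     <- original$clone(deep = TRUE)  # nested R6 objects are copied as well

# Modify the nested object through the original
original$inner$value <- 2

shallow_value <- shallow$inner$value  # follows the change made via 'original'
deep_value    <- deep$inner$value     # keeps the value held at cloning time
```

For classes that only hold plain vectors and matrices, both kinds of clone behave the same; the distinction matters once a field stores another R6 object.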
Class for fitting a block-dyad sampling
missSBM::networkSampling -> missSBM::networkSamplingDyads_fit -> blockDyadSampling_fit
vExpec: variational expectation of the sampling
log_lambda: matrix, term for adjusting the imputation step which depends on the type of sampling
new()
constructor
blockDyadSampling_fit$new(partlyObservedNetwork, blockInit)
partlyObservedNetwork: an object with class partlyObservedNetwork representing the observed data with possibly missing entries
blockInit: n x Q matrix of initial block indicators
update_parameters()
a method to update the estimation of the parameters. By default, nothing to do (corresponds to MAR sampling)
blockDyadSampling_fit$update_parameters(nu, Z)
nu: the matrix of (uncorrected) imputation for missing entries
Z: probabilities of block memberships
clone()
The objects of this class are cloneable with this method.
blockDyadSampling_fit$clone(deep = FALSE)
deep: Whether to make a deep clone.
Class for defining a block node sampler
missSBM::networkSampling -> missSBM::networkSampler -> missSBM::nodeSampler -> blockNodeSampler
new()
constructor for networkSampling
blockNodeSampler$new(parameters = NA, nbNodes = NA, directed = FALSE, clusters = NA)
parameters: the vector of parameters associated to the sampling at play
nbNodes: number of nodes in the network
directed: logical, directed network or not
clusters: a vector of class memberships
clone()
The objects of this class are cloneable with this method.
blockNodeSampler$clone(deep = FALSE)
deep: Whether to make a deep clone.
Class for fitting a block-node sampling
missSBM::networkSampling -> missSBM::networkSamplingNodes_fit -> blockNodeSampling_fit
vExpec: variational expectation of the sampling
log_lambda: double, term for adjusting the imputation step which depends on the type of sampling
new()
constructor
blockNodeSampling_fit$new(partlyObservedNetwork, blockInit)
partlyObservedNetwork: an object with class partlyObservedNetwork representing the observed data with possibly missing entries
blockInit: n x Q matrix of initial block indicators
update_parameters()
a method to update the estimation of the parameters. By default, nothing to do (corresponds to MAR sampling)
blockNodeSampling_fit$update_parameters(imputedNet, Z)
imputedNet: an adjacency matrix where missing values have been imputed
Z: indicator of blocks
clone()
The objects of this class are cloneable with this method.
blockNodeSampling_fit$clone(deep = FALSE)
deep: Whether to make a deep clone.
Extracts model coefficients from objects missSBM_fit returned by estimateMissSBM()
    ## S3 method for class 'missSBM_fit'
    coef(
      object,
      type = c("mixture", "connectivity", "covariates", "sampling"),
      ...
    )
| Argument | Description |
|---|---|
| object | an R6 object with class missSBM_fit |
| type | type of parameter that should be extracted. Either "mixture" (default), "connectivity", "covariates" or "sampling" |
| ... | additional parameters for S3 compatibility. Not used |
A vector or matrix of coefficients extracted from the missSBM_fit model.
Class for fitting a dyad sampling with covariates
missSBM::networkSampling -> missSBM::networkSamplingDyads_fit -> covarDyadSampling_fit
vExpec: variational expectation of the sampling
new()
constructor
covarDyadSampling_fit$new(partialNet, ...)
partialNet: an object with class partlyObservedNetwork representing the observed data with possibly missing entries
...: used for compatibility
clone()
The objects of this class are cloneable with this method.
covarDyadSampling_fit$clone(deep = FALSE)
deep: Whether to make a deep clone.
Class for fitting a node-centered sampling with covariates
missSBM::networkSampling -> missSBM::networkSamplingNodes_fit -> covarNodeSampling_fit
vExpec: variational expectation of the sampling
new()
constructor
covarNodeSampling_fit$new(partlyObservedNetwork, ...)
partlyObservedNetwork: an object with class partlyObservedNetwork representing the observed data with possibly missing entries
...: used for compatibility
clone()
The objects of this class are cloneable with this method.
covarNodeSampling_fit$clone(deep = FALSE)
deep: Whether to make a deep clone.
Class for defining a degree sampler
missSBM::networkSampling -> missSBM::networkSampler -> missSBM::nodeSampler -> degreeSampler
new()
constructor for networkSampling
degreeSampler$new(parameters = NA, degrees = NA, directed = FALSE)
parameters: the vector of parameters associated to the sampling at play
degrees: vector of nodes' degrees
directed: logical, directed network or not
clone()
The objects of this class are cloneable with this method.
degreeSampler$clone(deep = FALSE)
deep: Whether to make a deep clone.
Class for fitting a degree sampling
missSBM::networkSampling -> missSBM::networkSamplingNodes_fit -> degreeSampling_fit
vExpec: variational expectation of the sampling
new()
constructor
degreeSampling_fit$new(partlyObservedNetwork, blockInit, connectInit)
partlyObservedNetwork: an object with class partlyObservedNetwork representing the observed data with possibly missing entries
blockInit: n x Q matrix of initial block indicators
connectInit: Q x Q matrix of initial block probabilities of connection
update_parameters()
a method to update the estimation of the parameters. By default, nothing to do (corresponds to MAR sampling)
degreeSampling_fit$update_parameters(imputedNet, ...)
imputedNet: an adjacency matrix where missing values have been imputed
...: used for compatibility
update_imputation()
a method to update the imputation of the missing entries.
degreeSampling_fit$update_imputation(PI, ...)
PI: the matrix of inter/intra class probability of connection
...: used for compatibility
clone()
The objects of this class are cloneable with this method.
degreeSampling_fit$clone(deep = FALSE)
deep: Whether to make a deep clone.
Class for defining a double-standard sampler
missSBM::networkSampling -> missSBM::networkSampler -> missSBM::dyadSampler -> doubleStandardSampler
new()
constructor for networkSampling
doubleStandardSampler$new(parameters = NA, adjMatrix = NA, directed = FALSE)
parameters: the vector of parameters associated to the sampling at play
adjMatrix: matrix of adjacency
directed: logical, directed network or not
clone()
The objects of this class are cloneable with this method.
doubleStandardSampler$clone(deep = FALSE)
deep: Whether to make a deep clone.
Class for fitting a double-standard sampling
missSBM::networkSampling -> missSBM::networkSamplingDyads_fit -> doubleStandardSampling_fit
vExpec: variational expectation of the sampling
new()
constructor
doubleStandardSampling_fit$new(partlyObservedNetwork, ...)
partlyObservedNetwork: an object with class partlyObservedNetwork representing the observed data with possibly missing entries
...: used for compatibility
update_parameters()
a method to update the estimation of the parameters. By default, nothing to do (corresponds to MAR sampling)
doubleStandardSampling_fit$update_parameters(nu, ...)
nu: an adjacency matrix with imputed values (only)
...: used for compatibility
update_imputation()
a method to update the imputation of the missing entries.
doubleStandardSampling_fit$update_imputation(nu)
nu: the matrix of (uncorrected) imputation for missing entries
clone()
The objects of this class are cloneable with this method.
doubleStandardSampling_fit$clone(deep = FALSE)
deep: Whether to make a deep clone.
Virtual class for all dyad-centered samplers
missSBM::networkSampling -> missSBM::networkSampler -> dyadSampler
new()
constructor for networkSampling
dyadSampler$new(type = NA, parameters = NA, nbNodes = NA, directed = FALSE)
type: character for the type of sampling. Must be in ("dyad", "covar-dyad", "node", "covar-node", "block-node", "block-dyad", "double-standard", "degree")
parameters: the vector of parameters associated to the sampling at play
nbNodes: number of nodes in the network
directed: logical, directed network or not
clone()
The objects of this class are cloneable with this method.
dyadSampler$clone(deep = FALSE)
deep: Whether to make a deep clone.
Class for fitting a dyad sampling
missSBM::networkSampling -> missSBM::networkSamplingDyads_fit -> dyadSampling_fit
vExpec: variational expectation of the sampling
new()
constructor
dyadSampling_fit$new(partlyObservedNetwork, ...)
partlyObservedNetwork: an object with class partlyObservedNetwork representing the observed data with possibly missing entries
...: used for compatibility
clone()
The objects of this class are cloneable with this method.
dyadSampling_fit$clone(deep = FALSE)
deep: Whether to make a deep clone.
A dataset containing the weighted PPI network centered around the ESR1 (ER) protein
    er_network
A sparse symmetric matrix with 741 rows and 741 columns ESR1
    data("er_network")
    class(er_network)
Variational EM inference of Stochastic Block Models indexed by block number from a partially observed network.
    estimateMissSBM(
      adjacencyMatrix,
      vBlocks,
      sampling,
      covariates = list(),
      control = list()
    )
| Argument | Description |
|---|---|
| adjacencyMatrix | The N x N adjacency matrix of the network data. If |
| vBlocks | The vector of number of blocks considered in the collection. |
| sampling | The model used to describe the process that generates the missing data: MAR designs ("dyad", "node", "covar-dyad", "covar-node", "snowball") and MNAR designs ("double-standard", "block-dyad", "block-node", "degree") are available. See details. |
| covariates | An optional list with M entries (the M covariates). If the covariates are node-centered, each entry of |
| control | a list of parameters controlling advanced features. See details. |
Internal functions use future_lapply, so set your plan to 'multisession' or
'multicore' to use several cores/workers.
The list of parameters control tunes more advanced features, such as the
initialization, how covariates are handled in the model, and the variational EM algorithm:
useCov: logical. If covariates is not null, should they be used for the SBM inference (or just for the sampling)? Default is TRUE.
clusterInit: initial method for clustering: either a character ("spectral") or a list with length(vBlocks) vectors, each with size ncol(adjacencyMatrix), providing a user-defined clustering. Default is "spectral".
similarity: an R x R -> R function to compute similarities between node covariates. Default is l1_similarity, that is, -abs(x-y). Only relevant when the covariates are node-centered (i.e. covariates is a list of size-N vectors).
threshold: the V-EM algorithm stops when an optimization step changes the objective function or the parameters by less than threshold. Default is 1e-2.
maxIter: the V-EM algorithm stops when the number of iterations exceeds maxIter. Default is 50.
fixPointIter: number of fixed-point iterations in the V-E step. Default is 3.
exploration: character indicating the kind of exploration used among "forward", "backward", "both" or "none". Default is "both".
iterates: integer for the number of iterations during exploration. Only relevant when exploration is different from "none". Default is 1.
trace: logical for verbosity. Default is TRUE.
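As an illustration of how these entries fit together, the sketch below builds a control list that spells out the defaults listed above, except for trace, which is switched off. The similarity entry is left out so that its default applies. The call at the end is commented out because adjacencyMatrix stands for a user-supplied, partially observed network that is not defined here.

```r
# Control list spelling out the documented defaults, except 'trace',
# which is switched off here to keep the output quiet
control <- list(
  useCov       = TRUE,        # use covariates in the SBM inference
  clusterInit  = "spectral",  # initial clustering method
  threshold    = 1e-2,        # V-EM stopping tolerance
  maxIter      = 50,          # maximal number of V-EM iterations
  fixPointIter = 3,           # fixed-point iterations in the V-E step
  exploration  = "both",      # forward and backward exploration
  iterates     = 1,           # number of exploration rounds
  trace        = FALSE        # no progress messages
)

# Hypothetical call: 'adjacencyMatrix' is a placeholder for your own data
# collection <- missSBM::estimateMissSBM(
#   adjacencyMatrix, vBlocks = 1:4, sampling = "dyad", control = control
# )
```

Only the entries that differ from the defaults need to be supplied in practice; listing all of them simply makes the settings of a run explicit.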
The different sampling designs are split into two families in which we find dyad-centered and node-centered samplings. See doi:10.1080/01621459.2018.1562934 for a complete description.
Missing at Random (MAR)
"dyad": parameter = p = Prob(Dyad (i,j) is observed)
"node": parameter = p = Prob(Node i is observed)
"covar-dyad": parameter = beta in R^M, such that Prob(Dyad (i,j) is observed) = logistic(parameter' covarArray (i,j, .))
"covar-node": parameter = nu in R^M, such that Prob(Node i is observed) = logistic(parameter' covarMatrix (i,))
"snowball": parameter = number of waves, together with Prob(Node i is observed in the 1st wave)
Missing Not At Random (MNAR)
"double-standard": parameter = (p0,p1) with p0 = Prob(Dyad (i,j) is observed | the dyad is equal to 0), p1 = Prob(Dyad (i,j) is observed | the dyad is equal to 1)
"block-node": parameter = c(p(1),...,p(Q)) and p(q) = Prob(Node i is observed | node i is in cluster q)
"block-dyad": parameter = c(p(1,1),...,p(Q,Q)) and p(q,l) = Prob(Edge (i,j) is observed | node i is in cluster q and node j is in cluster l)
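To make the difference between a MAR and an MNAR dyad-centered design concrete, here is a minimal base R sketch. It is not the package's implementation: the toy network, the helper mask_dyads() and all variable names are assumptions introduced for illustration. Under the "dyad" design every dyad is observed with the same probability p, whereas under "double-standard" the probability depends on the value of the dyad itself.

```r
set.seed(1)
N <- 50

# Toy undirected binary network without loops (for illustration only)
A <- matrix(0, N, N)
A[upper.tri(A)] <- rbinom(N * (N - 1) / 2, 1, 0.2)
A <- A + t(A)

# Hypothetical helper: keep the dyads flagged TRUE, set the others to NA,
# symmetrically; the diagonal is never flagged, so it ends up as NA
mask_dyads <- function(A, observed_upper) {
  obs <- matrix(FALSE, nrow(A), ncol(A))
  obs[upper.tri(obs)] <- observed_upper
  obs <- obs | t(obs)
  out <- A
  out[!obs] <- NA
  out
}

n_dyads <- N * (N - 1) / 2
edges   <- A[upper.tri(A)]   # dyad values, same ordering as used in mask_dyads()

# MAR "dyad" design: every dyad is observed with probability p
p <- 0.75
A_dyad <- mask_dyads(A, runif(n_dyads) < p)

# MNAR "double-standard" design: the probability depends on the dyad's value
p0 <- 0.4  # Prob(observed | dyad = 0)
p1 <- 0.8  # Prob(observed | dyad = 1)
A_ds <- mask_dyads(A, runif(n_dyads) < ifelse(edges == 1, p1, p0))

# Share of observed dyads among present and absent edges under double-standard
obs_ds <- !is.na(A_ds[upper.tri(A_ds)])
rate_by_value <- tapply(obs_ds, edges, mean)
```

In the second design the missingness pattern carries information about the edges themselves, which is why such designs cannot be ignored during inference.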
Returns an R6 object with class missSBM_collection.
See also: observeNetwork, missSBM_collection and missSBM_fit.
    ## attach the package: provides observeNetwork(), estimateMissSBM() and the %>% pipe
    library(missSBM)

    ## SBM parameters
    N <- 100 # number of nodes
    Q <- 3 # number of clusters
    pi <- rep(1,Q)/Q # block proportion
    theta <- list(mean = diag(.45,Q) + .05 ) # connectivity matrix

    ## Sampling parameters
    samplingParameters <- .75 # the sampling rate
    sampling <- "dyad" # the sampling design

    ## generate an undirected binary SBM with no covariate
    sbm <- sbm::sampleSimpleSBM(N, pi, theta)

    ## Uncomment to set parallel computing with future
    ## future::plan("multicore", workers = 2)

    ## Sample some dyads data + Infer SBM with missing data
    collection <-
      observeNetwork(sbm$networkData, sampling, samplingParameters) %>%
      estimateMissSBM(vBlocks = 1:4, sampling = sampling)
    plot(collection, "monitoring")
    plot(collection, "icl")

    collection$ICL
    coef(collection$bestModel$fittedSBM, "connectivity")

    myModel <- collection$bestModel
    plot(myModel, "expected")
    plot(myModel, "imputed")
    plot(myModel, "meso")
    coef(myModel, "sampling")
    coef(myModel, "connectivity")
    predict(myModel)[1:5, 1:5]
Extracts model fitted values from objects missSBM_fit returned by estimateMissSBM()
    ## S3 method for class 'missSBM_fit'
    fitted(object, ...)
| Argument | Description |
|---|---|
| object | an R6 object with class missSBM_fit |
| ... | additional parameters for S3 compatibility. |
A matrix of estimated probabilities of connection
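For intuition about what such a matrix looks like, the sketch below computes connection probabilities under the usual SBM parameterisation, where the probability for a pair of nodes is read off the block connectivity matrix through the block memberships. This is an assumption about the general model, not a description of the package's internals; Z, theta and P are illustrative names with toy values.

```r
Q <- 3  # number of blocks
N <- 6  # number of nodes

# Hypothetical N x Q membership matrix (hard assignments, rows sum to 1)
Z <- diag(Q)[c(1, 1, 2, 2, 3, 3), ]

# Block connectivity matrix, same form as in the estimateMissSBM() example
theta <- diag(.45, Q) + .05

# N x N matrix of connection probabilities: P[i, j] = theta[block(i), block(j)]
P <- Z %*% theta %*% t(Z)
```

With soft (probabilistic) memberships the same product yields a membership-weighted average of the block connectivities.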
The French Political Blogosphere network dataset consists of a single-day snapshot of over 200 political blogs, automatically extracted on 14 October 2006 and manually classified by the "Observatoire Présidentielle" project. Originally part of the 'mixer' package.
    frenchblog2007
An igraph object with 196 nodes. The vertex attribute "party" provides a possible clustering of the nodes.
https://www.meltwater.com/en/suite/consumer-intelligence?utm_source=direct&utm_medium=linkfluence
    data(frenchblog2007)
    igraph::V(frenchblog2007)$party
    igraph::plot.igraph(frenchblog2007,
      vertex.color = factor(igraph::V(frenchblog2007)$party),
      vertex.label = NA
    )
Compute l1-similarity between two vectors
    l1_similarity(x, y)
| Argument | Description |
|---|---|
| x | a vector |
| y | a vector |
a vector equal to -abs(x-y)
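A short sketch of how such a similarity can turn a node-centered covariate into pairwise values. The function l1_sim below is a stand-in written from the documented definition, -abs(x-y), not the package's own source, and the covariate age is a hypothetical example.

```r
# Stand-in written from the documented definition, -abs(x - y)
l1_sim <- function(x, y) -abs(x - y)

# Hypothetical node covariate for 4 nodes
age <- c(20, 25, 40, 41)

# N x N matrix of pairwise similarities between nodes:
# 0 on the diagonal, more negative for more dissimilar nodes
S <- outer(age, age, l1_sim)
```

Any other vectorised function of two arguments could be passed to outer() in the same way, which is the shape expected of a user-defined similarity.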
The function estimateMissSBM() fits a collection of SBMs with missing data for
a varying number of blocks. These models, with class missSBM_fit, are stored in an instance
of an object with class missSBM_collection, described here.
Fields are accessed via active binding and cannot be changed by the user.
This class comes with a set of R6 methods, some of them being useful for the user and exported
as S3 methods. See the documentation for show() and print()
models: a list of models
ICL: the vector of Integrated Classification Likelihood (ICL) criteria associated to the models in the collection (the smaller, the better)
bestModel: the best model according to the ICL
vBlocks: a vector with the number of blocks
optimizationStatus: a data.frame summarizing the optimization process for all models
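Since a smaller ICL is better, selecting a model from these fields amounts to taking a minimum. The sketch below shows the idea on made-up numbers; the ICL values are hypothetical and do not come from any fit.

```r
# Hypothetical ICL values for models with 1 to 5 blocks (made-up numbers)
vBlocks <- 1:5
ICL <- c(5210.3, 4987.6, 4802.1, 4815.9, 4850.2)

best_index   <- which.min(ICL)        # the smaller, the better
best_nblocks <- vBlocks[best_index]   # number of blocks of the selected model

# With a fitted collection, the analogous lookup would presumably read
# collection$models[[which.min(collection$ICL)]]
```

The bestModel field gives direct access to the selected model, so the manual lookup is mainly useful for inspecting runner-up models.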
new()
constructor for missSBM_collection
missSBM_collection$new(partlyObservedNet, sampling, clusterInit, control)
partlyObservedNet: An object with class partlyObservedNetwork.
sampling: The sampling design for the modelling of missing data: MAR designs ("dyad", "node") and MNAR designs ("double-standard", "block-dyad", "block-node", "degree")
clusterInit: Initial clustering: a list of vectors, each with size ncol(adjacencyMatrix).
control: a list of parameters controlling advanced features. Only 'trace' and 'useCov' are relevant here. See estimateMissSBM() for details.
estimate()
method to launch the estimation of the collection of models
missSBM_collection$estimate(control)
control: a list of parameters controlling the variational EM algorithm. See details of function estimateMissSBM()
explore()
method for performing exploration of the ICL
missSBM_collection$explore(control)
control: a list of parameters controlling the exploration, similar to those found in the regular function estimateMissSBM()
plot()
plot method for missSBM_collection
missSBM_collection$plot(type = c("icl", "elbo", "monitoring"))
type: the type specifies the field to plot, either "icl", "elbo" or "monitoring". Default is "icl"
show()
show method for missSBM_collection
missSBM_collection$show()
print()
User friendly print method
missSBM_collection$print()
clone()
The objects of this class are cloneable with this method.
missSBM_collection$clone(deep = FALSE)
deep: Whether to make a deep clone.
    ## attach the package: provides estimateMissSBM() and the %>% pipe
    library(missSBM)

    ## Uncomment to set parallel computing with future
    ## future::plan("multicore", workers = 2)

    ## Sample 75% of dyads in French political Blogosphere's network data
    adjacencyMatrix <- missSBM::frenchblog2007 %>%
      igraph::delete.vertices(1:100) %>%
      igraph::as_adj () %>%
      missSBM::observeNetwork(sampling = "dyad", parameters = 0.75)
    collection <- estimateMissSBM(adjacencyMatrix, 1:5, sampling = "dyad")
    class(collection)
The function estimateMissSBM() fits a collection of SBMs for a varying number of blocks.
Each fitted SBM is an instance of an R6 object with class missSBM_fit, described here.
Fields are accessed via active binding and cannot be changed by the user.
This class comes with a set of R6 methods, some of them being useful for the user and exported
as S3 methods. See the documentation for show(), print(), fitted(), predict(), plot().
fittedSBM: the fitted SBM with class SimpleSBM_fit_noCov, SimpleSBM_fit_withCov or SimpleSBM_fit_MNAR inheriting from class sbm::SimpleSBM_fit
fittedSampling: the fitted sampling, inheriting from class networkSampling and corresponding fits
imputedNetwork: The network data as a matrix with NAs values imputed with the current model
monitoring: a list carrying information about the optimization process
entropyImputed: the entropy of the distribution of the imputed dyads
entropy: the entropy due to the distribution of the imputed dyads and of the clustering
vExpec: double: variational expectation of the complete log-likelihood
penalty: double, value of the penalty term in ICL
loglik: double: approximation of the log-likelihood (variational lower bound) reached
ICL: double: value of the integrated classification log-likelihood
new()
constructor for missSBM_fit
missSBM_fit$new(partlyObservedNet, netSampling, clusterInit, useCov = TRUE)
partlyObservedNet: An object with class partlyObservedNetwork.
netSampling: The sampling design for the modelling of missing data: MAR designs ("dyad", "node") and MNAR designs ("double-standard", "block-dyad", "block-node", "degree")
clusterInit: Initial clustering: a vector with size ncol(adjacencyMatrix), providing a user-defined clustering. The number of blocks is deduced from the number of levels in clusterInit.
useCov: logical. If covariates are present in partlyObservedNet, should they be used for the SBM inference, or just for the inference of the network sampling design? Default is TRUE.
doVEM()
a method to perform inference of the current missSBM fit with variational EM
missSBM_fit$doVEM(control = list(threshold = 0.01, maxIter = 100, fixPointIter = 3, trace = TRUE))
control: a list of parameters controlling the variational EM algorithm. See details of function estimateMissSBM()
show()
show method for missSBM_fit
missSBM_fit$show()
print()
User friendly print method
missSBM_fit$print()
clone()
The objects of this class are cloneable with this method.
missSBM_fit$clone(deep = FALSE)
deep: Whether to make a deep clone.
    ## attach the package: provides estimateMissSBM() and the %>% pipe
    library(missSBM)

    ## Sample 75% of dyads in French political Blogosphere's network data
    adjMatrix <- missSBM::frenchblog2007 %>%
      igraph::as_adj (sparse = FALSE) %>%
      missSBM::observeNetwork(sampling = "dyad", parameters = 0.75)
    collection <- estimateMissSBM(adjMatrix, 3:5, sampling = "dyad")
    my_missSBM_fit <- collection$bestModel
    class(my_missSBM_fit)
    plot(my_missSBM_fit, "imputed")
Definition of R6 Class 'networkSampling_sampler'
This class is used to define a sampling model for a network. Inherits from 'networkSampling'. Owns a rSampling method which takes an adjacency matrix as input and sends back an object with class partlyObservedNetwork.
missSBM::networkSampling -> networkSampler
samplingMatrix: a matrix of logical indicating observed entries
new()
constructor for networkSampling
networkSampler$new(type = NA, parameters = NA, nbNodes = NA, directed = FALSE)
type: character for the type of sampling. Must be in ("dyad", "covar-dyad", "node", "covar-node", "block-node", "block-dyad", "double-standard", "degree")
parameters: the vector of parameters associated to the sampling at play
nbNodes: number of nodes in the network
directed: logical, directed network or not
rSamplingMatrix()
a method for drawing a sampling matrix according to the current sampling design
networkSampler$rSamplingMatrix()
clone()
The objects of this class are cloneable with this method.
networkSampler$clone(deep = FALSE)
deep: Whether to make a deep clone.
Definition of R6 Class 'networkSampling'
This virtual class is the parent of all subtypes of networkSampling (either sampler or fit). It is used to define a sampling model for a network. It has a rSampling method which takes an adjacency matrix as input and sends back an object with class partlyObservedNetwork.
type: a character for the type of sampling
parameters: the vector of parameters associated with the sampling at play
df: the number of entries in the vector of parameters
new()
constructor for networkSampling
networkSampling$new(type = NA, parameters = NA)
type: character for the type of sampling. Must be in ("dyad", "covar-dyad", "node", "covar-node", "block-node", "block-dyad", "double-standard", "degree")
parameters: the vector of parameters associated to the sampling at play
show()
show method
networkSampling$show(type = paste0(private$name, "-model for network sampling\n"))
type: character used to specify the type of sampling
print()
User friendly print method
networkSampling$print()
clone()
The objects of this class are cloneable with this method.
networkSampling$clone(deep = FALSE)
deep: Whether to make a deep clone.
Virtual class used to define a family of networkSamplingDyads_fit
missSBM::networkSampling -> networkSamplingDyads_fit
penalty: double, value of the penalty term in ICL
log_lambda: double, term for adjusting the imputation step which depends on the type of sampling
new()
constructor for networkSampling_fit
networkSamplingDyads_fit$new(partlyObservedNetwork, name)
partlyObservedNetwork: an object with class partlyObservedNetwork representing the observed data with possibly missing entries
name: a character for the name of sampling to fit on the partlyObservedNetwork
show()
show method
networkSamplingDyads_fit$show()
update_parameters()
a method to update the estimation of the parameters. By default, nothing to do (corresponds to MAR sampling)
networkSamplingDyads_fit$update_parameters(...)
...: used for compatibility
update_imputation()
a method to update the imputation of the missing entries.
networkSamplingDyads_fit$update_imputation(nu)
nu: the matrix of (uncorrected) imputation for missing entries
clone()
The objects of this class are cloneable with this method.
networkSamplingDyads_fit$clone(deep = FALSE)
deep: Whether to make a deep clone.
Virtual class used to define a family of networkSamplingNodes_fit
missSBM::networkSampling -> networkSamplingNodes_fit
penalty: double, value of the penalty term in ICL
log_lambda: double, term for adjusting the imputation step which depends on the type of sampling
new()
constructor
networkSamplingNodes_fit$new(partlyObservedNetwork, name)
partlyObservedNetwork: an object with class partlyObservedNetwork representing the observed data with possibly missing entries
name: a character for the name of sampling to fit on the partlyObservedNetwork
show()
show method
networkSamplingNodes_fit$show()
update_parameters()
a method to update the estimation of the parameters. By default, nothing to do (corresponds to MAR sampling)
networkSamplingNodes_fit$update_parameters(...)
...: used for compatibility
update_imputation()
a method to update the imputation of the missing entries.
networkSamplingNodes_fit$update_imputation(nu)
nu: the matrix of (uncorrected) imputation for missing entries
clone()
The objects of this class are cloneable with this method.
networkSamplingNodes_fit$clone(deep = FALSE)
deep: Whether to make a deep clone.
Virtual class for all node-centered samplers
missSBM::networkSampling -> missSBM::networkSampler -> nodeSampler
clone()
The objects of this class are cloneable with this method.
nodeSampler$clone(deep = FALSE)
deep: Whether to make a deep clone.
Class for fitting a node sampling
missSBM::networkSampling -> missSBM::networkSamplingNodes_fit -> nodeSampling_fit
vExpec: variational expectation of the sampling
new()
constructor
nodeSampling_fit$new(partlyObservedNetwork, ...)
partlyObservedNetwork: an object with class partlyObservedNetwork representing the observed data with possibly missing entries
...: used for compatibility
clone()
The objects of this class are cloneable with this method.
nodeSampling_fit$clone(deep = FALSE)
deep: Whether to make a deep clone.
This function draws observations in an adjacency matrix according to a given network sampling design.
    observeNetwork(
      adjacencyMatrix,
      sampling,
      parameters,
      clusters = NULL,
      covariates = list(),
      similarity = l1_similarity,
      intercept = 0
    )
| Argument | Description |
|---|---|
| adjacencyMatrix | The N x N adjacency matrix of the network to sample. |
| sampling | The sampling design used to observe the adjacency matrix, see details. |
| parameters | The sampling parameters (adapted to each sampling, see details). |
| clusters | An optional clustering membership vector of the nodes. Only necessary for block samplings. |
| covariates | An optional list with M entries (the M covariates). If the covariates are node-centered, each entry of |
| similarity | An optional function to compute similarities between node covariates. Default is l1_similarity |
| intercept | An optional intercept term to be added in case of the presence of covariates. Default is 0. |
Internal functions use future_lapply, so set your plan to 'multisession' or
'multicore' to use several cores/workers.
The list of parameters control tunes more advanced features, such as the
initialization, how covariates are handled in the model, and the variational EM algorithm:
useCov logical. If covariates is not null, should they be used for the
for the SBM inference (or just for the sampling)? Default is TRUE.
clusterInit Initial method for clustering: either a character ("spectral")
or a list with length(vBlocks) vectors, each with size ncol(adjacencyMatrix),
providing a user-defined clustering. Default is "spectral".
similarity An R x R -> R function to compute similarities between node covariates. Default is
l1_similarity, that is, -abs(x-y). Only relevant when the covariates are node-centered
(i.e. covariates is a list of size-N vectors).
threshold V-EM algorithm stops stop when an optimization step changes the objective function or the parameters by less than threshold. Default is 1e-2.
maxIter V-EM algorithm stops when the number of iteration exceeds maxIter. Default is 50.
fixPointIter number of fix-point iterations in the V-E step. Default is 3.
exploration character indicating the kind of exploration used among "forward", "backward", "both" or "none". Default is "both".
iterates integer for the number of iterations during exploration. Only relevant when exploration is different from "none". Default is 1.
trace logical for verbosity. Default is TRUE.
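As a minimal sketch, the entries listed above can be collected in a plain R list. The values below are the defaults quoted in the text; the `similarity` entry is left out because its default (`l1_similarity`) lives in the package. Whether a partial list is merged with the defaults internally is not stated here, so the sketch builds the full list explicitly and overrides a few entries with `utils::modifyList`.

```r
# Defaults as quoted in the text above (the list itself is only an illustration)
default_control <- list(
  useCov       = TRUE,
  clusterInit  = "spectral",
  threshold    = 1e-2,
  maxIter      = 50,
  fixPointIter = 3,
  exploration  = "both",
  iterates     = 1,
  trace        = TRUE
)

# Override only what differs: tighter tolerance, more iterations, no logging
control <- utils::modifyList(
  default_control,
  list(threshold = 1e-3, maxIter = 200, trace = FALSE)
)
str(control)
```

The resulting list would then be passed as the `control` argument of the fitting function.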
The different sampling designs are split into two families in which we find dyad-centered and node-centered samplings. See doi:10.1080/01621459.2018.1562934 for a complete description.
Missing at Random (MAR)
dyad parameter = p = Prob(Dyad(i,j) is observed)
node parameter = p = Prob(Node i is observed)
covar-dyad parameter = beta in R^M, such that Prob(Dyad (i,j) is observed) = logistic(parameter' covarArray (i,j, .))
covar-node parameter = nu in R^M such that Prob(Node i is observed) = logistic(parameter' covarMatrix (i, .))
snowball parameter = (number of waves, Prob(Node i is observed in the 1st wave))
Missing Not At Random (MNAR)
double-standard parameter = (p0,p1) with p0 = Prob(Dyad (i,j) is observed | the dyad is equal to 0), p1 = Prob(Dyad (i,j) is observed | the dyad is equal to 1)
block-node parameter = c(p(1),...,p(Q)) and p(q) = Prob(Node i is observed | node i is in cluster q)
block-dyad parameter = c(p(1,1),...,p(Q,Q)) and p(q,l) = Prob(Edge (i,j) is observed | node i is in cluster q and node j is in cluster l)
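The designs above can be mimicked in base R on a toy network. This is a sketch under stated assumptions, not the package's implementation: the helpers `sample_dyads`, `sample_nodes` and `hide` are hypothetical names, the probabilities are arbitrary, and a node-centered design is taken to reveal every dyad that involves an observed node.

```r
set.seed(42)
N <- 6
clusters <- rep(1:2, each = N / 2)   # toy block memberships

# toy undirected binary network (illustrative data, not drawn from an SBM)
A <- matrix(0, N, N)
ut <- upper.tri(A)
A[ut] <- rbinom(sum(ut), 1, 0.4)
A <- A + t(A)

# hypothetical helper: symmetric observation mask from a matrix of probabilities
sample_dyads <- function(prob) {
  keep <- matrix(FALSE, N, N)
  keep[ut] <- runif(sum(ut)) < prob[ut]
  keep | t(keep)
}
# hypothetical helper: a dyad is seen as soon as one of its endpoints is observed
sample_nodes <- function(prob) {
  seen <- runif(N) < prob
  outer(seen, seen, "|")
}
# unobserved dyads become NA
hide <- function(A, observed) {
  A[!observed] <- NA
  A
}

observed <- list(
  "dyad"            = sample_dyads(matrix(0.5, N, N)),        # MAR, p = 0.5
  "node"            = sample_nodes(rep(0.5, N)),              # MAR, p = 0.5
  "double-standard" = sample_dyads(ifelse(A == 1, 0.9, 0.2)), # MNAR, (p0, p1) = (0.2, 0.9)
  "block-node"      = sample_nodes(c(0.2, 0.9)[clusters])     # MNAR, one p per cluster
)
partial <- lapply(observed, hide, A = A)

# share of missing dyads under each design
sapply(partial, function(X) mean(is.na(X[ut])))
```

In the two MNAR designs the chance of a dyad being hidden depends on quantities the analyst is trying to learn (the dyad value or the block), which is why the sampling has to be modeled jointly with the SBM.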
an adjacency matrix with the same dimension as the input, yet with additional NAs.
    ## SBM parameters
    N <- 300 # number of nodes
    Q <- 3   # number of clusters
    pi <- rep(1, Q) / Q # block proportion
    theta <- list(mean = diag(.45, Q) + .05) # connectivity matrix

    ## simulate an undirected binary SBM without covariate
    sbm <- sbm::sampleSimpleSBM(N, pi, theta)

    ## Sample network data
    # some sampling designs and their associated parameters
    sampling_parameters <- list(
      "dyad" = .3,
      "node" = .3,
      "double-standard" = c(0.4, 0.8),
      "block-node" = c(.3, .8, .5),
      "block-dyad" = theta$mean,
      "degree" = c(.01, .01),
      "snowball" = c(2, .1)
    )

    observed_networks <- list()
    for (sampling in names(sampling_parameters)) {
      observed_networks[[sampling]] <-
        missSBM::observeNetwork(
          adjacencyMatrix = sbm$networkData,
          sampling = sampling,
          parameters = sampling_parameters[[sampling]],
          clusters = sbm$memberships
        )
    }
An R6 Class used for internal representation of a partially observed network
This class is not exported to the user
samplingRate: The percentage of observed dyads
nbNodes: The number of nodes
nbDyads: The number of dyads
is_directed: logical indicating if the network is directed or not
networkData: The adjacency matrix of the network
covarArray: the array of covariates
covarMatrix: the matrix of covariates
samplingMatrix: matrix of observed and non-observed edges
samplingMatrixBar: matrix of observed and non-observed edges
observedNodes: a vector of observed and non-observed nodes (observed means at least one non NA value)
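To make the fields above concrete, here is a base R sketch that derives comparable quantities from a small adjacency matrix with NAs. It is not the class's own code: the variable names simply mirror the field names, the rate is returned as a fraction, and an undirected network is assumed to have N(N-1)/2 dyads.

```r
# toy undirected network: only dyads (1,2) and (2,3) are observed
A <- matrix(NA_real_, 4, 4)
A[1, 2] <- A[2, 1] <- 1
A[2, 3] <- A[3, 2] <- 0

nbNodes <- ncol(A)
is_directed <- !identical(A, t(A))
nbDyads <- if (is_directed) nbNodes * (nbNodes - 1) else nbNodes * (nbNodes - 1) / 2

# TRUE where a dyad is observed; self-loops are ignored
samplingMatrix <- !is.na(A)
diag(samplingMatrix) <- FALSE

nbObserved <- if (is_directed) {
  sum(samplingMatrix)
} else {
  sum(samplingMatrix[upper.tri(samplingMatrix)])
}
samplingRate <- nbObserved / nbDyads

# a node counts as observed when it has at least one non-NA value
observedNodes <- rowSums(!is.na(A)) > 0
```

Here node 4 has no observed dyad at all, so it is the only entry of `observedNodes` that is `FALSE`.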
new()
constructor
partlyObservedNetwork$new( adjacencyMatrix, covariates = list(), similarity = l1_similarity )
adjacencyMatrix: The adjacency matrix of the network
covariates: A list with M entries (the M covariates), each of which is either a size-N vector or an N x N matrix.
similarity: An R x R -> R function to compute similarities between node covariates. Default is l1_similarity, that is, -abs(x-y).
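A short sketch of how a similarity function can turn node-centered covariates into dyad-level ones. The function `l1_sim` below is a local stand-in that reproduces the formula quoted above, -abs(x-y); the covariate values are made up, and stacking the results in an N x N x M array is an assumption about the layout rather than a documented fact.

```r
# local stand-in for the default similarity quoted above
l1_sim <- function(x, y) -abs(x - y)

# two node-centered covariates on N = 3 nodes (illustrative values)
covariates <- list(c(0.1, 0.5, 0.9), c(1, 3, 2))
N <- length(covariates[[1]])
M <- length(covariates)

# one N x N similarity matrix per covariate
covar_dyad <- outer(covariates[[1]], covariates[[1]], l1_sim)

# all covariates stacked in an N x N x M array
covarArray <- array(
  unlist(lapply(covariates, function(v) outer(v, v, l1_sim))),
  dim = c(N, N, M)
)
dim(covarArray)
```

Any symmetric function of two scalars could replace `l1_sim`; with this choice, identical nodes get similarity 0 and more distant nodes get more negative values.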
clustering()
method to cluster network data with missing value
partlyObservedNetwork$clustering( vBlocks, imputation = ifelse(is.null(private$phi), "median", "average") )
vBlocks: The vector of number of blocks considered in the collection.
imputation: character indicating the type of imputation among "median", "average"
imputation()
basic imputation from existing clustering
partlyObservedNetwork$imputation(type = c("median", "average", "zero"))
type: a character, the type of imputation. Either "median", "average" or "zero"
clone()
The objects of this class are cloneable with this method.
partlyObservedNetwork$clone(deep = FALSE)
deep: Whether to make a deep clone.
missSBM_fit
Plot function for the various fields of a missSBM_fit: the fitted
SBM (network or connectivity), and a plot monitoring the optimization.
    ## S3 method for class 'missSBM_fit'
    plot(
      x,
      type = c("imputed", "expected", "meso", "monitoring"),
      dimLabels = list(row = "node", col = "node"),
      ...
    )
x |
an object with class |
type |
the type specifies the field to plot, either "imputed", "expected", "meso", or "monitoring" |
dimLabels |
: a list of two characters specifying the labels of the nodes. Default to |
... |
additional parameters for S3 compatibility. Not used |
a ggplot object
Prediction of a missSBM_fit (i.e. network with imputed missing dyads)
    ## S3 method for class 'missSBM_fit'
    predict(object, ...)
object |
an R6 object with class |
... |
additional parameters for S3 compatibility. |
an adjacency matrix between pairs of nodes. Missing dyads are imputed with their expected values, i.e. by their estimated probabilities of connection under the missing SBM.
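The imputation by expected values can be sketched in base R for an SBM without covariates. The names `tau` (posterior block memberships) and `theta` (block connection probabilities) and all numbers are hypothetical; this shows the idea, not the method's actual code.

```r
# hypothetical fitted quantities for N = 4 nodes and Q = 2 blocks
tau <- matrix(c(0.9, 0.1,
                0.8, 0.2,
                0.1, 0.9,
                0.2, 0.8), ncol = 2, byrow = TRUE)   # posterior memberships
theta <- matrix(c(0.7, 0.1,
                  0.1, 0.6), 2, 2)                   # connection probabilities

# partially observed adjacency matrix
A <- matrix(c( 0,  1, NA,  0,
               1,  0,  0, NA,
              NA,  0,  0,  1,
               0, NA,  1,  0), 4, 4, byrow = TRUE)

# expected connection probability for every pair of nodes
expected <- tau %*% theta %*% t(tau)

# keep observed dyads, fill the missing ones with their expectation
imputed <- A
imputed[is.na(A)] <- expected[is.na(A)]
imputed
```

Observed dyads are left untouched, so the result mixes binary observations with probabilities in (0, 1) at the positions that were missing.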
Class for defining a simple dyad sampler
missSBM::networkSampling -> missSBM::networkSampler -> missSBM::dyadSampler -> simpleDyadSampler
new()
constructor for networkSampling
simpleDyadSampler$new( parameters = NA, nbNodes = NA, directed = FALSE, covarArray = NULL, intercept = 0 )
parameters: the vector of parameters associated to the sampling at play
nbNodes: number of nodes in the network
directed: logical, directed network or not
covarArray: an array of covariates used
intercept: double, intercept term used to compute the probability of sampling in the presence of covariates. Default 0.
clone()
The objects of this class are cloneable with this method.
simpleDyadSampler$clone(deep = FALSE)
deep: Whether to make a deep clone.
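When covariates are present, the probability of observing a dyad follows a logistic model in the dyad covariates. The sketch below illustrates that computation in base R; the covariate array and `beta` are made up, and adding the intercept to the linear predictor is an assumption about how that argument enters the model.

```r
set.seed(1)
N <- 5   # nodes
M <- 2   # covariates

# toy symmetric dyad covariates stored in an N x N x M array
covarArray <- array(rnorm(N * N * M), dim = c(N, N, M))
for (m in seq_len(M)) {
  covarArray[, , m] <- (covarArray[, , m] + t(covarArray[, , m])) / 2
}

beta <- c(1, -0.5)   # illustrative sampling parameters
intercept <- 0

# linear predictor per dyad: intercept + sum_m beta[m] * covarArray[i, j, m]
eta <- intercept + apply(covarArray, c(1, 2), function(v) sum(v * beta))
prob <- stats::plogis(eta)   # logistic function

# draw the observation mask on the upper triangle, then symmetrise
ut <- upper.tri(prob)
observed <- matrix(FALSE, N, N)
observed[ut] <- runif(sum(ut)) < prob[ut]
observed <- observed | t(observed)
```

A positive `beta[m]` makes dyads with a large m-th covariate more likely to be observed, and the intercept shifts the overall sampling rate.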
Class for defining a simple node sampler
missSBM::networkSampling -> missSBM::networkSampler -> missSBM::nodeSampler -> simpleNodeSampler
new()
constructor for networkSampling
simpleNodeSampler$new( parameters = NA, nbNodes = NA, directed = FALSE, covarMatrix = NULL, intercept = 0 )
parameters: the vector of parameters associated to the sampling at play
nbNodes: number of nodes in the network
directed: logical, directed network or not
covarMatrix: a matrix of covariates used
intercept: double, intercept term used to compute the probability of sampling in the presence of covariates. Default 0.
clone()
The objects of this class are cloneable with this method.
simpleNodeSampler$clone(deep = FALSE)
deep: Whether to make a deep clone.
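For node-centered sampling with covariates, each node gets its own observation probability. The following base R sketch assumes an N x M covariate matrix and a parameter vector `nu` (both made up), adds the intercept to the linear predictor, and takes a dyad to be observed as soon as one of its endpoints is.

```r
set.seed(2)
N <- 6   # nodes
M <- 2   # covariates

covarMatrix <- matrix(rnorm(N * M), N, M)   # toy node covariates
nu <- c(0.8, -1.2)                          # illustrative sampling parameters
intercept <- 0

# Prob(node i is observed) = logistic(intercept + covarMatrix[i, ] %*% nu)
prob_node <- stats::plogis(intercept + drop(covarMatrix %*% nu))
observed_nodes <- runif(N) < prob_node

# observing a node reveals all the dyads it takes part in (assumption)
samplingMatrix <- outer(observed_nodes, observed_nodes, "|")
```

Compared with dyad-centered sampling, missing entries appear as whole blocks: the dyads between two unobserved nodes are all hidden at once.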
This internal class is designed to adjust a binary Stochastic Block Model in the context of missSBM.
It is not designed to be called by the user
sbm::SBM -> sbm::SimpleSBM -> SimpleSBM_fit
type: the type of SBM (distribution of edges values, network type, presence of covariates)
penalty: double, value of the penalty term in ICL
entropy: double, value of the entropy due to the clustering distribution
loglik: double, approximation of the log-likelihood (variational lower bound) reached
ICL: double, value of the integrated classification log-likelihood
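The relation between these quantities can be sketched with the textbook formulas for an undirected binary SBM without covariates. Everything below is an assumption made for illustration: `tau` and `vExpec` are made-up values, the penalty counts Q - 1 proportions and Q(Q+1)/2 connection parameters, and the sign and scaling of the ICL in the package may differ from this convention.

```r
# hypothetical variational memberships for N = 4 nodes and Q = 2 blocks
tau <- matrix(c(0.9, 0.1,
                0.8, 0.2,
                0.1, 0.9,
                0.2, 0.8), ncol = 2, byrow = TRUE)
N <- nrow(tau)
Q <- ncol(tau)
vExpec <- -5.2   # made-up expected complete log-likelihood

# entropy of the clustering distribution
entropy <- -sum(tau * log(tau))

# variational lower bound on the log-likelihood
loglik <- vExpec + entropy

# BIC-like penalty: proportions on N nodes, connectivity on N(N-1)/2 dyads
nbDyads <- N * (N - 1) / 2
penalty <- (Q - 1) * log(N) + Q * (Q + 1) / 2 * log(nbDyads)

# textbook ICL (larger is better under this convention)
ICL_textbook <- vExpec - 0.5 * penalty
```

Because the entropy is non-negative, the lower bound is never smaller than the expected complete log-likelihood; the ICL drops the entropy term, which favors well-separated clusterings.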
new()
constructor for simpleSBM_fit for missSBM purpose
SimpleSBM_fit$new(networkData, clusterInit, covarList = list())
networkData: a structure storing the network under missing data conditions, either a matrix possibly with NA, or a missSBM:::partlyObservedNetwork
clusterInit: Initial clustering, a vector with size ncol(adjacencyMatrix), providing a user-defined clustering with nbBlocks levels.
covarList: An optional list with M entries (the M covariates).
doVEM()
method to perform estimation via variational EM
SimpleSBM_fit$doVEM( threshold = 0.01, maxIter = 100, fixPointIter = 3, trace = FALSE )
threshold: stop when an optimization step changes the objective function by less than threshold. Default is 0.01.
maxIter: V-EM algorithm stops when the number of iterations exceeds maxIter. Default is 100.
fixPointIter: number of fix-point iterations in the Variational E step. Default is 3.
trace: logical for verbosity. Default is FALSE.
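How `threshold`, `maxIter` and `fixPointIter` interact can be seen in a self-contained sketch of a variational EM for an undirected binary SBM in which missing dyads are simply skipped (a MAR-type treatment). This is not the class's implementation: `vem_sketch` is a hypothetical function, hierarchical clustering stands in for the initialization, and the stopping rule uses the relative change of the objective.

```r
set.seed(1)
N <- 60; Q <- 2
z <- rep(1:Q, each = N / Q)                        # true blocks (toy data)
P <- matrix(c(0.6, 0.05, 0.05, 0.6), Q, Q)[z, z]
ut <- upper.tri(P)
A <- matrix(0, N, N)
A[ut] <- rbinom(sum(ut), 1, P[ut])
A <- A + t(A)
hidden <- matrix(FALSE, N, N)
hidden[ut] <- runif(sum(ut)) < 0.3                 # 30% of dyads missing at random
A[hidden | t(hidden)] <- NA
diag(A) <- NA

R <- 1 * !is.na(A)          # 1 where the dyad is observed
Y <- A; Y[is.na(A)] <- 0    # observed edges, 0 elsewhere

vem_sketch <- function(Y, R, clusterInit, threshold = 0.01, maxIter = 100,
                       fixPointIter = 3) {
  eps <- 1e-10
  tau <- pmax(diag(max(clusterInit))[clusterInit, , drop = FALSE], eps)
  tau <- tau / rowSums(tau)
  objective <- -Inf
  for (iter in seq_len(maxIter)) {
    # M-step: block proportions and connection probabilities
    pi_hat <- colMeans(tau)
    theta <- (t(tau) %*% Y %*% tau) / pmax(t(tau) %*% R %*% tau, eps)
    theta <- pmin(pmax(theta, eps), 1 - eps)
    # VE-step: a few fixed-point updates of the memberships
    for (k in seq_len(fixPointIter)) {
      log_tau <- Y %*% tau %*% t(log(theta)) +
        (R - Y) %*% tau %*% t(log(1 - theta))
      log_tau <- sweep(log_tau, 2, log(pi_hat), "+")
      log_tau <- log_tau - apply(log_tau, 1, max)   # numerical stability
      tau <- pmax(exp(log_tau), eps)
      tau <- tau / rowSums(tau)
    }
    # variational lower bound (each undirected dyad counted once)
    new_objective <-
      0.5 * sum((t(tau) %*% Y %*% tau) * log(theta) +
                (t(tau) %*% (R - Y) %*% tau) * log(1 - theta)) +
      sum(tau %*% log(pi_hat)) - sum(tau * log(tau))
    converged <- abs(new_objective - objective) < threshold * abs(new_objective)
    objective <- new_objective
    if (converged) break
  }
  list(tau = tau, theta = theta, pi = pi_hat,
       objective = objective, iterations = iter)
}

init <- cutree(hclust(dist(Y), method = "ward.D2"), k = Q)
fit <- vem_sketch(Y, R, init)
table(truth = z, estimated = max.col(fit$tau))
```

A larger `fixPointIter` makes each VE-step more accurate at a higher cost per iteration, while `threshold` and `maxIter` bound the total number of outer iterations.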
reorder()
permute group labels by order of decreasing probability
SimpleSBM_fit$reorder()
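Relabeling has to be applied consistently to every quantity indexed by blocks. The sketch below orders the groups by decreasing block proportion, which is only one possible reading of "decreasing probability"; all names and values are hypothetical.

```r
# hypothetical fitted quantities for Q = 3 blocks
pi_hat <- c(0.2, 0.5, 0.3)                      # block proportions
theta <- matrix(c(0.8, 0.1, 0.2,
                  0.1, 0.6, 0.3,
                  0.2, 0.3, 0.4), 3, 3)         # connection probabilities
memberships <- c(1, 2, 2, 3, 2, 3, 1)

# new order of the old labels, largest block first
ord <- order(pi_hat, decreasing = TRUE)

pi_new <- pi_hat[ord]
theta_new <- theta[ord, ord]                    # permute rows and columns together
memberships_new <- match(memberships, ord)      # old label -> new label
```

Permuting rows and columns of `theta` with the same index keeps the model unchanged: only the names of the groups move.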
clone()
The objects of this class are cloneable with this method.
SimpleSBM_fit$clone(deep = FALSE)
deep: Whether to make a deep clone.
This internal class is designed to adjust a binary Stochastic Block Model in the context of missSBM.
It is not designed to be called by the user
sbm::SBM -> sbm::SimpleSBM -> missSBM::SimpleSBM_fit -> missSBM::SimpleSBM_fit_noCov -> SimpleSBM_MNAR_noCov
imputation: the matrix of imputed values
vExpec: double, variational approximation of the expectation of the complete log-likelihood
new()
constructor for simpleSBM_fit for missSBM purpose
SimpleSBM_fit_MNAR$new(networkData, clusterInit)
networkData: a structure storing the network under missing data conditions, either a matrix possibly with NA, or a missSBM:::partlyObservedNetwork
clusterInit: Initial clustering, a vector with size ncol(adjacencyMatrix), providing a user-defined clustering with nbBlocks levels.
update_parameters()
update parameters estimation (M-step)
SimpleSBM_fit_MNAR$update_parameters(nu = NULL)
nu: currently imputed values
update_blocks()
update variational estimation of blocks (VE-step)
SimpleSBM_fit_MNAR$update_blocks(log_lambda = 0)
log_lambda: additional sampling-dependent term used to de-bias the estimation of tau
clone()
The objects of this class are cloneable with this method.
SimpleSBM_fit_MNAR$clone(deep = FALSE)
deep: Whether to make a deep clone.
This internal class is designed to adjust a binary Stochastic Block Model in the context of missSBM.
It is not designed to be called by the user
sbm::SBM -> sbm::SimpleSBM -> missSBM::SimpleSBM_fit -> SimpleSBM_fit_noCov
imputation: the matrix of imputed values
vExpec: double, variational approximation of the expectation of the complete log-likelihood
vExpec_corrected: double, variational approximation of the expectation of the complete log-likelihood, with a correction to be comparable with MNAR criteria
update_parameters()
update parameters estimation (M-step)
SimpleSBM_fit_noCov$update_parameters(...)
...: additional arguments, only required for MNAR cases
update_blocks()
update variational estimation of blocks (VE-step)
SimpleSBM_fit_noCov$update_blocks(...)
...: additional arguments, only required for MNAR cases
clone()
The objects of this class are cloneable with this method.
SimpleSBM_fit_noCov$clone(deep = FALSE)
deep: Whether to make a deep clone.
This internal class is designed to adjust a binary Stochastic Block Model in the context of missSBM.
It is not designed to be called by the user
sbm::SBM -> sbm::SimpleSBM -> missSBM::SimpleSBM_fit -> SimpleSBM_fit_withCov
imputation: the matrix of imputed values
vExpec: double, variational approximation of the expectation of the complete log-likelihood
vExpec_corrected: double, variational approximation of the expectation of the complete log-likelihood, with a correction to be comparable with MNAR criteria
update_parameters()
update parameters estimation (M-step)
SimpleSBM_fit_withCov$update_parameters(...)
...: used for compatibility
control: a list to tune nlopt for optimization, see documentation of nloptr
update_blocks()
update variational estimation of blocks (VE-step)
SimpleSBM_fit_withCov$update_blocks(...)
...: used for compatibility
clone()
The objects of this class are cloneable with this method.
SimpleSBM_fit_withCov$clone(deep = FALSE)
deep: Whether to make a deep clone.
Class for defining a snowball sampler
missSBM::networkSampling -> missSBM::networkSampler -> missSBM::nodeSampler -> snowballSampler
new()
constructor for networkSampling
snowballSampler$new(parameters = NA, adjacencyMatrix = NA, directed = FALSE)
parameters: the vector of parameters associated to the sampling at play
adjacencyMatrix: the adjacency matrix of the network
directed: logical, directed network or not
clone()
The objects of this class are cloneable with this method.
snowballSampler$clone(deep = FALSE)
deep: Whether to make a deep clone.
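A snowball design can be sketched in base R as follows. The sketch assumes that the parameters are a number of waves and a first-wave inclusion probability (as in the `c(2, .1)` value used for "snowball" in the `observeNetwork` example), that each new wave adds the neighbours of the nodes already sampled, and that every dyad involving a sampled node is observed. The toy network and all names are illustrative.

```r
set.seed(3)
N <- 30
# toy undirected network (illustrative data)
A <- matrix(0, N, N)
ut <- upper.tri(A)
A[ut] <- rbinom(sum(ut), 1, 0.1)
A <- A + t(A)

n_waves <- 2     # number of waves
p_first <- 0.1   # Prob(node is observed in the 1st wave)

# first wave: independent draws
observed <- runif(N) < p_first
wave_sizes <- sum(observed)

# following waves: neighbours of already observed nodes join the sample
for (w in seq_len(n_waves - 1)) {
  neighbours <- colSums(A[observed, , drop = FALSE]) > 0
  observed <- observed | neighbours
  wave_sizes <- c(wave_sizes, sum(observed))
}

# dyads involving at least one sampled node are observed
samplingMatrix <- outer(observed, observed, "|")
A_obs <- A
A_obs[!samplingMatrix] <- NA
wave_sizes
```

Because later waves follow existing edges, well-connected nodes are more likely to end up in the sample than isolated ones.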
missSBM_fit
Summary method for a missSBM_fit
    ## S3 method for class 'missSBM_fit'
    summary(object, ...)
object |
an R6 object with class |
... |
additional parameters for S3 compatibility. |
a basic printing output
This dataset contains two networks where the nodes are countries. An
edge in the network "belligerent" means that the two countries have been at
war at least once between 1816 and 2007, while an edge in the network "alliance"
means that the two countries have had a formal alliance between 1816 and 2012.
The network belligerent has fewer nodes since countries which have not been at
war are not considered.
war
A list with two igraph objects, alliance and belligerent.
Each graph has three attributes: 'name' (the country name), 'power' (a score related to military power: the higher, the better) and
'trade' (a score related to the trade effort between pairs of countries).
Networks were extracted from https://correlatesofwar.org/
Sarkees, Meredith Reid and Frank Wayman (2010). Resort to War: 1816 - 2007. Washington DC: CQ Press.
Gibler, Douglas M. 2009. International military alliances, 1648-2008. CQ Press
    data(war)
    class(war$belligerent)
    igraph::gorder(war$alliance)
    igraph::gorder(war$belligerent)
    igraph::edges(war$alliance)
    igraph::get.graph.attribute(war$alliance)