Title: | Handling Missing Data in Stochastic Block Models |
---|---|
Description: | When a network is partially observed (here, NAs in the adjacency matrix rather than 1 or 0 due to missing information between node pairs), it is possible to account for the underlying process that generates those NAs. 'missSBM', presented in 'Barbillon, Chiquet and Tabouy' (2022) <doi:10.18637/jss.v101.i12>, adjusts the popular stochastic block model from network data sampled under various missing data conditions, as described in 'Tabouy, Barbillon and Chiquet' (2019) <doi:10.1080/01621459.2018.1562934>. |
Authors: | Julien Chiquet [aut, cre] , Pierre Barbillon [aut] , Timothée Tabouy [aut], Jean-Benoist Léger [ctb] (provided C++ implementaion of K-means), François Gindraud [ctb] (provided C++ interface to NLopt), großBM team [ctb] |
Maintainer: | Julien Chiquet <[email protected]> |
License: | GPL-3 |
Version: | 1.0.4 |
Built: | 2024-11-02 05:15:32 UTC |
Source: | https://github.com/grosssbm/misssbm |
Class for defining a block dyad sampler
Class for defining a block dyad sampler
missSBM::networkSampling
-> missSBM::networkSampler
-> missSBM::dyadSampler
-> blockDyadSampler
df
the number of parameters of this sampling
new()
constructor for networkSampling
blockDyadSampler$new( parameters = NA, nbNodes = NA, directed = FALSE, clusters = NA )
parameters
the vector of parameters associated to the sampling at play
nbNodes
number of nodes in the network
directed
logical, directed network of not
clusters
a vector of class memberships
clone()
The objects of this class are cloneable with this method.
blockDyadSampler$clone(deep = FALSE)
deep
Whether to make a deep clone.
Class for fitting a block-dyad sampling
Class for fitting a block-dyad sampling
missSBM::networkSampling
-> missSBM::networkSamplingDyads_fit
-> blockDyadSampling_fit
vExpec
variational expectation of the sampling
log_lambda
matrix, term for adjusting the imputation step which depends on the type of sampling
new()
constructor
blockDyadSampling_fit$new(partlyObservedNetwork, blockInit)
partlyObservedNetwork
a object with class partlyObservedNetwork representing the observed data with possibly missing entries
blockInit
n x Q matrix of initial block indicators
update_parameters()
a method to update the estimation of the parameters. By default, nothing to do (corresponds to MAR sampling)
blockDyadSampling_fit$update_parameters(nu, Z)
nu
the matrix of (uncorrected) imputation for missing entries
Z
probabilities of block memberships
clone()
The objects of this class are cloneable with this method.
blockDyadSampling_fit$clone(deep = FALSE)
deep
Whether to make a deep clone.
Class for defining a block node sampler
Class for defining a block node sampler
missSBM::networkSampling
-> missSBM::networkSampler
-> missSBM::nodeSampler
-> blockNodeSampler
new()
constructor for networkSampling
blockNodeSampler$new( parameters = NA, nbNodes = NA, directed = FALSE, clusters = NA )
parameters
the vector of parameters associated to the sampling at play
nbNodes
number of nodes in the network
directed
logical, directed network of not
clusters
a vector of class memberships
clone()
The objects of this class are cloneable with this method.
blockNodeSampler$clone(deep = FALSE)
deep
Whether to make a deep clone.
Class for fitting a block-node sampling
Class for fitting a block-node sampling
missSBM::networkSampling
-> missSBM::networkSamplingNodes_fit
-> blockNodeSampling_fit
vExpec
variational expectation of the sampling
log_lambda
double, term for adjusting the imputation step which depends on the type of sampling
new()
constructor
blockNodeSampling_fit$new(partlyObservedNetwork, blockInit)
partlyObservedNetwork
a object with class partlyObservedNetwork representing the observed data with possibly missing entries
blockInit
n x Q matrix of initial block indicators
update_parameters()
a method to update the estimation of the parameters. By default, nothing to do (corresponds to MAR sampling)
blockNodeSampling_fit$update_parameters(imputedNet, Z)
imputedNet
an adjacency matrix where missing values have been imputed
Z
indicator of blocks
clone()
The objects of this class are cloneable with this method.
blockNodeSampling_fit$clone(deep = FALSE)
deep
Whether to make a deep clone.
Extracts model coefficients from objects missSBM_fit
returned by estimateMissSBM()
## S3 method for class 'missSBM_fit' coef( object, type = c("mixture", "connectivity", "covariates", "sampling"), ... )
## S3 method for class 'missSBM_fit' coef( object, type = c("mixture", "connectivity", "covariates", "sampling"), ... )
object |
an R6 object with class |
type |
type of parameter that should be extracted. Either "mixture" (default), "connectivity", "covariates" or "sampling" |
... |
additional parameters for S3 compatibility. Not used |
A vector or matrix of coefficients extracted from the missSBM_fit model.
Class for fitting a dyad sampling with covariates
Class for fitting a dyad sampling with covariates
missSBM::networkSampling
-> missSBM::networkSamplingDyads_fit
-> covarDyadSampling_fit
vExpec
variational expectation of the sampling
new()
constructor
covarDyadSampling_fit$new(partialNet, ...)
partialNet
a object with class partlyObservedNetwork representing the observed data with possibly missing entries
...
used for compatibility
clone()
The objects of this class are cloneable with this method.
covarDyadSampling_fit$clone(deep = FALSE)
deep
Whether to make a deep clone.
Class for fitting a node-centered sampling with covariate
Class for fitting a node-centered sampling with covariate
missSBM::networkSampling
-> missSBM::networkSamplingNodes_fit
-> covarNodeSampling_fit
vExpec
variational expectation of the sampling
new()
constructor
covarNodeSampling_fit$new(partlyObservedNetwork, ...)
partlyObservedNetwork
a object with class partlyObservedNetwork representing the observed data with possibly missing entries
...
used for compatibility
clone()
The objects of this class are cloneable with this method.
covarNodeSampling_fit$clone(deep = FALSE)
deep
Whether to make a deep clone.
Class for defining a degree sampler
Class for defining a degree sampler
missSBM::networkSampling
-> missSBM::networkSampler
-> missSBM::nodeSampler
-> degreeSampler
new()
constructor for networkSampling
degreeSampler$new(parameters = NA, degrees = NA, directed = FALSE)
parameters
the vector of parameters associated to the sampling at play
degrees
vector of nodes' degrees
directed
logical, directed network of not
clone()
The objects of this class are cloneable with this method.
degreeSampler$clone(deep = FALSE)
deep
Whether to make a deep clone.
Class for fitting a degree sampling
Class for fitting a degree sampling
missSBM::networkSampling
-> missSBM::networkSamplingNodes_fit
-> degreeSampling_fit
vExpec
variational expectation of the sampling
new()
constructor
degreeSampling_fit$new(partlyObservedNetwork, blockInit, connectInit)
partlyObservedNetwork
a object with class partlyObservedNetwork representing the observed data with possibly missing entries
blockInit
n x Q matrix of initial block indicators
connectInit
Q x Q matrix of initial block probabilities of connection
update_parameters()
a method to update the estimation of the parameters. By default, nothing to do (corresponds to MAR sampling)
degreeSampling_fit$update_parameters(imputedNet, ...)
imputedNet
an adjacency matrix where missing values have been imputed
...
used for compatibility
update_imputation()
a method to update the imputation of the missing entries.
degreeSampling_fit$update_imputation(PI, ...)
PI
the matrix of inter/intra class probability of connection
...
use for compatibility
clone()
The objects of this class are cloneable with this method.
degreeSampling_fit$clone(deep = FALSE)
deep
Whether to make a deep clone.
Class for defining a double-standard sampler
Class for defining a double-standard sampler
missSBM::networkSampling
-> missSBM::networkSampler
-> missSBM::dyadSampler
-> doubleStandardSampler
new()
constructor for networkSampling
doubleStandardSampler$new(parameters = NA, adjMatrix = NA, directed = FALSE)
parameters
the vector of parameters associated to the sampling at play
adjMatrix
matrix of adjacency
directed
logical, directed network of not
clone()
The objects of this class are cloneable with this method.
doubleStandardSampler$clone(deep = FALSE)
deep
Whether to make a deep clone.
Class for fitting a double-standard sampling
Class for fitting a double-standard sampling
missSBM::networkSampling
-> missSBM::networkSamplingDyads_fit
-> doubleStandardSampling_fit
vExpec
variational expectation of the sampling
new()
constructor
doubleStandardSampling_fit$new(partlyObservedNetwork, ...)
partlyObservedNetwork
a object with class partlyObservedNetwork representing the observed data with possibly missing entries
...
used for compatibility
update_parameters()
a method to update the estimation of the parameters. By default, nothing to do (corresponds to MAR sampling)
doubleStandardSampling_fit$update_parameters(nu, ...)
nu
an adjacency matrix with imputed values (only)
...
use for compatibility
update_imputation()
a method to update the imputation of the missing entries.
doubleStandardSampling_fit$update_imputation(nu)
nu
the matrix of (uncorrected) imputation for missing entries
clone()
The objects of this class are cloneable with this method.
doubleStandardSampling_fit$clone(deep = FALSE)
deep
Whether to make a deep clone.
Virtual class for all dyad-centered samplers
Virtual class for all dyad-centered samplers
missSBM::networkSampling
-> missSBM::networkSampler
-> dyadSampler
new()
constructor for networkSampling
dyadSampler$new(type = NA, parameters = NA, nbNodes = NA, directed = FALSE)
type
character for the type of sampling. must be in ("dyad", "covar-dyad", "node", "covar-node", "block-node", "block-dyad", "double-standard", "degree")
parameters
the vector of parameters associated to the sampling at play
nbNodes
number of nodes in the network
directed
logical, directed network of not
clone()
The objects of this class are cloneable with this method.
dyadSampler$clone(deep = FALSE)
deep
Whether to make a deep clone.
Class for fitting a dyad sampling
Class for fitting a dyad sampling
missSBM::networkSampling
-> missSBM::networkSamplingDyads_fit
-> dyadSampling_fit
vExpec
variational expectation of the sampling
new()
constructor
dyadSampling_fit$new(partlyObservedNetwork, ...)
partlyObservedNetwork
a object with class partlyObservedNetwork representing the observed data with possibly missing entries
...
used for compatibility
clone()
The objects of this class are cloneable with this method.
dyadSampling_fit$clone(deep = FALSE)
deep
Whether to make a deep clone.
A dataset containing the weighted PPI network centered around the ESR1 (ER) protein
er_network
er_network
A sparse symmetric matrix with 741 rows and 741 columns ESR1
data("er_network") class(er_network)
data("er_network") class(er_network)
Variational EM inference of Stochastic Block Models indexed by block number from a partially observed network.
estimateMissSBM( adjacencyMatrix, vBlocks, sampling, covariates = list(), control = list() )
estimateMissSBM( adjacencyMatrix, vBlocks, sampling, covariates = list(), control = list() )
adjacencyMatrix |
The N x N adjacency matrix of the network data. If |
vBlocks |
The vector of number of blocks considered in the collection. |
sampling |
The model used to described the process that originates the missing data: MAR designs ("dyad", "node","covar-dyad","covar-node","snowball") and MNAR designs ("double-standard", "block-dyad", "block-node" , "degree") are available. See details. |
covariates |
An optional list with M entries (the M covariates). If the covariates are node-centered, each entry of |
control |
a list of parameters controlling advanced features. See details. |
Internal functions use future_lapply
, so set your plan to 'multisession'
or
'multicore'
to use several cores/workers.
The list of parameters control
tunes more advanced features, such as the
initialization, how covariates are handled in the model, and the variational EM algorithm:
useCov logical. If covariates
is not null, should they be used for the
for the SBM inference (or just for the sampling)? Default is TRUE.
clusterInit Initial method for clustering: either a character ("spectral")
or a list with length(vBlocks)
vectors, each with size ncol(adjacencyMatrix)
,
providing a user-defined clustering. Default is "spectral".
similarity An R x R -> R function to compute similarities between node covariates. Default is
l1_similarity
, that is, -abs(x-y). Only relevant when the covariates are node-centered
(i.e. covariates
is a list of size-N vectors).
threshold V-EM algorithm stops stop when an optimization step changes the objective function or the parameters by less than threshold. Default is 1e-2.
maxIter V-EM algorithm stops when the number of iteration exceeds maxIter. Default is 50.
fixPointIter number of fix-point iterations in the V-E step. Default is 3.
exploration character indicating the kind of exploration used among "forward", "backward", "both" or "none". Default is "both".
iterates integer for the number of iterations during exploration. Only relevant when exploration
is different from "none". Default is 1.
trace logical for verbosity. Default is TRUE.
The different sampling designs are split into two families in which we find dyad-centered and node-centered samplings. See doi:10.1080/01621459.2018.1562934 for a complete description.
Missing at Random (MAR)
dyad parameter = p = Prob(Dyad(i,j) is observed)
node parameter = p = Prob(Node i is observed)
covar-dyad": parameter = beta in R^M, such that Prob(Dyad (i,j) is observed) = logistic(parameter' covarArray (i,j, .))
covar-node": parameter = nu in R^M such that Prob(Node i is observed) = logistic(parameter' covarMatrix (i,)
snowball": parameter = number of waves with Prob(Node i is observed in the 1st wave)
Missing Not At Random (MNAR)
double-standard parameter = (p0,p1) with p0 = Prob(Dyad (i,j) is observed | the dyad is equal to 0), p1 = Prob(Dyad (i,j) is observed | the dyad is equal to 1)
block-node parameter = c(p(1),...,p(Q)) and p(q) = Prob(Node i is observed | node i is in cluster q)
block-dyad parameter = c(p(1,1),...,p(Q,Q)) and p(q,l) = Prob(Edge (i,j) is observed | node i is in cluster q and node j is in cluster l)
Returns an R6 object with class missSBM_collection
.
observeNetwork
, missSBM_collection
and missSBM_fit
.
## SBM parameters N <- 100 # number of nodes Q <- 3 # number of clusters pi <- rep(1,Q)/Q # block proportion theta <- list(mean = diag(.45,Q) + .05 ) # connectivity matrix ## Sampling parameters samplingParameters <- .75 # the sampling rate sampling <- "dyad" # the sampling design ## generate a undirected binary SBM with no covariate sbm <- sbm::sampleSimpleSBM(N, pi, theta) ## Uncomment to set parallel computing with future ## future::plan("multicore", workers = 2) ## Sample some dyads data + Infer SBM with missing data collection <- observeNetwork(sbm$networkData, sampling, samplingParameters) %>% estimateMissSBM(vBlocks = 1:4, sampling = sampling) plot(collection, "monitoring") plot(collection, "icl") collection$ICL coef(collection$bestModel$fittedSBM, "connectivity") myModel <- collection$bestModel plot(myModel, "expected") plot(myModel, "imputed") plot(myModel, "meso") coef(myModel, "sampling") coef(myModel, "connectivity") predict(myModel)[1:5, 1:5]
## SBM parameters N <- 100 # number of nodes Q <- 3 # number of clusters pi <- rep(1,Q)/Q # block proportion theta <- list(mean = diag(.45,Q) + .05 ) # connectivity matrix ## Sampling parameters samplingParameters <- .75 # the sampling rate sampling <- "dyad" # the sampling design ## generate a undirected binary SBM with no covariate sbm <- sbm::sampleSimpleSBM(N, pi, theta) ## Uncomment to set parallel computing with future ## future::plan("multicore", workers = 2) ## Sample some dyads data + Infer SBM with missing data collection <- observeNetwork(sbm$networkData, sampling, samplingParameters) %>% estimateMissSBM(vBlocks = 1:4, sampling = sampling) plot(collection, "monitoring") plot(collection, "icl") collection$ICL coef(collection$bestModel$fittedSBM, "connectivity") myModel <- collection$bestModel plot(myModel, "expected") plot(myModel, "imputed") plot(myModel, "meso") coef(myModel, "sampling") coef(myModel, "connectivity") predict(myModel)[1:5, 1:5]
missSBM_fit
, return by estimateMissSBM()
Extract model fitted values from object missSBM_fit
, return by estimateMissSBM()
## S3 method for class 'missSBM_fit' fitted(object, ...)
## S3 method for class 'missSBM_fit' fitted(object, ...)
object |
an R6 object with class |
... |
additional parameters for S3 compatibility. |
A matrix of estimated probabilities of connection
French Political Blogosphere network dataset consists of a single day snapshot of over 200 political blogs automatically extracted the 14 October 2006 and manually classified by the "Observatoire Présidentielle" project. Originally part of the 'mixer' package
frenchblog2007
frenchblog2007
An igraph object with 196 nodes. The vertex attribute "party" provides a possible clustering of the nodes.
https://www.meltwater.com/en/suite/consumer-intelligence?utm_source=direct&utm_medium=linkfluence
data(frenchblog2007) igraph::V(frenchblog2007)$party igraph::plot.igraph(frenchblog2007, vertex.color = factor(igraph::V(frenchblog2007)$party), vertex.label = NA )
data(frenchblog2007) igraph::V(frenchblog2007)$party igraph::plot.igraph(frenchblog2007, vertex.color = factor(igraph::V(frenchblog2007)$party), vertex.label = NA )
Compute l1-similarity between two vectors
l1_similarity(x, y)
l1_similarity(x, y)
x |
a vector |
y |
a vector |
a vector equal to -abs(x-y)
The function estimateMissSBM()
fits a collection of SBM with missing data for
a varying number of block. These models with class missSBM_fit
are stored in an instance
of an object with class missSBM_collection
, described here.
Fields are accessed via active binding and cannot be changed by the user.
This class comes with a set of R6 methods, some of them being useful for the user and exported
as S3 methods. See the documentation for show()
and print()
models
a list of models
ICL
the vector of Integrated Classification Criterion (ICL) associated to the models in the collection (the smaller, the better)
bestModel
the best model according to the ICL
vBlocks
a vector with the number of blocks
optimizationStatus
a data.frame summarizing the optimization process for all models
new()
constructor for networkSampling
missSBM_collection$new(partlyObservedNet, sampling, clusterInit, control)
partlyObservedNet
An object with class partlyObservedNetwork
.
sampling
The sampling design for the modelling of missing data: MAR designs ("dyad", "node") and MNAR designs ("double-standard", "block-dyad", "block-node" ,"degree")
clusterInit
Initial clustering: a list of vectors, each with size ncol(adjacencyMatrix)
.
control
a list of parameters controlling advanced features. Only 'trace' and 'useCov' are relevant here. See estimateMissSBM()
for details.
estimate()
method to launch the estimation of the collection of models
missSBM_collection$estimate(control)
control
a list of parameters controlling the variational EM algorithm. See details of function estimateMissSBM()
explore()
method for performing exploration of the ICL
missSBM_collection$explore(control)
control
a list of parameters controlling the exploration, similar to those found in the regular function estimateMissSBM()
plot()
plot method for missSBM_collection
missSBM_collection$plot(type = c("icl", "elbo", "monitoring"))
type
the type specifies the field to plot, either "icl", "elbo" or "monitoring". Default is "icl"
show()
show method for missSBM_collection
missSBM_collection$show()
print()
User friendly print method
missSBM_collection$print()
clone()
The objects of this class are cloneable with this method.
missSBM_collection$clone(deep = FALSE)
deep
Whether to make a deep clone.
## Uncomment to set parallel computing with future ## future::plan("multicore", workers = 2) ## Sample 75% of dyads in French political Blogosphere's network data adjacencyMatrix <- missSBM::frenchblog2007 %>% igraph::delete.vertices(1:100) %>% igraph::as_adj () %>% missSBM::observeNetwork(sampling = "dyad", parameters = 0.75) collection <- estimateMissSBM(adjacencyMatrix, 1:5, sampling = "dyad") class(collection)
## Uncomment to set parallel computing with future ## future::plan("multicore", workers = 2) ## Sample 75% of dyads in French political Blogosphere's network data adjacencyMatrix <- missSBM::frenchblog2007 %>% igraph::delete.vertices(1:100) %>% igraph::as_adj () %>% missSBM::observeNetwork(sampling = "dyad", parameters = 0.75) collection <- estimateMissSBM(adjacencyMatrix, 1:5, sampling = "dyad") class(collection)
The function estimateMissSBM()
fits a collection of SBM for varying number of block.
Each fitted SBM is an instance of an R6 object with class missSBM_fit
, described here.
Fields are accessed via active binding and cannot be changed by the user.
This class comes with a set of R6 methods, some of them being useful for the user and exported
as S3 methods. See the documentation for show()
, print()
, fitted()
, predict()
, plot()
.
fittedSBM
the fitted SBM with class SimpleSBM_fit_noCov
, SimpleSBM_fit_withCov
or
SimpleSBM_fit_MNAR
inheriting from class sbm::SimpleSBM_fit
fittedSampling
the fitted sampling, inheriting from class networkSampling
and corresponding fits
imputedNetwork
The network data as a matrix with NAs values imputed with the current model
monitoring
a list carrying information about the optimization process
entropyImputed
the entropy of the distribution of the imputed dyads
entropy
the entropy due to the distribution of the imputed dyads and of the clustering
vExpec
double: variational expectation of the complete log-likelihood
penalty
double, value of the penalty term in ICL
loglik
double: approximation of the log-likelihood (variational lower bound) reached
ICL
double: value of the integrated classification log-likelihood
new()
constructor for networkSampling
missSBM_fit$new(partlyObservedNet, netSampling, clusterInit, useCov = TRUE)
partlyObservedNet
An object with class partlyObservedNetwork
.
netSampling
The sampling design for the modelling of missing data: MAR designs ("dyad", "node") and MNAR designs ("double-standard", "block-dyad", "block-node" ,"degree")
clusterInit
Initial clustering: a vector with size ncol(adjacencyMatrix)
, providing a user-defined clustering. The number of blocks is deduced from the number of levels in with clusterInit
.
useCov
logical. If covariates are present in partlyObservedNet, should they be used for the inference or of the network sampling design, or just for the SBM inference? default is TRUE.
doVEM()
a method to perform inference of the current missSBM fit with variational EM
missSBM_fit$doVEM( control = list(threshold = 0.01, maxIter = 100, fixPointIter = 3, trace = TRUE) )
control
a list of parameters controlling the variational EM algorithm. See details of function estimateMissSBM()
show()
show method for missSBM_fit
missSBM_fit$show()
print()
User friendly print method
missSBM_fit$print()
clone()
The objects of this class are cloneable with this method.
missSBM_fit$clone(deep = FALSE)
deep
Whether to make a deep clone.
## Sample 75% of dyads in French political Blogosphere's network data adjMatrix <- missSBM::frenchblog2007 %>% igraph::as_adj (sparse = FALSE) %>% missSBM::observeNetwork(sampling = "dyad", parameters = 0.75) collection <- estimateMissSBM(adjMatrix, 3:5, sampling = "dyad") my_missSBM_fit <- collection$bestModel class(my_missSBM_fit) plot(my_missSBM_fit, "imputed")
## Sample 75% of dyads in French political Blogosphere's network data adjMatrix <- missSBM::frenchblog2007 %>% igraph::as_adj (sparse = FALSE) %>% missSBM::observeNetwork(sampling = "dyad", parameters = 0.75) collection <- estimateMissSBM(adjMatrix, 3:5, sampling = "dyad") my_missSBM_fit <- collection$bestModel class(my_missSBM_fit) plot(my_missSBM_fit, "imputed")
Definition of R6 Class 'networkSampling_sampler'
Definition of R6 Class 'networkSampling_sampler'
This class is use to define a sampling model for a network. Inherits from 'networkSampling'. Owns a rSampling method which takes an adjacency matrix as an input and send back an object with class partlyObservedNetwork.
missSBM::networkSampling
-> networkSampler
samplingMatrix
a matrix of logical indicating observed entries
new()
constructor for networkSampling
networkSampler$new(type = NA, parameters = NA, nbNodes = NA, directed = FALSE)
type
character for the type of sampling. must be in ("dyad", "covar-dyad", "node", "covar-node", "block-node", "block-dyad", "double-standard", "degree")
parameters
the vector of parameters associated to the sampling at play
nbNodes
number of nodes in the network
directed
logical, directed network of not
rSamplingMatrix()
a method for drawing a sampling matrix according to the current sampling design
networkSampler$rSamplingMatrix()
clone()
The objects of this class are cloneable with this method.
networkSampler$clone(deep = FALSE)
deep
Whether to make a deep clone.
Definition of R6 Class 'networkSampling'
Definition of R6 Class 'networkSampling'
this virtual class is the mother of all subtypes of networkSampling (either sampler or fit) It is used to define a sampling model for a network. It has a rSampling method which takes an adjacency matrix as an input and send back an object with class partlyObservedNetwork.
type
a character for the type of sampling
parameters
the vector of parameters associated with the sampling at play
df
the number of entries in the vector of parameters
new()
constructor for networkSampling
networkSampling$new(type = NA, parameters = NA)
type
character for the type of sampling. must be in ("dyad", "covar-dyad", "node", "covar-node", "block-node", "block-dyad", "double-standard", "degree")
parameters
the vector of parameters associated to the sampling at play
show()
show method
networkSampling$show( type = paste0(private$name, "-model for network sampling\n") )
type
character used to specify the type of sampling
print()
User friendly print method
networkSampling$print()
clone()
The objects of this class are cloneable with this method.
networkSampling$clone(deep = FALSE)
deep
Whether to make a deep clone.
Virtual class used to define a family of networkSamplingDyads_fit
Virtual class used to define a family of networkSamplingDyads_fit
missSBM::networkSampling
-> networkSamplingDyads_fit
penalty
double, value of the penalty term in ICL
log_lambda
double, term for adjusting the imputation step which depends on the type of sampling
new()
constructor for networkSampling_fit
networkSamplingDyads_fit$new(partlyObservedNetwork, name)
partlyObservedNetwork
a object with class partlyObservedNetwork representing the observed data with possibly missing entries
name
a character for the name of sampling to fit on the partlyObservedNetwork
show()
show method
networkSamplingDyads_fit$show()
update_parameters()
a method to update the estimation of the parameters. By default, nothing to do (corresponds to MAR sampling)
networkSamplingDyads_fit$update_parameters(...)
...
use for compatibility
update_imputation()
a method to update the imputation of the missing entries.
networkSamplingDyads_fit$update_imputation(nu)
nu
the matrix of (uncorrected) imputation for missing entries
clone()
The objects of this class are cloneable with this method.
networkSamplingDyads_fit$clone(deep = FALSE)
deep
Whether to make a deep clone.
Virtual class used to define a family of networkSamplingNodes_fit
Virtual class used to define a family of networkSamplingNodes_fit
missSBM::networkSampling
-> networkSamplingNodes_fit
penalty
double, value of the penalty term in ICL
log_lambda
double, term for adjusting the imputation step which depends on the type of sampling
new()
constructor
networkSamplingNodes_fit$new(partlyObservedNetwork, name)
partlyObservedNetwork
a object with class partlyObservedNetwork representing the observed data with possibly missing entries
name
a character for the name of sampling to fit on the partlyObservedNetwork
show()
show method
networkSamplingNodes_fit$show()
update_parameters()
a method to update the estimation of the parameters. By default, nothing to do (corresponds to MAR sampling)
networkSamplingNodes_fit$update_parameters(...)
...
use for compatibility
update_imputation()
a method to update the imputation of the missing entries.
networkSamplingNodes_fit$update_imputation(nu)
nu
the matrix of (uncorrected) imputation for missing entries
clone()
The objects of this class are cloneable with this method.
networkSamplingNodes_fit$clone(deep = FALSE)
deep
Whether to make a deep clone.
Virtual class for all node-centered samplers
Virtual class for all node-centered samplers
missSBM::networkSampling
-> missSBM::networkSampler
-> nodeSampler
clone()
The objects of this class are cloneable with this method.
nodeSampler$clone(deep = FALSE)
deep
Whether to make a deep clone.
Class for fitting a node sampling
Class for fitting a node sampling
missSBM::networkSampling
-> missSBM::networkSamplingNodes_fit
-> nodeSampling_fit
vExpec
variational expectation of the sampling
new()
constructor
nodeSampling_fit$new(partlyObservedNetwork, ...)
partlyObservedNetwork
a object with class partlyObservedNetwork representing the observed data with possibly missing entries
...
used for compatibility
clone()
The objects of this class are cloneable with this method.
nodeSampling_fit$clone(deep = FALSE)
deep
Whether to make a deep clone.
This function draws observations in an adjacency matrix according to a given network sampling design.
observeNetwork( adjacencyMatrix, sampling, parameters, clusters = NULL, covariates = list(), similarity = l1_similarity, intercept = 0 )
observeNetwork( adjacencyMatrix, sampling, parameters, clusters = NULL, covariates = list(), similarity = l1_similarity, intercept = 0 )
adjacencyMatrix |
The N x N adjacency matrix of the network to sample. |
sampling |
The sampling design used to observe the adjacency matrix, see details. |
parameters |
The sampling parameters (adapted to each sampling, see details). |
clusters |
An optional clustering membership vector of the nodes. Only necessary for block samplings. |
covariates |
An optional list with M entries (the M covariates). If the covariates are node-centered,
each entry of |
similarity |
An optional function to compute similarities between node covariates. Default is
|
intercept |
An optional intercept term to be added in case of the presence of covariates. Default is 0. |
Internal functions use future_lapply
, so set your plan to 'multisession'
or
'multicore'
to use several cores/workers.
The list of parameters control
tunes more advanced features, such as the
initialization, how covariates are handled in the model, and the variational EM algorithm:
useCov logical. If covariates
is not null, should they be used for the
for the SBM inference (or just for the sampling)? Default is TRUE.
clusterInit Initial method for clustering: either a character ("spectral")
or a list with length(vBlocks)
vectors, each with size ncol(adjacencyMatrix)
,
providing a user-defined clustering. Default is "spectral".
similarity An R x R -> R function to compute similarities between node covariates. Default is
l1_similarity
, that is, -abs(x-y). Only relevant when the covariates are node-centered
(i.e. covariates
is a list of size-N vectors).
threshold V-EM algorithm stops stop when an optimization step changes the objective function or the parameters by less than threshold. Default is 1e-2.
maxIter V-EM algorithm stops when the number of iteration exceeds maxIter. Default is 50.
fixPointIter number of fix-point iterations in the V-E step. Default is 3.
exploration character indicating the kind of exploration used among "forward", "backward", "both" or "none". Default is "both".
iterates integer for the number of iterations during exploration. Only relevant when exploration
is different from "none". Default is 1.
trace logical for verbosity. Default is TRUE.
The different sampling designs are split into two families in which we find dyad-centered and node-centered samplings. See doi:10.1080/01621459.2018.1562934 for a complete description.
Missing at Random (MAR)
dyad parameter = p = Prob(Dyad(i,j) is observed)
node parameter = p = Prob(Node i is observed)
covar-dyad": parameter = beta in R^M, such that Prob(Dyad (i,j) is observed) = logistic(parameter' covarArray (i,j, .))
covar-node": parameter = nu in R^M such that Prob(Node i is observed) = logistic(parameter' covarMatrix (i,)
snowball": parameter = number of waves with Prob(Node i is observed in the 1st wave)
Missing Not At Random (MNAR)
double-standard parameter = (p0,p1) with p0 = Prob(Dyad (i,j) is observed | the dyad is equal to 0), p1 = Prob(Dyad (i,j) is observed | the dyad is equal to 1)
block-node parameter = c(p(1),...,p(Q)) and p(q) = Prob(Node i is observed | node i is in cluster q)
block-dyad parameter = c(p(1,1),...,p(Q,Q)) and p(q,l) = Prob(Edge (i,j) is observed | node i is in cluster q and node j is in cluster l)
an adjacency matrix with the same dimension as the input, yet with additional NAs.
## SBM parameters N <- 300 # number of nodes Q <- 3 # number of clusters pi <- rep(1,Q)/Q # block proportion theta <- list(mean = diag(.45,Q) + .05 ) # connectivity matrix ## simulate an unidrected binary SBM without covariate sbm <- sbm::sampleSimpleSBM(N, pi, theta) ## Sample network data # some sampling design and their associated parameters sampling_parameters <- list( "dyad" = .3, "node" = .3, "double-standard" = c(0.4, 0.8), "block-node" = c(.3, .8, .5), "block-dyad" = theta$mean, "degree" = c(.01, .01), "snowball" = c(2,.1) ) observed_networks <- list() for (sampling in names(sampling_parameters)) { observed_networks[[sampling]] <- missSBM::observeNetwork( adjacencyMatrix = sbm$networkData, sampling = sampling, parameters = sampling_parameters[[sampling]], cluster = sbm$memberships ) }
## SBM parameters N <- 300 # number of nodes Q <- 3 # number of clusters pi <- rep(1,Q)/Q # block proportion theta <- list(mean = diag(.45,Q) + .05 ) # connectivity matrix ## simulate an unidrected binary SBM without covariate sbm <- sbm::sampleSimpleSBM(N, pi, theta) ## Sample network data # some sampling design and their associated parameters sampling_parameters <- list( "dyad" = .3, "node" = .3, "double-standard" = c(0.4, 0.8), "block-node" = c(.3, .8, .5), "block-dyad" = theta$mean, "degree" = c(.01, .01), "snowball" = c(2,.1) ) observed_networks <- list() for (sampling in names(sampling_parameters)) { observed_networks[[sampling]] <- missSBM::observeNetwork( adjacencyMatrix = sbm$networkData, sampling = sampling, parameters = sampling_parameters[[sampling]], cluster = sbm$memberships ) }
An R6 Class used for internal representation of a partially observed network
An R6 Class used for internal representation of a partially observed network
This class is not exported to the user
samplingRate
The percentage of observed dyads
nbNodes
The number of nodes
nbDyads
The number of dyads
is_directed
logical indicating if the network is directed or not
networkData
The adjacency matrix of the network
covarArray
the array of covariates
covarMatrix
the matrix of covariates
samplingMatrix
matrix of observed and non-observed edges
samplingMatrixBar
matrix of observed and non-observed edges
observedNodes
a vector of observed and non-observed nodes (observed means at least one non NA value)
new()
constructor
partlyObservedNetwork$new( adjacencyMatrix, covariates = list(), similarity = l1_similarity )
adjacencyMatrix
The adjacency matrix of the network
covariates
A list with M entries (the M covariates), each of whom being either a size-N vector or N x N matrix.
similarity
An R x R -> R function to compute similarities between node covariates. Default is l1_similarity
, that is, -abs(x-y).
clustering()
method to cluster network data with missing value
partlyObservedNetwork$clustering( vBlocks, imputation = ifelse(is.null(private$phi), "median", "average") )
vBlocks
The vector of number of blocks considered in the collection.
imputation
character indicating the type of imputation among "median", "average"
imputation()
basic imputation from existing clustering
partlyObservedNetwork$imputation(type = c("median", "average", "zero"))
type
a character, the type of imputation. Either "median" or "average"
clone()
The objects of this class are cloneable with this method.
partlyObservedNetwork$clone(deep = FALSE)
deep
Whether to make a deep clone.
missSBM_fit
Plot function for the various fields of a missSBM_fit
: the fitted
SBM (network or connectivity), and a plot monitoring the optimization.
## S3 method for class 'missSBM_fit' plot( x, type = c("imputed", "expected", "meso", "monitoring"), dimLabels = list(row = "node", col = "node"), ... )
## S3 method for class 'missSBM_fit' plot( x, type = c("imputed", "expected", "meso", "monitoring"), dimLabels = list(row = "node", col = "node"), ... )
x |
an object with class |
type |
the type specifies the field to plot, either "imputed", "expected", "meso", or "monitoring" |
dimLabels |
: a list of two characters specifying the labels of the nodes. Default to |
... |
additional parameters for S3 compatibility. Not used |
a ggplot object
missSBM_fit
(i.e. network with imputed missing dyads)Prediction of a missSBM_fit
(i.e. network with imputed missing dyads)
## S3 method for class 'missSBM_fit' predict(object, ...)
## S3 method for class 'missSBM_fit' predict(object, ...)
object |
an R6 object with class |
... |
additional parameters for S3 compatibility. |
an adjacency matrix between pairs of nodes. Missing dyads are imputed with their expected values, i.e. by there estimated probabilities of connection under the missing SBM.
Class for defining a simple dyad sampler
Class for defining a simple dyad sampler
missSBM::networkSampling
-> missSBM::networkSampler
-> missSBM::dyadSampler
-> simpleDyadSampler
new()
constructor for networkSampling
simpleDyadSampler$new( parameters = NA, nbNodes = NA, directed = FALSE, covarArray = NULL, intercept = 0 )
parameters
the vector of parameters associated to the sampling at play
nbNodes
number of nodes in the network
directed
logical, directed network of not
covarArray
an array of covariates used
intercept
double, intercept term used to compute the probability of sampling in the presence of covariates. Default 0.
clone()
The objects of this class are cloneable with this method.
simpleDyadSampler$clone(deep = FALSE)
deep
Whether to make a deep clone.
Class for defining a simple node sampler
Class for defining a simple node sampler
missSBM::networkSampling
-> missSBM::networkSampler
-> missSBM::nodeSampler
-> simpleNodeSampler
new()
constructor for networkSampling
simpleNodeSampler$new( parameters = NA, nbNodes = NA, directed = FALSE, covarMatrix = NULL, intercept = 0 )
parameters
the vector of parameters associated to the sampling at play
nbNodes
number of nodes in the network
directed
logical, directed network of not
covarMatrix
a matrix of covariates used
intercept
double, intercept term used to compute the probability of sampling in the presence of covariates. Default 0.
clone()
The objects of this class are cloneable with this method.
simpleNodeSampler$clone(deep = FALSE)
deep
Whether to make a deep clone.
This internal class is designed to adjust a binary Stochastic Block Model in the context of missSBM.
This internal class is designed to adjust a binary Stochastic Block Model in the context of missSBM.
It is not designed not be call by the user
sbm::SBM
-> sbm::SimpleSBM
-> SimpleSBM_fit
type
the type of SBM (distribution of edges values, network type, presence of covariates)
penalty
double, value of the penalty term in ICL
entropy
double, value of the entropy due to the clustering distribution
loglik
double: approximation of the log-likelihood (variational lower bound) reached
ICL
double: value of the integrated classification log-likelihood
new()
constructor for simpleSBM_fit for missSBM purpose
SimpleSBM_fit$new(networkData, clusterInit, covarList = list())
networkData
a structure to store network under missing data condition: either a matrix possibly with NA, or a missSBM:::partlyObservedNetwork
clusterInit
Initial clustering: a vector with size ncol(adjacencyMatrix)
, providing a user-defined clustering with nbBlocks
levels.
covarList
An optional list with M entries (the M covariates).
doVEM()
method to perform estimation via variational EM
SimpleSBM_fit$doVEM( threshold = 0.01, maxIter = 100, fixPointIter = 3, trace = FALSE )
threshold
stop when an optimization step changes the objective function by less than threshold. Default is 1e-4.
maxIter
V-EM algorithm stops when the number of iteration exceeds maxIter. Default is 10
fixPointIter
number of fix-point iterations in the Variational E step. Default is 5.
trace
logical for verbosity. Default is FALSE
.
reorder()
permute group labels by order of decreasing probability
SimpleSBM_fit$reorder()
clone()
The objects of this class are cloneable with this method.
SimpleSBM_fit$clone(deep = FALSE)
deep
Whether to make a deep clone.
This internal class is designed to adjust a binary Stochastic Block Model in the context of missSBM.
This internal class is designed to adjust a binary Stochastic Block Model in the context of missSBM.
It is not designed not be call by the user
sbm::SBM
-> sbm::SimpleSBM
-> missSBM::SimpleSBM_fit
-> missSBM::SimpleSBM_fit_noCov
-> SimpleSBM_MNAR_noCov
imputation
the matrix of imputed values
vExpec
double: variational approximation of the expectation complete log-likelihood
new()
constructor for simpleSBM_fit for missSBM purpose
SimpleSBM_fit_MNAR$new(networkData, clusterInit)
networkData
a structure to store network under missing data condition: either a matrix possibly with NA, or a missSBM:::partlyObservedNetwork
clusterInit
Initial clustering: a vector with size ncol(adjacencyMatrix)
, providing a user-defined clustering with nbBlocks
levels.
update_parameters()
update parameters estimation (M-step)
SimpleSBM_fit_MNAR$update_parameters(nu = NULL)
nu
currently imputed values
update_blocks()
update variational estimation of blocks (VE-step)
SimpleSBM_fit_MNAR$update_blocks(log_lambda = 0)
log_lambda
additional term sampling dependent used to de-bias estimation of tau
clone()
The objects of this class are cloneable with this method.
SimpleSBM_fit_MNAR$clone(deep = FALSE)
deep
Whether to make a deep clone.
This internal class is designed to adjust a binary Stochastic Block Model in the context of missSBM.
This internal class is designed to adjust a binary Stochastic Block Model in the context of missSBM.
It is not designed not be call by the user
sbm::SBM
-> sbm::SimpleSBM
-> missSBM::SimpleSBM_fit
-> SimpleSBM_fit_noCov
imputation
the matrix of imputed values
vExpec
double: variational approximation of the expectation complete log-likelihood
vExpec_corrected
double: variational approximation of the expectation complete log-likelihood with correction to be comparable with MNAR criteria
update_parameters()
update parameters estimation (M-step)
SimpleSBM_fit_noCov$update_parameters(...)
...
additional arguments, only required for MNAR cases
update_blocks()
update variational estimation of blocks (VE-step)
SimpleSBM_fit_noCov$update_blocks(...)
...
additional arguments, only required for MNAR cases
clone()
The objects of this class are cloneable with this method.
SimpleSBM_fit_noCov$clone(deep = FALSE)
deep
Whether to make a deep clone.
This internal class is designed to adjust a binary Stochastic Block Model in the context of missSBM.
This internal class is designed to adjust a binary Stochastic Block Model in the context of missSBM.
It is not designed not be call by the user
sbm::SBM
-> sbm::SimpleSBM
-> missSBM::SimpleSBM_fit
-> SimpleSBM_fit_withCov
imputation
the matrix of imputed values
vExpec
double: variational approximation of the expectation complete log-likelihood
vExpec_corrected
double: variational approximation of the expectation complete log-likelihood with correction to be comparable with MNAR criteria
update_parameters()
update parameters estimation (M-step)
SimpleSBM_fit_withCov$update_parameters(...)
...
use for compatibility
control
a list to tune nlopt for optimization, see documentation of nloptr
update_blocks()
update variational estimation of blocks (VE-step)
SimpleSBM_fit_withCov$update_blocks(...)
...
use for compatibility
clone()
The objects of this class are cloneable with this method.
SimpleSBM_fit_withCov$clone(deep = FALSE)
deep
Whether to make a deep clone.
Class for defining a snowball sampler
Class for defining a snowball sampler
missSBM::networkSampling
-> missSBM::networkSampler
-> missSBM::nodeSampler
-> snowballSampler
new()
constructor for networkSampling
snowballSampler$new(parameters = NA, adjacencyMatrix = NA, directed = FALSE)
parameters
the vector of parameters associated to the sampling at play
adjacencyMatrix
the adjacency matrix of the network
directed
logical, directed network of not
clone()
The objects of this class are cloneable with this method.
snowballSampler$clone(deep = FALSE)
deep
Whether to make a deep clone.
missSBM_fit
Summary method for a missSBM_fit
## S3 method for class 'missSBM_fit' summary(object, ...)
## S3 method for class 'missSBM_fit' summary(object, ...)
object |
an R6 object with class |
... |
additional parameters for S3 compatibility. |
a basic printing output
This dataset contains two networks where the nodes are countries and an
edge in network "belligerent" means that the two countries have been at
least once at war between years 1816 to 2007 while an edge in network "alliance"
means that the two countries have had a formal alliance between years 1816 to 2012.
The network belligerent
have less nodes since countries which have not been at
war are not considered.
war
war
A list with 2 two igraph objects, alliance
and belligerent
.
Each graph have three attributes: 'name' (the country name), 'power' (a score related to military power: the higher, the better) and
'trade' (a score related to the trade effort between pairs of countries).
networks were extracted from https://correlatesofwar.org/
Sarkees, Meredith Reid and Frank Wayman (2010). Resort to War: 1816 - 2007. Washington DC: CQ Press.
Gibler, Douglas M. 2009. International military alliances, 1648-2008. CQ Press
data(war) class(war$belligerent) igraph::gorder(war$alliance) igraph::gorder(war$belligerent) igraph::edges(war$alliance) igraph::get.graph.attribute(war$alliance)
data(war) class(war$belligerent) igraph::gorder(war$alliance) igraph::gorder(war$belligerent) igraph::edges(war$alliance) igraph::get.graph.attribute(war$alliance)