We present the performances of GREMLINS on a simulated multipartite
network. GREMLINS includes a function rMBM
to simulate
multipartite networks. Mathematical details can be found in Bar-Hen, Barbillon, and S. (2021).
We use the function rMBM
provided in the package to
simulate a multipartite network involving 2 functional groups (namely A and B) of
respective sizes nA = 60, , nB = 50.
A and B are divided respectively into 3 and 2 blocks. The sizes of the blocks are generated randomly. For reproductibility, we fix the random seed to an arbitrarily chosen value.
namesFG <- c('A','B')
v_NQ <- c(60,50) #size of each FG
list_pi = list(c(0.16 ,0.40 ,0.44),c(0.3,0.7)) #proportion of each block in each FG
list_pi[[1]]
#> [1] 0.16 0.40 0.44
We assume that we observe 3 interactions matrices
- A-B : continuous weighted interactions
- B-B : binary interactions
- A-A : counting directed interactions
E <- rbind(c(1,2),c(2,2),c(1,1))
typeInter <- c( "inc","diradj", "adj")
v_distrib <- c('ZIgaussian','bernoulli','poisson')
Note that the distributions may be Bernoulli
,
Poisson
, Gaussian
or Laplace
(with null mean). For the Gaussian distribution, a mean and a variance
must be given. We generate randomly the emission parameters θ.
list_theta <- list()
list_theta[[1]] <- list()
list_theta[[1]]$mean <- matrix(c(6.1, 8.9, 6.6, 9.8, 2.6, 1.0), 3, 2)
list_theta[[1]]$var <- matrix(c(1.6, 1.6, 1.8, 1.7 ,2.3, 1.5),3, 2)
list_theta[[1]]$p0 <- matrix(c(0.4, 0.1, 0.6, 0.5 , 0.2, 0),3, 2)
list_theta[[2]] <- matrix(c(0.7,1.0, 0.4, 0.6),2, 2)
m3 <- matrix(c(2.5, 2.6 ,2.2 ,2.2, 2.7 ,3.0 ,3.6, 3.5, 3.3),3,3 )
list_theta[[3]] <- (m3 + t(m3))/2# for symetrisation
We are now ready to simulate the data
library(GREMLINS)
dataSim <- rMBM(v_NQ,E , typeInter, v_distrib, list_pi,
list_theta, namesFG = namesFG, seed = 4,keepClassif = TRUE)
list_Net <- dataSim$list_Net
length(list_Net)
#> [1] 3
names(list_Net[[1]])
#> [1] "mat" "typeInter" "rowFG" "colFG"
list_Net[[1]]$typeInter
#> [1] "inc"
list_Net[[1]]$rowFG
#> [1] "A"
list_Net[[1]]$colFG
#> [1] "B"
The model selection and the estimation are performed with the
function multipartiteBM
.
res_MBMsimu <- multipartiteBM(list_Net,
v_distrib = v_distrib,
namesFG = c('A','B'),
v_Kinit = c(2,2),
nbCores = 2,
initBM = FALSE,
keep = FALSE)
#> [1] "------------Nb of entities in each functional group--------------"
#> A B
#> 60 50
#> [1] "------------Probability distributions on each network--------------"
#> [1] "ZIgaussian" "bernoulli" "poisson"
#> [1] "-------------------------------------------------------------------"
#> [1] " ------ Searching the numbers of blocks starting from [ 2 2 ] blocks"
#> [1] "ICL : -7085.81 . Nb of blocks: [ 2 2 ]"
#> [1] "ICL : -5901.15 . Nb of blocks: [ 3 2 ]"
#> [1] "Best model------ ICL : -5901.15 . Nb of clusters: [ 3 2 ] for [ A , B ] respectively"
We can now get the estimated parameters.
res_MBMsimu$fittedModel[[1]]$paramEstim$list_theta$AB$mean
#> [,1] [,2]
#> [1,] 1.004152 6.572955
#> [2,] 2.582062 8.881842
#> [3,] 9.994673 6.139221
extractClustersMBM
produces the clusters in each
functional group.
One may also want to estimate the parameters for given numbers of
clusters. The function multipartiteBMFixedModel
is designed
for this task.
res_MBMsimu_fixed <- multipartiteBMFixedModel(list_Net, v_distrib = v_distrib, nbCores = 2,namesFG = namesFG, v_K = c(3,2))
#> [1] "====================== First Forward Step =================="
#> [1] "====================== First Backward Step =================="
#> [1] "====================== Last Forward Step =================="
#> [1] "====================== Last Backward Step =================="
res_MBMsimu_fixed$fittedModel[[1]]$paramEstim$v_K
#> [1] 3 2
extractClustersMBM(res_MBMsimu_fixed)$A
#> [[1]]
#> [1] 1 4 5 10 11 13 15 16 17 23 24 25 27 29 32 33 34 35 39 40 42 48 51 56 57
#> [26] 58 59 60
#>
#> [[2]]
#> [1] 3 9 18 20 21 28 30 45 53 54 55
#>
#> [[3]]
#> [1] 2 6 7 8 12 14 19 22 26 31 36 37 38 41 43 44 46 47 49 50 52
GREMLINS is also able to handle missing data. In the following experiment, we artificially set missing data in the previously simulated matrices.
############# NA data at random in any matrix
epsilon = 10/100
list_Net_NA <- list_Net
for (m in 1:nrow(E)){
U <- sample(c(1,0),v_NQ[E[m,1]]*v_NQ[E[m,2]],replace=TRUE,prob = c(epsilon, 1-epsilon))
matNA <- matrix(U,v_NQ[E[m,1]],v_NQ[E[m,2]])
list_Net_NA[[m]]$mat[matNA== 1] = NA
if (list_Net_NA[[m]]$typeInter == 'adj') {
M <- list_Net_NA[[m]]$mat
diag(M) <- NA
M[lower.tri(M)] = t(M)[lower.tri(M)]
list_Net_NA[[m]]$mat <- M
}
}
res_MBMsimuNA <- multipartiteBM(list_Net_NA,
v_distrib = v_distrib,
namesFG = c('A','B'),
v_Kinit = c(2,2),
nbCores = 2,
keep = FALSE)
#> [1] "------------Nb of entities in each functional group--------------"
#> A B
#> 60 50
#> [1] "------------Probability distributions on each network--------------"
#> [1] "ZIgaussian" "bernoulli" "poisson"
#> [1] "-------------------------------------------------------------------"
#> [1] " ------ Searching the numbers of blocks starting from [ 2 2 ] blocks"
#> [1] "ICL : -6479.09 . Nb of blocks: [ 2 2 ]"
#> [1] "ICL : -5440.74 . Nb of blocks: [ 3 2 ]"
#> [1] "Best model------ ICL : -5440.74 . Nb of clusters: [ 3 2 ] for [ A , B ] respectively"
We then have a function to predict the missing edges (probability if binary or intensity if weighted)