Social Influence Regression (SIR) Model
sir.RdFits a Social Influence Regression model for network data with social influence effects. The SIR model captures how network connections influence outcomes through bilinear interaction terms, allowing for both sender and receiver effects in directed networks.
The model decomposes network influence into two components:
Sender influence (A matrix): A[i,k] measures how much node k's behavior (via X) shapes node i's outgoing ties.
Receiver influence (B matrix): B[j,l] measures how node l's position shapes node j's incoming ties.
Usage
sir(
Y,
W = NULL,
X = NULL,
Z = NULL,
family,
method = "ALS",
calc_se = TRUE,
fix_receiver = FALSE,
symmetric = FALSE,
bipartite = NULL,
kron_mode = FALSE,
...
)Arguments
- Y
A three-dimensional array of dimensions (m x m x T) containing the network outcomes. Y[i,j,t] represents the directed outcome from node i to node j at time t. Can contain NA values for missing observations. The diagonal (self-loops) can be included or excluded depending on the application.
- W
Optional influence covariate array, either:
3D array (m x m x p): Static influence covariates. W[i,j,r] represents the r-th covariate for the edge from i to j. The same W is used for all time periods.
4D array (m x m x p x T): Dynamic (time-varying) influence covariates. W[i,j,r,t] allows the influence structure to change over time. Parameters (alpha, beta) are still estimated jointly across all periods, but the influence matrices A_t and B_t vary with t. Only ALS method is supported for 4D W.
Common choices include graph Laplacians, geographic distance matrices, or node-level covariates expanded to edge-level. If NULL or p=0, the model uses only identity matrices (no network influence structure).
- X
Optional three-dimensional array of dimensions (m x m x T) representing the network state that carries influence. Typically this is a lagged version of Y (e.g., X[,,t] = Y[,,t-1]). If NULL and W is provided, an error is thrown. X determines which network patterns influence future outcomes.
- Z
Optional array of exogenous covariates. Can be either:
3D array (m x m x T): Single covariate varying across edges and time
4D array (m x m x q x T): Multiple (q) covariates
Examples include dyadic covariates (trade agreements, geographic distance) or node-level attributes (GDP, population) expanded to edge-level.
- family
Character string specifying the distribution family and link function. Must be one of "poisson", "normal", or "binomial". The choice depends on the nature of your outcome variable.
- method
Character string specifying the estimation method. Either "ALS" (Alternating Least Squares) or "optim" (direct optimization via BFGS). Default is "ALS" which is generally more stable.
- calc_se
Logical indicating whether to calculate standard errors for the parameters. Standard errors are computed using the observed information matrix. Setting to FALSE speeds up computation when uncertainty quantification is not needed.
- fix_receiver
Logical. If TRUE, fixes B = I (identity matrix) and estimates only (theta, alpha). This eliminates the bilinear identification problem (scaling ambiguity between A and B) by removing the receiver influence channel. The model becomes a standard GLM, yielding proper standard errors. Appropriate when receiver effects are negligible. Default is FALSE.
- symmetric
Logical. If TRUE, treats the network as undirected (symmetric). The function symmetrizes Y by averaging upper and lower triangles, uses only upper-triangle observations for fitting, and sets
fix_receiver = TRUE(since sender/receiver distinction is meaningless for undirected networks). Default is FALSE.- bipartite
Logical or NULL. Indicates whether the network is bipartite (senders and receivers are distinct node sets). If NULL (the default), bipartite status is inferred from Y: non-square arrays (n1 != n2) are treated as bipartite. Set to TRUE explicitly for square arrays where senders and receivers are nonetheless distinct populations. Setting FALSE on a non-square Y raises an error. Bipartite networks require
fix_receiver = TRUE.- kron_mode
Logical. If TRUE, replaces separate (alpha, beta) with a single p x p coefficient matrix C, where C[r,s] is the weight on W_r X W_s'. This is a general fix for the bilinear identification problem. Not yet implemented. Default is FALSE.
- ...
Additional arguments passed to the fitting functions:
trace: Logical or integer controlling output verbosity.tol: Convergence tolerance for ALS (default 1e-8).max_iter: Maximum ALS iterations (default 100).
Value
An object of class "sir" with the following components:
- summ
Data frame of parameter estimates with columns
coef,se(classical SE),rse(robust/sandwich SE),t_se(z-statistic using classical SE),t_rse(z-statistic using robust SE). Row names identify each parameter.- A
Sender influence matrix. For static W: n1 x n1 matrix. For dynamic (4D) W: n1 x n1 x T array. Off-diagonal entry A[i,k] measures how much node k's behavior (via X) shapes node i's outgoing ties. Diagonal is set to zero.
- B
Receiver influence matrix. Same dimensions as A. Off-diagonal entry B[j,l] measures how node l's position shapes node j's incoming ties. Identity when
fix_receiver = TRUE. Diagonal is zeroed.- tab
Numeric vector of all estimated parameters in order: [theta_1, ..., theta_q, alpha_2, ..., alpha_p, beta_1, ..., beta_p]. When
fix_receiver = TRUE: [theta_1, ..., theta_q, alpha_1, ..., alpha_p].- theta
Coefficients for exogenous covariates Z (length q).
- alpha
Full alpha vector including the fixed alpha_1 = 1 (length p). When
fix_receiver = TRUE, all alpha are free.- beta
Coefficients for receiver influence covariates (length p). Empty when
fix_receiver = TRUE.- ll
Log-likelihood at convergence.
- family
The distribution family used (
"poisson","normal", or"binomial").- method
The estimation method used (
"ALS"or"optim").- p
Number of influence covariates in W.
- q
Number of exogenous covariates in Z.
- m
Number of sender nodes (same as n1).
- n1
Number of sender (row) nodes.
- n2
Number of receiver (column) nodes.
- bipartite
Logical, TRUE if the network is bipartite (n1 != n2).
- n_periods
Number of time periods.
- nobs
Number of non-missing observations used in estimation.
- fitted.values
Array (n1 x n2 x T) of fitted values on the response scale (counts for Poisson, probabilities for binomial, means for normal).
- residuals
List with three components:
response(Y - fitted),pearson(standardized by variance function), anddeviance(signed square root of deviance contributions).- vcov
Variance-covariance matrix of parameters from the Hessian (classical SEs). NULL if
calc_se = FALSE.- vcov_robust
Sandwich (robust) variance-covariance matrix. NULL if
calc_se = FALSEor computation failed.- Y
The outcome array as used in fitting (with NAs from symmetric masking or Z missingness applied).
- W
The influence covariate array.
- X
The network state array (NAs replaced with 0).
- Z
The exogenous covariate array (converted to 4D if 3D input).
- fix_receiver
Logical, whether receiver effects were fixed.
- symmetric
Logical, whether the network was treated as undirected.
- kron_mode
Logical, whether Kronecker mode was used.
- iterations
Number of iterations until convergence.
- history
List with matrices ALPHA, BETA, THETA, DEV tracking parameter trajectories across iterations (useful for convergence diagnostics).
- convergence
Logical, TRUE if the algorithm converged.
- call
The matched function call.
- sigma2
Estimated error variance (only for
family = "normal").
Details
The SIR model specifies the expected outcome for the directed edge from node i to node j at time t as:
$$\mu_{i,j,t} = \theta^T z_{i,j,t} + \sum_{k,l} X_{k,l,t} A_{i,k} B_{j,l}$$
Where:
\(\mu_{i,j,t}\) is the expected value of the outcome Y_ijt
\(\theta\) is a q-dimensional vector of coefficients for exogenous covariates
\(z_{i,j,t}\) is a q-dimensional vector of exogenous covariates
\(X_{k,l,t}\) represents the network state (often lagged Y) that carries influence
\(A_{i,k}\) represents how node i is influenced by the behavior of node k
\(B_{j,l}\) represents how node j's reception is affected by node l's position
The bilinear term \(\sum_{k,l} X_{k,l,t} A_{i,k} B_{j,l}\) captures network influence and can be parameterized using influence covariates W through:
\(A = \sum_{r=1}^{p} \alpha_r W_r\) (sender effects, \(\alpha_1 = 1\) fixed)
\(B = \sum_{r=1}^{p} \beta_r W_r\) (receiver effects)
This parameterization reduces the number of parameters from \(O(m^2)\) to \(O(p)\), where \(p \ll m\).
Estimation Methods
Alternating Least Squares (ALS):
Iteratively optimizes A given B, then B given A
Generally more stable for high-dimensional problems
Better for sparse networks or when p is large
May converge to local optima
Direct Optimization (optim):
Uses BFGS to optimize all parameters simultaneously
Can be faster for small problems
May provide better solutions when good starting values are available
More prone to numerical issues in high dimensions
Distribution Families
Poisson: For count data (e.g., number of interactions)
Link function: log
Variance function: \(V(\mu) = \mu\)
Use when: Y_ijt represents counts
Normal: For continuous data (e.g., trade volumes, distances)
Link function: identity
Variance function: \(V(\mu) = \sigma^2\)
Use when: Y_ijt is continuous and approximately normal
Binomial: For binary data (e.g., presence/absence of ties)
Link function: logit
Variance function: \(V(\mu) = \mu(1 - \mu)\)
Use when: Y_ijt is binary (0/1)
Examples
# \donttest{
set.seed(123)
m <- 8; T_len <- 5; p <- 2
Y <- array(rpois(m * m * T_len, lambda = 2), dim = c(m, m, T_len))
X <- array(0, dim = c(m, m, T_len))
X[,,2:T_len] <- Y[,,1:(T_len - 1)]
W <- array(rnorm(m * m * p), dim = c(m, m, p))
model <- sir(Y = Y, W = W, X = X, family = "poisson",
method = "ALS", calc_se = FALSE, max_iter = 10)
print(model)
#>
#> Social Influence Regression Model
#> 8 nodes, 5 time periods (directed)
#> Config: poisson | ALS
#> Status: converged | N = 280 | Log-Lik: -576.96 | AIC: 1159.9
#> Coefficients:
#> Estimate
#> (alphaW) W2 3.9283
#> (betaW) W1 0.0010
#> (betaW) W2 0.0020
#> (SEs not computed)
#> Use `summary()` for detailed results
coef(model)
#> [1] 3.928314350 0.001010846 0.001972906
# }