Internals: object structure & writing custom extensions
Cassy Dorff and Shahryar Minhas
2026-06-25
Source: vignettes/internals.Rmd
This vignette is for package developers,
methodologists writing custom extensions, and
anyone who wants to inspect the underlying object
structure. End-user workflows are covered in
vignette("quickstart_inference", package = "netify") and
the topic-specific project-site articles; this one goes one layer
deeper.
the netify object: a base r object with attributes
A netify object is a base R matrix / 3D array /
list-of-matrices with class = "netify" and a bundle of
attributes carrying all metadata. The netify_type attribute
records one of three shapes:
| netify_type | Underlying R object | When used |
|---|---|---|
| cross_sec | [n x n] matrix (or [r x c] bipartite) | One time period |
| longit_array | [n x n x T] array (or [n x n x p x T]) | Multi-period, constant actor set |
| longit_list | Named list of matrices, one per time | Multi-period, varying actor composition |
Multilayer networks insert a layer dimension (position 3) and store
layer names in attr(x, "layers"). Mixed-directedness
multilayer is supported via a vector-valued symmetric
attribute.
dgCMatrix inputs from the Matrix
package are accepted but densified at construction (a one-shot cli
inform is emitted); a true sparse storage backend is not yet
implemented.
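The three shapes are ordinary base R containers. As a rough illustration only, the sketch below builds toy objects with those shapes by hand; it does not call netify(), and the class name "toy_net" is hypothetical.

```r
# Toy objects mimicking the three storage shapes; NOT built by netify().
actors <- c("a", "b", "c")

# cross_sec: a single n x n matrix
cross_sec <- matrix(0, 3, 3, dimnames = list(actors, actors))

# longit_array: n x n x T array (constant actor set across periods)
longit_array <- array(0, dim = c(3, 3, 2),
                      dimnames = list(actors, actors, c("2010", "2011")))

# longit_list: named list of matrices whose dimensions may differ by period
longit_list <- list(
  "2010" = matrix(0, 3, 3, dimnames = list(actors, actors)),
  "2011" = matrix(0, 2, 2, dimnames = list(actors[1:2], actors[1:2]))
)

# Metadata rides along as attributes on the base object
toy <- structure(cross_sec, class = "toy_net", netify_type = "cross_sec")
attr(toy, "netify_type")
dim(unclass(toy))
```

Because the payload is a plain matrix, array, or list, base R tools such as dim(), dimnames(), and attributes() keep working on it.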
Inspect any netify object’s attribute bundle:
net <- netify(icews[icews$year == 2010, ],
actor1 = "i", actor2 = "j",
symmetric = FALSE, weight = "verbCoop",
nodal_vars = "i_polity2",
dyad_vars = "matlCoop")
#> ℹ `missing_to_zero` is set to "TRUE" (the default).
#> ! Missing dyads will be filled with zeros. For latent space or other
#> statistical network models, structural zeros and missing data have different
#> meanings. Set `missing_to_zero = FALSE` to preserve NAs if this distinction
#> matters for your analysis.
#> This message is displayed once per session.
class(net)
#> [1] "netify"
str(attributes(net), max.level = 1)
#> List of 17
#> $ dim : int [1:2] 152 152
#> $ dimnames :List of 2
#> $ class : chr "netify"
#> $ netify_type : chr "cross_sec"
#> $ actor_time_uniform: logi TRUE
#> $ actor_pds :'data.frame': 152 obs. of 3 variables:
#> $ weight : chr "verbCoop"
#> $ detail_weight : chr "Weights from `verbCoop`"
#> $ is_binary : logi FALSE
#> $ symmetric : logi FALSE
#> $ mode : chr "unipartite"
#> $ layers : chr "verbCoop"
#> $ diag_to_NA : logi TRUE
#> $ missing_to_zero : logi TRUE
#> $ sum_dyads : logi FALSE
#> $ nodal_data :'data.frame': 152 obs. of 2 variables:
#> $ dyad_data :List of 1
Key attributes:
- netify_type – "cross_sec", "longit_array", or "longit_list"
- mode – "unipartite" or "bipartite"
- symmetric – scalar logical (or named vector for mixed-directedness multilayer)
- weight – column name (or NULL for binary)
- is_binary, detail_weight, diag_to_NA, missing_to_zero, sum_dyads
- layers – character vector; length > 1 means multilayer
- actor_pds – data.frame with actor, min_time, max_time per actor
- nodal_data – data.frame with actor, optional time, and one column per nodal variable
- dyad_data – nested list: list[[time]][[var]] = matrix. Cross-sec uses "1" as the time key.
extracting parts
| Want this | Use |
|---|---|
| The raw matrix / array / list | get_raw(net) |
| A quick numeric peek | peek(net, from = 5, to = 5) |
| The full long edge data frame | unnetify(net) or tidy(net) |
| Lean wide-to-long edge frame (no nodal merge) | melt(net) |
| Graph-level stats | summary(net) or glance(net) |
| Actor-level stats | summary_actor(net) |
| Size / composition descriptors | measurements(net) |
| Edge data + layout for plotting | net_plot_data(net)$net_dfs |
tidy() and glance() are S3 methods on the
broom generics. They are registered on package load if
generics (or anything that imports it, like
broom, dplyr, tidymodels) is
installed, so netify has no hard dependency on broom.
tidy interop
If tibble and broom are installed, three
tidyverse-flavored entry points are available.
as_tibble.netify is registered against
tibble::as_tibble via .onLoad, so
tibble must be installed to use the unprefixed call;
tidy() and glance() are likewise registered
against the broom generics.
library(tibble)
library(broom)
# one row per dyad (long edge frame, wrapped in a tibble)
as_tibble(net)
#> # A tibble: 22,952 × 6
#> from to verbCoop matlCoop i_polity2_from i_polity2_to
#> <chr> <chr> <dbl> <dbl> <int> <int>
#> 1 Afghanistan Albania 0 1 NA 9
#> 2 Afghanistan Algeria 0 0 NA 2
#> 3 Afghanistan Angola 0 0 NA -2
#> 4 Afghanistan Argentina 1 0 NA 8
#> 5 Afghanistan Armenia 7 2 NA 5
#> 6 Afghanistan Australia 125 0 NA 10
#> 7 Afghanistan Austria 1 0 NA 10
#> 8 Afghanistan Azerbaijan 7 0 NA -7
#> 9 Afghanistan Bahrain 3 0 NA -5
#> 10 Afghanistan Bangladesh 14 0 NA 5
#> # ℹ 22,942 more rows
# broom-style tidy summary: a tibble with one row per (non-zero) dyad
head(tidy(net))
#> # A tibble: 6 × 6
#> from to verbCoop matlCoop i_polity2_from i_polity2_to
#> <chr> <chr> <dbl> <dbl> <int> <int>
#> 1 Afghanistan Argentina 1 0 NA 8
#> 2 Afghanistan Armenia 7 2 NA 5
#> 3 Afghanistan Australia 125 0 NA 10
#> 4 Afghanistan Austria 1 0 NA 10
#> 5 Afghanistan Azerbaijan 7 0 NA -7
#> 6 Afghanistan Bahrain 3 0 NA -5
# one-row-per-network model-card summary (one row per time/layer if applicable)
glance(net)
#> # A tibble: 1 × 18
#> net num_actors density num_edges prop_edges_missing mean_edge_weight
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 152 0.435 9976 0 41.7
#> # ℹ 12 more variables: sd_edge_weight <dbl>, median_edge_weight <dbl>,
#> # min_edge_weight <dbl>, max_edge_weight <dbl>, competition_row <dbl>,
#> # competition_col <dbl>, sd_of_row_means <dbl>, sd_of_col_means <dbl>,
#> # covar_of_row_col_means <dbl>, reciprocity <dbl>, mutual <dbl>,
#> # transitivity <dbl>
as_tibble(net) and tidy(net) share the same
long-format payload as unnetify(net) – they differ only in
whether zero-weight edges are dropped by default (tidy()
drops them, matching the broom convention). glance(net) is
the broom-flavored sibling of summary(net) – one row of
graph-level statistics per network (or per (time, layer) slice for
longitudinal / multilayer inputs).
as_tibble() also has a method for
netify_comparison objects (from
compare_networks()), returning the per-pair
$comparisons frame directly so you can pipe straight into
filter() / arrange() /
pivot_wider():
panel <- netify(icews[icews$year %in% c(2010, 2011), ],
actor1 = "i", actor2 = "j", time = "year",
symmetric = FALSE, weight = "verbCoop")
net_2010 <- subset_netify(panel, time = "2010")
net_2011 <- subset_netify(panel, time = "2011")
cmp <- compare_networks(list("2010" = net_2010, "2011" = net_2011))
as_tibble(cmp)
#> # A tibble: 1 × 5
#> net_i net_j metric value p_value
#> <chr> <chr> <chr> <dbl> <dbl>
#> 1 2010 2011 correlation 0.884 NA
predicates and descriptors
For programmatic dispatch – e.g. inside a custom exporter or pipeline step – netify ships a small set of non-masking predicates and size accessors:
# class / structure predicates
is_netify(net)
#> [1] TRUE
is_bipartite(net) # may be masked by igraph::is_bipartite
#> [1] FALSE
is_bipartite_netify(net) # alias that won't be masked
#> [1] FALSE
is_directed_netify(net)
#> [1] TRUE
is_longitudinal(net)
#> [1] FALSE
is_multilayer(net)
#> [1] FALSE
# size / composition accessors
n_actors(net) # number of unique actors
#> [1] 152
n_periods(net) # number of time periods (1 for cross-sec)
#> [1] 1
n_layers(net) # number of layers (1 for single-layer)
#> [1] 1
head(get_actor_time_info(net)) # stored actor_pds: actor, min_time, max_time
#> actor min_time max_time
#> 1 Afghanistan 1 1
#> 2 Albania 1 1
#> 3 Algeria 1 1
#> 4 Angola 1 1
#> 5 Argentina 1 1
#> 6 Armenia 1 1
The _netify suffix on is_bipartite_netify()
and is_directed_netify() avoids name clashes with the same-named
predicates from igraph and network, which can mask the unsuffixed
versions when those packages are attached later. The
unsuffixed versions exist too and defer to the foreign package if a
non-netify graph object is passed.
object-level validation
validate_netify() is the developer’s pre-flight check.
It walks the attribute bundle and confirms the invariants the rest of
the package relies on:
validate_netify(net, verbose = TRUE)
#> ✔ netify object passes all coherence checks (11 checks).
The checks include:
- netify_type – one of "cross_sec", "longit_array", "longit_list", and matches the underlying R object
- mode – "unipartite" or "bipartite"
- symmetric_type – scalar logical, or named logical of length n_layers for mixed-directedness multilayer
- layers_consistent – attr(net, "layers") length matches the layer dimension where applicable
- nodal_actors_known – every actor referenced in nodal_data exists in the network's actor set
- is_binary_consistent – the stored is_binary flag matches the actual content
- symmetric_consistent – if stored as symmetric, the matrix content is actually symmetric
- unipartite_dimnames – row and column names agree for unipartite networks
- slice_dimnames_consistent – every slice of a longitudinal array/list has dimnames in the order recorded in actor_pds
- nodal_time_known – time keys in nodal/dyad data are a subset of the network's time axis
A quick demo on a clean netlet versus one tampered to introduce a stray actor:
# clean netlet ticks every box
all(unlist(validate_netify(net, verbose = FALSE)))
#> [1] TRUE
# tamper: inject a stray actor into nodal_data
bad <- net
nd <- attr(bad, "nodal_data")
nd <- rbind(nd, nd[1, , drop = FALSE])
nd$actor[nrow(nd)] <- "ZZZ_not_in_network"
attr(bad, "nodal_data") <- nd
validate_netify(bad, verbose = TRUE)
#> ! nodal_data references 1 actor not in the netlet: "ZZZ_not_in_network"
#> ✖ netify object failed 1 coherence check: "nodal_actors_known".
If a custom exporter or an internal manipulation breaks one of these,
validate_netify() is the first place to look.
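To make the idea of a coherence check concrete, here is a minimal base R sketch of what a test in the spirit of nodal_actors_known might look like. The helper name check_nodal_actors_known() is hypothetical and operates on a plain matrix and data.frame; it is not the package's implementation of validate_netify().

```r
# Hypothetical stand-in for one coherence check; not validate_netify() itself.
check_nodal_actors_known <- function(adj, nodal_data) {
  known <- union(rownames(adj), colnames(adj))
  stray <- setdiff(unique(nodal_data$actor), known)
  list(pass = length(stray) == 0, stray = stray)
}

actors <- c("a", "b", "c")
adj <- matrix(0, 3, 3, dimnames = list(actors, actors))
nd_ok <- data.frame(actor = actors, x = c(1, 2, 3),
                    stringsAsFactors = FALSE)
nd_bad <- rbind(nd_ok,
                data.frame(actor = "ZZZ_not_in_network", x = NA,
                           stringsAsFactors = FALSE))

check_nodal_actors_known(adj, nd_ok)$pass    # clean input
check_nodal_actors_known(adj, nd_bad)$stray  # names the stray actor
```

Returning the offending actors alongside the pass/fail flag mirrors the way the real validator names the stray actor in its message.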
open-cohort panels with actor_pds
When the actor roster changes over time – entries, exits, attrition,
contact-tracing windows – pass actor_time_uniform = FALSE
and supply an actor_pds data.frame giving each actor’s
[min_time, max_time] window. Each period’s netlet then
contains only the actors whose window covers that period; densities,
degree counts, and per-period actor sets respect those entry / exit
boundaries.
the actor_pds roster, step by step
Three pieces have to line up for an open-cohort netlet to be well-formed:
- A roster – one row per actor, with actor, min_time, max_time. The [min_time, max_time] interval is closed (the actor is alive in the boundary periods themselves).
- An edgelist – one row per observed interaction. During construction, rows outside the roster windows are excluded because the referenced actor is not in the risk set for that period.
- netify(actor_time_uniform = FALSE, actor_pds = roster, ...) – this is what tells the constructor to honor the roster instead of treating every actor as present in every period.
Below, actor a is in the network only during periods 1-2
(enters at t = 1, exits after t = 2), and
actors d / e arrive at t = 3. The
edgelist deliberately includes a tie at t = 2 involving
a and ties at t = 3 involving d
so we can verify the period-by-period actor sets afterwards.
set.seed(1)
# roster: actors with closed-interval entry / exit times
roster <- data.frame(
actor = c("a", "b", "c", "d", "e"),
min_time = c(1, 1, 1, 3, 3),
max_time = c(2, 5, 4, 5, 5) # a exits after t = 2
)
# edges (only show up while both endpoints are in the roster)
edges <- data.frame(
i = c("a", "a", "b", "c", "d", "c", "d", "e"),
j = c("b", "c", "c", "b", "e", "d", "e", "b"),
t = c(1, 2, 2, 3, 4, 3, 5, 5)
)
net_oc <- netify(edges,
actor1 = "i", actor2 = "j", time = "t",
actor_time_uniform = FALSE,
actor_pds = roster
)
# read the roster back off the netlet itself
head(get_actor_time_info(net_oc))
#> actor min_time max_time
#> 1 a 1 2
#> 2 b 1 5
#> 3 c 1 4
#> 4 d 3 5
#> 5 e 3 5
n_actors(net_oc)
#> [1] 4
n_periods(net_oc)
#> [1] 5
get_actor_time_info() on a netify object returns the
stored actor_pds directly (it is also the argument name on
netify() itself, so the round trip is exact). On a raw dyad
data.frame the same function derives the roster from observed
activity – that's how you build the actor_pds argument in
the first place.
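For intuition about what "deriving the roster from observed activity" involves, the base R sketch below takes each actor's first and last period of appearance as sender or receiver. The helper derive_roster() is hypothetical and is not how get_actor_time_info() is implemented.

```r
# Sketch: first/last period in which each actor appears on either side of a tie.
derive_roster <- function(df, actor1, actor2, time) {
  long <- data.frame(
    actor = c(as.character(df[[actor1]]), as.character(df[[actor2]])),
    time  = c(df[[time]], df[[time]]),
    stringsAsFactors = FALSE
  )
  min_t <- tapply(long$time, long$actor, min)
  max_t <- tapply(long$time, long$actor, max)
  data.frame(actor = names(min_t),
             min_time = as.vector(min_t),
             max_time = as.vector(max_t),
             stringsAsFactors = FALSE)
}

edges <- data.frame(
  i = c("a", "a", "b", "c", "d", "c", "d", "e"),
  j = c("b", "c", "c", "b", "e", "d", "e", "b"),
  t = c(1, 2, 2, 3, 4, 3, 5, 5)
)
derive_roster(edges, "i", "j", "t")
```

On this toy edgelist the activity-derived window for actor c is [2, 3], narrower than the [1, 4] window in the hand-written roster. An actor can be at risk in periods where it has no ties, so a roster supplied from outside knowledge is preferable when one exists.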
This is the standard path for open-cohort longitudinal data – panel surveys with attrition, contact-tracing chains, organizational membership over time, animal co-occurrence with births / deaths, and similar settings where treating every actor as present in every period would distort denominators.
density and per-period actor sets
The key correctness guarantee: density (and every other per-period statistic) is computed against the actor set alive in that period, not the union of all actors ever observed. With the roster above:
- t = 1: actors a, b, c are alive (3 actors)
- t = 2: actors a, b, c are alive (3 actors; a's last period)
- t = 3: actors b, c, d, e are alive (4 actors; a is now out, d / e enter)
- t = 4: actors b, c, d, e are alive (4 actors)
- t = 5: actors b, d, e are alive (3 actors; c's last period was 4)
oc_summary <- summary(net_oc)
oc_summary[, c("net", "num_actors", "density", "num_edges")]
#> net num_actors density num_edges
#> 1 1 3 0.3333333 1
#> 2 2 3 0.6666667 2
#> 3 3 4 0.3333333 2
#> 4 4 4 0.1666667 1
#> 5 5 3 0.6666667 2
The period-3 denominator is 4 x 3 = 12 (directed) or
4 x 3 / 2 = 6 (symmetric), not 5 x 4 – actor
a is not counted because its max_time is 2. The output above uses
the symmetric denominator: 2 edges / 6 dyads = 0.333.
This is the load-bearing invariant: density at t = 3
excludes the actor present only in periods 1-2, so a researcher
comparing densities across periods is not comparing apples to
oranges.
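The per-period accounting can be reproduced by hand. The base R sketch below recomputes the alive counts and symmetric densities from the toy roster and edgelist; it is an independent check of the arithmetic, not the package's code, and it assumes each edgelist row is a distinct undirected dyad within its period (true for this toy data).

```r
roster <- data.frame(
  actor    = c("a", "b", "c", "d", "e"),
  min_time = c(1, 1, 1, 3, 3),
  max_time = c(2, 5, 4, 5, 5)
)
edges <- data.frame(
  i = c("a", "a", "b", "c", "d", "c", "d", "e"),
  j = c("b", "c", "c", "b", "e", "d", "e", "b"),
  t = c(1, 2, 2, 3, 4, 3, 5, 5)
)

periods <- 1:5
# Actors whose closed [min_time, max_time] window covers each period
n_alive <- sapply(periods, function(p) {
  sum(roster$min_time <= p & roster$max_time >= p)
})
n_edges <- sapply(periods, function(p) sum(edges$t == p))
# Symmetric denominator: n * (n - 1) / 2 potential dyads among alive actors
density <- n_edges / (n_alive * (n_alive - 1) / 2)
data.frame(period = periods, n_alive, n_edges, density = round(density, 4))
```

The resulting counts (3, 3, 4, 4, 3 actors) and densities agree with the summary() output above.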
The same accounting flows into summary_actor(),
homophily(), mixing_matrix(), and the plot
helpers – open-cohort actors never show up as zero-degree placeholders
in periods they did not exist.
na versus zero in weighted networks
For weighted data, the distinction between 0 (observed,
but no edge / zero-valued interaction) and NA (unobserved /
not-at-risk) is semantically important in many domains – epidemiology,
animal behavior, sparse survey rosters. By default
netify(missing_to_zero = TRUE) fills unobserved dyads with
0. Pass missing_to_zero = FALSE to keep them
as NA, in which case
summary()$prop_edges_missing reports the non-trivial
missingness fraction and downstream centrality / homophily routines
propagate the NA semantics rather than silently treating the dyad as a
zero-weight tie.
Both prop_edges_missing and
prop_unknown_edges use the same denominator as
density – the number of potential edge dyads
(off-diagonal for unipartite + diag_to_NA, halved for
symmetric, all cells for bipartite). That means
density + prop_edges_missing + observed_zero_fraction = 1
is an identity, which is the property you want when you are reading them
as competing fractions. The prop_unknown_edges column is
suppressed when missing_to_zero = TRUE because every
unobserved dyad has been filled with 0 and the value would be
identically zero in every row; when present it tracks
prop_edges_missing and serves as the “this netlet carries
NA semantics” cue downstream.
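The identity is easy to verify on a toy matrix. The base R sketch below follows the definitions in the text for a directed, unipartite network with the diagonal excluded; the variable names are illustrative and this is not the package's internal computation.

```r
# Toy directed adjacency: NA = unobserved, 0 = observed with no tie
m <- matrix(c(NA,  2,  0,
              NA, NA,  5,
               0,  1, NA), nrow = 3, byrow = TRUE)

# Potential edge dyads: every off-diagonal cell (6 for n = 3, directed)
cells <- m[row(m) != col(m)]
n_potential <- length(cells)

density       <- sum(!is.na(cells) & cells != 0) / n_potential
prop_missing  <- sum(is.na(cells)) / n_potential
observed_zero <- sum(!is.na(cells) & cells == 0) / n_potential

c(density = density, prop_missing = prop_missing, observed_zero = observed_zero)
density + prop_missing + observed_zero  # the three fractions partition the dyads
```

Because all three fractions share one denominator, every potential dyad lands in exactly one bucket. A symmetric network would halve the denominator and count each unordered pair once.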
writing a custom graph-level statistic
summary(net, other_stats = list(my_stat = fn)) accepts
user-supplied functions. Each function receives the netify
object for the current time period / layer (not a stripped
matrix), so you can call netify_to_igraph(),
peek(), or whatever you want inside.
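Return a named numeric vector. Each output column is named by joining the other_stats list name and the vector name with a dot (for example, comp.n_components_2plus in the output below).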
# example: number of weakly connected components with at least 2 nodes
n_components_2plus <- function(net) {
g <- netify_to_igraph(net)
c(n_components_2plus = sum(igraph::components(g)$csize >= 2))
}
# example: edge weight skewness
weight_skew <- function(net) {
v <- as.vector(net)
v <- v[!is.na(v) & v != 0]
if (length(v) < 3) return(c(weight_skew = NA_real_))
c(weight_skew = mean((v - mean(v))^3) / (stats::sd(v)^3))
}
summary(net, other_stats = list(
comp = n_components_2plus,
skew = weight_skew
))
#> ℹ `netify_to_igraph()` kept the igraph edge set unchanged.
#> • 131 dyadic covariate cells on non-edges cannot be stored as igraph edge
#> attributes.
#> This message is displayed once per session.
#> net num_actors density num_edges prop_edges_missing mean_edge_weight
#> 1 1 152 0.4346462 9976 0 41.72113
#> sd_edge_weight median_edge_weight min_edge_weight max_edge_weight
#> 1 198.3157 6 1 4937
#> competition_row competition_col sd_of_row_means sd_of_col_means
#> 1 0.04071163 0.03493156 41.4412 37.76971
#> covar_of_row_col_means reciprocity mutual transitivity
#> 1 0.9934913 0.9823385 0.8402509 0.6386063
#> comp.n_components_2plus skew.weight_skew
#> 1 1 13.76129
The same other_stats mechanism is available in
summary_actor(), compare_networks(),
homophily(), mixing_matrix(), and
dyad_correlation(). In each case the function gets called
once per time period / layer / iteration as appropriate; check the
function's ? for the exact contract.
writing a custom actor-level statistic
Actor-level statistics plug in the same way, via summary_actor(net, other_stats = list(my_stat = fn)).
The function should return a vector with one value per actor (in row
order):
# example: per-actor mean weight over nonzero outgoing ties (row-wise)
mean_active_tie <- function(mat) {
apply(mat, 1, function(row) {
nonzero <- row[!is.na(row) & row != 0]
if (length(nonzero) == 0) NA_real_ else mean(nonzero)
})
}
head(summary_actor(net, stats = "fast", other_stats = list(mean_active = mean_active_tie)))
#> actor degree_in degree_out degree_total prop_ties_in prop_ties_out
#> 1 Afghanistan 95 80 175 0.6291391 0.5298013
#> 2 Albania 53 51 104 0.3509934 0.3377483
#> 3 Algeria 78 94 172 0.5165563 0.6225166
#> 4 Angola 68 61 129 0.4503311 0.4039735
#> 5 Argentina 64 66 130 0.4238411 0.4370861
#> 6 Armenia 56 56 112 0.3708609 0.3708609
#> prop_ties_total network_share_in network_share_out network_share_total
#> 1 0.5794702 0.025958050 0.021073497 0.023515773
#> 2 0.3443709 0.001170082 0.001158069 0.001164076
#> 3 0.5695364 0.002919199 0.003056150 0.002987674
#> 4 0.4271523 0.002349775 0.002011004 0.002180390
#> 5 0.4304636 0.003925903 0.004175777 0.004050840
#> 6 0.3708609 0.004995075 0.004862930 0.004929002
#> strength_sum_in strength_sum_out strength_sum_total strength_avg_in
#> 1 10804 8771 19575 113.726316
#> 2 487 482 969 9.188679
#> 3 1215 1272 2487 15.576923
#> 4 978 837 1815 14.382353
#> 5 1634 1738 3372 25.531250
#> 6 2079 2024 4103 37.125000
#> strength_avg_out strength_avg_total strength_std_in strength_std_out
#> 1 109.63750 111.857143 436.76245 385.75910
#> 2 9.45098 9.317308 10.72032 12.20215
#> 3 13.53191 14.459302 29.33396 26.61098
#> 4 13.72131 14.069767 22.56270 20.65279
#> 5 26.33333 25.938462 44.36392 44.54672
#> 6 36.14286 36.633929 103.35993 101.33983
#> strength_std_total strength_median_in strength_median_out
#> 1 413.06464 7 10.0
#> 2 11.41560 5 5.0
#> 3 27.81441 4 4.0
#> 4 21.59904 4 5.0
#> 5 44.28602 6 7.0
#> 6 101.89396 4 4.5
#> strength_median_total mean_active
#> 1 8.0 109.63750
#> 2 5.0 9.45098
#> 3 4.0 13.53191
#> 4 5.0 13.72131
#> 5 6.5 26.33333
#> 6 4.0 36.14286
reading the dyad_data nested list
attr(net, "dyad_data") is the structure that gives
netify O(1) access to per-time dyadic covariates. Its shape is:
dd <- attr(net, "dyad_data")
names(dd) # cross-sec: just "1"
#> [1] "1"
names(dd[["1"]]) # one entry per dyadic variable
#> [1] "matlCoop"
str(dd[["1"]][["matlCoop"]])
#> int [1:152, 1:152] 0 4 0 0 0 2 19 0 1 0 ...
#> - attr(*, "dimnames")=List of 2
#> ..$ : chr [1:152] "Afghanistan" "Albania" "Algeria" "Angola" ...
#> ..$ : chr [1:152] "Afghanistan" "Albania" "Algeria" "Angola" ...
For longitudinal networks the top level has one entry per period:
# pseudo-structure for a 3-period network with 2 dyadic vars:
# dd[["2010"]][["matlCoop"]] -> n x n matrix
# dd[["2010"]][["verbConf"]] -> n x n matrix
# dd[["2011"]][["matlCoop"]] -> n x n matrix
# ... etc.
If you're writing a converter (e.g., to a new modeling format), this is the structure to iterate over.
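As a sketch of that iteration, the base R snippet below walks a hand-built list with the documented list[[time]][[var]] = matrix shape and stacks it into a long data frame. The toy values and the output column names (time, var, from, to, value) are assumptions for illustration, not a netify export format.

```r
# Hand-built stand-in for attr(net, "dyad_data") on a 2-period network
actors <- c("a", "b")
mk <- function(v) matrix(v, 2, 2, dimnames = list(actors, actors))
dd <- list(
  "2010" = list(matlCoop = mk(c(0, 1, 2, 0)), verbConf = mk(c(0, 3, 0, 0))),
  "2011" = list(matlCoop = mk(c(0, 4, 5, 0)), verbConf = mk(c(0, 0, 6, 0)))
)

# Walk time -> variable and unroll each matrix column-major into long form
long <- do.call(rbind, lapply(names(dd), function(tt) {
  do.call(rbind, lapply(names(dd[[tt]]), function(vv) {
    m <- dd[[tt]][[vv]]
    data.frame(
      time  = tt,
      var   = vv,
      from  = rownames(m)[as.vector(row(m))],
      to    = colnames(m)[as.vector(col(m))],
      value = as.vector(m),
      stringsAsFactors = FALSE
    )
  }))
}))
nrow(long)  # 2 periods x 2 variables x 4 cells
```

For a cross-sectional netlet the same loop applies unchanged, with "1" as the only time key.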
writing a custom exporter (to_* function)
The existing to_amen(), to_lame(),
to_dbn(), to_igraph(),
to_statnet() give you templates. to_amen()
exports the standard amen-style data structure; to_lame()
wraps that exporter with mode, family, and an
executable ame_call snippet for lame workflows. A new
exporter generally needs to:
- Validate the netify object with validate_netify(netlet, verbose = FALSE)
- Branch on attr(netlet, "netify_type") (cross_sec / longit_array / longit_list)
- For multilayer, decide whether to iterate per layer (use subset_netify(netlet, layers = X) and return a named list) or to produce a joint structure (4D array, etc.)
- Pull the raw network data with get_raw(netlet)
- Pull nodal data with attr(netlet, "nodal_data") and align to your target package's actor order
- Pull dyadic data with attr(netlet, "dyad_data") and reshape to your target format
- Handle missing values according to your target package's convention (most don't accept NAs)
If your target package expects per-layer outputs but takes multilayer netify inputs, the standard pattern is:
to_mymodel <- function(netlet, ...) {
validate_netify(netlet, verbose = FALSE)
layer_names <- attributes(netlet)$layers
if (length(layer_names) > 1) {
out <- lapply(layer_names, function(lyr) {
to_mymodel(subset_netify(netlet, layers = lyr), ...)
})
names(out) <- layer_names
return(out)
}
# ... single-layer logic ...
}
This is what to_igraph(), to_statnet(), and
to_lame() do.
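The single-layer logic usually involves two small chores from the checklist above: aligning nodal rows to the adjacency's actor order and settling on an NA convention. A minimal base R sketch follows, using toy stand-ins for get_raw(netlet) and attr(netlet, "nodal_data"). The zero-fill choice is an assumption about a hypothetical target format, not a recommendation.

```r
# Toy stand-ins for get_raw(netlet) and attr(netlet, "nodal_data")
actors <- c("a", "b", "c")
adj <- matrix(c(NA,  1,  0,
                 2, NA, NA,
                 0,  3, NA), 3, 3, byrow = TRUE,
              dimnames = list(actors, actors))
nd <- data.frame(actor = c("c", "a", "b"), polity = c(5, -2, 9),
                 stringsAsFactors = FALSE)

# Reorder nodal rows so row k describes the actor in adjacency row k
nd_aligned <- nd[match(rownames(adj), nd$actor), , drop = FALSE]
stopifnot(identical(nd_aligned$actor, rownames(adj)))

# One possible NA convention (assumed here): zero-fill every NA cell,
# including the diagonal, for a target that rejects missing values
adj_out <- adj
adj_out[is.na(adj_out)] <- 0
```

If the target format distinguishes structural zeros from missing data, keep the NAs and pass them through rather than zero-filling.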
performance characteristics
| Operation | Complexity | Notes |
|---|---|---|
| netify(df) cross-sec | O(N * E) | C++ via Rcpp; fast |
| netify(df, time = ...) longit | O(N * E * T) | C++; fast |
| summary(net) | O(N^2) per period | igraph backend |
| summary_actor(net) | O(N^2) per period x per layer | igraph; closeness/betweenness dominate |
| bootstrap_netlet(net, fn, n_boot = B) | O(B * n * N^2) | depends on the statistic supplied in fn |
| compare_networks(method = "qap") | O(R * N^2) | C++ permutations; tunable via n_permutations |
| compare_networks(method = "spectral") | O(N^3) | use spectral_rank = round(sqrt(N)) for large nets |
| unnetify(net) | O(N^2 * T) | use remove_zeros = TRUE for sparse output |
| melt(net) | O(N^2 * T) | leaner than unnetify; no nodal merge |
| plot(net) | O(N^2) layout + O(E) render | igraph layout dominates |
For networks of a few hundred actors over a dozen time periods,
everything runs in seconds. The C++ routines for netify(),
compare_networks() QAP, and similarity calculations handle
the heaviest paths. The R-side wrappers (especially in plotting and
attribute handling) are not as optimized.
Memory budget rule of thumb. netify stores
adjacencies densely: an N x N double-precision matrix is 8 * N^2 bytes.
For longitudinal stacks the cost is 8 * N^2 * T
(longit_array) or the sum over per-period sizes
(longit_list, when actor composition varies).
Concretely:
| N | per-snapshot RAM | T = 12 (monthly year) | T = 52 (weekly year) |
|---|---|---|---|
| 200 | 320 KB | 3.8 MB | 17 MB |
| 1,000 | 8 MB | 96 MB | 416 MB |
| 5,000 | 200 MB | 2.4 GB | 10 GB |
| 10,000 | 800 MB | 9.6 GB | 42 GB |
| 50,000 | 20 GB | 240 GB | 1 TB |
longit_list netlets cost the sum of per-period
sizes rather than T x max(N)^2, so they pay less when actor
composition is sparse over time. For 15,000-node weekly Twitter
snapshots over a year (T = 52), a longit_array
is ~93 GB and would not fit on a laptop; a longit_list
whose typical-period N is much smaller is the only viable in-memory
shape.
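The rule of thumb is simple enough to script. The base R sketch below applies the 8 * N^2 bytes formula in decimal units (1 MB = 1e6 bytes, 1 GB = 1e9 bytes); the helper names and the varying weekly roster are hypothetical.

```r
# Dense double-precision footprint: 8 bytes per cell
dense_bytes_array <- function(n, t = 1) 8 * n^2 * t
dense_bytes_list  <- function(n_per_period) sum(8 * n_per_period^2)

dense_bytes_array(1000) / 1e6           # MB for one 1,000-actor snapshot
dense_bytes_array(15000, t = 52) / 1e9  # GB for 52 weekly 15,000-actor snapshots

# Hypothetical open-cohort roster: most weeks far below the 15,000 peak
n_weekly <- c(rep(2000, 50), 15000, 15000)
dense_bytes_list(n_weekly) / 1e9        # GB when stored as a longit_list
```

In this made-up roster the list costs about 5.2 GB against 93.6 GB for the array, because only two of the 52 periods reach peak size. These figures cover the adjacency payload only; nodal and dyadic attributes add to the total.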
Above ~10,000 actors, prefer building the netlet from an edgelist
data.frame (skips the dense intermediate at construction) and consider
exiting to an edge data.frame via
unnetify(net, remove_zeros = TRUE), or to
igraph via to_igraph(net) for community
detection and other large-N algorithms.
summary_actor(stats = "fast") skips closeness / betweenness
/ eigen / HITS, and summary_actor() switches to this fast path automatically when N exceeds
getOption("netify.fast_threshold", 1500L).
Sparse-input guard. Passing a
Matrix::dgCMatrix (or any sparseMatrix) with
density < 1% and N > 5,000 aborts construction with a pointer to
the edgelist path. The motivating case is a 15K x 15K follower graph at
density 0.001: densifying allocates ~1.8 GB of mostly zeros, which is
almost never what the caller wants. Pass force_dense = TRUE
to override if you really do want the dense allocation.
Benchmark, three ER networks. Wall-clock for the dense-matrix path on a representative laptop (single core, single snapshot, p = 0.01). Re-run locally with the chunk below if you want numbers for your own machine.
library(netify)
set.seed(1)
bench_one <- function(N, p = 0.01) {
# build an er adjacency directly as an edgelist (skips the dense intermediate)
i <- sample.int(N, size = round(p * N * N), replace = TRUE)
j <- sample.int(N, size = length(i), replace = TRUE)
df <- data.frame(from = i, to = j)
t0 <- Sys.time(); net <- netify(df, actor1 = "from", actor2 = "to"); t_build <- Sys.time() - t0
t0 <- Sys.time(); s <- summary(net); t_summary <- Sys.time() - t0
t0 <- Sys.time(); sa <- summary_actor(net, stats = "fast"); t_actor_fast <- Sys.time() - t0
t0 <- Sys.time(); ig <- to_igraph(net); t_igraph <- Sys.time() - t0
data.frame(N = N,
build_s = as.numeric(t_build, units = "secs"),
summary_s = as.numeric(t_summary, units = "secs"),
summary_actor_fast_s = as.numeric(t_actor_fast, units = "secs"),
to_igraph_s = as.numeric(t_igraph, units = "secs"))
}
do.call(rbind, lapply(c(1000, 5000, 10000), bench_one))
Indicative results (single-core, 16 GB laptop):
| N | netify() | summary() | summary_actor(stats = "fast") | to_igraph() |
|---|---|---|---|---|
| 1,000 | < 1 s | < 1 s | < 1 s | < 1 s |
| 5,000 | ~2 s | ~4 s | ~1 s | < 1 s |
| 10,000 | ~2 s | ~14 s | ~3 s | ~3 s |
summary() is dominated by the igraph-based global
metrics and grows fastest; netify() itself stays under a
few seconds even at N = 10,000 when fed an edgelist. The full
summary_actor() (with closeness / betweenness / eigen /
HITS) blows up roughly quadratically – at N = 10,000 it takes several
minutes versus the few seconds shown above for the fast path. That’s why
the auto-promote fires by default.
see also
- vignette("quickstart_inference", package = "netify") – minimal end-to-end tour
- Foundations – full IR walkthrough
- Pipeline: netify to lame and dbn – optional modeling handoff article
- vignette("pipeline_netify_ergm", package = "netify") – modeling handoff to ergm / statnet