Internals: object structure & writing custom extensions

This vignette is for package developers, methodologists writing custom extensions, and anyone who wants to inspect the underlying object structure. End-user workflows are covered in vignette("quickstart_inference", package = "netify") and the topic-specific project-site articles; this one goes one layer deeper.

library(netify)
data(icews)

the netify object: a base r object with attributes

A netify object is a base R matrix / 3D array / list-of-matrices with class = "netify" and a bundle of attributes carrying all metadata. The netify_type attribute records one of three shapes:

`netify_type`	Underlying R object	When used
`cross_sec`	`[n x n]` matrix (or `[r x c]` bipartite)	One time period
`longit_array`	`[n x n x T]` array (or `[n x n x p x T]`)	Multi-period, constant actor set
`longit_list`	Named list of matrices, one per time	Multi-period, varying actor composition

Multilayer networks insert a layer dimension (position 3) and store layer names in attr(x, "layers"). Mixed-directedness multilayer is supported via a vector-valued symmetric attribute.

dgCMatrix inputs from the Matrix package are accepted but densified at construction (a one-shot cli inform is emitted); a true sparse storage backend is not yet implemented.

Inspect any netify object’s attribute bundle:

net <- netify(icews[icews$year == 2010, ],
              actor1 = "i", actor2 = "j",
              symmetric = FALSE, weight = "verbCoop",
              nodal_vars = "i_polity2",
              dyad_vars  = "matlCoop")
#> ℹ `missing_to_zero` is set to "TRUE" (the default).
#> ! Missing dyads will be filled with zeros. For latent space or other
#>   statistical network models, structural zeros and missing data have different
#>   meanings. Set `missing_to_zero = FALSE` to preserve NAs if this distinction
#>   matters for your analysis.
#> This message is displayed once per session.

class(net)
#> [1] "netify"
str(attributes(net), max.level = 1)
#> List of 17
#>  $ dim               : int [1:2] 152 152
#>  $ dimnames          :List of 2
#>  $ class             : chr "netify"
#>  $ netify_type       : chr "cross_sec"
#>  $ actor_time_uniform: logi TRUE
#>  $ actor_pds         :'data.frame':  152 obs. of  3 variables:
#>  $ weight            : chr "verbCoop"
#>  $ detail_weight     : chr "Weights from `verbCoop`"
#>  $ is_binary         : logi FALSE
#>  $ symmetric         : logi FALSE
#>  $ mode              : chr "unipartite"
#>  $ layers            : chr "verbCoop"
#>  $ diag_to_NA        : logi TRUE
#>  $ missing_to_zero   : logi TRUE
#>  $ sum_dyads         : logi FALSE
#>  $ nodal_data        :'data.frame':  152 obs. of  2 variables:
#>  $ dyad_data         :List of 1

Key attributes:

netify_type – "cross_sec", "longit_array", or "longit_list"
mode – "unipartite" or "bipartite"
symmetric – scalar logical (or named vector for mixed-directedness multilayer)
weight – column name (or NULL for binary)
is_binary, detail_weight, diag_to_NA, missing_to_zero, sum_dyads
layers – character vector; length > 1 means multilayer
actor_pds – data.frame with actor, min_time, max_time per actor
nodal_data – data.frame with actor, optional time, and one column per nodal variable
dyad_data – nested list: list[[time]][[var]] = matrix. Cross-sec uses "1" as the time key.

extracting parts

Want this	Use
The raw matrix / array / list	`get_raw(net)`
A quick numeric peek	`peek(net, from = 5, to = 5)`
The full long edge data frame	`unnetify(net)` or `tidy(net)`
Lean wide-to-long edge frame (no nodal merge)	`melt(net)`
Graph-level stats	`summary(net)` or `glance(net)`
Actor-level stats	`summary_actor(net)`
Size / composition descriptors	`measurements(net)`
Edge data + layout for plotting	`net_plot_data(net)$net_dfs`

tidy() and glance() are S3 methods on the broom generics – they’re registered on package load if generics (or anything that imports it, like broom, dplyr, tidymodels) is installed. No hard dependency on broom.

tidy interop

If tibble and broom are installed, three tidyverse-flavored entry points are available. as_tibble.netify is registered against tibble::as_tibble via .onLoad, so tibble must be installed to use the unprefixed call; tidy() and glance() are likewise registered against the broom generics.

library(tibble)
library(broom)

# one row per dyad (long edge frame, wrapped in a tibble)
as_tibble(net)
#> # A tibble: 22,952 × 6
#>    from        to         verbCoop matlCoop i_polity2_from i_polity2_to
#>    <chr>       <chr>         <dbl>    <dbl>          <int>        <int>
#>  1 Afghanistan Albania           0        1             NA            9
#>  2 Afghanistan Algeria           0        0             NA            2
#>  3 Afghanistan Angola            0        0             NA           -2
#>  4 Afghanistan Argentina         1        0             NA            8
#>  5 Afghanistan Armenia           7        2             NA            5
#>  6 Afghanistan Australia       125        0             NA           10
#>  7 Afghanistan Austria           1        0             NA           10
#>  8 Afghanistan Azerbaijan        7        0             NA           -7
#>  9 Afghanistan Bahrain           3        0             NA           -5
#> 10 Afghanistan Bangladesh       14        0             NA            5
#> # ℹ 22,942 more rows

# broom-style tidy summary: a tibble with one row per (non-zero) dyad
head(tidy(net))
#> # A tibble: 6 × 6
#>   from        to         verbCoop matlCoop i_polity2_from i_polity2_to
#>   <chr>       <chr>         <dbl>    <dbl>          <int>        <int>
#> 1 Afghanistan Argentina         1        0             NA            8
#> 2 Afghanistan Armenia           7        2             NA            5
#> 3 Afghanistan Australia       125        0             NA           10
#> 4 Afghanistan Austria           1        0             NA           10
#> 5 Afghanistan Azerbaijan        7        0             NA           -7
#> 6 Afghanistan Bahrain           3        0             NA           -5

# one-row-per-network model-card summary (one row per time/layer if applicable)
glance(net)
#> # A tibble: 1 × 18
#>   net   num_actors density num_edges prop_edges_missing mean_edge_weight
#>   <chr>      <dbl>   <dbl>     <dbl>              <dbl>            <dbl>
#> 1 1            152   0.435      9976                  0             41.7
#> # ℹ 12 more variables: sd_edge_weight <dbl>, median_edge_weight <dbl>,
#> #   min_edge_weight <dbl>, max_edge_weight <dbl>, competition_row <dbl>,
#> #   competition_col <dbl>, sd_of_row_means <dbl>, sd_of_col_means <dbl>,
#> #   covar_of_row_col_means <dbl>, reciprocity <dbl>, mutual <dbl>,
#> #   transitivity <dbl>

as_tibble(net) and tidy(net) share the same long-format payload as unnetify(net) – they differ only in whether zero-weight edges are dropped by default (tidy() drops them, matching the broom convention). glance(net) is the broom-flavored sibling of summary(net) – one row of graph-level statistics per network (or per (time, layer) slice for longitudinal / multilayer inputs).

as_tibble() also has a method for netify_comparison objects (from compare_networks()), returning the per-pair $comparisons frame directly so you can pipe straight into filter() / arrange() / pivot_wider():

panel <- netify(icews[icews$year %in% c(2010, 2011), ],
                actor1 = "i", actor2 = "j", time = "year",
                symmetric = FALSE, weight = "verbCoop")
net_2010 <- subset_netify(panel, time = "2010")
net_2011 <- subset_netify(panel, time = "2011")
cmp <- compare_networks(list("2010" = net_2010, "2011" = net_2011))
as_tibble(cmp)
#> # A tibble: 1 × 5
#>   net_i net_j metric      value p_value
#>   <chr> <chr> <chr>       <dbl>   <dbl>
#> 1 2010  2011  correlation 0.884      NA

predicates and descriptors

For programmatic dispatch – e.g. inside a custom exporter or pipeline step – netify ships a small set of non-masking predicates and size accessors:

# class / structure predicates
is_netify(net)
#> [1] TRUE
is_bipartite(net)          # may be masked by igraph::is_bipartite
#> [1] FALSE
is_bipartite_netify(net)   # alias that won't be masked
#> [1] FALSE
is_directed_netify(net)
#> [1] TRUE
is_longitudinal(net)
#> [1] FALSE
is_multilayer(net)
#> [1] FALSE

# size / composition accessors
n_actors(net)              # number of unique actors
#> [1] 152
n_periods(net)             # number of time periods (1 for cross-sec)
#> [1] 1
n_layers(net)              # number of layers (1 for single-layer)
#> [1] 1
head(get_actor_time_info(net))   # stored actor_pds: actor, min_time, max_time
#>         actor min_time max_time
#> 1 Afghanistan        1        1
#> 2     Albania        1        1
#> 3     Algeria        1        1
#> 4      Angola        1        1
#> 5   Argentina        1        1
#> 6     Armenia        1        1

The _netify suffix on is_bipartite_netify() and is_directed_netify() avoids masking the same-named predicates from igraph and network. The unsuffixed versions exist too and defer to the foreign package if a non-netify graph object is passed.

object-level validation

validate_netify() is the developer’s pre-flight check. It walks the attribute bundle and confirms the invariants the rest of the package relies on:

validate_netify(net, verbose = TRUE)
#> ✔ netify object passes all coherence checks (11 checks).

The full list of checks:

netify_type – one of "cross_sec", "longit_array", "longit_list", and matches the underlying R object
mode – "unipartite" or "bipartite"
symmetric_type – scalar logical, or named logical of length n_layers for mixed-directedness multilayer
layers_consistent – attr(net, "layers") length matches the layer dimension where applicable
nodal_actors_known – every actor referenced in nodal_data exists in the network’s actor set
is_binary_consistent – the stored is_binary flag matches the actual content
symmetric_consistent – if stored as symmetric, the matrix content is actually symmetric
unipartite_dimnames – row and column names agree for unipartite networks
slice_dimnames_consistent – every slice of a longitudinal array/list has dimnames in the order recorded in actor_pds
nodal_time_known – time keys in nodal/dyad data are a subset of the network’s time axis

A quick demo on a clean netlet versus one tampered to introduce a stray actor:

# clean netlet ticks every box
all(unlist(validate_netify(net, verbose = FALSE)))
#> [1] TRUE

# tamper: inject a stray actor into nodal_data
bad <- net
nd <- attr(bad, "nodal_data")
nd <- rbind(nd, nd[1, , drop = FALSE])
nd$actor[nrow(nd)] <- "ZZZ_not_in_network"
attr(bad, "nodal_data") <- nd

validate_netify(bad, verbose = TRUE)
#> ! nodal_data references 1 actor not in the netlet: "ZZZ_not_in_network"
#> ✖ netify object failed 1 coherence check: "nodal_actors_known".

If a custom exporter or an internal manipulation breaks one of these, validate_netify() is the first place to look.

open-cohort panels with `actor_pds`

When the actor roster changes over time – entries, exits, attrition, contact-tracing windows – pass actor_time_uniform = FALSE and supply an actor_pds data.frame giving each actor’s [min_time, max_time] window. Each period’s netlet then contains only the actors whose window covers that period; densities, degree counts, and per-period actor sets respect those entry / exit boundaries.

the `actor_pds` roster, step by step

Three pieces have to line up for an open-cohort netlet to be well-formed:

A roster – one row per actor, with actor, min_time, max_time. The [min_time, max_time] interval is closed (the actor is alive in the boundary periods themselves).
An edgelist – one row per observed interaction. During construction, rows outside the roster windows are excluded because the referenced actor is not in the risk set for that period.
netify(actor_time_uniform = FALSE, actor_pds = roster, ...) – this is what tells the constructor to honor the roster instead of treating every actor as present in every period.

Below, actor a is in the network only during periods 1-2 (enters at t = 1, exits after t = 2), and actors d / e arrive at t = 3. The edgelist deliberately includes a tie at t = 2 involving a and ties at t = 3 involving d so we can verify the period-by-period actor sets afterwards.

set.seed(1)

# roster: actors with closed-interval entry / exit times
roster <- data.frame(
    actor = c("a", "b", "c", "d", "e"),
    min_time = c(1, 1, 1, 3, 3),
    max_time = c(2, 5, 4, 5, 5)   # a exits after t = 2
)

# edges (only show up while both endpoints are in the roster)
edges <- data.frame(
    i = c("a", "a", "b", "c", "d", "c", "d", "e"),
    j = c("b", "c", "c", "b", "e", "d", "e", "b"),
    t = c(1, 2, 2, 3, 4, 3, 5, 5)
)

net_oc <- netify(edges,
    actor1 = "i", actor2 = "j", time = "t",
    actor_time_uniform = FALSE,
    actor_pds = roster
)

# read the roster back off the netlet itself
head(get_actor_time_info(net_oc))
#>   actor min_time max_time
#> 1     a        1        2
#> 2     b        1        5
#> 3     c        1        4
#> 4     d        3        5
#> 5     e        3        5
n_actors(net_oc)
#> [1] 4
n_periods(net_oc)
#> [1] 5

get_actor_time_info() on a netify object returns the stored actor_pds directly (it is also the argument name on netify() itself, so the round trip is exact). On a raw dyad data.frame the same function derives the roster from observed activity – that’s how you build the actor_pds argument in the first place.

This is the standard path for open-cohort longitudinal data – panel surveys with attrition, contact-tracing chains, organizational membership over time, animal co-occurrence with births / deaths, and similar settings where treating every actor as present in every period would distort denominators.

density and per-period actor sets

The key correctness guarantee: density (and every other per-period statistic) is computed against the actor set alive in that period, not the union of all actors ever observed. With the roster above:

t = 1: actors a, b, c are alive (3 actors)
t = 2: actors a, b, c are alive (3 actors; a’s last period)
t = 3: actors b, c, d, e are alive (4 actors; a is now out, d / e enter)
t = 4: actors b, c, d, e are alive (4 actors)
t = 5: actors b, d, e are alive (3 actors; c’s last period was 4)

oc_summary <- summary(net_oc)
oc_summary[, c("net", "num_actors", "density", "num_edges")]
#>   net num_actors   density num_edges
#> 1   1          3 0.3333333         1
#> 2   2          3 0.6666667         2
#> 3   3          4 0.3333333         2
#> 4   4          4 0.1666667         1
#> 5   5          3 0.6666667         2

The period-3 denominator is 4 x 3 = 12 (directed) or 4 x 3 / 2 = 6 (symmetric), not 5 x 4 – actor a is not counted because its max_time is 2. This is the load-bearing invariant: density at t = 3 excludes the actor present only in periods 1-2, so a researcher comparing densities across periods is not comparing apples to oranges.

The same accounting flows into summary_actor(), homophily(), mixing_matrix(), and the plot helpers – open-cohort actors never show up as zero-degree placeholders in periods they did not exist.

na versus zero in weighted networks

For weighted data, the distinction between 0 (observed, but no edge / zero-valued interaction) and NA (unobserved / not-at-risk) is semantically important in many domains – epidemiology, animal behavior, sparse survey rosters. By default netify(missing_to_zero = TRUE) fills unobserved dyads with 0. Pass missing_to_zero = FALSE to keep them as NA, in which case summary()$prop_edges_missing reports the non-trivial missingness fraction and downstream centrality / homophily routines propagate the NA semantics rather than silently treating the dyad as a zero-weight tie.

Both prop_edges_missing and prop_unknown_edges use the same denominator as density – the number of potential edge dyads (off-diagonal for unipartite + diag_to_NA, halved for symmetric, all cells for bipartite). That means density + prop_edges_missing + observed_zero_fraction = 1 is an identity, which is the property you want when you are reading them as competing fractions. The prop_unknown_edges column is suppressed when missing_to_zero = TRUE because every unobserved dyad has been filled with 0 and the value would be identically zero in every row; when present it tracks prop_edges_missing and serves as the “this netlet carries NA semantics” cue downstream.

writing a custom graph-level statistic

summary(net, other_stats = list(my_stat = fn)) accepts user-supplied functions. Each function receives the netify object for the current time period / layer (not a stripped matrix), so you can call netify_to_igraph(), peek(), or whatever you want inside.

Return a named numeric vector – names become column names in the output frame.

# example: number of weakly connected components with at least 2 nodes
n_components_2plus <- function(net) {
  g <- netify_to_igraph(net)
  c(n_components_2plus = sum(igraph::components(g)$csize >= 2))
}

# example: edge weight skewness
weight_skew <- function(net) {
  v <- as.vector(net)
  v <- v[!is.na(v) & v != 0]
  if (length(v) < 3) return(c(weight_skew = NA_real_))
  c(weight_skew = mean((v - mean(v))^3) / (stats::sd(v)^3))
}

summary(net, other_stats = list(
  comp = n_components_2plus,
  skew = weight_skew
))
#> ℹ `netify_to_igraph()` kept the igraph edge set unchanged.
#> • 131 dyadic covariate cells on non-edges cannot be stored as igraph edge
#>   attributes.
#> This message is displayed once per session.
#>   net num_actors   density num_edges prop_edges_missing mean_edge_weight
#> 1   1        152 0.4346462      9976                  0         41.72113
#>   sd_edge_weight median_edge_weight min_edge_weight max_edge_weight
#> 1       198.3157                  6               1            4937
#>   competition_row competition_col sd_of_row_means sd_of_col_means
#> 1      0.04071163      0.03493156         41.4412        37.76971
#>   covar_of_row_col_means reciprocity    mutual transitivity
#> 1              0.9934913   0.9823385 0.8402509    0.6386063
#>   comp.n_components_2plus skew.weight_skew
#> 1                       1         13.76129

The same other_stats mechanism is available in summary_actor(), compare_networks(), homophily(), mixing_matrix(), and dyad_correlation(). In each case the function gets called once per time period / layer / iteration as appropriate; check the function’s ? for the exact contract.

writing a custom actor-level statistic

summary_actor(net, other_stats = list(my_stat = fn)). The function should return a vector with one value per actor (in row order):

# example: per-actor mean tie weight to non-isolates
mean_active_tie <- function(mat) {
  apply(mat, 1, function(row) {
    nonzero <- row[!is.na(row) & row != 0]
    if (length(nonzero) == 0) NA_real_ else mean(nonzero)
  })
}

head(summary_actor(net, stats = "fast", other_stats = list(mean_active = mean_active_tie)))
#>         actor degree_in degree_out degree_total prop_ties_in prop_ties_out
#> 1 Afghanistan        95         80          175    0.6291391     0.5298013
#> 2     Albania        53         51          104    0.3509934     0.3377483
#> 3     Algeria        78         94          172    0.5165563     0.6225166
#> 4      Angola        68         61          129    0.4503311     0.4039735
#> 5   Argentina        64         66          130    0.4238411     0.4370861
#> 6     Armenia        56         56          112    0.3708609     0.3708609
#>   prop_ties_total network_share_in network_share_out network_share_total
#> 1       0.5794702      0.025958050       0.021073497         0.023515773
#> 2       0.3443709      0.001170082       0.001158069         0.001164076
#> 3       0.5695364      0.002919199       0.003056150         0.002987674
#> 4       0.4271523      0.002349775       0.002011004         0.002180390
#> 5       0.4304636      0.003925903       0.004175777         0.004050840
#> 6       0.3708609      0.004995075       0.004862930         0.004929002
#>   strength_sum_in strength_sum_out strength_sum_total strength_avg_in
#> 1           10804             8771              19575      113.726316
#> 2             487              482                969        9.188679
#> 3            1215             1272               2487       15.576923
#> 4             978              837               1815       14.382353
#> 5            1634             1738               3372       25.531250
#> 6            2079             2024               4103       37.125000
#>   strength_avg_out strength_avg_total strength_std_in strength_std_out
#> 1        109.63750         111.857143       436.76245        385.75910
#> 2          9.45098           9.317308        10.72032         12.20215
#> 3         13.53191          14.459302        29.33396         26.61098
#> 4         13.72131          14.069767        22.56270         20.65279
#> 5         26.33333          25.938462        44.36392         44.54672
#> 6         36.14286          36.633929       103.35993        101.33983
#>   strength_std_total strength_median_in strength_median_out
#> 1          413.06464                  7                10.0
#> 2           11.41560                  5                 5.0
#> 3           27.81441                  4                 4.0
#> 4           21.59904                  4                 5.0
#> 5           44.28602                  6                 7.0
#> 6          101.89396                  4                 4.5
#>   strength_median_total mean_active
#> 1                   8.0   109.63750
#> 2                   5.0     9.45098
#> 3                   4.0    13.53191
#> 4                   5.0    13.72131
#> 5                   6.5    26.33333
#> 6                   4.0    36.14286

reading the dyad_data nested list

attr(net, "dyad_data") is the structure that gives netify O(1) access to per-time dyadic covariates. Its shape is:

dd <- attr(net, "dyad_data")
names(dd)             # cross-sec: just "1"
#> [1] "1"
names(dd[["1"]])      # one entry per dyadic variable
#> [1] "matlCoop"
str(dd[["1"]][["matlCoop"]])
#>  int [1:152, 1:152] 0 4 0 0 0 2 19 0 1 0 ...
#>  - attr(*, "dimnames")=List of 2
#>   ..$ : chr [1:152] "Afghanistan" "Albania" "Algeria" "Angola" ...
#>   ..$ : chr [1:152] "Afghanistan" "Albania" "Algeria" "Angola" ...

For longitudinal networks the top level has one entry per period:

# pseudo-structure for a 3-period network with 2 dyadic vars:
# dd[["2010"]][["matlCoop"]]   -> n x n matrix
# dd[["2010"]][["verbConf"]]   -> n x n matrix
# dd[["2011"]][["matlCoop"]]   -> n x n matrix
# ... etc.

If you’re writing a converter (e.g., to a new modeling format), this is the structure to iterate over.

writing a custom exporter (`to_*` function)

The existing to_amen(), to_lame(), to_dbn(), to_igraph(), to_statnet() give you templates. to_amen() exports the standard amen-style data structure; to_lame() wraps that exporter with mode, family, and an executable ame_call snippet for lame workflows. A new exporter generally needs to:

Validate the netify object with validate_netify(netlet, verbose = FALSE)
Branch on attr(netlet, "netify_type") (cross_sec / longit_array / longit_list)
For multilayer, decide whether to iterate per layer (use subset_netify(netlet, layers = X) and return a named list) or to produce a joint structure (4D array, etc.)
Pull the raw network data with get_raw(netlet)
Pull nodal data with attr(netlet, "nodal_data") and align to your target package’s actor order
Pull dyadic data with attr(netlet, "dyad_data") and reshape to your target format
Handle missing values according to your target package’s convention (most don’t accept NAs)

If your target package expects per-layer outputs but takes multilayer netify inputs, the standard pattern is:

to_mymodel <- function(netlet, ...) {
    validate_netify(netlet, verbose = FALSE)
    layer_names <- attributes(netlet)$layers
    if (length(layer_names) > 1) {
        out <- lapply(layer_names, function(lyr) {
            to_mymodel(subset_netify(netlet, layers = lyr), ...)
        })
        names(out) <- layer_names
        return(out)
    }
    # ... single-layer logic ...
}

This is what to_igraph(), to_statnet(), and to_lame() do.

performance characteristics

Operation	Complexity	Notes
`netify(df)` cross-sec	O(N * E)	C++ via Rcpp; fast
`netify(df, time = ...)` longit	O(N * E * T)	C++; fast
`summary(net)`	O(N^2) per period	igraph backend
`summary_actor(net)`	O(N^2) per period x per layer	igraph; closeness/betweenness dominate
`bootstrap_netlet(net, fn, n_boot = B)`	O(B * n * N^2)	depends on the statistic supplied in `fn`
`compare_networks(method = "qap")`	O(R * N^2)	C++ permutations; tunable via `n_permutations`
`compare_networks(method = "spectral")`	O(N^3)	use `spectral_rank = round(sqrt(N))` for large nets
`unnetify(net)`	O(N^2 * T)	use `remove_zeros = TRUE` for sparse output
`melt(net)`	O(N^2 * T)	leaner than `unnetify`; no nodal merge
`plot(net)`	O(N^2) layout + O(E) render	igraph layout dominates

For networks of a few hundred actors over a dozen time periods, everything runs in seconds. The C++ routines for netify(), compare_networks() QAP, and similarity calculations handle the heaviest paths. The R-side wrappers (especially in plotting and attribute handling) are not as optimized.

Memory budget rule of thumb. netify stores adjacencies densely: an N x N double-precision matrix is 8 * N^2 bytes. For longitudinal stacks the cost is 8 * N^2 * T (longit_array) or the sum over per-period sizes (longit_list, when actor composition varies). Concretely:

N	per-snapshot RAM	T = 12 (monthly year)	T = 52 (weekly year)
200	320 KB	3.8 MB	17 MB
1,000	8 MB	96 MB	416 MB
5,000	200 MB	2.4 GB	10 GB
10,000	800 MB	9.4 GB	41 GB
50,000	20 GB	234 GB	1 TB

longit_list netlets cost the sum of per-period sizes rather than T x max(N)^2, so they pay less when actor composition is sparse over time. For 15,000-node weekly Twitter snapshots over a year (T = 52), a longit_array is ~93 GB and would not fit on a laptop; a longit_list whose typical-period N is much smaller is the only viable in-memory shape.

Above ~10,000 actors, prefer building the netlet from an edgelist data.frame (skips the dense intermediate at construction) and consider exiting to an edge data.frame via unnetify(net, remove_zeros = TRUE), or to igraph via to_igraph(net) for community detection and other large-N algorithms. summary_actor(stats = "fast") skips closeness / betweenness / eigen / HITS and auto-promotes when N exceeds getOption("netify.fast_threshold", 1500L).

Sparse-input guard. Passing a Matrix::dgCMatrix (or any sparseMatrix) with density < 1% and N > 5,000 aborts construction with a pointer to the edgelist path. The motivating case is a 15K x 15K follower graph at density 0.001: densifying allocates ~1.7 GB of mostly zeros, which is almost never what the caller wants. Pass force_dense = TRUE to override if you really do want the dense allocation.

Benchmark, three ER networks. Wall-clock for the dense-matrix path on a representative laptop (single core, single snapshot, p = 0.01). Re-run locally with the chunk below if you want numbers for your own machine.

library(netify)
set.seed(1)
bench_one <- function(N, p = 0.01) {
  # build an er adjacency directly as an edgelist (skips the dense intermediate)
  i <- sample.int(N, size = round(p * N * N), replace = TRUE)
  j <- sample.int(N, size = length(i), replace = TRUE)
  df <- data.frame(from = i, to = j)

  t0 <- Sys.time(); net <- netify(df, actor1 = "from", actor2 = "to"); t_build <- Sys.time() - t0
  t0 <- Sys.time(); s   <- summary(net);                                 t_summary <- Sys.time() - t0
  t0 <- Sys.time(); sa  <- summary_actor(net, stats = "fast");           t_actor_fast <- Sys.time() - t0
  t0 <- Sys.time(); ig  <- to_igraph(net);                               t_igraph <- Sys.time() - t0

  data.frame(N = N,
             build_s = as.numeric(t_build, units = "secs"),
             summary_s = as.numeric(t_summary, units = "secs"),
             summary_actor_fast_s = as.numeric(t_actor_fast, units = "secs"),
             to_igraph_s = as.numeric(t_igraph, units = "secs"))
}
do.call(rbind, lapply(c(1000, 5000, 10000), bench_one))

Indicative results (single-core, 16 GB laptop):

N	`netify()`	`summary()`	`summary_actor(stats = "fast")`	`to_igraph()`
1,000	< 1 s	< 1 s	< 1 s	< 1 s
5,000	~2 s	~4 s	~1 s	< 1 s
10,000	~2 s	~14 s	~3 s	~3 s

summary() is dominated by the igraph-based global metrics and grows fastest; netify() itself stays under a few seconds even at N = 10,000 when fed an edgelist. The full summary_actor() (with closeness / betweenness / eigen / HITS) blows up roughly quadratically – at N = 10,000 it takes several minutes versus the few seconds shown above for the fast path. That’s why the auto-promote fires by default.

Cassy Dorff and Shahryar Minhas

2026-07-12

the netify object: a base r object with attributes

extracting parts

tidy interop

predicates and descriptors

object-level validation

open-cohort panels with `actor_pds`

the `actor_pds` roster, step by step

density and per-period actor sets

na versus zero in weighted networks

writing a custom graph-level statistic

writing a custom actor-level statistic

reading the dyad_data nested list

writing a custom exporter (`to_*` function)

performance characteristics

see also

Internals: object structure & writing custom extensions

Cassy Dorff and Shahryar Minhas

2026-07-12

the netify object: a base r object with attributes

extracting parts

tidy interop

predicates and descriptors

object-level validation

open-cohort panels with actor_pds

the actor_pds roster, step by step

density and per-period actor sets

na versus zero in weighted networks

writing a custom graph-level statistic

writing a custom actor-level statistic

reading the dyad_data nested list

writing a custom exporter (to_* function)

performance characteristics

see also

open-cohort panels with `actor_pds`

the `actor_pds` roster, step by step

writing a custom exporter (`to_*` function)