tests whether connected actors have similar attributes (homophily). calculates the correlation between attribute similarity and tie presence, with support for multiple similarity metrics and significance testing.
Usage
homophily(
  netlet,
  attribute,
  method = "correlation",
  threshold = 0,
  signed_handling = c("abs", "drop_negative", "preserve_sign"),
  significance_test = TRUE,
  n_permutations = 1000,
  alpha = 0.05,
  other_stats = NULL,
  ...
)
Arguments
- netlet
a netify object containing network data.
- attribute
character string specifying the nodal attribute to analyze.
- method
character string specifying the similarity metric:
- "correlation"
negative absolute difference for continuous data (default)
- "euclidean"
negative euclidean distance for continuous data
- "manhattan"
negative manhattan/city-block distance for continuous data
- "cosine"
cosine similarity for continuous data
- "categorical"
binary similarity (0/1) for categorical data
- "jaccard"
jaccard similarity for binary/presence-absence data
- "hamming"
negative hamming distance for categorical data
- threshold
numeric value or function to determine tie presence in weighted networks. if numeric, edges with weights > threshold are considered ties. if a function, it should take the network matrix and return a single numeric threshold (see Details). default is 0 (any positive weight is a tie). common values: 0 (default), mean(weights), median(weights), or quantile-based thresholds. to pre-binarize a network, consider using
mutate_weights() first.
- signed_handling
character. strategy for signed (negative-weight) edges:
- "abs"
(default) take absolute values before thresholding, so any tie magnitude (positive or negative) can become a connection.
- "drop_negative"
set negative weights to zero before thresholding (positive ties only).
- "preserve_sign"
treat any non-zero entry (positive or negative) as a connection; threshold is ignored for sign decisions.
this argument is ignored when the network has no negative weights.
- significance_test
logical. whether to perform a dyad-level permutation test. default TRUE.
- n_permutations
number of permutations for significance testing. default 1000.
- alpha
significance level for confidence intervals. default 0.05.
- other_stats
named list of custom functions for additional statistics.
- ...
additional arguments passed to custom functions.
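the three signed_handling strategies described above can be illustrated on a small signed weight matrix. this is a minimal base-R sketch, not the package's internal code: binarize_signed() is a hypothetical helper, the matrix values are made up, and dropping self-ties on the diagonal is an assumption.

```r
# hypothetical helper illustrating the three signed_handling strategies;
# it is not part of netify
binarize_signed <- function(w, threshold = 0,
                            signed_handling = c("abs", "drop_negative", "preserve_sign")) {
  signed_handling <- match.arg(signed_handling)
  if (signed_handling == "abs") {
    # magnitude decides: strong negative ties count as connections
    ties <- abs(w) > threshold
  } else if (signed_handling == "drop_negative") {
    # negative weights are zeroed, so only positive ties can pass
    w[!is.na(w) & w < 0] <- 0
    ties <- w > threshold
  } else {
    # any non-zero entry is a connection; threshold is not used
    ties <- w != 0
  }
  diag(ties) <- FALSE  # assumption: self-ties are not counted
  ties
}

# made-up signed weights for three actors
w <- matrix(c( 0,   2,  -3,
               2,   0,   0.5,
              -3,   0.5, 0), nrow = 3, byrow = TRUE)

ties_abs  <- binarize_signed(w, threshold = 1, signed_handling = "abs")
ties_drop <- binarize_signed(w, threshold = 1, signed_handling = "drop_negative")
ties_sign <- binarize_signed(w, threshold = 1, signed_handling = "preserve_sign")
```

with threshold = 1, the weight of -3 is a tie under "abs" and "preserve_sign" but not under "drop_negative", while the weight of 0.5 is a tie only under "preserve_sign".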
Value
data frame with homophily statistics per network/time period:
- net: network/time identifier
- layer: layer name
- attribute: analyzed attribute name
- method: similarity method used
- threshold_value: threshold used for determining ties (NA for binary networks)
- homophily_correlation: correlation between similarity and tie presence (binary: tie/no tie)
- mean_similarity_connected: mean similarity among connected pairs (weight > threshold)
- mean_similarity_unconnected: mean similarity among unconnected pairs (weight <= threshold or missing)
- similarity_difference: difference between connected and unconnected mean similarities
- p_value: permutation test p-value
- ci_lower, ci_upper: confidence interval bounds
- n_connected_pairs: number of connected pairs
- n_unconnected_pairs: number of unconnected pairs
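the relationship between the main columns can be sketched with plain vectors. this assumes the dyad-level similarity scores and tie indicators have already been flattened into two vectors; sim and tie are illustrative names with made-up values, and the snippet shows the arithmetic only, not the package's internals.

```r
# made-up dyad-level data: one entry per pair of actors
sim <- c(1, 1, 0, 1, 0, 0, 1, 0, 1, 0)  # e.g. categorical similarity (0/1)
tie <- c(1, 1, 1, 0, 0, 0, 0, 0, 1, 0)  # 1 = connected, 0 = unconnected

# correlation between similarity and tie presence
homophily_correlation <- cor(sim, tie)

# mean similarity within each group of dyads
mean_similarity_connected   <- mean(sim[tie == 1])
mean_similarity_unconnected <- mean(sim[tie == 0])

# connected minus unconnected
similarity_difference <- mean_similarity_connected - mean_similarity_unconnected

n_connected_pairs   <- sum(tie == 1)
n_unconnected_pairs <- sum(tie == 0)
```

a positive similarity_difference (and a positive homophily_correlation) means connected pairs are, on average, more alike than unconnected pairs.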
Details
auto-promotion to categorical:
if you leave method at its default and pass a character,
factor, or logical attribute, homophily() will
switch to method = "categorical" automatically and inform you
once per attribute. this avoids the C++-level error that would
otherwise come from feeding non-numeric data to the correlation-based
similarity routine. pass method = "categorical" (or any other
explicit choice) to silence the message.
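the promotion rule can be sketched as a small type check. resolve_method() is a hypothetical helper written for illustration, not a netify function; it uses method = NULL to stand in for "left at its default", and it does not reproduce the once-per-attribute messaging.

```r
# hypothetical helper mirroring the documented rule; not netify's own code
resolve_method <- function(values, method = NULL) {
  non_numeric <- is.character(values) || is.factor(values) || is.logical(values)
  if (is.null(method)) {
    # method left at its default: promote non-numeric attributes
    if (non_numeric) {
      message("non-numeric attribute: switching to method = \"categorical\"")
      return("categorical")
    }
    return("correlation")
  }
  # an explicit choice is respected as given, with no message
  method
}

m_chr <- resolve_method(c("a", "b", "a"))               # promoted
m_num <- resolve_method(c(10, 12, 15))                  # stays "correlation"
m_set <- resolve_method(c("a", "b", "a"), "hamming")    # explicit choice kept
```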
similarity metrics:
for continuous attributes:
- correlation: based on absolute difference, good general purpose metric
- euclidean: similar to correlation for single attributes
- manhattan: less sensitive to outliers than euclidean
- cosine: useful for normalized data or when sign matters
for categorical/binary attributes:
- categorical: simple matching (1 if same, 0 if different)
- jaccard: for binary data, emphasizes shared presence over shared absence
- hamming: counts positions where values differ (negated for similarity)
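a few of these metrics can be written out for a single attribute with base R's outer(). this is a sketch under the definitions given above (negative absolute difference, simple matching, negated hamming distance); the attribute values are made up, the exact internal formulas are an assumption, and the other metrics are omitted because their single-attribute behaviour is not specified here.

```r
# made-up nodal attributes for three actors
age  <- c(10, 12, 15)
dept <- c("x", "y", "x")

# "correlation": negative absolute difference (closer values score higher)
sim_absdiff <- -abs(outer(age, age, "-"))

# "categorical": 1 if two actors share a category, 0 otherwise
sim_categorical <- outer(dept, dept, "==") * 1

# "hamming" on a single attribute: 0 if equal, -1 if different
sim_hamming <- -(outer(dept, dept, "!=") * 1)
```

each result is an actor-by-actor matrix; larger values always mean "more similar", which is why the distance-based metrics are negated.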
threshold parameter:
for weighted networks, the threshold parameter determines what edge weights
constitute a "connection". you can specify:
- a numeric value: edges with weight > threshold are ties
- a function: should take a matrix and return a single numeric threshold
common threshold functions:
- function(x) mean(x, na.rm = TRUE): mean weight
- function(x) median(x, na.rm = TRUE): median weight
- function(x) quantile(x, 0.75, na.rm = TRUE): 75th percentile
for more complex binarization needs (e.g., different thresholds by time period),
consider using mutate_weights() to pre-process your network.
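the effect of a numeric versus a function threshold can be checked by hand on a small weight matrix. resolve_threshold() below is a hypothetical helper, not part of netify, and the weights are made up; whether the package passes the diagonal to the threshold function is not stated here, so this sketch simply passes the full matrix.

```r
# made-up symmetric weight matrix for three actors
w <- matrix(c(0, 1, 4,
              1, 0, 2,
              4, 2, 0), nrow = 3, byrow = TRUE)

# hypothetical helper: turn a numeric-or-function threshold into one number
resolve_threshold <- function(w, threshold = 0) {
  if (is.function(threshold)) threshold(w) else threshold
}

thr_default <- resolve_threshold(w)  # numeric default of 0
thr_mean    <- resolve_threshold(w, function(x) mean(x, na.rm = TRUE))
thr_q75     <- resolve_threshold(w, function(x) quantile(x, 0.75, na.rm = TRUE))

# ties are weights strictly greater than the resolved threshold
ties_default <- w > thr_default
ties_mean    <- w > thr_mean
ties_q75     <- w > thr_q75
```

raising the threshold from the default to the mean and then to the 75th percentile leaves progressively fewer dyads counted as connected.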
permutation test:
when significance_test = TRUE, homophily() holds the tie
indicators fixed and permutes the dyad-level similarity values to form an
exploratory reference distribution. the resulting p-value and confidence
interval summarize how unusual the observed dyad-level association is under
that exchangeability assumption. they are not a node-label permutation test
and should not be read as a causal estimate of tie formation.
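the procedure described above can be sketched in a few lines of base R. the data are simulated, and the two-sided comparison and the +1 correction in the p-value are assumptions made for this illustration, not a statement of how homophily() computes its p-value; the confidence interval is not reproduced.

```r
set.seed(42)

# simulated dyad-level data: tie indicators and similarity scores
n_dyads <- 200
tie <- rbinom(n_dyads, 1, 0.2)
sim <- rnorm(n_dyads) + 0.5 * tie  # connected dyads are slightly more similar

# observed dyad-level association
observed <- cor(sim, tie)

# hold the tie indicators fixed and shuffle the similarity values
n_permutations <- 1000
perm_stats <- replicate(n_permutations, cor(sample(sim), tie))

# assumption: two-sided p-value with the common +1 correction
p_value <- (sum(abs(perm_stats) >= abs(observed)) + 1) / (n_permutations + 1)
```

because whole dyads are shuffled rather than node labels, the reference distribution ignores the dependence between dyads that share an actor, which is why the result is best read as exploratory.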
Examples
# quick homophily check on the bundled classroom friendship data
data(classroom_edges)
data(classroom_nodes)
net <- netify(
  classroom_edges,
  actor1 = "from", actor2 = "to",
  symmetric = TRUE,
  nodal_data = classroom_nodes
)
# do students cluster by gender?
homophily(net, attribute = "gender", method = "categorical")
#> net layer attribute method threshold_value homophily_correlation
#> 1 1 weight1 gender categorical NA 0.07436588
#> mean_similarity_connected mean_similarity_unconnected similarity_difference
#> 1 0.6666667 0.5520833 0.1145833
#> p_value ci_lower ci_upper n_connected_pairs n_unconnected_pairs
#> 1 0.1278721 -0.02260239 0.1701738 51 384
#> n_missing n_pairs
#> 1 0 435
if (FALSE) { # \dontrun{
  # basic homophily analysis with default threshold (> 0)
  homophily_default <- homophily(net, attribute = "group")
  # using different similarity metrics for continuous data
  homophily_manhattan <- homophily(
    net,
    attribute = "age",
    method = "manhattan" # less sensitive to outliers
  )
  # for binary attributes (e.g., gender, membership)
  homophily_jaccard <- homophily(
    net,
    attribute = "member",
    method = "jaccard" # better for binary data than correlation
  )
  # for categorical attributes
  homophily_categorical <- homophily(
    net,
    attribute = "department",
    method = "categorical"
  )
  # combining method and threshold
  homophily_combined <- homophily(
    net,
    attribute = "score",
    method = "manhattan",
    threshold = function(x) quantile(x, 0.75, na.rm = TRUE)
  )
} # }