Skip to contents

Tests whether connected actors have similar attributes (homophily). Calculates the correlation between attribute similarity and tie presence, with support for multiple similarity metrics and significance testing.

Usage

homophily(
  netlet,
  attribute,
  method = "correlation",
  threshold = 0,
  significance_test = TRUE,
  n_permutations = 1000,
  alpha = 0.05,
  other_stats = NULL,
  ...
)

Arguments

netlet

A netify object containing network data.

attribute

Character string specifying the nodal attribute to analyze.

method

Character string specifying the similarity metric:

"correlation"

Negative absolute difference for continuous data (default)

"euclidean"

Negative euclidean distance for continuous data

"manhattan"

Negative Manhattan/city-block distance for continuous data

"cosine"

Cosine similarity for continuous data

"categorical"

Binary similarity (0/1) for categorical data

"jaccard"

Jaccard similarity for binary/presence-absence data

"hamming"

Negative Hamming distance for categorical data

threshold

Numeric value or function to determine tie presence in weighted networks. If numeric, edges with weights > threshold are considered ties. If a function, it should take the network matrix and return a logical matrix. Default is 0 (any positive weight is a tie). Common values: 0 (default), mean(weights), median(weights), or quantile-based thresholds. For pre-binarized networks, consider using mutate_weights() first.

significance_test

Logical. Whether to perform permutation test. Default TRUE.

n_permutations

Number of permutations for significance testing. Default 1000.

alpha

Significance level for confidence intervals. Default 0.05.

other_stats

Named list of custom functions for additional statistics.

...

Additional arguments passed to custom functions.

Value

Data frame with homophily statistics per network/time period:

net

Network/time identifier

layer

Layer name

attribute

Analyzed attribute name

method

Similarity method used

threshold_value

Threshold used for determining ties (NA for binary networks)

homophily_correlation

Correlation between similarity and tie presence (binary: tie/no tie)

mean_similarity_connected

Mean similarity among connected pairs (weight > threshold)

mean_similarity_unconnected

Mean similarity among unconnected pairs (weight <= threshold or missing)

similarity_difference

Difference between connected and unconnected mean similarities

p_value

Permutation test p-value

ci_lower, ci_upper

Confidence interval bounds

n_connected_pairs

Number of connected pairs

n_unconnected_pairs

Number of unconnected pairs

Details

Similarity Metrics:

For continuous attributes:

  • correlation: Based on absolute difference, good general purpose metric

  • euclidean: Similar to correlation for single attributes

  • manhattan: Less sensitive to outliers than euclidean

  • cosine: Useful for normalized data or when sign matters

For categorical/binary attributes:

  • categorical: Simple matching (1 if same, 0 if different)

  • jaccard: For binary data, emphasizes shared presence over shared absence

  • hamming: Counts positions where values differ (negated for similarity)

Threshold Parameter:

For weighted networks, the threshold parameter determines what edge weights constitute a "connection". You can specify:

  • A numeric value: edges with weight > threshold are ties

  • A function: should take a matrix and return a single numeric threshold

  • Common threshold functions:

    • function(x) mean(x, na.rm = TRUE) - mean weight

    • function(x) median(x, na.rm = TRUE) - median weight

    • function(x) quantile(x, 0.75, na.rm = TRUE) - 75th percentile

For more complex binarization needs (e.g., different thresholds by time period), consider using mutate_weights() to pre-process your network.

Examples

if (FALSE) { # \dontrun{
# Basic homophily analysis with default threshold (> 0)
homophily_default <- homophily(net, attribute = "group")

# Using different similarity metrics for continuous data
homophily_manhattan <- homophily(
    net,
    attribute = "age",
    method = "manhattan" # Less sensitive to outliers
)

# For binary attributes (e.g., gender, membership)
homophily_jaccard <- homophily(
    net,
    attribute = "member",
    method = "jaccard" # Better for binary data than correlation
)

# For categorical attributes
homophily_categorical <- homophily(
    net,
    attribute = "department",
    method = "categorical"
)

# Combining method and threshold
homophily_combined <- homophily(
    net,
    attribute = "score",
    method = "manhattan",
    threshold = function(x) quantile(x, 0.75, na.rm = TRUE)
)
} # }