Netify: Manual Plotting with ggplot2

This vignette provides an overview of how to create customizable plots using ggplot2 while still using netify to prepare the data.

Let’s load the necessary libraries.

library(netify)
library(ggplot2)

We’ll also use the ggnewscale package, which allows more than one legend for the same aesthetic in a single plot (e.g., separate color legends for nodes and edges).

if(!'ggnewscale' %in% rownames(installed.packages())){
  install.packages('ggnewscale', repos='https://cloud.r-project.org') }
library(ggnewscale)

Preparing data

First, let’s create a netlet object from some dyadic data (ICEWS data) using the netify package.

# load icews data
data(icews)

# choose attributes
nvars = c( 'i_polity2', 'i_log_gdp', 'i_log_pop' )
dvars = c( 'matlCoop', 'verbConf', 'matlConf' )

# create a netify object
netlet = netify(
    dyad_data=icews, actor1='i', actor2='j',
    time = 'year',
    symmetric=FALSE, weight='verbCoop',
    mode='unipartite', sum_dyads=FALSE,
    actor_time_uniform=TRUE, actor_pds=NULL,
    diag_to_NA=TRUE, missing_to_zero=TRUE,
    nodal_vars = nvars, 
    dyad_vars = dvars
)

# subset to a few actors
actors_to_keep = c(
  'Australia', 'Brazil',
  'Canada', 'Chile', 'China',
  'Colombia', 'Egypt', 'Ethiopia',
  'France', 'Germany', 'Ghana', 
  'Hungary', 'India', 'Indonesia', 
  'Iran, Islamic Republic Of', 
  'Israel', 'Italy', 'Japan', 'Kenya', 
  "Korea, Democratic People's Republic Of",
  'Korea, Republic Of', 'Nigeria', 'Pakistan', 
  'Qatar', 'Russian Federation', 'Saudi Arabia',
  'South Africa', 'Spain', 'Sudan', 
  'Syrian Arab Republic', 'Thailand', 
  'United Kingdom', 'United States', 
  'Zimbabwe' )
netlet = subset_netlet(
  netlet, 
  what_to_subset = actors_to_keep
)

# print
netlet

## ✔ Hello, you have created network data, yay!
## • Unipartite
## • Asymmetric
## • Weights from `verbCoop`
## • Longitudinal: 13 Periods
## • # Unique Actors: 34
## Network Summary Statistics (averaged across time):
##           dens miss    mean recip trans
## verbCoop 0.887    0 179.484 0.978 0.928
## • Nodal Features: i_polity2, i_log_gdp, i_log_pop
## • Dyad Features: matlCoop, verbConf, matlConf

This is a longitudinal, weighted network with nodal and dyadic attributes. In a few more steps we will show how to highlight these attributes in the plot.

Next, we use the net_plot_data function to build the data frames that ggplot2 needs. net_plot_data extracts node and edge data from a netify object and sets them up according to the specified plotting arguments. It returns a list with several components; the most important one for users is the net_dfs element, which contains two data frames, edge_data and nodal_data, that can be passed to ggplot2.

# create a data frame for plotting
plot_data = net_plot_data(netlet)

# get relevant dfs
net_dfs = plot_data$net_dfs

# check structure of what's here
str(net_dfs)

## List of 2
##  $ edge_data :'data.frame':  12937 obs. of  11 variables:
##   ..$ from    : chr [1:12937] "Australia" "Australia" "Australia" "Australia" ...
##   ..$ to      : chr [1:12937] "Brazil" "Brazil" "Brazil" "Brazil" ...
##   ..$ time    : chr [1:12937] "2002" "2003" "2004" "2005" ...
##   ..$ verbCoop: num [1:12937] 3 3 24 27 54 4 26 7 12 5 ...
##   ..$ matlCoop: num [1:12937] 0 1 0 0 0 0 0 0 1 0 ...
##   ..$ verbConf: num [1:12937] 0 2 0 2 3 0 2 1 0 0 ...
##   ..$ matlConf: num [1:12937] 0 0 2 0 1 0 0 0 0 0 ...
##   ..$ x1      : num [1:12937] -3.187 -0.3978 -0.0496 -0.3853 -0.5135 ...
##   ..$ y1      : num [1:12937] 2.535 -1.3386 0.0651 0.9399 1.4266 ...
##   ..$ x2      : num [1:12937] -2.9515 -0.0649 -0.0429 -0.2949 -0.3298 ...
##   ..$ y2      : num [1:12937] 2.315 -1.441 -0.148 1.045 1.485 ...
##  $ nodal_data:'data.frame':  442 obs. of  10 variables:
##   ..$ name      : chr [1:442] "Australia" "Australia" "Australia" "Australia" ...
##   ..$ time      : chr [1:442] "2002" "2003" "2004" "2005" ...
##   ..$ i_polity2 : int [1:442] 10 10 10 10 10 10 10 10 10 10 ...
##   ..$ i_log_gdp : num [1:442] 27.6 27.6 27.6 27.7 27.7 ...
##   ..$ i_log_pop : num [1:442] 16.8 16.8 16.8 16.8 16.8 ...
##   ..$ x         : num [1:442] -3.187 -0.3978 -0.0496 -0.3853 -0.5135 ...
##   ..$ y         : num [1:442] 2.535 -1.3386 0.0651 0.9399 1.4266 ...
##   ..$ name_text : chr [1:442] "Australia" "Australia" "Australia" "Australia" ...
##   ..$ name_label: chr [1:442] "Australia" "Australia" "Australia" "Australia" ...
##   ..$ id        : chr [1:442] "Australia_2002" "Australia_2003" "Australia_2004" "Australia_2005" ...

# check the first few rows of the edge data
head(net_dfs$edge_data)

##        from     to time verbCoop matlCoop verbConf matlConf          x1
## 1 Australia Brazil 2002        3        0        0        0 -3.18695808
## 2 Australia Brazil 2003        3        1        2        0 -0.39782446
## 3 Australia Brazil 2004       24        0        0        2 -0.04957632
## 4 Australia Brazil 2005       27        0        2        0 -0.38526735
## 5 Australia Brazil 2006       54        0        3        1 -0.51353531
## 6 Australia Brazil 2007        4        0        0        0 -0.69085788
##            y1          x2         y2
## 1  2.53500179 -2.95152256  2.3151148
## 2 -1.33861627 -0.06486153 -1.4411348
## 3  0.06513246 -0.04293486 -0.1479501
## 4  0.93989844 -0.29488038  1.0449019
## 5  1.42660061 -0.32983771  1.4847187
## 6  1.79965809 -0.86547402  1.7542202

# check the first few rows of the nodal data
head(net_dfs$nodal_data)

##        name time i_polity2 i_log_gdp i_log_pop           x           y
## 1 Australia 2002        10  27.55492  16.78568 -3.18695808  2.53500179
## 2 Australia 2003        10  27.58556  16.79718 -0.39782446 -1.33861627
## 3 Australia 2004        10  27.62686  16.80787 -0.04957632  0.06513246
## 4 Australia 2005        10  27.65791  16.82005 -0.38526735  0.93989844
## 5 Australia 2006        10  27.68495  16.83354 -0.51353531  1.42660061
## 6 Australia 2007        10  27.72203  16.85179 -0.69085788  1.79965809
##   name_text name_label             id
## 1 Australia  Australia Australia_2002
## 2 Australia  Australia Australia_2003
## 3 Australia  Australia Australia_2004
## 4 Australia  Australia Australia_2005
## 5 Australia  Australia Australia_2006
## 6 Australia  Australia Australia_2007

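The x and y columns in nodal_data give each node’s position in each time period. In edge_data, x1 and y1 give the point where an edge starts (in the output above they match the position of the from actor), and x2 and y2 give the point where it ends. These are the coordinates used to draw the network.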
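
To make the relationship between the two data frames concrete, here is a minimal base R sketch that rebuilds edge coordinates from node coordinates. The toy_nodes and toy_edges objects are hypothetical and not produced by netify, and the sketch assumes that edge endpoints coincide exactly with node positions, which may not hold for x2 and y2 in the real output.

```r
# hypothetical node positions for three actors in two periods
toy_nodes <- data.frame(
  name = c("A", "B", "C", "A", "B", "C"),
  time = rep(c("2002", "2003"), each = 3),
  x = c(0, 1, 0, 2, 3, 2),
  y = c(0, 0, 1, 2, 2, 3)
)

# hypothetical directed edges
toy_edges <- data.frame(
  from = c("A", "B", "A"),
  to   = c("B", "C", "C"),
  time = c("2002", "2002", "2003")
)

# match sender and receiver to node rows on a name_time key
node_key <- paste(toy_nodes$name, toy_nodes$time, sep = "_")
from_idx <- match(paste(toy_edges$from, toy_edges$time, sep = "_"), node_key)
to_idx   <- match(paste(toy_edges$to, toy_edges$time, sep = "_"), node_key)

# assumption: segments start and end exactly at node positions
toy_edges$x1 <- toy_nodes$x[from_idx]
toy_edges$y1 <- toy_nodes$y[from_idx]
toy_edges$x2 <- toy_nodes$x[to_idx]
toy_edges$y2 <- toy_nodes$y[to_idx]

toy_edges
```

The name_time key mirrors the id column shown in nodal_data above.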

Creating a plot

Now that we have the data, we can create a plot using ggplot2. We’ll use geom_segment to draw the edges and geom_point to draw the nodes (geom_label, geom_text, or their ggrepel package equivalents can be used for the nodes instead).

ggplot() + 
    geom_segment(
        data = net_dfs$edge_data,
        aes(
            x=x1, 
            y=y1, 
            xend=x2, 
            yend=y2
        ),
        color='lightgrey',
        alpha=.2
    ) +
    geom_point(
        data = net_dfs$nodal_data,
        aes(
            x=x, 
            y=y,
            size=i_log_pop, 
            color=i_polity2
        )
    ) +
    labs(
        color='Polity', 
        size='Log(Pop.)'
    ) +
    scale_color_gradient(low='#a6bddb', high='#014636') +
    facet_wrap(~time, scales='free') +
    theme_netify()
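
As a variation on the plot above, nodes can be drawn as text labels rather than points. The sketch below uses small hypothetical data frames (toy_nodes and toy_edges, not netify output) so that it runs on its own. With real data, the name_text column in nodal_data would presumably play the role of the name column here, though that is an assumption.

```r
library(ggplot2)

# hypothetical node and edge data in the same shape as net_dfs
toy_nodes <- data.frame(
  name = c("A", "B", "C"),
  x = c(0, 1, 0.5),
  y = c(0, 0, 1)
)
toy_edges <- data.frame(
  x1 = c(0, 1), y1 = c(0, 0),
  x2 = c(1, 0.5), y2 = c(0, 1)
)

# edges as segments, nodes as text labels
p <- ggplot() +
  geom_segment(
    data = toy_edges,
    aes(x = x1, y = y1, xend = x2, yend = y2),
    color = "lightgrey"
  ) +
  geom_text(
    data = toy_nodes,
    aes(x = x, y = y, label = name)
  ) +
  theme_void()

p
```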

Changing the layout

By default, node positions are drawn from the layout_nicely algorithm in the igraph package. Users can specify other layouts; for example, to use the mds algorithm instead:

# create a df using mds instead 
plot_data_mds = net_plot_data(netlet, 
    list(
        layout='mds'
        )
    )

# see new x-y coordinates
lapply(plot_data_mds$net_dfs, head)

## $edge_data
##        from     to time verbCoop matlCoop verbConf matlConf          x1
## 1 Australia Brazil 2002        3        0        0        0 -0.35485450
## 2 Australia Brazil 2003        3        1        2        0  0.20857397
## 3 Australia Brazil 2004       24        0        0        2 -0.04224981
## 4 Australia Brazil 2005       27        0        2        0 -0.06014447
## 5 Australia Brazil 2006       54        0        3        1  0.15073774
## 6 Australia Brazil 2007        4        0        0        0 -0.03161665
##            y1         x2         y2
## 1 -0.16803048 -0.7281113  0.3619659
## 2 -0.12705283  0.2552975 -0.1025320
## 3  0.05123628  0.2101186  0.9327511
## 4 -0.02708750  0.4552562  0.1927606
## 5  0.46114648  0.1278155  0.3895673
## 6 -0.03885021 -0.3606698  0.8028908
## 
## $nodal_data
##        name time i_polity2 i_log_gdp i_log_pop           x           y
## 1 Australia 2002        10  27.55492  16.78568 -0.35485450 -0.16803048
## 2 Australia 2003        10  27.58556  16.79718  0.20857397 -0.12705283
## 3 Australia 2004        10  27.62686  16.80787 -0.04224981  0.05123628
## 4 Australia 2005        10  27.65791  16.82005 -0.06014447 -0.02708750
## 5 Australia 2006        10  27.68495  16.83354  0.15073774  0.46114648
## 6 Australia 2007        10  27.72203  16.85179 -0.03161665 -0.03885021
##   name_text name_label             id
## 1 Australia  Australia Australia_2002
## 2 Australia  Australia Australia_2003
## 3 Australia  Australia Australia_2004
## 4 Australia  Australia Australia_2005
## 5 Australia  Australia Australia_2006
## 6 Australia  Australia Australia_2007
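
For intuition about what an MDS layout does, the following base R sketch computes shortest-path distances on a small hypothetical graph and passes them to cmdscale. It only illustrates the general idea of multidimensional scaling on graph distances; it is not the code that netify or igraph runs, and the toy graph is made up.

```r
# hypothetical undirected 4-cycle: a-b-c-d-a
nodes <- c("a", "b", "c", "d")
adj <- matrix(0, 4, 4, dimnames = list(nodes, nodes))
adj[cbind(1:4, c(2, 3, 4, 1))] <- 1
adj <- adj + t(adj)

# shortest-path distances via Floyd-Warshall
d <- ifelse(adj > 0, 1, Inf)
diag(d) <- 0
for (k in seq_len(nrow(d))) {
  d[] <- pmin(d, outer(d[, k], d[k, ], `+`))
}

# classical MDS places nodes so that plotted distances
# approximate the graph distances
coords <- cmdscale(d, k = 2)
colnames(coords) <- c("x", "y")
coords
```

Nodes that are close in the graph end up close in the plot, which is why the coordinates differ from those of the default layout.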

Add Edge Information

So far, we have used node size and color to convey information about nodal attributes (log population and polity score). Now, let’s add edge information to the plot using the matlConf dyadic attribute. Suppose we want to highlight edges of verbal cooperation for dyads whose material conflict in a given year is above that year’s average across the edges in the data. First, let’s create the variable in the edge data.

if(!'dplyr' %in% rownames(installed.packages())){
  install.packages('dplyr', repos='https://cloud.r-project.org') }
library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

# create high_matlConf variable
net_dfs$edge_data = net_dfs$edge_data |>
    group_by(time) |>
    mutate(
        high_matlConf = matlConf > mean(matlConf, na.rm=TRUE)
    ) |>
    ungroup() |>
    as.data.frame()

# check
head(net_dfs$edge_data)

##        from     to time verbCoop matlCoop verbConf matlConf          x1
## 1 Australia Brazil 2002        3        0        0        0 -3.18695808
## 2 Australia Brazil 2003        3        1        2        0 -0.39782446
## 3 Australia Brazil 2004       24        0        0        2 -0.04957632
## 4 Australia Brazil 2005       27        0        2        0 -0.38526735
## 5 Australia Brazil 2006       54        0        3        1 -0.51353531
## 6 Australia Brazil 2007        4        0        0        0 -0.69085788
##            y1          x2         y2 high_matlConf
## 1  2.53500179 -2.95152256  2.3151148         FALSE
## 2 -1.33861627 -0.06486153 -1.4411348         FALSE
## 3  0.06513246 -0.04293486 -0.1479501         FALSE
## 4  0.93989844 -0.29488038  1.0449019         FALSE
## 5  1.42660061 -0.32983771  1.4847187         FALSE
## 6  1.79965809 -0.86547402  1.7542202         FALSE
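
The same per-period comparison can be written without dplyr. Here is a minimal base R sketch using ave() on a hypothetical toy_edges data frame (the values are made up), which also shows how missing values propagate.

```r
# hypothetical edge data with one missing value
toy_edges <- data.frame(
  time = c("2002", "2002", "2002", "2003", "2003"),
  matlConf = c(0, 1, 5, 2, NA)
)

# mean of matlConf within each period, repeated for every row of that period
period_mean <- ave(
  toy_edges$matlConf, toy_edges$time,
  FUN = function(v) mean(v, na.rm = TRUE)
)

# TRUE when an edge is strictly above its period's mean
toy_edges$high_matlConf <- toy_edges$matlConf > period_mean
toy_edges
```

An edge with a missing matlConf gets NA rather than FALSE, so it is worth deciding how such edges should be colored before plotting.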

Now that the new variable is in the data frame, we can color the edges by it. Note that we now need a color aesthetic for both points and segments, but ggplot2 supports only one scale (and legend) per aesthetic by default. We can get around this with the new_scale_color function from the ggnewscale package.

# color line segments by this new variable
ggplot() + 
    geom_segment(
        data = net_dfs$edge_data,
        aes(
            x=x1, 
            y=y1, 
            xend=x2, 
            yend=y2,
            color=high_matlConf
        ),
        alpha=.2
    ) +
    scale_color_manual(
      name='', 
      values=c('grey', 'red'),
      labels=c('Below Avg. Matl. Conf', 'Above Avg.')
      ) +
    new_scale_color() +
    geom_point(
        data = net_dfs$nodal_data,
        aes(
            x=x, 
            y=y,
            size=i_log_pop, 
            color=i_polity2
        )
    ) +
    scale_color_gradient(
        name='Polity',
        low='#a6bddb', high='#014636'
    ) +
    labs(
        size='Log(Pop.)'
    ) +
    facet_wrap(~time, scales='free') +
    theme_netify() +
    theme(
        legend.position='right'
    )
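
The two-scale pattern can be tried on toy data before applying it to a full network. In this sketch the data frames and the flag and score columns are hypothetical. The manual scale uses a named values vector so that the colors are tied to FALSE and TRUE explicitly rather than to their order; this is one possible design choice, not the vignette’s own.

```r
library(ggplot2)
library(ggnewscale)

# hypothetical node and edge data
toy_nodes <- data.frame(
  x = c(0, 1, 0.5), y = c(0, 0, 1),
  score = c(-5, 0, 10)
)
toy_edges <- data.frame(
  x1 = c(0, 1), y1 = c(0, 0),
  x2 = c(1, 0.5), y2 = c(0, 1),
  flag = c(FALSE, TRUE)
)

p <- ggplot() +
  # first color scale: discrete, for edges
  geom_segment(
    data = toy_edges,
    aes(x = x1, y = y1, xend = x2, yend = y2, color = flag)
  ) +
  scale_color_manual(
    name = "Edge flag",
    values = c(`FALSE` = "grey", `TRUE` = "red")
  ) +
  # start a fresh color scale for the layers added after this point
  new_scale_color() +
  # second color scale: continuous, for nodes
  geom_point(
    data = toy_nodes,
    aes(x = x, y = y, color = score),
    size = 4
  ) +
  scale_color_gradient(name = "Score", low = "#a6bddb", high = "#014636")

p
```

Each scale_color_* call applies to the layers on its own side of new_scale_color(), so the order of the layers and scales matters.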

References

Boschee, Elizabeth; Lautenschlager, Jennifer; O’Brien, Sean; Shellman, Steve; Starz, James; Ward, Michael, 2015, “ICEWS Coded Event Data”, doi:10.7910/DVN/28075, Harvard Dataverse.
Campitelli, E. (2020). ggnewscale: Multiple Fill and Colour Scales in ‘ggplot2’. R package version 0.4.3. https://CRAN.R-project.org/package=ggnewscale
Wickham, H. (2016). ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York.

Cassy Dorff and Shahryar Minhas

2024-11-07
