Network Analysis / Visualization

Document Overview & Resources

This document provides a general overview of Netlytic’s network analysis features. 

Introduction

In addition to its text analysis functions, Netlytic also provides users with network analysis capabilities. At its most basic, network analysis involves building networks from members (‘network actors’) connected by some common form of interaction (‘ties’). When building networks from interaction data, however, there are many parameters and thresholds to choose from. For example, one choice that is likely to influence network formation is how to discover ties between individuals. Netlytic approaches this task by building two types of social networks: (1) Name network and (2) Chain (reply-to) network.

Name Networks

Figure 9, Name network pane, options for adjusting network parameters.

A name network is a social network built from mining personal names in the messages. To discover ties in Name networks, a user can choose from two primary options: “connect a sender to all names found in his/her messages” and/or “connect people whose names co-occur in the same messages”.

Both of these options, along with some additional network parameters, can be adjusted by clicking on the “See more processing options” field at the bottom of the ‘Name Network’ pane (see Figure 9). Then, once users have decided how best to build the network, they need only click the “Analyze” button for Netlytic to construct it automatically.

Further, by clicking on the number of users listed next to “# of names found” in the Name Network pane, users can review all names found by Netlytic in the network and add or delete names as necessary.
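
The two tie-discovery options described above can be sketched in a few lines of Python. This is only an illustration, not Netlytic's actual implementation: the function name_network_ties, its parameters, and the naive substring matching of names are all assumptions made for the example.

```python
from itertools import combinations


def name_network_ties(messages, known_names, sender_to_names=True, co_occurrence=True):
    """Count ties in a name network (hypothetical helper, not Netlytic's code).

    messages: list of (sender, text) pairs.
    known_names: names to look for in each message.
    Returns a dict mapping (source, target) pairs to tie counts.
    """
    ties = {}
    for sender, text in messages:
        lowered = text.lower()
        # Names found in this message, excluding the sender's own name.
        # Substring matching is a simplification; real name mining is harder.
        found = sorted(n for n in known_names if n.lower() in lowered and n != sender)
        if sender_to_names:
            # Option 1: connect the sender to every name found in the message.
            for name in found:
                ties[(sender, name)] = ties.get((sender, name), 0) + 1
        if co_occurrence:
            # Option 2: connect people whose names co-occur in the same message.
            for a, b in combinations(found, 2):
                ties[(a, b)] = ties.get((a, b), 0) + 1
    return ties


messages = [
    ("alice", "Thanks Bob and Carol for the notes"),
    ("bob", "Carol, can you share the slides?"),
]
ties = name_network_ties(messages, {"alice", "bob", "carol"})
```

In this sketch, sender ties and co-occurrence ties are added to the same counter, so a pair such as bob and carol can accumulate ties from both options.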

Chain Networks

Figure 10, Processing options for chain networks.

A chain network (also known as a “who replies to whom” network) is a social network based on participants’ posting behaviour. To build chain networks, Netlytic provides a range of options for tie discovery, similarly accessed by clicking on the “See more processing options” field at the bottom of the “Chain Network” pane.

Processing options here include connecting network actors to all senders in a reference chain, or only connecting network actors to the first and/or last senders to have posted in the chain. Users can further decide how each of these ties will be ‘weighted’ – some ties can be valued or counted differently than others – in the analysis (See Figure 10).
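
As a rough sketch of these options, the Python below builds weighted ties from a single reference chain. The function chain_ties, its mode values ("all", "first", "last"), and the single weight parameter are hypothetical names chosen for illustration; Netlytic's actual options and weighting scheme may differ.

```python
def chain_ties(chain, mode="last", weight=1.0):
    """Build weighted reply ties from one reference chain (illustrative only).

    chain: sender names in posting order within one thread.
    mode: "all" connects each poster to every earlier sender,
          "first" only to the first sender, "last" only to the previous sender.
    """
    ties = {}
    for i in range(1, len(chain)):
        replier = chain[i]
        earlier = chain[:i]
        if mode == "all":
            targets = earlier
        elif mode == "first":
            targets = earlier[:1]
        elif mode == "last":
            targets = earlier[-1:]
        else:
            raise ValueError("mode must be 'all', 'first' or 'last'")
        # Use a set so a sender who appears twice earlier is linked only once per post.
        for target in set(targets):
            if target != replier:  # skip self-replies
                key = (replier, target)
                ties[key] = ties.get(key, 0.0) + weight
    return ties


chain = ["ann", "ben", "cat", "ben"]
last_only = chain_ties(chain, mode="last")
everyone = chain_ties(chain, mode="all", weight=0.5)
```

Calling the function several times with different modes and weights, and summing the results, is one way to value some ties more than others.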

Network Visualization

Next, regardless of whether users choose to focus on name or chain networks (or to analyse both), they can explore the dataset interactively through Netlytic’s visualization capabilities. Clicking the “Visualize” button in the “Chain Network” or “Name Network” pane opens the new, touch-enabled HTML5 Network Visualizer (see Figure 11). The HTML5 visualizer has customizable layout, node size, and cluster visibility features. Users can also label and export an image of the network.

Figure 11, Name network visualization.

Network Features

The Netlytic Visualization Screen has three main areas of interest. On the left is the screen Menu Bar, on the top right is the Show/Hide Menu function, and on the bottom right is the Note function.

Figure 12a, Network visualization, featured areas of interest

Located on the left side of the Netlytic Visualization screen, the Menu Bar houses a variety of different visualization options, which will be discussed in more detail below: Search, Visibility, Layout, Node Size, Colors, Auto Clusters, and Share.

Figure 12b, feature navigation panel

Search

This feature allows the user to search for specific nodes or a group of nodes by separating their names by commas. This can be used to quickly identify individuals of interest in the network.

Figure 13a, search feature

Visibility

In the Visibility section of the Netlytic visualization menu, users have the option of de-selecting Node Labels or Edges, which are selected by default.

A user might de-select Node Labels in order to export an image of the network without the extra “noise” of text, for example, or de-select Edges in order to get a clearer view of the nodes – particularly if they were located near the center of a dense network and partially obstructed by the numerous edges.

Figure 13b, visibility feature

Layout

The network’s layout is an important feature because it enables the user to identify patterns in the network, such as clusters of individuals. Once examined, these patterns can inform the network analysis (e.g. Who are the primary individuals that make up a given cluster? What groups them together, or alternatively, what could be the reason for other individuals’ exclusion from the cluster?).

Figure 13c, layout feature

Netlytic provides three layout options: Fruchterman-Reingold, DrL and LGL. Their various qualities are listed in the table below.

Layout: Fruchterman-Reingold

  • A popular force-based algorithm

  • Good for networks with fewer than 1,000 nodes

Reference:

  • Fruchterman, T.M.J. and Reingold, E.M. (1991). Graph Drawing by Force-directed Placement. Software – Practice and Experience, 21(11):1129-1164.

Layout: DrL

  • Another force-directed graph layout

  • Effective for visualizing large networks

  • Long edges are cut to highlight clusters

Reference:

  • Martin, S., Brown, W.M., Klavans, R., and Boyack, K.W. (2008). DrL: Distributed Recursive (Graph) Layout. SAND Reports, 2936:1-10.

Layout: LGL

  • Very effective for visualizing large networks

The speed at which the layout is created is determined by several things: the layout’s algorithm, the size of the network file, the time it takes for the file to go from our server to a user’s computer, and finally, the speed and power of the user’s computer and its ability to generate the network on the browser.

Node Size

Figure 13d, Node Size Feature

Why is the Node size important?

Changing the node size by different calculations enables the user to understand the interactions between individuals in a network in a number of different ways. There are currently 3 different rankings available with this feature, which determine node size by: indegree, outdegree and total degree. All of these options mean something different, and can help the user discern patterns in the network. For a full description of what each option means and examples of how they might show patterns in the network, please see below.

Indegree

With this ranking, the nodes with a higher centrality based on indegree would become larger. Indegree centrality is determined by the number of ties directed to or received by a node. It should be noted that this kind of measurement is only useful with directed data, when the ties are received or directed out (such as being mentioned in a Tweet, or mentioning someone in a Tweet) (Hanneman & Riddle, 2005).

Example

A high-profile individual like Hillary Clinton, who is mentioned in thousands of tweets daily, would have a high indegree. A bot on Twitter would likely have a very low indegree, as it sends out a great many tweets but rarely receives any.

Why is Indegree important?

Indegree is significant because it demonstrates the prominence or popularity of an individual, since they are the target of communication or interest (Hanneman & Riddle, 2005).

Outdegree

With this ranking, the nodes with a higher centrality based on outdegree would become larger. Outdegree centrality is the opposite of indegree centrality, as it is determined by the number of ties directed or sent from a node to others. This form of measurement is also only useful with directed data, as in undirected data there would simply be a set number of ties per node, with no direction coming or going to each node (Hanneman & Riddle, 2005).

Example

Using the same example as earlier, Hillary Clinton would likely have a low outdegree centrality, as she tweets very rarely; outdegree counts only the ties she sends, not the number of people who follow or mention her. The Twitter bot, by contrast, would likely have a high outdegree, since it would frequently send out tweets that mention other Twitter users.

Why is Outdegree important?

Outdegree is significant because it can help to identify influential individuals in the network, or individuals who are particularly active communicators (Hanneman & Riddle, 2005). This could be used to compare how active various individuals are (e.g. of three different Twitter news handles, which ones rank higher in terms of their output?).

Total Degree

Total degree ranking combines the indegree and outdegree counts to give a node’s total degree. In an undirected network (such as a co-authorship network), total degree is simply the number of connections a node has.

Example

If Bob has an indegree of 3 in a Twitter network (e.g. he was mentioned three times in other people’s tweets) and an outdegree of 7 (e.g. he sent tweets that mentioned 7 individuals in the network), then his total degree would be 10. Alternatively, if Bob were in a co-authorship network and had collaborated with 5 other authors, he would have a total degree of 5.

Why is Total Degree important?

This node size setting is useful for determining the key players in a network at a glance. Further inspection (either using the indegree or outdegree node size rankings in a directed network, or examining a node’s specific properties) would indicate whether these key players are individuals with a high prominence or who are highly vocal in their network.
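
The three rankings can be computed from a list of directed ties, as in the following minimal Python sketch. The function degree_counts and the user names are made up for the example; the edge list reproduces Bob's case from the Total Degree section (three incoming mentions, seven outgoing mentions).

```python
from collections import Counter


def degree_counts(edges):
    """Return indegree, outdegree and total degree for every node.

    edges: list of (source, target) pairs, one per directed tie.
    """
    indegree = Counter()
    outdegree = Counter()
    for source, target in edges:
        outdegree[source] += 1  # tie sent by the source
        indegree[target] += 1   # tie received by the target
    nodes = set(indegree) | set(outdegree)
    return {
        node: {
            "indegree": indegree[node],
            "outdegree": outdegree[node],
            "total": indegree[node] + outdegree[node],
        }
        for node in nodes
    }


# Bob is mentioned by three users and mentions seven others.
edges = [(f"user{i}", "bob") for i in range(3)]
edges += [("bob", f"friend{i}") for i in range(7)]
bob = degree_counts(edges)["bob"]
# bob == {"indegree": 3, "outdegree": 7, "total": 10}
```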

Colours

Figure 13e, Colour Feature

This dropdown menu allows users to select the colors of the labels and background of the visualization. There are a number of choices available, with the first color indicating node label, and the second indicating the main background color. They include: white & dark gray, white & black, black & white, and custom colors.

The default is white & dark gray. White & black is also a popular setting, as its higher contrast lets the user see the nodes and edges more clearly, which is helpful in the analysis stage. With the custom color setting, users can also personalize the colors of the network according to their preferences, as demonstrated below, where the custom setting was used to create a network with black labels and a white background.

White & Gray

White & Black

Custom: With Black & White as example

Auto Clusters

There is also the option to select/de-select various clusters (identified automatically). This can be helpful in identifying the various clusters within a network, or for examining a particular cluster more closely.

Clusters are identified automatically using a community detection algorithm called FastGreedy (implemented in igraph for R).

Here is an example from a network of @readingcampaign on Twitter. In this first image, all the auto clusters are selected. The largest pink cluster (Cluster 1) represents @readingcampaign and their followers.

In this second image, you can see that Cluster 1 has been de-selected. Now it is easier to examine other clusters in this network.

Network Properties

Netlytic measures five network properties, which describe network characteristics such as how individuals interact with each other, how information flows, and whether there are distinct voices and groups within the network.

Diameter calculates the longest distance between two network participants. This measure indicates a network’s size, by calculating the number of nodes it takes to get from one side to the other.
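
One common way to compute diameter is to run a breadth-first search from every node and keep the longest shortest path found. The Python sketch below does this under two assumptions that may not match Netlytic's calculation: ties are treated as undirected, and distance is counted in steps (ties) along the path rather than in nodes. The function name diameter is chosen for the example.

```python
from collections import deque


def diameter(edges):
    """Longest shortest path, in steps, between any two connected nodes."""
    # Build an undirected adjacency map from the tie list.
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    longest = 0
    for start in adjacency:
        # Breadth-first search gives shortest distances from `start`.
        distance = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbour in adjacency[node]:
                if neighbour not in distance:
                    distance[neighbour] = distance[node] + 1
                    queue.append(neighbour)
        longest = max(longest, max(distance.values()))
    return longest


chain = [("a", "b"), ("b", "c"), ("c", "d")]
steps = diameter(chain)  # a to d takes 3 steps
```

Pairs of nodes that cannot reach each other are ignored in this sketch, so a disconnected network reports the diameter of its widest component.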

Density is the proportion of existing ties to the total number of possible ties in a network. In other words, it is calculated by dividing the number of existing ties (connections) by the number of possible ties. This measure helps to illustrate how close participants are within a network. The density measure is complementary to diameter, as both assess the speed of information flow. The closer this measurement is to 1, the more close-knit the community/conversation, which suggests participants are talking with many others. On the other hand, a value closer to 0 suggests that almost no one is connected to others in the network.
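
The calculation can be written out as a short Python function. The number of possible ties used here is the standard convention, n(n-1) for a directed network and n(n-1)/2 for an undirected one; whether Netlytic counts possible ties in exactly this way is an assumption, and the function name density is chosen for the example.

```python
def density(node_count, tie_count, directed=True):
    """Existing ties divided by possible ties (self-ties excluded)."""
    if node_count < 2:
        return 0.0  # no ties are possible with fewer than two nodes
    possible = node_count * (node_count - 1)
    if not directed:
        possible //= 2  # each undirected pair is counted once
    return tie_count / possible


sparse = density(4, 3, directed=True)    # 3 of 12 possible ties -> 0.25
full = density(4, 6, directed=False)     # 6 of 6 possible ties -> 1.0
```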

Reciprocity is the proportion of ties that show two-way communication (also called reciprocal ties) in relation to the total number of existing ties in the network (not all possible ties). A higher value indicates that many participants have two-way conversations, whereas a low reciprocity value suggests that many conversations are one-sided, with little back and forth.
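
Following that definition, a minimal Python sketch counts a directed tie as reciprocal when the reverse tie also exists. The function name reciprocity is chosen for the example, and ignoring repeated ties and self-ties is an assumption rather than documented Netlytic behaviour.

```python
def reciprocity(edges):
    """Share of directed ties whose reverse tie also exists."""
    # Collapse repeated ties and drop self-ties before counting.
    ties = {(a, b) for a, b in edges if a != b}
    if not ties:
        return 0.0
    reciprocal = sum(1 for a, b in ties if (b, a) in ties)
    return reciprocal / len(ties)


edges = [("a", "b"), ("b", "a"), ("a", "c"), ("c", "d")]
value = reciprocity(edges)  # 2 of the 4 ties are reciprocated -> 0.5
```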

Centralization measures the average degree centrality of all nodes within a network. When a network has a high centralization value closer to 1, it suggests there are a few central participants who dominate the flow of information in the network. Networks with a low measurement of centralization closer to 0 are considered to be decentralized where information flows more freely between many participants.

Modularity. To understand modularity, we first need to understand the concept of clusters in the network visualization. A cluster is a group of densely connected nodes that are more likely to communicate with each other than with nodes outside of the cluster. Modularity helps to determine whether the clusters found represent distinct communities in the network. Higher values of modularity indicate clear divisions between communities as represented by clusters in Netlytic. Low values of modularity, usually less than 0.5, suggest that the clusters found by Netlytic will overlap more; the network is more likely to consist of a core group of nodes.
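
To make the measure concrete, the Python sketch below scores a given assignment of nodes to clusters using the standard Newman modularity formula for an undirected, unweighted network: for each cluster, the share of ties that fall inside it minus the share expected by chance. It does not detect clusters itself, and it is not necessarily the exact calculation Netlytic performs; the function name modularity and the example network are assumptions.

```python
def modularity(edges, cluster_of):
    """Newman modularity of a partition of an undirected, unweighted network.

    edges: list of (a, b) pairs, one per tie.
    cluster_of: dict mapping each node to its cluster label.
    """
    m = len(edges)
    if m == 0:
        return 0.0
    internal = {}    # ties with both ends inside the cluster
    degree_sum = {}  # sum of node degrees per cluster
    for a, b in edges:
        ca, cb = cluster_of[a], cluster_of[b]
        degree_sum[ca] = degree_sum.get(ca, 0) + 1
        degree_sum[cb] = degree_sum.get(cb, 0) + 1
        if ca == cb:
            internal[ca] = internal.get(ca, 0) + 1
    # Observed share of internal ties minus the share expected at random.
    return sum(
        internal.get(c, 0) / m - (d / (2 * m)) ** 2
        for c, d in degree_sum.items()
    )


# Two triangles joined by a single tie between c and d.
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]
clusters = {"a": 1, "b": 1, "c": 1, "d": 2, "e": 2, "f": 2}
score = modularity(edges, clusters)  # 5/14, roughly 0.36
```

Placing every node in a single cluster gives a score of 0, which is why higher values point to clearer divisions between clusters.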

Share

The Share feature allows the user to export images of their network. These can be kept as part of the visualization of the dataset (and will reappear upon reopening the visualization) or can be saved as an image on the user’s computer for documentation purposes or shared online.

This feature can be quite valuable because the network visualization does not let the user examine how the network changes over time, so capturing images of the network at different points can help show how it grows and changes.

Network Participants

By clicking on any node within the network, you can view how the individual relates to the larger network. In the side bar, the following is shown:

  • visibility options (to show the node or not)

  • node color options

  • centrality values that are used in the main menu to determine node size (indegree, outdegree, and total degree)

  • the node’s connections to other participants in the network

To exit this view mode and return to the main menu, simply click “Go Back” at the top of the menu bar.

Accessing Messages

Depending on the type of network, a node’s connection with others might reflect comments, tweets, etc. in which they mentioned or were mentioned by others. To access these individual tweets, comments or messages, one needs to simply select the connection in question (e.g. agamasimowska displayed as the first connection in the list).

Annotating the Network

To make use of the Notes feature, navigate to the network visualization and click the yellow box containing a plus sign in the bottom right corner of the visualization screen.

This feature allows the user to post notes on various aspects of the network to themselves for use in the analysis stage (See example below).

Once Notes are added to the network, a new box appears next to the Notes box in the bottom right corner: “1: Sticky notes (zoom 0%)”. As users add notes at various zoom levels of the network (e.g. 0%, 25%, 50%, etc.), new boxes will be added to allow for a faster navigation between notes as displayed below.

Please note: Once you exit the network visualization, notes will be cleared. Saving the network image is the best way to capture annotations.

References