Document Overview & Resources
This document provides a general overview of Netlytic’s network analysis features.
- Netlytic Tutorials (Videos and Instructional Guides)
- 1. System Overview
- 2. Text Analysis
- 3. Category (Topical) Analysis
- 4. Network / Visualization Analysis
In addition to its text analysis functions, Netlytic also provides users with network analysis capabilities. At its most basic, network analysis involves building networks from members (‘network actors’) connected together based on some common form of interaction (‘ties’). When building networks from interaction data, however, there are many parameters and thresholds to choose from. For example, one choice that is likely to influence network formation is how to discover ties between individuals. Netlytic approaches this task by building two types of social networks: (1) Name network and (2) Chain (reply-to) network.
Figure 9, Name network pane, options for adjusting network parameters.
A name network is a social network built from mining personal names in the messages. To discover ties in Name networks, a user can choose from two primary options: “connect a sender to all names found in his/her messages” and/or “connect people whose names co-occur in the same messages”.
Both of these options, along with some additional network parameters, can be adjusted by clicking on the “See more processing options” field at the bottom of the ‘Name Network’ pane (See Figure 9). Then, once the user has decided how best to build his/her network, s/he need only click the “Analyze” button for Netlytic to automatically construct the network.
Further, by clicking on the number of users listed next to “# of names found” in the Name Network pane, users can review all names found by Netlytic in the network and add or delete names as necessary.
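The two tie-discovery options for name networks can be illustrated with a minimal sketch. It assumes the names have already been extracted from each message (Netlytic's own name-mining step is not reproduced here), and the function name, parameter names and data layout are hypothetical, chosen only for illustration.

```python
from collections import Counter
from itertools import combinations


def name_network_ties(messages, sender_to_names=True, co_occurrence=True):
    """Count ties from (sender, names_found_in_message) pairs.

    Hypothetical helper: name extraction is assumed to have happened
    already, so each message simply carries a list of names.
    """
    ties = Counter()
    for sender, names in messages:
        # De-duplicate names and ignore the sender's own name.
        unique_names = sorted(set(names) - {sender})
        if sender_to_names:
            # Option 1: connect the sender to every name in the message.
            for name in unique_names:
                ties[(sender, name)] += 1
        if co_occurrence:
            # Option 2: connect names that co-occur in the same message
            # (stored as an alphabetically ordered pair).
            for a, b in combinations(unique_names, 2):
                ties[(a, b)] += 1
    return ties


messages = [
    ("alice", ["bob", "carol"]),
    ("bob", ["alice"]),
]
ties = name_network_ties(messages)
sender_only = name_network_ties(messages, co_occurrence=False)
```

With both options enabled, the first message yields ties from alice to bob and carol plus a co-occurrence tie between bob and carol; with co-occurrence switched off, only the sender-to-name ties remain.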
Figure 10, Processing options for chain networks.
A chain network (also known as a “who replies to whom” network) is a social network based on participants’ posting behaviour. To build chain networks, Netlytic provides a range of options for tie discovery, similarly accessed by clicking on the “See more processing options” field at the bottom of the “Chain Network” pane.
Processing options here include connecting network actors to all senders in a reference chain, or only connecting network actors to the first and/or last senders to have posted in the chain. Users can further decide how each of these ties will be ‘weighted’ – some ties can be valued or counted differently than others – in the analysis (See Figure 10).
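As a rough sketch of these chain-network options, the snippet below connects a poster to the senders in a reference chain. It assumes the chain is available as an ordered list of earlier senders (oldest first); the mode labels and the per-position weights are hypothetical and do not correspond to Netlytic's actual setting names or values.

```python
from collections import defaultdict


def chain_ties(poster, chain_senders, mode="last", weights=None):
    """Return weighted ties from a poster to senders in a reference chain.

    chain_senders: senders of the earlier messages, oldest first.
    mode: "all", "first", "last" or "first_last" (hypothetical labels).
    weights: optional per-position weights; each tie counts 1.0 by default.
    """
    if not chain_senders:
        return {}
    last = len(chain_senders) - 1
    if mode == "all":
        positions = list(range(len(chain_senders)))
    elif mode == "first":
        positions = [0]
    elif mode == "last":
        positions = [last]
    elif mode == "first_last":
        positions = sorted({0, last})
    else:
        raise ValueError(f"unknown mode: {mode}")

    ties = defaultdict(float)
    for pos in positions:
        target = chain_senders[pos]
        if target == poster:
            continue  # skip ties from a poster to themselves
        ties[(poster, target)] += 1.0 if weights is None else weights[pos]
    return dict(ties)


chain = ["alice", "bob", "carol"]
all_ties = chain_ties("dave", chain, mode="all", weights=[0.25, 0.5, 1.0])
last_tie = chain_ties("dave", chain, mode="last")
```

Here the illustrative weights give the most recent sender in the chain the strongest tie, which is one way some ties could be valued differently from others.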
Next, regardless of whether the user chooses to focus on name or chain networks (or to analyse both), s/he can explore the dataset interactively through Netlytic’s visualization capabilities. By clicking on the “Visualize” button in the “Chain Network” or “Name Network” pane, the user can access the new, touch-enabled, HTML5 Network Visualizer (See Figure 11). The HTML5 visualizer has customizable layout, node size, and cluster visibility features. Users can also label and export an image of the network.
Figure 11, Name network visualization.
The Netlytic Visualization Screen has three main areas of interest. On the left is the screen Menu Bar, on the top right is the Show/Hide Menu function, and on the bottom right is the Note function.
Figure 12a, Network visualization, featured areas of interest
Located on the left side of the Netlytic Visualization screen, the Menu Bar houses a variety of different visualization options, which will be discussed in more detail below: Search, Visibility, Layout, Node Size, Colors, Auto Clusters, and Share.
Figure 12b, feature navigation panel
This feature allows the user to search for specific nodes or a group of nodes by separating their names by commas. This can be used to quickly identify individuals of interest in the network.
Figure 13a, search feature
In the Visibility section of the Netlytic visualization menu, users have the option of de-selecting Node Labels or Edges, which are selected by default.
A user might de-select Node Labels in order to export an image of the network without the extra “noise” of text, for example, or de-select Edges in order to get a clearer view of the nodes – particularly if they were located near the center of a dense network and partially obstructed by the numerous edges.
Figure 13b, visibility feature
The network’s layout is an important feature because it enables the user to identify patterns in the network, such as clusters of individuals, which, once examined, can then inform the network analysis (e.g. Who are the primary individuals that make up a given cluster? What groups them together, or alternatively, what could be the reason for other individuals’ exclusion from the cluster?).
Figure 13c, layout feature
Netlytic provides three layout options: Fruchterman-Reingold, DrL and LGL. Their various qualities are listed in the table below.
The speed at which the layout is created is determined by several things: the layout’s algorithm, the size of the network file, the time it takes for the file to go from our server to a user’s computer, and finally, the speed and power of the user’s computer and its ability to generate the network on the browser.
Figure 13d, Node Size Feature
Why is the Node size important?
Changing the node size by different calculations enables the user to understand the interactions between individuals in a network in a number of different ways. There are currently 3 different rankings available with this feature, which determine node size by: indegree, outdegree and total degree. All of these options mean something different, and can help the user discern patterns in the network. For a full description of what each option means and examples of how they might show patterns in the network, please see below.
With this ranking, the nodes with a higher centrality based on indegree would become larger. Indegree centrality is determined by the number of ties directed to or received by a node. It should be noted that this kind of measurement is only useful with directed data, when the ties are received or directed out (such as being mentioned in a Tweet, or mentioning someone in a Tweet) (Hanneman & Riddle, 2005).
A high-profile individual like Hillary Clinton who is mentioned in thousands of tweets daily would have a high indegree. A bot on Twitter would likely have a very low indegree, as it sends out many tweets but is rarely mentioned in any.
Why is Indegree important?
Indegree is significant because it demonstrates the prominence or popularity of an individual, since they are the target of communication or interest (Hanneman & Riddle, 2005).
With this ranking, the nodes with a higher centrality based on outdegree would become larger. Outdegree centrality is the opposite of indegree centrality, as it is determined by the number of ties directed or sent from a node to others. This form of measurement is also only useful with directed data, as in undirected data there would simply be a set number of ties per node, with no direction coming or going to each node (Hanneman & Riddle, 2005).
Using the same example as earlier, Hillary Clinton would have a low outdegree centrality, as she tweets very rarely (though since she is followed by so many, this may still give her a higher outdegree). In another network the Twitter bot would likely have a high outdegree, since it would be sending out multiple tweets that mention other Twitter users frequently.
Why is Outdegree important?
Outdegree is significant because it can help to identify influential individuals in the network, or individuals that are particularly active communicators (Hanneman & Riddle, 2005). This could be used to compare how active various individuals are (e.g. three different Twitter news handles – which ones rank higher in terms of their output?).
Total degree ranking combines the indegree and outdegree counts to give the node’s total degree. In an undirected network (such as a co-authorship network), total degree is simply the number of connections a node has.
If Bob has an indegree of 3 in a Twitter network (e.g. he was mentioned three times in other people’s tweets) and an outdegree of 7 (e.g. he sent tweets that mentioned 7 individuals in the network), then his total degree would be 10. Alternatively, if Bob were in a co-authorship network and had collaborated with 5 other authors, he would have a total degree of 5.
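The three degree rankings can be reproduced with a few lines of code. The sketch below assumes the network is available as a list of directed (source, target) ties; the account names are made up to mirror the Bob example, and the function is illustrative rather than Netlytic's own implementation.

```python
from collections import Counter


def degree_counts(directed_ties):
    """Return indegree, outdegree and total degree for every node.

    directed_ties: iterable of (source, target) pairs, one per tie.
    """
    indegree, outdegree = Counter(), Counter()
    for source, target in directed_ties:
        outdegree[source] += 1  # tie sent by the source
        indegree[target] += 1   # tie received by the target
    nodes = set(indegree) | set(outdegree)
    return {
        node: {
            "indegree": indegree[node],
            "outdegree": outdegree[node],
            "total": indegree[node] + outdegree[node],
        }
        for node in nodes
    }


# Bob is mentioned by 3 accounts and mentions 7 accounts (made-up names).
ties = [(f"fan{i}", "bob") for i in range(3)]
ties += [("bob", f"friend{i}") for i in range(7)]
bob = degree_counts(ties)["bob"]
```

In this toy network `bob` ends up with an indegree of 3, an outdegree of 7 and a total degree of 10, matching the example above.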
Why is Total Degree important?
This node size setting is useful for determining the key players in a network at a glance. Further inspection (either using the indegree or outdegree node size rankings in a directed network, or examining a node’s specific properties) would indicate whether these key players are individuals with a high prominence or who are highly vocal in their network.
Figure 13e, Colour Feature
This dropdown menu allows users to select the colors of the labels and background of the visualization. There are a number of choices available, with the first color indicating node label, and the second indicating the main background color. They include: white & dark gray, white & black, black & white, and custom colors.
The default is set to white & dark gray. However, white & black is also a popular setting, as it creates a higher contrast and enables the user to see the nodes and edges more clearly, which is helpful in the analysis stage. Using the custom color setting, the user can also personalize the colors of the network according to their preferences, as demonstrated below, where we used the custom setting to create a network with black labels and a white background.
White & Gray
White & Black
Custom: With Black & White as example
There is also the option to select/de-select various clusters (identified automatically). This can be helpful in identifying the various clusters within a network, or for examining a particular cluster more closely.
Clusters are identified automatically using a community detection algorithm called FastGreedy (implemented in igraph for R).
Here is an example from a network of @readingcampaign on Twitter. In this first image, all the auto clusters are selected. The largest pink cluster (Cluster 1) represents @readingcampaign and their followers.
In this second image, you can see that Cluster 1 has been de-selected. Now it is easier to examine other clusters in this network.
There are five network properties that Netlytic measures, which describe network characteristics such as how individuals interact with each other, how information flows, and whether there are distinct voices and groups within the network.
Density is the proportion of existing ties to the total number of possible ties in a network. In other words, it is calculated by dividing the number of existing ties (connections) by the number of possible ties. This measure helps to illustrate how close participants are within a network. The density measure is complementary to diameter, as both assess the speed of information flow. The closer this measurement is to 1, the more close-knit the community/conversation, which suggests participants are talking with many others. On the other hand, if the value is closer to 0, this suggests almost no one is connected to others in the network.
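The density calculation can be sketched as follows. The count of possible ties used here (n(n-1) for a directed network, half that for an undirected one, with no self-ties) is the standard textbook convention and an assumption on our part; the exact convention Netlytic applies is not stated in this document.

```python
def density(num_nodes, num_ties, directed=True):
    """Existing ties divided by possible ties (self-ties excluded)."""
    if num_nodes < 2:
        return 0.0  # no ties are possible with fewer than two nodes
    possible = num_nodes * (num_nodes - 1)
    if not directed:
        possible //= 2  # each pair counts once when direction is ignored
    return num_ties / possible


sparse = density(num_nodes=5, num_ties=4)                    # 4 of 20 ties
complete = density(num_nodes=4, num_ties=6, directed=False)  # 6 of 6 ties
```

The first call describes a loosely connected directed network (density 0.2), while the second describes an undirected network in which every pair of participants is connected (density 1.0).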
Reciprocity is the proportion of ties that show two-way communication (also called reciprocal ties) in relation to the total number of existing ties in the network (not all possible ties). A higher value indicates that many participants have two-way conversations, whereas a low reciprocity value suggests many conversations are one-sided, with little back-and-forth.
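A minimal sketch of the reciprocity calculation is shown below. It assumes that each direction of a two-way tie is counted as one reciprocal tie and that repeated ties between the same pair are collapsed; this is one common convention, and it may differ from the one Netlytic uses.

```python
def reciprocity(directed_ties):
    """Share of existing directed ties that are matched by a reverse tie."""
    # Collapse repeated ties and drop self-ties.
    ties = {(s, t) for s, t in directed_ties if s != t}
    if not ties:
        return 0.0
    reciprocal = sum(1 for s, t in ties if (t, s) in ties)
    return reciprocal / len(ties)


conversation = [("a", "b"), ("b", "a"), ("a", "c"), ("c", "d")]
value = reciprocity(conversation)
```

In this example only the exchange between `a` and `b` goes both ways, so two of the four ties are reciprocal and the value is 0.5.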
Centralization measures the average degree centrality of all nodes within a network. When a network has a high centralization value closer to 1, it suggests there are a few central participants who dominate the flow of information in the network. Networks with a low measurement of centralization closer to 0 are considered to be decentralized where information flows more freely between many participants.
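This document does not give the formula Netlytic uses for centralization. As an illustration of how such a 0-to-1 score can behave, the sketch below uses Freeman's degree centralization for an undirected network, which is a commonly used formulation and an assumption here, not a description of Netlytic's implementation.

```python
from collections import Counter


def degree_centralization(undirected_ties):
    """Freeman degree centralization for an undirected, unweighted network.

    Only nodes that appear in at least one tie are counted.
    """
    # Treat (a, b) and (b, a) as the same tie and drop self-ties.
    pairs = {tuple(sorted(t)) for t in undirected_ties if t[0] != t[1]}
    degree = Counter()
    for a, b in pairs:
        degree[a] += 1
        degree[b] += 1
    n = len(degree)
    if n < 3:
        return 0.0  # the measure is undefined for fewer than three nodes
    top = max(degree.values())
    # Sum of gaps to the most central node, divided by the largest possible
    # sum, which occurs in a star network: (n - 1) * (n - 2).
    return sum(top - d for d in degree.values()) / ((n - 1) * (n - 2))


star = [("hub", leaf) for leaf in ("a", "b", "c", "d")]
ring = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("e", "a")]
star_score = degree_centralization(star)
ring_score = degree_centralization(ring)
```

A star network, where one participant is connected to everyone else, scores 1.0, while a ring, where every participant has the same number of connections, scores 0.0.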
Modularity. To understand modularity, we first need to understand the concept of clusters in the network visualization. A cluster is a group of densely connected nodes that are more likely to communicate with each other than with nodes outside of the cluster. Modularity helps to determine whether the clusters found represent distinct communities in the network. Higher values of modularity indicate clear divisions between communities as represented by clusters in Netlytic. Low values of modularity, usually less than 0.5, suggest that the clusters found by Netlytic will overlap more; the network is more likely to consist of a core group of nodes.
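To make the idea more concrete, the sketch below computes Newman's modularity (Newman, 2006) for an undirected, unweighted network, given a cluster assignment. The function and variable names are illustrative, each tie is assumed to appear once in the list, and the cluster labels are supplied by hand rather than detected automatically.

```python
from collections import Counter


def modularity(undirected_ties, cluster_of):
    """Newman's modularity Q for an undirected, unweighted network.

    undirected_ties: list of (a, b) pairs, each tie listed once.
    cluster_of: dict mapping every node to its cluster label.
    """
    m = len(undirected_ties)
    if m == 0:
        return 0.0
    internal = Counter()    # ties with both ends in the same cluster
    degree_sum = Counter()  # summed degree of the nodes in each cluster
    for a, b in undirected_ties:
        degree_sum[cluster_of[a]] += 1
        degree_sum[cluster_of[b]] += 1
        if cluster_of[a] == cluster_of[b]:
            internal[cluster_of[a]] += 1
    # For each cluster: observed share of internal ties minus the share
    # expected if ties were placed at random with the same degrees.
    return sum(
        internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
        for c in degree_sum
    )


# Two triangles joined by a single bridging tie.
ties = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
two_clusters = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B"}
one_cluster = {node: "A" for node in range(1, 7)}
q_split = modularity(ties, two_clusters)
q_single = modularity(ties, one_cluster)
```

Splitting the two triangles into separate clusters gives a modularity of 5/14 (about 0.36), whereas placing every node in a single cluster gives 0, since no division is being claimed.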
The Share feature allows the user to export images of their network. These can be kept as part of the visualization of the dataset (and will reappear upon reopening the visualization) or can be saved as an image on the user’s computer for documentation purposes or shared online.
This feature can be quite valuable since the network visualization doesn’t allow the user to examine how the network changes over time, so capturing images of the network at different points can be helpful in determining how it grows and changes.
By clicking on any node within the network, you can view how the individual relates to the larger network. In the side bar, the following is shown:
- visibility options (to show the node or not)
- node color options
- centrality values that are used in the main menu to determine node size (indegree, outdegree, and total degree)
- the node’s connections with other participants in the network
To exit this view mode and return to the main menu, simply click “Go Back” at the top of the menu bar.
Depending on the type of network, a node’s connection with others might reflect comments, tweets, etc. in which they mentioned or were mentioned by others. To access these individual tweets, comments or messages, one needs to simply select the connection in question (e.g. agamasimowska displayed as the first connection in the list).
Annotating the Network
To make use of the Notes feature, navigate to the network visualization and click the yellow box containing a plus sign in the bottom right corner of the visualization screen.
This feature allows the user to post notes on various aspects of the network to themselves for use in the analysis stage (See example below).
Once Notes are added to the network, a new box appears next to the Notes box in the bottom right corner: “1: Sticky notes (zoom 0%)”. As users add notes at various zoom levels of the network (e.g. 0%, 25%, 50%, etc.), new boxes will be added to allow for a faster navigation between notes as displayed below.
Please note: Once you exit the network visualization, notes will be cleared. Saving the network image is the best way to capture annotations.
- Cheliotis, G. (2010). Social network analysis: including a tutorial on concepts and methods. [Slideshare]. Retrieved from http://www.slideshare.net/gcheliotis/social-network-analysis-3273045
- Cherny, L. (2012). Visualizing networks: beyond the ‘hairball’. [Slideshare]. Retrieved from http://www.slideshare.net/OReillyStrata/visualizing-networks-beyond-the-hairball
- Gephi Consortium. (2011). Gephi tutorial layouts. [Slideshare]. Retrieved from http://www.slideshare.net/gephi/gephi-tutorial-layouts
- Hanneman, R.A. & Riddle, M. (2005). Introduction to social network methods. Riverside, CA: University of California, Riverside. Retrieved from http://faculty.ucr.edu/~hanneman/
- Hirst, T. (2010, May 10). Getting started with Gephi Network Visualization App – my Facebook network, part III: ego filters and simple network stats. [Blog post]. Retrieved from http://blog.ouseful.info/2010/05/10/getting-started-with-gephi-network-visualisation-app-%E2%80%93-my-facebook-network-part-iii-ego-filters-and-simple-network-stats/
- Niewiarowski, T. & Gruzd, A. (2012). Interactive Network Visualization of Research Collaborations using Social Media Data. [Conference Poster].
- Nooy, W., Mrvar, A., & Batagelj, V. (2011). Exploratory social network analysis with Pajek. New York: Cambridge University Press. Retrieved from http://dal.worldcat.org/oclc/762325079
- Tsvetovat, M. & Kouznetsov, A. (2011). Social network analysis for startups. Retrieved from http://mediashow.ru/sites/default/files/books/2011/11/social.network.analysis.for_.startups.1449306462.pdf
- Twitter Counter. (2013). Twitter top 100: most followers. Retrieved from http://twittercounter.com/pages/100
- Wikipedia. (2013, Nov 11). Centrality. Retrieved from http://en.wikipedia.org/wiki/Centrality
- Wikipedia. (2013, Dec 3). PageRank. Retrieved from http://en.wikipedia.org/wiki/PageRank
- Hanneman, R.A. and Riddle, M. (2005). Chapter 7: Connection. In Introduction to social network methods. Riverside, CA: University of California, Riverside. Retrieved from http://faculty.ucr.edu/~hanneman/nettext/C7_Connection.html
- Hanneman, R.A. and Riddle, M. (2005). Chapter 8: Embedding. In Introduction to social network methods. Riverside, CA: University of California, Riverside. Retrieved from http://faculty.ucr.edu/~hanneman/nettext/C8_Embedding.html#reciprocity
- Valente, T. W., Coronges, K., Lakon, C., & Costenbader, E. (2008). How Correlated Are Network Centrality Measures? Connections (Toronto, Ont.), 28(1), 16–26. PMCID: PMC2875682
- Scott, J. (2013). Chapter 5: Centrality and Centralization. In Social Network Analysis (86-102). London: Sage. Retrieved from https://books.google.ca/books?id=MJoIGBfYDGEC&printsec=frontcover#v=onepage&q&f=false
- Newman, M. E. J. (2006). Modularity and community structure in networks. Proc. Natl. Acad. Sci. USA. 103 (23), 8577-8582. DOI: 10.1073/pnas.0601602103
- Chen, M., Nguyen, T., and Szymanski, B.K. (2013). On Measuring the Quality of a Network Community Structure. Proceedings IEEE Social Computing Conference, Washington, DC, September 8-14 (pp. 122-127). Retrieved from http://www.cs.rpi.edu/~szymansk/papers/SocCom-metric.13.pdf