Text Analysis

Document Overview & Resources

This document provides a general overview of Netlytic’s text (keyword) analysis features. 

Introduction

Netlytic offers two kinds of text analysis: keywords (the Keyword Extractor) and dictionaries (Manual Categories). To analyze the data, Netlytic builds concise summaries of the textual discourse present in the dataset(s) (see Figure 5A). Keyword analysis focuses on the content of the messages, for instance how individual words and topics relate to the rest of the dataset.

Because each message has several data fields, Netlytic also lets users choose which field is analyzed by clicking the “See more processing options” field at the bottom of the ‘Keyword Extractor’ and ‘Dictionaries (Manual Categories)’ panes (see Figure 5B). The default setting is the body, but users can also select author, description, guid, link, pubdate, source, or title.

Figure 5A, Text Analysis.

Figure 5B, Text Analysis processing options.

Visualization: Word Cloud

Netlytic uses the most frequently used terms found by the Keyword Extractor to create two visuals. The first is a word cloud that shows the topics/concepts appearing most often within a conversation (see Figure 6A).

Figure 6, An example of a Word Cloud (A) and Concordance for a selected concept (B).

After the system has completed its initial analysis of the content, Netlytic gives the user options for exploring the dataset further. When the user clicks the “Results” button in the Keyword Extractor, Netlytic identifies and counts the most recurrent words, concepts or phrases in the dataset and presents them as an interactive concept or word cloud (see Figure 6A). It does this by first removing all common words such as ‘of’, ‘will’, and ‘to’ according to lists of noise words (stop-words) for a number of languages (Arabic, Catalan, Czech, Dutch, Finnish, French, German, Greek, Hungarian, Indonesian, Italian, Norwegian, Polish, Portuguese, Russian, Slovak, Spanish, Swedish and Turkish) and then counting the words that remain. Each term in the concept cloud is accompanied by the number of times that term occurs in the dataset. The size of a term in the concept cloud reflects how often it appears in the dataset relative to all other terms found in that same dataset.
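The counting step described above can be illustrated with a minimal Python sketch. This is not Netlytic's own code: the function names, the tiny stop-word list, the tokenizing rule, and the linear font-size scaling are all assumptions made for illustration only.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption); Netlytic's actual
# per-language lists are not reproduced here.
STOP_WORDS = {"of", "will", "to", "the", "a", "and", "in", "is"}


def count_keywords(messages, stop_words=STOP_WORDS):
    """Count how often each non-stop-word appears across all messages."""
    counts = Counter()
    for text in messages:
        # Lower-case the text and keep runs of letters (Unicode-aware).
        tokens = re.findall(r"[^\W\d_]+", text.lower())
        counts.update(t for t in tokens if t not in stop_words)
    return counts


def cloud_sizes(counts, min_size=10, max_size=48):
    """Scale each term's font size by its count relative to the top term.

    Linear scaling is an assumption; the actual sizing rule may differ.
    """
    if not counts:
        return {}
    top = max(counts.values())
    return {
        term: min_size + (max_size - min_size) * n / top
        for term, n in counts.items()
    }


messages = ["The network will grow", "Growth of the network", "network to network"]
counts = count_keywords(messages)
sizes = cloud_sizes(counts)
```

In this sample, ‘network’ is counted four times and therefore receives the largest size, while ‘the’, ‘will’, ‘of’ and ‘to’ are dropped before counting.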

Each term in the word cloud can be clicked individually to reveal all of its contextual references, along with a pair of graphs showing when the term was mentioned most often by date and how many times it was used by each of the ten most active network actors (see Figure 6B above). Each term in the concept cloud is also accompanied by a red “X”. Clicking a term's red “X” removes that term from the cloud, which lets the researcher manually discard anything deemed to be noise and gives more control over the results of the Keyword Extractor.
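The per-term view described above (contextual references, mentions by date, and use by the most active actors) can be sketched in Python as follows. This is an illustration under stated assumptions, not Netlytic's implementation: the function name `term_details`, the record layout (dictionaries keyed by the `author`, `pubdate` and `body` fields), the context window width, and the reading of "most active" as "posted the most messages" are all hypothetical.

```python
import re
from collections import Counter


def term_details(records, term, window=30, top_n=10):
    """Collect contexts and per-date / per-author counts for one term.

    records: list of dicts with 'author', 'pubdate' and 'body' keys
    (a hypothetical layout chosen for this sketch).
    """
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)

    # Assumption: "most active" means the authors with the most messages.
    activity = Counter(rec["author"] for rec in records)
    top_authors = [author for author, _ in activity.most_common(top_n)]

    contexts = []
    by_date = Counter()
    by_author = Counter()
    for rec in records:
        body = rec["body"]
        for match in pattern.finditer(body):
            # Keep a slice of surrounding text as the contextual reference.
            start = max(0, match.start() - window)
            contexts.append(body[start:match.end() + window])
            by_date[rec["pubdate"]] += 1
            by_author[rec["author"]] += 1

    # Report term usage only for the most active authors.
    usage = {author: by_author[author] for author in top_authors}
    return contexts, dict(by_date), usage


records = [
    {"author": "ann", "pubdate": "2014-01-01", "body": "Privacy matters. privacy first."},
    {"author": "bob", "pubdate": "2014-01-02", "body": "No comment"},
    {"author": "ann", "pubdate": "2014-01-02", "body": "More on privacy"},
]
contexts, by_date, usage = term_details(records, "privacy")
```

Removing a term with the red “X” would correspond, in this sketch, to simply dropping that term from the counts before the cloud is drawn.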