Netlytic FAQ

Tier Information

Netlytic provides three types of accounts. Tier 1 is ideal for exploring Netlytic's capabilities; Tier 2 is great for small class projects; Tier 3 is ideal for large research projects. We are committed to maintaining free access to this service for Tier 1 & 2 accounts. However, collecting and analyzing millions of data points from social media requires a lot of computing power. To make sure that we have enough "juice" to keep Netlytic running smoothly and securely, we rely on a community-supported approach to cover the costs of the commercial web hosting company that runs this tool. If you like Netlytic, please support the hosting of this project by upgrading to Tier 3. Below is more information about each tier.

Tier 1 (Free)
  • Max # of datasets: 3
  • Max # of records per dataset: 2,500
  • Great for exploring what Netlytic can do!
  • This is the default tier.

Tier 2 (Free)
  • Max # of datasets: 5
  • Max # of records per dataset: 10,000
  • Great for smaller projects and class assignments!
  • Request a free upgrade by logging in to your account and clicking on the "My Account" page.

Tier 3 (Community-supported)
  • Max # of datasets: 100
  • Max # of records per dataset: 100,000
  • Great for larger research projects, with a total storage capacity of up to 10M(!) records (100 datasets x 100k records).
  • This tier is no longer available. Please try the Pro version of our new and improved platform for social media researchers at Communalytic.
If you are new to Netlytic, Tier 1 is great for exploring the text and network analysis features, while Tier 2 is an effective option for small projects and class assignments. Both of these tiers are free of charge. Tier 3 is a community-supported option for those who need to collect numerous large datasets.
Note: For larger datasets (>100,000 records per dataset), a dedicated solution might be required. Please contact us at netlytic@gmail.com
You can upgrade from a Tier 1 to a Tier 2 account free of charge by logging in, visiting the "My Account" tab, and then clicking the "request an upgrade" link.
The limit was set based on our empirical testing of the current technology for network data visualization. With datasets larger than 100k records, the network structure becomes too dense to draw any meaningful conclusions. Also, the computational complexity of visualizing such large networks is very high: each visualization would take a few hours and a lot of computing resources to complete. Therefore, a more balanced approach (empirically as well as computationally) is to split your dataset into smaller periods of time and perform the analysis on each of the subsets separately. This approach will also enable you to draw conclusions about changes in communication networks and actors over time. Tip: You can split your dataset(s) by time period using the built-in feature (look for the 'scissors' button under My Datasets), or by exporting/downloading your dataset, splitting it in Excel, and then re-uploading it to Netlytic.
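
If you prefer to script this split instead of doing it in Excel, here is a minimal sketch in Python (using pandas) that breaks an exported dataset into monthly subsets. The file name, the 'pubdate' column format, and the monthly grouping are assumptions you may need to adjust:

# Sketch: split an exported Netlytic CSV into monthly subsets by 'pubdate'.
# Assumes a file named "my_dataset.csv" with a parseable 'pubdate' column;
# both the file name and the monthly grouping are illustrative choices.
import pandas as pd

df = pd.read_csv("my_dataset.csv", encoding="utf-8")
df["pubdate"] = pd.to_datetime(df["pubdate"], errors="coerce")

# Group records by calendar month and write one CSV per month,
# ready to be re-uploaded to Netlytic as separate datasets.
for period, subset in df.groupby(df["pubdate"].dt.to_period("M")):
    subset.to_csv(f"my_dataset_{period}.csv", index=False)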

Platforms & Data Collection

Netlytic can pull data from different sources including Twitter, YouTube, RSS feeds, as well as .csv or .txt files from Google Drive. When importing data from social media sites, Netlytic uses each platform's API to collect publicly available data. Please see below for specific requirements and information per platform.
Twitter
  • Request frequency: every 15 minutes
  • Max records per request: up to 1,000
  • Account linking required: Yes. Twitter requires you to link your Twitter account to use this importer.
  • Note: This importer uses the Twitter REST API v1.1 search/tweets endpoint, which returns a collection of relevant tweets matching a specified query. Please note that Twitter's search service and, by extension, the Search API is not meant to be an exhaustive source of tweets; not all tweets will be indexed or made available via the search interface. The Twitter API rate limit allows about 10 active collectors per user, and tweets older than a week will typically not be returned.

YouTube
  • Request frequency: once
  • Account linking required: No. YouTube does not require you to link your YouTube account to use this importer.
  • Note: This importer uses the YouTube Data video comments feed API v2.0.

RSS
  • Request frequency: daily
  • Account linking required: No
  • Note: This option allows you to import records using Really Simple Syndication (RSS) feeds.

Text file / Google Drive
  • Request frequency: once
  • Account linking required: No
  • Note: This option allows you to import messages from a text or CSV file by uploading files to Netlytic.

Note: If your dataset includes more than one text file, you will need to upload and import one file at a time.

Acceptable formats:

1) CSV file (delimiter = a comma; enclosure = a double quotation mark; escape = a backslash). The first line should include the column names (see the example sketch below, after the transcript sample).

2) Full-text transcript with the headers:
From: test@gmail.com
Date: Sun, 1 Apr 2007 14:10:17 -0400
Subject: Origin of the term "Internet" ?
In-Reply-To: c8fc@mail.gmail.com
Message-ID: ffff8260@mx.google.com

I would prefer to not have to do it, but each time I try to submit a course paper without it capitalized, I get the paper back marked up by the professors, telling me it is capital I- internet.
[...]
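
If you are generating the CSV file programmatically rather than by hand, a minimal sketch that follows the CSV format described in (1) above (comma delimiter, double-quote enclosure, backslash escape, header row first) could look like the following. The column names and sample rows are placeholders, not required fields:

# Sketch: write a CSV in the format Netlytic's text-file importer expects
# (comma delimiter, double-quote enclosure, backslash escape, header row first).
# The column names and sample rows here are illustrative only.
import csv

rows = [
    {"author": "user_one", "pubdate": "2020-10-01 10:15:00", "description": 'She said "hello"'},
    {"author": "user_two", "pubdate": "2020-10-01 10:20:00", "description": "Replying to the thread"},
]

with open("my_messages.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(
        f,
        fieldnames=["author", "pubdate", "description"],
        delimiter=",",
        quotechar='"',
        escapechar="\\",
        doublequote=False,
        quoting=csv.QUOTE_ALL,
    )
    writer.writeheader()   # first line: column names, as required
    writer.writerows(rows)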
Netlytic doesn't have access to historical data from Twitter, but you can use the search operator "until" to retrieve tweets that are no more than 7 days old.
For example, the following query will retrieve up to 1,000 of the most recent tweets about COVID-19 posted before October 1, 2020:

COVID until:2020-10-01
Netlytic uses APIs (application programming interfaces) to collect data from each import source. In the case of Twitter, the API requires each user (data collector) to be authenticated. The authentication process serves two primary purposes: first, to confirm that the collector has permission; and second, to ensure that the collector does not exceed the number of records allowed for collection, as each API has a specified limit. Netlytic takes you through this authentication process when you link a Twitter account to your Netlytic account. (Please note that Netlytic does not post anything to social media; it only allows users to collect data from these platforms.) Tip: some researchers create a separate account just for data collection purposes.
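
For readers curious about what such an authenticated request looks like outside of Netlytic, below is a rough sketch of a call to the Twitter REST API v1.1 search/tweets endpoint mentioned above, assuming you have your own API keys. The key placeholders and query are illustrative, and this endpoint may no longer be available under current Twitter/X API access plans:

# Sketch: an authenticated call to the Twitter REST API v1.1 search/tweets
# endpoint (the one this importer relies on), made outside Netlytic.
# The API key placeholders and the query are illustrative only.
from requests_oauthlib import OAuth1
import requests

auth = OAuth1("API_KEY", "API_SECRET", "ACCESS_TOKEN", "ACCESS_TOKEN_SECRET")
resp = requests.get(
    "https://api.twitter.com/1.1/search/tweets.json",
    params={"q": "COVID until:2020-10-01", "count": 100, "result_type": "recent"},
    auth=auth,
)
resp.raise_for_status()
for tweet in resp.json().get("statuses", []):
    print(tweet["created_at"], tweet["user"]["screen_name"], tweet["text"])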
To use Twitter's advanced search operators in Netlytic, you need to include them as part of your search query. For example, you can search for and retrieve tweets by location using Twitter operators such as "near" or "geocode", as shown below. Please note that the majority of tweets are not geo-coded.

coronavirus near:Toronto within:50km
coronavirus geocode:43.655700,-79.380650,50km

Note: you can use Google Maps to identify coordinates for a given location.
If you want to exclude retweets and replies, you can use the following filters, prefacing them with the minus "-" sign (no space).

coronavirus -filter:nativeretweets -filter:replies

You can also combine search operators like this:
coronavirus geocode:43.655700,-79.380650,50km -filter:nativeretweets -filter:replies

To collect tweets posted by a given account, you can use the following search operator:

from:username

Check out this external blog post with more examples of Twitter's advanced search operators.
Yes, you can create a subset while data is still being collected, as well as after the collection period has ended. To do this, create a subset from your dataset home screen (by clicking on the date stamp or the scissors icon). Additionally, if your dataset has over 10,000 records, you can access the scissors icon under the dataset's text analysis tab. A new window will open with a calendar, where you can select the dates for the new subset.
Netlytic does not collect pictures posted with a tweet at this time.
Since APIs are generally not case sensitive, you should get the same results whether your search query uses capital or lower-case letters.
Netlytic only collects publicly available posts which are made available through social media platform’s public API. If someone mentions a “private” account in their message, the account's “username” will appear in the network visualization because of the mention by another user (name network). PLEASE NOTE: It is the responsibility of every researcher to determine an appropriate level of their data anonymization and data abstraction when reporting/presenting their results to the public.
Twitter limits the number of live collections you can set up in Netlytic. It varies based on the specific queries you plan to run, but on average, you should be able to run up to 15 simultaneous collections.
Once you download your dataset from Netlytic, the "pubdate" column shows the date and time when the message was posted, in the Atlantic time zone. You can confirm this by going to the URL listed in the "link" column to view the post in question directly on the web; specifically, you can compare the date/time shown online in your own time zone with what's in the spreadsheet that you downloaded from Netlytic.

For example, the following tweet is shown in Netlytic as posted on Feb 12, 2018 at 2:19pm. When you see this tweet on Twitter, you can confirm when it was posted in your local time zone. If you are in the Eastern time zone, you will see it posted at "1:19pm", which is an hour behind the Atlantic time zone shown in the spreadsheet. You can use the following online tool to help with the conversion between different time zones: https://www.timeanddate.com/worldclock/converter-classic.html

Finally, whenever available (when provided by Twitter), Netlytic records the poster's local time zone in the "postertimezone" column. The value in this column shows the UTC time offset in hours. For example, the poster of the above-mentioned tweet is located in North Carolina, USA, which is in the Eastern time zone and was not observing Daylight Saving Time at the time, so the UTC offset for this state is "-5" hours: https://www.timeanddate.com/time/zone/usa/north-carolina
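
If you prefer to do the conversion in code rather than with the online tool, here is a small sketch that treats a 'pubdate' value as Atlantic time and converts it to Eastern time. The date string format is an assumption, so adjust the parsing pattern to match your own export:

# Sketch: convert a 'pubdate' value (recorded in Atlantic time) to another
# time zone. The date string format below is an assumption; adjust the
# parsing pattern to match your own export.
from datetime import datetime
from zoneinfo import ZoneInfo

pubdate = datetime.strptime("2018-02-12 14:19:00", "%Y-%m-%d %H:%M:%S")
pubdate_atlantic = pubdate.replace(tzinfo=ZoneInfo("America/Halifax"))

# Eastern time is one hour behind Atlantic, so this prints 13:19.
print(pubdate_atlantic.astimezone(ZoneInfo("America/Toronto")))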
Field definitions:
  • guid: unique identifier for the comment
  • link: URL of the comment on YouTube
  • pubdate: date when the comment was made
  • author: username of the commenter
  • to: if the comment is a reply, this field stores the username of the previous commenter
  • likecount: number of likes the comment received (as of the date when the dataset was collected)
  • replycount: number of replies the comment received (as of the date when the dataset was collected)
  • description: text of the actual comment
  • title: same as 'description'
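
As an illustration of how these fields can be used after download, the sketch below builds a simple reply edge list (author -> to) from an exported YouTube dataset. The file name is a placeholder for your own export:

# Sketch: build a simple reply edge list (author -> to) from a downloaded
# YouTube comments dataset, using the fields described above. The file
# name is a placeholder for your own export.
import pandas as pd

df = pd.read_csv("youtube_comments.csv", encoding="utf-8")

# Keep only rows where the comment is a reply (the 'to' field is filled in).
replies = df.dropna(subset=["to"])
edges = replies.groupby(["author", "to"]).size().reset_index(name="weight")
print(edges.sort_values("weight", ascending=False).head(10))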

Visualizations and Image Exporting

Clusters are determined by Netlytic's algorithms, and the nodes (which represent individuals) in the visualization are grouped based on a common characteristic; for instance, a cluster could be based on geographic location. Each cluster is given a different colour to help users distinguish between the groups they are examining. This is especially useful with larger and denser networks.
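
Netlytic's own clustering algorithm is not described here, but if you want to reproduce a comparable grouping on an exported edge list yourself, one common (and not necessarily identical) approach is modularity-based community detection, sketched below with networkx on a made-up edge list:

# Sketch: group nodes of an exported network into clusters ("communities")
# using greedy modularity maximization. This is a generic approach, not
# necessarily the algorithm Netlytic itself uses; the edge list below is
# a placeholder for your own exported data.
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

G = nx.Graph()
G.add_edges_from([("alice", "bob"), ("bob", "carol"), ("dave", "erin"), ("erin", "frank")])

for i, community in enumerate(greedy_modularity_communities(G)):
    print(f"Cluster {i}: {sorted(community)}")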
There are two ways you can export your work in Netlytic.
  • Exporting Data: You may export the dataset, or the raw data, as a CSV file. To do this, navigate to your dataset home screen by logging into Netlytic, locate the dataset you would like to export, and click on the download icon. In the pop-up window, click on the CSV image to begin the download.
  • Exporting Images: You may also wish to export images of your text and network visualizations. For any of the text visualizations (word cloud, words over time, and categories) you will need to take a screenshot. To capture your network visualizations, begin by visualizing the network. At the bottom of the left-hand panel, click the "save image" button. The network image will now be saved in this panel. You can download this image to your computer by first clicking on this icon and then, in the popup screen, right-clicking on the image and selecting "save as".
Please note, you can only have 3 images saved in your network analysis panel; however, you can download as many images as you wish to your computer.
You may notice with some datasets that emojis appear in the word cloud visualization; however, analyzing emojis specifically can be unreliable.
The stacked graph provides a visual representation of how popular topics within the conversation change over time. The x-axis represents a specific time period (e.g., June 1 to September 29), while the y-axis shows how frequently each of the popular topics appears during that period.
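
If you would like to recreate a similar view from an exported dataset, a rough sketch using matplotlib's stackplot is shown below. The time periods and topic counts are invented purely for illustration:

# Sketch: a stacked area chart of topic frequency over time, similar in
# spirit to the "words over time" view. The dates and counts are invented
# purely for illustration.
import matplotlib.pyplot as plt

dates = ["Jun", "Jul", "Aug", "Sep"]
topic_counts = {
    "vaccine": [12, 30, 45, 20],
    "lockdown": [40, 25, 15, 10],
    "school": [5, 10, 35, 50],
}

plt.stackplot(range(len(dates)), *topic_counts.values(), labels=list(topic_counts.keys()))
plt.xticks(range(len(dates)), dates)
plt.xlabel("Time period")
plt.ylabel("Number of messages mentioning each topic")
plt.legend(loc="upper left")
plt.show()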

Research

Understanding that some projects require complete anonymity, for instance removing usernames from the text and network visualizations, we have put together a few suggestions you may want to consider:
  • You may choose to black out usernames in screenshots of visualizations or of the dataset.
  • In the text analysis visualizations, you can remove any usernames from the word cloud by clicking on the red x button beside each word. This will also remove those usernames from the words over time visualization.
  • In the network analysis, along the left-hand side panel, you may disable the "node labels" button to remove any usernames appearing on the visualization.
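
If you also need to anonymize the exported dataset itself (not just the visualizations), one possible approach is to replace usernames with consistent pseudonyms before sharing results. The sketch below assumes an 'author' column and a placeholder file name:

# Sketch: replace usernames in an exported dataset with consistent
# pseudonyms before sharing results. The file name and the 'author'
# column are assumptions; adapt them to your own export, and remember
# that the appropriate level of anonymization is still your call.
import hashlib
import pandas as pd

df = pd.read_csv("my_dataset.csv", encoding="utf-8")

def pseudonymize(username: str) -> str:
    digest = hashlib.sha256(str(username).encode("utf-8")).hexdigest()
    return "user_" + digest[:8]

df["author"] = df["author"].apply(pseudonymize)
df.to_csv("my_dataset_anonymized.csv", index=False)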
As each research project is unique, it's hard to determine an "ideal" sample size. However, you may wish to keep the following in mind when determining the appropriate sample size for your research:
  • What is your research question? For example, studies of change over time need a defined time period that is long enough to show changes; this is up to the researcher and depends on the area of study.
  • Are there specific criteria outlined by the publication venue?
  • What type of analysis are you conducting: qualitative or quantitative?
  • The availability of data: some topics will have more data than others. If you are looking at a sample of all possible records, you will need to justify whether the sample is representative or random.
Once you see the same posters coming back to the "community", this may indicate you have reached a saturation point. It's also important to be aware of external factors that influence posting behaviour (e.g., marketing campaigns, particular events, etc.). Ideally, you would collect data for the whole duration of the event and, if possible, for the same period before and after it, so that you can draw conclusions about whether user behaviour is influenced by these external factors.
Once the Netlytic user is authenticated, only publicly available social media posts are collected. It is up to the individual researcher to determine whether there is a need and/or expectation to inform social media users before, during, or after data collection. Netlytic does not manage an informed consent process, nor any other form of consent, as this is outside the scope of the tool.
Gruzd, A. (2020). Netlytic: Software for Automated Text and Social Network Analysis. Available at https://Netlytic.org

Troubleshooting

Check that your headers are properly formatted. Our Text File import page outlines the headers needed to import data into Netlytic by way of either a .csv or .txt file. The formatting also requires that headers are all in lower case.
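
If your file already exists and only the header case is the problem, here is a quick sketch for lower-casing the header row before re-importing (the file names are placeholders):

# Sketch: lower-case the header row of a CSV before importing it into
# Netlytic. The file names are placeholders.
import pandas as pd

df = pd.read_csv("my_messages.csv", encoding="utf-8")
df.columns = [c.strip().lower() for c in df.columns]
df.to_csv("my_messages_lowercase.csv", index=False)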
Download the CSV file from Netlytic. Open Excel and import the CSV through the "Data" -> "From Text" option. You will need to select the file origin as "Unicode (UTF-8)". You can also export your dataset as an Excel file, which handles non-English characters, new lines, emojis, and other special characters in tweets more robustly.
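
As an alternative to Excel, the exported CSV can also be opened in code with the encoding set explicitly. A minimal sketch (the file names are placeholders, and writing .xlsx requires the openpyxl package):

# Sketch: read a Netlytic CSV export with explicit UTF-8 encoding so that
# emojis and other non-English characters are preserved, then save it as
# an Excel file. The file names are placeholders.
import pandas as pd

df = pd.read_csv("my_dataset.csv", encoding="utf-8")
df.to_excel("my_dataset.xlsx", index=False)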