What’s New With @Netlytic?

Netlytic gets a refresh for 2021. We have added lots of new researcher-friendly features including:

  • an easier-to-use query-creation interface for Twitter data collection,
  • a new log file to help Tier 3 users monitor data collection progress and adjust data collection criteria in real time,
  • new Twitter data collection modes: “Recent tweets” and “Live tweets”, and
  • the ability to export a labelled dataset…

About: Netlytic is a cloud-based text and social networks analyzer that can automatically summarize textual data and discover communication networks from publicly accessible social media posts. It has been in use for over a decade by researchers to conduct research in the public interest and by educators and students from around the world to teach and learn about social media analytics.

New General Features

More Informative ‘My Datasets’ Page

  1. In the top right corner of the My Datasets page, you can now see 
    • the server’s local date and time, which is in the Eastern time zone (all expiry dates for accounts and live data collections are set in accordance with this date/time), and  
    • your account’s email address, which is especially useful to know if you are managing and using multiple Netlytic accounts.

Updated Preview Page

  1. You can now review and search for relevant posts using keywords in the Preview step. (Use this to confirm that your data collection query is surfacing posts of the types you expect. If it’s not, consider modifying your search criteria.)

New Treemap Visualization

  1. Since most popular browsers no longer support Adobe Flash, we retired and replaced the Treemap visualization of “Manual Categories” under Text Analysis with an alternative chart using the Google Charts library. This new visualization should work in most browsers as long as you have JavaScript enabled. 

Ability to Export a Labelled Dataset

  1. You can now export a labelled dataset based on the Manual Categories analysis (under Text Analysis).

When analyzing a dataset using content categories that you created under the Text Analysis step (called “Manual Categories”), you can now export a labelled dataset as a CSV file. It will contain the original dataset plus two additional columns, “category” and “terms” (see the screenshots below):

  • For each post that was categorized under one of the “manual categories”, the “category” column in the CSV file will include the name of the relevant category. 
  • The “terms” column will list one or more relevant keywords or phrases found in the post that also correspond to the category listed in the “category” column.
  • Both fields will be empty if a post does not include any relevant terms. 

In the example below, the second post (third row in the table) has been classified under the “feelings (good)” category because it contains the word “nice”, which is one of the dictionary words entered in the “feelings (good)” category.
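To make the export format concrete, here is a minimal sketch of working with such a CSV in Python. The column names “category” and “terms” come from the description above; the sample file contents and the other column names are invented for illustration:

```python
import csv
import io

# A miniature stand-in for an exported Netlytic CSV: the real file would
# contain the full original dataset plus the "category" and "terms" columns.
sample = """id,description,category,terms
1,"Looking forward to the weekend",,
2,"What a nice day in the park","feelings (good)",nice
3,"Traffic was awful this morning","feelings (bad)",awful
"""

# Count labelled posts per category, skipping uncategorized rows
# (both "category" and "terms" are empty for those).
counts = {}
for row in csv.DictReader(io.StringIO(sample)):
    if row["category"]:
        counts[row["category"]] = counts.get(row["category"], 0) + 1

print(counts)  # {'feelings (good)': 1, 'feelings (bad)': 1}
```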

Read more about this feature here.

New Twitter-specific Improvements

New Twitter Data Collection Modes in Netlytic: “Recent tweets” and “Live tweets” modes. 

In this update, we have made some improvements to Netlytic’s Twitter data collection interface and protocols. Here is a quick overview of the two modes for Twitter data collection: (a) “Recent tweets” and (b) “Live tweets”.

Both modes use the same Twitter Search API. The main difference is how often the API is queried and how many records are retrieved per API call. These differences are outlined below.

Twitter Data Collection: “Recent tweets” Mode

  1. “Recent Tweets” mode (Available in Tier 1, 2 and 3 accounts)

This mode will attempt to retrieve recent tweets that fit your search parameters (posted within the past 7 days), up to the number of tweets permitted to be stored in your Netlytic account (2.5K tweets per dataset for a Tier 1 account; 10K tweets per dataset for a Tier 2 account; 100K tweets per dataset for a Tier 3 account).

For Tier 1 & 2 accounts

Once the collection starts, you will see the following screen with two progress bars. The first bar on the left (“Overall progress”) shows the overall progress of data collection (in percentage) relative to the maximum number of records that can be stored in your Netlytic account. Once this progress bar reaches 100%, the collection will stop. 

The second bar on the right (“Twitter API call limit …”) shows the percentage of API calls made under your account during the current collection session. Once this progress bar reaches 100%, the collection will pause until it is permitted to resume in accordance with Twitter’s rate limit. If it happens, Netlytic will display the number of seconds during which the collection is paused. 

Important: If you have a Tier 1 or 2 account, do not close your browser until the “Overall progress” bar reaches 100%. Closing the browser will stop the collection.

Once the screen shows the number of saved/updated records and the “Go Back” and “Next Step” buttons appear, your collection is done and you can safely proceed to the data analysis stage.

For Tier 3 accounts

Because Tier 3 accounts permit collection of up to 100k recent tweets (posted within the past 7 days), the collection will automatically proceed in the background. You will see the following message confirming that the collection has been queued and that it is safe to close the browser or go to a different page. 

During the collection process on the My Datasets page, you will see a note indicating that tweets are still being collected (as shown in the screenshot below). You can click on this message to see the progress. 

If some tweets have already been collected, the Preview step of the dataset will display a searchable table with the collected records (see below). Important: It is safe to do a preliminary exploration and analysis of tweets that have already been collected, even if the collection is still ongoing. However, any exploratory text or network analysis you run before data collection is complete will need to be rerun once the collection finishes.

Twitter Data Collection: “Live tweets” Mode

  1. “Live Tweets” mode (Available in Tier 3 accounts only)

Because of the computing demands on the server, the Live Tweets mode is only available to Tier 3 accounts. 

This mode will query Twitter every 15 min to collect up to 1000 recent tweets that fit your search parameters per dataset. (This mode is best suited for collection of ongoing discussions during live events.)

If you have a Tier 3 account and want to collect data using the “Live Tweets” mode of data collection, click on the “Enable data collection …” checkbox when creating a new dataset and specify the time period for data collection (between 1 and 62 days) as shown below.

Important: When you start a new dataset using the “Live Tweets” mode of data collection, you will need to wait for the first search to complete. This initial “pull” will retrieve up to the 1000 most recent tweets. This is a good time to confirm that your search query is not too narrow and that the parameters you have set for the data collection are able to find some relevant tweets. If not, consider refining or expanding your search parameters.

Once the screen shows the number of saved/updated records and the “Go Back” and “Next Step” buttons appear, it means that the initial “pull” is done and you can go to a different page. After this, the collection will continue to automatically run in the background by querying Twitter API every 15 minutes for the number of days that you have specified. You will receive an email once the live collection reaches the expiry date/time.

Please note that the live collection end date/time is based on when you started the collection. For example, if you started collecting tweets today at 3:14pm and set the collection to continue for 1 day, the end date/time for your collection will be the next day at around 3:14pm. To confirm the end date/time, simply mouse over the live collection icon next to the dataset on the My Datasets page (see below).
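The expiry arithmetic above can be sketched with Python’s standard datetime module. The specific start time is just an example; Netlytic computes this on the server’s Eastern-time clock:

```python
from datetime import datetime, timedelta

# The live-collection expiry is simply the start time plus the requested
# number of days (1 to 62), on the server's Eastern-time clock.
start = datetime(2021, 3, 1, 15, 14)   # collection started at 3:14pm
days = 1                               # requested collection period

expiry = start + timedelta(days=days)
print(expiry)  # 2021-03-02 15:14:00
```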

  1. If you are running a live data collection, you can now stop it at any time using the Stop Collection button under the Edit page of a selected dataset: 
  2. For live collections, you can also see the latest log message from the collector about the number of tweets that have been retrieved during the most recent call to Twitter API. This detailed message is available in the Preview step of a selected dataset.

Note on how to interpret the log text

  • Since Netlytic queries Twitter API every 15 minutes to look for new tweets, the number of “skipped duplicates” listed at the end of the log message can give you an idea of whether your search query is too broad or too narrow. If you consistently see Netlytic skipping a few duplicates, your collection is likely capturing all or most of the relevant tweets. On the other hand, if the number of skipped posts is consistently 0, it suggests that your search query returns more tweets than Netlytic can capture within the 15-minute window.
    • This is not an issue by itself if you are only interested in collecting a sample of tweets that match your search criteria. Otherwise, to capture more relevant tweets per 15-minute window, consider splitting your broad query into multiple, more specific queries and running them as separate live collections in parallel.
  • Another number to pay attention to is the total number of tweets returned in response to your search query. If the log shows it as 0, it is usually a sign that your search is too narrow and that you need to update your search criteria. If this happens, go to the Edit step of the selected dataset, modify your search query, and click the Update button to relaunch the query.
  • Also keep an eye out for the message stating “The script reached the API limit”. This usually happens if you are running too many live collections at the same time. If you see this message, check the log again after 15 and 30 minutes. If the message is still there, you might need to reduce the number of live data collections. We recommend running no more than 10-15 simultaneous live collections.
    • “Fun” API fact: To work out how many live collections you can run, note that each user can send 180 calls to Twitter API every 15 minutes, and each call can return up to 100 tweets. So, for Netlytic to collect up to 1000 tweets per 15 minutes, it needs to make at least 10 calls to Twitter API. The absolute maximum number of simultaneous live collections is therefore 18.
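The arithmetic in this “fun fact” can be written out as a quick sanity check, using the rate-limit figures quoted above:

```python
import math

# Twitter Search API figures quoted in the post:
CALLS_PER_WINDOW = 180        # API calls allowed per user per 15-minute window
TWEETS_PER_CALL = 100         # max tweets returned by a single call
TWEETS_PER_COLLECTION = 1000  # tweets Netlytic pulls per live collection per window

# Each live collection needs at least this many calls per 15-minute window:
calls_needed = math.ceil(TWEETS_PER_COLLECTION / TWEETS_PER_CALL)

# So the hard ceiling on simultaneous live collections is:
max_collections = CALLS_PER_WINDOW // calls_needed

print(calls_needed, max_collections)  # 10 18
```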

Twitter Data Collection: New Query Creation Interface

  1. Regardless of whether you are creating a new collection in the “Live” or “Recent Tweets” mode, you can now use a new interface for creating a search query for Twitter data collection (see the screenshot below). This new interface makes it easier to create complex search queries using Twitter-supported advanced search operators. 

Tip: Once you enter your search keywords and select additional filters, we highly recommend testing your query using Twitter’s web interface before starting a new collection. You can do this using the “Test Query on Twitter” button at the end of the search form. Once clicked, it will launch the search directly on Twitter in a new tab of your browser.

Examine the Twitter search results in this new tab to see whether your search query returns at least some tweets posted within the last 7 days. This matters because Twitter’s Search API, which Netlytic uses to collect public tweets, cannot retrieve historical tweets, i.e., tweets older than 7 days.

If you are satisfied with the search results as displayed on Twitter, return to Netlytic and click the Import button at the end of the page to start your data collection. 
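For the curious, a link like the one the “Test Query on Twitter” button opens can be sketched by URL-encoding a query into the q parameter of Twitter’s web search. The example query and the f=live (“Latest”) parameter are assumptions for illustration, not necessarily what Netlytic generates:

```python
from urllib.parse import urlencode

# A hypothetical advanced search query using Twitter-supported operators.
query = '(#covid19 OR #coronavirus) lang:en -filter:retweets'

# Twitter's web search takes the query in "q"; "f=live" requests the
# "Latest" (most recent) tab rather than "Top" results.
url = "https://twitter.com/search?" + urlencode({"q": query, "f": "live"})
print(url)
```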

  1. Netlytic also makes it easier to restrict your collection to tweets from users located within a given radius of a given location. This is useful for tracking breaking news or live events and incidents occurring in real time. The location-based filter is labelled #3 in Netlytic’s new Twitter data collection form:

To use the location-based filter, you will need to provide four parameters:

  • the latitude of the location: a decimal number between -90.0 and +90.0 (North is positive);
  • the longitude of the location: a decimal number between -180.0 and +180.0 (East is positive);
  • the distance from this location to be used as a radius; and
  • whether to measure this distance in kilometers (km) or miles (mi).
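As an illustration of how these four parameters fit together, here is a small sketch that validates them and formats them in the lat,long,radius+unit shape used by the Twitter Search API’s geocode parameter. The helper name make_geocode and the sample coordinates are ours, not Netlytic’s:

```python
def make_geocode(lat: float, lon: float, radius: float, unit: str = "km") -> str:
    """Validate the four location parameters described above and format them
    as a 'latitude,longitude,radius+unit' string, the shape expected by the
    Twitter Search API's geocode parameter."""
    if not -90.0 <= lat <= 90.0:
        raise ValueError("latitude must be between -90.0 and +90.0")
    if not -180.0 <= lon <= 180.0:
        raise ValueError("longitude must be between -180.0 and +180.0")
    if unit not in ("km", "mi"):
        raise ValueError("unit must be 'km' or 'mi'")
    return f"{lat},{lon},{radius}{unit}"

# Times Square with a 2 km radius (coordinates looked up via Google Maps):
print(make_geocode(40.758, -73.985, 2))  # 40.758,-73.985,2km
```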

Hint: An easy way to identify the latitude & longitude of a desired location is to use Google Maps. Here is how:

  1. In a new tab of your browser, go to Google Maps.
  2. Search for the desired location. For example, let’s search for Times Square.
  3. Select the most central point on the map and right-click to see the context menu as shown on the screenshot below.
  4. The two decimal numbers listed at the beginning of the context menu are the latitude & longitude of the selected location; save them so you can enter them later when creating a new dataset in Netlytic.
  5. To copy these two numbers to your clipboard, simply click on them and Google Maps will copy them for later use.
  6. Next, you’ll need to decide on the radius around this location within which to collect tweets. To do this, right-click on the same location again in Google Maps to open the context menu, then click on the “Measure distance” option as shown below.
  7. This option allows you to drag a line from the original point on the map as far as you intend to collect tweets from. In our previous example, 2 km would cover the area from Times Square to the Hudson River on one side of Manhattan island and a similar distance toward the East River on the other. Important: To cover the whole island of Manhattan, you would need to create multiple datasets with different central points.
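If you want to check numerically whether a point of interest falls inside a chosen radius, the standard haversine formula gives the great-circle distance between two coordinate pairs. The sample coordinates below are approximate and only for illustration:

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# Times Square vs. a point near the Hudson River waterfront (approximate):
times_square = (40.758, -73.985)
hudson_edge = (40.7625, -74.0010)

d = haversine_km(*times_square, *hudson_edge)
print(round(d, 2))  # roughly 1.4 km, i.e. inside a 2 km radius
```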

Twitter Data Collection: Additional Metadata Fields

  1. Netlytic now collects additional metadata fields from Twitter API. Below is the list of all metadata fields with their definitions (as per Twitter API); the newly added fields are bolded in blue.

  • tweetid: Unique identifier for the tweet
  • guid: Link to the tweet
  • link: Link to the tweet
  • author: Username of the account that posted the tweet
  • title: Tweet content, truncated to 140 characters (starts with the username(s) if it’s a reply, or with “RT @username” if it’s a retweet)
  • description: Full tweet
  • pubdate: Date and time when the tweet was posted
  • source: Twitter client used to post the tweet
  • favorite_count: Number of times the tweet was liked at the time of its collection
  • retweet_count: Number of times the tweet was retweeted at the time of its collection
  • lang: Language of the tweet, as determined by Twitter
  • user_mentions: List of usernames mentioned in the tweet
  • quoted_text: If the tweet is a quote, the content of the quoted tweet
  • tweet_type: Tweet type (original tweet, reply, retweet, quote)
  • in_reply_to_screen_name: If the tweet is a reply, username of the original poster this tweet is replying to
  • in_reply_to_user_id: If the tweet is a reply, user id of the original poster this tweet is replying to
  • in_reply_to_status_id: If the tweet is a reply, tweet id of the original tweet this tweet is replying to
  • retweeted_screen_name: If the tweet is a retweet, username of the original poster
  • retweeted_user_id: If the tweet is a retweet, user id of the original poster
  • retweeted_status_id: If the tweet is a retweet, tweet id of the original tweet
  • user_id: Unique identifier for the user who posted this tweet
  • profile_image_url: Link to the profile image of the poster
  • user_statuses_count: The total number of tweets shared by the poster
  • user_friends_count: The total number of accounts followed by the poster
  • user_followers_count: The total number of accounts that follow the poster
  • user_created_at: The date/time when the poster joined Twitter
  • user_bio: A bio description as provided by the poster in their Twitter profile
  • user_location: Location description as provided by the poster in their Twitter profile
  • user_verified: Whether the poster is a verified user, as approved by Twitter

Twitter Data Analysis: New Network Discovery Interface

  1. Finally, for new Twitter datasets, we’ve simplified the network discovery interface. It now gives you four distinct options to connect users found in your datasets as shown in the screenshot below.