Once you have created, curated, and cleaned the OCR texts in your Content Set, you are ready to move on to the Analyze phase. The analysis tools let you take hundreds or thousands of documents and interrogate them in ways that would be too time-consuming without the help of computational algorithms. In this section, learn how to choose tools, run them, and interpret their output.
1. Selecting the Right Tool
It’s important to know what tools are available and what they can do. In this section, you will learn about the questions you should ask yourself as you choose the right tool for your analysis.
Learn how to Select the Right Tool
2. Setting Up and Running
Each tool has settings you can adjust to refine your results. Knowing how to use these options helps you produce the best possible visualizations.
Learn more about Setting Up and Running
Document Clustering
Document Clustering analyzes documents using statistical measures, grouping them by term frequencies with the K-means algorithm to determine the similarity between the documents in your Content Set.
More about Document Clustering
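The approach described above can be sketched outside the platform. This is a minimal illustration using scikit-learn (not the tool's own pipeline) with a tiny hypothetical Content Set: documents become term-frequency vectors, and K-means groups similar vectors.

```python
# Minimal sketch of term-frequency clustering with K-means,
# using scikit-learn; the sample documents are hypothetical.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.cluster import KMeans

documents = [
    "whaling ships sailed the ocean",
    "the whaling fleet left the harbor",
    "parliament debated the new tax law",
    "the law passed after a long debate",
]

# Turn each document into a vector of term frequencies.
vectors = CountVectorizer().fit_transform(documents)

# Group the documents into k = 2 clusters by vector similarity.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(vectors)
print(kmeans.labels_)  # one cluster label per document
```

With this toy data, the two nautical documents share vocabulary and fall into one cluster, the two legal documents into the other.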
Named Entity Recognition
Named Entity Recognition (NER) recognizes and extracts proper and common nouns from documents using spaCy’s Parts of Speech tagging model, and outputs them as lists grouped by entity types, including people, organizations, companies, locations, and more.
More about Named Entity Recognition
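To illustrate the "lists grouped by entity types" output shape, here is a small sketch of the grouping step only. The entity pairs below are hypothetical and merely shaped like the `(text, label)` values spaCy yields via `doc.ents`; the extraction step itself is not reproduced.

```python
from collections import defaultdict

# Hypothetical extracted entities, shaped like spaCy's doc.ents
# output: (entity text, entity type label).
entities = [
    ("Ada Lovelace", "PERSON"),
    ("London", "GPE"),
    ("Charles Babbage", "PERSON"),
    ("Analytical Engine", "PRODUCT"),
]

# Group the entities into lists keyed by entity type.
grouped = defaultdict(list)
for text, label in entities:
    grouped[label].append(text)

print(dict(grouped))
# e.g. {'PERSON': ['Ada Lovelace', 'Charles Babbage'], ...}
```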
Ngrams
An Ngram is a term, or collocation of terms, found in your Content Set. You set the range or number of terms ("N") you wish to consider in your analysis. Then, the frequency of those Ngrams is counted and displayed for analysis.
More about Ngrams
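The counting step described above can be sketched in a few lines, assuming a simple whitespace tokenizer: slide a window of N tokens across the text and tally each window with a counter.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return every run of n consecutive tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "the old man and the old sea".split()

# Count bigram (n = 2) frequencies across the token stream.
counts = Counter(ngrams(tokens, 2))
print(counts.most_common(2))  # ('the', 'old') appears twice
```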
Parts of Speech
Parts of Speech uses natural language processing of syntax to recognize and tag various parts of speech. It provides users with the building blocks for looking at how phrases are constructed within each document in a Content Set.
More about Parts of Speech
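To show how tagged output serves as "building blocks" for examining phrase construction, here is a sketch of one possible downstream use. The `(token, tag)` pairs are hypothetical tagger output (using the common Penn Treebank tags, e.g. JJ = adjective, NN = noun); the tagging step itself is not reproduced.

```python
# Hypothetical (token, tag) pairs, as a POS tagger might emit;
# DT = determiner, JJ = adjective, NN = noun, VBD = past-tense verb.
tagged = [
    ("the", "DT"), ("restless", "JJ"), ("sea", "NN"),
    ("met", "VBD"), ("a", "DT"), ("grey", "JJ"), ("sky", "NN"),
]

# Collect adjective + noun phrases from adjacent tag pairs.
phrases = [
    f"{w1} {w2}"
    for (w1, t1), (w2, t2) in zip(tagged, tagged[1:])
    if t1 == "JJ" and t2 == "NN"
]
print(phrases)  # ['restless sea', 'grey sky']
```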
Sentiment Analysis
Sentiment Analysis assigns an overall sentiment score to each document by giving each term a positive or negative value and averaging those values. Terms are scored using the AFINN lexicon.
More about Sentiment Analysis
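The score-and-average idea can be sketched as follows. The dictionary below is a tiny illustrative subset of AFINN-style term scores, and averaging over all tokens (with unscored terms counting as 0) is an assumption here, not necessarily the tool's exact convention.

```python
# Tiny illustrative subset of AFINN-style scores; the real
# lexicon assigns each term a value from -5 to +5.
AFINN_SUBSET = {"good": 3, "happy": 3, "bad": -3, "terrible": -3}

def document_sentiment(text):
    """Average per-token scores; unscored tokens count as 0 (an
    assumption for this sketch)."""
    tokens = text.lower().split()
    scores = [AFINN_SUBSET.get(tok, 0) for tok in tokens]
    return sum(scores) / len(scores) if scores else 0.0

print(document_sentiment("a good and happy ending"))  # (3 + 3) / 5 = 1.2
print(document_sentiment("a terrible day"))           # -3 / 3 = -1.0
```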
Topic Modeling
Topic Modeling allows users to analyze a large collection of unstructured text and groups terms that co-occur frequently. These groups of terms are "topics" that you then assign meaning to based on the terms and other measures.
More about Topic Modeling