FAQ

What is the Gale Digital Scholar Lab?

Gale Digital Scholar Lab is a cloud-based research and learning platform that lets students and researchers apply natural language processing tools to the raw text data (OCR text) of Gale’s primary source collections, all within a single environment.

What are the main features of Gale Digital Scholar Lab?

  • Single Platform Text and Data Mining (TDM) Environment - Gale’s unparalleled Primary Sources (GPS) collections are available for the first time alongside familiar open-source text mining and natural language processing tools, removing two of the key barriers to entry in the digital humanities: finding and curating a quality content set, and finding appropriate (and accessible) digital tools with which to analyze it.
  • Create bespoke content sets - The Lab greatly reduces the time and effort needed to create, clean, parse, and analyze large sets of archival text data.
  • Cloud-Hosted, Optimized Data - Gale’s OCR data is optimized for text mining, and the Lab provides the institution’s Gale Primary Sources data in one place (see exceptions below) without the burden of hosting and managing it. Gale Digital Scholar Lab makes the institution’s Gale collections more widely accessible and opens digital scholarship up to more researchers.
  • Tools familiar to the DH community - Gale Digital Scholar Lab brings together tools from various open-source providers with a streamlined user interface that allows the analysis tools to be customized for particular research needs.

What content can I find in Gale Digital Scholar Lab?

Your institution’s library has access to a number of Gale’s primary source collections, and they can be found by clicking on the Available Text links on the Home Page or in the Learning Center. Most of Gale’s archival collections are text mineable, with the exception of those that are primarily manuscript-based or have specific rights restrictions that prevent text mining at this time:

  • British Literary Manuscripts Online
  • Chatham House Archive
  • Early Arabic Printed Books
  • The Financial Times Historical Archive
  • State Papers Online
  • National Geographic Magazine Archive

Handwritten texts (and Arabic texts) are considerably difficult to render as plain text because of the current limitations of handwritten text recognition. While OCR engines are continually improving their ability to recognize a wide variety of character sets, the variability of handwritten text remains challenging for most platforms today. Even so, Gale has employed a number of new technologies to derive OCR from manuscript collections such as the Crime and Punishment module of Nineteenth Century Collections Online (NCCO), and will continue to advance the state of manuscript OCR in the future.

How much content can I analyze in Gale Digital Scholar Lab?

At present, users are limited to 10,000 documents per content set. This limit was determined through consultation with our source library partners, researchers, beta testers, and programmers. It allows us to monitor the performance of the analysis pipeline and to make changes to both hardware and software in response to future computational needs.
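
Because of the 10,000-document cap, a larger corpus has to be spread across several content sets, which is a simple ceiling division. A minimal Python sketch, assuming you already hold a list of document identifiers; `split_into_content_sets` and `content_sets_needed` are hypothetical helpers for planning, not part of the Lab.

```python
import math

# Cap stated above: at most 10,000 documents per content set.
MAX_DOCS_PER_CONTENT_SET = 10_000


def split_into_content_sets(doc_ids, limit=MAX_DOCS_PER_CONTENT_SET):
    """Split a list of document IDs into consecutive batches of at most `limit` items."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return [doc_ids[i:i + limit] for i in range(0, len(doc_ids), limit)]


def content_sets_needed(n_docs, limit=MAX_DOCS_PER_CONTENT_SET):
    """Number of content sets required to hold n_docs documents (ceiling division)."""
    return math.ceil(n_docs / limit)


# Example: a hypothetical corpus of 23,500 document IDs.
corpus = [f"doc-{i}" for i in range(23_500)]
batches = split_into_content_sets(corpus)
print(content_sets_needed(len(corpus)))  # 3
print([len(b) for b in batches])         # [10000, 10000, 3500]
```

The same helpers accept a smaller `limit` if you need to plan around a different cap.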

What digital humanities tools are included in Gale Digital Scholar Lab? And why were they selected?

Gale Digital Scholar Lab includes a variety of tools that support well-known qualitative and quantitative text analysis methods. Four of these tools are open source and are widely recognized and used in academia today; the remaining two are built in a similar fashion to their open-source equivalents or use open-source components in the analysis process. Providing these tools alongside millions of pages of primary source content and the accompanying OCR text lets users move quickly from corpus creation to text analysis in one platform.

Gale Digital Scholar Lab includes the following tools:

  • Mallet* - Topic Modeling: a widely used toolset for text mining. Having Mallet loaded into the Lab and ready for use supports not only established researchers who already use it, but also those who are new to text mining and are only beginning to learn about it. (Java-based)
  • scikit-learn* - Clustering: automatic grouping of similar objects into sets. scikit-learn is open source and offers other tools for text mining and data analysis. (Python-based)
  • Gale Custom Tool** - N-grams: a type of collocation in which words appear next to, or in the proximity of, one another. When computing n-grams, the window typically moves forward one word at a time to show co-occurring words.
  • spaCy* - Named Entity Recognition (NER): recognizes and extracts named entities from the documents in a content set and outputs lists of those entities.
  • spaCy* - Parts of Speech Tagger: parses document sentences into parts of speech (PoS) and tags each word accordingly. PoS tagging effectively creates a lexicographical index, or dictionary, of a content set.
  • Gale Custom Tool*** - Sentiment Analysis: tallies the positive and negative words within each document of a content set. It uses the AFINN lexicon (a dictionary of words and their sentiment values) to score each phrase, and the phrase scores are then combined into a document-level sentiment value.
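
The sliding-window idea behind n-grams can be sketched in a few lines of Python. This is a minimal illustration of the general technique, not Gale’s custom tool: the regex tokenizer and lowercasing are assumptions, and the Lab’s tool may tokenize and count differently.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into word tokens (a simple regex; an assumption, not the Lab's tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def ngrams(tokens, n):
    """Return all contiguous n-word sequences, moving the window forward one word at a time."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


text = "the old house on the hill and the old house by the sea"
bigrams = Counter(ngrams(tokenize(text), 2))
print(bigrams.most_common(2))  # [(('the', 'old'), 2), (('old', 'house'), 2)]
```

Counting the n-grams across a whole content set, rather than one string, shows which word pairs recur most often.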

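The phrase-then-document aggregation described for sentiment analysis can be sketched as follows. The word scores are a tiny illustrative stand-in in the AFINN style (integers from -5 to +5), not values copied from the AFINN lexicon; splitting phrases on sentence punctuation and summing the phrase scores are assumptions about how the Lab combines them.

```python
import re

# Illustrative placeholder scores in the AFINN style (-5 to +5).
# These are NOT taken from the real AFINN lexicon; load the actual lexicon for real analysis.
WORD_SCORES = {
    "good": 3,
    "happy": 3,
    "bad": -3,
    "sad": -2,
    "ruin": -2,
}


def phrase_score(phrase, lexicon=WORD_SCORES):
    """Sum the scores of lexicon words in one phrase; unknown words count as 0."""
    words = re.findall(r"[a-z']+", phrase.lower())
    return sum(lexicon.get(w, 0) for w in words)


def document_sentiment(text, lexicon=WORD_SCORES):
    """Score each sentence-like phrase, then add the phrase scores into one document value."""
    phrases = [p for p in re.split(r"[.!?;]+", text) if p.strip()]
    scores = [phrase_score(p, lexicon) for p in phrases]
    return scores, sum(scores)


doc = "The harvest was good and the village was happy. Then the flood brought ruin! It was a sad year."
print(document_sentiment(doc))  # ([6, -2, -2], 2)
```

Keeping the per-phrase scores alongside the total makes it easier to see which passages drive a document’s overall value.
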
How do I cite the work I have completed in Gale Digital Scholar Lab?

Citations for a content set created in Gale Digital Scholar Lab follow the protocols established for citing datasets generally and should include:

  • Author
  • Title of Content Set (this is created by the researcher when they initially build the content set, and can be edited if necessary)
  • Source of Content Set (i.e., the distributor, which is Gale; this also applies to content sets containing uploaded documents, whether created independently in the Lab or commingled with Gale documents)
  • Date of Creation of Content Set
  • Content Set URL (currently following the format provided below)

Chicago Manual of Style Bibliographic Format: Ketchley, Sarah L. 1840-1849 American Fiction Female Authors. Gale Digital Scholar Lab [distributor]. February 2022. [insert Content Set URL]

MLA Format: Sarah L. Ketchley. 1840-1849 American Fiction Female Authors. Gale Digital Scholar Lab [distributor]. February 2022. [insert Content Set URL]
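
If you cite many content sets, the fields listed above can be assembled programmatically. A minimal Python sketch that reproduces the two example formats exactly as given; the `ContentSetCitation` class and its field names are hypothetical, and the URL stays a placeholder for your own content set URL.

```python
from dataclasses import dataclass


@dataclass
class ContentSetCitation:
    """Fields recommended above for citing a content set (field names are illustrative)."""
    first_names: str  # e.g. "Sarah L."
    last_name: str    # e.g. "Ketchley"
    title: str        # content set title chosen by the researcher
    created: str      # date of creation, e.g. "February 2022"
    url: str          # content set URL copied from the Lab

    @staticmethod
    def _with_period(s: str) -> str:
        # Avoid doubling the period when a name already ends with an initial such as "L."
        return s if s.endswith(".") else s + "."

    def chicago(self) -> str:
        # Bibliographic form shown above: Last, First. Title. Distributor. Date. URL
        author = self._with_period(f"{self.last_name}, {self.first_names}")
        return (f"{author} {self.title}. Gale Digital Scholar Lab [distributor]. "
                f"{self.created}. {self.url}")

    def mla(self) -> str:
        # Form shown in the MLA example above: First Last. Title. Distributor. Date. URL
        author = self._with_period(f"{self.first_names} {self.last_name}")
        return (f"{author} {self.title}. Gale Digital Scholar Lab [distributor]. "
                f"{self.created}. {self.url}")


cite = ContentSetCitation("Sarah L.", "Ketchley", "1840-1849 American Fiction Female Authors",
                          "February 2022", "[insert Content Set URL]")
print(cite.chicago())
print(cite.mla())
```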

Tool runs should also be cited. The suggested format is:

  • Name of Tool
  • Platform
  • Year or version
  • Date of Access
  • URL

Chicago Manual of Style Note Format: “Named Entity Recognition,” Gale Digital Scholar Lab, accessed February 15, 2022, [insert Tool Run URL]

MLA Format: “Named Entity Recognition.” Gale Digital Scholar Lab. 2022. Web. February 15, 2022. [insert Tool Run URL]

Can I clean the content sets I create in Gale Digital Scholar Lab?

Yes. The Clean feature of Gale Digital Scholar Lab lets you strip out blank spaces, punctuation, special characters, and more to produce cleaner, more accurate analytical output. It is designed to work with the included analysis tools, and it can also clean content sets before you download them locally. Cleaning is a critical part of preparing for any text analysis. Because Gale Digital Scholar Lab offers cleaning as a separate feature, you can ensure that all documents in a given content set are prepared in precisely the same way. Users decide how their documents are altered and can adjust the cleaning to suit their individual research needs.
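
The kinds of operations the Clean feature describes can be sketched as a configurable pipeline. This is a minimal Python sketch, not the Lab’s implementation: the option names in the hypothetical `CleanConfig` class are assumptions, and the Lab’s own configuration may offer different or additional settings. Applying one configuration object to every document is what keeps a whole content set prepared in the same way.

```python
import re
import string
from dataclasses import dataclass, field


@dataclass
class CleanConfig:
    """Hypothetical cleaning options; one config is applied to every document in a set."""
    lowercase: bool = True
    strip_punctuation: bool = True
    strip_special_chars: bool = True  # drop characters that are not letters, digits or whitespace
    stopwords: frozenset = field(default_factory=frozenset)


def clean(text: str, cfg: CleanConfig) -> str:
    if cfg.lowercase:
        text = text.lower()
    if cfg.strip_punctuation:
        # Remove ASCII punctuation such as . , ; : ! ? ' "
        text = text.translate(str.maketrans("", "", string.punctuation))
    if cfg.strip_special_chars:
        # Keep word characters and whitespace; drop symbols such as the pilcrow or stray OCR marks
        text = re.sub(r"[^\w\s]", "", text)
    tokens = text.split()  # splitting and rejoining also collapses runs of blank space
    if cfg.stopwords:
        tokens = [t for t in tokens if t not in cfg.stopwords]
    return " ".join(tokens)


cfg = CleanConfig(stopwords=frozenset({"the", "of"}))
print(clean("The  History of the\nDecline, ¶ and Fall!", cfg))  # history decline and fall
```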

Can I analyze content outside of the Gale Primary Source Collections provided by my library?

The majority of Gale Primary Sources and Archives Unbound collections can be analyzed within Gale Digital Scholar Lab (please see the exclusion list earlier in this FAQ). While the mission of the platform is to provide access to the OCR text of your institution’s Gale Primary Source collections, you can also analyze non-Gale texts by uploading them to the Lab as plain text files. We will continue to explore ways to extend our content reach to include outside collections that customers frequently request.

What is Gale’s privacy policy as it pertains to personally identifiable information?

Cengage Group is committed to protecting personal information. Cengage’s Global Privacy Program protects all of the personal information that may be required to access products and helps ensure that personal information is handled properly and securely. If at any point you would like to learn more about our Privacy Program, please email privacy@cengage.com.

While our Privacy Program and Policies govern how we handle personal information, certain products differ in their integrations, and Gale Digital Scholar Lab is one of them. Below, you will find more information about how this product integrates with Google and Microsoft and what happens to the data.

More information on Gale’s privacy policy can be found on the Privacy Policy page in the Learning Center.

How will usage be captured for Gale Digital Scholar Lab?

Gale Digital Scholar Lab will appear in usage reports as its own line item in both COUNTER and Gale reports. The following metrics will apply at the product level:

  • Searches
  • Searches (Regular)
  • Sessions
  • Record Views
  • Result Clicks

Retrievals will not be tracked at the product level [Gale Digital Scholar Lab]. Retrievals will be tracked at the individual archive level only [Nineteenth Century Collections Online-1, Archives of Sexuality and Gender -1, etc.].

The documents contained within a Sample Project are sourced from many of Gale’s primary source archives. As such, retrievals generated by documents contained in Gale Primary Sources not acquired by an institution will be tracked in the line item “DSLAB Sample Projects.” For example:

  • If a user clicks into a Sample Project document that is sourced from Archives of Sexuality and Gender, but the library does not own or subscribe to this collection, the retrieval will be tracked under “DSLAB Sample Projects.”
  • Alternatively, if a user clicks into a Sample Project document that is sourced from Eighteenth Century Collections Online, and the library does own or subscribe to this collection, the retrieval will be tracked under Eighteenth Century Collections Online.
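
The attribution rule for retrievals amounts to a simple lookup. A minimal sketch of that rule, assuming a hypothetical `owned_archives` set listing the collections the library owns or subscribes to; it illustrates the behavior described above and is not Gale’s reporting code.

```python
SAMPLE_PROJECTS_LINE_ITEM = "DSLAB Sample Projects"


def retrieval_line_item(source_archive: str, owned_archives: set, from_sample_project: bool) -> str:
    """Return the usage-report line item a document retrieval is counted under.

    Retrievals are never counted under the Gale Digital Scholar Lab product line;
    they go to the document's source archive, except that Sample Project documents
    from archives the library does not own or subscribe to go to "DSLAB Sample Projects".
    """
    if from_sample_project and source_archive not in owned_archives:
        return SAMPLE_PROJECTS_LINE_ITEM
    return source_archive


owned = {"Eighteenth Century Collections Online"}
print(retrieval_line_item("Archives of Sexuality and Gender", owned, True))       # DSLAB Sample Projects
print(retrieval_line_item("Eighteenth Century Collections Online", owned, True))  # Eighteenth Century Collections Online
```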

Will users of the Lab have the ability to download the document files contained in their content sets?

Users of Gale Digital Scholar Lab can download individual content sets (up to 5,000 documents). Keep in mind that, depending on the document types within a content set, actual dataset sizes will vary greatly. Downloaded content sets are for scholarly, non-commercial use only.

How do I upload my own documents into Gale Digital Scholar Lab?

Users can upload plain text files by navigating to the Upload feature on the Build page of Gale Digital Scholar Lab. They can select one or more files from their computer to upload, then apply metadata, manage the files, and add them to a Content Set.

What can I do with my uploaded documents?

Users are the only ones who can access their uploaded documents, and they control their state. Once a document has been uploaded to the Lab, the user can edit its text, apply metadata, and add it to a Content Set. Users can also delete their documents from the Gale Digital Scholar Lab environment at any time. Note that deleted documents will no longer be available for inclusion in content sets or analysis; they will also be removed from any content set that currently contains them and will no longer be viewable in past analyses.

Are there any size limitations?

The Upload feature currently accepts files of at least 80 characters and no more than 10 megabytes. Note also that there is a 10-megabyte limit per upload.

Are there any storage limitations?

There are currently no storage limitations on the Upload feature.

Are there any format requirements?

The Upload feature only accepts plain text (.txt) files at this time.
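
Before uploading, you may want to check files locally against the size and format requirements above. A minimal Python sketch with hypothetical helpers `file_problems` and `check_upload`; it assumes UTF-8 encoding, treats 10 megabytes as 10 × 1024 × 1024 bytes (the Lab may use a decimal megabyte), and reads the per-upload limit as a cap on the combined size of all files selected together.

```python
import tempfile
from pathlib import Path

MIN_CHARS = 80                 # minimum file length stated above
MAX_BYTES = 10 * 1024 * 1024   # assumption: 10 MB read as 10 MiB


def file_problems(path: Path) -> list:
    """Return reasons this file would likely be rejected (empty list if none)."""
    problems = []
    if path.suffix.lower() != ".txt":
        problems.append("not a .txt file")
    size = path.stat().st_size
    if size > MAX_BYTES:
        problems.append(f"larger than 10 MB ({size} bytes)")
        return problems  # skip reading very large files
    try:
        text = path.read_text(encoding="utf-8")  # assumption: uploads are UTF-8
    except UnicodeDecodeError:
        problems.append("not valid UTF-8 plain text")
        return problems
    if len(text) < MIN_CHARS:
        problems.append(f"shorter than {MIN_CHARS} characters ({len(text)})")
    return problems


def check_upload(paths) -> dict:
    """Check each file, plus the combined size of all files uploaded together."""
    paths = [Path(p) for p in paths]
    report = {str(p): file_problems(p) for p in paths}
    total = sum(p.stat().st_size for p in paths)
    if total > MAX_BYTES:
        report["<batch>"] = [f"combined size {total} bytes exceeds 10 MB"]
    return report


# Demo with a throwaway file that is too short.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "letter.txt"
    sample.write_text("short note", encoding="utf-8")
    demo_report = check_upload([sample])
print(list(demo_report.values()))  # [['shorter than 80 characters (10)']]
```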

What measures are in place to secure my uploaded documents?

All documents pass through a SentinelOne security scanner and a file sniffer to ensure that each upload is a plain text file. The text entry and metadata forms are also protected against cross-site scripting (XSS) to prevent malicious attacks on the environment. All documents are stored in an encrypted, highly available cloud-based storage solution.
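
A file sniffer inspects a file’s bytes rather than trusting its extension. The heuristic below is a generic illustration of that idea, assuming UTF-8 text; it is not how SentinelOne or Gale’s sniffer works, and the signature list and control-character threshold in `looks_like_plain_text` are assumptions.

```python
# A few well-known binary signatures ("magic numbers") found at the start of files.
BINARY_SIGNATURES = (
    b"%PDF",               # PDF
    b"PK\x03\x04",         # ZIP container (also .docx, .xlsx, .epub)
    b"\x89PNG\r\n\x1a\n",  # PNG
    b"\xff\xd8\xff",       # JPEG
)


def looks_like_plain_text(data: bytes, max_control_ratio: float = 0.01) -> bool:
    """Heuristically decide whether raw bytes are UTF-8 plain text."""
    if data.startswith(BINARY_SIGNATURES):
        return False
    if b"\x00" in data:  # NUL bytes almost never occur in text files
        return False
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    # Count control characters other than tab, newline and carriage return.
    controls = sum(1 for ch in text if ord(ch) < 32 and ch not in "\t\n\r")
    return controls <= max_control_ratio * max(len(text), 1)


print(looks_like_plain_text("Dear Sir, I write from Boston.\n".encode("utf-8")))  # True
print(looks_like_plain_text(b"%PDF-1.7"))                                         # False
```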

Are there any rights restrictions around uploaded content?

Gale provides a text upload feature that allows users to analyze non-Gale content within Gale Digital Scholar Lab. Uploading personally identifiable information is not recommended. Additionally, users assume sole responsibility for clearing rights to any content loaded into this feature. In the event that texts are loaded into the platform without proper rights clearance, the user indemnifies Gale against any resulting litigation related to the use of that content outside the bounds of its legally stated use.

Can I upload non-English texts?

Yes. While many of the analysis tools are trained specifically on English content, users can still upload and clean non-English texts, with the exception of texts that use non-Latin characters.
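
To check a file for non-Latin characters before uploading it, Python’s standard `unicodedata` module can report each letter’s Unicode name, which for alphabetic characters usually begins with the script name (for example, `LATIN SMALL LETTER E WITH ACUTE`). A rough sketch; treating “non-Latin” as “alphabetic characters whose Unicode name does not start with LATIN” is an assumption, not the Lab’s documented rule, and `non_latin_letters` is a hypothetical helper.

```python
import unicodedata


def non_latin_letters(text: str) -> set:
    """Return the alphabetic characters whose Unicode name does not begin with "LATIN".

    Accented and historical Latin letters (é, ö, ß, the long s "ſ") count as Latin;
    letters from Greek, Cyrillic, Arabic, CJK and other scripts are reported.
    """
    return {
        ch for ch in text
        if ch.isalpha() and not unicodedata.name(ch, "").startswith("LATIN")
    }


print(non_latin_letters("Ein schöner Tag, très beau"))       # set()
print(sorted(non_latin_letters("Greek αβγ and Latin abc")))  # ['α', 'β', 'γ']
```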

How can I access Gale Digital Scholar Lab?

The product may be accessed via IP authentication (on campus, and remotely via a proxy service). Once authenticated via IP, users land on a pre-login page where they are asked to log in or create an account using their Google or Microsoft account, or using Shibboleth. Once logged in, users can create custom content sets, configure and run analysis tools, and generate visual and data outputs that can be interpreted in and outside of the platform.

Why am I required to log in via a Google or Microsoft account?

The time spent in Gale Digital Scholar Lab is an investment in research and learning. In a given session, users can create custom content sets, configure and run analysis tools, and generate visual and data outputs that can be interpreted in and outside of the Lab. To save these research outputs, the application requires a unique identifier that maintains a persistent connection between the user and their content and analyses. Gale Digital Scholar Lab allows users to sign up or log in using a Google or Microsoft account. Our users have identified these as popular authentication methods because they can use existing credentials.

When a user creates an account or logs into the Lab using one of these methods, a unique identifier (a series of characters, numbers and letters) is generated that will allow Gale to establish this user/content connection. It’s important to note that this ID contains no personal information (email, name, etc.) to prevent the retention of users’ personally identifiable information.
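
One common way to derive an identifier that links a user to saved work without storing personal details is to compute a keyed hash (HMAC) of the login provider’s stable account ID, such as the OpenID Connect `sub` claim, with a server-side secret. The sketch shows that general technique with Python’s standard `hmac` and `hashlib` modules; Gale does not say how its identifier is generated, so the inputs, the `SERVER_SECRET` value, and the hypothetical `opaque_user_id` helper are assumptions.

```python
import hashlib
import hmac

# Assumption: a secret key held only by the service (in practice loaded from a secrets store).
SERVER_SECRET = b"replace-with-a-long-random-secret"


def opaque_user_id(provider: str, provider_subject: str, secret: bytes = SERVER_SECRET) -> str:
    """Derive a stable, non-reversible ID from the provider name and its account identifier.

    The same account always maps to the same ID, so saved content sets can be reconnected
    on each login, but the ID itself contains no email address or name.
    """
    message = f"{provider}:{provider_subject}".encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


a = opaque_user_id("google", "1098765432101234567")
b = opaque_user_id("google", "1098765432101234567")
c = opaque_user_id("microsoft", "1098765432101234567")
print(a == b, a == c, len(a))  # True False 64
```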

Additional global authentication methods will be added in upcoming releases of the Lab, as they are identified and validated with our development staff and end users.

Can you give me more details about the University Credentials sign in method?

Users can sign in to Gale Digital Scholar Lab using their university credentials. This sign-in method uses Shibboleth authentication to anonymously log users in to the Lab and persist their content sets, configurations, and analyses across multiple sessions.

If you do not see the “University Credentials” button, your institution likely does not use Shibboleth authentication. If you are unsure whether your institution uses Shibboleth to support single sign-on, we recommend getting in touch with your librarian.

If your institution does use Shibboleth but the “University Credentials” button is still missing, Gale may not be receiving the persistent ID attribute required for this feature. To make this option available to users at your institution, you may need to work with your IT department and/or affiliated federation to release a persistent ID attribute as part of the SAML response sent to Gale. Whether this attribute is included in the SAML response depends on your institution's Shibboleth setup.

Gale currently supports the following Shibboleth federations:

  • InCommon Federation
  • CARSI Federation
  • Renater Federation
  • UK Federation
  • OpenAthens Federation
  • WAYF.DK Federation
  • DFN-AAI Federation
  • Canadian Access Federation
  • COFRe - Comunidad Federada REUNA
  • GakuNin Federation

The only data passed in the SAML response that we consume and store is the persistent ID. This ID is stored in an encrypted database. We do not ask for any personally identifiable information in this response. The persistent ID will only be used within Gale Digital Scholar Lab and is never shared.
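
For IT staff checking what their identity provider releases, the sketch below pulls persistent identifiers out of a SAML 2.0 assertion with Python’s standard `xml.etree.ElementTree`. It assumes the identifier arrives as a `NameID` with the SAML 2.0 persistent format, either as the subject’s NameID or inside the eduPersonTargetedID attribute (`urn:oid:1.3.6.1.4.1.5923.1.1.1.10`); Gale does not state which form it consumes. The sketch performs no signature validation, so it is only for inspecting a response, never for authenticating users, and `persistent_ids` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

SAML_NS = {"saml": "urn:oasis:names:tc:SAML:2.0:assertion"}
PERSISTENT = "urn:oasis:names:tc:SAML:2.0:nameid-format:persistent"


def persistent_ids(assertion_xml: str) -> list:
    """Return the values of all persistent-format NameID elements in a SAML assertion."""
    root = ET.fromstring(assertion_xml)
    return [
        (el.text or "").strip()
        for el in root.iterfind(".//saml:NameID", SAML_NS)
        if el.get("Format") == PERSISTENT
    ]


sample = """<saml:Assertion xmlns:saml="urn:oasis:names:tc:SAML:2.0:assertion">
  <saml:AttributeStatement>
    <saml:Attribute Name="urn:oid:1.3.6.1.4.1.5923.1.1.1.10">
      <saml:AttributeValue>
        <saml:NameID Format="urn:oasis:names:tc:SAML:2.0:nameid-format:persistent">a1b2c3d4e5</saml:NameID>
      </saml:AttributeValue>
    </saml:Attribute>
  </saml:AttributeStatement>
</saml:Assertion>"""
print(persistent_ids(sample))  # ['a1b2c3d4e5']
```

An empty list suggests the response carries no persistent identifier in either of these forms.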

Issues can be reported to our technical support team by sending an email to Gale.TechnicalSupport@cengage.com.

What browsers are optimized for use of the Lab?

Chrome, Firefox, and Safari browsers are currently optimized for use of Gale Digital Scholar Lab.

What is behind the creation and design of Gale Digital Scholar Lab?

The design ethos of Gale Digital Scholar Lab is to bring simplicity and accessibility to complex research questions and to the means of answering them. Before Gale Digital Scholar Lab was released, Gale’s support of text mining was limited to shipping physical hard drives to libraries that needed raw data to support digital research. While the value of these drives has been widely noted, that value remains largely unrealized unless there are resources to store, convert, and support the content in a usable way. This cloud-based solution opens up new research possibilities:

  • The complex nature of the research workflow is often a barrier to entry into digital projects; Gale Digital Scholar Lab aligns content with analysis tools while applying best practices to text mining and visualization projects.
  • Gale Digital Scholar Lab provides a familiar navigational style for students new to text analysis but familiar with library databases.
  • For more experienced scholars, Gale Digital Scholar Lab gives access to large content sets that can be curated, mined and edited for use outside of the platform in custom applications and digital tools.
  • For librarians, Gale Digital Scholar Lab gives staff a clear path to encourage the use of their Gale Primary Sources collection by a new kind of user: the digital scholar or DH lecturer. It also brings a broad message of support to researchers across the campus, and a new awareness of the library as the center of scholarly information.

Who is Gale Digital Scholar Lab designed for?

  • Undergraduates / Graduates / Researchers learning the fundamentals of text mining, archival research and analysis methodologies.
  • Teachers who want to introduce elements of text mining into their teaching, but are aware that students’ technical skills might be limited. Alongside digital literacy skills, research and archival skills are also being taught, so a tool that contextualizes research is ideal.
  • Post-Graduate / Postdoctoral / Traditional Humanities Faculty beginning to explore incorporating DH methodologies into their scholarship; introducing more text analysis into research; concerned with research outputs. Often teaches classes/acts as TA. May have some self-taught technical skills, or very little.
  • Established Digital Humanities (DH) Faculty and DH Librarians conducting research, often with coding skills and DH experience; they may lecture on digital humanities and may also have published on the subject.
  • Librarians supporting high-level concepts of disciplinary method and intellectual interests; in-depth knowledge of existing and new collections and holdings of the library, and how they might relate to existing tools.

Will Gale Digital Scholar Lab continue to be enhanced over time?

Gale Digital Scholar Lab is an iterative product: releases will continue into the future to ensure that it keeps pace with academic research and teaching needs while continuing to be at the cutting edge of technology in the digital humanities.

How can users get involved in the evolution of the Lab?

Gale Digital Scholar Lab is a constantly evolving research and instruction platform, shaped by the needs of our library customers, researchers, and students. If you’re interested in helping drive the development of the platform, or if you have any comments or questions, please send feedback to galedslab@cengage.com.