LibGuides: Text & Data Mining: Resources

Projects

A project that seeks to improve uptake of text and data mining (TDM) in the EU by actively engaging with stakeholders such as researchers, developers, publishers and SMEs

OpenMinTeD

OpenMinTeD is a project that "aspires to enable the creation of an infrastructure that fosters and facilitates the use of text mining technologies in the scientific publications world, builds on existing text mining tools and platforms, and renders them discoverable and interoperable through appropriate registries and a standards-based interoperability layer, respectively."

Discipline-specific tools

ChemDataExtractor

National Centre for Text Mining

Tools, APIs, support services for TDM

Content Mine uses machines to liberate 100,000,000 facts from the scientific literature.

COnnecting Repositories (CORE) facilitates free access to scholarly publications distributed across many systems. It gives access to millions of scholarly articles aggregated from many open access repositories.

The CrossRef text and data mining service provides a CrossRef metadata API that lets researchers access the full text of content identified by CrossRef digital object identifiers (DOIs) across publisher sites, regardless of their business model. Both components are free to use by researchers and the public.

APIs, short for application programming interfaces, are tools used to share content and data between software applications. Many scholarly publishers, databases, and products offer APIs so that users with programming skills can extract data more effectively for a variety of research purposes.
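As a rough illustration of what working with such an API can look like, the sketch below builds a request URL for the public CrossRef REST API (`https://api.crossref.org/works/{DOI}`) and picks out links marked for text mining from a metadata record. The sample record is hand-written for this guide, with a made-up DOI and made-up publisher URLs; it only imitates the general shape of a response, and the helper names `works_url` and `text_mining_links` are our own, not part of any library.

```python
import json
from typing import List
from urllib.parse import quote

# Base URL of the public CrossRef REST API "works" route.
CROSSREF_WORKS = "https://api.crossref.org/works/"


def works_url(doi: str) -> str:
    """Build the metadata URL for one DOI (hypothetical helper)."""
    # Keep the slash that separates DOI prefix and suffix; escape the rest.
    return CROSSREF_WORKS + quote(doi, safe="/")


def text_mining_links(record: dict) -> List[str]:
    """Return the URLs in a record whose intended use is text mining."""
    links = record.get("message", {}).get("link", [])
    return [
        link["URL"]
        for link in links
        if link.get("intended-application") == "text-mining"
    ]


# Hand-written sample record (assumption: simplified, not real data).
SAMPLE = json.loads("""
{
  "status": "ok",
  "message": {
    "DOI": "10.1234/example",
    "link": [
      {"URL": "https://publisher.example/fulltext.xml",
       "content-type": "application/xml",
       "intended-application": "text-mining"},
      {"URL": "https://publisher.example/article.pdf",
       "content-type": "application/pdf",
       "intended-application": "similarity-checking"}
    ]
  }
}
""")

if __name__ == "__main__":
    print(works_url("10.1234/example"))
    print(text_mining_links(SAMPLE))
```

In practice the record would come from an HTTP request to the URL returned by `works_url`, and whether the full text behind a link can actually be downloaded depends on the publisher's licence and access terms.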

Catalogue of APIs for scholarly research (MIT libraries)

Digital humanities

JSTOR Analyser Tool

Sketch Engine tools to work with the British National Corpus

JISC & TDM

Jisc is considering creating a new research tool to help you make more of your research. It would allow you to extract information more easily and detect patterns and trends by simplifying the process of text and data mining across these collections.

The proposed new service will initially cover CORE and Journal Archives collections. In the future it is intended to extend the service to cover other open and subscription collections.

For those who are getting started with text and data mining, the tool should provide an easier path into exploring how text mining could benefit your research, without the need for any setup. For those familiar with the techniques, the service aims to simplify access to documents for the purposes of text and data mining (TDM).

Case studies

Robots Reading Vogue: How text and data mining (TDM) cast new light on a large historical archive

Peter Leonard and Lindsay King from Yale University Library explain how text and data mining techniques cast new light on a large historical archive: every page of every issue of American Vogue magazine from 1892 to 2013. They will start with simple techniques, such as n-gram visualizations, and move on to more sophisticated approaches such as topic modeling and word embedding. Finally, they will consider some emerging ways of applying data mining techniques to photography and illustrations.
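For readers unfamiliar with n-grams, the simplest technique mentioned above, the following minimal sketch counts word pairs (bigrams) in a short text. It is not the method used by the Yale team; the sample sentence is invented for illustration, and the function names `ngrams` and `ngram_counts` are our own.

```python
import re
from collections import Counter


def ngrams(text, n):
    """Split text into lowercase words and return every run of n words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_counts(text, n=2):
    """Count how often each n-gram occurs in the text."""
    return Counter(ngrams(text, n))


# Invented sample sentence (assumption: not taken from any archive).
sample = "the little black dress and the little black hat"
counts = ngram_counts(sample, 2)

if __name__ == "__main__":
    # Show the most frequent bigrams first.
    for phrase, freq in counts.most_common(3):
        print(phrase, freq)
```

Running the same counts over each year of a dated collection, then plotting a phrase's frequency over time, gives the kind of n-gram visualization the talk starts with.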

Text and data mining in history

In this interview, Joris van Eijnatten discusses his fascination with numbers, a fascination not many historians share. Last year he was the research fellow for digital humanities at the National Library of The Netherlands, where he applied text and data mining to study the image people have of Europe based on newspapers.
