Data sources for text and data mining
Text and data mining (TDM) refers to algorithm-based processes that automatically extract information from unstructured or semi-structured text (text mining) and from structured data (data mining).
On this page you will find text and data mining resources – ordered by content category – which are either freely available on the web or through UB Bern’s licenses.
If you are interested in obtaining data, please contact UB Bern unless other contact details are provided.
Documents from past events on TDM:
- Text and Data Mining: A First View (2021, slides in English)
- Text- und Datamining in den Sozialwissenschaften (2022, slides in German)
Licensed data, text and image collections
Resource | Contents | Detailed information |
---|---|---|
Swiss Media content: Swissdox@LiRI (general information on the Swissdox database) | | |
WBIS Online (DeGruyter) (general information about the database) | | |
Germanistik Online (DeGruyter) (general information about the database) | | |
Romance Studies Bibliography (DeGruyter) (general information about the database) | | |
Books International: HathiTrust Research Center | | |
Cambridge Histories (CUP) | | |
English-language periodicals (Gale Cengage) | | |
English-language periodicals (ProQuest) | | |
English-language monographs (Gale Cengage) | | |
UK Parliamentary Papers (ProQuest) | | |
Freely accessible data, text and image collections
Platform | Contents | Detailed information |
---|---|---|
e-rara | | Overview of data interfaces and terms |
e-manuscripta | | Overview of data interfaces and terms |
e-periodica | | Overview of data interfaces and terms |
Chronicling America | | Freely accessible, public domain |
CLARIN Resource Families Website | | Partly available for free, various licenses |
Deutsches Textarchiv | | Freely accessible, CC-BY-SA |
GLAM Workbench Website | | Freely accessible, various licenses |
Internet Archive Documentation | | Freely accessible, various licenses, sometimes not specified |
OpenGLAM Survey Overview | | Freely accessible, public domain or open licenses |
Project Gutenberg Documentation | | Freely accessible, public domain |
Text Creation Partnership | | Freely accessible, public domain |
Legal aspects
The resources and their interfaces are subject to various legal and technical terms of use. Please consult these before any automated access. In particular, automated access is often prohibited for licensed content that is not listed here, and attempting it may cause the provider to block access to the database. If you are in any doubt, please contact us to check whether access is permitted.
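As one small illustration of checking technical terms before automated access, the sketch below reads a site's robots.txt rules with Python's standard `urllib.robotparser`. It is only a sketch under stated assumptions: the robots.txt content, the user agent name `my-tdm-bot` and the `example.org` URLs are hypothetical, and the helper functions are illustrative names. A robots.txt file does not replace the provider's license or terms of use, which still need to be read separately.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used here instead of a live download.
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""


def is_fetch_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the robots.txt rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def crawl_delay_seconds(robots_txt: str, user_agent: str, default: float = 1.0) -> float:
    """Return the requested crawl delay, or a conservative default if none is given."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    delay = parser.crawl_delay(user_agent)
    return float(delay) if delay is not None else default


if __name__ == "__main__":
    agent = "my-tdm-bot"  # hypothetical user agent name
    for url in ("https://example.org/public/data.xml",
                "https://example.org/private/data.xml"):
        print(url, is_fetch_allowed(EXAMPLE_ROBOTS_TXT, agent, url))
    print("delay:", crawl_delay_seconds(EXAMPLE_ROBOTS_TXT, agent))
```

In practice the rules would be loaded from the provider's own robots.txt (for example with `RobotFileParser.set_url` and `read`), and requests would be spaced by at least the returned delay.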
Under the Swiss Federal Act on Copyright and Related Rights, duplicating and storing legally accessible content for scientific purposes, as in the context of TDM, is permitted.
The use of e-media or parts thereof in combination with artificial intelligence (AI) technologies is in many cases contractually prohibited. If you are planning to use AI in this way, you must contact us in advance to clarify the relevant framework conditions.
For any questions or clarifications, please reach out to us.