Tool |
Description |
@nnotate |
Semi-automatic annotation of corpus data |
aConCorde |
Multilingual concordance tool (English and Arabic) |
Shalmaneser / SALTO |
Semantic Parser/POS Tagger for English |
AMALGAM |
Tool for grammatical annotation (POS and phrase structure). Tags texts submitted via email. |
ANNIS |
Search and visualization tool for multi-layer linguistic corpora with diverse types of annotation |
AntCLAWSGUI |
Front-end interface for CLAWS tagger |
AntConc |
Corpus analysis toolkit |
AntCorGen |
A freeware discipline-specific corpus creation tool. |
AntFileConverter |
Freeware tool to convert PDF and Word (DOCX) files into plain text |
AntFileSplitter |
A freeware text file splitting tool. |
AntGram |
A freeware n-gram and p-frame (open-slot n-gram) generation tool. |
AntMover |
Tool for text structure (moves) analysis |
AntPConc |
Corpus analysis toolkit for files encoded with UTF-8 |
AntWordProfiler |
Tool for profiling vocabulary level and text complexity |
Atomic |
Multi-layer corpus annotation platform. |
BFSU Collocator |
A collocation analysis toolkit |
BFSU English Sentence Segmenter |
A simple sentence segmenter |
BFSU Qualitative Coder |
A tool for manual coding of corpora |
BFSU Sentence Collector |
A pedagogic concordancer |
BFSU Stanford Parser |
A simple parser |
BNCweb |
BNCweb is a web-based client program for searching and retrieving lexical, grammatical and textual data from the British National Corpus (BNC). |
BootCat |
Tool for crawling and compiling data from the web with a list of seed words. |
Bow |
Statistical Language Modeling, Text Retrieval, Classification and Clustering |
BFSU ParaConc |
A parallel concordancer |
BFSU PowerConc |
A fairly powerful concordancer |
BFSU Stanford POS Tagger |
A PoS tagger |
CasualConc |
CasualConc is a concordance program that runs natively on Mac OS X 10.9 or later |
Chared |
Tool for detecting the character encoding of a text |
Chi-Square and Log Likelihood Calculator |
A simple tool for calculating chi-squared and log-likelihood (LL) values |
CLaRK |
XML Based System For Corpora Development |
CLAWS POS-Tagger |
The CLAWS part-of-speech tagger |
CLiC |
A corpus tool to support the analysis of literary texts. |
Colligator 2.0 |
A colligation query/analysis toolkit |
Collocate |
Tool for the extraction of concordances and collocations |
Concordance Randomizer |
A concordance randomizer |
Concordancer |
Online tool for frequency counts and text clouds |
CorpKit |
An advanced modern corpus toolkit with an emphasis on visualization and annotated corpora. |
CorporaCoCo |
A set of R functions used to compare co-occurrence between corpora |
Corpus Presenter |
Tree tagger and corpus analysis software |
Corpus-Tools |
Text annotation and analysis tool |
CorpusExplorer |
A complex corpus analysis toolkit combining 45 interactive tools. |
CorpusSearchLite |
Searches parsed corpora in the Penn Treebank format |
CQPweb |
Overview of and access to a wide range of corpora |
DART |
An annotation tool and research environment for annotating dialogues. |
DeTagging Tool |
A tool that strips annotation/tags from files |
Dexter |
Tool for text annotation |
DISCO |
Corpus pre-processing tool for a variety of languages that retrieves the semantic similarity between arbitrary words and phrases |
ELAN |
Transcription and annotation of sound or video files |
EncodeAnt |
Tool for the detection and conversion of character encodings |
EXMARaLDA |
Tool for transcription, annotation, corpus analysis of spoken data |
FireAnt |
Social media analysis toolkit |
Flesh PC |
Calculates Flesch readability scores |
FrameNet |
Dictionary of more than 10,000 word senses, tagged for semantic roles (according to Fillmorean Frame Semantics) |
gensim |
Deep learning via word2vec |
Google Ngrams |
An n-gram viewer for the whole of Google Books |
GraphColl |
Tool for building and exploring networks of linguistic collocations |
Gsearch |
Tool for syntactic pattern matching |
HeidelGram Web-Based Tools |
Basic corpus analysis toolkit for the HeidelGram Corpus |
HeidelTime |
A multilingual, domain-sensitive temporal tagger |
HGSimpleCorpusNetwork |
Batch frequency analysis on corrupted (e.g. OCR) corpus data and generation of network analysis data. |
HTST Samuels |
Historical Thesaurus Semantic Tagger via web-interface |
ICARUS |
Search and visualization tool for dependency trees |
ICEweb |
A tool for compiling, downloading, and analyzing web corpora in accordance with the ICE |
IMS Corpus Workbench |
Tool for sorting frequencies in corpora |
jTokenizer |
Tokenizing natural language |
JusText |
Tool for removing boilerplate content, such as navigation links, headers, and footers from HTML pages |
Kaleidographic |
A dynamic and interactive visualization tool for multivariate data. |
KAT Tool |
Grouping patterns based on search terms |
kdiff3 |
KDiff3 is a diff and merge program. |
Keyword Plus |
A keyword generation/analysis tool |
Khepri |
A view-based tool for exploring (historical sociolinguistic) data |
KoGra-R |
An R-based online tool that provides statistical measures for corpus-based frequencies |
LancsBox |
The Lancaster Desktop Corpus Toolbox; Software package for the analysis of language data and corpora |
LEXA |
A complex lemmatizer. |
LexisNexis |
A database of current and archived news articles, along with other (business) data. |
LightSide |
A machine learning workbench. |
Linguistica |
Word segmentation and morphological analysis |
MALLET |
Package for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications to text |
MAT – Multidimensional Analysis Tagger |
A tagger for MDA (Biber et al.) |
MLCT |
Tool for building and processing corpora |
MonoConc Esy |
Concordancing and text search tool that allows primary and secondary concordancing |
MorphAdorner |
Tool for performing morphological tagging of texts |
Natural Language Toolkit |
Platform for building Python programs to work with human language data |
NooJ |
Tags texts and corpora (i.e. sets of text files) at the Orthographical, Lexical, Morphological, Syntactic and Semantic levels |
NoSketch Engine |
Word sketches, thesaurus, keyword computation, corpus creation |
Onion |
Tool for removing duplicate parts from large collections of texts |
Online Graded Text Editor |
Tool for profiling a text’s vocabulary level and complexity |
OpenConc |
Tool for concordancing |
PALinkA |
Annotation tool |
ParaConc |
A bilingual/multilingual concordancer |
Pareidoscope |
Pareidoscope is a collection of tools for determining the association between arbitrary linguistic structures, e.g. between words (collocations), between words and structures (collostructions), or between structures. |
PatCount |
A pattern counting tool with powerful statistical capabilities and regex support |
Pattern Builder |
A tool helping with regular expressions and PoS tags |
Pepper |
Conversion between linguistic formats, e.g. from TEI to ANNIS to Tiger XML to EXMARaLDA. |
Phonological CorpusTools (PCT) |
Phonological analysis on transcribed corpora |
PhraseContext |
Tool for word lists, concordancing, collocation, and type-token ratio (TTR) |
Praaline |
Praaline is a system for metadata management, annotation, visualisation and analysis of spoken language corpora. |
PRAAT |
A tool for doing phonetics by computer |
ProtAnt |
Tool for prototypical text analysis |
pysupersensetagger |
Analyses texts for multiword expressions (MWEs) and supersenses. |
PyXMLConc |
Concordancer for XML files with automatic tag and attribute detection. |
Query Tool for the Edinburgh Associative Thesaurus |
A query tool for the EAT |
Readability Analyzer |
A tool for generating various readability statistics |
RSTTool |
Tool that can annotate texts for constituency and rhetorical structure |
Salt |
Meta models for linguistic data. |
SarAnt |
Tool for batch search and replacing |
SegmentAnt |
Tool for the segmentation of Japanese and Chinese |
ShinyConc |
ShinyConc is a framework for generating custom web-based concordancers and is written in R and R Shiny. |
Simple Concordance Program |
Tool for concordance and word listing that works with many languages |
SketchEngine |
Word sketches, thesaurus, keyword computation, corpus creation |
SpiderLing |
Software for obtaining text from the web useful for building text corpora |
SPre |
Tool for segmenting and annotating texts |
Stanford Log-linear POS Tagger |
POS Tagger (with Penn Treebank Tagset) for English, Arabic, Chinese, German |
Stanford Topic Modeling Toolbox |
The Stanford Topic Modeling Toolbox (TMT) allows users to perform topic modeling on texts imported from spreadsheets. It supports both LDA and labelled LDA. |
Stylo for R |
Tool for computational stylistic analysis (authorship attribution, genre analysis) |
Sub-Corpus Creator |
A tool for creating sub-corpora based on searches and metadata |
Synpathy |
Tool for manual syntactic annotation |
TAACO |
TAACO is a tool that calculates 150 indices of textual/lexical cohesion. |
TAALES |
TAALES measures over 400 indices of lexical sophistication. |
TagAnt |
Part-of-speech tagging tool built on TreeTagger |
Tagxedo |
A tool for generating word clouds. |
TASX-Annotator |
Tool for multilevel annotation and transcription of (multi-channel) video and audio data. |
Text Analysis Computing Tools (TACT) |
A simple, fairly old concordancer. |
Text Variation Explorer |
The Text Variation Explorer (TVE) is a tool for exploring the effect of window size on various common linguistic measures. It visualizes these measures and supports PCA and cluster analysis. |
Text Visualization Browser |
A survey/gallery of text visualizations |
Textanz |
Language analysis program that produces frequency lists, word lists, and part-of-speech tags. |
TextArc |
A tool for visualizing the structure of texts. |
TextDirectory |
TextDirectory is a tool for aggregating text files based on various filters and transformation functions. |
Textplot |
A tool for mapping a document into a network of terms in order to visualize the topic structure. |
TextSmith Tools |
A tool for genre-informed phraseological profiles |
TextSTAT |
Tool for creation and manipulation of linguistic data from different languages |
The (Phonetic) Transcription Editor |
An editor for creating phonetic transcriptions |
The Simple Corpus Tool |
A corpus analysis toolkit that supports XML annotations. |
The Simple PoS Tagger |
A simple PoS tagger using the Perl module Lingua::EN::Tagger |
The SPAADIA concordancer |
A concordancer for the SPAADIA corpus |
The Text Feature Analyser |
A tool for investigating textual features and various measures |
Thesaurus.com |
English language thesaurus with links to English dictionary and translation sites. |
TigerSearch |
Tool for searching syntactically and POS-tagged corpora |
TnT – Thorsten Brants’s PoS Tagger |
A simple PoS-Tagger |
Tree Editor TrEd 2.0 |
Graphical editor and viewer for tree-like structures. |
TreeTagger |
Tool for annotating text with part-of-speech and lemma information |
TurboParser |
Multilingual dependency parser with linear programming |
Tweet NLP |
Tweet tokenizer, POS Tagger, hierarchical word clusters, and a dependency parser for tweets, along with annotated corpora and web-based annotation tools. Clusters: http://www.cs.cmu.edu/~ark/TweetNLP/cluster_viewer.html |
TXM |
XML & TEI compatible text analysis software based on TreeTagger, the CQP search engine and the R statistical environment. |
UAM CorpusTool |
Text annotation tool and statistics for various types of linguistic analysis and multilayer annotation |
UAM ImageTool |
Image annotation tool for visual data corpora |
Unitok |
Tool that splits texts into tokens |
VARD |
Spelling variant detection and normalization in historical corpora (particularly Early Modern English) |
VariAnt |
Tool for the detection of spelling variants |
Voyant |
A web-based reading/analysis toolkit for digital texts. |
VU Amsterdam Metaphor Identification Corpus |
Corpus tool for metaphor identification |
WConcord 3.0 |
A full featured concordancer |
WebLicht |
WebLicht is an execution environment for automatic annotation of text corpora, embedded in the CLARIN-D project. |
Wmatrix |
Tool for corpus analysis and comparison |
WordCruncher |
A tool for analyzing ebooks. |
WordFish |
Extract political positions from text documents. |
Wordscores |
A tool (approach) to extract dimensional information from political texts |
Wordsmith |
One of the most established corpus toolkits providing a variety of functionality |
Wordstatix |
Corpus analysis tool |
Worldbuilder |
Tool for annotation and visualisation in analyses applying Text World Theory |
Wordle |
A tool for generating word clouds. |
Xaira |
Indexing and analysis of XML resources |
YACSI Chinese Tokeniser / PoS Tagger |
A Chinese tokenizer and PoS tagger |
Gephi |
A toolkit for network analysis |
DocuScope |
A tool for computer-aided rhetorical analysis |
juxta |
Comparing and collating multiple witnesses to single textual works |
WordHoard |
Close reading and scholarly analysis of deeply tagged texts |
Intelligent Archive |
Managing corpora for stylometry |
Twarc |
A command line tool (and Python library) for archiving Twitter JSON |
WebAnno |
A web-based annotation tool |
Coh-Metrix |
A web-based system to compute cohesion and coherence metrics. |
LIWC |
A tool that tries to compute scores for different emotions, thinking styles, and social concerns. |
ANVIL |
A tool for video annotation. |
LDA-Toolkit |
A toolkit for linguistic discourse and image analysis. |
FLAIR (2.0) |
An online tool for language teachers and learners that analyzes grammatical constructions and readability on the fly. |
DisMo |
An automatic multi-level annotator for spoken language corpora. |
TagCrowd |
A simple tool for generating tag/word clouds online |
MMAX2 |
A multi-level annotation tool |
KorAP |
A complex platform for corpus analysis developed at the IDS in Mannheim |
kfNgram |
A simple tool for generating n-grams |
MAXQDA |
Sophisticated QDA software that works with multimodal data and supports mixed methods approaches |
ATLAS.ti |
Sophisticated QDA software for mixed methods approaches |
Pipoca (formerly openQDA) |
Web-based QDA software |
f4analyse |
QDA software specifically geared towards interview (spoken) data |
f4transkript |
Software for transcribing audio data |
CATMA (Computer Assisted Text Markup and Analysis) |
A complex annotation and analysis package |
ANC2go |
A web service that allows users to create custom sub-corpora of the ANC |
CoMOn |
A tool for corpus matching analysis |