
    Resources and Datasets

    A list of free resources and datasets provided to you by UCREL NLP

    Habibi

    Arabic Song Lyrics

    KALIMAT

    Arabic NLP Dataset

    EASC

    Essex Arabic Summaries Corpus

    MultiLing

    Multi-document Summaries Corpora

    ABMC

    Arabic in Business and Management Corpora

    ADD

    Arabic Dialects Dataset

    Annual Reports Corpus

    UK Annual Reports Key Sections Corpora

    StratScore

    N-gram list for UK annual report sections

    Strategic Commentary Corpus

    A corpus of UK Annual Reports Strategic Commentary

    Arabic Diseases Ontology

    Arabic Infectious Diseases Ontology

    Arabic Infectious Diseases Corpus

    A corpus of Arabic Tweets about Infectious Diseases

    CFIE

    2012 - 2020

    COUNTER Urdu Corpus

    COrpus of Urdu News TExt Reuse

    Vard

    2013 - 2015

    Arabic COVID Corpus

    Covid-19 Arabic Tweets

    CLEU Urdu Corpus

    Cross-Language English-Urdu Corpus

    Plant Names and Historical Places

    Data and scripts for extracting plant names and collocates from historical texts

    Human Judgements

    Human Judgements of Sentiment Values

    Igbo Translations

    Igbo-English Machine Translations

    Arabic Influenza and Covid

    Influenza and Covid-19 Arabic labeled Tweets

    S-BiDD

    Self-reported BD diagnosis dataset