LoReLab
Low-Resource Languages Lab at JHU

We focus on minimally supervised ("low-resource") and massively multilingual techniques in machine learning (ML) and natural language processing (NLP). We apply these methods to machine translation, speech recognition, lexicon induction, and historical linguistics. We are also the core of the Universal Morphology (UniMorph) project and the c(ur|re)ators of the Johns Hopkins University Bible Corpus.
We are led by David Yarowsky, ACL Fellow and Treasurer, Professor of Computer Science, and member of the multi-departmental Center for Language and Speech Processing at Johns Hopkins University (JHU), who is also affiliated with the Human Language Technology Center of Excellence.
On campus? Visit us in Hackerman 226.
We are seeking talented undergraduate, PhD, and master's students for several high-impact research projects in multilingual NLP and core methodologies. If interested, please email David and cc Arya. Both are lastname at jhu dot edu.
Lab News
- Accepted to EACL 2023: "Meeting the needs of low-resource languages: Automatic alignments via pretrained models" by Ebrahimi, McCarthy, Oncevay, Ortega, Chiruzzo, Coto-Solano, Gimenez-Lugo, and Kann.
- Accepted to COLING 2022: "Deciphering and Characterizing Out-of-Vocabulary Words for Morphologically Rich Languages" by Botev, McCarthy, Wu, and Yarowsky.
- Accepted to LREC 2022: "UniMorph 4.0" and "Hong Kong: Longitudinal and Synchronic Characterisations of Protest News between 1998 and 2020" by McCarthy and Dore.
- Two papers accepted to Findings of ACL 2022.
- Accepted as a spotlight at ICLR 2022: "On the Uncomputability of Partition Functions in Energy-Based Sequence Models" by Lin and McCarthy.
- Congratulations to Dr. Winston Wu, LoReLab's latest graduate! He successfully defended his dissertation, "Computational Word Formation and Etymology," on 7 January 2022. We congratulate him on his new position at the University of Michigan!
- Long time, no update: papers were accepted to EACL and EMNLP this year. Good luck with COVID, all. May your families be well.
- David Yarowsky has been recognized by the ACL with a "Test of Time" award for contributions with long-lasting impact on the community.
- Four papers from the lab will appear at ACL 2020, spanning topics in morphology, machine translation, and multilingual language modeling.
- Winston Wu, Arya McCarthy, and others from JHU won Duolingo's STAPLE shared task on generating comprehensive translation lists in five languages.
- Seven papers from the lab will appear at LREC 2020, presenting resources and analysis in massively multilingual morphology, translation, etymology, grapheme-to-phoneme, typology, and core vocabulary.
- Congratulations to Garrett Nicolai for his new position in the Department of Linguistics, University of British Columbia!
- Presented at UNESCO Language Technologies for All (LT4All): "A 1000-language Collaborative Universal Dictionary and Universal Translator" by David Yarowsky, Arya D. McCarthy, Garrett Nicolai, Winston Wu, Aaron Mueller, Dylan Lewis, Yingqi Ding, Abhinav Nigam, Emre Ozgu, Debanik Purkayastha, James Scharf, and Kenneth Zheng.
- LoReLab alumnus John Hewitt was recognized at EMNLP as a best paper runner-up.
Research Interests
Core Learning Techniques
- Self-training
- Cross-language information projection
- Cross-domain knowledge transfer
- Co-training
- Active learning and human computation
- Creative bootstrapping from multiple knowledge sources
Machine Translation
- Translation discovery without aligned bilingual text (unsupervised machine translation)
- Exploiting language universals and language family relationships (linguistic typology)
Natural Language Processing
- Inflectional and derivational morphology
- Word sense disambiguation
- Broad-coverage core NLP tools for 800+ world languages (massively multilingual NLP)
Information Extraction
- Biographic fact extraction
- Characterizing communicants
Publications
We're still adding earlier papers! For now, be sure to check Google Scholar.
Current members
PhD students
- Arya McCarthy
- Aaron Mueller (with Mark Dredze)
- Niyati Bafna
Master's students
- Georgie Botev
- Emre Ozgu
- Jamie Scharf
Undergraduates
- Milind Agarwal
- Kevin Kim
Alumni
(Student co-authors, including undergraduates. Bolded if David advised their dissertation or supervised their postdoc)
- Winston Wu at University of Michigan
- Rachel Wicks
- Amrit Nidhi
- Sabrina Mielke
- Garrett Nicolai at University of British Columbia
- Trevor Lee at DoorDash
- Oliver Adams
- Chris Kirov at Google
- Ryan Cotterell at ETH Zurich
- Dylan Lewis at Peacock TV
- Steven Shearing
- Ryan Newell at Amazon
- Lawrence Wolf-Sonkin at Google
- Patrick Xia
- John Hewitt, now a PhD student at Stanford
- John Sylak-Glassman at Meta
- Nidhi Vyas at Apple
- Sarah Mihuc
- Roger Que at Google
- Jin Yong Shin
- Ann Irvine, Head of Data Science at Arceo
- Svitlana Volkova, Senior Research Scientist at Pacific Northwest National Laboratory
- Mozhi Zhang
- Delip Rao at Amazon (Alexa)
- Elliot F. Drábek at Atreca
- Nikesh Garera at Treebo
- Shane Bergsma
- Charles Schafer at Google
- Gideon Mann, Head of Data Science at Bloomberg
- Silviu Cucerzan, Principal Research Manager at Microsoft Bing
- Richard Wicentowski, Chair of Computer Science at Swarthmore College
- Radu Florian at IBM Research
- Grace Ngai at Hong Kong Polytechnic University
- John Henderson at MITRE