“Computational Models of Idiomaticity”
Institute of Informatics
Federal University of Rio Grande do Sul, Brazil
In this talk I discuss some current trends in technology for the computational modeling of idiomatic language. I start with an overview of the automatic acquisition of Multiword Expressions (MWEs) from corpora. MWEs such as compound nouns and verb-particle constructions have proved a challenge for computational analysis, and we will look at some approaches to their automatic identification and classification. In particular, models for representing words and MWEs in semantic space will be examined in a multilingual setting, from their automatic construction from corpora to their evaluation. Finally, I discuss a comparison of different models in terms of idiomaticity prediction for compound nouns in English and French.
I am a CNPq Fellow and a Reader at the Institute of Informatics, Federal University of Rio Grande do Sul. I was a Visiting Scholar at the Department of Linguistics and Philosophy of MIT (USA, 2014-2015), a Visiting Scholar at École Normale Supérieure (France, 2014), an Erasmus-Mundus Visiting Scholar at Saarland University (Germany, 2012-2013), and a Visiting Scholar at the Laboratory of Information and Decision Systems of MIT (USA, 2011-2012) and at the Computer Science Department, University of Bath (2006-2009). Prior to these, I worked as a Senior Researcher in the Department of Language and Linguistics of the University of Essex (2004-2005) and in the Computer Laboratory of the University of Cambridge (2001-2004). I received my PhD in Computer Science from the University of Cambridge (Computer Laboratory) in 2003. My research interests are in methods for the automatic acquisition of linguistic information from data, both for language processing and for cognitive modeling, and in language engineering using formalisms such as CCG, HPSG, LFG and CFG. This work includes techniques for Multiword Expression treatment using statistical methods and distributional semantic models, and applications such as Text Simplification and Question Answering, for languages such as English and Portuguese.