In the last ten years, significant advances in deep learning models and the development of Large Language Models (LLMs) have revolutionized the fields of computational linguistics (CL) and natural language processing (NLP). In turn, this has led to a complete re-assessment of the language resources and evaluation practices necessary for training LLMs and analyzing their outputs.
In particular, the availability of very large amounts of unstructured data for training foundation models has come into focus, while the value of high-quality structured linguistic data, richly annotated at multiple levels of linguistic analysis, has been downplayed by comparison. However, as CL and NLP practitioners engage further with LLMs and debate their strengths and weaknesses, the importance of high-quality, structured linguistic data is being re-emphasized.
The proposed workshop is related to the Treebanks and Linguistic Theories (TLT) conference series and the more recent SyntaxFest venue. The Workshop on Structured Linguistic Data is conceived as both a continuation of that tradition and an adaptation to the new realities of an LLM-dominated research landscape.
Topics include but are not limited to:
Presentations: