Methodology and fundamentals: My methodological work includes the analysis and improvement of established methods, e.g. Delta measures for authorship attribution, as well as the development of novel methods such as the generalized cooccurrence model. I am also involved in the creation of fundamental linguistic resources such as a part-of-speech tagset for Albanian.
Actual implementation: My work has a strong practical component and I am (co-)developer of several tools and web interfaces, some of which represent the state of the art in their respective fields.
Evaluation: As part of my research, I assess the performance of methods and tools in realistic settings, both to find out which established methods and tools work best and to evaluate my own work.
I have a strong interest in natural language processing (NLP). So far, I have been active in the following areas: tokenization, part-of-speech tagging, unsupervised dependency parsing, sentiment analysis, and semantic similarity. I am (co-)developer of two state-of-the-art tools (tokenizer and part-of-speech tagger) for German web and social media texts.
Proisl, Thomas, and Peter Uhrig. 2016. “SoMaJo: State-of-the-Art Tokenization for German Web and Social Media Texts.” In Proceedings of the 10th Web as Corpus Workshop (WAC-X) and the EmpiriST Shared Task, edited by Paul Cook, Stefan Evert, Roland Schäfer, and Egon Stemle, 57–62. Berlin: Association for Computational Linguistics. http://aclweb.org/anthology/W16-2607.
Kabashi, Besim, and Thomas Proisl. 2016. “A Proposal for a Part-of-Speech Tagset for the Albanian Language.” In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), edited by Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asunción Moreno, Jan Odijk, and Stelios Piperidis, 4305–10. Portorož: European Language Resources Association. http://www.lrec-conf.org/proceedings/lrec2016/pdf/1066_Paper.pdf.
Proisl, Thomas, Stefan Evert, Paul Greiner, and Besim Kabashi. 2014. “SemantiKLUE: Robust Semantic Similarity at Multiple Levels Using Maximum Weight Matching.” In Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014), edited by Preslav Nakov and Torsten Zesch, 532–40. Dublin: Association for Computational Linguistics. http://www.aclweb.org/anthology/S14-2093.
Evert, Stefan, Thomas Proisl, Paul Greiner, and Besim Kabashi. 2014. “SentiKLUE: Updating a Polarity Classifier in 48 Hours.” In Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014), edited by Preslav Nakov and Torsten Zesch, 551–55. Dublin: Association for Computational Linguistics. http://www.aclweb.org/anthology/S14-2096.
Proisl, Thomas, Paul Greiner, Stefan Evert, and Besim Kabashi. 2013. “KLUE: Simple and Robust Methods for Polarity Classification.” In Proceedings of the 7th International Workshop on Semantic Evaluation (SemEval 2013), edited by Mona T. Diab, Timothy Baldwin, and Marco Baroni, 395–401. Atlanta, GA: Association for Computational Linguistics. http://aclweb.org/anthology/S13-2065.
Evert, Stefan, Peter Uhrig, Sabine Bartsch, and Thomas Proisl. 2017. “E-VIEW-alation – a Large-Scale Evaluation Study of Association Measures for Collocation Identification.” In Electronic Lexicography in the 21st Century. Proceedings of the ELex 2017 Conference, edited by Iztok Kosem, Carole Tiberius, Miloš Jakubíček, Jelena Kallas, Simon Krek, and Vít Baisa, 531–49. Leiden: Lexical Computing. https://elex.link/elex2017/wp-content/uploads/2017/09/paper32.pdf.
Uhrig, Peter, and Thomas Proisl. 2012. “Less Hay, More Needles – Using Dependency-Annotated Corpora to Provide Lexicographers with More Accurate Lists of Collocation Candidates.” Lexicographica 28 (1): 141–80. doi:10.1515/lexi.2012-0009.
Proisl, Thomas, and Peter Uhrig. 2012. “Efficient Dependency Graph Matching with the IMS Open Corpus Workbench.” In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC 2012), edited by Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Mehmet Ugur Dogan, Bente Maegaard, Joseph Mariani, Jan Odijk, and Stelios Piperidis, 2750–56. Istanbul: European Language Resources Association. http://www.lrec-conf.org/proceedings/lrec2012/pdf/709_Paper.pdf.
Evert, Stefan, Thomas Proisl, Fotis Jannidis, Isabella Reger, Steffen Pielström, Christof Schöch, and Thorsten Vitt. 2017. “Understanding and Explaining Delta Measures for Authorship Attribution.” Digital Scholarship in the Humanities. doi:10.1093/llc/fqx023.
Evert, Stefan, Fotis Jannidis, Friedrich Michael Dimpel, Christof Schöch, Steffen Pielström, Thorsten Vitt, Isabella Reger, Andreas Büttner, and Thomas Proisl. 2016. “‚Delta‘ in der stilometrischen Autorschaftsattribution.” In DHd 2016. Konferenzabstracts, 61–74. Leipzig: Nisaba. http://www.dhd2016.de/abstracts/sektionen-002.html.
Evert, Stefan, Thomas Proisl, Thorsten Vitt, Christof Schöch, Fotis Jannidis, and Steffen Pielström. 2015. “Towards a Better Understanding of Burrows’s Delta in Literary Authorship Attribution.” In Proceedings of the Fourth Workshop on Computational Linguistics for Literature (CLfL 2015), edited by Anna Feldman, Anna Kazantseva, Stan Szpakowicz, and Corina Koolen, 79–88. Denver, CO: Association for Computational Linguistics. http://www.aclweb.org/anthology/W15-0709.