Publications
Books
- Proisl, Thomas. 2019. The Cooccurrence of Linguistic Structures. Erlangen: FAU University Press. https://doi.org/10.25593/978-3-96147-201-7. [bib, Pareidoscope]
Awards:
- STAEDTLER Promotionspreis
- GSCL-Promotionspreis zum Gedenken an Wolfgang Hoeppner
Journal articles
- Proisl, Thomas. 2022. “Use Words, Not Constructions! A New Perspective on the Unit of Analysis in Collostructional Analysis.” International Journal of Corpus Linguistics 27 (3): 349–79. https://doi.org/10.1075/ijcl.20072.pro. [bib]
- Büttner, Andreas, Friedrich Michael Dimpel, Stefan Evert, Fotis Jannidis, Steffen Pielström, Thomas Proisl, Isabella Reger, Christof Schöch, and Thorsten Vitt. 2017. “‚Delta‘ in der stilometrischen Autorschaftsattribution.” Zeitschrift für digitale Geisteswissenschaften. https://doi.org/10.17175/2017_006. [bib]
- Evert, Stefan, Thomas Proisl, Fotis Jannidis, Isabella Reger, Steffen Pielström, Christof Schöch, and Thorsten Vitt. 2017. “Understanding and Explaining Delta Measures for Authorship Attribution.” Digital Scholarship in the Humanities 32 (suppl_2): ii4–ii16. https://doi.org/10.1093/llc/fqx023. [bib]
- Uhrig, Peter, and Thomas Proisl. 2012. “Less Hay, More Needles – Using Dependency-Annotated Corpora to Provide Lexicographers with More Accurate Lists of Collocation Candidates.” Lexicographica 28 (1): 141–80. https://doi.org/10.1515/lexi.2012-0009. [bib, pdf]
Articles in conference proceedings and collections
- Adrian, Axel, Nathan Dykes, Stephanie Evert, Philipp Heinrich, Michael Keuchen, and Thomas Proisl. 2022. “Manuelle und automatische Anonymisierung von Urteilen.” In Digitalisierung von Zivilprozess und Rechtsdurchsetzung, edited by Axel Adrian, Michael Kohlhase, Stephanie Evert, and Martin Zwickel, 173–97. Berlin: Duncker & Humblot. https://doi.org/10.3790/978-3-428-58644-8. [bib]
- Blombach, Andreas, Natalie Dykes, Philipp Heinrich, Besim Kabashi, and Thomas Proisl. 2020. “A Corpus of German Reddit Exchanges (GeRedE).” In Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020), 6310–6. Marseille: European Language Resources Association. https://www.aclweb.org/anthology/2020.lrec-1.774. [bib, pdf, GeRedE]
- Proisl, Thomas, and Gabriella Lapesa. 2020. “KLUMSy@KIPoS: Experiments on Part-of-Speech Tagging of Spoken Italian.” In Proceedings of the 7th Evaluation Campaign of Natural Language Processing and Speech Tools for Italian (EVALITA 2020), edited by Valerio Basile, Danilo Croce, Maria Di Maro, and Lucia C. Passaro. CEUR-WS.org. http://ceur-ws.org/Vol-2765/paper140.pdf. [bib, pdf, video, model]
- Proisl, Thomas, Natalie Dykes, Philipp Heinrich, Besim Kabashi, Andreas Blombach, and Stefan Evert. 2020. “EmpiriST Corpus 2.0: Adding Manual Normalization, Lemmatization and Semantic Tagging to a German Web and CMC Corpus.” In Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020), 6142–8. Marseille: European Language Resources Association. https://www.aclweb.org/anthology/2020.lrec-1.754. [bib, pdf, EmpiriST corpus]
- Dimpel, Friedrich Michael, and Thomas Proisl. 2019. “Gute Wörter für Delta: Verbesserung der Autorschaftsattribution durch autorspezifische distinktive Wörter.” In DHd 2019 Digital Humanities: multimedial & multimodal. Konferenzabstracts, edited by Patrick Sahle, 296–99. Frankfurt am Main. https://doi.org/10.5281/zenodo.4622117. [bib, pdf]
- Proisl, Thomas, Peter Uhrig, Philipp Heinrich, Andreas Blombach, Sefora Mammarella, Natalie Dykes, and Besim Kabashi. 2019. “The_Illiterati: Part-of-Speech Tagging for Magahi and Bhojpuri Without Even Knowing the Alphabet.” In Proceedings of the First International Workshop on NLP Solutions for Under Resourced Languages (NSURL 2019), 73–79. Trento: Association for Computational Linguistics. https://www.aclweb.org/anthology/2019.nsurl-1.11. [bib, pdf, model]
- Kabashi, Besim, and Thomas Proisl. 2018. “Albanian Part-of-Speech Tagging: Gold Standard and Evaluation.” In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), edited by Nicoletta Calzolari, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Koiti Hasida, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis, and Takenobu Tokunaga, 2593–9. Miyazaki: European Language Resources Association. https://www.aclweb.org/anthology/L18-1412. [bib, pdf]
- Proisl, Thomas. 2018. “SoMeWeTa: A Part-of-Speech Tagger for German Social Media and Web Texts.” In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), edited by Nicoletta Calzolari, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Koiti Hasida, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis, and Takenobu Tokunaga, 665–70. Miyazaki: European Language Resources Association. https://www.aclweb.org/anthology/L18-1106. [bib, pdf, SoMeWeTa]
- Proisl, Thomas, Stefan Evert, Fotis Jannidis, Christof Schöch, Leonard Konle, and Steffen Pielström. 2018. “Delta vs. N-Gram Tracing: Evaluating the Robustness of Authorship Attribution Methods.” In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), edited by Nicoletta Calzolari, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Koiti Hasida, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis, and Takenobu Tokunaga, 3309–14. Miyazaki: European Language Resources Association. https://www.aclweb.org/anthology/L18-1523. [bib, pdf]
- Proisl, Thomas, Philipp Heinrich, Besim Kabashi, and Stefan Evert. 2018. “EmotiKLUE at IEST 2018: Topic-Informed Classification of Implicit Emotions.” In Proceedings of the 9th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, edited by Alexandra Balahur, Saif M. Mohammad, Veronique Hoste, and Roman Klinger, 235–42. Brussels: Association for Computational Linguistics. https://doi.org/10.18653/v1/W18-6234. [bib, pdf, EmotiKLUE]
- Uhrig, Peter, Stefan Evert, and Thomas Proisl. 2018. “Collocation Candidate Extraction from Dependency-Annotated Corpora: Exploring Differences Across Parsers and Dependency Annotation Schemes.” In Lexical Collocation Analysis: Advances and Applications, edited by Pascual Cantos-Gómez and Moisés Almela-Sánchez, 111–40. Cham: Springer. https://doi.org/10.1007/978-3-319-92582-0_6. [bib]
- Evert, Stefan, Peter Uhrig, Sabine Bartsch, and Thomas Proisl. 2017. “E-VIEW-alation – a Large-Scale Evaluation Study of Association Measures for Collocation Identification.” In Electronic Lexicography in the 21st Century. Proceedings of the eLex 2017 Conference, edited by Iztok Kosem, Carole Tiberius, Miloš Jakubíček, Jelena Kallas, Simon Krek, and Vít Baisa, 531–49. Leiden: Lexical Computing. https://elex.link/elex2017/wp-content/uploads/2017/09/paper32.pdf. [bib, pdf, video, E-VIEW-alation]
- Proisl, Thomas, Philipp Heinrich, Stefan Evert, and Besim Kabashi. 2017. “Translation Inference Across Dictionaries via a Combination of Graph-Based Methods and Co-Occurrence Statistics.” In Proceedings of the LDK 2017 Workshops: 1st Workshop on the OntoLex Model (OntoLex-2017), Shared Task on Translation Inference Across Dictionaries & Challenges for Wordnets, edited by John P. McCrae, Francis Bond, Paul Buitelaar, Philipp Cimiano, Thierry Declerck, Jorge Gracia, Ilan Kernerman, Elena Montiel-Ponsoda, Noam Ordan, and Maciej Piasecki, 94–102. Galway: CEUR-WS.org. http://ceur-ws.org/Vol-1899/TIAD17_paper_1.pdf. [bib, pdf]
- Evert, Stefan, Fotis Jannidis, Friedrich Michael Dimpel, Christof Schöch, Steffen Pielström, Thorsten Vitt, Isabella Reger, Andreas Büttner, and Thomas Proisl. 2016. “‚Delta‘ in der stilometrischen Autorschaftsattribution.” In DHd 2016. Konferenzabstracts, 61–74. Leipzig: Nisaba. https://doi.org/10.5281/zenodo.4645227. [bib, pdf]
- Kabashi, Besim, and Thomas Proisl. 2016. “A Proposal for a Part-of-Speech Tagset for the Albanian Language.” In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), edited by Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asunción Moreno, Jan Odijk, and Stelios Piperidis, 4305–10. Portorož: European Language Resources Association. https://www.aclweb.org/anthology/L16-1682. [bib, pdf]
- Proisl, Thomas, and Peter Uhrig. 2016. “SoMaJo: State-of-the-Art Tokenization for German Web and Social Media Texts.” In Proceedings of the 10th Web as Corpus Workshop (WAC-X) and the EmpiriST Shared Task, edited by Paul Cook, Stefan Evert, Roland Schäfer, and Egon Stemle, 57–62. Berlin: Association for Computational Linguistics. https://doi.org/10.18653/v1/W16-2607. [bib, pdf, SoMaJo]
- Evert, Stefan, Thomas Proisl, Thorsten Vitt, Christof Schöch, Fotis Jannidis, and Steffen Pielström. 2015. “Towards a Better Understanding of Burrows’s Delta in Literary Authorship Attribution.” In Proceedings of the Fourth Workshop on Computational Linguistics for Literature (CLfL 2015), edited by Anna Feldman, Anna Kazantseva, Stan Szpakowicz, and Corina Koolen, 79–88. Denver, CO: Association for Computational Linguistics. https://doi.org/10.3115/v1/W15-0709. [bib, pdf]
- Plotnikova, Nataliia, Gabriella Lapesa, Thomas Proisl, and Stefan Evert. 2015. “SemantiKLUE: Semantic Textual Similarity with Maximum Weight Matching.” In Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015), edited by Daniel M. Cer, David Jurgens, Preslav Nakov, and Torsten Zesch, 111–16. Denver, CO: Association for Computational Linguistics. https://doi.org/10.18653/v1/S15-2020. [bib, pdf]
- Evert, Stefan, Thomas Proisl, Paul Greiner, and Besim Kabashi. 2014. “SentiKLUE: Updating a Polarity Classifier in 48 Hours.” In Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014), edited by Preslav Nakov and Torsten Zesch, 551–55. Dublin: Association for Computational Linguistics. https://doi.org/10.3115/v1/S14-2096. [bib, pdf]
- Proisl, Thomas, Stefan Evert, Paul Greiner, and Besim Kabashi. 2014. “SemantiKLUE: Robust Semantic Similarity at Multiple Levels Using Maximum Weight Matching.” In Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014), edited by Preslav Nakov and Torsten Zesch, 532–40. Dublin: Association for Computational Linguistics. https://doi.org/10.3115/v1/S14-2093. [bib, pdf]
- Greiner, Paul, Thomas Proisl, Stefan Evert, and Besim Kabashi. 2013. “KLUE-CORE: A Regression Model of Semantic Textual Similarity.” In Proceedings of the Second Joint Conference on Lexical and Computational Semantics (*SEM 2013), edited by Mona T. Diab, Timothy Baldwin, and Marco Baroni, 181–86. Atlanta, GA: Association for Computational Linguistics. http://aclweb.org/anthology/S13-1026. [bib, pdf]
- Proisl, Thomas, Paul Greiner, Stefan Evert, and Besim Kabashi. 2013. “KLUE: Simple and Robust Methods for Polarity Classification.” In Proceedings of the 7th International Workshop on Semantic Evaluation (SemEval 2013), edited by Mona T. Diab, Timothy Baldwin, and Marco Baroni, 395–401. Atlanta, GA: Association for Computational Linguistics. http://aclweb.org/anthology/S13-2065. [bib, pdf]
- Proisl, Thomas. 2012. “Automatically Exploring Lexical Tendencies in English.” In Corpus Linguistics and Variation in English: Theory and Description, edited by Joybrato Mukherjee and Magnus Huber, 143–54. Amsterdam, New York: Rodopi. https://doi.org/10.1163/9789401207713_012. [bib]
- Proisl, Thomas, and Peter Uhrig. 2012. “Efficient Dependency Graph Matching with the IMS Open Corpus Workbench.” In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC 2012), edited by Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Mehmet Ugur Dogan, Bente Maegaard, Joseph Mariani, Jan Odijk, and Stelios Piperidis, 2750–6. Istanbul: European Language Resources Association. http://www.lrec-conf.org/proceedings/lrec2012/pdf/709_Paper.pdf. [bib, pdf, CWB-treebank, Treebank.info]
- Proisl, Thomas, and Besim Kabashi. 2010. “Using High-Quality Resources in NLP: The Valency Dictionary of English as a Resource for Left-Associative Grammars.” In Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC 2010), edited by Nicoletta Calzolari, Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Mike Rosner, and Daniel Tapias, 3878–81. Valletta: European Language Resources Association. http://www.lrec-conf.org/proceedings/lrec2010/pdf/62_Paper.pdf. [bib, pdf]
- Handl, Johannes, Besim Kabashi, Thomas Proisl, and Carsten Weber. 2009. “JSLIM – Computational Morphology in the Framework of the SLIM Theory of Language.” In State of the Art in Computational Morphology. Workshop on Systems and Frameworks for Computational Morphology (SFCM 2009), edited by Cerstin Mahlow and Michael Piotrowski, 10–27. Berlin, Heidelberg, New York: Springer. https://doi.org/10.1007/978-3-642-04131-0_2. [bib]
Talks and presentations
- Blombach, Andreas, Stephanie Evert, Fotis Jannidis, Leonard Konle, Steffen Pielström, and Thomas Proisl. 2022. “Exploring Lexical Complexities.” In Digital Humanities 2022. Conference Abstracts, 130–34. Tokyo. https://dh2022.dhii.asia/dh2022bookofabsts.pdf. [bib, pdf]
- Proisl, Thomas. 2022. “Ein semantischer Tagger für das Deutsche.” Presentation at Oberseminar Computerlinguistik. Erlangen. [bib]
- Proisl, Thomas. 2022. “The Statistical Analysis of Cooccurrences: From Collocations to Arbitrary Structures.” Presentation in the GSCL Research Talks series. https://gscl.org/en/events/talks/februar-2022-research-talk. [bib, pdf]
- Blombach, Andreas, Thomas Proisl, Stefan Evert, Philipp Heinrich, and Natalie Dykes. 2021. “Into the Perryverse: A CL Journey to the Realm of Lexical Complexity.” Presentation at To boldly go: Corpus approaches to the language of Science Fiction. Dortmund. [bib, video, pdf]
- Blombach, Andreas, and Thomas Proisl. 2020. “Unexpected Complexity and Romance in Disguise: The Case of Science Fiction Novels and Fanfiction.” Presentation at 9th Hildesheim-Göttingen-Workshop on DH and CL. Göttingen. [bib]
- Proisl, Thomas, Natalie Dykes, Philipp Heinrich, Besim Kabashi, and Stefan Evert. 2020. “EmpiriST Corpus 2.0: Adding Normalization, Lemmatization and Semantic Tags to a German Web and Social Media Corpus.” In 42. Jahrestagung der Deutschen Gesellschaft für Sprachwissenschaft. Sprachliche Diversität: Theorien, Methoden, Ressourcen, 320. Hamburg. https://www.zfs.uni-hamburg.de/dgfs2020/programm/abstracts/dgfs2020-clp-proisl.pdf. [bib, poster, pdf]
- Blombach, Andreas, Natalie Dykes, Philipp Heinrich, and Thomas Proisl. 2019. “A New German Reddit Corpus.” In Proceedings of the 15th Conference on Natural Language Processing (KONVENS 2019), 278–79. Erlangen: German Society for Computational Linguistics & Language Technology. https://corpora.linguistik.uni-erlangen.de/data/konvens/proceedings/papers/kaleidoskop/BlombachETC.pdf. [bib, pdf]
- Diwersy, Sascha, Stefan Evert, Philipp Heinrich, and Thomas Proisl. 2019. “Means of Productivity – on the Statistical Modelling of the Restrictedness of Lexico-Grammatical Patterns.” In EUROPHRAS 2019. Productive Patterns in Phraseology, 20–21. Santiago de Compostela. https://docs.wixstatic.com/ugd/ce1970_d8f9d9a6d5e542b78ffffaeef8e07c5c.pdf. [bib, pdf]
- Proisl, Thomas, and Philipp Heinrich. 2019. “NLP for German CMC Data.” Poster presentation at Amazon Research Days. Berlin. [bib, poster, pdf]
- Proisl, Thomas, Natalie Dykes, Besim Kabashi, Philipp Heinrich, and Andreas Blombach. 2019. “NLP for German CMC Texts: Tokenization, POS Tagging, and a New Gold Standard for Lemmatization.” Presentation at Annotation of Non-Standard Corpora. Bamberg. [bib]
- Proisl, Thomas, Leonard Konle, Stefan Evert, and Fotis Jannidis. 2019. “Dependenzbasierte syntaktische Komplexitätsmaße.” In DHd 2019 Digital Humanities: multimedial & multimodal. Konferenzabstracts, edited by Patrick Sahle, 270–73. Frankfurt am Main. https://doi.org/10.5281/zenodo.4622254. [bib, poster, pdf]
- Evert, Stefan, Thomas Proisl, Peter Uhrig, and Maria Khokhlova. 2018. “Contrastive Collocation Analysis – a Comparison of Association Measures Across Three Different Languages Using Dependency-Parsed Corpora.” In The XVIII EURALEX International Congress. Lexicography in Global Contexts, 44–46. Ljubljana. http://euralex2018.cjvt.si/wp-content/uploads/sites/6/2018/07/Euralex2018_book_of_abstracts_FINAL.pdf. [bib, pdf]
- Proisl, Thomas, and Stefan Evert. 2018. “Delta vs. N-Gram-Tracing: Wie robust ist die Autorschaftsattribuierung?” In DHd 2018. Kritik der digitalen Vernunft. Konferenzabstracts, 366–69. https://doi.org/10.5281/zenodo.4622520. [bib, poster, pdf]
- Büttner, Andreas, and Thomas Proisl. 2016. “Delta und Merkmalsselektion: Welche Wörter unterscheiden arabisch-lateinische Übersetzer?” Presentation at <philtag n="13"/>. Würzburg. http://kallimachos.de/kallimachos/images/kallimachos/f/f5/Abstract%C3%9Cbersetzer.pdf. [bib, pdf]
- Evert, Stefan, and Thomas Proisl. 2016. “Burrows’s Delta verstehen.” Presentation at <philtag n="13"/>. Würzburg. http://kallimachos.de/kallimachos/images/kallimachos/b/bf/AbstractFAU.pdf. [bib, pdf]
- Evert, Stefan, Fotis Jannidis, Thomas Proisl, Thorsten Vitt, Christof Schöch, Steffen Pielström, and Isabella Reger. 2016. “Outliers or Key Profiles? Understanding Distance Measures for Authorship Attribution.” In Digital Humanities 2016. Conference Abstracts, 188–91. Kraków. http://dh2016.adho.org/abstracts/253. [bib, pdf]
- Bartsch, Sabine, Stefan Evert, Thomas Proisl, and Peter Uhrig. 2015. “(Association) Measure for Measure: Comparing Collocation Dictionaries with Co-Occurrence Data for a Better Understanding of the Notion of Collocation.” In ICAME 36. Words, Words, Words – Corpora and Lexis, 68–70. https://www.uni-trier.de/fileadmin/fb2/ANG/ICAME36/ICAME_36_abstracts_booklet.pdf. [bib, pdf]
- Evert, Stefan, Thomas Proisl, Christof Schöch, Fotis Jannidis, Steffen Pielström, and Thorsten Vitt. 2015. “Explaining Delta, or: How Do Distance Measures for Authorship Attribution Work?” Presentation at Corpus Linguistics 2015. Lancaster. [bib, pdf]
- Proisl, Thomas. 2015. “Maschinelles Lernen mit Python.” Presentation at DARIAH-Methodenworkshop Natural Language Processing für Literaturwissenschaftler. Würzburg. [bib]
- Proisl, Thomas, and Peter Uhrig. 2013. “Korpora mit dem Treebank.info-Projekt syntaktisch parsen und abfragen.” Poster presentation at DGfS 2013. Potsdam. [bib, pdf]
- Proisl, Thomas, and Peter Uhrig. 2012. “Using Dependency-Annotated Corpora to Improve Collocation Extraction.” In ICAME 33. Corpora at the Centre and Crossroads of English Linguistics, 210–12. Leuven. http://wwwling.arts.kuleuven.be/icame33/_pdf/icame33abstracts.pdf. [bib, pdf]
- Uhrig, Peter, and Thomas Proisl. 2012. “A Fast and User-Friendly Interface for Large Treebanks.” Presentation at Otto-Friedrich-Universität Bamberg. [bib]
- Uhrig, Peter, and Thomas Proisl. 2012. “Geparste Korpora für alle!” Workshop at GAL-Kongress 2012. Erlangen. [bib, pdf]
- Uhrig, Peter, and Thomas Proisl. 2012. “Sprachstrukturen effizient speichern, verarbeiten und abfragen.” Presentation at Vortragsreihe Digital Humanities. Erlangen. [bib]
- Proisl, Thomas, and Peter Uhrig. 2011. “Verbesserung der Kollokationsextraktion durch Verwendung dependenzannotierter Korpora.” Presentation at GAL Sektionentagung 2011. Bayreuth. [bib, pdf]
- Uhrig, Peter, and Thomas Proisl. 2011. “A Fast and User-Friendly Interface for Large Treebanks.” Presentation at Corpus Linguistics 2011. Birmingham. https://www.birmingham.ac.uk/documents/college-artslaw/corpus/conference-archives/2011/abs-207.pdf. [bib, pdf]
- Uhrig, Peter, and Thomas Proisl. 2011. “The Erlangen Treebank.” Presentation at Vortragsreihe Approaches to Corpus Linguistics. Erlangen. [bib]
- Uhrig, Peter, and Thomas Proisl. 2011. “The Treebank.info Project.” Presentation at ICAME 32. Oslo. [bib]
- Uhrig, Peter, Thomas Herbst, and Thomas Proisl. 2010. “Die Erlangen Valency Patternbank.” Software demonstration at Messe zur elektronischen Lexikografie, 46. Jahrestagung des Instituts für Deutsche Sprache (IDS). Mannheim. [bib]