D02 | Evaluation Metrics for Visual Analytics in Linguistics

Prof. Miriam Butt, University of Konstanz

Prof. Daniel Weiskopf, University of Stuttgart

Dr. Christin Beck (on leave), University of Konstanz

Tafseer Ahmed Khan, University of Konstanz

Within linguistics, analyzing large data sets with a combination of rule-based and stochastic methods is now a standard part of research on language structure. However, while scatter plots, bar and pie charts, and trees (as provided by R, for example) are in common use, novel visual computation techniques have only just begun to be explored. The overall aim of this project is to evaluate whether visual analytics is indeed a methodology that can yield improved results for linguistic research, and to establish metrics for evaluating visual analytics approaches by conducting linguistically motivated case studies on historical data.

Research Questions

What visual variables and representations are most effective for which problems?

Which metrics for evaluation can be established?

Can visual analytic methods yield improved results within linguistic research?

Can we find linguistic patterns/insights we could not have found without visual analytics?

Can we find patterns/insights more quickly with visual analytics than without?

Fig. 1: Glyph visualization of dative subjects and semantic verb classes in Icelandic.


  1. D. Hägele et al., “Uncertainty Visualization: Fundamentals and Recent Developments,” it - Information Technology, vol. 64, no. 4–5, 2022, doi: 10.1515/itit-2022-0033.
  2. H. Booth and C. Beck, “Verb-second and Verb-first in the History of Icelandic,” Journal of Historical Syntax, vol. 5, no. 27, 2021, doi: 10.18148/hs/2021.v5i28.112.
  3. R. Sevastjanova, A.-L. Kalouli, C. Beck, H. Schäfer, and M. El-Assady, “Explaining Contextualization in Language Models using Visual Analytics,” in Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Online, Aug. 2021, pp. 464–476. doi: 10.18653/v1/2021.acl-long.39.
  4. C. Schätzle and M. Butt, “Visual Analytics for Historical Linguistics: Opportunities and Challenges,” Journal of Data Mining and Digital Humanities, 2020, doi: 10.46298/jdmdh.6707.
  5. C. Beck, H. Booth, M. El-Assady, and M. Butt, “Representation Problems in Linguistic Annotations: Ambiguity, Variation, Uncertainty, Error and Bias,” in Proceedings of the 14th Linguistic Annotation Workshop, Barcelona, Spain, Dec. 2020, pp. 60–73. [Online]. Available: https://www.aclweb.org/anthology/2020.law-1.6
  6. C. Beck, “DiaSense at SemEval-2020 Task 1: Modeling Sense Change via Pre-trained BERT Embeddings,” in Proceedings of the Fourteenth Workshop on Semantic Evaluation, Barcelona (online), Dec. 2020, pp. 50–58. [Online]. Available: https://www.aclweb.org/anthology/2020.semeval-1.4
  7. C. Schätzle and H. Booth, “DiaHClust: an Iterative Hierarchical Clustering Approach for Identifying Stages in Language Change,” in Proceedings of the International Workshop on Computational Approaches to Historical Language Change, 2019, pp. 126–135. doi: 10.18653/v1/W19-4716.
  8. C. Schätzle, F. L. Dennig, M. Blumenschein, D. A. Keim, and M. Butt, “Visualizing Linguistic Change as Dimension Interactions,” in Proceedings of the International Workshop on Computational Approaches to Historical Language Change, 2019, pp. 272–278. doi: 10.18653/v1/W19-4734.
  9. H. Booth and C. Schätzle, “The Syntactic Encoding of Information Structure in the History of Icelandic,” in Proceedings of the LFG’19 Conference, 2019, pp. 69–89. [Online]. Available: http://web.stanford.edu/group/cslipublications/cslipublications/LFG/LFG-2019/lfg2019-booth-schaetzle.pdf
  10. C. Schätzle, “Dative Subjects: Historical Change Visualized,” PhD diss., Universität Konstanz, Konstanz, 2018. [Online]. Available: http://nbn-resolving.de/urn:nbn:de:bsz:352-2-1d917i4avuz1a2
  11. A. Hautli-Janisz, C. Rohrdantz, C. Schätzle, A. Stoffel, M. Butt, and D. A. Keim, “Visual Analytics in Diachronic Linguistic Investigations,” Linguistic Visualizations, 2018.
  12. H. Booth, C. Schätzle, K. Börjars, and M. Butt, “Dative Subjects and the Rise of Positional Licensing in Icelandic,” in Proceedings of the LFG’17 Conference, 2017, pp. 104–124. [Online]. Available: http://web.stanford.edu/group/cslipublications/cslipublications/LFG/LFG-2017/lfg2017-bsbb.pdf
  13. C. Schätzle, M. Hund, F. L. Dennig, M. Butt, and D. A. Keim, “HistoBankVis: Detecting Language Change via Data Visualization,” in Proceedings of the NoDaLiDa 2017 Workshop Processing Historical Language, 2017, pp. 32–39. [Online]. Available: https://www.aclweb.org/anthology/W17-0507
  14. C. Schätzle, “Genitiv als Stilmittel in der Novelle,” Scalable Reading. Zeitschrift für Literaturwissenschaft und Linguistik (LiLi), vol. 47, pp. 125–140, 2017, doi: 10.1007/s41244-017-0043-9.
  15. C. Schätzle and D. Sacha, “Visualizing Language Change: Dative Subjects in Icelandic,” in Proceedings of the LREC 2016 Workshop VisLR II: Visualization as Added Value in the Development, Use and Evaluation of Language Resources, 2016, pp. 8–15. [Online]. Available: http://www.lrec-conf.org/proceedings/lrec2016/workshops/LREC2016Workshop-VisLR%20II_Proceedings.pdf
  16. C. Schulz et al., “Generative Data Models for Validation and Evaluation of Visualization Techniques,” in Proceedings of the Workshop on Beyond Time and Errors: Novel Evaluation Methods for Visualization (BELIV), 2016, pp. 112–124. doi: 10.1145/2993901.2993907.