Pre-Conference Workshops


AACL 2018 Pre-Conference Workshops

We are pleased to host three (free!) AACL 2018 Pre-Conference Workshops on Thursday, September 20, 2018. All workshops will be held at the GSU Student Center, 55 Gilmer St., SE, Atlanta, GA, 30303. Room assignments and confirmation of your registration will be announced here or by email soon.

Pre-Conference Workshops Overview

Workshop 1: An Introduction to Natural Language Processing Tools: The Curious Case of the Lexicon – Scott Crossley, Georgia State University and Kristopher Kyle, University of Hawai’i, Manoa [1:00 pm – 5:00 pm, with breaks (Room – TBA)]
Workshop 2: Lexical Multi-Dimensional Analysis – Tony Berber Sardinha, Catholic University of Sao Paulo, Brazil [10 am – 12 noon (Room – TBA)]
Workshop 3: Complexity in Writing Development: Untangling Two Approaches to Measuring Grammatical Complexity – Bethany Gray, Iowa State University; Shelley Staples, University of Arizona; Jesse Egbert, Northern Arizona University [10:00am – 5:00 pm, with a lunch break and afternoon coffee break (on your own)]

YOU NEED TO REGISTER! All interested participants, please send an email to Dr. Viviana Cortes at <dr.viviana.cortes[at]> to register. Slots are limited, and registration will be on a first-come, first-served basis.


WORKSHOP 1: An Introduction to Natural Language Processing Tools: The Curious Case of the Lexicon


Scott Crossley, Georgia State University, scrossley[at]

Kristopher Kyle, University of Hawai’i, Manoa, kkyle[at]


 1:00 pm – 5:00 pm, with breaks (Room – TBA)

 Technical Requirements for the Workshop:

All participants will need to bring their own laptops with the following capabilities:

–       TAALES 2.8 (

–       MS Excel or similar

–       JAMOVI (

Data Requirements

The workshop organizers will provide participants with access to different sets of learner corpora for natural language processing (NLP) analyses. However, it is strongly recommended that participants bring their own learner corpus for analysis (BYOC). Participants who bring their own corpus should ensure that:

  1. The corpus is separated into individual files wherein each file represents output from a single learner at a single time point.
  2. Each file is saved in plain text format (.txt).
  3. Each file has at least 50 words of learner production (the more the better, though). For the purposes of this workshop, the learner samples should be spell-corrected if possible.
  4. The corpus is large enough to be representative (at minimum 100 samples).
  5. Each file has a dependent variable of interest (i.e., something to predict, such as proficiency/grade level, length of study, demographic information, individual differences, or test performance).
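The requirements above that can be checked mechanically (items 2 to 4) could be verified before the workshop with a short script. The following is only a minimal sketch in Python, not a tool provided by the organizers: the function name `check_corpus` is hypothetical, and counting whitespace-separated tokens as "words" is an assumption. Items 1 and 5 still need to be checked by hand.

```python
from pathlib import Path

MIN_WORDS = 50      # per-file minimum (item 3 above)
MIN_SAMPLES = 100   # corpus-level minimum (item 4 above)


def check_corpus(folder, min_words=MIN_WORDS, min_samples=MIN_SAMPLES):
    """Return a list of human-readable problems found in a corpus folder."""
    problems = []
    files = sorted(p for p in Path(folder).iterdir() if p.is_file())
    txt_files = [p for p in files if p.suffix.lower() == ".txt"]

    # Item 2: every file should be saved as plain text (.txt).
    for path in files:
        if path not in txt_files:
            problems.append(f"{path.name}: not a .txt file")

    # Item 3: at least 50 words per file.
    # Assumption: a "word" is any whitespace-separated token.
    for path in txt_files:
        text = path.read_text(encoding="utf-8", errors="replace")
        n_words = len(text.split())
        if n_words < min_words:
            problems.append(f"{path.name}: only {n_words} words")

    # Item 4: at least 100 samples in the corpus.
    if len(txt_files) < min_samples:
        problems.append(
            f"corpus has {len(txt_files)} .txt files, fewer than {min_samples}"
        )

    return problems
```

Calling `check_corpus("my_corpus")` on a folder of learner texts (the folder name here is a placeholder) returns an empty list when the three checks pass.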

 Workshop Description

This workshop will focus on introducing participants to the basic notions underlying natural language processing tools and how they can be used in second language research. While the workshop will provide an overview of NLP tools and language features commonly assessed using NLP tools, this workshop will focus on using NLP tools to examine lexical sophistication. No computer science background is needed to join the workshop, but participants should be familiar with data analytics, data formatting, and basic statistical analyses.

The workshop will be divided into the following sections:

  1. Overview of NLP and NLP research in second language settings
  2. Available NLP tools (focusing on those that are open-source)
  3. Common methods and pitfalls in NLP analyses
  4. An introduction to the Tool for the Automatic Analysis of Lexical Sophistication (TAALES)
  5. Data analysis

WORKSHOP 2: Lexical Multi-Dimensional Analysis


Tony Berber Sardinha, Catholic University of Sao Paulo, tonycopuslg[at]


 10am to 12 noon  (Room – TBA)

Workshop Description

Multi-dimensional analysis is an approach to the study of register variation introduced by Douglas Biber in the 1980s (Biber 1988; Berber Sardinha and Veirano Pinto, 2014; in press). Biber designed the MD analysis framework in order to capture the parameters underlying register variation, which are called dimensions. In the original framework, a dimension is a set of correlated linguistic features, captured through a factor analysis, that is interpreted in functional terms. In such functional MD analyses, the linguistic features in question are typically structural features, such as parts of speech (nouns, adjectives, personal pronouns, etc.), clause types, and stance constructions. In contrast, in a lexical MD analysis, the features are entirely lexical, such as the actual tokens, lemmata, n-grams, or collocations present in the texts. The resulting lexical dimensions can reflect a range of linguistic phenomena enacted by the lexis, such as the prevailing discourses, the cultural representations, and the major topics discussed in the texts, among others. In this workshop, we will cover such topics as how to sample and organize data for a lexical MD analysis, how to conduct factor analyses based on lexical data, and how to interpret lexis-based factors as lexical dimensions.
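To give a rough idea of what organizing lexical data for a factor analysis might involve, the sketch below builds a text-by-word table of frequencies. It is an illustration under stated assumptions, not the procedure taught in the workshop: the function names are hypothetical, the tokenizer is deliberately crude, and normalizing counts per 1,000 words and keeping only the most frequent words as features are assumptions made here. The factor analysis itself is not shown.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into word tokens (a crude simplification)."""
    return re.findall(r"[a-z']+", text.lower())


def lexical_feature_table(texts, n_features=5, per=1000):
    """Build a text-by-word table of normalized frequencies.

    texts: dict mapping a text id to its raw string.
    Returns (features, rows): features is the list of the n_features most
    frequent words across all texts; rows maps each text id to that text's
    frequencies per `per` words, in the same order as features.
    """
    counts = {tid: Counter(tokenize(t)) for tid, t in texts.items()}

    # Pool the counts to find the most frequent words in the whole corpus.
    totals = Counter()
    for c in counts.values():
        totals.update(c)

    # Sort by descending frequency, then alphabetically, so ties are stable.
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    features = [word for word, _ in ranked[:n_features]]

    rows = {}
    for tid, c in counts.items():
        length = sum(c.values())
        rows[tid] = [c[w] * per / length if length else 0.0 for w in features]
    return features, rows
```

Each row of the resulting table describes one text and each column one lexical feature, which is the general shape of input a factor analysis expects.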

Technical Requirements for the Workshop

In preparation for the workshop, please have a working copy of SPSS installed on your laptop. A trial version is available for free at Please note that this trial version will work for 14 days only, so attendees should time the installation so that it is still working during the workshop. In addition, please download data files and other support materials at


Biber, D. (1988). Variation across speech and writing. Cambridge: Cambridge University Press.

Berber Sardinha, T., & Veirano Pinto, M. (Eds.). (2014). Multi-Dimensional Analysis, 25 years on: A Tribute to Douglas Biber. Amsterdam/Philadelphia, PA: John Benjamins.

Berber Sardinha, T., & Veirano Pinto, M. (Eds.). (in press). Multi-dimensional analysis: Research methods and current issues. London / New York: Bloomsbury / Continuum.


2018 American Association for Corpus Linguistics (AACL) Conference

Atlanta, GA

WORKSHOP 3: Complexity in Writing Development: Untangling Two Approaches to Measuring Grammatical Complexity


Bethany Gray, Iowa State University, begray[at]

Shelley Staples, University of Arizona, slstaples[at]

Jesse Egbert, Northern Arizona University, jesse.egbert[at]

Workshop Time

10:00am – 5:00 pm, with a lunch break and afternoon coffee break (on your own)

 Technical Requirements for the Workshop

All participants will need to bring their own laptops with the following capabilities:

–       AntConc 3.5.7 or newer

–       MS Excel or similar

–       Text editor (e.g., Notepad or WordPad)

 Workshop Description

 Linguistic complexity is often investigated with respect to language development, based on the assumption that more advanced L1/L2 writers use more complex language and produce more complex texts. But what constitutes complex language? How do we operationalize complexity to measure it in language production? How are approaches to complexity similar and different? How is complexity mediated by proficiency or level, register or genre, and other contextual factors?

This workshop focuses on these questions for one type of complexity – grammatical complexity – as it relates to L1 and L2 writing development. The goal of the workshop is to explore how fundamentally distinct measures approach the same underlying construct, to gain practice analyzing grammatical complexity in written texts, and to see a selection of complexity variables applied in research on writing development. The workshop begins with a brief comparison of two major approaches to grammatical complexity: the holistic (T-Unit) approach and the register/functional approach. Then, the workshop is divided into three parts:

Part 1: Hands-On, Practice-Oriented Session in Coding and Compiling Complexity Variables

Part 2: Research Synthesis on the Development of Complexity in Academic Writing

Part 3: Roundtable Discussion

Part 1 is a hands-on, practice-based session in which participants will use manual and automatic corpus tools to analyze authentic texts for a range of complexity measures from both the holistic (T-Unit) and register/functional traditions. Participants will code and annotate complexity features and then quantify their coding using automatic procedures with AntConc 3.4.4 (Anthony, 2014). Issues such as reliability (precision, recall) and interrater reliability will be addressed.
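For readers unfamiliar with the measures mentioned in Part 1, the following minimal Python sketch shows how precision and recall can be computed when automatic coding is compared against manual coding, and how agreement between two coders can be summarized. The function names are hypothetical, and the use of Cohen's kappa for interrater reliability is an assumption made for illustration; the workshop description does not say which statistic will be used.

```python
from collections import Counter


def precision_recall(gold, predicted):
    """Precision and recall of automatic coding against manual (gold) coding.

    Both arguments are collections of item identifiers (e.g. token positions)
    that were marked as instances of the feature.
    """
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    # Precision: share of automatically marked items that are correct.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: share of manually marked items that were found automatically.
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders who labelled the same items in the same order."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same non-empty list of items")
    n = len(coder_a)
    # Observed agreement: proportion of items given the same label.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Expected chance agreement, from each coder's label frequencies.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used one identical label throughout
    return (observed - expected) / (1 - expected)
```

For example, if manual coding marks items 1 to 4 and the automatic procedure marks items 2 to 6, three of the five automatic hits are correct (precision 0.6) and three of the four manual items are found (recall 0.75).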

 Part 2 is a research synthesis of recent work by the workshop organizers and colleagues on the development of grammatical complexity in academic writing, focusing on studies within the register/functional tradition.

 Part 3 provides an opportunity for extended discussion between workshop participants on the issues practiced and discussed in Parts 1 and 2.