The reading list

Research

The research that informs Project Verbatim: articles we have found interesting

Cognitive / Analytical Depth

Semantic similarity analysis using transformer-based sentence embeddings

Pavlyshenko & Stasiuk · Electronics and Information Technologies · 2025

Evaluates transformer-based sentence-embedding models on semantic-similarity tasks, weighing accuracy against processing speed to guide the choice of model (MPNet, for example) for English text analysis.

Psychometric evaluation of lexical diversity indices: Assessing length effects

Fergadiotis, Wright & Green · J. Speech, Language & Hearing Research · 2015

A psychometric comparison of lexical-diversity indices in adult speech that quantifies how sample length biases each measure and supports length-robust indices for valid diversity estimates across speakers.
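
To make the length effect concrete, here is a minimal sketch contrasting the plain type-token ratio (TTR) with a moving-average type-token ratio (MATTR). MATTR is used only as one commonly cited example of a length-robust index; the summary above does not say which indices the paper favors, and the function names and window size are illustrative assumptions.

```python
def ttr(tokens):
    """Plain type-token ratio: distinct words divided by total words."""
    if not tokens:
        raise ValueError("tokens must not be empty")
    return len(set(tokens)) / len(tokens)


def mattr(tokens, window=50):
    """Moving-average TTR: mean TTR over every window of fixed size.

    Falls back to plain TTR when the sample is shorter than the window.
    The default window of 50 is an illustrative choice, not a recommendation.
    """
    if not tokens:
        raise ValueError("tokens must not be empty")
    if len(tokens) <= window:
        return ttr(tokens)
    ratios = [
        len(set(tokens[i:i + window])) / window
        for i in range(len(tokens) - window + 1)
    ]
    return sum(ratios) / len(ratios)


# Repeating the same four words makes the sample longer without making it richer.
base = ["the", "cat", "sat", "down"]
longer = base * 3

short_ttr, long_ttr = ttr(base), ttr(longer)
short_mattr, long_mattr = mattr(base, window=4), mattr(longer, window=4)
```

In this toy case plain TTR falls as the sample grows while the windowed measure stays put, which is the kind of length bias the paper quantifies.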

Coh-Metrix: Capturing linguistic features of cohesion

McNamara, Louwerse, McCarthy & Graesser · Discourse Processes · 2010

Introduces Coh-Metrix, a tool computing hundreds of cohesion and readability indices such as referential overlap, connectives and semantic similarity, and validates them against text comprehension and difficulty.

Automatic analysis of syntactic complexity in second language writing

Lu · Int. J. Corpus Linguistics (L2SCA) · 2010

Presents the L2 Syntactic Complexity Analyzer, which automatically computes fourteen measures of syntactic complexity, such as clause length, subordination and coordination, for second-language writing research.

Variation in the contextuality of language: An empirical measure

Heylighen & Dewaele · Foundations of Science · 2002

Proposes the F-score (a formality score, not the classification F-score), a part-of-speech formula gauging how formal versus context-dependent a text is, and demonstrates that it separates spoken from written language across many registers.
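
As a rough sketch, the F-score is usually stated as F = (noun + adjective + preposition + article - pronoun - verb - adverb - interjection + 100) / 2, where each term is that category's frequency as a percentage of all words. The code below assumes part-of-speech counts have already been produced by some tagger; the tag names and the function name are hypothetical.

```python
FORMAL_TAGS = ("noun", "adjective", "preposition", "article")
DEICTIC_TAGS = ("pronoun", "verb", "adverb", "interjection")


def formality_score(pos_counts, total_words):
    """Compute the formality F-score from part-of-speech counts.

    pos_counts maps a tag name (hypothetical labels, see the tuples above)
    to the number of words carrying that tag; total_words is the number of
    words in the text, including words in categories the formula ignores.
    """
    if total_words <= 0:
        raise ValueError("total_words must be positive")

    def pct(tag):
        # Frequency of the category as a percentage of all words.
        return 100.0 * pos_counts.get(tag, 0) / total_words

    formal = sum(pct(tag) for tag in FORMAL_TAGS)
    deictic = sum(pct(tag) for tag in DEICTIC_TAGS)
    return (formal - deictic + 100.0) / 2.0


# Made-up counts for a 100-word text (5 words fall in other categories).
example_counts = {
    "noun": 30, "adjective": 10, "preposition": 15, "article": 10,
    "pronoun": 10, "verb": 15, "adverb": 5, "interjection": 0,
}
example_score = formality_score(example_counts, total_words=100)
```

The score runs from 0 to 100, with higher values indicating a more formal, less context-dependent text.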

Language decline across the life span: Findings from the Nun Study

Kemper, Greiner, Marquis, Prenovost & Mitzner · Psychology and Aging · 2001

A follow-up Nun Study analysis charting how grammatical complexity and idea density change across the lifespan, and confirming that early-life linguistic ability forecasts later cognitive decline, supporting density as a marker of cognitive reserve.

Grammatical structures written at three grade levels

Hunt · NCTE Research Report No. 3 · 1965

Foundational work introducing the T-unit (minimal terminable unit), a main clause plus any subordinate clauses attached to it, as a measure of syntactic maturity, showing that clause length and degree of embedding both rise steadily with grade level.

Agentic Drive

We ask men to win and women not to lose: Closing the gender gap in startup funding

Kanze, Huang, Conley & Higgins · Academy of Management Journal · 2018

At pitch competitions, investors asked men promotion-focused questions and women prevention-focused questions; founders mirrored that framing, and promotion-focused answers attracted far more capital, a mechanism behind the gender funding gap.

Identifying locus of control in social media language

Rouhizadeh, Jaidka, Smith, Schwartz, Buffone & Ungar · EMNLP · 2018

Classifies locus of control from annotated Facebook posts against questionnaire scores; control is easy to detect, but labeling it internal versus external is harder, with lexical features outperforming syntactic ones.

Narrative identity

McAdams & McLean · Current Directions in Psychological Science · 2013

A concise overview of narrative identity, the internalized and evolving story people tell about themselves, and how the way they narrate experiences relates to wellbeing, maturity and development over time.

The psychology of life stories

McAdams · Review of General Psychology · 2001

Sets out narrative identity theory: people build evolving life stories whose themes, notably agency (mastery, achievement) and communion, reflect personality and predict psychological adjustment.

Beyond pleasure and pain

Higgins · American Psychologist · 1997

The foundational statement of regulatory focus theory: a promotion focus pursues gains and ideals while a prevention focus avoids losses and seeks security, two motivational systems with distinct strategies and risk profiles.

CAVE: Content Analysis of Verbatim Explanations

Peterson, Schulman, Castellon & Seligman · Cambridge UP (book chapter) · 1992

Describes CAVE, a method to extract and rate the causal attributions people give in any text, enabling explanatory-style (optimism versus pessimism) coding without administering questionnaires.

Communicative Effectiveness

Non-answers during conference calls

Gow, Larcker & Zakolyukina · J. Accounting Research · 2021

Builds a measure flagging managers' non-answers to analyst questions on earnings calls; about eleven percent of questions go unanswered, and higher non-answer rates predict negative market reactions and weaker future performance.

Pronoun use reflects standings in social hierarchies

Kacewicz, Pennebaker, Davis, Jeon & Graesser · J. Language & Social Psychology · 2014

Across several datasets, shows that higher-status people use fewer first-person-singular pronouns (I, me) and more first-person-plural and second-person pronouns (we, you), so pronoun use reliably signals relative standing within a hierarchy.

Detecting deceptive discussions in conference calls

Larcker & Zakolyukina · J. Accounting Research · 2012

Identifies linguistic markers, such as extreme positive emotion, few hesitations and vague impersonal references, that distinguish deceptive from truthful executive statements on earnings calls better than chance.

Can charisma be taught? Tests of two interventions

Antonakis, Fenley & Liechti · Academy of Management Learning & Education · 2011

Two randomized experiments show that training people to deploy charismatic leadership tactics, such as metaphors, stories, three-part lists and contrasts, raises how charismatic and influential observers rate them.

Language style matching predicts relationship initiation and stability

Ireland, Slatcher, Eastwick, Scissors, Finkel & Pennebaker · Psychological Science · 2011

Using speed-dating and instant-messaging data, shows that language style matching between two people predicts mutual attraction, relationship initiation and later stability.
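
A minimal sketch of how a style-matching score of this kind is commonly computed: for each function-word category, take 1 - |a - b| / (a + b + 0.0001), where a and b are the two speakers' usage rates, then average across categories. The category names, input format and function name below are assumptions for illustration; counting function words is assumed to happen elsewhere.

```python
def style_matching(rates_a, rates_b, eps=0.0001):
    """Average per-category matching between two speakers.

    rates_a and rates_b map a function-word category (hypothetical names)
    to that speaker's usage rate, e.g. percentage of total words.
    Only categories present for both speakers are compared.
    """
    categories = sorted(rates_a.keys() & rates_b.keys())
    if not categories:
        raise ValueError("no shared categories to compare")
    scores = []
    for cat in categories:
        a, b = rates_a[cat], rates_b[cat]
        # eps keeps the denominator non-zero when neither speaker uses the category.
        scores.append(1.0 - abs(a - b) / (a + b + eps))
    return sum(scores) / len(scores)


# Made-up rates for two speakers.
speaker_a = {"personal_pronouns": 10.0, "articles": 5.0, "prepositions": 12.0}
speaker_b = {"personal_pronouns": 10.0, "articles": 5.0, "prepositions": 12.0}
identical_score = style_matching(speaker_a, speaker_b)

# One speaker never uses the category: the score collapses toward 0.
mismatch_score = style_matching({"articles": 10.0}, {"articles": 0.0})
```

Scores near 1 mean the two speakers use function words at similar rates; scores near 0 mean their usage diverges.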

The nature and experience of entrepreneurial passion

Cardon, Wincent, Singh & Drnovsek · Academy of Management Review · 2009

A theory paper defining entrepreneurial passion as intense positive feeling tied to identities such as inventor, founder and developer, and explaining how it energizes goal pursuit, creativity and persistence.

Imageability ratings for 3,000 monosyllabic words

Cortese & Fugett · Behavior Research Methods, Instruments & Computers · 2004

Provides imageability norms, ratings of how readily a word evokes a mental image, for 3,000 monosyllabic English words, a reference resource widely reused in psycholinguistic, reading and memory research.

Linguistic style matching in social interaction

Niederhoffer & Pennebaker · J. Language & Social Psychology · 2002

Documents linguistic style matching, the way conversation partners unconsciously converge in their function-word use, and links the degree of matching to engagement and social coordination.

The role of transportation in the persuasiveness of public narratives

Green & Brock · J. Personality & Social Psychology · 2000

Introduces narrative transportation: the more a reader is absorbed into a story, the more their beliefs and attitudes shift toward it, establishing absorption as a core engine of narrative persuasion.

Cross-cutting / methods

The Glasgow Norms: Ratings of 5,500 words on nine scales

Scott, Keitel, Becirspahic, Yao & Sereno · Behavior Research Methods · 2019

Offers the Glasgow Norms: ratings of 5,500 words on nine psycholinguistic dimensions including imageability, concreteness, valence and arousal, with analysis of how the dimensions interrelate.

Empath: Understanding topic signals in large-scale text

Fast, Chen & Bernstein · CHI · 2016

Introduces Empath, a tool that generates and validates hundreds of lexical categories from deep-learning word embeddings, closely approximating hand-built dictionaries like LIWC at far lower cost.

Variation across Speech and Writing

Biber · Cambridge UP (book) · 1988

A landmark multidimensional analysis of register: tracking co-occurring linguistic features across many texts, it identifies dimensions like involved-versus-informational and shows speech and writing differ systematically.

Empirical foundations

Improving Startup Success with Text Analysis

Gavrilenko et al. · arXiv:2312.06236 · 2023

Expands public-data startup prediction to 171 mostly textual features, including linguistic metrics drawn from tweets, across ten models, forecasting funding with F-scores above 0.73 and beating prior approaches.

Adjacent (calibration)

Superforecasting: The Art and Science of Prediction

Tetlock & Gardner · Crown (book) · 2015

A popular account of Tetlock's forecasting tournaments: the best forecasters are not the smartest but those who think probabilistically, update on evidence and stay actively open-minded, directly relevant to epistemic calibration.

On-topic (not in notes)

A Fused Large Language Model for Predicting Startup Success

Maarouf et al. · arXiv:2409.03668 · 2024

Builds a fused large language model that combines founders' venture-platform self-descriptions with structured features to predict startup success, finding the text adds meaningful signal beyond fundamentals.