xpmethod | Group for experimental methods in the humanities

Health Language Lab

project

Elena Fratto
Rishi Goyal
Arden Hegele
Emily Madison
Dennis Yi Tenen

Vaccine hesitancy is not simply a matter of ignorance. Communities around the country are reluctant to vaccinate for all sorts of reasons: personal, religious, political, medical. By studying the language of vaccine-related conversations online through computational analysis, our team of data and language researchers is revealing the deep-seated causes of vaccine hesitancy, with the hope of improving vaccine messaging and ultimately increasing uptake.

Language matters. When declaring a “war on drugs,” for example, one should not be surprised if the treatment of addiction becomes militarized, further involving the use of excessive force in the policing of non-violent offences. Similarly, the framing of vaccine hesitancy in terms of ignorance implies an uneducated public, alienating those who have real concerns about vaccination: in its compliance with Halal... read more →

Possessed by Property

research paper

Beth Cortese
Julie Hastrup-Markussen
Ross Deans Kristensen-McLachlan
Jakob Ladegaard
Dennis Yi Tenen

And now, with regard to the worldly matters which I shall die possessed of, as well as to those which of right appertain to me, either by the will of my said grandfather, or otherwise; thus do I dispose of them. – Samuel Richardson: Clarissa, Vol. 9, letter XXXIII.

The reading of the grandfather’s will in Samuel Richardson’s Clarissa sets in motion a crisis of ownership in which Clarissa’s “father’s living will” seeks to control her “grandfather’s dead one” (Vol. 1, letter XLIV). Clarissa becomes estranged from her family when she inherits an estate from her grandfather, who thereby bypasses her father and uncles as well as her siblings. The novel ends with Clarissa’s will, in which her property, guilt, and moral justice are distributed after her death. Bracketed by these two last wills, the novel can be read as a sustained reflection on the relationship between possession of property, inheritance, and agency.

In a joint effort between Columbia English Department’s Literary Modeling and Visualization Lab and the Unearned Wealth research project at the Department of Comparative Literature, Aarhus University, Denmark,... read more →

Distributed Agency in the Novel

research paper

Dennis Yi Tenen

In this paper I discuss the question of institutional agency more narrowly, on the basis of a literary genre principally concerned with trans-human, organizational actors. The readings will occasion a model of agency more broadly, which besides its exploratory, theoretical potential will find its application in a method for extracting literary characters. State-of-the-art methods for detecting literary characters often rely on features such as named entities (e.g. Heathcliff), gender attributes, and evidence of direct speech or sentience.¹ The house in Bleak House (1852–1853) by Charles Dickens, the wheat and the Railroad Commission in The Octopus (1901) by Frank Norris, and the airport in Arthur Hailey’s Airport (1968) are not characters by these measures. Yet we intuit them to act vitally and to exert an almost hypnotic influence on the action of the novel: “a strange beast that pertains to no one in particular and who... read more →

Computational Archaeology of Fictional Space

research paper

Dennis Yi Tenen

Space is a hard thing to pin down. It identifies dimensional continuity and a topography, that is, a relationship between objects. It is also itself an object: a limit-defining quantity even in its most abstract sense. “O God, I could be bounded in a nutshell and count myself a king of infinite space, were it not that I have bad dreams,” Hamlet says of his ambition and his dreams. A human palm can be a part of the body or a map. A mirror is a piece of furniture and a frame for reflection. Under extreme magnification, the head of a pin appears a vast and mountainous terrain, home to angels and bacterial detritus. The characterization of diegetic (let us call it also virtual and fictional) space presents further difficulties. A stretch of land in fiction measures also a stretch of the imagination. These units do not always have names or explicit boundaries. Vladimir and Estragon wait for Godot: “A country road. A tree.” Two vectors are enough to situate the world. A road gives us the X and a tree the Y axis: an infinity in a... read more →

An Inquiry into the Creative Limits of Artificial Intelligence

book

Dennis Yi Tenen

Literary theory can no more ignore the output of artificially intelligent agents than the study of labor can ignore the advent of robotics.

An Inquiry Concerning the Creative Limits of Artificial Intelligence is a book about the automation of labor in the literary sphere, as told through the story of writers’ aids past and present—narrative plotters, spell checkers, and language generators—the “shameful little secrets” of mass literary production.

In subsequent chapters I bring to view a number of algorithmic artifacts vital to the advance of artificial intelligence, on a spectrum from quasi-autonomous heuristics for combinatorial composition—style guides and “canned” literary formulae—to fully-autonomous bots—of the sort used to manufacture junk mail and disinformation campaigns on social media.

read more →

Text Divider: Quick Markup for Chapter and Dialogue Splitting

python script

Moacir P. de Sá Pereira

This Python script breaks up a text into its internal sections. It uses a light markup scheme to signal where chapters and sections begin, and it can also keep track of dialogue by speaker. Given an electronic version of The Great Gatsby, for example, after the markup, it is possible to extract only Tom Buchanan’s lines.

The markup that breaks out the sections and dialogue was created by David Hoover, though the entirety of Prof. Hoover’s markup scheme has not been implemented here.

Read the README at... read more →

Epigraphing the 19th Century

experiment

Aaron Plasek

Frequently ignored and occasionally made up, the epigraph is a textual genre defined both by its physical placement on the page and by the absence of the textual object being signposted. An epigraph attribution situates the text it prefaces within a larger constellation of texts and authors, and in this manner has an indexical function rather similar to scanning the spines of books on a shelf, flipping through a card catalog, or examining a record in a digital relational database. The affordances of citation networks cannot replace other critical methods, but a comparative approach to the different kinds of citation practices made visible by different networks of attribution provides an opportunity to reconsider how shared concepts that constitute a (disciplinary) field are produced,... read more →

Semantic Analysis of One Million #GamerGate Tweets

semantic analysis

Phillip R. Polefrone

This paper develops a methodology for describing the contents of a controversy on a microblogging platform (Twitter) by measuring correlations in broad semantic categories. Over one million tweets were gathered daily from November 2015 to June 2016 using Tweepy and the Twitter API, over 280,000 of which were not retweets and thus contained unique data. Using a Python implementation of Roget’s hierarchy of semantic categories, these tweets were collected in bins of one thousand and analyzed using a “bag of categories” model, or a categorized bag of words. The linear correlation of each category with the “WOMAN” category was measured and compared with a control group. The categories concomitant with “WOMAN” in the test corpus include some noise, but as a whole they present a meaningful description of the conversation that adheres to its known qualities. This result suggests that a more developed version of this methodology could be used... read more →
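The binning and correlation steps described in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: `TOY_LEXICON` is an invented stand-in for Roget's hierarchy, the function names are hypothetical, and whitespace tokenization is a simplification.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy lexicon mapping words to broad semantic categories.
# The entries are invented for illustration; they are not Roget's.
TOY_LEXICON = {
    "she": "WOMAN", "her": "WOMAN", "woman": "WOMAN",
    "attack": "CONTENTION", "fight": "CONTENTION",
    "game": "AMUSEMENT", "play": "AMUSEMENT",
}

def bag_of_categories(texts, lexicon):
    """Count how often each category occurs across one bin of texts."""
    counts = Counter()
    for text in texts:
        for word in text.lower().split():
            category = lexicon.get(word)
            if category is not None:
                counts[category] += 1
    return counts

def bin_texts(texts, bin_size):
    """Split texts into consecutive bins (the last bin may be shorter)."""
    return [texts[i:i + bin_size] for i in range(0, len(texts), bin_size)]

def pearson(xs, ys):
    """Pearson linear correlation; 0.0 if either series is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)

def correlate_with(target, texts, lexicon, bin_size):
    """Correlate every other category's per-bin count with the target's."""
    bags = [bag_of_categories(b, lexicon) for b in bin_texts(texts, bin_size)]
    categories = {c for bag in bags for c in bag} - {target}
    target_series = [bag[target] for bag in bags]
    return {c: pearson(target_series, [bag[c] for bag in bags])
            for c in categories}
```

Calling `correlate_with("WOMAN", tweets, TOY_LEXICON, 1000)` on a list of tweet strings would return one correlation per category; comparing those values against the same measure on a control corpus is the step the abstract describes next.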

Shape of Time

visualization

Sierra Eckert
Allison Chaney

In a novel, time does not often move evenly or linearly: a single paragraph in García Márquez’s One Hundred Years of Solitude jumps several decades, while in Proust’s In Search of Lost Time, 15 pages are devoted to a single moment of eating a madeleine. In this project, we are interested in the kind of language that is used to talk about time, and in the shape and tempo of this language in a given text. Tracking what we call the “time signature” of a text, we use explicit references to time passing in order to divide up a text, and then use... read more →

Science Surveyor

network analysis

William Leif Hamilton (Stanford)
Raine Hoover (Stanford)
Marguerite Y. Holloway
Dan Jurafsky (Stanford)
David Jurgens (Stanford)
Laura Kurgan (Center for Spatial Research)
Minkyoung Kim (Stanford)
Eli Bennett Levin
Dan McFarland (Stanford)
Vinodkumar Prabhakaran (Stanford)
Phillip R. Polefrone
Juan Francisco Saldarriaga (Center for Spatial Research)
Dennis Yi Tenen

One of the biggest challenges facing science journalists is quickly contextualizing the journal articles they are reporting on under deadline. A science reporter must rapidly get a sense of what has come before a new paper in the field, understand whether the paper represents a significant advance or not, and establish whether this finding is an outlier or part of the field’s consensus. Doing all that within a matter of hours or a few days is often impossible. The consequences of these limitations are serious and well documented. Science journalists are often overly dependent on expert sources, which encourages investigative complacency; they become vulnerable to presenting false balance and to covering articles that will be retracted; they sensationalize. As a consequence, the public often receives a mistaken view of science. Many people see science as a series of great new “discoveries” accompanied by a lot of hype; few understand... read more →

Character Networks for Narrative Generation

article

Graham Alexander Sack

This paper models narrative as a complex adaptive system in which the temporal sequence of events constituting a story emerges out of cascading local interactions between nodes in a social network. The approach is not intended as a general theory of narrative, but rather as a particular generative mechanism relevant to several academic communities: (1) literary critics and narrative theorists interested in new models for narrative analysis, (2) artificial intelligence researchers and video game designers interested in new mechanisms for narrative generation, and (3) complex systems theorists interested in novel applications of agent-based modeling and network theory. The paper is divided into two parts. The first part offers examples of research by literary critics on the relationship between social networks of fictional characters and the structure of long-form narratives, particularly novels. The second part provides an example of schematic story generation based on a simulation of the structural balance network model. I will argue that if literary critics can better understand sophisticated narratives by extracting networks from them, then narrative intelligence researchers can benefit by inverting the process, that is, by generating narratives from... read more →
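To give a rough sense of how a structural balance simulation might yield a sequence of story events, here is a minimal sketch. It is not the paper's model: the update rule (flip one randomly chosen tie in a randomly chosen unbalanced triad), the function names, and the event phrasing are all assumptions made for illustration. A triad counts as balanced when the product of its three tie signs (+1 for friendship, -1 for enmity) is positive.

```python
import random
from itertools import combinations

def is_balanced(signs, triad):
    """A triad is balanced when the product of its three tie signs is positive."""
    a, b, c = triad
    product = (signs[frozenset((a, b))]
               * signs[frozenset((b, c))]
               * signs[frozenset((a, c))])
    return product > 0

def unbalanced_triads(signs, characters):
    """List every triple of characters whose ties are currently unbalanced."""
    return [t for t in combinations(characters, 3) if not is_balanced(signs, t)]

def generate_events(characters, signs, rng, max_steps=100):
    """Flip one tie per step in a random unbalanced triad; log each flip as an event.

    Stops when no unbalanced triad remains or after max_steps flips.
    Returns the event list and the final tie signs.
    """
    signs = dict(signs)  # work on a copy so the caller's network is untouched
    events = []
    for _ in range(max_steps):
        tense = unbalanced_triads(signs, characters)
        if not tense:
            break
        triad = rng.choice(tense)
        pair = frozenset(rng.sample(triad, 2))
        signs[pair] = -signs[pair]
        a, b = sorted(pair)
        verb = "befriends" if signs[pair] > 0 else "falls out with"
        events.append(f"{a} {verb} {b}")
    return events, signs
```

With three hypothetical characters whose ties are given as a dictionary keyed by `frozenset` pairs, a single flip is enough to resolve the one triad; in larger networks a flip can unbalance neighboring triads, which is what produces a cascade of events.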

Visualizing Joyce

graphic

Emily Fuhrman

In reference to schemas for Ulysses, Joyce describes the compositional technique behind the “Sirens” episode as a “fugue with all musical notations,”¹ and as including the “eight regular parts of a fuga per canonem.”² Joyce uses the first 63 lines of the chapter to introduce 99 words and syllables that reappear in different forms throughout the rest of the text. The sounds ultimately act as leitmotifs, evoking the sensory presence of different characters at different times.

This visualization is constructed as a line-by-line annotation of each sound that recurs at least four times following its... read more →

Roget Tools

toolkit

Phillip R. Polefrone

Following the work of Klingenstein, Hitchcock, and DeDeo (2014) on the “Old Bailey” records,¹ Roget Tools is a Python class for tracking broad semantic categories through bodies of text using the top-down hierarchical structure of Peter Mark Roget’s Thesaurus.² This hierarchy is a comprehensive and unbroken network encompassing all of Roget’s original thesaurus categories, and importing it into a Python-readable format achieves two goals. First, it enables the body of research on Roget’s thesaurus to be incorporated into automated text analysis, thus providing a basis for stable interpretation of quantitative results. Second, it... read more →

Plain Text

book

Dennis Yi Tenen

▁▁▁▂▄▄▅▅▆▅▆▇▇█ 93378 words on 2016-03-10

While I write these introductory remarks, a ceiling-mounted smoke detector in my kitchen emits a loud noise every three minutes or so. A pleasant female voice announces also “low battery.” This is, I learn, a precaution stipulated by US National Fire Alarm Code 72-108 11.6.6 (2013). The clause requiring a “distinct audible signal before the battery is incapable of operating” is encoded into the device. The smoke detector literally embodies that piece of legislation in its circuitry. We thus obtain a condition where two meanings of code—as governance and machine instruction—coincide. Code equals code.

I am at home, but I also receive a notification of the alarm on my mobile phone. Along with monitoring apps that help make my home “smarter,” the phone contains most... read more →

LITclock

twitter-bot

Dennis Yi Tenen
Susana Zialcita

This project was inspired by Christian Marclay’s The Clock.¹ Each minute, the LITclock Twitter handle will tweet one minute in time from a novel or narrative non-fiction book. (Occasionally, a travel guide chimes in.) Each tweet will be a quote from a book, describing what is happening in that very minute.

For example, the LIT CLOCK started with a quote from Christopher Marlowe, at precisely 12:00 am on 3/13/14:

The clock striketh twelve O it strikes, it strikes! Now body, turn to air

and then thirteen hours and nine minutes later, the LIT CLOCK told us that Miriam Wu from Stieg Larsson’s Millennium Trilogy is being interviewed:

“The time is 1:09 pm.” She turned off tape recorder.

Our goal was to create, as Zadie Smith said of Marclay’s clock, “thousands of fictional interpretations of time repurposed to... read more →