
There are seven types of pronouns that both native English writers and writers of English as a second language must recognize: the personal pronoun, the demonstrative pronoun, the interrogative pronoun, the relative pronoun, the indefinite pronoun, the reflexive pronoun, and the intensive pronoun.
CHAPTER 10
Word Sleuthing
In studying words, I have frequently been asked to analyze language to answer questions that I would never have considered. Lawyers, historians, music lovers, political consultants, educators, intelligence agents, and others have occasionally contacted me to see if our language approach could give them a different perspective on a problem they have been thinking about.
This chapter brings together some of the more interesting projects my students and I have been playing with over the years. The topics vary quite a bit. Nevertheless, they showcase different ways words can be analyzed to answer novel questions.
USING WORDS TO IDENTIFY AUTHORS
The phone call I received from the senior partner in a law firm caught me off guard. He wanted to know whether I could analyze an e-mail that had been sent to a member of his firm; let’s call her Ms. Livingston. It was quite sensitive, he confided, and it was important that he talk directly with the person who had sent the e-mail. The only problem was that the e-mail had been sent anonymously from an untraceable e-mail address. After I agreed to look at it, he sent me the following e-mail:
Ms. Livingston:
I think you should know that David Simpson has perpetuated the idea that you have no credibility among your colleagues. He says you altered depositions and falsified expense reports at your last job in New York. He says this is the reason you left so abruptly.
He has spread these stories to people in various departments, including Billing, Personnel, Public Relations and to those at the executive level. It is uncertain how and when our senior partners will deal with this. But if you start getting the cold shoulder, you will know why.
When I first heard of this, I was surprised, but took what he said at face value. Of course, this was before I learned of his voracious appetite for propagating half-truths, gossip, and outright lies, all in the name of somehow making himself look knowledgeable and “better.”
Such a pity. He obviously has talent, but it is all negated by his vile, malicious tongue. All I can think of is a tremendous sense of insecurity. But I digress. I just thought you would like to know.
A friend
After receiving the e-mail, Ms. Livingston turned it over to the law firm. She dismissed the rumor as provably false but was concerned that if David Simpson really was spreading false rumors, it could damage her reputation along with that of the firm. I had spent several years developing methods to analyze language and personality but had never been paid to be a word detective.
What kind of person may have written the note? Is “A friend” a male or female and what is his or her approximate age? What is the person’s link to Ms. Livingston, to David Simpson, and to the firm? Any hints as to the person’s personality traits?
In the years since I worked on the case, several new ways of looking at words have been developed. One involves comparing the words “A friend” used with those of tens of thousands of regular bloggers. For example, by looking at just the function and emotion words, we can guess that there is a 71 percent chance that the author is female and a 75 percent chance that she is between the ages of thirty-five and forty-five. It is much harder to get a good read on her personality. One analysis suggests that there is a fairly good chance that the author of the e-mail is high in the trait of narcissism—meaning she may be somewhat conceited and manipulative.
Look more closely at the e-mail and other hints emerge. The person is psychologically connected to the firm (“our senior partners”) and has knowledge of rumors from across several departments within the firm. The person is also trying to impress Ms. Livingston by using a large vocabulary. Particularly interesting is the use of words and phrases such as “voracious appetite,” “vile,” and “malicious tongue.” These are Old Testament words that, in other analyses, were primarily used by people between forty-two and forty-four years of age at the time of the project.
One other important clue was the layout and punctuation. The e-mail was professionally typed with paragraphs of equivalent size. There was only one space between the period and the beginning of the next sentence, which suggests the person learned to type after about 1985—when desktop computers became popular—or the person had some background in journalism or publishing before 1985, where the single space after a period was the norm. (My wife, who was in publishing before 1985, explained this to me.)
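The spacing clue is easy to check mechanically. The sketch below is a hypothetical helper written for illustration, not the method used in this case: it counts sentence breaks followed by exactly one space versus two or more. It assumes a sentence ends at ".", "!" or "?" followed by a capital letter, so abbreviations such as "Ms." will also be counted.

```python
import re

def sentence_spacing(text: str) -> dict[str, int]:
    """Count sentence breaks followed by exactly one space vs. two or more.

    Heuristic assumption: a break is '.', '!' or '?' followed by spaces and
    then a capital letter, so abbreviations like "Ms. Livingston" also count.
    """
    single = len(re.findall(r"[.!?] (?=[A-Z])", text))
    double = len(re.findall(r"[.!?] {2,}(?=[A-Z])", text))
    return {"single": single, "double": double}

sample = "He says this is the reason you left so abruptly.  Such a pity. But I digress."
print(sentence_spacing(sample))  # {'single': 1, 'double': 1}
```

A writer whose "double" count is consistently zero across a long message fits the single-space habit described above; a handful of sentences proves little on its own.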
What happened? When I submitted my report to the senior partner, he was relieved because it precisely matched the person he had suspected: a conscientious woman in her early forties with a background in newspapers who had been with the firm for several years. I never learned the final disposition of the case, but I see that Ms. Livingston is now a senior partner with the firm.
WHO WROTE IT?
THE ART OF AUTHOR IDENTIFICATION
Deciphering linguistic clues to solve crimes has a rich tradition in criminology. The FBI, various national security agencies, and local police departments around the world occasionally seek the expertise of linguists to help decode ransom notes or written threats, or to assess who might have written legal or other documents.
One of the best-known early forensic linguists is Donald Foster, a professor of English at Vassar College. Using a mixture of computer and deductive skills, along with a broad knowledge of history and literature, Foster has worked with law enforcement agencies on high-profile cases such as the Unabomber, the 2001 anthrax attacks, and the 1997 JonBenét Ramsey murder case. He has also applied his methods to determine the authenticity of some works by Shakespeare and others. Perhaps his most successful venture was in identifying Joe Klein as the author of an anonymously published satirical novel on the Clinton presidency, Primary Colors.
Foster has been a controversial figure because several of his high-profile claims about authorship have not panned out. He has also been less than forthcoming about the details of his methods of author identification, something that reflects his training in English rather than statistics and science. Nevertheless, Foster’s approach has alerted the literary and forensic worlds to the promise of computer-based methods to identify authors and their work.
FINDING THE TELLS
World-class poker players closely watch and listen to their opponents in attempts to predict the cards they may be holding. Often players will pretend they have a poor set of cards when they have a good set; other times they will bluff by giving the impression they have a winning hand when they don’t. Experts look for telling signs of deception—or “tells.” Some players avoid looking around the table, others tap their feet, yet others talk more loudly. The ability to decipher tells can give card players a large advantage in high-stakes poker games.
There are various types of tells in people’s use of written language as well. Two are particularly good clues in identifying authors: function words and punctuation. This can be seen in looking back at the blogs we collected in 2001 as part of the September 11 project discussed in the last chapter. Recall that we saved about seventy blog entries from each of a thousand people in the two months before and after the 9/11 attacks. Every few years, my students and I revisit LiveJournal.com to see if the same people are still posting. Ten years later, 25 to 30 percent are still active. About 25 percent have erased their accounts. The remainder stopped posting, on average, five years after the attacks, in 2006. Many of the former posters migrated to other systems such as Facebook or Twitter.
Simply reading the last ten years of people’s posts provides an intimate picture of their lives. Not unlike Michael Apted’s Seven Up! documentary series, we have been able to track the unfolding experiences of the bloggers as they grow older. Many of the same issues still drive the authors. Even though some have now married, had children, and started careers, recurring insecurities, motives, and goals keep returning. Those who were happy and upbeat in 2001 tend to be the same optimistic people nine years later. For example, a young father writes in a random blog in 2001 about his favorite hockey team:
lucky lucky chicken bone. i shall do the happy-cup-dance. we shall win. we shall triumph. and there will be much rejoicing! i just need to get cable first. ok. i wasn’t just gonna post about hockey, but yvonne’s ready to go. yeah. shut up. you try resisting that sweet, sweet candeh.
And nine years later, you see the same person:
My first attempt at making salsa was, in my humble opinion, not too shabby. protip: don’t use Roma tomatoes. I’m not sure why the hell I thought they’d work out fine, but I was terribly wrong. Ok, not terribly just mildly. ah, salsa humor. I’m heading back to the mexi-mart today to pick up the goods to try another batch. Maybe i’ll have it done in time for the bbq. Who knows? Since my catharsis, I’ve been in an amazing headspace.
Obviously, these two writing samples are from the same person. I mean, anyone could spot it immediately.
Really?
Actually, we can see the similarities once we know that they were written by the same person. But what if we read blogs all day and came across the second one several hours after reading the first? In all likelihood, most people wouldn’t jump up yelling, “Aha! I have read that writing style earlier … yes, from the guy who wrote about hockey.” Could language experts or computers make a definitive match? Are language fingerprints as reliable as DNA or real fingerprints? The short answer is no. However, computerized language analyses do a reasonably good job at matching which writing goes with which person.
Imagine we had a large number of blog entries from twenty bloggers. Several years later, we retrieve a handful of new postings from each of the same twenty bloggers. Now imagine sitting on your living room floor with hundreds of pages of posts trying to match each current blog entry with the original posts of the twenty bloggers. All things being equal, anyone should be able to match 5 percent of the blog posts correctly (one in twenty) by chance alone. Most people would do terribly on this task. It is unlikely that you would match at rates any better than 10–12 percent. The writing style differences are too subtle and there is just too much information.
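The 5 percent baseline follows from pure guessing: if each new post is assigned to one of twenty bloggers at random, one match in twenty is expected to be correct. A quick simulation, a sketch that assumes a random one-to-one assignment of posts to bloggers, reproduces that figure.

```python
import random

def chance_match_rate(n_authors: int = 20, trials: int = 10_000, seed: int = 0) -> float:
    """Estimate the share of posts matched correctly by pure guessing.

    Each trial shuffles the n_authors new posts against the original authors
    (a random one-to-one assignment) and counts how many land on the true author.
    """
    rng = random.Random(seed)
    authors = list(range(n_authors))
    correct = 0
    for _ in range(trials):
        guess = authors[:]
        rng.shuffle(guess)
        correct += sum(g == a for g, a in zip(guess, authors))
    return correct / (n_authors * trials)

print(f"{chance_match_rate():.3f}")  # close to 1/20 = 0.05
```

On average a random shuffle puts exactly one post with its true author, whatever the number of bloggers, so the chance rate is simply 1 divided by the number of bloggers.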
Computers are more patient and systematic. If we analyze only function words, the computer correctly matches the recent blog posts with their original authors about 29 percent of the time, nearly six times the 5 percent expected by chance. This is impressive given the years that separate the two sets of posts.
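One way to sketch function-word matching: represent each blogger's earlier writing as the relative frequencies of a fixed set of function words, then assign a new post to the blogger whose profile is most similar by cosine similarity. The word list, the similarity measure, and the sample texts are assumptions for illustration; the text does not say which method or word list the computer analysis used.

```python
import math
import re
from collections import Counter

# A short, hypothetical function-word list; real analyses use far longer ones
FUNCTION_WORDS = ["i", "you", "we", "the", "a", "and", "but", "to", "of", "in",
                  "it", "that", "is", "was", "not", "just", "so", "my"]

def profile(text: str) -> list[float]:
    """Relative frequency of each function word among all tokens in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = max(len(tokens), 1)
    return [counts[w] / total for w in FUNCTION_WORDS]

def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def match_author(new_post: str, originals: dict[str, str]) -> str:
    """Pick the blogger whose earlier posts have the closest function-word profile."""
    target = profile(new_post)
    return max(originals, key=lambda name: cosine(target, profile(originals[name])))

originals = {
    "blogger_a": "i shall do the dance and i just need to get cable. "
                 "i was not going to post but i just did.",
    "blogger_b": "we are in the room and we were chuckling at the band, "
                 "and it was weird that we did not blush.",
}
new_post = "i just thought i was going to make salsa but i was wrong."
print(match_author(new_post, originals))  # blogger_a
```

The content words ("salsa," "dance") play no part here; the match rests entirely on habits like heavy use of "i," "just," and "was."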
But there is more to author identification than function words. Look at the consistency of punctuation. The following woman, for example, continues to use asterisks in the same way nine years apart. This was part of an early 2001 entry:
Oh.. I have also discovered a shy streak I didn’t know I had. I guess you would call it shyness. Somebody made me *blush*. Repeatedly. That is *weird*. I don’t blush.
And in 2010:
We *are* in post-post-punk now, aren’t we? The guys in the band made a joke about how they just wrote that song yesterday, and maybe a quarter of the people in the room didn’t get why the rest of us were chuckling. weird. *shrug*
Others use punctuation in equally distinctive but more subtle ways. From a twenty-seven-year-old male in 2001:
I mailed memorial gift checks to Immanuel [endowment donation in honor of Joan’s mother]; and St Anne’s – for my favorite accounting professor the Smythe scholarship. Frank & Rebecca brought over “Midnight in the Garden of Good & Evil” and a couple homebrews. My eyelids want to close so I better …
In 2010:
I didn’t quite know what to say thinking, “hmm, mud, what is it … when I found a mirror I didn’t see any other “brown stuff” i brought a watermelon and Costco multigrain chips, Had a couple beers, I took Yuengling B & T – dinner was boiled /grilled chicken, okra, slaw, “dipping” brownies.
This person is the Alvin Ailey of punctuation. He jumps, swirls, swoops, and rolls with the full gamut of punctuational possibilities: [ ; – … & “ /. Oddly, when I first read his blog, I didn’t even notice his use of punctuation marks; they just blended into his writing. However, when his posts were analyzed by computer, his use of punctuation stood out.
Punctuation marks can identify some people better than anything they write. In fact, when looking only at punctuation, computer programs identified 31 percent of authors correctly—essentially the same rate as relying on function words. When both function words and punctuation were used together, the computer correctly paired the original bloggers with writing samples several years later 39 percent of the time. Punctuation, function words, and content words that are used in everyday writing are all parts of our personal signature. To appreciate this, go to your own e-mail account and spend a few minutes looking at the e-mails you send to and receive from others. Start with the page layout.
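Punctuation can be folded into the same kind of profile. The sketch below counts a handful of marks, including the ones the bloggers above favor, and concatenates that profile with a function-word profile. Each part is scaled to unit length first so that neither dominates the similarity score. The mark list, word list, scaling choice, and sample texts are assumptions, not the procedure behind the 31 and 39 percent figures.

```python
import math
import re
from collections import Counter

# Illustrative feature sets (assumptions, not a published list)
PUNCTUATION = ["[", ";", "–", "…", "&", "“", "/", "*", "!", "?", ",", "."]
FUNCTION_WORDS = ["i", "we", "the", "a", "and", "but", "to", "of", "it", "not", "just"]

def unit(vec: list[float]) -> list[float]:
    """Scale a vector to length 1 (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec

def punctuation_profile(text: str) -> list[float]:
    """Rate of each punctuation mark per character of text."""
    n = max(len(text), 1)
    return [text.count(mark) / n for mark in PUNCTUATION]

def word_profile(text: str) -> list[float]:
    """Relative frequency of each function word among all tokens."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    n = max(len(tokens), 1)
    return [counts[w] / n for w in FUNCTION_WORDS]

def features(text: str, use_words: bool = True, use_punct: bool = True) -> list[float]:
    """Concatenate the selected profiles, each scaled to unit length."""
    vec: list[float] = []
    if use_words:
        vec += unit(word_profile(text))
    if use_punct:
        vec += unit(punctuation_profile(text))
    return vec

def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def match_author(new_post: str, originals: dict[str, str], **opts: bool) -> str:
    """Return the author whose earlier text has the most similar feature vector."""
    target = features(new_post, **opts)
    return max(originals, key=lambda name: cosine(target, features(originals[name], **opts)))

originals = {
    "asterisk_user": "Somebody made me *blush*. Repeatedly. That is *weird*. I don't blush.",
    "bracket_user": "I mailed checks to Immanuel [endowment donation]; and St Anne's – "
                    "for the Smythe scholarship. Frank & Rebecca brought over a couple homebrews.",
}
new_post = "We *are* in post-post-punk now, aren't we? weird. *shrug*"
print(match_author(new_post, originals, use_words=False))  # asterisk_user
print(match_author(new_post, originals))                   # asterisk_user
```

Toggling `use_words` and `use_punct` mirrors the comparison in the text between each feature set alone and both together; with real data, accuracy would be measured over many held-out posts rather than one example.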
Some people tend to write very long e-mails, whereas others keep them to a sentence or two. People also differ in the length of their paragraphs and sentences. Their greetings and closings vary tremendously as well. Some use emoticons; some never do. Some of these differences may be psychologically important, but most probably aren’t. The person who ends most e-mails with “Sincerely” may do this just because they were told to do so when they were younger. Even though these variations may not say anything about your conflicts with your mother when you were an infant, they still mark you. That is, they are part of your general writing style that makes you stand out from everyone else. And that is the interesting story. All of the language features we can measure can help to identify you. (loc.3971)
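The layout features listed here are straightforward to measure. This hypothetical helper pulls a few of them from a plain-text e-mail: length in words, paragraph count, words per sentence, the first line as the greeting, the last line as the sign-off, and a count of simple emoticons. The splitting rules and the emoticon pattern are rough assumptions that would need tuning on real mail.

```python
import re

# Rough pattern for simple emoticons such as :) ;-) :D :P (an assumption)
EMOTICON = re.compile(r"[:;]-?[()DPp]")

def layout_features(email: str) -> dict[str, object]:
    """Extract a few page-layout features from a plain-text e-mail."""
    body = email.strip()
    lines = [ln.strip() for ln in body.splitlines() if ln.strip()]
    paragraphs = [p for p in re.split(r"\n\s*\n", body) if p.strip()]
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", body) if s.strip()]
    words = body.split()
    return {
        "words": len(words),
        "paragraphs": len(paragraphs),
        "words_per_sentence": len(words) / max(len(sentences), 1),
        "greeting": lines[0] if lines else "",
        "sign_off": lines[-1] if lines else "",
        "emoticons": len(EMOTICON.findall(body)),
    }

email = "Hi Sam,\n\nThanks for the notes :) I will send the draft tomorrow.\n\nSincerely"
print(layout_features(email))
```

Features like these say little about personality, but stacked alongside function words and punctuation they add more dimensions on which one writer differs from another.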
“The Secret Life of Pronouns: What Our Words Say About Us” by James W. Pennebaker
