Hebrew University researchers unveil a sarcasm detector

Yeah, right? Yeah, right!

“Trees died for this book?”
If you saw that succinct review on Amazon.com, you’d surmise that the critic was being sarcastic. And you’d probably be right.
“Are these iPods designed to die after two years?”
You’d likely spot that as sarcastic, too.
But what about this one: “This invention is one of the most brilliant in the world.”
Is that statement sarcastic or completely honest?
If you’re not sure, help is at hand. Hebrew University of Jerusalem researchers have devised patent-pending algorithms that can run on an ordinary computer and detect sarcasm in text with a precision of around 77 percent.
Yeah, right, you say? Yeah, right!
Oren Tsur, a doctoral student at HU’s Institute of Computer Science working under the supervision of the institute’s Prof. Ari Rappoport, and fellow doctoral student Dmitry Davidov will present their work at a Washington, DC, conference on Tuesday.
Over a dozen teams around the world have been trying to detect sarcasm – or verbal irony – in text for years, but only the HU researchers have succeeded, Rappoport told The Jerusalem Post on Monday. The technology might seem esoteric, but detecting sarcasm has many commercial applications, including assessing online reviews of products and services such as restaurants, hotels and books, as well as opinions on a variety of subjects, from the political to the personal.
As opinions on Internet sites are increasingly used for “opinion mining,” the HU computer researchers suggested that sarcasm could mislead, because a sarcastic statement conveys a view different from, or opposite to, its literal meaning.
A patent application for the algorithms was filed by Yissum, HU’s technology transfer arm, and it has aroused much interest worldwide. Yissum is now looking for a commercial partner to develop the unusual product, called RevRank, which is the basis for the sarcasm detector.
Rappoport said the system assesses text alone, not people’s voices. By contrast, an Israeli computer program introduced over a decade ago could detect falsehoods with reasonable accuracy by analyzing voice fluctuations.
“It would be even more accurate if we could combine both text and voice to determine sarcasm, but the reviews are in text form only,” said Rappoport.
Rappoport has been working in the field of computerized linguistics and assessments of text language for many years, but the sarcasm-identification project began just nine months ago. The bulk of the project was done by Israeli-born Tsur, with the computer work accomplished by Russian-born Davidov.
Computers handle verbal commands in a rather automatic manner, but humans have more convoluted thinking, using symbols or slang, said Rappoport. Thus it has been difficult to bridge the gap between computerized and human linguistics. In sarcasm, the individual states the inverse of what he means, such as meaning “dumbest” when writing: “This invention is one of the most brilliant in the world.” If the statement is not sarcastic, the sentence literally means what it says.
Nicknamed SASI, for Semi-supervised Algorithm for Sarcasm Identification, the sarcasm-detecting technique was initially based on tens of thousands of scanned reviews of books and other products on Amazon.com. For an assessment of sarcasm, the algorithms were also applied to “tweets” posted on Twitter. The technique first runs a pattern acquisition algorithm, followed by a classification algorithm.
The three researchers prepared a long article for the proceedings of the Association for the Advancement of Artificial Intelligence conference in Washington. They noted that “sarcasm is a sophisticated form of speech act widely used by online communities,” and stressed that automatic recognition of sarcasm in online reviews and blog posts was a difficult task.
They noted that SASI had two stages: semi-supervised pattern acquisition and sarcasm classification. The team experimented on a data set of about 66,000 Amazon reviews for various books and products.
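To make the two stages concrete, here is a toy sketch in Python. It is not the researchers' algorithm: it assumes that a "pattern" is simply a pair of adjacent words, and the seed sentences, function names and matching rule are all invented for illustration. Stage one collects word pairs that occur only in hand-labelled sarcastic sentences; stage two flags a new sentence if it contains any of them.

```python
from collections import Counter

def ngrams(sentence, n=2):
    """Lowercased word n-grams: a crude stand-in for 'patterns' (an assumption)."""
    words = sentence.lower().split()
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

def acquire_patterns(seed):
    """Stage 1: keep n-grams seen in sarcastic seed sentences but never in literal ones."""
    sarcastic, literal = Counter(), Counter()
    for sentence, is_sarcastic in seed:
        (sarcastic if is_sarcastic else literal).update(ngrams(sentence))
    return {p for p in sarcastic if p not in literal}

def classify(sentence, patterns):
    """Stage 2: flag a sentence as sarcastic if it contains any acquired pattern."""
    return any(p in patterns for p in ngrams(sentence))

# Hypothetical hand-labelled seed set (not the researchers' data).
seed = [
    ("Great for insomniacs", True),
    ("Too bad they don't work", True),
    ("Great for long trips", False),
    ("They work well", False),
]
patterns = acquire_patterns(seed)

print(classify("Too bad it broke", patterns))      # shares the pattern ("too", "bad")
print(classify("Great for commuters", patterns))   # "great for" also appears in a literal seed
```

Because “great for” turns up in both a sarcastic and a literal seed sentence, the sketch discards it, which is one simple way a system could avoid treating every enthusiastic phrase as sarcasm.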
“Using a gold standard in which each sentence was tagged by three annotators, we obtained precision of 77% and recall of 83.1% for identifying sarcastic sentences,” the researchers said. “We found some strong features that characterize sarcastic utterances. However, a combination of more subtle pattern-based features proved more promising in identifying the various facets of sarcasm. We also speculate on the motivation for using sarcasm in online communities and social networks.”
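Precision and recall measure different things. Precision is the share of sentences flagged as sarcastic that really are sarcastic; recall is the share of truly sarcastic sentences that the system manages to flag. The short Python sketch below computes both from counts; the counts are made up for illustration and are not taken from the researchers' experiment.

```python
def precision_recall(true_pos, false_pos, false_neg):
    """Return (precision, recall) from counts of flagged and missed sentences."""
    precision = true_pos / (true_pos + false_pos)  # flagged sentences that were truly sarcastic
    recall = true_pos / (true_pos + false_neg)     # sarcastic sentences that were flagged
    return precision, recall

# Hypothetical counts: 8 sarcastic sentences flagged correctly,
# 2 literal sentences flagged by mistake, 2 sarcastic sentences missed.
precision, recall = precision_recall(true_pos=8, false_pos=2, false_neg=2)
print(f"precision={precision:.0%}, recall={recall:.0%}")
```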
The annotators labelled sentences in the reviews for their degree of sarcasm and then looked for patterns that appeared, judging similar statements as sarcastic or not. The computer thus “learned” on a basic smaller set of sentences and applied this discernment to determine whether new ones were sarcastic, Rappoport explained.
Among the review titles or summaries that the team took from their experimental data were: “[I] Love The Cover” (for a book); “Where am I?” (GPS device); “Trees died for this book?” (book); “Be sure to save your purchase receipt” (smart phone); “Are these iPods designed to die after two years?” (music player); “Great for insomniacs” (book); “All the features you want. Too bad they don’t work!” (smart phone); “Great idea, now try again with a real product development team” (e-reader); and “Defective by design” (music player).
Some could be genuine or sarcastic, while others are clearly sarcastic.
Initially, the algorithm identified the dominant terms that characterize the best reviews on the Internet. It then used these terms to classify and rank other reviews in relation to the highest-quality reviews from a control group.
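The two steps described above, finding dominant terms and then ranking reviews against them, can be illustrated with a minimal Python sketch. It is only a guess at the general idea, not the researchers' method: it assumes a "dominant term" is a word that appears in many reviews, and the stopword list, scoring rule and sample reviews are invented.

```python
from collections import Counter

# Tiny illustrative stopword list (an assumption, not from the article).
STOPWORDS = {"the", "a", "is", "it", "and", "of", "to", "this", "i", "but"}

def tokens(text):
    """Split on whitespace, strip basic punctuation, lowercase."""
    return [w.strip(".,!?").lower() for w in text.split()]

def dominant_terms(reviews, k=3):
    """Step 1: the k non-stopword terms that occur in the most reviews."""
    doc_freq = Counter()
    for review in reviews:
        doc_freq.update({t for t in tokens(review) if t and t not in STOPWORDS})
    # Sort by frequency, then alphabetically, so ties are resolved predictably.
    ordered = sorted(doc_freq.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ordered[:k]]

def rank_reviews(reviews, terms):
    """Step 2: order reviews by how many dominant terms each one mentions."""
    def score(review):
        return len(set(tokens(review)) & set(terms))
    return sorted(reviews, key=score, reverse=True)

# Hypothetical reviews of one product.
reviews = [
    "Battery died.",
    "The battery life is short and the screen is dim.",
    "Nice screen, but the battery life is poor.",
    "Love it!",
]
terms = dominant_terms(reviews)
ranked = rank_reviews(reviews, terms)
print(terms)
print(ranked[0])
```

In this toy version, a review that touches on the topics most reviewers discuss floats to the top, while a bare “Love it!” sinks to the bottom.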
The technique can enable every Internet user to set down his preferences regarding the length of the review he is interested in and the depth of the content, said Rappoport, who predicted that the algorithm could change the way people assess the information they glean from the Web.
The team worked in English, as well as German and Chinese. Detecting sarcasm in Arabic is more complicated, said Rappoport, and Hebrew even more so.