Israeli team invents automated method to summarize texts

With the right business partner, the patented technology solution is expected to be on the market soon.

Prof. Mark Last (photo credit: DANI MACHLIS / BGU)
The world is on information overload. Researchers at Ben-Gurion University will help people break through the noise with an automated, language-independent method for summarizing texts.
According to Prof. Mark Last, who helped invent the novel technology, “People are trying to swallow so much information... this saves time for the readers. They don’t have to spend too much time reading a long article; they can get right to the point.”
Last worked with Dr. Marina Litvak and Dr. Menachem Friedman.
BGN Technologies, the technology transfer company of BGU, introduced the novel, automated and language-independent tool, which can summarize text drawn from articles, magazines and databases and can be used both by media outlets and by their audiences. Users could be professional organizations, such as libraries or academic search engines, or individuals.
The text summaries are based on a genetic algorithm that, according to a release, ranks document sentences using statistical sentence features. 
In the first phase, Last told The Jerusalem Post, “we calculate a set of statistical metrics for each sentence in the document.” Then the algorithm assigns each metric a weight and, based on those weights, calculates the importance of each sentence in the document. The sentences are reordered by importance, and users receive a summary from which they can decide whether they are interested in the piece.
And the whole process is automated.
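The steps Last describes (per-sentence statistical metrics, a weight for each metric, and a combined importance score used to rank sentences) can be illustrated with a minimal sketch. This is not the BGU team's code. The three features (sentence position, relative length and average word frequency), the function names and the naive sentence splitter are assumptions chosen for illustration, and the weights here are passed in by hand rather than assigned by a genetic algorithm as in the method described.

```python
import re
from collections import Counter


def split_sentences(text):
    # Naive splitter (assumption): break on whitespace that follows ., ! or ?
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]


def sentence_features(sentences):
    # Hypothetical statistical features, each scaled to the range [0, 1].
    words_per_sentence = [re.findall(r"\w+", s.lower()) for s in sentences]
    freq = Counter(w for words in words_per_sentence for w in words)
    n = len(sentences)
    max_len = max((len(words) for words in words_per_sentence), default=1) or 1
    features = []
    for i, words in enumerate(words_per_sentence):
        avg_freq = sum(freq[w] for w in words) / len(words) if words else 0.0
        features.append({
            "position": 1.0 - i / n,          # earlier sentences score higher
            "length": len(words) / max_len,   # longer sentences score higher
            "avg_freq": avg_freq,             # sentences built from frequent words
        })
    top = max((f["avg_freq"] for f in features), default=0.0) or 1.0
    for f in features:
        f["avg_freq"] /= top
    return features


def summarize(text, weights, k=2):
    # Score each sentence as a weighted sum of its features, keep the top k,
    # and return them in their original document order.
    sentences = split_sentences(text)
    feats = sentence_features(sentences)
    scores = [sum(weights[name] * value for name, value in f.items()) for f in feats]
    top_k = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top_k)]
```

Calling `summarize(article_text, {"position": 0.2, "length": 0.3, "avg_freq": 0.5}, k=3)` would return the three highest-scoring sentences. Because the sketch relies only on word counts and sentence positions rather than a dictionary or grammar, the same code runs on any language whose sentences and words can be split this simply; languages written without spaces or with different sentence punctuation would need a different splitter.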
“Some news websites, such as The New York Times, present a short summary or highlights for each of their news articles,” Last said. “To the best of my knowledge, nowadays, those summaries are done manually. Our method can do it automatically.”
What also sets the system apart is that it works in at least nine languages: English, Hebrew, Arabic, Persian, Russian, Chinese, German, French and Spanish. Its summarization quality has already been evaluated in four of those languages – English, Hebrew, Arabic and Persian – showing a high level of similarity to human-generated summaries.
The team published its first report on the method in 2010 and has continued to refine the system since then. Today, said Last, “people are quite happy with the results.”
Zafrir Levy, senior VP for business development at BGN Technologies, said that after filing a patent to protect the invention, the company is now seeking an industry partner for the further development of the technology.
“We are currently looking for potential partners for further development and commercialization of this promising invention,” Levy said.