Digitally inclined

What should we be watching out for in a world where private companies control most available knowledge?

The main reading room of The New York Public Library (photo credit: REUTERS)
While you are reading this article – in print or online – all around the world a mass digitization project is taking place. Millions of texts are being converted to digital form alongside images, audio and video, all for the benefit of the public and the convenience of academic researchers.
The British Library’s digital collection includes some 68,000 volumes from the 19th century alone. The Bavarian State Library in Munich has over one million digitized volumes. The Lazarus Project (USA) uses state-of-the-art imaging technologies to collect and preserve damaged and illegible texts. The World Digital Library lets users search almost 15,000 items dating from 8000 BCE to today.
The online availability of texts is growing at an astounding pace. But is it really a revolution or just another step in the long way we’ve come since the dawn of writing? Whenever major new technologies come into our lives, they tend to change us in ways we never expected.
The steam engine, originally designed to help draw water from flooded mines, eventually found its way to trains, turning the world ever smaller and more accessible (for the rich, at least).
Elsewhere, newly invented railways connected areas across the east-west axis, forcing cities to coordinate timekeeping and effectively creating the notion of time zones.
Private computers, smartphones and the invention of the Internet are similarly changing us. Sometimes it’s done in more obvious ways – for instance, in how we keep in touch with one another. But there are other, more surprising aspects, too: the Internet-connected smartphone has forever changed the way we show off. Earlier generations, when faced with the inevitable fact-based dinner argument, would have to retire to the study to find an answer – or else give up on the question altogether, with all sides remaining in effect victorious. But the 21st century changed that habit drastically. Now we’re always only a couple of clicks away from proving we’re right – I told you Marie Antoinette never really said that bit about the cakes.
At this point in time, most information is being created digitally to begin with, which makes it easier to store and reach as needed, but as for everything that was created earlier – a couple of thousand years’ worth of knowledge – that’s where book digitization comes in.
“Digitization is one of many forms we adopted over history to preserve information,” explains Dr. Elad Segev of Tel Aviv University’s department of communications. “As information can become useful knowledge and then power, the need to preserve, organize and search information has always been an important issue.”
But digitization in particular does create unprecedented abilities.
“Digitization is somewhat different, as it enables us to store a vast amount of information in smaller spaces, as well as create identical copies and disseminate them globally,” creating new research interests, encouraging technological advancement, and, yes, changing the way people think, everywhere.
MANY UNIVERSITIES, libraries and other public organizations, as previously mentioned, have taken it on themselves to contribute to the growing international body of digital texts. But valuable as these projects are, they’re but a drop in the bucket: As in most other areas of modern life, Google rules the field unchallenged. Google Books was launched in 2004 and quickly gathered partners from universities and libraries the world over. In 2007 Marissa Mayer, then vice-president at Google, told The New Yorker the project intended to scan every book ever published. The estimate at the time was that about 32 million books existed, and Mayer said, “We think we can do it all inside of 10 years.”
A dramatic decade later, Google Books does hold over 30 million volumes.
Unfortunately, its own estimate of how many books exist has grown much larger: in 2010 its algorithm put the number of books ever published not at 30 million but at 130 million. Meanwhile, the rate of scanning has slowed for several reasons, in particular because of copyright issues.
Still, an almost unimaginable wealth of books exists out there for private use, as do some entertaining side effects, such as the Ngram Viewer, which lets users search a phrase in books from the last 500 years and discover its popularity over time. Incidentally, it turns out that the word “digital,” which understandably became huge after 1950, had an additional small peak around 1515.
The Ngram Viewer, harmless as it may seem, does hit on a key insight regarding digital information and the way it breaks down the unit we know as a book. “Once a book is digitized and accessed through digital means (such as computers, tablets or smartphones), its format and value are completely changed,” says Segev.
“We might not treat a book anymore as one complete unit but, rather, as a continuum of information from which we extract only the segments that are relevant to us.”
As it turns out, what’s relevant to us is not always the book itself; sometimes it’s the book (or the map, or the song, or the committee protocol) as a way to understand our own past.
Land of milk and data
One of the biggest digitization projects in the country belongs, unsurprisingly, to the National Library of Israel. According to Yaron Deutscher, head of the NLI’s digital access division, the library’s digital collections contain several million items: about a million photographs, a hundred thousand manuscripts, tens of thousands of books, and more.
This is indeed one potent way to understand ourselves via those relevant segments from the continuum of information: maps, audio files, posters, old bus tickets and more keep coming in.
“A lot will be done over the next few years,” promises Deutscher. “These huge archives are just a drop in the ocean.” He gives books as an example: The NLI holds every book ever published in Hebrew – some 400,000 of them. The digital library, however, has only 1-2% of those.
The Israeli Internet, too, is being archived, as daunting a task as that may sound. Deutscher explains that every website that ends with “.co.il” is saved, as is, at least once a year; news portals can be saved as often as five or six times a day, allowing visitors to view the homepage and click through to articles as well.
“Most of our users are probably researchers, teachers and students,” he admits. “But we’re widening those circles all the time. We’re trying to draw in the public in two ways. The first is technological – making it easy and convenient to search and browse the archives. The second is making the content itself more accessible and interesting to the viewers. For instance, if we’ve got a letter sent by Alfred Dreyfus to his wife, we’re not going to post it just like that; we’ll add a text about when and why it was written, what’s interesting about it, why it matters. There’s plenty of demand for that. People want to hear interesting stories about our culture and history.”
Another major digital collection is the one belonging to the University of Haifa’s Younes and Soraya Nazarian Library. The library collects and showcases digital versions of students’ and researchers’ academic papers for easier access, as well as a rare book collection, historic photographs of Israel, and more. There’s even a unique theater archive, documenting performances and text from the Haifa Theater, the Tel Aviv Cameri Theater and even the Acre Festival, just in case you were wondering who played Henry Higgins in the Haifa Theater’s 1967 production of My Fair Lady. (It was Oded Teomi. Note that for your next dinner argument.)
He who holds the knowledge
Within this ongoing sea change, and considering the endless amount of material that needs uploading, another key question we should be asking ourselves is: Who gets to decide what’s digitized and what isn’t? Segev quotes Harold Innis and Marshall McLuhan, “who reminded us that the resources that we use to preserve information and communicate with one another are key to understanding our general perception of the world and the diffusion of power.”
While public institutions such as national libraries and universities have more transparency as far as their objectives are concerned, that is not necessarily the case when privately owned companies step in.
“Global corporations such as Google or Facebook have a very heavy social responsibility. As their goal by definition is maximizing their revenues, the information they provide is increasingly commercialized. So it is not only about digitization per se but about the ownership of digital information and the power distribution among people. When money becomes inseparable from content, our world becomes homogeneous and shallow,” Segev continues.
Luckily, beyond the major companies and public institutions that spend large sums on digitizing knowledge, there’s an additional sphere of private projects – volunteer-based undertakings that aim to create free databases for all.
MOST FAMOUS of all is Project Gutenberg, which was founded by Michael Hart in 1971, making it the oldest e-text project on the Internet. It played a major role in causing a radical change in the realm of reading, long before Google’s deep pockets came along. According to its own mission statement, the project is “powered by ideas, ideals and idealism... [and] totally by volunteers.” It is estimated that a few thousand volunteers do their part every month, producing ebooks and proofreading texts. The project offers over 53,000 free ebooks, from the King James Bible to Edgar Allan Poe’s “The Raven.”
Almost 30 years after Project Gutenberg was born, along came its Israeli counterpart, very much inspired by it. Project Ben-Yehuda was founded in 1999 by Asaf Bartov, a former hi-tech industry employee who’s currently working for the Wikimedia Foundation.
Like Gutenberg, Project Ben-Yehuda aims to make available all Hebrew texts, bringing them back to the public’s possession, where they belong.
“Not enough people in Israel understand the concept of public property – meaning texts that are no longer protected by copyrights and therefore belong to us all,” claims Bartov. “Project Ben-Yehuda is a group of volunteers of the general public.
We have set ourselves the goal of executing this right of public property by producing electronic editions which are available for everyone, in Israel and around the world, with no charge and free of commercials.”
Project Ben-Yehuda offers over 10,000 texts by more than 250 writers and involves the work of some 250 active volunteers.
Bartov emphasizes that the project does not filter out anything, not by quality, political orientation, relevance or any other criterion. “We are interested in the entirety of Hebrew literature, in all its styles and genres, and even its less pleasant aspects, such as sexist writing.
All of it will eventually be uploaded to our site, for the public to use.”
The public certainly does seem to use it. According to Bartov, over 1 million pages are viewed every month by readers in Israel and abroad, “even Arab countries,” he stresses. Many of these users do, indeed, come from academia – students, teachers, researchers – but he is particularly pleased to say that many are members of the general public as well.
“We see the digitization of texts as the top priority,” says Bartov. “We think it’s the best thing that can happen to any written, photographed or recorded material. Digitization enables remote access, simultaneous access by many different people, easy searching, processing, comparison between texts, and much more.”
And after all of these rapid changes that we are experiencing within our lifetimes, regardless of who funds them, will knowledge be changed forever? Segev predicts that in the long run, books will become history. “We should remember that books in their current form have existed for a little more than 500 years, since the development of the printing press. Before that, people used parchment and papyrus scrolls for at least 4,000 years.”
But don’t worry too much about the challenge of adjustment; youth, at least, has got it covered. One of Segev’s doctoral candidates, Nathan Stolero, is researching the differences between the perceptions of information of youth and adults. The gap, unsurprisingly, is significant.
“Stolero’s preliminary findings show, on the one hand, that young users have a much broader set of tools when they look for information,” explains Segev.
“They consult with their peers in social networks, use more mobile apps, and do not limit themselves to textual searches but also do image searches or even record their questions and answers. On the other hand, young users often take technological development for granted.
Compared to adults, they hardly criticize or become suspicious of the algorithms that generate information and lead their lives.”
This prediction – the death of the book – and the new generation’s attitude toward text and information are even more interesting when compared to recent reports about the ebook market. After several successful years, when Kindle and its peers were all the rage, a sudden turn occurred.
According to the Association of American Publishers, ebook sales dropped by 21.8% in the first quarter of 2016. Another report, from the UK, similarly stated that ebook sales for the five biggest publishers fell during 2015 – although only by 2.4%, collectively.
The numbers alone don’t tell the whole story. The drop could signal a longing for the good old printed page; it could be due to the emergence of the audiobook trend; perhaps people are finding new ways to illegally obtain their reading materials, as they did with music and movies.
Or maybe we’re all just preparing for literature’s next phase: bite-sized chunks of searchable, indexed information, like Marie Antoinette’s misquotes and Teomi’s roles in 1960s theater.
“I am not convinced that shorter bits of information would necessarily lead to shallowness or laziness,” reassures Segev.
“Many people invest time and efforts in articulating short but influential messages such as press releases, political campaigns or even status updates. They all compete for a growing number of potential audiences that are able to pay less and less attention. With this growing competition in every possible field, people are required to be more professional and skilled in order to lead.”
Hopefully, even if books do disappear, future generations will still be able to quote the one lesson we’ve learned from science fiction, at least: Technology is neither good nor bad; it’s what humans make of it.