Preserving scholarship, digitizing Hebrew text and dramatically increasing
access to archived scholarly materials written in Hebrew. Those are all likely
outcomes following the National Library of Israel (NLI) and University of Haifa
Library’s (UHL) recent entry into a contract with the popular online library for
academic journals, JSTOR (short for Journal Storage).
For JSTOR, however,
the project signals an additional opportunity. Establishing a process for
displaying content written in languages that use non-Latin character sets will
facilitate JSTOR’s mission to disseminate high-quality scholarship produced
worldwide. The organization has enlisted a digitization service provider, Apex
CoVantage, which has assembled an international team tasked with developing
digitization software that will meet JSTOR’s standards and support its
collaboration with the Israeli libraries (dubbed the Hebrew Journals
Project).
Digitization is the process by which print documents are
converted to digital page images that can then support optical character
recognition (OCR) resulting in full text files, enabling search engines to sift
through and register the document’s core contents.
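To illustrate why full text matters for search, here is a minimal sketch of how OCR output might be indexed so that a keyword finds the pages containing it. It is not JSTOR's or Apex CoVantage's software; the function names and sample pages are hypothetical. The one Hebrew-specific assumption is that vowel points should be stripped before indexing, so a pointed word in a scanned page still matches an unpointed query.

```python
import re
import unicodedata
from collections import defaultdict


def normalize(text):
    """Drop combining marks (e.g. Hebrew vowel points) so pointed and
    unpointed spellings of a word compare equal."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def tokenize(text):
    # \w matches Unicode letters, so Hebrew words are kept as tokens.
    return re.findall(r"\w+", normalize(text).lower())


def build_index(pages):
    """Map each token to the set of page ids whose OCR text contains it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for token in tokenize(text):
            index[token].add(page_id)
    return index


def search(index, query):
    """Return the page ids that contain every token in the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


# Hypothetical OCR output: page 1 holds the pointed Hebrew word "shalom".
pages = {
    "page-1": "\u05e9\u05b8\u05c1\u05dc\u05d5\u05b9\u05dd",
    "page-2": "Journal of Hebrew Studies",
}
index = build_index(pages)
```

Searching for the unpointed spelling, `search(index, "\u05e9\u05dc\u05d5\u05dd")`, returns the first page, which a plain page image alone could not do.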
Since text formats are
ideal for researchers seeking keyword-specific articles, libraries around the
world recognize the need for smart technologies that will facilitate fast and
precise digitization of content in spite of language barriers. Israel is leading
the way in this technological realm.
The Hebrew Journals Project is
funded primarily by the planning and budgeting committee of the Council for
Higher Education in Israel. The cost is estimated at $2.2 million.
NLI and UHL first contacted JSTOR in 2008.
“We sought an experienced
international partner that would provide a sustainable future for the Hebrew
journals project and a wide distribution network to make the journals available
for millions of users,” says Oren Weinberg, director of the NLI.
Since JSTOR was founded in 1995, the organization has added more than 1,600 journals
and over one million images, letters and primary sources from nearly 900
publishers and other institutions.
The shared digital library has helped
academic institutions lower storage costs and improve access to scholarly
resources. In 2009, JSTOR merged with and became a service of ITHAKA, a
non-profit that shares JSTOR’s original mission. Today, more than 7,600
institutions – including academic and public libraries, secondary schools and
other groups based in 166 countries – participate in JSTOR.
“[…] alone,” says Sarah Glasser, associate director of marketing communications at
ITHAKA, “JSTOR counted over 560 million significant accesses to content listed
on the platform.”
Beginning in 2008, librarians from both Israeli
institutions selected four Hebrew journals to be digitized for the pilot
project stage. That fall, representatives from NLI and UHL visited JSTOR’s
offices in New York and Ann Arbor, Michigan, to prepare for the pilot.
“One of the core things that we have accomplished is working
together to define a set of digitization guidelines that are in line with
JSTOR’s existing specifications, only specified further for Hebrew and for this
project,” says John Kiplinger, director of production at JSTOR.
By 2009, the pilot was under way. To satisfy JSTOR’s specifications, and to develop
a functional end product, it was necessary for JSTOR affiliates to be in regular
contact with technical staff at UHL and NLI. Via conference call, technicians
began considering and identifying the technical challenges posed by working with
the Hebrew calendar as well as nuances in the Hebrew language that would
complicate digitization. A librarian from UHL was sent to Ann Arbor in advance
of the pilot to learn about JSTOR’s processes and to provide expertise on
working with Hebrew.
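The article does not say which calendar issues the team ran into. One plausible example is that Hebrew journals are often dated by Hebrew year, written in letters with the thousands omitted. The sketch below shows how such a year could be converted to a number and to the two Gregorian years it spans. The function names are hypothetical, and the default millennium of 5 is an assumption that suits modern publications.

```python
# Numeric values of Hebrew letters, keyed by Unicode code point.
# Final letter forms are given the same value as their regular forms.
LETTER_VALUES = {
    0x05D0: 1, 0x05D1: 2, 0x05D2: 3, 0x05D3: 4, 0x05D4: 5,
    0x05D5: 6, 0x05D6: 7, 0x05D7: 8, 0x05D8: 9,
    0x05D9: 10, 0x05DA: 20, 0x05DB: 20, 0x05DC: 30,
    0x05DD: 40, 0x05DE: 40, 0x05DF: 50, 0x05E0: 50,
    0x05E1: 60, 0x05E2: 70, 0x05E3: 80, 0x05E4: 80,
    0x05E5: 90, 0x05E6: 90,
    0x05E7: 100, 0x05E8: 200, 0x05E9: 300, 0x05EA: 400,
}


def hebrew_year(numeral, millennium=5):
    """Sum the letter values; punctuation such as gershayim counts as 0.
    If the thousands are omitted, add the assumed millennium."""
    value = sum(LETTER_VALUES.get(ord(ch), 0) for ch in numeral)
    if value < 1000:
        value += millennium * 1000
    return value


def gregorian_span(year):
    """A Hebrew year starts in the autumn, so it covers parts of two
    Gregorian years."""
    return (year - 3761, year - 3760)


# Tav, shin, samekh, gershayim, tet: 400 + 300 + 60 + 9 = 769, read as 5769.
year = hebrew_year("\u05ea\u05e9\u05e1\u05f4\u05d8")
span = gregorian_span(year)
```

A single Hebrew year therefore does not map to a single Gregorian year, which is one reason volume dates in such journals need handling that differs from Latin-script titles.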
Following this stage of development, JSTOR sought a
vendor capable of processing Hebrew-language content and matching its
professional digitization standards. Having worked with JSTOR in the past, Apex
CoVantage is well versed in JSTOR’s procedures and therefore was the logical
choice. The digitization service is now utilizing its staff and offices in
Hyderabad, India, to adapt software – originally developed for the digitization
of texts using Latin characters – to read Hebrew. Additionally, Apex has sent
representatives to Israel to assess the pilot documents and to recruit a team of
Israeli consultants who will oversee nearly half of the production.
This fall, Apex will produce several thousand pages of digitized content. In the
interim, the principal objective is to choreograph communications and isolate
recurring errors in the digitization process.
JSTOR will conduct a final
analysis of all digitized materials to ensure that each data batch meets its
standards.
“The Hebrew Journals Project is not going to change
the publishing industry in Israel,” Weinberg acknowledges. However, he is
encouraged by the project’s potential for improving educational systems around
the world. Kiplinger was asked how adaptable the new technology and digitization
process will be when applied to another non-Latin-based language.
“We hope that the experience afforded to us through this
project will make working with other character systems easier and more scalable
for JSTOR,” he says. “Each character system will present new challenges that
will have to be identified, analyzed and addressed. Consequently, we wouldn’t be
ready to jump from Hebrew directly into Chinese or Arabic, but we also wouldn’t
have to start from scratch.”