Israeli company offers customers glimpse into the lives of their ancestors

MyHeritage, which currently specialises in DNA testing and genealogy, is set to offer information on the day-to-day lives of people's ancestors in the US.

Page of 1936 City Directory from Washington, DC featuring Justice Louis Brandeis. (photo credit: COURTESY OF MYHERITAGE.COM)
A new project by Israeli company MyHeritage is going to offer millions of people the opportunity to find out more about the lives of their ancestors all over the United States.
For instance, between 1918 and 1920, Reuben Rev Kaplan was a tenant at 1509 Hamilton in Houston, Texas, and worked as a “kosher rabbi” at the Adath Yeshurun Congregation.
In 1936, 20 years after making history by becoming the first Jew on the US Supreme Court, Louis Brandeis lived in a home he owned in Washington, DC. “Louis Dembitz Brandeis (Alice), Associate Justice at Supreme Court of the US, h2205 California nw, Apt. 506,” reads the entry about him in the city directory.
These are just two examples of the treasure trove of new information that MyHeritage has recently made available through its portal. It collected more than 25,000 public US city directories published between 1860 and 1960. They cover fundamental periods in American and world history, such as the Civil War, the Great Depression and both world wars.
“In the past, people had to personally go to libraries and archives to look for information about their family history,” Tal Erlichman, director of product management at MyHeritage, told The Jerusalem Post. “Today, with MyHeritage, users can search thousands of archives from their home. It’s a revolution.”
The company specializes in genealogy and DNA testing. It offers its subscribers a historical database of 11.9 billion records.  
“This means that we have almost 12 billion names mentioned in historical documents,” Erlichman said. “We are very global. We are translated in 42 languages, and we offer records from many countries around the world and especially in Europe, even though the majority of information we feature comes from the US, like in the case of city directories.”
City directories are public lists of names of people living in the city, published since the 18th century. They essentially had the purpose of allowing others to find or contact individuals and businesses, similar to modern White and Yellow Pages. With the exception of the most recent ones, they did not have telephone numbers.
The information they featured included the names of those considered heads of households and their addresses, often with additional details such as their professions, the names of their wives and whether they rented or owned their homes.
MyHeritage gathered directories from some 8,000 US cities, including the 100 largest. For some of them, including Boston, Cleveland, San Francisco, Los Angeles and Washington, the company found directories for multiple years, allowing information about the same people to be compared across different years.
“Usually, similar projects involve manual transcription of the records, which means human workforce going through every document and transcribing every name and relevant piece of information,” Erlichman said. “However, such a vast amount of records would have taken years and years and cost millions of dollars.
“We wanted to find a way to achieve the goal much sooner and with good quality. Therefore, we worked on developing machine-learning technology that could look at the documents and infer a structured index of information. This is what makes our initiative very unique.”
A main challenge was that the software needed to understand where each record began and ended and deal with differences in the layout of pages, fonts, structures of the record and information included in each directory.
MyHeritage corrected errors in the Optical Character Recognition of the scanned directory pages and then employed several advanced technologies, including Record Extraction, Named Entity Recognition and Conditional Random Fields, to analyze the data.
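To give a rough sense of what turning a directory line into a structured record involves, here is a minimal Python sketch that parses the Brandeis entry quoted earlier. It is not MyHeritage's method: it uses a single hand-written regular expression rather than the machine-learning models the company describes, and it would break on the layout variations mentioned above. The field names, the `parse_entry` function and the reading of the leading `h` as a residence code are assumptions made for illustration; abbreviations differed from one directory publisher to another.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class DirectoryEntry:
    """Hypothetical structured form of one city directory line."""
    name: str
    spouse: Optional[str]
    occupation: Optional[str]
    residence_code: Optional[str]
    address: Optional[str]


# Assumed layout: "Name (Spouse), occupation, <code><street address>".
# Real directories vary widely, which is why a fixed pattern like this
# is only a toy stand-in for a learned model.
ENTRY_PATTERN = re.compile(
    r"^(?P<name>[^(,]+?)\s*"            # name, up to "(" or ","
    r"(?:\((?P<spouse>[^)]+)\))?\s*,\s*"  # optional spouse in parentheses
    r"(?P<occupation>.+?)\s*,\s*"        # occupation text
    r"(?P<code>[hr])(?P<address>\d.*)$"  # residence code glued to address
)


def parse_entry(line: str) -> Optional[DirectoryEntry]:
    """Return a DirectoryEntry, or None if the line does not fit the pattern."""
    match = ENTRY_PATTERN.match(line.strip())
    if match is None:
        return None
    return DirectoryEntry(
        name=match.group("name").strip(),
        spouse=match.group("spouse"),
        occupation=match.group("occupation").strip(),
        residence_code=match.group("code"),
        address=match.group("address").strip(),
    )


if __name__ == "__main__":
    sample = (
        "Louis Dembitz Brandeis (Alice), Associate Justice at Supreme Court "
        "of the US, h2205 California nw, Apt. 506"
    )
    print(parse_entry(sample))
```

A fixed pattern like this works only when every line follows one layout, which is the limitation that sequence-labeling approaches such as Conditional Random Fields are meant to overcome.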
The second step was to aggregate all the information about the same person that was found in different directories. For example, someone living at the same address for multiple years would be included in several directories, while their profession or marital status could change, offering additional details about their life.
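The aggregation idea can be sketched in a few lines of Python. This is a simplified illustration, not the company's implementation: it assumes that two records belong to the same person when the name and address match after light normalization, which a production system would need to relax to cope with spelling variants and moves. The `YearlyRecord` type, the helper names and the sample people are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class YearlyRecord:
    """Hypothetical record extracted from one directory edition."""
    year: int
    name: str
    address: str
    occupation: str


def normalize(text: str) -> str:
    """Lowercase, drop periods and commas, and collapse whitespace."""
    cleaned = text.lower().replace(".", "").replace(",", "")
    return " ".join(cleaned.split())


def aggregate(records: List[YearlyRecord]) -> Dict[Tuple[str, str], List[YearlyRecord]]:
    """Group records by (name, address) and sort each group by year."""
    groups: Dict[Tuple[str, str], List[YearlyRecord]] = defaultdict(list)
    for record in records:
        key = (normalize(record.name), normalize(record.address))
        groups[key].append(record)
    for group in groups.values():
        group.sort(key=lambda r: r.year)
    return dict(groups)


def occupation_timeline(group: List[YearlyRecord]) -> List[Tuple[int, str]]:
    """List (year, occupation) pairs for one person, in the group's order."""
    return [(r.year, r.occupation) for r in group]


if __name__ == "__main__":
    # Invented sample data, not taken from any real directory.
    records = [
        YearlyRecord(1921, "Jane Doe", "12 Main St.", "clerk"),
        YearlyRecord(1920, "jane doe", "12 main st", "teacher"),
        YearlyRecord(1920, "John Roe", "5 Oak Ave", "baker"),
    ]
    for key, group in aggregate(records).items():
        print(key, occupation_timeline(group))
```

Grouping on an exact normalized key keeps the example short; the trade-off is that a misspelled name or a change of address splits one person into two groups.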
“This is the first time that any genealogy company tried to achieve this goal,” Erlichman said. “We managed to look at all those books and tried to extract as much information as possible about every individual living at a specific address.”
Teams in the US, Israel and Ukraine worked on the project for about two years.
MyHeritage said it is in the process of indexing thousands of additional US city directories that will be added to the collection in the coming months. They will include directories dating back to the late 18th century and large and unique ones from the late 20th century.