2020-03-23

A new project by Israeli company MyHeritage is set to offer millions of people the opportunity to learn more about the lives of their ancestors across the United States.

For instance, between 1918 and 1920 one Reuben Rev Kaplan was a tenant at 1509 Hamilton in Houston, Texas, and worked as a “kosher rabbi” at the Adath Yeshurum Congregation. In 1936, twenty years after making history as the first Jew to serve on the US Supreme Court, Louis Brandeis lived in a residence he owned in Washington DC. “Louis Dembitz Brandeis (Alice), Associate Justice at Supreme Court of the US, h2205 California nw, Apt. 506,” reads the entry about him in the city directory.

MyHeritage has recently made available through its portal a collection of more than 25,000 public US city directories published between 1860 and 1960, a span that covers some fundamental periods in American and world history, such as the Civil War, the Great Depression and the two World Wars.

“In the past people had to personally go to libraries and archives to look for information about their family history, today with MyHeritage users can search thousands of archives from their home. It’s a revolution,” Tal Erlichman, Director of Product Management at MyHeritage, told The Jerusalem Post.

The company, which specializes in genealogy and DNA testing, currently offers its subscribers a historical database of 11.9 billion records. “This means that we have almost 12 billion names mentioned in historical documents,” Erlichman explained. “We are very global, we are translated in 42 languages and we offer records from many countries around the world and especially in Europe, even though the majority of information we feature comes from the US, like in the case of city directories.”

City directories are public lists of the people living in a city, and they have been published since the 18th century.
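For readers curious what turning a single directory line into structured data involves, here is a minimal Python sketch that splits an entry shaped like the Brandeis listing quoted above into named fields. It is a simple rule-based illustration, not MyHeritage's system, which the company says relies on machine learning. The comma-separated layout, the field names and the reading of the "h" prefix as marking a home address are our own assumptions.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed entry layout (hypothetical, for illustration only):
#   "Name (Spouse), Occupation, <h|r>Address[, more address parts]"
NAME_RE = re.compile(r"^(?P<name>[^(]+?)\s*(?:\((?P<spouse>[^)]+)\))?$")
# Assumption: an address token is "h" or "r" immediately followed by a house number.
ADDR_RE = re.compile(r"^(?P<tag>[hr])(?P<addr>\d.*)$")


@dataclass
class DirectoryEntry:
    name: str
    spouse: Optional[str] = None
    occupation: Optional[str] = None
    residence_tag: Optional[str] = None  # assumed: "h" = house (home), "r" = resides
    address: Optional[str] = None


def parse_entry(line: str) -> DirectoryEntry:
    """Split one comma-separated directory line into named fields."""
    parts = [p.strip() for p in line.split(",")]
    m = NAME_RE.match(parts[0])
    if m is None:
        raise ValueError(f"unrecognized name field: {parts[0]!r}")
    entry = DirectoryEntry(name=m.group("name").strip(), spouse=m.group("spouse"))

    rest = parts[1:]
    for i, part in enumerate(rest):
        a = ADDR_RE.match(part)
        if a:
            entry.residence_tag = a.group("tag")
            # Keep trailing parts such as "Apt. 506" with the street address.
            entry.address = ", ".join([a.group("addr")] + rest[i + 1:])
            # Whatever sits between the name and the address is taken as the occupation.
            if i > 0:
                entry.occupation = ", ".join(rest[:i])
            break
    else:
        # No address token found: treat all remaining fields as the occupation.
        if rest:
            entry.occupation = ", ".join(rest)
    return entry


if __name__ == "__main__":
    print(parse_entry(
        "Louis Dembitz Brandeis (Alice), Associate Justice at Supreme Court "
        "of the US, h2205 California nw, Apt. 506"
    ))
```

A fixed rule like this breaks as soon as a directory orders or punctuates its fields differently, which is why hand-written patterns do not scale to thousands of directories with varying layouts.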
The directories’ main purpose was to let people find or contact individuals and businesses, much like the modern White and Yellow Pages, although, except for the most recent editions, they did not list telephone numbers. The information they featured included the names and addresses of those considered heads of household, often with additional details such as their occupations, their wives’ names and whether they rented or owned their homes.

MyHeritage gathered directories from about 8,000 US cities, including the 100 largest cities in the country. For some of them, like Boston, Cleveland, San Francisco, Los Angeles and Washington DC, the company was able to find directories for multiple years, allowing it to compare information about the same people across different years.

“Usually similar projects involve manual transcription of the records, which means human workforce going through every document and transcribing every name and relevant piece of information. However, such a vast amount of records would have taken years and years and cost millions of dollars,” Erlichman said. “We wanted to find a way to achieve the goal much sooner and with good quality, therefore we worked on developing machine learning technology that could look at the documents and infer a structured index of information.
This is what makes our initiative very unique.”

Erlichman explained that a major challenge was that the software needed to do more than extract a single block of text from each page: it had to understand where each record began and ended and what information it contained. The task was complicated by differences in page layout, fonts, record structure and the information included in each directory.

To achieve this, MyHeritage corrected errors in the optical character recognition (OCR) of the scanned directory pages and then employed several advanced technologies, including record extraction, named entity recognition and conditional random fields, to analyze the data.

The second step was to aggregate all the information about the same person found in different directories. For example, someone living at the same address for multiple years would appear in several directories, while their profession or marital status might change, offering additional details about their life.

“This is the first time that any genealogy company tried to achieve this goal,” Erlichman told the Post. “We managed to look at all those books and tried to extract as much information as possible about every individual living at a specific address.”

Teams in the US, Israel and Ukraine worked on the project for about two years.

MyHeritage stated that it is currently indexing thousands of additional US city directories that will be added to the collection in the coming months. These will include directories dating back to the late 18th century, as well as a large and unique set of directories from the late 20th century.

These are just two examples of the treasure trove of new information that