Archivum Secretum

by wjw on May 7, 2018

The Vatican Secret Archives houses 53 linear miles of documents going back 12 centuries, much of it handwritten in Latin using archaic script.  It’s not Secret, exactly; it’s just inaccessible, because in order to look at one of the documents you actually have to be there, in the Vatican, along with your knowledge of Latin and archaic scripts.  Plus you have to know what you’re looking for and where to find it.

A lot of it looks like this:


But a new project could change all that. Known as In Codice Ratio, it uses a combination of artificial intelligence and optical-character-recognition (OCR) software to scour these neglected texts and make their transcripts available for the very first time. If successful, the technology could also open up untold numbers of other documents at historical archives around the world.

OCR is intended for typeset text, of course, because it needs a space between letters to tell where one ends and the next begins.  So: artificial intelligence to the rescue!  Along with a whole lot of high school students.

High school students, see, have much better optics than computers, at least for reading cursive, so they are being used to “train” the AI to recognize letters and words.

The students didn’t even need to be able to read Latin. All they had to do is match visual patterns. At first, “the idea of involving high-school students was considered foolish,” says Merialdo, who dreamed up In Codice Ratio. “But now the machine is learning thanks to their efforts. I like that a small and simple contribution by many people can indeed contribute to the solution of a complex problem.”

The AI needed some further work after that, but the OCR was finally ready to read some texts on its own. The team decided to feed it some documents from the Vatican Registers, a more than 18,000-page subset of the Secret Archives consisting of letters to European kings, rulings on legal matters, and other correspondence.

The initial results were mixed. In texts transcribed so far, a full one-third of the words contained one or more typos, places where the OCR guessed the wrong letter. If yov were tryinj to read those lnies in a bock, that would gct very aiiiioying. (The most common typos involved m/n/i confusion and another commonly confused pair: the letter f and an archaic, elongated form of s.) Still, the software got 96 percent of all handwritten letters correct. And even “imperfect transcriptions can provide enough information and context about the manuscript at hand” to be useful, says Merialdo.
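Those two numbers (96 percent of letters right, one-third of words wrong) sound contradictory, but they aren't. Here's a back-of-the-envelope sketch, assuming every letter is recognized independently of its neighbors, which is my simplification and not anything the In Codice Ratio team has said. The function name is made up for the illustration.

```python
def word_error_rate(letter_accuracy: float, word_length: int) -> float:
    """Chance that a word has at least one wrong letter.

    Assumes each letter is right or wrong independently of the others,
    which is a simplification for illustration only.
    """
    # The word is clean only if every one of its letters is right.
    chance_word_is_clean = letter_accuracy ** word_length
    return 1.0 - chance_word_is_clean


if __name__ == "__main__":
    # 0.96 is the per-letter accuracy quoted above; the word lengths are arbitrary.
    for length in (4, 6, 8, 10):
        rate = word_error_rate(0.96, length)
        print(f"{length:2d}-letter words: {rate:.1%} contain a typo")
```

Under that assumption a ten-letter word comes out clean only about two times in three, so small per-letter errors pile up fast once the words get long.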

And of course the AI will get smarter as it goes along, and so the transcripts will improve.

But at what cost?  What made the Secret Archives fascinating was that they were Secret.  They were a great mystery.  Once they’re revealed to be 53 linear miles of dull bureaucratic memos, travel vouchers, invoices, complaints that Lord Whatsit dissed Abbot Wossis when they were on retreat in Fontevraud, and clarifications of theological points even the Vatican doesn’t care about anymore, a great source of wonder will be removed from the world.

And what will Dan Brown do then, I wonder?

Ralf T. Dog May 8, 2018 at 12:27 am

It’s all Greek to me.

kpacheneg May 8, 2018 at 8:26 am

Not to mention all the documents proving the Catholic Church’s support of the Shoah have been destroyed ages ago, so there won’t be that much actually useful material left.

