
Cambridge University Library

Q&A Wednesday: Algorithms and elbows, with Noam Sienna

Rabbi Refael Aharon Ben-Shim’on, the chief rabbi of Cairo at the time of Solomon Schechter's visit in 1896
Melonie Schmierer-Lee and Noam Sienna
Wed 28 Jul 2021

Noam, what are you working on at the moment?

I’m working with Marina Rustow at the Princeton Geniza Project, helping to train a computer algorithm to transcribe documentary material from the Genizah. The PGP has lots of transcriptions and editions of documents that various Genizah scholars have done, so right now we’re getting the computer to match up what it sees on an image with the transcription that a person has done, and once it can do that, hopefully it will be able to transcribe documents that scholars haven’t looked at yet. Even if it’s only 80 or 90% accurate, that would still be an immense advantage for scholarship. I’m also working on revising my dissertation into a book.

How do you set about training a computer? Can you outline the process, or your part in it?

There are a lot of people involved with this program. The computer science part is beyond my ken – we’re working with a digital palaeography platform called eScriptorium, and collaborating with scholars around the world. But my own working process at the moment involves looking at an image of a Genizah document and checking that the computer has “segmented” it appropriately, that is, identified all the lines of text and individual letters on the page. Sometimes it misses words, or accidentally picks up something else. Lots of times it actually highlights the handwritten label of the fragment number! For a human, that is obviously not part of the original text of the fragment, but a computer has no way of knowing that without multiple corrections. Eventually it learns that writing of that size, colour and shape should be ignored.

Once the image has been segmented, we start to correct the transcriptions – matching up what the computer sees with the transcriptions we have through the PGP, which have accumulated over many years of scholarship on the Genizah. The other issue is that humans are very good at inferring things from very little context, especially trained scholars who know Judeo-Arabic writing conventions, palaeography, etc. – sometimes we can guess the whole word from just half a letter! But a computer only knows what it sees. So a lot of the time we’re actually removing material from the transcription that a human scholar was able to infer, but that isn’t actually visible on the page and so isn’t accessible to a computer algorithm.
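To make that last step concrete, here is a minimal Python sketch of stripping inferred text out of a transcription. It assumes, purely for illustration, that restorations are marked with square brackets (a Leiden-style editorial convention); we don’t know how the PGP actually marks them, and the name `strip_restorations` is hypothetical, not part of the PGP or eScriptorium.

```python
import re

# Assumed convention (hypothetical): text a scholar restored from context
# sits inside square brackets, e.g. "ab[c]" where "c" is not visible.
RESTORATION = re.compile(r"\[[^\]]*\]")


def strip_restorations(line: str) -> str:
    """Return only the visibly written text of a transcribed line."""
    without = RESTORATION.sub("", line)
    # Collapse the extra spaces left behind where whole words were removed.
    return " ".join(without.split())


print(strip_restorations("alladhī [kataba] ilaynā"))
```

Here the bracketed word disappears and the line becomes "alladhī ilaynā", leaving only what the algorithm can actually see on the image.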

How successful is the computer so far? Is it better or worse with particular scribes? Or genres?

Pretty good! I don’t have exact numbers (Marina would know), but so far the transcriptions it’s producing are already fairly close to what a human scholar has produced – in the best cases, that is: clear writing, straight lines, etc. As you know, Genizah documents can be quite messy. And genre matters too: a simple document, let’s say a legal record, is fairly easy, because the text is in one block on the page, it’s in straight lines, and the script is usually fairly clear. Letters are more complicated because writers squeeze things into the margins, go sideways, add postscripts, etc. And lists are all over the place. So we’re working on documents with “simple” layouts first, and then hoping to move on to complex ones.
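Accuracy figures such as the “80 or 90%” Noam mentions are often reported at the character level in handwritten text recognition. The sketch below shows one common way to compute this, as 1 minus the character error rate, using Levenshtein edit distance between a scholar’s transcription and the machine output. This is an assumed metric chosen for illustration, not a description of how the PGP measures its results; the function names are ours.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def character_accuracy(reference: str, prediction: str) -> float:
    """1 - CER, where CER = edit distance / length of the human reference.
    Clamped at 0.0, since a very long wrong prediction can push CER above 1."""
    if not reference:
        return 1.0 if not prediction else 0.0
    cer = levenshtein(reference, prediction) / len(reference)
    return max(0.0, 1.0 - cer)
```

For example, `character_accuracy("abcdefghij", "abcdefghix")` gives about 0.9: one wrong character out of ten in the reference.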


Screenshot of the PGP's 'Handwritten Text Recognition' project in progress, with line segmentation applied to T-S 10J4.12.

Can the algorithm deal with lines, dots, or diacritics?

No, at the moment we’re leaving those out.

And are you focussing on Hebrew-script manuscripts or looking at Arabic-script ones too?

At the moment, only Hebrew ones. I suspect that Arabic-script manuscripts will come in the future!

I’ve been reading back issues of the Genizah Fragments newsletter recently, and the developments in technology are striking – from getting an answering machine to an email address, to the dream of a full electronic catalogue, to the full digitisation of the manuscripts. It always sounds like computers are on the verge of taking over – or at least taking us to the next level.

Yes indeed – the history of the book is a constant dialogue with changing technology!

Can you tell me about your forthcoming book, and how your dissertation intersected with the Genizah?

My dissertation (and forthcoming book) focuses on Jewish book culture in early modern North Africa. The Genizah is an important early testimony to the vibrancy of Jewish life in medieval North Africa, and in particular how medieval North Africa was a centre for the copying and wide distribution of Hebrew books. In fact, many of the merchants who left their books and papers in the Genizah themselves were originally from North Africa, and migrated to Egypt both for economic reasons and due to political upheaval (for example, the Banu Hilal’s conquest of Tunisia in 1057). So that sets the stage for my book, in which I examine the development of early modern book culture in North Africa – both the continuities in manuscript production from the medieval period (and even the continued physical presence of medieval manuscripts), and also the growing involvement of North African Jews in transnational networks of printing. I’m generally interested in the materiality of Jewish books and book-making, so the Genizah has been a happy place for me for a long time.

Was any of the later material in the Genizah useful for your research?

Great question – I wish! The truth is that the later Genizah has been so understudied that the material is just not as readily accessible. I wasn’t aware of any relevant documents, but that’s only because they haven’t been published or catalogued, and I didn’t have the opportunity to do my own original research into it. But I’m sure there’s more material there.

One dynamic that becomes more pronounced in the early modern period is that North Africa (meaning Morocco, Tunisia, and Algeria) and Egypt become more and more distinct – Egypt becoming integrated into the Ottoman Levant, and North Africa into the Western Mediterranean. So, in the 18th century, let’s say, North African Jews have deeper connections with Gibraltar, Italy, France, even England, than they would with Egypt. This is a generalization, obviously: there are still some connections and some movement back and forth. 

One outstanding example is actually related to the Genizah: Refael Aharon Ben-Shim’on (1847–1928), who was the chief rabbi of Cairo when Schechter came by in 1896. Ben-Shim’on was born in Morocco, and had served as a rabbi in Fez before coming to Egypt. Sometimes when people talk about the Genizah he’s portrayed as an Orientalist stereotype, a typical “Middle Eastern” rabbi: a passive and uninterested traditionalist who has to be “wooed” by Schechter. But he was actually a scholar and antiquarian himself, and deeply interested and invested in studying historical Jewish manuscripts. Before he came to Cairo, he founded a society in Fez to find old manuscripts and bring them to print (it was called Ḥevrat Dovevei Siftei Yeshenim, and it published five liturgical and rabbinic works in Jerusalem and Alexandria between 1889 and 1903). Ben-Shim’on plays an important role in my dissertation and book, and I think he deserves more respect in the narrative we tell about the Genizah.

In one of Schechter’s letters to his wife he notes the unusual sensation of being kissed on the cheek by the rabbi.

Yes, I love that tidbit! Ben-Shim’on did have quite the luxurious beard. 

But he was more than the beard – Schechter and Ben-Shim’on were both scholars and met on that common ground. 

Yes. And Ben-Shim’on was not alone: Moroccan Jewish scholars both in Morocco and abroad, such as Avner Yisrael Serfaty, Refael Moshe Elbaz, Refael Abensur, and Ya'aqov Moshe Toledano, were invested in preserving and recording the Maghrebi Jewish history held in rare books and manuscripts. For example, a colleague described how Refael Abensur “would collect and gather many books of Torah, from his love for Jewish literature, in addition to those books he inherited from his family... He did not ignore even a single manuscript leaf, but would gather dispersed leaves and bind them into volumes.” The work of these historians and antiquarians, which I examine in my forthcoming book, reveals how Maghrebi Jews themselves were deeply conscious of their role as guardians of their own history, and strove to apply modern standards of scholarship to the preservation of their heritage.

This was part of a transnational conversation in the late 19th century – Jewish scholars around the world were thinking about history, about archives, about the importance of manuscript heritage. Too often, scholars like Schechter are portrayed as the main players of that story, and scholars like Ben-Shim’on are just the background. I think Schechter himself might not have given Ben-Shim’on his proper due. But Ben-Shim’on definitely thought of himself as a modern intellectual, albeit one rooted in his North African context.

As an example, here’s what Ben-Shim’on wrote in 1889, when he brought to print a siddur of the unique liturgy of the Jews of Fez:

“I asked the leaders of the community [in Fez]... ‘why do you not send your manuscripts to be printed in one of the cities of print, in another country? What will you do if this hidden treasure in your hands decays from old age? Will not the heritage of your ancestors thus be lost? This would be an immeasurable loss.’ They answered me fairly that the art of printing was completely unknown, from one end of the Maghreb to the other, and for other countries, there is no price that could persuade them to accept this work! I therefore vowed to devote myself to this work, with the help of Heaven, and I immediately brought the manuscript to copyists to copy it for me, for it was not in the form needed for printing.”

He must have understood what an important resource the Genizah was, and he must have believed that allowing Schechter to help bring it into the world of scholarship was the best way to have it preserved.

There’s a lot of elbow jostling around the story of the ‘discovery’, isn’t there? So many people get sidelined. Do you know of any quotes from Ben-Shim’on about Schechter and the Genizah?

You know, I haven’t come across any, but he must have had something to say about it. I will keep an eye out! I certainly wonder if he wrote down his own account of it. That would be fascinating, wouldn’t it! I will look into it. I know that Rebecca Jefferson, for example, has been working on rethinking and rewriting our accounts of how “The Genizah” came to be. So many players – lots of elbow jostling indeed.

Thanks for your time, Noam.
