An entire Herculaneum scroll has been read for the first time (scrollprize.org)

1,693 points|by verditelabs|3d ago|365 comments

Preprint: https://scrollprize.org/pdf/main.pdf

https://github.com/ScrollPrize/villa

Comments (365)

1. verditelabs|3d ago|context

I am on the Vesuvius Challenge team that did the segmentation, unwrapping, and ink detection, so feel free to ask any questions.
2. helterskelter|3d ago|context

Given the current rate of progress, how long do you think it will take to decipher the entire collection?
3. verditelabs|3d ago|context

That's a tough one to give a strong estimate of. Some scrolls are easier or harder to unwrap and read for a multitude of different reasons, mostly due to how damaged the scroll was in the eruption, and how easy or not the ink is to read. IIRC from what we've scanned of the Herculaneum collection, none of the ink is easily visible via spectrum alone, so we have to use a lot of ML and physically based rendering techniques to be able to find ink. That also requires unwrapping and segmentation _before_ any ink detection.
For iron gall ink with high enough iron concentration, the ink stands out in the xray volume through simply masking off low values, such as was shown in our campfire scroll experiment a few years ago. No Herculaneum scrolls show similar ink.
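A minimal sketch of that kind of masking, assuming a tiny volume stored as nested lists and a made-up threshold. The function name and all values here are illustrative, not the team's code:

```python
def mask_low_values(volume, threshold):
    """Zero out every voxel below `threshold`, keeping only dense (bright) material."""
    return [
        [[v if v >= threshold else 0 for v in row] for row in plane]
        for plane in volume
    ]


# Hypothetical 2x2x3 volume: papyrus around 40-60, dense ink around 200.
volume = [
    [[40, 200, 55], [60, 210, 45]],
    [[50, 45, 58], [42, 205, 51]],
]
masked = mask_low_values(volume, threshold=128)
```

With dense ink the surviving voxels would trace the strokes directly. Carbon ink sits too close to the papyrus in intensity for a single threshold like this to separate it.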
4. helterskelter|3d ago|context

Thanks!
5. pimlottc|3d ago|context

Do you think this particular scroll is easier or harder to read than the others will be? Or about average?
6. verditelabs|3d ago|context

Pherc1667 was quite small and just so happened to have readable ink, so it was easier than I expect most others to be.
7. superjan|3d ago|context

Do we know what ink is used?
8. verditelabs|3d ago|context

Most of the evidence so far points towards carbon based ink. I am not sure if any of the scrolls we have scanned show strong evidence of iron gall based ink. I know that there are different types and preparation methods for different carbon based inks, but I do not know if it is possible to determine which kind(s) were used solely from inspecting the xrays.
I am, though, not a papyrologist, so historical ink making, preparation, and usage are not my field.
9. junon|2d ago|context

Thanks for answering all the questions in here. Fascinating work.
10. jimbob45|3d ago|context

Are the fragments destroyed in ‘69 and ‘80 available to be read similarly? Or were they disposed of?
11. verditelabs|3d ago|context

I am unaware of those fragments in particular. We have, though, scanned a dozen or so fragments, mostly to help guide ink detection, since the ink in them is often more visible in visible and/or near IR light, but can be hard to impossible to detect in the xray spectrum.
12. adriand|3d ago|context

What are the wildest, most exciting but plausible things that might be discovered in these documents?
13. verditelabs|3d ago|context

I am not a papyrologist or a classicist, rather I'm a computer scientist, so my expertise is unfortunately not in _what_ the scrolls say, rather how we get there. That being said I think and hope that there will be a trove of things that has no known provenance at all, completely lost works that elude the public memory.
14. arikrahman|3d ago|context

Well what were your first thoughts when you decoded the script, besides the obvious Eureka, after making some sense of the texts?
15. tremon|2d ago|context

Probably something along the lines of "finally, now it looks like a coherent piece of text. I wonder what it says".
16. verditelabs|2d ago|context

Other members that were on the team before me had already proved it out before I came along so I knew it was possible. The cool thing for me though was specifically doing some physically based rendering techniques. How well these work varies greatly, but on a few segments in one scroll they work extremely well. I whipped up some simple code to composite layers, did up a render, and without any ML at all was looking at multiple rows of text that no one had read for 2000 years. That was neat.
17. readthenotes1|2d ago|context

Your response reminds me of Nigel Richards :)
https://en.wikipedia.org/wiki/Nigel_Richards
Congratulations, and thank-you!
18. suddenlybananas|3d ago|context

Probably a lot more texts of Epicurean philosophy and not a whole lot else unfortunately according to my papyrologist friend.
19. cwmoore|3d ago|context

Why would Epicurean philosophy be unfortunate?
I was under the impression that there was almost nothing left of that school of thought, and that its writings had been destroyed.
What would you like to have instead?
20. cwnyth|3d ago|context

The unfortunate part is the lack of anything else therein, not that it's Epicurean philosophy.
21. ogogmad|2d ago|context

The Jewish Talmud uses Epicurus's name as a term meaning "heretic".
22. Telemakhos|2d ago|context

The Epicureans were particularly hostile to the Jews and Christians, because Epicureans deny Providence or the active intervention of the divine in human affairs. See Horace Sermones 1.5.
23. adrian_b|2d ago|context

It's more like the Christians and the Jews were particularly hostile to Epicureans and Stoics, because those mocked the claims about the existence of an all-powerful God that requires prayers.
The Epicureans and Stoics did not care much about Christians and Jews, but after the Christians obtained power in the Roman Empire they made great efforts to persecute and discredit the Epicureans and the Stoics, as the most dangerous kinds of non-believers. (Unlike the rational Epicureans and Stoics, the traditional polytheists could be converted to Christianity much more easily, by inventing a set of Christian saints to which the former polytheists could redirect the prayers and the holidays to which they were habituated.)
The Christian propaganda has created a false image of the Epicureans, which has persisted until today.
The Epicureans were not atheists, but they had a very different conception about what Gods are. They thought that in nature there are a lot of entities that have a god-like power, i.e. humans are too small and weak to influence them in any way, but the life of the humans is strongly dependent on the actions of those entities, so they can rightly be considered as gods. Examples of such entities are the Sun, the Moon, storms, volcanos etc.
Unlike in the traditional Greek and Roman religions, where it was believed that for each such natural phenomenon there exists some sentient god, who can be convinced to change the events to a more favorable outcome by prayers and sacrifices, the Epicureans believed that the gods, even supposing that they were sentient, do not care about humans any more than humans care about ants, so there is absolutely no point in praying to them or bringing sacrifices to them.
Therefore humans should conduct their lives according to ethical principles, but without worrying about what gods may think about their actions.
Many modern humans would probably agree with the Epicurean philosophy, which was completely different from what the Christian propaganda claimed, e.g. that Epicureans were some kind of sinners addicted to pleasures.
24. FergusArgyll|2d ago|context

> completely different from what the Christian propaganda claimed, e.g. that Epicureans were some kind of sinners addicted to pleasures.
Interestingly, in Jewish literature (Talmud and further refined by Maimonides) Epicurus refers to a certain kind of non-believer, not to a sinner for pleasure. See here for example https://www.sefaria.org/Mishneh_Torah%2C_Repentance.3.8?lang...
I always wondered about that because I guess I fell for the "Christian propaganda" as you call it.
25. adrian_b|2d ago|context

Indeed, the 3 beliefs attributed to Epicureans there, i.e.:
a) one who denies the existence of prophecy and maintains that there is no knowledge communicated from God to the hearts of men;
b) one who disputes the prophecy of Moses, our teacher;
c) one who maintains that the Creator is not aware of the deeds of men.
are actually accurate enough renderings of what an Epicurean might have said in a discussion with a Jew, because as I have mentioned, Epicureans believed that there are gods, but those do not pay attention to humans and do not attempt to communicate with humans, because humans are insignificant for them.
This is quite different from how Epicureans were portrayed in Christian literature, where calumnies against them were preferred for avoiding any direct controversy.
26. adriand|2d ago|context

> What would you like to have instead?
History! That's what intrigues me the most: texts with accounts of events that have otherwise vanished from the historical record.
27. cwmoore|1d ago|context

Past events, and the ideas behind them, are both first-class history topics.
28. Matticus_Rex|2d ago|context

That's what was thought, but maybe not -- only one of the three so far looks Epicurean, which is not what was expected. Maybe it's a fluke, but historians are buzzing a bit about whether it might be broader than expected.
29. kome|2d ago|context

in the paper it says "The recovered text is a philosophical treatise on ethics, and the evidence points to a Stoic work: it turns on human nature, impulse, and the moral progress of human beings, and its final preserved column names Aristocreon — nephew and disciple of the great Stoic Chrysippus — which, together with the language and themes of the text, places it in a Stoic context and dates it to the 2nd century BC."
30. colechristensen|3d ago|context

Here's a list. The scrolls are from a library that burned in 79 AD.
https://en.wikipedia.org/wiki/List_of_lost_literary_works
31. kouru225|3d ago|context

Woah there was a lost Homer epic comedy about a bumbling fool named Margites?
32. sapphicsnail|2d ago|context

There's also the Telegony. Odysseus has a son through Circe who winds up killing him and marrying Penelope. Odysseus's son through Penelope, Telemachus, marries Circe. There's some wild stuff that doesn't survive.
33. kouru225|2d ago|context

Looking through these it’s crazy to find out that The Iliad is only 1 of like 5 original texts on the Trojan war. We’re reading book 2 of a 5 book series
34. colechristensen|2d ago|context

It was an oral epic passed through generations for quite a while before anything was written down so there isn't necessarily much of an "original"
35. GeoAtreides|2d ago|context

Aristotle's second book of Poetics, of course.
36. wolfi1|2d ago|context

we already know that a blind Italian monk burnt it to ashes, at least, that's what Eco wrote and he was a learned scholar
37. pestatije|2d ago|context

but that was a copy
38. wolfi1|2d ago|context

well the other existing copy (or original) was destroyed with the library of Alexandria
39. echelon|3d ago|context

Did anyone on the team come from a non-science, non-math, non-academia background? Did anyone working on this just teach themselves and start contributing?
40. verditelabs|3d ago|context

Yes. Sean, who was a co-winner of the 2024 prize, IIRC has no formal background in ML, computer science, AI, etc. He is one of our core researchers and the most productive team member.
41. fintechjock|3d ago|context

I've been on the Discord for a couple of years now, and poking around with submissions as well. Sean and the entire team deserve so much praise for all of this work.
It's easy to just read about the breakthrough and see it as one neat, linear line to get there, and hard to comprehend the hours, months and years that so many spent to get there. Big congrats to you, Sean, Nat and the entire team!
42. echelon|2d ago|context

That's incredibly impressive.
Major kudos to all of you on your achievements! This is amazing work for anthropology and for society, and it's greatly appreciated.
43. tsol|3d ago|context

How do you get to do that? As in what did you study to get the prerequisite knowledge, and how did you find this particular job? When I see interesting jobs I'm always curious what path led there.
44. verditelabs|3d ago|context

I am a computer scientist. I studied CS in university, worked in the semiconductor industry for a while, got started as a participant in the challenge aspect of the Vesuvius Challenge. They were hiring, I sent in an application, interviewed, and was offered the job.
45. matneyx|2d ago|context

That last sentence is so perfect, like my dad answering the question of how he lost weight. "I ate less and exercised more."
46. tsol|1d ago|context

Very cool that you got in just through your interests. It's always cool to see stories where that works out! Good to know it's possible to get that kind of job
47. inglor_cz|3d ago|context

I don't have any questions, just a comment.
You have a potential to rewrite the history of European Antiquity quite substantially. The Herculaneum set of scrolls is enormous and must contain a lot of hitherto unknown.
That comes with a set of peculiar risks. Once your work starts producing something that contradicts previous work of Very Important People, they will lobby to stop you. Be prepared for that.
Science should be neutral and always value new evidence. Scientists as humans are unfortunately subject to all sorts of passions.
48. Rebelgecko|2d ago|context

What contradictions do you think the scrolls contain?
49. inglor_cz|2d ago|context

I don't have any concrete tips.
We have very little written material surviving from Rome, at least from the period before a codex (book) was invented, which was more durable than a scroll. Often, we only know of one source describing important events, and when it comes to political struggles and civil wars, the perspective of the defeated party often did not survive. The punishment of damnatio memoriae was practised and even among the early emperors, Caligula and Nero were subject to a form thereof. (This library in Herculaneum was buried 11 years after Nero's death.) I would be surprised if everything in the scrolls perfectly aligned with the record that survived for 2000 years and that was filtered by both random chance and political/religious censorship. Even Christians later destroyed some pagan texts.
BTW personally, I would love for some textbook of Etruscan to emerge from there. This was once again a language whose teaching was banned in Rome.
50. TheOtherHobbes|3d ago|context

No questions, but I just want to say this is really exciting work!
51. Dzugaru|3d ago|context

Outstanding work! I've participated in the challenge, but didn't get far. One of the questions I had at the time was - if I'm going to use ML to detect ink, could it invent hallucinated letters, or even parts of text, and how to prevent that?
52. verditelabs|3d ago|context

Yes, it's quite possible for ML to hallucinate ink, though it is on a much more local scale, like predicting a slightly longer stroke, filling in more of a character than is actually in the data, etc. Perhaps enough to change a reading of a character or show ink where it isn't. It is difficult for ink detection to hallucinate grammatical and idiomatic Greek and Latin.
53. im3w1l|3d ago|context

What is the input to the ML algorithm? Does it know the surrounding context so that it has a chance to deduce "if this stroke is slightly longer then the end result will be idiomatic greek and latin"?
54. verditelabs|3d ago|context

The input is 3d chunks of reconstructed CT data from our scans. I can't remember the specifics but maybe enough voxels for .5mm^3 at a time or so? They're all available for free from https://registry.opendata.aws/vesuvius-challenge-herculaneum... . Our trained models are all available at https://huggingface.co/scrollprize
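A rough sketch of what feeding a model "3d chunks" can look like, assuming the volume is a nested list indexed as [z][y][x]. The helper names, the toy 4x4x4 volume, and the chunk size of 2 are all made up for illustration; real chunks are far larger:

```python
from itertools import product


def chunk_origins(shape, chunk, stride=None):
    """Yield (z, y, x) corners of every chunk-sized cube that fits inside `shape`."""
    stride = stride or chunk
    ranges = [range(0, dim - chunk + 1, stride) for dim in shape]
    yield from product(*ranges)


def extract_chunk(volume, origin, chunk):
    """Slice one cube of side `chunk` out of a nested-list volume."""
    z, y, x = origin
    return [
        [row[x:x + chunk] for row in plane[y:y + chunk]]
        for plane in volume[z:z + chunk]
    ]


# Toy volume where each voxel stores its own flat index, to make slices easy to check.
volume = [[[z * 16 + y * 4 + x for x in range(4)] for y in range(4)] for z in range(4)]
origins = list(chunk_origins((4, 4, 4), chunk=2))
last_chunk = extract_chunk(volume, origins[-1], chunk=2)
```

Each chunk would then be passed to the model on its own, which is why the detector only ever sees local context.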
55. cwnyth|3d ago|context

Not all machine learning is generative AI.
56. mc32|3d ago|context

True but like regular document scanning software there can be errors in detection.
57. dleeftink|3d ago|context

Just as with redacted documents (consistently blocked terms) or bad OCR jobs (wrong or missing characters), even if only a certain percentage comes out unmangled it is more readable than having no data at all.
A stable base corpus and some dynamic programming will allow you to clean up the remainder[0].
[0]: http://stackoverflow.com/a/11642687/2449774
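As one illustration of dynamic programming against a base corpus (not necessarily the technique in the linked answer), here is a sketch that snaps a mangled word to its nearest vocabulary entry by edit distance. The vocabulary and the mangled word are invented for the example:

```python
def edit_distance(a, b):
    """Levenshtein distance via the classic row-by-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def closest_word(word, vocabulary):
    """Return the vocabulary entry with the smallest edit distance to `word`."""
    return min(vocabulary, key=lambda w: edit_distance(word, w))


vocabulary = ["scroll", "papyrus", "carbon", "volcano"]  # hypothetical base corpus
fixed = closest_word("papvrus", vocabulary)
```

This always returns some vocabulary word, so a plausible but wrong correction is still possible when the input is badly mangled.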
58. mkl|2d ago|context

The problem is when you can't tell which bits are unmangled. OCR systems will happily give you plausible but wrong readings, and even some scanners/copiers will change things: https://dkriesel.com/en/blog/2013/0802_xerox-workcentres_are...
59. selcuka|2d ago|context

Yeah. There was a weird Xerox printer bug that swapped digits (turning 6s into 8s) on scanned documents caused by the JBIG2 image format [1].
[1] https://www.dkriesel.com/en/blog/2013/0802_xerox-workcentres...
60. BiraIgnacio|3d ago|context

Amazing work, fantastic!
61. tomcam|3d ago|context

Absolutely incredible work. This is one of the most amazing news articles I’ve encountered in decades. Congratulations team!
62. temp987|3d ago|context

this is überragend. by many means!
63. 2ap|2d ago|context

I'm interested to know about the approaches that you tried with the ML, and then decided to not use. In practice, the options are so many. How did you come up with the final approach - and was there a systematic way to decide which options to go for?
64. verditelabs|2d ago|context

I am not on the research team, rather on the production side of things, so my knowledge on that is pretty limited. I think one of the main takeaways from a lot of the research, though, on both the segmentation side and the ink detection side, is that it's a lot less about what models and techniques and such you use, but how good your training data is. Gathering ground truth is hard, and if you don't have a lot of good ground truth, it doesn't matter if your code is perfect, you'll never get results.
65. gekoxyz|2d ago|context

> it's a lot less about what models and techniques and such you use, but how good your training data is.
Ah, the good old bitter lesson strikes again
66. rossdavidh|2d ago|context

That is a general truth of most ML; many models _can_ find the information in the data, if the data is good enough. If it is not, then likely no model can.
67. EvanAnderson|2d ago|context

You brought up what I'm most curious about: Where does the ground truth come from for this work, since you can't just unwrap a scroll to tell if the model got it right or, presumably, make a facsimile scroll and wrap it up?
68. verditelabs|2d ago|context

The ground truth comes from manual work. The scrolls can be unwrapped virtually, manually, through extensive pointing and clicking by a human on the boundaries of the scroll. This, in and of itself, is not particularly hard in sections of the scroll that are preserved well, but is extremely tedious and slow and error prone. We have a team of annotators who do manual annotation and refinement through custom software we've written, mostly improving on automatically generated segmentations and unwrappings.
Once you have some unwrapped papyrus, you can render it to an image and look for ink. Ink leaves a certain texture that can be identified by the naked eye and labeled. Between these two processes you get the segmentation and ink detection ground truth. Segments can be flattened virtually through existing software and algorithms.
69. EvanAnderson|2d ago|context

I'm sure that process is described somewhere on the project's site and, being a lazy human (and unwilling to ask LLMs to summarize it for me), I leaned on you for a human answer. I really appreciate you taking the time to answer. Thank you.
I can see why you'd be attracted to this project from a "let's solve problems computationally" perspective (never mind the historical side). It sounds like there are some cool problems in there.
The eye toward automating the process that the project seems to be targeting is particularly cool, too. This is the kind of stuff that makes me have real enthusiasm for ML.
70. NooneAtAll3|2d ago|context

how many scrolls have been scanned so far? what's the main limitation on scan amount?
have any attempts (or just ideas) been made to recreate such charring on known texts?
71. verditelabs|2d ago|context

30 scrolls, maybe? Something like that. I scanned Pherc Paris 4 and Pherc Paris 3 at Beam line 18 at ESRF back in March.
The team did "the campfire scroll" experiment a few years ago to replicate carbonization, unrolling, and ink detection. That is the only case I am aware of. It proved the method could work but it's not a source of say training data; it varies too much from the real scrolls.
The main limitation is time and cost. We have to scan on what is AFAIK the most powerful x-ray beam line in the world. It is not cheap
72. CGMthrowaway|2d ago|context

You had to pay? I understand the machine cost many hundreds of millions of dollars, but I would have thought for academic researchers doing open science, the beamtime is free (funded by the govt / science trusts).
73. verditelabs|2d ago|context

The beam time is unfortunately not free. I scanned Pherc Paris 4 and Pherc Paris 3 in March and had the final shift on the beam. As I was removing the scroll from the scanning pedestal the next team of scientists were already in the lab getting their samples ready. It's a well oiled machine and they've got customers.
74. prox|2d ago|context

What other type of stuff gets scanned? I can’t imagine a whole industry waiting to x-ray something?
75. CGMthrowaway|1d ago|context

Materials science, battery research, pharma (protein structure)
76. larkost|2d ago|context

The way these things normally work is that the project starts with some sort of a grant. Then that grant pays for all of the costs of the project: peoples' salary, materials used, time on equipment, plus money for the buildings and administration (overhead).
In this case the time on the equipment would need to be included, both a portion of the cost of building/maintaining it, and probably the energy needed to run it. Even where the government is providing the grant (likely here), it still needs to be accounted for.
77. verditelabs|2d ago|context

We - the core challenge team anyway - get no money from any government. We paid for the beam time from our donations and internal funding.
78. negergreger|2d ago|context

How fast is the process?
Could it be automated to the point where it's faster to scan a book closed than opened?
79. verditelabs|2d ago|context

We've been trying to automate since the beginning. A lot of it is automated but it's mostly the easier and less damaged parts of the scrolls. Scanning takes a few days for the biggest scrolls but the amount of human refinement is still a multi month process.
80. itsthecourier|2d ago|context

may you please tell us how much effort goes into each type of task in those months?
where else do you think these techniques be applied?
81. verditelabs|2d ago|context

We are a core team of about 10 researchers and developers working full time on work that applies to all of the scrolls. We also have 4 full time annotators that tend to work on one scroll at a time. The amount of time spent on any given scroll varies with how difficult and large it is.
There is an extremely large overlap between a lot of the work we do with medical imaging, CT scanning, XRay technology, and such. A lot of the ML models and frameworks we have used and adapted for our purposes originated in the medical field for things like cancer detection or segmenting different body parts.
82. fph|2d ago|context

Random shower thought: I wonder if it would be better in the long term to stop digging out archeological findings. The more we excavate, the more damage we do for future archaeologists who will have the superpower of reading these texts without even needing to dig the scrolls free and open them.
83. flir|2d ago|context

Modern archaeologists are painfully aware that theirs is a destructive science, and do their best to mitigate that. The most extreme example is probably the tomb of the First Emperor, Qin Shi Huang, where official policy on excavation can be boiled down to "not yet".
84. verditelabs|2d ago|context

We stand on the shoulders of those that came before us. People have been trying to unroll and read the scrolls for 250 some odd years now. Had they not laid the groundwork for all that time we wouldn't be making the progress we are now.
85. Centigonal|2d ago|context

There is an active debate on exactly this topic when it comes to whether or not to excavate the tomb of Qin Shi Huang.
https://en.wikipedia.org/wiki/Mausoleum_of_Qin_Shi_Huang
86. kylemaxwell|2d ago|context

Archaeologists think about this a lot. Many digs leave portions intact specifically so that future scientists, with access to techniques and technologies beyond what's available now, can research them.
87. NoMoreNicksLeft|2d ago|context

How many scrolls are intact (worldwide, rather than just France) that might still be recoverable?
88. verditelabs|2d ago|context

IIRC 99% of all of the existing scrolls are still in Italy's possession. I think the breakdown is something like ~350 are mostly intact, another ~1000 are damaged but still "scroll like", and the remaining hundreds are shattered fragments.
89. pestatije|2d ago|context

...plus the ones that have not been dug out yet... the site is still partially buried
90. NoMoreNicksLeft|2d ago|context

My god, but that sounds wonderful.
91. countrymile|2d ago|context

did anything progress on trying to dig more out of the ground? i know that there was thinking that a lot of scrolls might still be down there
92. verditelabs|2d ago|context

Not yet, as far as I am aware. Digging progress is decided by the Italian government at multiple levels and would be a many year long thing. We have our hands full for the foreseeable future with the 30 or so scrolls we've already scanned. We're getting more and more efficient on the scanning and automation fronts, though, and are hoping that we can get our hands on the other 300 or so intact scrolls, but that in and of itself is a multi year long project that will require more money and time. As I've mentioned in a different comment, scanning is _not cheap_ and we pay for it ourselves from our own funding and donations in order to release the data for free with permissive licensing. We hope that we can improve our processes to be able to work with cheaper, lower resolution CT methods, but right now we are focused on extracting as much as possible from the best scan source in the world. Productization of cheaper scanning methods is a secondary to tertiary priority at the moment.
93. countrymile|2d ago|context

Funding wise I asked a year or so back about crowd funding, but heard nothing back. My means are limited but I'm sure there are a lot of people like me who could band together. The project seems content with the big donors right now?
94. verditelabs|1d ago|context

We will gladly take your money. There's a link on our homepage to give us money. We list a lot of sponsors and appreciate all that we have been given. Most of that was given in 2023 and 2024 and most/all of it has been spent. As far as I am aware we now rely solely on our internal funding from our founding sponsors. We are a separate entity from both U of K and the University of Naples and receive zero dollars from either of them, though their faculty and staff work closely with us in our goal of reading the scrolls.
I couldn't tell you about our future funding efforts or possible crowd funding. That's not in my wheelhouse.
95. nkoren|2d ago|context

Massive kudos to the whole team. I've been waiting 30 years for this announcement, ever since I first heard about the scrolls. Fantastic work!
96. dogscatstrees|2d ago|context

What is your origin story? How did you end up doing this and how can I do the same?
97. verditelabs|2d ago|context

BS in CS from a big state school in the USA. I have a hobby interest in history. I learned about the challenge on YouTube. Got involved contributing because I needed money. Then they put out a job posting. I applied, interviewed, and was hired.
98. Refreeze5224|2d ago|context

What a cool job, and congrats on great work!
99. ghghgfdfgh|2d ago|context

I understand that the complexity of the project has increased over the years. How difficult is it for a newcomer to get into it?
100. verditelabs|2d ago|context

It has gotten harder, unfortunately. One of the barriers to entry is simply the massive amounts of data; not everyone can set aside $100s worth of HDD or SSD space to play around. That said I have done a lot of work to dramatically reduce the amount of storage and bandwidth needed.
We unfortunately get a lot of slop submissions. I think a _really_ good place to start is simply joining the discord and looking at the data we've published and trying to replicate something or anything really. We understand that not everyone is a researcher that can jump in making awesome, immediately applicable submissions.
Granted, that's pretty specifically for people that want to submit for prizes and prize money. Everyone on the team absolutely loves to talk shop and interact with real people with real interest, so if you show it in the discord we are all more than happy to help, engage, fix bugs, give advice, etc.
I would personally love to see more open source and contributed papyrology and translation, musing on difficult readings etc.
For the more technically inclined, testing software, pointing out bugs, and actually running and trying to fix things is a huge positive that we like. We get a lot of slop submissions that are just someone pasting an issue on our GitHub into codex or Claude. We don't want to encourage that. We can do that ourselves.
101. amluto|2d ago|context

Do you know what kinds of features the model is picking up on to distinguish ink from papyrus? And did you have any labeled data (images where a human expert has identified ink or perhaps a scan of a burnt scroll with known content) to help train it?
Certainly my Mark 1 eyeballs would not obviously perform better than random guessing at this task. Although my eyeballs are, if nothing else, nerfed by only being able to see a 2D slice of the data.
102. verditelabs|2d ago|context

Yes. Most of the ink we have come across is carbon based. This leaves a certain texture on the scrolls that is recoverable and viewable with fairly basic physically based rendering, though how much ink is recoverable varies greatly from one character to the next. I don't have links handy but we just published updates to our data viewer page on our website. Pherc.Paris.4 I believe has the best overlay of ink.
A lot of labeled data is available on our ftp server which has public access
103. londons_explore|2d ago|context

I assume that's because the writer probably sometimes shortly after re-inking the writing instrument was putting down a 10x thicker layer...
104. amluto|2d ago|context

When you say "physically based rendering" do you mean that one could build a PBR model based on the (unrolled?) xray data, render that model, and be able to see the ink?
edit: I found this:
https://scrollprize.org/data_browser#/samples/PHercParis4/se...
The JSON seems to suggest that I'm mostly looking at ink detection output, but I could easily be using the tool wrong.
But I also found this awesome explanation:
https://scrollprize.org/data_fragments
I guess a bunch of the training was done by using fragments of scrolls where ground truth data is available using IR photography.
Also... that xray resolution is absolutely amazing!
105. verditelabs|2d ago|context

Some images on that page, specifically the "alpha composite" and "combined alpha" images, are a pretty simple PBR (if it's even that complex; it's just a composite rendering over a 3d array to a 2d image) rendering with no ML based ink detection in the input.
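A sketch of that idea, assuming a front-to-back alpha composite in which each voxel's rescaled intensity doubles as its color and its opacity. That transfer function is an assumption for illustration and is not necessarily what produced the images on the page:

```python
def alpha_composite(layers, lo=0.0, hi=1.0):
    """Composite a stack of 2D layers front to back (layers[0] is nearest the viewer).

    Each intensity is rescaled from [lo, hi] to [0, 1] and used as both color and alpha.
    """
    height, width = len(layers[0]), len(layers[0][0])
    image = [[0.0] * width for _ in range(height)]
    remaining = [[1.0] * width for _ in range(height)]  # light not yet absorbed
    for layer in layers:
        for y in range(height):
            for x in range(width):
                a = min(max((layer[y][x] - lo) / (hi - lo), 0.0), 1.0)
                image[y][x] += remaining[y][x] * a * a  # transmittance * alpha * color
                remaining[y][x] *= 1.0 - a
    return image


# Two 1x2 layers: the first pixel is transparent up front, the second fully opaque.
layers = [[[0.0, 1.0]], [[0.5, 0.5]]]
image = alpha_composite(layers)
```

Fully opaque voxels near the viewer hide everything behind them, so the choice of `lo` and `hi` decides how deep into the sheet the render can see.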
106. Izmaki|2d ago|context

How awesome do you feel right now? This is HUUUGE! To think that a scroll was unreadable for so, so long, until we invented machines that let us read it slice by slice. It's such an unfathomable achievement - we made machines that let us read 2000+ year old fragile scrolls without ever opening them - and you helped do just that.
Hats off!
107. verditelabs|2d ago|context

In March I went to Beam Line 18 at the European Synchrotron Radiation Facility. I had to swap out the scrolls on the xray pedestal. These were scrolls that were presented as a diplomatic gift to Napoleon and Josephine by King Ferdinand. France has 2 of the 6 that they were given still intact. I had to handle both of them. I have never felt more stressed in my life, and I have never handled, and will probably never again handle, such a priceless artifact.
I feel the opposite of that feeling and am immensely proud of everything that the core challenge team has accomplished.
108. _boffin_|2d ago|context

I am floored at these achievements. Such amazing work.
If I may ask, when you started thinking about achieving this, what were the first attempts and ideas on how to go about it? What were some of the obstacles that had to be overcome to achieve this?
109. verditelabs|2d ago|context

The process of trying to read the scrolls has been going on for about 275 years now. Doing it nondestructively via CT scanning and virtual unrolling and reading has been in the works for 25 years or so, so it's a lot of building on previous work.
Virtual unrolling and reading are not terribly hard to do manually, they are just not feasible on a large scale. It would take years and years of human time spent tediously clicking on papyrus and labelling ink in renders, so a large amount of automation is required.
A lot of difficulty has come from the first step: xraying the scrolls. It's hard and expensive and difficult to get right. The efforts since this all began with CT scanning 25 years ago have been kneecapped by the data simply not being good enough. We xray on what is AFAIK literally the most powerful xray beamline in the world and we would still like for it to be more powerful and faster. Not to mention the massive amounts of data. For Pherc Paris 3, our largest scroll, the raw reconstructed data is 260 terabytes. That's a lot of data to have to deal with.
110. tokioyoyo|2d ago|context

This is one of the most fascinating comments I’ve ever read. Thank you so much!
I was wondering, how does this all get funded?
111. rjtavares|2d ago|context

There's a sponsors and partners list on their webpage: https://scrollprize.org/#sponsors
112. skew-aberration|2d ago|context

Where can we read about the xray setup? e.g. the type of sensor, if/how the target and/or beam is scanned, any fancy gratings/etc, what kind of CT algorithms are used
113. SideburnsOfDoom|2d ago|context

Parent comment says "Beam Line 18 at the European Synchrotron Radiation Facility"
so https://www.esrf.fr/home/UsersAndScience/Experiments/BM18.ht...
https://www.iis.fraunhofer.de/en/pr/2022/20221031_bm18.html
114. anentropic|2d ago|context

Just wonderful
Wonderful that all of this amazing technology exists
Wonderful that we used it to read these ancient scrolls
Thank you
115. mmooss|2d ago|context

> We xray on what is AFAIK literally the most powerful xray beamline in the world and we would still like for it to be more powerful and faster.
What makes power relevant here? Obviously medical applications aren't particularly powerful, are quick, and are very useful. Is it harder to penetrate the material than the human body? Is the increased power due to increased resolution - i.e., increased pixels/cm^2 rather than increased watts/pixel? The latter would seem to risk damaging the artifact?
116. verditelabs|2d ago|context

We scan the full scrolls at 2.4 micron and scan portions of them at up to 0.5 micron. This is 1000x to 4000x higher resolution than your standard medical CT scanner, so that requires a lot more power to get readings at such high resolution. There are other properties that make large synchrotrons more amenable to our task but I am not an xray technician so am not qualified to speak to most of them.
Damage to the artifacts is less than you might expect. I think that the radiation is particularly dangerous to living tissue and fiber. The scrolls are inert, pure carbon charcoal bricks for the most part and not particularly vulnerable to high power xrays.
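As a rough illustration of why voxel size drives both scan effort and data volume, the sketch below works out how many voxels fit in one cubic centimeter at the two voxel sizes mentioned above. The helper names are hypothetical, and the 2 bytes per voxel (16-bit samples) is an assumption for the sake of the arithmetic, not a statement about the actual file format.

```python
def voxels_per_cm3(voxel_size_um):
    """Number of cubic voxels of the given edge length (microns) in 1 cm^3."""
    per_edge = 10_000 / voxel_size_um  # 1 cm = 10,000 microns
    return per_edge ** 3

def gigabytes_per_cm3(voxel_size_um, bytes_per_voxel=2):
    """Raw size of 1 cm^3 of scan data; bytes_per_voxel=2 is an assumed 16-bit sample."""
    return voxels_per_cm3(voxel_size_um) * bytes_per_voxel / 1e9

for size_um in (2.4, 0.5):
    print(f"{size_um} micron: {voxels_per_cm3(size_um):.3g} voxels/cm^3, "
          f"{gigabytes_per_cm3(size_um):.0f} GB/cm^3")
```

Because the count grows with the cube of the linear resolution, going from 2.4 micron to 0.5 micron multiplies the voxels per unit volume by (2.4 / 0.5)^3, roughly 110x.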
117. Upvoter33|2d ago|context

Lots of great earlier work pioneered this (I wish the website did a better job showing that?)
e.g., Dr. Brent Seales and his decades of work: https://www.science.org/doi/10.1126/sciadv.1601247
118. verditelabs|2d ago|context

Brent is an advisor on the Vesuvius Challenge. He's listed on our website as such but the work we are doing and specifically that which falls under the Vesuvius Challenge is separate from him (apart from his being an advisor), EduceLab at U of K, and U of K as a whole. The purpose of the scrollprize website is not to showcase the 25 years of research leading up to the Vesuvius Challenge. It's to showcase what the Vesuvius Challenge is doing.
Granted, none of the core team are web developers, so updates to the website are best effort.
119. Upvoter33|2d ago|context

ah cool - thanks for the clarification. some of the comments here read like nothing like this has ever been done before ...
120. thom|2d ago|context

Do we have a sense for what proportion of text is actually retrievable from these scrolls?