As an amateur who's been fascinated by this puzzle himself, I will add some context that might be relevant in assessing the plausibility of this claim:
- The "Libation Formula", which the author used as the base for his translations, is the most studied piece of writing in Linear A, because it's the only recurring phrase (with grammatical variation) that we have. The corpus is extremely fragmentary, with just a handful of instances of longer text (and even then, the texts are the length of an average sentence in English). The majority of documents available to us are lists (of inventory, personnel, offerings or something of this sort). The longer texts make use of punctuation marks, likely put in between words. This gives us a non-trivial vocabulary, which still does not match that of any known language.
- With such fragmentary remaining material, we cannot be sure that a) all the texts we call "Linear A" are written in the same language, and b) the recognizable words are not abbreviations, for example.
- The author made an assumption that Linear A symbols which have counterparts in Linear B should have the same phonetic values. This gives us an already known glyph that represented "NA". "Duplicate" glyphs are only found in the P-series, and are assumed to represent syllables which were distinguished by the Linear A language, but not by Greek - such as aspirated/unaspirated P. There is a glyph that stands for "NWA" in Linear B, but instances of it have been found in Linear A as well.
- There are countless words with no known etymology in Ancient Greek, assumed to originate from a substrate language or languages spoken in the area at the time Greeks migrated to their present-day homeland. The language of Linear A would be a likely candidate for such a substrate. If Linear A were a Semitic language, then we should already be able to establish Semitic etymologies for those words as they were in Greek. Of course it could also be the case that these words came from another language which did not adopt writing or whose writing did not survive to our times.
tamarru 23 hours ago [-]
Ciao. I'm Tom di Mino, and I'm on vacation in Bellingham, Washington right now. I'll get back to you later with a formal response.
I've also reached out to Dr. Ester Salgarella, so I'm familiar with attempts to apply computational analysis to the corpus, and where previous efforts erred.
stratocumulus0 23 hours ago [-]
Always glad to exchange! I'm a software engineer and a hobby linguist only myself, so don't expect wonders from me. But this is a fun topic to research for sure.
I'm not an expert on linguistics, but I will say that Crete at that time was polylingual. No one is saying that everyone on the island spoke this Minoan semitic language; only the semitic people on the island, and it was a diverse population.
YeGoblynQueenne 18 hours ago [-]
The name of Cyprus being of semitic origin is probably easy to hand-wave away as the result of trade.
I'd like to offer some evidence that the people of Crete were of Greek origin and therefore Indo-European rather than semitic, unfortunately all the scholarship I can find on the subject is from Greek scholars and since it confirms that the Minoans are genetically related to modern Greeks, the more I hear of that evidence the less I am convinced by it. Because it's exactly consistent with confirmation bias. So I would not be surprised if the Minoans turned out to be one of the lost tribes of Israel.
Except of course we know those turned up in the Americas so they can't be the Minoans.
The serious bit is that as soon as you make claims about who is from where and connected to what ancient people, you lose. It's impossible to disentangle peoples' nationalism and identity politics from whatever facts. I'm speaking in this as a Greek myself. Did you know that the Greek language is not, actually, an Indo-European language, but predates it by several hundreds of thousands of years, and has influenced every language you can find on every continent, including but not limited to the languages of the pre-Columbian civilisations? True story. Evidence: plenty! Consider https://en.wikipedia.org/wiki/Xochicalco Obviously that is the temple of the Goddess Kali in the country side ("Ο ναός της Θεάς Κάλι στην Εξοχή". Εξοχή-Κάλι-κο, Ξοχικάλκο!). I have actually read that in a book someone handed to me when I was a teenager. I had to put the book down after that.
tl;dr people get really crazy when it comes to their ancient history and lose the ability to think straight and derive sound conclusions from facts.
mcswell 20 hours ago [-]
"...syllables which were distinguished by the Linear A language, but not by Greek - such as aspirated/unaspirated P." Given that aspirated and unaspirated voiceless stops were almost certainly distinguished in spoken Greek of the time (as they were in proto-IndoEuropean and in later classical Greek), why would the Greeks not have carried over such a distinction if it existed in the Linear A language? It seems much more likely that the distinction did not exist in the Linear A language or script, and that's why it didn't show up in Linear B.
stratocumulus0 13 hours ago [-]
True, I did not think of the pi/phi distinction when writing this. There are other quirks exclusive to Linear B that don't occur in Linear A though, such as three different ways to write the phoneme "A" with no clear pattern for when each one is used.
With regard to the origin of the script, Linear A documents have been dated to earlier times than Linear B. And then, there is also an even earlier hieroglyphic script, but its relation to Linear A has not been established.
_alternator_ 24 hours ago [-]
Thanks for the context; how do you think this impacts plausibility? Presumably the fact that he made progress in a well studied passage is cause for skepticism? What's your take?
stratocumulus0 9 hours ago [-]
I'm not saying that this is implausible, but this is one guess in a sea of many. We were shown a single word substitution with a claim that 300 others match, but no documentation of this person's research. We don't know this person's methodology beyond this single sentence. A naive approach would be to collect a thesaurus of Semitic roots and use automation such as an LLM to match those against instances of words. Sounds plausible in the beginning, but words do not exist independently.
To illustrate how things can go wrong, let's try to prove that English is a Semitic language. Suppose that the source material we have is this sentence:
- "Baker" matches "bVkVr-", "first-born son". We have our anchor now. (V means any vowel).
- "Brought" looks plausibly close to "burāṯ-", "juniper" on our list. So far so good.
- "Bushel" is a good match with "b-š-l", "to be cooked"!
- "Wheat" does not have an exact match. It could be a loanword, for example.
- "Mill" looks like "m-r-r", "bitterness" if we assume lack of written L/R distinction. Again, juniper + cook + bitter is plausible, because juniper can be bitter.
- The meaning of particles will be inferred from the sentence structure.
Okay, let's take a look at our translation! We have "first-born son", "juniper", "cook", (wheat), "bitter". Pretty clear that (wheat) must be the name of a dish here. Therefore, the sentence can be translated as "A bitter juniper dish is being cooked for a first-born son". This even matches the context: the sentence was found in a granary, and it refers to food.
My point here is that with such a small sample size, we can extrapolate the data to mean absolutely anything. With no reference material, we cannot assess the correctness of any translation.
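The failure mode in the English example above can be sketched in a few lines of Python. This is only an illustration of a naive matcher, not anyone's actual method: the root list and glosses are the ones quoted in the comment, while the helper names (`skeleton`, `is_subsequence`, `loose_matches`) and the normalization rules (drop vowels, treat "sh" as "š", merge L and R, flatten ṯ to t) are assumptions made up for this sketch.

```python
# Toy root list from the comment above; glosses are the commenter's.
ROOTS = {
    "bkr": "first-born son",
    "brt": "juniper",        # burāṯ-, with ṯ flattened to t (assumption)
    "bšl": "to be cooked",
    "mrr": "bitterness",
}

VOWELS = set("aeiou")


def skeleton(word):
    """Consonant skeleton: 'sh' becomes 'š', L is merged into R, vowels are dropped."""
    w = word.lower().replace("sh", "š").replace("l", "r")
    return "".join(c for c in w if c.isalpha() and c not in VOWELS)


def is_subsequence(root, skel):
    """True if the root's consonants appear in order somewhere inside skel."""
    it = iter(skel)
    return all(c in it for c in root)


def loose_matches(word):
    """Glosses of every root that loosely 'matches' the word."""
    skel = skeleton(word)
    return [
        gloss
        for root, gloss in ROOTS.items()
        if is_subsequence(root.replace("l", "r"), skel)
    ]


for w in ["baker", "brought", "bushel", "wheat", "mill"]:
    print(w, "->", loose_matches(w) or "no match (loanword?)")
```

Every relaxation of the matching rule makes more words "match", which is exactly why a handful of hits on a tiny sample says very little.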
yorwba 23 hours ago [-]
Well, the reasoning in the article is that if you take A-TA-I-*301-WA-JA, keep only W-J and assume *301 starts with N, then you get a claimed Semitic root N-W-Y related to dwelling, except I wonder whether that shouldn't be N-W-H instead https://en.wiktionary.org/wiki/%D7%A0%D7%95%D7%95%D7%94 (Semitic isn't my area) so at best one fifth of one word matches two thirds of another therefore iT mUsT bE sEmItIc. A serious attempt at decipherment should at least try to explain the A-TA-I, or any of the other words in the sentence, for that matter.
tempaccountabcd 22 hours ago [-]
But Greek did distinguish aspirated and unaspirated P, just not in Linear B.
Kosturdistan 1 days ago [-]
A lot of loonies make this claim, but Tom's work is credible enough that it's being reviewed by linguistics experts at Rutgers and Cambridge. Additional validation: his approach produces results. He's translated over 300 words, and that's never been done before, and his solution actually solves some problems in Linear B. Tom is an AI engineer, and Claude Code was key to his work. Disclosures: I know Tom socially, and I wrote the post at the link.
canjobear 21 hours ago [-]
> Tom's work is credible enough that it's being reviewed by linguistics experts at Rutgers and Cambridge
What does this mean? Like he e-mailed it to some people at Rutgers and Cambridge? Or it's under some kind of non-anonymous peer review?
kubb 1 days ago [-]
Let's wait until it's been verified.
mikestorrent 1 days ago [-]
You're absolutely right! We've opened a ticket with the Linear A folks, hopefully they'll get back to us soon with an update as to whether we've got it correct or not. Hang tight!
kridsdale1 1 days ago [-]
This comment sure is load bearing.
mikestorrent 1 days ago [-]
It's the veritable smoking gun
TeMPOraL 1 days ago [-]
Regardless, we should stand ready, loaded for bear.
saagarjha 1 days ago [-]
A Linear ticket, hopefully
gus_massa 23 hours ago [-]
I agree. The post has too little information. Also
>> reviewed by linguistics experts at Rutgers and Cambridge.
Here in Argentina, around 2005, we had like 5 guys who claimed to have 5 independent solutions of the Goldbach Conjecture. Each one got a PhD student who volunteered to read it, discussed the obvious problems with the author, and tried to help solve them; after a few months of back and forth they concluded that none of the solutions was correct or had an interesting insight. Nobody was surprised about that, but some wanted to give them a try.
Until there is an official report by Rutgers or Cambridge, it doesn't mean too much.
>> He's translated over 300 words
Where is the table of translations?
Kosturdistan 23 hours ago [-]
Skepticism is appropriate until the experts bless the work. I will point out however that all of the words Tom has translated provide strong support for his proposed phonetic values. And that's why I published the information prior to confirmation, along with the appropriate caveats.
evilfred 16 hours ago [-]
he has apparently translated one word according to the only documentation we have. "my untrained amateur friend with no experience in the field has solved a hundreds of years old puzzle. no, trust me!" gets old real quick.
baq 21 hours ago [-]
You’re right to push back.
bawolff 1 days ago [-]
How does an expert even verify something like this?
red_admiral 23 hours ago [-]
They verified Linear B against a new tablet that turned up in a dig after the Kober/Ventris* solution had been published. It had pictures of jars with no or one or two handles, and the claimed Linear B for "two handled jar" and such next to the correct picture.
* Ventris' publication, but given Kober's contribution to the work they should really share equal credit. I like to think Kober would have got there on her own if she had access to the larger corpus that Ventris had (the Pylos tablets) and a comparable amount of free time and money available.
canjobear 21 hours ago [-]
You can evaluate the logic for the decipherment step by step and make sure all the claims are justified. But the best test is to try the proposed decipherment against some new text and see if it makes sense. In the case of Linear A and the other remaining undeciphered scripts, there's not a lot of held-out text to test against, so it's tricky.
Kosturdistan 23 hours ago [-]
You look at the proposed sound values and compare it to other known languages. Languages from the same family share grammar and vocabulary.
I mean, it's not like anyone could objectively go back in time and ask ancient civilizations what they meant, but presumably the verification heuristics we currently have, pragmatic success, and expert solidarity together mean that it is "verified".
yorwba 1 days ago [-]
Then why is there no link to the actual write-up?
Kosturdistan 1 days ago [-]
The only write-up at the moment is my blog post, hopefully that changes in the coming weeks.
Sniffnoy 24 hours ago [-]
The blog post mentions a draft of a manuscript though. I was expecting something like a preprint. He's not willing to post that draft yet?
Kosturdistan 23 hours ago [-]
I have seen and read his draft article, he's not comfortable sharing it publicly yet since it's being reviewed by experts.
ahknight 23 hours ago [-]
But the Internet nerds wish to blindly judge something they know nothing about so they can feel better with the assumption that they could have done better somehow. How will they be appeased if the document they will say they have read and understood (without having done either) is not available to point at? How, I ask?!
gus_massa 17 hours ago [-]
After reading too many posts on HN I reached three conclusions:
1) Many preprints are bad, incredibly bad. I read a lot of posts about ivermectin during 2020 and the errors were obvious: no control groups, a control group that was a bunch of unrelated guys in another city, and weird articles that split the 20+20 cases into 10 bins with 2+2 cases in each. They had a lot of errors that were easy to spot without being a medical doctor. (Ctrl+F exclusions, you may get a surprise.) (And don't get me started on chlorine dioxide.)
2) Perpetual motion machines and massless drives reappear every few years. I definitely can read most of them. The most interesting part is the totally broken explanation of why this new version does not break the laws of physics.
3) HN has a lot of users specialized in niche topics. A few weeks ago I wrote a comment with a joke: "the list of text transformations to allow a Spanish speaker to read German fits on a napkin" (for example v->f and w->v and a few more). Someone was surprised because s/he knows that German has more phonemes than English, which has more phonemes than Spanish. There is someone wandering around here who really knows about phonetics.
So, I want to see a preprint. Perhaps I can read it, perhaps someone else can read it, perhaps we have to wait a few days until someone writes a nice blog post and debunks it, perhaps it's correct.
qustio 20 hours ago [-]
It's entirely reasonable to ask for the underlying research in response to a blog post hyping up an unproven claim in an area notoriously full of amateurs making the same kind of claim, which historically fails to stand up to scrutiny.
Particularly when the only source is a friend of the author, posting on a blog named "AI Clambake" about "A weekly, human-powered newsletter for advertising folks who want to stay on top of the AI mayhem" and not a publication with any credibility in linguistics.
None of that means it can't be true, but some basic skepticism is warranted here. Otherwise we end up in a situation like the LK99 room temperature superconductor where a lot of HN commenters were also upset at the cynical "downers" who just couldn't root for a good thing/progress.
evilfred 16 hours ago [-]
exactly, thank you
GavinMcG 1 days ago [-]
Presumably because it hasn’t yet been published?
m0llusk 1 days ago [-]
It seems this is still extremely early in the process. There is an apparent finding that was shared. Evidence which would be the basis for a paper is "being reviewed by linguistics experts at Rutgers and Cambridge". So they are trying to do the right thing by talking about what they believe they have done but holding off publication and serious claims until later. The general idea that written forms can be categorized by systems built with Claude could be applied to other as yet undecipherable languages could be used by other interested investigators just with what is discussed here.
sillysaurusx 1 days ago [-]
> The general idea that written forms can be categorized by systems built with Claude could be applied to other as yet undecipherable languages could be used by other interested investigators just with what is discussed here.
Could you rephrase this or explain it more thoroughly? I don’t follow. What does it mean to categorize a written form by systems built with Claude?
tyingq 1 days ago [-]
The same pattern/tech is generic enough that it might be able to solve other unrelated, and so-far undecipherable, written languages.
kelseyfrog 1 days ago [-]
You can use Claude, like the author, to reproduce the result.
_verandaguy 1 days ago [-]
This isn't really a reasonable approach, is it?
The original prompts aren't provided, nor is the original context; even then, you can't really treat a stochastic system like an LLM as a major component in reproducibility.
cake-rusk 14 minutes ago [-]
Isn't this like the P vs NP problems? Once you have a solution it is easy to verify?
Kosturdistan 1 days ago [-]
Claude code was used to organize the material and to run simulations. The simulations were to determine the likelihood that the text was Semitic vs Tom got lucky. Tom has assigned probabilities to each of the syllables he has proposed sound values for.
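The actual simulations have not been published, so the following is only a guess at what a "Semitic vs. got lucky" check could look like: a plain permutation test. It scores a proposed sign-to-syllable mapping by how many words land in a lexicon, then asks how often a random reshuffling of the same sound values scores at least as well. Everything here is hypothetical: the sign IDs, the syllables, the lexicon, and the function names are invented for illustration and are not Linear A data.

```python
import random


def score(words, mapping, lexicon):
    """Number of words whose transliteration appears in the lexicon."""
    return sum("".join(mapping[s] for s in w) in lexicon for w in words)


def permutation_p_value(words, mapping, lexicon, trials=2000, seed=0):
    """Share of shuffled mappings that score at least as well as the proposal."""
    rng = random.Random(seed)
    signs = list(mapping)
    values = [mapping[s] for s in signs]
    observed = score(words, mapping, lexicon)
    hits = 0
    for _ in range(trials):
        shuffled = values[:]
        rng.shuffle(shuffled)
        if score(words, dict(zip(signs, shuffled)), lexicon) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (hits + 1) / (trials + 1)


# Made-up toy data: four "signs", four "words", a three-entry "lexicon".
toy_mapping = {"s1": "ka", "s2": "ro", "s3": "na", "s4": "ti"}
toy_words = [("s1", "s2"), ("s2", "s3"), ("s3", "s1"), ("s4", "s2")]
toy_lexicon = {"karo", "rona", "naka"}

observed, p = permutation_p_value(toy_words, toy_mapping, toy_lexicon)
print(f"observed hits: {observed}, estimated p-value: {p:.3f}")
```

With a corpus this small the null distribution is coarse, so a test like this can only rule out the most obvious cases of luck.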
peterfirefly 21 hours ago [-]
I think I caught this guy's reddit posts on the subject. Someone was playing around with statistical analyses of a big Linear A corpus + some other corpora. There was an extremely clear signal that Linear A seemed to be much more similar to one other corpus than to the others. This was the first time I've ever heard of something that might* have been a good hint for decipherment. There's a Dutch professor emeritus (in linguistics) who claims it is Hurrian-Urartian and he's been posting youtube videos about his "decipherment" but he didn't seem too convincing to me.
Claude helped write code to read and parse the corpora and to do some fairly basic statistical analysis along the lines of "which Linear A symbols most often occur together" and "if we use known Linear B sound values, which of the other corpora most often have vowel similarities with the Linear A corpus".
You can write that code yourself or you can ask an LLM to write it for you. The provenience of the code isn't important.
*) He later deleted some of them, I think. What was still there on reddit a few weeks ago had dead links to a web site of his with statistical tables and I believe also code.
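For a sense of how basic the "which symbols most often occur together" analysis described above can be, here is a minimal sketch that counts adjacent sign pairs. It assumes words are given as hyphen-separated transliterations; the function name `sign_bigrams` is made up, and the sample list simply reuses two sequences quoted elsewhere in this thread (with one repeated on purpose), not a real corpus extract.

```python
from collections import Counter


def sign_bigrams(words):
    """Count how often each pair of adjacent signs occurs across all words."""
    counts = Counter()
    for word in words:
        signs = word.split("-")
        counts.update(zip(signs, signs[1:]))
    return counts


# Illustrative input only; the repetition of KU-RO is made up.
sample = ["A-TA-I-*301-WA-JA", "KU-RO", "KU-RO"]

counts = sign_bigrams(sample)
for pair, n in counts.most_common(3):
    print(pair, n)
```

The same counter can be run over any other transliterated corpus, which is all that is needed to compare co-occurrence profiles between corpora.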
petjuh 6 hours ago [-]
Ok, but Hurro-Urartian is an agglutinative language family that is an isolate and has nothing to do with Semitic.
ben_w 1 days ago [-]
> even then, you can't really treat a stochastic system like an LLM as a major component in reproducibility.
If you had the other things, being "stochastic" is not even remotely a show-stopper. Stochastic processes abound and are the reason the mathematics of statistics was developed in the first place, ultimately allowing us to create such things as LLMs.
When all the relevant steps get published, I absolutely expect a lot of people to (attempt to) reproduce this work even though LLMs are stochastic.
_verandaguy 1 days ago [-]
My issue with this is that it's a form of "soft" reproducibility, where it'll work for many (maybe even most!) people, but that depends on the way the original prompt was formulated (read on) and the state of the random noise in the system.
On the prompt formulation: prompts with very similar formulations (in terms of semantics, Hamming distance, or both) can lead to _wildly divergent_ outputs in my experience. It's not rigorous, and when that divergence happens, it's extremely difficult (arguably impossible, by nature of the architecture of transformers) to identify why the divergence happened and where.
fragmede 1 days ago [-]
Sure it is. We're humans, not robots (well, I think I am, and I presume you are as well, but for all we know, we could be living in a simulation), so if the non-deterministic system decides to generate code that calls the variable foo one day and bar the next, as long as the code still does what's being asked of it, why do I care that the non deterministic system chose to call the variable something different when run on Tuesday? There's the computer science definition of determinism and the engineering result of "does it work", which are at odds. It's like the halting problem. We haven't solved the computer science definition of the halting problem, but give some C code with a loop that won't terminate to Claude, and it'll call that out as not halting.
_verandaguy 1 days ago [-]
All things aside, I think this misses the forest for the trees on the halting problem.
It's not about being able to throw claude or codex at a loop and having it evaluate it for halting, it's about being able to do this for arbitrary code. Computer science rigorously defines the halting problem as not computable and undecidable within the framework of using something akin to static analysis using any deterministic Turing machine.
There's not really a question of "solving" the halting problem like there's some as-yet unknown way of generally figuring out if arbitrary code halts. Turing proposed a proof in 1937 in favour of undecidability of what we now know as the halting problem, building on ideas first articulated by Church a few years prior.
Frankly, if anything, it's reasonable to say that the halting problem's been solved, just in the direction of undecidability rather than decidability.
Anyway, back to LLMs; as code gets more complex, the robot will need a bigger context window, more hardware resources, and more time, all of which will be variable due to the noise inherent in the system. It'll be difficult to put a useful upper and lower bound on how much computing power and time it'll take to figure out if a program ever halts. Which is all a bit moot, frankly, in the context of halting, but useful to keep in mind in the more general context of using these things as analysis tools.
iwontberude 1 days ago [-]
Actually it is because Claude did the work and being a lay person isn’t really that high of a bar.
Kosturdistan 1 days ago [-]
Claude helped, but did not do the work. This was a human dude who had a very helpful assist from Claude
TeMPOraL 1 days ago [-]
> stochastic system
Every day when you lower your butt onto your chair, you trust a stochastic system enough to assume you'll rest on the chair safely and not spontaneously phase through, which would lead to a rather gory and painful terminal experience.
Physics at macro scale is stochastic, which is a good reminder that stochastic != uniformly random. Expected distributions matter.
ben_w 23 hours ago [-]
While strictly true, QM has such small standard deviations as to be irrelevant on the macro for things like bums and chairs.
IMO a better example would be the stochastic nature of quality control in manufacturing.
TeMPOraL 23 hours ago [-]
> QM has such small standard deviations as to be irrelevant on the macro for things like bums and chairs
I was going to segue into thermodynamics as a backup example, but you made me think of something better.
> IMO a better example would be the stochastic nature of quality control in manufacturing.
How about, more specifically, food manufacturing? Or maybe, let's talk about cooking?
Cooking is as stochastic as it gets, and we handle it fine. It could be better - the better version is called "chemical process engineering", it's what cooking looks like when you care about quality and consistency of output, and can afford the equipment and process actually necessary for it. Regular people don't (i.e. neither care, nor can afford) - we call this cooking. It's an art, not a science, and people not only do it, but love it, and tie their identities to it, and build businesses around it, and a culture that embraces all the compromises (and calls the more serious approach "unhealthy").
ben_w 22 hours ago [-]
> Cooking is as stochastic as it gets, and we handle it fine. It could be better
My attempts at making bread have been too stochastic, in that it hardly ever produces nice results.
But yes. Eyeballing how much dried herbs to put in my dishes because I like what 2-isopropyl-5-methylphenol does for them. Usually it works, sometimes it's just a bit too Italian.
TeMPOraL 22 hours ago [-]
Might not be the amount, you may have not controlled for humidity or temperature (wink wink), or just that the timer on your oven is off by one minute per every ten minutes, and its bang-bang thermostat never actually reaches the temperature you set on the panel, and...
... in some sense, it's a miracle most people deal with this kind of bullshit without complaining much.
(Probably because they don't realize it's something to complain about. It's just how things are.)
tadfisher 21 hours ago [-]
There are too many value judgments in this post. You can "cook" like "regular people" do, and be completely serious, and apply chemical and physical knowledge in doing so, and test the output for quality; generally that's what restaurant chefs do. It doesn't make sense to cook like you're tooling an assembly line, because you aren't cost-optimizing and packaging a product that needs to sit on a store shelf for weeks, months, or years while maintaining its desired qualities.
Speaking generally, food produced through "chemical process engineering" (a.k.a. factories) must compromise on many axes, one of them being nutritional content. We intuitively do not care about several of these dimensions when cooking food with fresh ingredients, at least not at the scale of, say, Kellogg's or General Mills.
Maybe that's evidence of accepting a stochastic process in our daily lives, but you're kind of selling the tradition and science of cooking short when you argue that factory-produced food is a "more serious approach".
atrus 1 days ago [-]
somehow I suspect it was a bit more involved than: Claude, please solve Linear A.
fragmede 1 days ago [-]
A little bit more. If you ask ChatGPT to "solve linear a" it thinks you mean linear algebra. If you specify that it's the Minoan translation problem, you get a table similar to the one that we get a glimpse of in the article. Without access to the paper, we can't say how much more work the paper has than my gist.
The 'major insight' described in the article predates Fable's release by two weeks and four days. It would be a complicated timeline.
grey-area 1 days ago [-]
Amazing work and refreshing to see a well written and cogent post to summarise it. Would love to hear more about how he used Claude to help solve the puzzle.
dwroberts 1 days ago [-]
You know him socially but is there a reason you’re writing this rather than him? It looks like he has his own web presence.
Cynical read would be you’re stealing his thunder a bit by prematurely announcing this before it’s fully confirmed
Kosturdistan 1 days ago [-]
Tom knows I'm a freelance writer and decided to give me the scoop. He's more interested in linguistics than he is in journalism.
jstanley 1 days ago [-]
Promoting your friends' work is hardly stealing their thunder. It's increasing their thunder!
Conscat 1 days ago [-]
Isn't it customary for the author of a post shared on HN to leave a comment on the thread?
dwroberts 1 days ago [-]
I’m not referring to the parent comment: The post is not written by the author of the claimed breakthrough.
iwontberude 1 days ago [-]
What thunder? Claude did the work and used a human to interface with experience and causality better.
Kosturdistan 1 days ago [-]
Claude helped, it did not do the work. It would have taken Tom more time to crack on his own, and it would have been harder, but the key insights were Tom's not Claude's.
ben_w 1 days ago [-]
The thunder is as per the headline. Assuming it passes review.
One of the things I find weird with AI is how the dismissals of work that involves AI split into two camps: like yours, saying the AI did the work while the human played no role and deserves no credit; and those saying the AI rips off its training data while the human using it played no role and deserves no credit.
iwontberude 1 days ago [-]
I exist in both camps. Claude can’t launder human achievement into a different person. Claude stole it, but it’s still in Claude’s possession and is not transferable in any durable sense.
ben_w 1 days ago [-]
> Claude stole it, but it’s still in Claude’s possession and is not transferable in any durable sense.
No human, individually or as a team, has been able to solve this to date.
To the extent this was Claude solving it itself and thus denying Di Mino any thunder, there was nobody to have stolen anything from. To the extent he has thunder to be stolen, it wasn't ever in Claude's possession.
iwontberude 8 hours ago [-]
Actually it’s the flattening of achievement to be some guy, Claude or a combination of both when really it was neither.
ben_w 2 hours ago [-]
"Flattening of achievement" is true for the overwhelming majority of people named and revered for their genius. Shakespeare didn't invent many of his plot lines, Einstein used Lorentz' transform (amongst other things) etc.
As reported (I have no skin in this), Di Mino appears to have used Claude to write a tool to perform statistical analysis to test an idea he had, and in such cases as this it seems to me fairer to praise the human using the machine than to praise a director for the films acted, filmed, written, and edited by others.
Either all information is stolen, or none is. Can't have it both ways.
blackqueeriroh 21 hours ago [-]
I’m confused; do you actually believe a large language model named “Claude” went out and hoovered up all the information about this Minoan problem?
If not, how did “Claude” steal anything?
simonw 23 hours ago [-]
> Di Mino used Claude Code to build a suite of Python scripts that query, cross-reference, and organize the digitized Linear A corpus (drawn from the GORILA and SigLA databases), enabling systematic hypothesis testing at a scale that would have been impractical to do manually.
That's exactly the kind of thing I'd hope Claude would be used for in these kinds of projects - building tools, not black-box "solving" the problem.
xeonmc 22 hours ago [-]
If it had been a proper developer he would've been nerd-sniped into yak-shaving those tools and never gotten the original work done.
topaz0 21 hours ago [-]
If you look at his GitHub it does seem like he's obsessed with the tools
neonstatic 22 hours ago [-]
you don't know that
Tuna-Fish 1 days ago [-]
The reason linear A is so difficult is that the total remaining corpus of Linear A text is ~7500 characters, spread out over ~1500 inscriptions.
If you have a 4k screen, you can fit all remaining Linear A text on your screen at once, in 14pt high font.
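The screen claim above is easy to sanity-check with back-of-the-envelope arithmetic. The sketch below assumes a 3840x2160 display, the usual 96 DPI convention (so 1pt = 96/72 px), and square glyph cells with no spacing; all of those are assumptions, as is the helper name `glyph_capacity`.

```python
import math


def glyph_capacity(width_px, height_px, font_pt, dpi=96):
    """Number of square glyph cells of the given point size that fit on screen."""
    cell = math.ceil(font_pt * dpi / 72)  # 14pt at 96 DPI rounds up to 19 px
    return (width_px // cell) * (height_px // cell)


capacity = glyph_capacity(3840, 2160, 14)
print(capacity)          # 22826 cells under these assumptions
print(capacity >= 7500)  # True: roughly three times the whole corpus
```

Under these assumptions there is room to spare, so the claim holds even with generous spacing between signs.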
stratocumulus0 1 days ago [-]
And in addition to that, a vast majority of documents are lists which consist of a "header" (1 to 3 words) and word-number pairs afterwards. Another common class is small clay seals with 1 or 2 characters carved into them. It's likely that in both cases, we may be dealing with abbreviations.
Some of the lists end with "ku-ro" and a number that's the sum of all the previous numbers, oddly frequently off by one.
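Checking a "ku-ro" line against the entries above it is a one-liner once the numerals are transcribed. A minimal sketch, with made-up numbers rather than a real tablet and a hypothetical helper name `check_total`:

```python
def check_total(entries, stated_total):
    """Return the computed sum and how far the stated total is from it."""
    computed = sum(entries)
    return computed, stated_total - computed


# Made-up list: three entries and a stated total that is off by one.
computed, diff = check_total([5, 12, 3], 21)
print(f"computed {computed}, stated 21, difference {diff:+d}")
```

Running this over every list that ends in "ku-ro" would show how often the difference really is exactly one, and in which direction.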
humodz 1 days ago [-]
It would be amusing if archaeologists in the future also end up spending countless hours trying to decipher my shopping lists and poor math skills
neonstatic 21 hours ago [-]
Imagine if the first archeological discovery they made was tax forms from different countries. What would they think of us, haha.
_kst_ 1 days ago [-]
They hadn't yet decided whether to count from 0 or from 1.
cwmma 1 days ago [-]
Surprisingly this comes up more than you'd think. For instance, in Ancient Rome tomorrow is two days away, so all the dates are off by one from what you'd expect. They mainly count down, and it goes: 5, 4, 3, day before, day.
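The difference between the two counting conventions is just a fencepost error, which a short sketch makes concrete. Modern (proleptic Gregorian) dates are used here purely for the arithmetic, and the function names are made up for illustration.

```python
from datetime import date


def modern_days_until(day, target):
    """Exclusive count: tomorrow is 1 day away."""
    return (target - day).days


def roman_days_until(day, target):
    """Inclusive count: both endpoints are counted, so tomorrow is 2 days away."""
    return (target - day).days + 1


ides = date(2024, 3, 15)
for d in (11, 12, 13, 14, 15):
    today = date(2024, 3, d)
    print(d, modern_days_until(today, ides), roman_days_until(today, ides))
```

The inclusive column runs 5, 4, 3, 2, 1, with the last two being the "day before" and the day itself in the countdown described above.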
codebaobab 23 hours ago [-]
I noticed that when I read Tom Holland's new translation of The Lives of the Caesars. All the dates were in the form "N days before Kalends/Ides".
mcswell 20 hours ago [-]
I think it comes up in the Gospels, too, e.g. "on the third day" after the resurrection.
kps 24 hours ago [-]
“Should array indices start at 0 or 1? My compromise of 0.5 was rejected without, I thought, proper consideration.” — Stan Kelly-Bootle (first person to obtain a postgraduate degree in computer science)
red_admiral 23 hours ago [-]
ku-ro obviously means "carry in" :)
vidarh 23 hours ago [-]
My French teacher told me a story of a Norwegian man who married a French woman. A few months after she'd moved to Norway, my French teacher had come to visit them.
When she was leaving, the woman said "pose, pose". My French teacher was puzzled, and asked why she'd said that, and the woman asked if it didn't mean "au revoir" in Norwegian?
Because it was what the cashier at the grocery store said to her every time.
It means (carrier) bag.
YeGoblynQueenne 20 hours ago [-]
That's one of the reasons. Another, and more important one, is that we don't know the language that the script transcribes. The claim above is that it's Hebrew.
I have no idea why Minoans would speak Hebrew, there's no indication as far as I'm aware of extensive cultural exchange between the Minoan civ and Hebrew-speaking people, but there's a very clear hierarchy of difficulty to translate dead scripts. From easier to harder:
a) We know what language the script transcribes and how the script transcribes it (e.g. what symbol means what word or sound).
b1) We don't know what language the script transcribes but we know how the script transcribes it (e.g. it's a syllabary or an abjad etc).
b2) We know what language the script transcribes but we don't know how the script transcribes it (e.g. Egyptian hieroglyphics).
c) We don't know what language the script transcribes nor do we know how it transcribes it.
b1) and b2) are more or less of similar difficulty.
Linear A belongs in category c) above. We know next to nothing about the script or the language, other than the fact the former was reused in Linear B to transcribe Mycenaean Greek.
Tuna-Fish 18 hours ago [-]
Semitic, not Hebrew. Hebrew is one language in the Semitic group, alongside Arabic, Amharic and many more. They were much more spread out in the west before the Iron Age, with most people in Asia Minor belonging to the group. Some of the earliest states used the languages, and they spread alongside the idea of states.
YeGoblynQueenne 10 hours ago [-]
You're right of course but the article above is unclear about whether the claim is that Linear A transcribes Hebrew or a different Semitic language.
First it says that the language of Linear A is a Semitic language that is a precursor to biblical Hebrew:
>> Di Mino believes that Linear A belongs to an extinct Semitic language that was a precursor to biblical Hebrew, the way that Latin is a precursor to Italian.
But then it compares the Linear A language directly to Hebrew. For example:
>> Once deciphered, Di Mino saw that the prayer was similar to subsequent Hebrew prayers but was addressed to a Goddess.
So maybe I'm confused. The claim that the Minoans spoke a Semitic language sounds less odd than claiming they spoke straight-up biblical Hebrew.
dehrmann 1 days ago [-]
Very vaguely, it makes it like a one-time pad where it can be anything you want it to be. Not quite, but so little text leaves a lot of options open.
AaronAPU 1 days ago [-]
I wonder, is there a form of analysis which lets you quantify how ambiguous a set of symbols is? Maybe related to entropy?
Obviously one symbol can mean literally anything, but you could also have very long strings of symbols with many different meanings.
red_admiral 23 hours ago [-]
Yes. Somewhere in Claude Shannon's work, called the "unicity distance".
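For a rough feel of the idea: unicity distance is usually written U = H(K) / D, where H(K) is the entropy of the key in bits and D is the redundancy of the plaintext language per character. The sketch below works that out for a simple substitution cipher over English. The 1.5 bits/character entropy rate is an assumed ballpark figure, and the whole calculation presupposes a known plaintext language, which is exactly what Linear A lacks, so treat it as an analogy only.

```python
import math


def unicity_distance(key_entropy_bits: float,
                     alphabet_size: int,
                     entropy_rate_bits: float) -> float:
    """U = H(K) / D, with D = log2(alphabet size) - entropy rate per character."""
    redundancy = math.log2(alphabet_size) - entropy_rate_bits
    return key_entropy_bits / redundancy


# Simple substitution over 26 letters: the key is one of 26! permutations.
key_bits = math.log2(math.factorial(26))

# Assumption: English carries about 1.5 bits of information per letter.
u = unicity_distance(key_bits, alphabet_size=26, entropy_rate_bits=1.5)
print(f"key entropy: {key_bits:.1f} bits, unicity distance: {u:.1f} characters")
```

Under these assumptions the answer lands at roughly 28 characters. Below that length, more than one key is expected to give a plausible-looking plaintext.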
WithinReason 1 days ago [-]
As observed by archaeologist John Younger, the entire Linear A corpus takes up only 1.84 pages of letter paper when typeset in 12 point font and 1-inch margins.
elbasti 24 hours ago [-]
I would love to have this image available!
tclancy 24 hours ago [-]
I’d send it to you but you probably wouldn’t understand it.
stringfood 1 days ago [-]
when I first read the title thought he was talking about linear algebra and I was like damn it's not that hard
petjuh 21 hours ago [-]
From what I know, the main issue is that the Linear A script corpus is rather small. Another commenter here said it's only 7500 symbols in total, spread around 1500 inscriptions (so on average 5 symbols per inscription).
The other thing I find odd, however, is that it's found to be a Semitic language. If it's a Semitic language, I would have expected it to already have been deciphered. And certainly linguists would have already looked at Semitic languages, and looked hard.
Also if it were a Semitic language, why wasn't it consonantal but had vowels? Usually Semitic languages (and Egyptian maybe) write only the consonants because their stems are made of three consonants and vowels are interwoven to make words.
Example Semitic root K-T-B and how vowels are added in-between to form words:
kataba – He wrote
yaktubu – He writes / is writing
kitāb – A book
kutub – Books
kātib – A writer / scribe / clerk
maktūb – Written / fate
maktab – An office / desk
maktabah – A library / bookstore
And another such root - D-R-S which means "studying" or "learning."
darasa – He studied
yadrusu – He studies / is studying
dirāsah – A study / school course
dāris – A student / learner
madrūs – Studied / carefully planned
madrasah – A school
This system of triliteral roots is the reason why usually Semitic languages don't use vowels. Why would Linear A have a consonant+vowel syllabary if it were Semitic?
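The root-and-pattern mechanism in the examples above can be sketched in a few lines. The slot notation (digits 1, 2, 3 standing for the three root consonants) is an ad hoc convention made up for this illustration, and the patterns are simply read off the word lists above.

```python
def apply_pattern(root: str, pattern: str) -> str:
    """Fill slots 1, 2, 3 of a vowel pattern with the root's three consonants."""
    c1, c2, c3 = root
    return pattern.replace("1", c1).replace("2", c2).replace("3", c3)


VOWELS = set("aiuāīū")


def skeleton(word: str) -> str:
    """What a consonant-only spelling keeps: the word with its vowels removed."""
    return "".join(ch for ch in word if ch not in VOWELS)


patterns = ["1a2a3a", "ya12u3u", "1i2ā3", "1u2u3", "1ā2i3", "ma12ū3", "ma12a3"]

for root in ("ktb", "drs"):
    print(root, [apply_pattern(root, p) for p in patterns])

# Several distinct words share one consonant-only spelling.
print({w: skeleton(w) for w in ("kitāb", "kutub", "kātib")})
```

The last line shows the trade-off: a consonant-only spelling keeps the root visible but collapses kitāb, kutub and kātib into the same string, whereas a consonant+vowel syllabary would keep them apart.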
mcswell 20 hours ago [-]
Your post shows exactly why it would be useful to write vowels in Semitic languages: to distinguish among the different tenses, passive/active, nominalizations etc.---in other words, to distinguish the various words that happen to be based on a single root.
loudmax 1 days ago [-]
This is very exciting. Congrats to Tom on the accomplishment.
To be clear, this is an attempt at a decipherment. This is not proven, and we shouldn't consider Linear A to be "solved" until experts in the field have reviewed the work. In fact, it probably shouldn't be considered "proof" unless some more Linear A writings are uncovered and these are congruent with the method proposed. All that can be said for certain at this point is that this is an interesting conjecture.
But this is a story worth following. This could be the real deal. More research and validation should follow and we should have a better idea in the next few weeks or months whether Linear A has really been solved. At the very least, this is an interesting attempt, and optimistically, it could yield real insight into Minoan culture. Kudos.
Kosturdistan 1 days ago [-]
Thanks, I hope Tom is right, but now it's in the hands of the pros.
cwmma 1 days ago [-]
Isn't a big problem with Linear A that there are so few symbols you can "solve" it relatively straightforwardly with no way to tell if it's correct or not?
Kosturdistan 23 hours ago [-]
The lack of discovered inscriptions does make deciphering it harder, but it is possible!
singularity2001 22 hours ago [-]
If it turns out to be true, it would open the door a bit for connecting Indo-European languages with Semitic languages. In the beginning of the last century it was believed that these were related. Later this fell out of vogue. How could they have been so wrong initially? Because both language families were entangled, as now there is genetic evidence that both languages spread from very close to the Caucasus. It's probably old news for most but in the last 15 years it became clear that Europe was completely resettled, once by Anatolians and then partly by Indo-Europeans. The language of the Anatolians is still unknown.
mcswell 20 hours ago [-]
"How could they have been so wrong initially?" For the same reason that many other proposed reconstructions were (and some still are) so wrong. Chance matches, sometimes actual loanwords, and bias on the part of some people.
singularity2001 22 hours ago [-]
Wait, I've seen the same libation formula appearing in the Phaistos disc. For those 10 of you who have the fonts installed:
𐇑 𐇘 𐇪 𐇐 | 𐇬 𐇳 𐇖 𐇗𐇽 | 𐇬 𐇗 𐇜 | 𐇬 𐇼 𐇖𐇽 | 𐇥 𐇬 𐇳 𐇖 𐇗𐇽 | 𐇪 𐇱 𐇦 𐇨 | 𐇖 𐇡 𐇲 | 𐇖 𐇼 𐇖𐇽 | 𐇖 𐇡 𐇲 | 𐇥 𐇬 𐇳 𐇖 𐇗𐇽
i-𐇘-wi-jeʳ | ʰau-ni-ti-noʳ au-no-pa au-ndi-tiʳ 𐇥-au-ni-ti-noʳ wa-pi-naᵐwa ti-ru-te ti-nd-tri ti-na-ru-he ʰau-ni-ti-noʳ
i-301-wa-ja/e | ʰau-... jaᵘ-di-ki-to i-pi-na-ma si-ru-te ta-na-ra te-ti-u ta-na-te i-da
𐘚 ᴴI 𐘮 WA 𐘱 JA 𐘱 JA 𐘆 DI 𐘸 KI 𐘹 TU 𐘚 ᴴI 𐘢 PI 𐘅 NA 𐙁 MA ()
I believe the phonetic values for Phaistos here were based on similarity.
em-bee 16 hours ago [-]
i had the fonts for the phaistos disk but not linear a. i have no idea why i had that one even installed, but now i have both :-) only that last character () is still missing. which font is that?
rich_sasha 23 hours ago [-]
Gotta love the nominative determinism: Tom Di Mino ("of Mino"?) cracks a Minoan language.
mNovak 1 days ago [-]
Interesting writeup. Would be nice to have a couple images of Linear A/B scripts to visualize. Looking on google, they're very daunting!
Blahah 1 days ago [-]
lineara.xyz is your friend
fittingopposite 11 hours ago [-]
How do they know how this language was read?
teleforce 20 hours ago [-]
>Di Mino believes that Linear A belongs to an extinct Semitic language that was a precursor to biblical Hebrew, the way that Latin is a precursor to Italian.
The Indus Valley script is about 1500 years older than Linear A, and I hope we can also decipher it, with AI or without [1]. It's well overdue, although from statistical profiling it has been proven to be a valid linguistic script, believed to have been the writing system for the ancient Harappan language, the likely precursor of modern Dravidian languages such as Telugu and Tamil.
The main reason it's very difficult to decipher is that there's no equivalent of the Rosetta Stone for the Indus script. My hypothesis is that an LLM can be trained or tuned to act as a logical or virtual version of the venerable Rosetta Stone, and hence be used to decipher ancient writing systems.
I wonder how you would even know if you have “cracked” it, given the corpus is so small?
Kosturdistan 23 hours ago [-]
You know you have cracked it because using the proposed system you are able to translate the uncracked language. Also helpful if your proposed system for Linear A makes sense relative to related languages that are not Linear A. Tom's proposed phonetic values work for more than one language.
bazoom42 23 hours ago [-]
Not questioning the particular finding, just wondering in general. E.g. when Linear B or hieroglyphs were cracked, you could check against other untranslated texts and see if the translation still made sense.
evilfred 23 hours ago [-]
i'm gonna write a blog post now about how my buddy discovered cold fusion and will have a paper out real soon now
Can I get his decipher-forgotten-ancient-text skill? I want to try my hand at the Voynich Manuscript
tlogan 22 hours ago [-]
Ok. But where is the table of translations?
WhitneyLand 1 days ago [-]
If confirmed this is really cool and impressive work.
Honestly curious how many years before it can be one shotted in a coding harness with Fable.next by someone who’s not a linguistics expert.
“Develop, test, and rank hypotheses about the phonetic values, morphology, grammar, and possible language family of Linear A using the full available corpus. Do not assume any decipherment is correct. Treat all candidate readings as hypotheses to be scored…”
danishanish 23 hours ago [-]
I don’t imagine a model capable of the first part would require being told not to assume a decipherment is correct
yorwba 22 hours ago [-]
Humans capable of the first part regularly make the mistake of assuming that their decipherment is correct even if it's not internally consistent and fails to adequately explain most of the data.
vb-8448 1 days ago [-]
I wonder if LLMs trained specifically for this purpose can perform well with "forgotten languages".
I know I'm simplifying a lot, but all this deciphering isn't it just some kind of pattern matching?
however, nawaya or whatever examples around it are not part of the Hebrew language.
atdt 22 hours ago [-]
You're incorrect. The letter ו (vav) is pronounced /v/ in modern Hebrew but /w/ in Biblical Hebrew and other ancient Semitic languages, which is why the letter is referred to as Waw and transliterated as 'W'.
Don't know about the situation for this particular example, but keep in mind this type of analysis will necessarily involve extremely archaic dialects of all the involved languages
rw_panic0_0 1 days ago [-]
would like to hear more about Tom's learning/education path in ML/AI.
Kosturdistan 1 days ago [-]
I haven't talked to him extensively about how he learned his engineering skills, but he is I believe 100% self taught. His background is in copywriting.
OutOfHere 1 days ago [-]
Is this extendible to a generalizable approach to translate any language pair (without a translation map or translation dataset)?
retrac 1 days ago [-]
I think it is an open question: can an unknown language be cracked -- without any dictionary or grammar or understanding of the language? Just lots and lots of texts, maybe some of it bilingual.
It's a common misconception that is what happened with Ancient Egyptian with the Rosetta Stone. The Rosetta Stone was just one of the big pieces of the puzzle. The decoding came when people realized that Coptic (a language written alphabetically and still in use in the Coptic Church today) is actually descended from Ancient Egyptian; as Spanish is to Latin, Coptic is to Ancient Egyptian.
Similarly the attempts to decode classical Maya were all dead ends. Until Yuri Knorozov realized that it encoded the ancestor of the Maya languages which are still spoken to this day. (Knorozov's Wikipedia article is worth checking out just for his photo with his cat. [0] IMHO.)
I have written before about the La Mojarra 1 stele in Mexico [1]. It looks a lot like Maya. [2] But it isn't Maya. Maybe the difference like between Russian and Latin writing?
No one can read it. It's undecipherable. There are some attempts to identify it with a proposed ancient language that would have been related to the modern Mixe-Zoque languages: some of the glyphs that are shared with Maya, when read phonetically, start sounding like a Mixe-Zoque language. But no one has proposed a confident decipherment. There probably isn't enough text. La Mojarra 1 is the only long example of the Isthmian script.
Deciphering Akkadian was very difficult, at first. The process started with Persian; old Persian was written in a simplified adapted form of the Mesopotamian cuneiform (wedges on clay). It was a kind of alphabet. And Old Persian was already understood. And there was a bilingual text on a monument carved by Darius I. But even then -- decoding relies so heavily on the fact that Akkadian is a Semitic language distantly related to Hebrew, more distantly, also Ancient Egyptian. So again, we sort of knew what we were looking for.
That is all to say: even if the Voynich manuscript (for example) contains real text in an otherwise completely lost language, I'm not sure it is possible even theoretically to translate it.
Off topic, but that photo is amazing, and got a good laugh out of me. It definitely falls in the "pets and owners who look alike" category.
Kosturdistan 1 days ago [-]
Tom thinks he may be able to use his approach to crack more languages, but that's not confirmed.
sejje 22 hours ago [-]
Are there many uncracked languages out there?
SoftTalker 1 days ago [-]
Towards the Star Trek universal translator.....
WalterBright 22 hours ago [-]
Amateurs! I've already translated it:
"Thag is a smarty-pants"
pfdietz 23 hours ago [-]
Now get to work on Harappan.
YeGoblynQueenne 19 hours ago [-]
The author and their friend are in the thread so I'll try to not be mean.
Caveat: I'm Greek so a kind of natural amateur historian. That is to say I grew up reading about the prehistory and ancient history of Greece, as one does when one is born Greek and a geek. I've seen the Phaistos disk and Linear A inscriptions with mine own eyes in Greek museums and I have dreamed of the day they would be translated. I am not at all unsympathetic to the hopes of a Linear A decipherment.
However. The claimed decipherment has all the hallmarks of imaginative and fanciful attempts to draw parallels between historical events and entities, that were not really connected, many of them notably inspired by the Hebrew bible. For example, remember when the lost tribes of Israel turned out in the New World [1]? Or how Biblical Sodom was actually destroyed by a comet [2]? Or the time that Venus was ejected from Jupiter and caused the Biblical Cataclysm [3]? Or, for less biblical but no less foundational texts of the Western literary canon, remember when Heinrich Schliemann discovered the Jewels of Helen of Troy [4] and the Death Mask of King Agamemnon [5]?
Or of course we could recall any of the claims to decipher the Phaistos Disk [6] or the Dropa Disks from Bayan Kara-Ula [7], and so on I'm sure.
All of the above is not to say that a decipherment is impossible. What it is to say is that it currently isn't possible; because we have no idea what the language that Linear A transcribes even is. It's not like the Minoan language is still spoken today in some far-evolved form, as was the case for e.g. Egyptian or Mayan or indeed ancient Greek [8]. So we have an unknown script, writing an unknown language, and to make matters worse there are no parallel texts with another ancient language that might help us bridge the gap. What there is, is some rudimentary understanding of the more obvious contents of Linear A texts (mostly, lists of goods) and the fact that some Linear A symbols have been reused in Linear B.
But, how were they reused? And what good is that knowledge without knowing anything about the language transcribed by Linear A? I can read German, a language that I don't speak, because I can read Latin script, but the meaning of the script might as well be Greek to me [9].
I'm a computer scientist, I guess, these days [10]. The problem of deciphering Linear A, or the Phaistos Disk, or any other script (that may not even be a script) that transliterates a language that we don't know is a problem of reconstructing information that we don't have, from other information that we don't have. I'm not saying it's completely impossible. I mean, who knows? Maybe we're just missing the right maths. But, what we're really trying to do here is de-noise a message garbled by the passage of time without even a guess as to the language the message is written in. Claude Shannon would tell you that it's a fantasy that is not worth pursuing. You don't have to ask him, you can just read his magnum opus [11] and check out Section 3 titled "The Series of Approximations to English" for an idea of what the mechanics of deciphering a script when the language is known look like with the only technology we have that can do the job reliably.
When Turing and the other Brits at Bletchley Park cracked the Enigma code, they at least knew it was, ultimately, a coded form of German. We may have a lot more compute now, and much more advanced tech overall, but there are some barriers that you cannot physically cross, no matter what resources you have. For example, you can't go faster than light and you can't escape the event horizon of a black hole. In the same way you can't translate text written in an unknown script, encoding an unknown language, without any parallel texts with a known language. There is just not enough information to do the job. Worse, if you try, you can endlessly come up with plausible "translations" and convince yourself that you have the right one, but you have no way to know you do.
I'm sorry but this claim is just a wild guess trying to link Hebrew to Linear A, without any serious evidence that the two are linked and without any evidence that the link is real, other than "look, I can guess what all the texts say!".
[8] I can read ancient Greek. The further back in time it goes, the harder it gets to understand what it means but I can still read the script. It has changed in 5000 years but not enough that I can't read it. Nothing like that ability survived for Linear A. I blame Thera.
[9] Except of course then I would understand it. But it's just German to me.
[10] I can assure you that took me by surprise, first of all.
Sorry but I don’t recognize this as being an achievement by an amateur. This dude had no chance in hell until we trained a model to use his time to suss it out.
jonahx 1 days ago [-]
Assuming this pans out, every other professional linguist in the world has had the option to use Claude or other LLMs, but has not solved this problem, despite the incentives for doing so. It stands to reason the human is adding crucial value.
Kosturdistan 1 days ago [-]
I drilled down on this with Tom. He thinks that it might not have happened without Claude Code, but Claude was used to organize all of the symbols, and to run I think it was 100,000 simulations to assess whether or not he had an actual insight, or if he just randomly got lucky. Claude did NOT crack the code. Significant supporting role though.
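Nothing public describes what those simulations actually were, so the following is only a guess at the general shape of a "did I just get lucky?" check: a shuffle (permutation) test. It counts how many corpus words read as lexicon entries under a proposed sign-to-sound mapping, then sees how often randomly shuffled mappings do at least as well. All signs, words and lexicon entries below are invented toy data, and the function names are hypothetical.

```python
import random


def count_hits(words, mapping, lexicon):
    """Number of sign sequences that read as a lexicon entry under the mapping."""
    return sum("".join(mapping[sign] for sign in word) in lexicon for word in words)


def shuffle_test(words, mapping, lexicon, trials=10_000, seed=0):
    """Estimate how often a random reassignment of sound values scores as well."""
    rng = random.Random(seed)
    observed = count_hits(words, mapping, lexicon)
    signs = list(mapping)
    values = [mapping[s] for s in signs]
    as_good = 0
    for _ in range(trials):
        rng.shuffle(values)  # random sign-to-sound assignment
        if count_hits(words, dict(zip(signs, values)), lexicon) >= observed:
            as_good += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (as_good + 1) / (trials + 1)


# Toy data, invented for illustration only.
mapping = {"S1": "ka", "S2": "ta", "S3": "ba", "S4": "ma", "S5": "ra"}
words = [("S1", "S2", "S3"), ("S4", "S1", "S2"), ("S5", "S3")]
lexicon = {"kataba", "makata", "raba", "tabaka"}

observed, p = shuffle_test(words, mapping, lexicon)
print(observed, p)
```

A small p-value here only rules out one particular null model. If the lexicon is large, or the matching rules were tuned while looking at the same words, a low number from a test like this would not mean much.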
BretonForearm 24 hours ago [-]
So Claude Code was used to generate software that ran simulations? I don't think LLMs in and of themselves can execute simulations, esp. a specific, non-single digit count like 100k.
Kosturdistan 23 hours ago [-]
I don't know if the agent ran the simulations or if the agent built software that ran the simulations. But Claude was used to run the simulations.
slopinthebag 19 hours ago [-]
I feel like that is extremely relevant information. An LLM running "simulations" is nonsense, using an agent (apparently Claude Code) to write simulation software is more realistic. I'm sure you could clarify this since you have a relationship with the person.
Curious to see where this goes. Hopefully an update is posted here when it's all said and done :)
fooster 1 days ago [-]
A lot of the comments in this thread are disappointing. Rather than celebrating an achievement (whether or not it is validated yet), many of you seem to want to put him down, or make it seem like Claude did all the work.
Claiming that Claude did all the work is patently ridiculous. Claude is a tool, like any other. The corpus of Linear A is ~7500 characters across ~1500 inscriptions and Claude, no matter how smart, doesn't just solve that on its own.
What a shame.
evilfred 23 hours ago [-]
this isn't an achievement, it's yet another amateur crank claiming he solved a famous puzzle, without a paper and without any critical review. many people have claimed to decode Linear A before. just because this guy used an LLM doesn't make it more credible
Kosturdistan 23 hours ago [-]
He has a working draft of a manuscript that may form the basis of a scholarly article, it has been shared with experts, and there is an excerpt of the paper in my blog post. I have also seen and read the paper with my own 2 eyes, I can't publish it though, Tom wants to keep that under wraps while it's reviewed by linguistics experts.
BigTTYGothGF 22 hours ago [-]
> He has a working draft of a manuscript that may form the basis of a scholarly article, it has been shared with experts
So did many of the previous attempted solvers.
Kosturdistan 22 hours ago [-]
One way to assess the validity of prior claims is to see how many words you can translate using their proposed system.
qustio 20 hours ago [-]
But the entire corpus for Linear A is tiny, you could backfill a "translation" that has no actual similarity with the real language when properly tested against novel examples. How was this tested?
evilfred 21 hours ago [-]
of course we have no way to assess this claim as there is no public software or paper to review
evilfred 23 hours ago [-]
the info provided is completely unverifiable.
fooster 21 hours ago [-]
"amateur crank claiming"
I don't know why you want to stoop to name calling which violates the guidelines and the spirit of this site.
"without any critical review" is also seemingly untrue: the post says Rutgers and Cambridge are reviewing it
YeGoblynQueenne 19 hours ago [-]
The post says so. What do Rutgers and Cambridge say though?
My following searches turn up no announcements by either Rutgers or Cambridge:
"rutgers linguists evaluate deciphering of linear a by tom di mino"
who at Rutgers and Cambridge? did he just email some random linguistics profs? why not classics? and did they respond out of genuine interest or just trying to let an enthusiastic amateur down gently?
i am sorry but "crank" is the correct term for the many amateurs who routinely email mathematicians, physicists, and apparently linguists with their special theories without having any academic experience in the field. for every anecdote where it panned out there are thousands of cases where it did not.
tennfown 1 days ago [-]
[flagged]
I've also reached out to Dr. Ester Salgarella, so I'm familiar with attempts to apply computational analysis to the corpus, and where previous efforts erred.
Speaking of Greek, Linear B and Semitic, the related Cypriot syllabary was deciphered thanks to a bilingual inscription in Phoenician and Greek: https://en.wikipedia.org/wiki/Idalion_bilingual And just as in Crete, there is an undeciphered pre-Greek language written in the same script: https://en.wikipedia.org/wiki/Eteocypriot_language
I'd like to offer some evidence that the people of Crete were of Greek origin and therefore Indo-European rather than Semitic. Unfortunately, all the scholarship I can find on the subject is from Greek scholars, and since it confirms that the Minoans are genetically related to modern Greeks, the more I hear of that evidence the less I am convinced by it, because it's exactly consistent with confirmation bias. So I would not be surprised if the Minoans turned out to be one of the lost tribes of Israel.
Except of course we know those turned up in the Americas so they can't be the Minoans.
Yeah it's a joke. See https://en.wikipedia.org/wiki/Jewish_Indian_theory
The serious bit is that as soon as you make claims about who is from where and connected to what ancient people, you lose. It's impossible to disentangle peoples' nationalism and identity politics from whatever facts. I'm speaking in this as a Greek myself. Did you know that the Greek language is not, actually, an Indo-European language, but predates it by several hundreds of thousands of years, and has influenced every language you can find on every continent, including but not limited to the languages of the pre-Columbian civilisations? True story. Evidence: plenty! Consider https://en.wikipedia.org/wiki/Xochicalco Obviously that is the temple of the Goddess Kali in the country side ("Ο ναός της Θεάς Κάλι στην Εξοχή". Εξοχή-Κάλι-κο, Ξοχικάλκο!). I have actually read that in a book someone handed to me when I was a teenager. I had to put the book down after that.
tl;dr people get really crazy when it comes to their ancient history and lose the ability to think straight and derive sound conclusions from facts.
With regard to the origin of the script, Linear A documents have been dated to earlier times than Linear B. And then, there is also an even earlier hieroglyphic script, but its relation to Linear A has not been established.
To illustrate how things can go wrong, let's try to prove that English is a Semitic language. Suppose that the source material we have is this sentence:
"A baker brought a bushel of wheat to the mill."
Now let's match it to Semitic roots as found on this list: https://en.wiktionary.org/wiki/Appendix:Proto-Semitic_stems. Let's look for plausible substitutions now:
- "Baker" matches "bVkVr-", "first-born son". We have our anchor now. (V means any vowel).
- "Brought" looks plausibly close to "burāṯ-", "juniper" on our list. So far so good.
- "Bushel" is a good match with "b-š-l", "to be cooked"!
- "Wheat" does not have an exact match. It could be a loanword, for example.
- "Mill" looks like "m-r-r", "bitterness" if we assume lack of written L/R distinction. Again, juniper + cook + bitter is plausible, because juniper can be bitter.
- The meaning of particles will be inferred from the sentence structure.
Okay, let's take a look at our translation! We have "first-born son", "juniper", "cook", (wheat), "bitter". Pretty clear that (wheat) must be the name of a dish here. Therefore, the sentence can be translated as "A bitter juniper dish is being cooked for a first-born son". This even matches the context: the sentence was found in a granary, and it refers to food.
My point here is that with such a small sample size, we can extrapolate the data to mean absolutely anything. With no reference material, we cannot assess the correctness of any translation.
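The matching procedure from the baker example can be written down explicitly, which makes the degrees of freedom easier to see. In the sketch below the stems and glosses are the four quoted above. The normalisation rules (dropping "gh", merging "sh" with š, ignoring diacritics, merging L and R) are assumptions picked precisely so that the comment's matches fall out, and that is the weakness on display.

```python
import unicodedata

# Stems and glosses as quoted in the comment above.
STEMS = {
    "bVkVr-": "first-born son",
    "burāṯ-": "juniper",
    "b-š-l": "to be cooked",
    "m-r-r": "bitterness",
}


def skeleton(form: str) -> str:
    """Reduce a word or stem to a deliberately permissive consonant skeleton."""
    s = form.replace("V", "")  # "V" means "any vowel" in the stem notation
    s = unicodedata.normalize("NFD", s.lower())
    # Strip diacritics, so š becomes s, ṯ becomes t, ā becomes a.
    s = "".join(ch for ch in s if unicodedata.category(ch) != "Mn")
    s = s.replace("sh", "s").replace("gh", "")  # ad hoc rules for English digraphs
    s = s.replace("l", "r")                     # "no written L/R distinction"
    return "".join(ch for ch in s if ch.isalpha() and ch not in "aeiou")


index = {skeleton(stem): gloss for stem, gloss in STEMS.items()}

content_words = ["baker", "brought", "bushel", "wheat", "mill"]
matches = {word: index.get(skeleton(word)) for word in content_words}
print(matches)
```

Four of the five content words "match" after only four loosening rules, and each further rule (another merger, another ignorable letter) would buy more matches against a longer stem list.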
What does this mean? Like he e-mailed it to some people at Rutgers and Cambridge? Or it's under some kind of non-anonymous peer review?
>> reviewed by linguistics experts at Rutgers and Cambridge.
Here in Argentina, around 2005, we had like 5 guys who claimed to have 5 independent solutions of the Goldbach Conjecture. Each one got a PhD student who volunteered to read it, discussed the obvious problems with the author, tried to help solve them, and after a few months of back and forth they concluded that none of the solutions was correct or had an interesting insight. Nobody was surprised about that, but some wanted to give them a try.
Until there is an official report by Rutgers or Cambridge, it doesn't mean too much.
>> He's translated over 300 words
Where is the table of translations?
* Ventris' publication, but given Kober's contribution to the work they should really share equal credit. I like to think Kober would have got there on her own if she had access to the larger corpus that Ventris had (the Pylos tablets) and a comparable amount of free time and money available.
(Also: the passive voice was used.)
1) Many preprints are bad, incredibly bad. I read a lot of posts about ivermectin during 2020 and the errors were obvious: no control groups, a control group that was a bunch of unrelated guys in another city, and a weird article that split the 20+20 cases into 10 bins with 2+2 cases in each. They had a lot of errors that were easy to spot without being a medical doctor. (Ctrl+F exclusions, you may get a surprise.) (And don't get me started on Chlorine Dioxide.)
2) Perpetual motion machines and massless drives reappear every few years. I can definitely read most of them. The most interesting part is the totally broken explanation of why this new version does not break the laws of physics.
3) HN has a lot of users specialized in niche topics. A few weeks ago I wrote a comment with a joke: "the list of text transformations that lets a Spanish speaker read German fits on a napkin" (for example v->f and w->v and a few more). Someone was surprised, because s/he knows that German has more phonemes than English, which has more phonemes than Spanish. There is someone wandering around here who really knows about phonetics.
So, I want to see a preprint. Perhaps I can read it, perhaps someone else can read it, perhaps we have to wait a few days until someone writes a nice blog post and debunks it, perhaps it's correct.
Particularly when the only source is a friend of the author, posting on a blog named "AI Clambake" about "A weekly, human-powered newsletter for advertising folks who want to stay on top of the AI mayhem" and not a publication with any credibility in linguistics.
None of that means it can't be true, but some basic skepticism is warranted here. Otherwise we end up in a situation like the LK99 room temperature superconductor where a lot of HN commenters were also upset at the cynical "downers" who just couldn't root for a good thing/progress.
Could you rephrase this or explain it more thoroughly? I don’t follow. What does it mean to categorize a written form by systems built with Claude?
The original prompts aren't provided, nor is the original context; even then, you can't really treat a stochastic system like an LLM as a major component in reproducibility.
Claude helped write code to read and parse the corpora and to do some fairly basic statistical analysis along the lines of "which Linear A symbols most often occur together" and "if we use known Linear B sound values, which of the other corpora most often have vowel similarities with the Linear A corpus".
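For a sense of how basic the "which symbols most often occur together" part is, here is a rough sketch. It assumes words are given as hyphen-separated transliterated signs; the toy word list is made up for illustration and is not a real corpus or the author's code.

```python
from collections import Counter


def sign_bigrams(words):
    """Count adjacent sign pairs within each hyphen-separated word."""
    counts = Counter()
    for word in words:
        signs = word.split("-")
        counts.update(zip(signs, signs[1:]))
    return counts


# Toy input (assumption): a handful of transliterated words, some repeated.
TOY_WORDS = ["ku-ro", "ku-ro", "ti-ru-te", "si-ru-te", "ta-na-te"]

counts = sign_bigrams(TOY_WORDS)
for pair, n in counts.most_common(3):
    print("-".join(pair), n)
```

Anything along these lines is a few dozen lines of code whoever writes it. The hard part is deciding what the counts mean.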
You can write that code yourself or you can ask an LLM to write it for you. The provenance of the code isn't important.
*) He later deleted some of them, I think. What was still there on reddit a few weeks ago had dead links to a web site of his with statistical tables and I believe also code.
If you had the other things, being "stochastic" is not even remotely a show-stopper. Stochastic processes abound and are the reason the mathematics of statistics was developed in the first place, ultimately allowing us to create such things as LLMs.
When all the relevant steps get published, I absolutely expect a lot of people to (attempt to) reproduce this work even though LLMs are stochastic.
On the prompt formulation: prompts with very similar formulations (in terms of semantics, Hamming distance, or both) can lead to _wildly divergent_ outputs in my experience. It's not rigorous, and when that divergence happens, it's extremely difficult (arguably impossible, by nature of the architecture of transformers) to identify why the divergence happened and where.
It's not about being able to throw Claude or Codex at a loop and having it evaluate it for halting; it's about being able to do this for arbitrary code. Computer science rigorously shows the halting problem to be undecidable, i.e. not computable, by any deterministic Turing machine, which includes anything akin to static analysis.
There's not really a question of "solving" the halting problem, as if there were some as-yet unknown way of generally figuring out whether arbitrary code halts. Turing published a proof in 1937 of the undecidability of what we now know as the halting problem, building on ideas first articulated by Church a few years prior.
Frankly, if anything, it's reasonable to say that the halting problem's been solved, just in the direction of undecidability rather than decidability.
Anyway, back to LLMs; as code gets more complex, the robot will need a bigger context window, more hardware resources, and more time, all of which will be variable due to the noise inherent in the system. It'll be difficult to put a useful upper and lower bound on how much computing power and time it'll take to figure out if a program ever halts. Which is all a bit moot, frankly, in the context of halting, but useful to keep in mind in the more general context of using these things as analysis tools.
Every day when you lower your butt onto your chair, you trust a stochastic system enough to assume you'll rest on the chair safely and not spontaneously phase through, which would lead to a rather gory and painful terminal experience.
Physics at macro scale is stochastic, which is a good reminder that stochastic != uniformly random. Expected distributions matter.
IMO a better example would be the stochastic nature of quality control in manufacturing.
I was going to segue into thermodynamics as a backup example, but you made me think of something better.
> IMO a better example would be the stochastic nature of quality control in manufacturing.
How about, more specifically, food manufacturing? Or maybe, let's talk about cooking?
Cooking is as stochastic as it gets, and we handle it fine. It could be better - the better version is called "chemical process engineering", it's what cooking looks like when you care about quality and consistency of output, and can afford the equipment and process actually necessary for it. Regular people don't (i.e. neither care, nor can afford) - we call this cooking. It's an art, not a science, and people not only do it, but love it, and tie their identities to it, and build businesses around it, and a culture that embraces all the compromises (and calls the more serious approach "unhealthy").
My attempts at making bread have been too stochastic, in that they hardly ever produce nice results.
But yes. Eyeballing how much dried herbs to put in my dishes because I like what 2-isopropyl-5-methylphenol does for them. Usually it works, sometimes it's just a bit too Italian.
... in some sense, it's a miracle most people deal with this kind of bullshit without complaining much.
(Probably because they don't realize it's something to complain about. It's just how things are.)
Speaking generally, food produced through "chemical process engineering" (a.k.a. factories) must compromise on many axes, one of them being nutritional content. We intuitively do not care about several of these dimensions when cooking food with fresh ingredients, at least not at the scale of, say, Kellogg's or General Mills.
Maybe that's evidence of accepting a stochastic process in our daily lives, but you're kind of selling the tradition and science of cooking short when you argue that factory-produced food is a "more serious approach".
https://gist.github.com/fragmede/bbf277d36a2398065f109484f34...
Cynical read would be you’re stealing his thunder a bit by prematurely announcing this before it’s fully confirmed
One of the things I find weird with AI is how the dismissals of work that involves AI split into two camps: like yours, saying the AI did the work while the human played no role and deserves no credit; and those saying the AI rips off its training data while the human using it played no role and deserves no credit.
No human, individually or as a team, has been able to solve this to date.
To the extent this was Claude solving it itself and thus denying Di Mino any thunder, there was nobody to have stolen anything from. To the extent he has thunder to be stolen, it wasn't ever in Claude's possession.
As reported (I have no skin in this), Di Mino appears to have used Claude to write a tool to perform statistical analysis to test an idea he had, and in such cases as this it seems to me fairer to praise the human using the machine than to praise a director for the films acted, filmed, written, and edited by others.
Either all information is stolen, or none is. Can't have it both ways.
If not, how did “Claude” steal anything?
That's exactly the kind of thing I'd hope Claude would be used for in these kinds of projects - building tools, not black-box "solving" the problem.
If you have a 4k screen, you can fit all remaining Linear A text on your screen at once, in 14pt high font.
Some of the lists end with "ku-ro" and a number that's the sum of all the previous numbers, oddly frequently off by one.
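The ku-ro check is simple enough to sketch. The numbers below are made up for illustration, not taken from any actual tablet, and the function name is my own.

```python
def check_total(entries, stated_total):
    """Compare a list's entries against the total written after ku-ro."""
    diff = stated_total - sum(entries)
    if diff == 0:
        return "matches"
    if abs(diff) == 1:
        return "off by one"
    return f"off by {abs(diff)}"


# Hypothetical lists: (entries, number written after ku-ro)
examples = [
    ([5, 12, 3], 20),   # 5 + 12 + 3 = 20
    ([7, 7, 10], 25),   # 7 + 7 + 10 = 24, stated 25
    ([2, 2], 9),        # 2 + 2 = 4, stated 9
]
for entries, total in examples:
    print(entries, total, check_total(entries, total))
```

Sums like this are about the only place where a proposed reading can be checked against the tablet itself, which is why "ku-ro" as "total" is one of the few widely accepted readings.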
When she was leaving, the woman said "pose, pose". My French teacher was puzzled, and asked why she'd said that, and the woman asked if it didn't mean "au revoir" in Norwegian?
Because it was what the cashier at the grocery store said to her every time.
It means (carrier) bag.
I have no idea why Minoans would speak Hebrew, there's no indication as far as I'm aware of extensive cultural exchange between the Minoan civ and Hebrew-speaking people, but there's a very clear hierarchy of difficulty to translate dead scripts. From easier to harder:
a) We know what language the script transcribes and how the script transcribes it (e.g. what symbol means what word or sound).
b1) We don't know what language the script transcribes but we know how the script transcribes it (e.g. it's a syllabary or an abjad etc).
b2) We know what language the script transcribes but we don't know how the script transcribes it (e.g. Egyptian hieroglyphics).
c) We don't know what language the script transcribes nor do we know how it transcribes it.
b1) and b2) are more or less of similar difficulty.
Linear A goes in category c) above. We know next to nothing about the script or the language, other than the fact that the former was reused in Linear B to transcribe Mycenaean Greek.
First it says that the language of Linear A is a Semitic language that is a precursor to biblical Hebrew:
>> Di Mino believes that Linear A belongs to an extinct Semitic language that was a precursor to biblical Hebrew, the way that Latin is a precursor to Italian.
But then it compares the Linear A language directly to Hebrew. For example:
>> Once deciphered, Di Mino saw that the prayer was similar to subsequent Hebrew prayers but was addressed to a Goddess.
So maybe I'm confused. The claim that the Minoans spoke a Semitic language sounds less odd than claiming they spoke straight-up biblical Hebrew.
Obviously one symbol can mean literally anything, but you could also have very long strings of symbols with many different meanings.
The other thing I find odd, however, is that it's found to be a Semitic language. If it's a Semitic language, I would have expected it to already have been deciphered. And certainly linguists would have already looked at Semitic languages, and looked hard.
Also, if it were a Semitic language, why does the script write vowels instead of being purely consonantal? Semitic languages (and maybe Egyptian) usually write only the consonants, because their stems are made of three consonants and vowels are interwoven to make words.
Example Semitic root K-T-B and how vowels are added in between to form words:
- kataba – He wrote
- yaktubu – He writes / is writing
- kitāb – A book
- kutub – Books
- kātib – A writer / scribe / clerk
- maktūb – Written / fate
- maktab – An office / desk
- maktabah – A library / bookstore
And another such root, D-R-S, which means "studying" or "learning":
- darasa – He studied
- yadrusu – He studies / is studying
- dirāsah – A study / school course
- dāris – A student / learner
- madrūs – Studied / carefully planned
- madrasah – A school
This system of triliteral roots is the reason why Semitic languages usually don't write vowels. Why would Linear A have a consonant+vowel syllabary if it were Semitic?
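The root-and-pattern mechanism can be sketched in a few lines. The pattern notation (digits 1, 2, 3 standing for the root consonants) and the function names are my own convention for illustration, not a standard linguistic tool.

```python
VOWELS = set("aiu\u0101\u012b\u016b")  # a i u and their long forms ā ī ū


def apply_pattern(root, pattern):
    """Fill the digits 1, 2, 3 in a pattern with the root's consonants."""
    consonants = root.lower().split("-")
    return "".join(
        consonants[int(ch) - 1] if ch in "123" else ch for ch in pattern
    )


def consonants_only(word):
    """What survives in a purely consonantal spelling of the word."""
    return "".join(ch for ch in word if ch not in VOWELS)


# Patterns taken from the word lists above.
PATTERNS = ["1a2a3a", "ya12u3u", "1\u01012i3", "ma12\u016b3"]

for root in ("K-T-B", "D-R-S"):
    print(root, [apply_pattern(root, p) for p in PATTERNS])

# kataba, kitāb, kutub and kātib all collapse to the same consonantal spelling.
print({consonants_only(w) for w in ["kataba", "kit\u0101b", "kutub", "k\u0101tib"]})
```

The last line shows why a consonant-only script costs such a language little: the root stays visible and the reader supplies the vowels from context.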
To be clear, this is an attempt at a decipherment. This is not proven, and we shouldn't consider Linear A to be "solved" until experts in the field have reviewed the work. In fact, it probably shouldn't be considered "proof" unless some more Linear A writings are uncovered and these are congruent with the method proposed. All that can be said for certain at this point is that this is an interesting conjecture.
But this is a story worth following. This could be the real deal. More research and validation should follow and we should have a better idea in the next few weeks or months whether Linear A has really been solved. At the very least, this is an interesting attempt, and optimistically, it could yield real insight into Minoan culture. Kudos.
𐇑 𐇘 𐇪 𐇐 | 𐇬 𐇳 𐇖 𐇗𐇽 | 𐇬 𐇗 𐇜 | 𐇬 𐇼 𐇖𐇽 | 𐇥 𐇬 𐇳 𐇖 𐇗𐇽 | 𐇪 𐇱 𐇦 𐇨 | 𐇖 𐇡 𐇲 | 𐇖 𐇼 𐇖𐇽 | 𐇖 𐇡 𐇲 | 𐇥 𐇬 𐇳 𐇖 𐇗𐇽 i-𐇘-wi-jeʳ | ʰau-ni-ti-noʳ au-no-pa au-ndi-tiʳ 𐇥-au-ni-ti-noʳ wa-pi-naᵐwa ti-ru-te ti-nd-tri ti-na-ru-he ʰau-ni-ti-noʳ i-301-wa-ja/e | ʰau-... jaᵘ-di-ki-to i-pi-na-ma si-ru-te ta-na-ra te-ti-u ta-na-te i-da 𐘚 ᴴI 𐘮 WA 𐘱 JA 𐘱 JA 𐘆 DI 𐘸 KI 𐘹 TU 𐘚 ᴴI 𐘢 PI 𐘅 NA 𐙁 MA ()
I believe the phonetic values for Phaistos here were based on similarity.
The Indus Valley script is about 1500 years earlier than Linear A, and I hope we can also decipher it, using AI or not [1]. It's well overdue, although statistical profiling has shown it to be a valid linguistic script, believed to have been used to write the ancient Harappan language, the likely precursor of modern Dravidian languages such as Telugu and Tamil.
The main reason it's very difficult to decipher is that there's no equivalent of the Rosetta Stone for the Indus script. My hypothesis is that an LLM can be trained or tuned to act as a logical or virtual version of the venerable Rosetta Stone, and hence can be used to decipher ancient writing systems.
[1] Indus script:
https://en.wikipedia.org/wiki/Indus_script
Honestly curious how many years before it can be one shotted in a coding harness with Fable.next by someone who’s not a linguistics expert.
Develop, test, and rank hypotheses about the phonetic values, morphology, grammar, and possible language family of Linear A using the full available corpus. Do not assume any decipherment is correct. Treat all candidate readings as hypotheses to be scored…”
I know I'm simplifying a lot, but all this deciphering isn't it just some kind of pattern matching?
However, nawaya, or whatever examples are around it, are not part of the Hebrew language.
https://www.biblexika.com/bible-lexicon/navah-h5115
https://hebrew-academy.org.il/%D7%93%D7%A3-%D7%9E%D7%99%D7%9...
https://biblehub.com/hebrew/5116.htm
It's a common misconception that is what happened with Ancient Egyptian with the Rosetta Stone. The Rosetta Stone was just one of the big pieces of the puzzle. The decoding came when people realized that Coptic (a language written alphabetically and still in use in the Coptic Church today) is actually descended from Ancient Egyptian; as Spanish is to Latin, Coptic is to Ancient Egyptian.
Similarly the attempts to decode classical Maya were all dead ends. Until Yuri Knorozov realized that it encoded the ancestor of the Maya languages which are still spoken to this day. (Knorozov's Wikipedia article is worth checking out just for his photo with his cat. [0] IMHO.)
I have written before about the La Mojarra 1 stele in Mexico [1]. It looks a lot like Maya. [2] But it isn't Maya. Maybe the difference is like that between Russian and Latin writing?
No one can read it. It's undecipherable. There are some attempts to identify it with a proposed ancient language that would have been related to the modern Mixe-Zoque languages: some of the glyphs that are shared with Maya, when read phonetically, start sounding like a Mixe-Zoque language. But no one has proposed a confident decipherment. There probably isn't enough text. La Mojarra 1 is the only long example of the Isthmian script.
Deciphering Akkadian was very difficult, at first. The process started with Persian; Old Persian was written in a simplified, adapted form of the Mesopotamian cuneiform (wedges on clay). It was a kind of alphabet. And Old Persian was already understood. And there was a bilingual text on a monument carved by Darius I. But even then -- decoding relied heavily on the fact that Akkadian is a Semitic language distantly related to Hebrew and, more distantly, to Ancient Egyptian. So again, we sort of knew what we were looking for.
That is all to say: even if the Voynich manuscript (for example) contains real text in an otherwise completely lost language, I'm not sure it is possible even theoretically to translate it.
[0] https://en.wikipedia.org/wiki/Yuri_Knorozov
[1] https://en.wikipedia.org/wiki/La_Mojarra_Stela_1
[2] https://commons.wikimedia.org/wiki/File:La_Mojarra_Stela_1_S...
"Thag is a smarty-pants"
Caveat: I'm Greek so a kind of natural amateur historian. That is to say I grew up reading about the prehistory and ancient history of Greece, as one does when one is born Greek and a geek. I've seen the Phaistos disk and Linear A inscriptions with mine own eyes in Greek museums and I have dreamed of the day they would be translated. I am not at all unsympathetic to the hopes of a Linear A decipherment.
However. The claimed decipherment has all the hallmarks of imaginative and fanciful attempts to draw parallels between historical events and entities that were not really connected, many of them notably inspired by the Hebrew bible. For example, remember when the lost tribes of Israel turned up in the New World [1]? Or how Biblical Sodom was actually destroyed by a comet [2]? Or the time that Venus was ejected from Jupiter and caused the Biblical Cataclysm [3]? Or, for less biblical but no less foundational texts of the Western literary canon, remember when Heinrich Schliemann discovered the Jewels of Helen of Troy [4] and the Death Mask of King Agamemnon [5]?
Or of course we could recall any of the claims to decipher the Phaistos Disk [6] or the Dropa Disks from Bayan Kara-Ula [7], and so on I'm sure.
All of the above is not to say that a decipherment is impossible. What it is to say is that it currently isn't possible, because we have no idea what the language that Linear A transcribes even is. It's not like the Minoan language is still spoken today in some far-evolved form, as was the case for e.g. Egyptian or Mayan or indeed ancient Greek [8]. So we have an unknown script, writing an unknown language, and to make matters worse there are no parallel texts with another ancient language that might help us bridge the gap. What there is, is some rudimentary understanding of the more obvious contents of Linear A texts (mostly, lists of goods) and the fact that some Linear A symbols have been reused in Linear B.
But, how were they reused? And what good is that knowledge without knowing anything about the language transcribed by Linear A? I can read German, a language that I don't speak, because I can read Latin script, but the meaning of the script might as well be Greek to me [9].
I'm a computer scientist, I guess, these days [10]. The problem of deciphering Linear A, or the Phaistos Disk, or any other script (that may not even be a script) that transcribes a language that we don't know is a problem of reconstructing information that we don't have, from other information that we don't have. I'm not saying it's completely impossible. I mean, who knows? Maybe we're just missing the right maths. But what we're really trying to do here is de-noise a message garbled by the passage of time without even a guess as to the language the message is written in. Claude Shannon would tell you that it's a fantasy that is not worth pursuing. You don't have to ask him, you can just read his magnum opus [11] and check out Section 3, titled "The Series of Approximations to English", for an idea of what the mechanics of deciphering a script when the language is known look like with the only technology we have that can do the job reliably.
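To give a flavour of what those approximations look like, here is a minimal sketch of a digram-based (second-order) approximation: each next character is drawn according to which characters follow the current one in a sample of the known language. The sample sentence, the seed and the function names are my own assumptions, not Shannon's data or notation.

```python
import random
from collections import defaultdict


def build_model(text):
    """Record, for each character, every character that follows it in the text."""
    model = defaultdict(list)
    for current, following in zip(text, text[1:]):
        model[current].append(following)
    return model


def generate(model, start, length, seed=0):
    """Sample text by repeatedly picking a recorded follower of the last character."""
    rng = random.Random(seed)
    out = [start]
    while len(out) < length:
        followers = model.get(out[-1])
        if not followers:  # dead end: this character was never followed by anything
            break
        out.append(rng.choice(followers))
    return "".join(out)


# Stand-in for a large body of text in a KNOWN language (assumption).
SAMPLE = "the quick brown fox jumps over the lazy dog and the cat sat on the mat"

print(generate(build_model(SAMPLE), "t", 40))
```

The catch is the first step: the statistics have to be collected from text in a known language. For Linear A there is nothing to put in `SAMPLE`.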
When Turing and the other Brits at Bletchley Park cracked the Enigma code, they at least knew it was, ultimately, a coded form of German. We may have a lot more compute now, and much more advanced tech overall, but there are some barriers that you cannot physically cross, no matter what resources you have. For example, you can't go faster than light and you can't escape the event horizon of a black hole. In the same way you can't translate text written in an unknown script, encoding an unknown language, without any parallel texts with a known language. There is just not enough information to do the job. Worse, if you try, you can endlessly come up with plausible "translations" and convince yourself that you have the right one, but you have no way to know you do.
I'm sorry, but this claim is just a wild guess trying to link Hebrew to Linear A, without any serious evidence that the link is real other than "look, I can guess what all the texts say!".
_________________
[1] https://en.wikipedia.org/wiki/Jewish_Indian_theory
[2] https://www.smithsonianmag.com/smart-news/destruction-of-cit...
[3] https://en.wikipedia.org/wiki/Worlds_in_Collision
[4] https://en.wikipedia.org/wiki/Priam%27s_Treasure#/media/File...
[5] https://en.wikipedia.org/wiki/Mask_of_Agamemnon
[6] https://en.wikipedia.org/wiki/Phaistos_Disc_decipherment_cla...
[7] https://en.wikipedia.org/wiki/Dropa_stones
[8] I can read ancient Greek. The further back in time it goes, the harder it gets to understand what it means but I can still read the script. It has changed in some 2,800 years but not enough that I can't read it. Nothing like that ability survived for Linear A. I blame Thera.
[9] Except of course then I would understand it. But it's just German to me.
[10] I can assure you that took me by surprise, first of all.
[11] https://people.math.harvard.edu/~ctm/home/text/others/shanno...
Curious to see where this goes. Hopefully an update is posted here when it's all said and done :)
Claiming that Claude did all the work is patently ridiculous. Claude is a tool, like any other. The corpus of Linear A is ~7500 characters across ~1500 inscriptions and Claude, no matter how smart, doesn't just solve that on its own.
What a shame.
So did many of the previous attempted solvers.
I don't know why you want to stoop to name calling which violates the guidelines and the spirit of this site.
"without any critical review" is also seemingly untrue: the post says Rutgers and Cambridge are reviewing it
My following searches turn up no announcements by either Rutgers or Cambridge:
"rutgers linguists evaluate deciphering of linear a by tom di mino"
https://www.google.com/search?q=rutgers+linguists+evaluate+d...
https://www.google.com/search?q=cambridge+linguists+evaluate...
I am sorry, but "crank" is the correct term for the many amateurs who routinely email mathematicians, physicists, and apparently linguists with their special theories without having any academic experience in the field. For every anecdote where it panned out there are thousands of cases where it did not.