Basic speech synths were available even on the Commodore 64. There is a fixed set of phonemes in human speech, and the process of text-to-speech is basically about converting a word such as 'sausages' into the phonemes that make it up. Natural-sounding speech is complex, with lots of tone and inflection issues that are hard to replicate.
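
Just to make the text-to-phonemes step concrete, here's a minimal sketch assuming a tiny hand-written pronunciation dictionary (the symbols are ARPAbet-style and only approximate; a real system would use something like CMUdict plus letter-to-sound rules for unknown words):

```python
# Hypothetical mini pronunciation dictionary -- illustration only.
PHONEME_DICT = {
    "sausages": ["S", "AO", "S", "AH", "JH", "AH", "Z"],
    "hello":    ["HH", "AH", "L", "OW"],
}

def text_to_phonemes(text):
    """Split text into words and look each one up in the dictionary."""
    phonemes = []
    for word in text.lower().split():
        if word in PHONEME_DICT:
            phonemes.extend(PHONEME_DICT[word])
        else:
            # A real synth would fall back to letter-to-sound rules here.
            raise KeyError(f"no pronunciation for {word!r}")
    return phonemes

print(text_to_phonemes("hello sausages"))
# ['HH', 'AH', 'L', 'OW', 'S', 'AO', 'S', 'AH', 'JH', 'AH', 'Z']
```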

But a 'computer voice', something that's understandable (like Professor Stephen Hawking's voice), is relatively easy. You can get files from plenty of open speech projects that are essentially digital waveforms of recorded phonemes, or you can synthesize the phonemes in realtime, or you can do a mix of the two (which, as I understand it, is how modern systems do it).
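
As a rough sketch of the recorded-waveform approach: glue prerecorded phoneme clips together into one output file. The file names like "AH.wav" and the directory layout are assumptions, not any particular project's format, and a real synth would crossfade or blend at the joins rather than hard-concatenating:

```python
import wave

def concatenate_phonemes(phonemes, out_path="speech.wav", sample_dir="phonemes"):
    """Naively concatenate one WAV clip per phoneme into a single file.
    Assumes all clips share the same sample rate, width and channel count."""
    frames = []
    params = None
    for ph in phonemes:
        with wave.open(f"{sample_dir}/{ph}.wav", "rb") as clip:
            if params is None:
                params = clip.getparams()  # copy format from the first clip
            frames.append(clip.readframes(clip.getnframes()))
    with wave.open(out_path, "wb") as out:
        out.setparams(params)
        for chunk in frames:
            out.writeframes(chunk)

# concatenate_phonemes(["S", "AO", "S", "AH", "JH", "AH", "Z"])
```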

Text-to-speech is a complex topic, but a basic speech synth isn't actually hard at all.

You see them all the time in the demoscene; creating one isn't by any means a silly idea.

At a very rough guess, a simple, understandable synth is probably as hard as learning and coding a simple scripting engine, or maybe, say, a software implementation of a skeletal/bone animation library.

It's probably not too far off that, to be fair.