You can pre-train babies for language before they’re born.
Researchers had pregnant women in France repeatedly play a children’s story during the last month of pregnancy, both in their native French and in either German or Hebrew (neither of which the parents spoke).
Then, within 3 days of birth, they played the newborns the same story in French, German, and Hebrew while measuring brain responses.
The babies’ brains responded to the foreign language heard in the womb very similarly to the native language, but did not show the same patterns for the third, never-heard language.
Brief but repeated prenatal exposure to a new language appeared sufficient to make that language neurally recognizable at birth.
This doesn’t mean the babies learned the languages fluently. More likely, they learned low-frequency acoustic, prosodic, and rhythmic regularities like cadence, stress pattern, intonation envelope, and maybe even familiar story-level temporal structure. External sounds below roughly 250-400 Hz transmit relatively well through the womb, so fetuses get a muffled but structured auditory stream rather than clean phonemes.
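To make the "muffled but structured" point concrete, here is a toy sketch in plain Python. Everything in it is an assumption for illustration: the womb is modeled as a simple one-pole low-pass filter (it is not one), the 300 Hz cutoff is just a value picked from inside the 250-400 Hz range, and the 100 Hz and 3000 Hz tones are stand-ins for pitch contour and consonant detail.

```python
import math

SAMPLE_RATE = 16000  # samples per second (illustrative choice)
CUTOFF_HZ = 300      # assumed cutoff, picked from the 250-400 Hz range


def one_pole_lowpass(samples, cutoff_hz, sample_rate):
    """Simple RC-style low-pass: y[n] = y[n-1] + alpha * (x[n] - y[n-1])."""
    dt = 1.0 / sample_rate
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    alpha = dt / (rc + dt)
    out = []
    y = 0.0
    for x in samples:
        y += alpha * (x - y)
        out.append(y)
    return out


def tone(freq_hz, sample_rate, seconds=0.5):
    """Pure sine tone at the given frequency."""
    n = int(sample_rate * seconds)
    return [math.sin(2.0 * math.pi * freq_hz * i / sample_rate) for i in range(n)]


def rms(samples):
    """Root-mean-square amplitude."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def retained_fraction(freq_hz):
    """Fraction of a tone's amplitude that survives the filter."""
    x = tone(freq_hz, SAMPLE_RATE)
    y = one_pole_lowpass(x, CUTOFF_HZ, SAMPLE_RATE)
    return rms(y) / rms(x)


low = retained_fraction(100)    # stand-in for pitch / intonation
high = retained_fraction(3000)  # stand-in for fine consonant detail

print(f"100 Hz retained:  {low:.2f}")
print(f"3000 Hz retained: {high:.2f}")
```

The low tone comes through almost untouched while the high tone is mostly gone, which is the intuition behind rhythm and intonation surviving when crisp phonemes don't.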
I used the word “pre-train” in the opening hook because that’s literally what’s going on here. In ML, pre-training is the stage where a model absorbs broad statistical structure before it is asked to do any specific task. During pre-training, a model learns the shape of the data distribution.
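A toy way to see "absorbing statistical structure without a task": the sketch below counts which beat tends to follow which in some unlabeled rhythm strings, then checks how expected a new string is. This is only an analogy under made-up assumptions. The `S`/`w` (strong/weak beat) encoding, the example strings, and the bigram model are all hypothetical, and none of it is the paper's method.

```python
import math
from collections import Counter

ALPHABET = "Sw"  # hypothetical encoding: S = strong beat, w = weak beat


def train_bigram(sequences, alphabet):
    """Count beat-to-beat transitions; no labels or task involved."""
    pair_counts = Counter()
    context_counts = Counter()
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            pair_counts[(a, b)] += 1
            context_counts[a] += 1
    vocab = len(alphabet)

    def avg_log_prob(seq):
        """Average log-probability per transition, with add-one smoothing."""
        total = 0.0
        for a, b in zip(seq, seq[1:]):
            p = (pair_counts[(a, b)] + 1) / (context_counts[a] + vocab)
            total += math.log(p)
        return total / max(len(seq) - 1, 1)

    return avg_log_prob


# "Exposure": made-up strings that all share one alternating rhythm.
exposure = ["SwSwSwSw", "SwSwSw", "wSwSwSwS"]
score = train_bigram(exposure, ALPHABET)

familiar = score("SwSwSwSwSw")  # same rhythm, new string
novel = score("SSwwSSwwSS")     # rhythm never heard during exposure

print(f"familiar rhythm: {familiar:.2f}")
print(f"novel rhythm:    {novel:.2f}")
```

The model never learns what anything means, but the familiar rhythm ends up far less surprising than the novel one. That is the flavor of "recognizable" at stake here.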
Here, the fetus isn’t learning words per se, because it’s not mapping sounds to meanings, so it’s not like you can say the baby is “learning Hebrew.”
It is, however, learning that speech has structure. Some streams of sound are familiar, and some rhythms recur. There are contours to the outside world the fetus is about to enter.
This way, the baby is born with priors already slightly tuned by the acoustic statistics of its environment.
This is probably a bit uncanny to hear because it collapses the intuitive boundary most people place between before and after birth,
but when it comes to language at least, things don’t begin with the first word.
Language begins as a faint, low-pass filtered signal through the body of the mother, long before the baby has any idea what a word is.
Rhythm precedes semantics, and distribution precedes vocabulary.
This is yet another baby neuro paper showing convergence between infant brains and ML. Before intelligence can do tasks, it needs to learn the shape of the world.
paper:
nature.com/articles/s42003-0…