Linguistics is not only about understanding human language.
How can generative AI & linguistics help us understand animals?
Here's a great summary of the @ProjectCETI workshop on *Decoding Communication in Nonhuman Species II* @SimonsInstitute. 🐳🐘🐬
"Prior to the work of Project CETI, the common assumption in the field was that whale codas form discrete basic “building blocks” of their communication system. Some speakers cautioned that this assumption would be analogous to turning a human vocal utterance into written text. This might correctly identify some linguistic elements, but you might also lose important information in the process.
Gašper Beguš from UC Berkeley, CETI’s linguistics lead, pointed this out in his talk. “Spoken language is the primary thing; written text is just a representation of spoken language. If we see a [sperm whale] click, then we immediately think: oh, that is a one/zero kind of thing.” But that’s our human (or computer scientist) bias.
When we model a language from text, we can learn a lot about it, but we also lose a lot of non-textual communication. “If you're approaching something as unknown as whale communication, you don't want to be throwing away anything.”
He is using a flavor of Generative AI called a [Generative] Adversarial Network (GAN). GANs have the advantage that they can work with the original audio recordings to create translations or representations. They consist of two AI players, the generator and the discriminator. The discriminator has been trained on a set of data, e.g. audio recordings. The generator, which never sees the original training set, looks at new data and tries to describe them, starting with random strings of symbols. The two parts of the algorithm play a kind of game: the generator tries to get better and better at describing the data, while the discriminator tries to get better at telling correct descriptions from false ones. As each party tries to outperform the other, the whole system gets better at interpreting the data.
“GANs are also very nice because they replicate stages in the acquisition of language,” Beguš said. So scientists could use them to study how whale calves start acquiring their language in small steps, just as human children don't start talking in complex sentences.
Beguš doesn’t see GANs as a competitive alternative to using LLMs, however. Rather, these different AI methods complement each other in understanding different aspects of the whales’ world. “GANs are appropriate for understanding what is meaningful at the most basic level, or how their vocalizations are learned,” he said. “Using LLMs helps us with longer conversations and more discretized representations of their vocalizations. Graph neural networks that operate with mathematical graphs can be used to model their complex social structure and behavior.”"
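For anyone who wants to see the generator/discriminator game from the quoted passage in code, here is a minimal toy sketch in Python. It is not Beguš's model, which works on raw audio with neural networks. It assumes plain numbers standing in for recordings, a linear generator, a logistic discriminator, and hand-derived gradients. It follows the standard GAN setup, where the generator turns random noise into synthetic samples and the discriminator learns to tell them from real ones. All names (`generate`, `discriminate`, `train`) and the toy data distribution are illustrative assumptions.

```python
import math
import random


def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def discriminate(x, w, c):
    """Discriminator: estimated probability that sample x is real."""
    return sigmoid(w * x + c)


def generate(z, a, b):
    """Generator: maps random noise z to a synthetic sample."""
    return a * z + b


def train(steps=2000, lr=0.01, seed=0):
    rng = random.Random(seed)
    a, b = 1.0, 0.0  # generator parameters (hypothetical toy model)
    w, c = 0.0, 0.0  # discriminator parameters (hypothetical toy model)
    for _ in range(steps):
        x_real = rng.gauss(4.0, 0.5)  # toy stand-in for a real recording
        z = rng.gauss(0.0, 1.0)       # random noise fed to the generator
        x_fake = generate(z, a, b)

        # Discriminator step: gradient descent on
        # -log D(x_real) - log(1 - D(x_fake)).
        s_real = discriminate(x_real, w, c)
        s_fake = discriminate(x_fake, w, c)
        grad_w = -(1.0 - s_real) * x_real + s_fake * x_fake
        grad_c = -(1.0 - s_real) + s_fake
        w -= lr * grad_w
        c -= lr * grad_c

        # Generator step: gradient descent on -log D(x_fake).
        # The generator only gets feedback through the discriminator.
        s_fake = discriminate(x_fake, w, c)
        grad_x = -(1.0 - s_fake) * w
        a -= lr * grad_x * z
        b -= lr * grad_x
    return a, b, w, c
```

Calling `train()` returns the final generator and discriminator parameters. Note that the generator update never touches `x_real`: it learns only from the discriminator's verdict on its own output.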
Full article by @chrdr: simons.berkeley.edu/news/wor…
Video: youtube.com/watch?v=jFo59fDl…