OpenAI's language models are making headlines these days, but can open-source models do better?
RedPajama is the latest project with that mission, and it recently completed its first step: creating and releasing training data.
Announcing RedPajama — a project to create leading, fully open-source large language models, beginning with the release of a 1.2 trillion token dataset that follows the LLaMA recipe, available today!
together.xyz/blog/redpajama
More in 🧵 …