Principal on @LifeScanApp; Director of Informatics at CBG, University of Guelph, and Director of BOLD (boldsystems.org)

Joined August 2008
36 Photos and videos
SujeevanRatnasingham retweeted
10 research gap types and how to bridge them
23
1,021
5,563
910,457
SujeevanRatnasingham retweeted
27 Jul 2024
Thrilled to spotlight the 24 women of the NbS Guinean Forests Project in Côte d'Ivoire! Their efforts with Malaise traps in Divo Botanical Reserve will provide crucial insect data. 🦋🌿 @WorldUniService @CECI_Canada @iBOLConsortium @CIFOR_ICRAF_WCA #Biodiversity #WomenInScience
1
4
6
459
Didi was a bright young star whose light was extinguished early. She was a north star for many young women in South Africa. She was a beautiful person and a dedicated scientist. I will miss her. mg.co.za/news/2024-07-15-mg-…
2
5
201
SujeevanRatnasingham retweeted
11 Jul 2024
Prof. Michelle Van der Bank has been awarded the SA Academy of Science and Culture’s Medal of Honour for contribution to science in South Africa. Read more: shorturl.at/2g0zq

2
6
12
1,378
SujeevanRatnasingham retweeted
Extremely happy to share this article!!!!! Within the @DRYvER_H2020 project, we analysed the GHG emissions in European drying river networks. Drying had a legacy effect on both CO2 and CH4, and riverbeds represented >50% of total annual C emissions in 3 of the 6 case studies.
3
18
59
4,737
SujeevanRatnasingham retweeted
My PhD thesis is due in a couple of weeks. With so many things to take care of, I have adopted more effective to-do lists to manage my tasks and I am never going back! Here are my tips for best practices to improve output (1/n) #phdlife #AcademicChatter #phdvoice
27
209
2,591
496,119
SujeevanRatnasingham retweeted
2 May 2024
Inexpensive token generation and agentic workflows for large language models (LLMs) open up intriguing new possibilities for training LLMs on synthetic data. Pretraining an LLM on its own directly generated responses to prompts doesn't help. But if an agentic workflow implemented with the LLM results in higher quality output than the LLM can generate directly, then training on that output becomes potentially useful.

Just as humans can learn from their own thinking, perhaps LLMs can, too. For example, imagine a math student who is learning to write mathematical proofs. By solving a few problems, even without external input, they can reflect on what does and doesn’t work and, through practice, learn how to more quickly generate good proofs.

Broadly, LLM training involves (i) pretraining (learning from unlabeled text data to predict the next word) followed by (ii) instruction fine-tuning (learning to follow instructions) and (iii) RLHF/DPO tuning to align the LLM’s output to human values. Step (i) requires many orders of magnitude more data than the other steps. For example, Llama 3 was pretrained on over 15 trillion tokens, and LLM developers are still hungry for more data. Where can we get more text to train on?

Many developers train smaller models directly on the output of larger models, so a smaller model learns to mimic a larger model’s behavior on a particular task. However, an LLM can’t learn much by training on data it generated directly, just like a supervised learning algorithm can’t learn from trying to predict labels it generated by itself. Indeed, training a model repeatedly on the output of an earlier version of itself can result in model collapse.

However, an LLM wrapped in an agentic workflow may produce higher-quality output than it can generate directly. In this case, the LLM’s higher-quality output might be useful as pretraining data for the LLM itself.
Efforts like these have precedents:
- When using reinforcement learning to play a game like chess, a model might learn a function that evaluates board positions. If we apply game tree search along with a low-accuracy evaluation function, the model can come up with more accurate evaluations. Then we can train that evaluation function to mimic these more accurate values.
- In the alignment step, Anthropic’s constitutional AI method uses RLAIF (RL from AI Feedback) to judge the quality of LLM outputs, substituting feedback generated by an AI model for human feedback.

A significant barrier to using LLMs prompted via agentic workflows to produce their own training data is the cost of generating tokens. Say we want to generate 1 trillion tokens to extend a pre-existing training dataset. Currently, at publicly announced prices, generating 1 trillion tokens using GPT-4-turbo ($30 per million output tokens), Claude 3 Opus ($75), Gemini 1.5 Pro ($21), and Llama-3-70B on Groq ($0.79) would cost, respectively, $30M, $75M, $21M and $790K.

Of course, an agentic workflow that uses a design pattern like Reflection would require generating more than one token per token that we would use as training data. But budgets for training cutting-edge LLMs easily surpass $100M, so spending a few million dollars more for data to boost performance is quite feasible.

That’s why I believe agentic workflows will open up intriguing new opportunities for high-quality synthetic data generation.

[Original text: deeplearning.ai/the-batch/is… ]
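
The cost figures quoted in the post follow from simple arithmetic, and the sketch below reproduces them. The per-million prices are the ones stated in the post; the function name `generation_cost_usd` and the `overhead` parameter (tokens generated per token kept as training data) are illustrative assumptions, since the post says only that the ratio is more than one.

```python
# Output-token prices as quoted in the post (USD per million tokens).
PRICE_PER_MILLION_USD = {
    "GPT-4-turbo": 30.00,
    "Claude 3 Opus": 75.00,
    "Gemini 1.5 Pro": 21.00,
    "Llama-3-70B on Groq": 0.79,
}

ONE_TRILLION = 10**12


def generation_cost_usd(tokens_kept, price_per_million_usd, overhead=1.0):
    """Estimate the cost of producing `tokens_kept` training tokens.

    `overhead` is a hypothetical multiplier: how many tokens the workflow
    generates for each token that ends up in the training set.
    """
    tokens_generated = tokens_kept * overhead
    return tokens_generated / 1_000_000 * price_per_million_usd


# Cost of 1 trillion tokens with no agentic overhead, per model.
costs = {
    name: generation_cost_usd(ONE_TRILLION, price)
    for name, price in PRICE_PER_MILLION_USD.items()
}

for name, cost in costs.items():
    print(f"{name}: ${cost:,.0f}")
```

Passing an assumed `overhead` above 1 scales each figure linearly, which gives a rough sense of how an agentic workflow would change the bill.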

34
231
1,248
204,125
Privileged to be surrounded by so many luminaries at the Franklin Institute’s Committee on Science and the Arts Dinner. @CBG_UofG @iBOLConsortium
1
2
9
642
The festivities continue
1
75
Incredible ceremony @ the Franklin Institute
67
SujeevanRatnasingham retweeted
Today, the #UofG community honoured the groundbreaking achievement of evolutionary biologist, Dr. Paul Hebert, for the pioneering work he and the team at the Centre for Biodiversity Genomics have accomplished in DNA barcoding to catalogue life on Earth.
9
30
3,484
Long-standing frustration: The actions of the Democrats often fall short of their proclaimed ideals and rhetoric. #USpolitics youtube.com/watch?v=hNDgcjVG…

73
I'm delighted to see my long-time mentor and friend, Paul Hebert, awarded the Benjamin Franklin Medal. The 2024 Laureates also include notable figures like Lisa Su (AMD) and Robert Metcalfe (Ethernet). #Innovation #ScienceLeaders #TechPioneers
4
17
274
SujeevanRatnasingham retweeted
Dr. Paul Hebert, CBG's CEO and founder, has received the prestigious Benjamin Franklin Medal in Earth and Environmental Science. His groundbreaking work is #IlluminatingBiodiversity across the planet. Congratulations, Dr. Hebert! news.uoguelph.ca/2024/01/hol… #DNAbarcoding #uofg
5
19
94
3,407
SujeevanRatnasingham retweeted
18 Nov 2023
Sam and I are shocked and saddened by what the board did today.

Let us first say thank you to all the incredible people who we have worked with at OpenAI, our customers, our investors, and all of those who have been reaching out.

We too are still trying to figure out exactly what happened. Here is what we know:
- Last night, Sam got a text from Ilya asking to talk at noon Friday. Sam joined a Google Meet and the whole board, except Greg, was there. Ilya told Sam he was being fired and that the news was going out very soon.
- At 12:19pm, Greg got a text from Ilya asking for a quick call. At 12:23pm, Ilya sent a Google Meet link. Greg was told that he was being removed from the board (but was vital to the company and would retain his role) and that Sam had been fired. Around the same time, OpenAI published a blog post.
- As far as we know, the management team was made aware of this shortly after, other than Mira who found out the night prior.

The outpouring of support has been really nice; thank you, but please don’t spend any time being concerned. We will be fine. Greater things coming soon.
3,221
6,866
54,393
21,020,302
SujeevanRatnasingham retweeted
28 Sep 2023
We are delighted to have Prof Paul Hebert from @iBOLConsortium as a KEYNOTE SPEAKER for the #GEOBONconf2023 🌍 🎙️His talk will focus on "A Mission for Planetary Biodiversity" 🗓️Oct 12th at 8:30am #MonitorBiodiv4Action #GeneticDiversity #DNAbarcoding
1
9
24
2,595
SujeevanRatnasingham retweeted
Paul Hebert Centre for DNA Barcoding and Biodiversity Studies opens a new state-of-the-art facility to expand #biodiversity research in #India. ibol.org/articles/new-campus… #DNAbarcode #IlluminateBiodiversity #ScienceNews
4
27
75
7,578
SujeevanRatnasingham retweeted
Insect Investigators project is the recipient of a Citizen Science Award! @InsectInvestig8 brought together scientists and schools to discover, document, and describe Australia's insect #biodiversity. CBG barcoded the specimens. environment.sa.gov.au/goodli…
2
5
339