PortexAI

Tweets

PortexAI

@PortexAI

23 Oct 2025

Day 2 of #PyTorchCon 🔥 What a ride. Talked with folks using #PyTorch to fine-tune models for drug discovery, cancer research, autonomous vehicles and, of course, customer support! Thanks @PyTorch for having us!

2,276

PortexAI retweeted

Kyle Waters @kylewaters_

24 Sep 2025

1/ AI isn't just a compute race anymore. It's a data race too. Labs are paying top dollar for differentiated, high-signal data. It's clear now is the time to experiment with new approaches to valuing and incentivizing the creation of frontier AI data. x.com/LucasNuzzi/status/1970…

Lucas Nuzzi

@LucasNuzzi

24 Sep 2025

AI has kicked off a gold rush for data, with OpenAI alone projecting $8B in data-related expenses by 2030. The challenge now is finding a reliable way to value data in this era. Our latest on data valuation techniques: research.portexai.com/data-v…

2,092

PortexAI retweeted

Lucas Nuzzi

@LucasNuzzi

24 Sep 2025

Data Valuation as a Foundation for AI Progress

Data has been called the world’s most valuable resource and “unreasonably effective” at modeling the world around us, yet the economics governing AI data acquisition have remained largely unchanged....

research.portexai.com

4,774

PortexAI

@PortexAI

17 Sep 2025

Noticing a trend? Specialized models continue to beat foundation models on task performance, cost, and latency. The emerging design pattern for agents is a foundation-model brain that can invoke the best tool for a given task.

Perceptron AI

@perceptroninc

17 Sep 2025

1/ Introducing Isaac 0.1 — our first perceptive-language model. 2B params, open weights. Matches or beats models significantly larger on core perception. We are pushing the efficient frontier for physical AI. perceptron.inc/blog/introduc…

ALT https://www.perceptron.inc/blog/introducing-isaac-0-1

1,708

PortexAI

@PortexAI

15 Sep 2025

"That will only happen with the assurance of effective copyright protection mechanisms, a market for fairly negotiated licensing deals, and additional incentives for creativity, ingenious, and innovative human work." We couldn't agree more 😉

Luiza Jarovsky, PhD

@LuizaJarovsky

9 Sep 2025

🚨 "AI Models Collapse When Trained on Recursively Generated Data" is one of the most influential papers ever on LLMs. Surprisingly, it might signal that copyright protection is a BACKBONE of the AI industry.

Quotes & comments:

"The development of LLMs is very involved and requires large quantities of training data. Yet, although current LLMs, including GPT-3, were trained on predominantly human-generated text, this may change. If the training data of most future models are also scraped from the web, then they will inevitably train on data produced by their predecessors. In this paper, we investigate what happens when text produced by, for example, a version of GPT forms most of the training dataset of following models. What happens to GPT generations GPT-{n} as n increases? We discover that indiscriminately learning from data produced by other models causes ‘model collapse’—a degenerative process whereby, over time, models forget the true underlying data distribution, even in the absence of a shift in the distribution over time"

"Our evaluation suggests a ‘first mover advantage’ when it comes to training models such as LLMs. In our work, we demonstrate that training on samples from another generative model can induce a distribution shift, which—over time—causes model collapse. This in turn causes the model to misperceive the underlying learning task. To sustain learning over a long period of time, we need to make sure that access to the original data source is preserved and that further data not generated by LLMs remain available over time. The need to distinguish data generated by LLMs from other data raises questions about the provenance of content that is crawled from the Internet: it is unclear how content generated by LLMs can be tracked at scale. (...)"

My comments: As the internet becomes polluted with AI-generated content, a way to maintain the quality of training datasets and support the development of new LLMs might be to create generative AI-free training datasets, including text, image, video, and audio data. That will only happen with the assurance of effective copyright protection mechanisms, a market for fairly negotiated licensing deals, and additional incentives for creativity, ingenious, and innovative human work.

👉 Link to the paper below. 👉 Never miss my analyses and recommendations of excellent papers, join 77,000 people who subscribe to my newsletter (link below).

156

PortexAI

@PortexAI

2 Sep 2025

GPT-4b micro is a model trained exclusively on specialized biological data. It was used to reverse cellular aging with a 50x improvement in efficiency relative to previous approaches. A testament to the power of narrow AI and specialized data. Amazing overview by @rowancheung:

1,329

PortexAI retweeted

Lucas Nuzzi

@LucasNuzzi

25 Aug 2025

AI is selling off because we've reached a plateau of what foundation models can do without specialized tools. It's like we have an amazing operating system but very few apps running on it. The way forward is fine-tuning tools with better data: the next major trend in AI ⬇️

3,066

PortexAI retweeted

Kyle Waters @kylewaters_

22 Aug 2025

1/ New @PortexAI research studying the massive growth and reach of @huggingface, which is unquestionably the heartbeat of open source AI today. Its repository of datasets and models points to a bright future for fine-tuning, and also a coming market for proprietary datasets 🧵

2,749

PortexAI retweeted

Lucas Nuzzi

@LucasNuzzi

12 Aug 2025

Helix was trained on a specialized dataset with 500 hours of human tasks. It's an exciting architecture because it uses a single set of neural net weights to learn behaviors. If you feed it specialized data, it will learn new tasks. It's beyond impressive.

Brett Adcock

@adcock_brett

12 Aug 2025

For the first time, a humanoid robot can fold laundry using a neural net. We made no changes to the Helix architecture, only new data.

1,513

PortexAI retweeted

Kyle Waters @kylewaters_

7 Aug 2025

1/ We're excited about the launch of the @PortexAI Datalab: the full-stack data acquisition platform for AI builders. This week, we're showcasing one of the most impactful features on the platform, which has already saved an early user 180 developer hours. 🧵

1,561

PortexAI

@PortexAI

5 Aug 2025

In an era where software is free to create and replicate, the only lasting moats are compute, product distribution, and data.

Sam Altman

@sama

3 Aug 2025

entering the fast fashion era of SaaS very soon

159

PortexAI retweeted

Lucas Nuzzi

@LucasNuzzi

30 Jul 2025

1/ In 2024, @PortexAI tried to compete head-to-head with oracle providers like Chainlink. The economics looked great. Oracle operators on Chainlink alone had generated close to 200M USD in revenue since '21, with take rates at times reaching 46%. Here's what happened next... 🧵

PortexAI

@PortexAI

30 Jul 2025

What We Learned Trying to Build a Blockchain Oracle (and Where We’re Going Next) research.portexai.com/what-w…

2,595

PortexAI

@PortexAI

30 Jul 2025

What We Learned Trying to Build a Blockchain Oracle (and Where We’re Going Next) research.portexai.com/what-w…

2,651

PortexAI

@PortexAI

4 Apr 2024

We’re excited to share Portex’s first research paper at the intersection of crypto and AI. Check out the paper, "Reputation Oracles: Determining Smart Contract Reputability via Transfer Learning", below: research.portexai.com/reputa…

Reputation Oracles: Applying Transfer Learning to Bytecode

Introducing reputation oracles: programs that can predict whether a given smart contract is reputable, or malicious.

research.portexai.com

2,442