Joined December 2024
292 Photos and videos
Pinned Tweet
Attention AI and ML enthusiasts!!๐Ÿšจ๐Ÿšจ Have you heard about Ready Tensor? Whether youโ€™re a seasoned expert or just starting your journey, Ready Tensor is your stage. You can Share your projects, Collaborate with peers, and Shape the future of AI. Sign up Today! , and have all your projects in one intuitive place, ready and shareable to showcase your expertise!! readytensor.ai
3
7
23
5,458
Moltbook has AI agents talking, forming groups, even belief systems. Fascinating โ€” and unsettling for some. I recorded a short video separating hype from reality and what AI agents actually mean for work and careers.
3
3
145
We'll be running Llama 3.1-70B in production in 2035. Enterprise IT ran Windows XP into the 2010s because upgrading working systems was too risky. We're building agentic systems on specific LLM versions right now. Same prompts, different model โ†’ different behavior. These systems route tasks and make decisions. That behavior gets locked in during testing. So what happens when the next version drops? Rebuild everything? Or pin the version and hope it keeps running? My guess: production deployments will freeze on the model versions they were built with. Legacy AI, just like legacy Windows.
66
Generative language models are just massive classifiers. Ten years ago we built spam filters. Two classes: spam or not spam. GPT does the same thing. Just with 50,000 classes -every token in the vocabulary. Type "the cat sat on the" and it assigns probabilities to every possible next token. Highest probability wins. Same core task. Different scale and architecture. Watch the short video breaking this down with examples.
1
46
๐—”๐—ฟ๐—ฒ ๐˜†๐—ผ๐˜‚ ๐—ฟ๐—ฒ๐—ฎ๐—ฑ๐˜† ๐—ณ๐—ผ๐—ฟ ๐—ฎ๐—ฑ๐˜ƒ๐—ฎ๐—ป๐—ฐ๐—ฒ๐—ฑ ๐—Ÿ๐—Ÿ๐—  ๐—ฒ๐—ป๐—ด๐—ถ๐—ป๐—ฒ๐—ฒ๐—ฟ๐—ถ๐—ป๐—ด ๐˜„๐—ผ๐—ฟ๐—ธ? We built a 22-question self-assessment to help you evaluate whether you have the foundation for our LLM Engineering & Deployment Certification - our most technically demanding program. ๐—œ๐˜ ๐—ฐ๐—ผ๐˜ƒ๐—ฒ๐—ฟ๐˜€: - Python proficiency (OOP, generators, API integration) - Command line and Git workflows - ML fundamentals (overfitting, learning rate, batch size, optimizers) - PyTorch and Hugging Face transformers experience - NLP concepts (embeddings, transformer architectures) - GPU training basics The quiz takes 6-8 minutes. You get instant results showing where you stand and what to strengthen before enrolling. It's designed by the program creators and reflects the actual technical baseline needed for fine-tuning, distributed training, and cloud deployment work.
2
31,686
LLM leaderboard datasets favor binary correctness over real-world generation. Open LLM Leaderboard runs on MCQ tasks and datasets with machine-checkable outputs. Even instruction-following evals like IFEval use regex patterns, not subjective quality. This makes automation possible. But it also means a 75% benchmark score tells you nothing about how the model handles ambiguous prompts or maintains context. The video shows exact match calculation on GSM8K. The model gets 39% accuracy by matching final numbers, regardless of reasoning path. Production tasks rarely have a single verifiable answer. Watch the full benchmark walkthrough: youtu.be/yy-MQ-i-fls Part of the LLM Engineering & Deployment Certification program by Ready Tensor.
2
96
Colab gives you T4s for free with 4-hour windows. The session limit isn't a bug. It's a forcing function for efficient fine-tuning decisions. You learn to pick smaller base models, use LoRA instead of full fine-tuning, and checkpoint frequently. Those same patterns matter when you're paying per GPU hour later. Paid plans get you A100 access when projects scale. But the discipline from working constrained carries forward. The video covers runtime configuration, secure key storage, and navigating Colab's interface. Part of the LLM Engineering & Deployment Certification program by Ready Tensor.
79
Agentic AI framework choice splits between personal and company projects. Personal work: choice of framework matters less than problem solving skills. LangChain, LangGraph, CrewAI, etc. - pick something workable and move on. Don't agonize over it. Don't spend six months mastering the tool. The field shifts too fast for deep tool expertise to pay off. Company projects are different. You need to consider a 5 year horizon - will the framework exist over the long term? That question alone eliminates shiny new experimental frameworks. Anything backed by a major company or with long history by developers becomes the safe choice. More on this and some personal experiences in the video. This video is part of the Agentic AI Essentials certification program.
4
1
2
20,471
Equivalent LLM benchmark scores make model size the cost driver. Comparing coding-focused models on leaderboards shows this clearly. A 20B parameter model and a 120B model both scored identically on the same tasks. The decision becomes purely about infrastructure. Smaller models cost less to serve when they deliver the same output quality. Performance parity shifts the constraint from accuracy to deployment economics. See how this plays out across different model comparisons: youtu.be/WdQU0-ZaWko This video is part of the LLM Engineering & Deployment Certification program by Ready Tensor. Learn more here: readytensor.ai/llm-certificaโ€ฆ
6
19,659
Users pick different winners than benchmarks when testing LLMs blind. Chatbot Arena puts two anonymous models side by side. Same prompt, different responses. Users vote without knowing which LLM they're judging. The votes build an ELO ranking that diverges from benchmark leaderboards. A model can score high on tests but lose user votes for adding unwanted formatting or extra content. One response included a header the user didn't request. The other stayed minimal. Both answered correctly. The minimal one won the vote. Benchmark accuracy doesn't capture whether a model feels right to interact with. Voting patterns surface preferences that automated metrics miss entirely. See how blind testing changes LLM rankings: youtu.be/tsPMFYSvaEs This video is part of the LLM Engineering & Deployment Certification program by Ready Tensor. Learn more here: readytensor.ai/llm-certificaโ€ฆ
1
50
Embeddings retrieve documents by meaning, not matching words. A search for "I'm very hungry" pulls food-related publications. No keyword overlap exists between query and results. The vector database matches on semantic similarity instead. This runs on PG Vector with 768-dimensional embeddings. Each publication gets chunked, embedded, then averaged into a single vector. Cosine similarity ranks retrieval. Exact keyword matching breaks when users don't know the right terms. Vector search handles intent-based queries that traditional search can't parse. See the full implementation walkthrough: youtu.be/RkKO9_lP4co
2
1
1
19,464
If you fine-tune large language model without understanding LoRAโ€™s hyperparameters, youโ€™re likely wasting precious compute, time, or both. LoRA fine-tuning is controlled by three hyperparameters that balance expressiveness, training stability, and computational cost. Rank R sets the dimensionality of adapter matrices. Values as low as 4-8-16 can match full fine-tuning performance, with 8 being the most common starting point. The choice is primarily constrained by available memory rather than model quality. Alpha acts as a scaling factorโ€”typically set to 2ร—Rโ€”that maintains training stability when experimenting with different rank values. Without it, changing R would require manually adjusting learning rates each time. Target modules specify which neural network components get adapted. The default recommendation is the query and value projection matrices in attention blocks. Watch a deeper dive in this YouTube video youtu.be/8KLy3vcWR6M This video is part of our LLM Engineering & Deployment Certification program.
1
9,393
How well do you understand the inner workings of conversation memory in chatbots? Take this quiz to find out: โ€ข Handling multi-turn conversations โ€ข Limitations of conversation history โ€ข Sliding window vs. summarization โ€ข Essential components for persistence โ€ข Compliance and security in storage Self-assessment quiz. ~6โ€“8 minutes. Instant results. Take the quiz: readytensor.ai/assessments/aโ€ฆ
2
10,584
Chain-of-thought, ReAct, and self-ask integrate through config layers. Strategy instructions sit in YAML. Prompt templates reference a Reasoning Strategy key. The builder injects text conditionally near the end. Every task inherits the change from one location. The prompt system stays modular while reasoning behavior shifts. Implementation time drops when prompts aren't hardcoded. Watch how it wires together: youtu.be/mpyuTz5dKJE
3
14,981
How well do you know prompt engineering fundamentals for agentic AI? This quiz checks your understanding of the mechanics behind real LLM applications: โ€ข system prompts vs user prompts โ€ข grounding with external information โ€ข modular prompt design โ€ข reasoning techniques: Chain of Thought, ReAct, Self-Ask โ€ข prompt templates and dynamic inputs โ€ข API response metadata Multiple choice. 5โ€“7 minutes. Free self-assessment. Instant results. Take quiz here: readytensor.ai/assessments/aโ€ฆ Part of Ready Tensor's Agentic AI Essentials certification.
2
3
113
Long prompts bottleneck at first token; long outputs extend total generation time. Prefill processes the entire input in parallel to generate token one. A 4K token prompt took 3000ms for that single token. This is where long-context applications spend their time. Decode generates the rest sequentially. Doubling output length from 80 to 160 tokens roughly doubled total decode time. Prompt length barely affects per-token decode speed. If your app uses 8K prompts with 50-token outputs, you're optimizing prefill. If it's 200-token prompts with 2K outputs, you're optimizing decode. Full timing breakdown here: youtu.be/HRKFa8LIAQg #buildinpublic #ArtificialIntelligence
1
138
Short prompts with long outputs hit different bottlenecks than long prompts with short outputs. Testing a 100-token input with 200-token generation shows decode stage dominating end-to-end latency. A 2000-token input with 50-token output shows prefill stage spiking time-to-first-token instead. Both scenarios need measurement across 50 requests. Median tells you typical performance. P95 and P99 show what happens when things slow down. Single test scenarios miss half the picture. Your inference pipeline performs differently under each combination of input and output length. See the full metrics breakdown: youtu.be/DW-mo65DJ-Q #BuildInPublic #ArtificialInteligence
2
135
Where your fine-tuning data goes depends on which LLM you pick GPT-4 supports fine-tuning, but you send data to OpenAI for processing. No local training. No weight access. No control over the training loop. Llama or Mistral models download to your infrastructure. You run the fine-tuning job. Data never leaves your environment. Customization exists in both cases. Control over the process doesn't. See the comparison with working examples: youtu.be/QimfOFeih2Y #BuildInPublic #ArtificialInteligence
2
124
INT8 quantization gives you 1GB per billion parameters. That's the rule of thumb. A 7B model needs roughly 7GB in INT8, compared to 26GB in FP32. The difference comes from bytes per parameter. FP32 uses 4 bytes, INT8 uses 1 byte. A 13B model drops from 48GB to 12GB. A 70B model goes from 260GB to 65GB. Even expensive GPUs hit limits fast. A 40GB A100 can't hold a 13B model in full precision. Precision determines what fits before you write a single line of training code. See the full calculation: youtu.be/anhLHBi1pP4 #BuildInPublic #LLMs
1
2
166
Most problems don't need agentic AI. The signal: you need multiple tools, ambiguous goals, and some reasoning between steps. If it's well-defined enough for RPA or structured enough for operations research, use those instead. Agentic AI is complex to get working. We treat it as a last resort. The blocker isn't usually technical. It's explainability. Business users making high-stakes decisions won't adopt systems they can't trace or understand. Compliance applications are out. Privacy-sensitive contexts are risky. Anything requiring deterministic outputs doesn't fit. The mental model: would you give this to a smart intern and trust them to figure it out? If yes, maybe agentic AI. If it's more mechanical than that, or warrants a PhD-level research effort, reach for other tools. See the full breakdown: youtu.be/gOO1rF8P174 #BuildInPublic #ArtificialIntelligence
3
188
Long prompts bottleneck at first token; long outputs extend total generation time. Prefill processes the entire input in parallel to generate token one. A 4K token prompt took 3000ms for that single token. This is where long-context applications spend their time. Decode generates the rest sequentially. Doubling output length from 80 to 160 tokens roughly doubled total decode time. Prompt length barely affects per-token decode speed. If your app uses 8K prompts with 50-token outputs, you're optimizing prefill. If it's 200-token prompts with 2K outputs, you're optimizing decode. Full timing breakdown here: youtu.be/HRKFa8LIAQg #BuildInPublic #ArtificialIntelligence
2
155