Joined December 2012
LLMOps: Post-Training LLMs. SFT, DPO, ORPO, PPO, GRPO Part 1 youtu.be/18eyk6_ZLZg In this video I tour the five techniques that take a pretrained language model from text completer to aligned assistant: SFT → DPO → ORPO → PPO → GRPO. Part 2 (demo): youtu.be/JnGS4_h0FgY In Part 1: in previous sessions we built RL algorithms from scratch: MDPs and Bellman (1), the DQN family (2), PPO/A2C/TRPO (3), and continuous-control actor-critic methods (DDPG, TD3, SAC) (4). Session 7 pivoted to libraries and an applied trading demo. This session pivots again, from agents acting in environments to agents acting in language. We already covered PPO, but we have never seen what it looks like when "the environment" is a tokeniser, "the reward" is a learned classifier, and "the policy" is a 1.5-billion-parameter transformer. #mlops #machinelearning #datascience #artificialintelligence
Reinforcement Learning: Libraries and RL in Financial Trading youtu.be/7Ja9TOv2nIg In this video we go through the ecosystem of Reinforcement Learning libraries available in Python, then use Stable-Baselines3 to rebuild a stock trading agent on live data from yfinance, and we close with a demo of MuJoCo robotics environments to tie the whole stack together. #mlops #machinelearning #datascience #artificialintelligence
Reinforcement Learning: Libraries and RL Applied to Trading youtu.be/_YTG3rBLmMM (video in Spanish) In this video we go through the ecosystem of Reinforcement Learning libraries available in Python, then use Stable-Baselines3 to build a stock trading agent on live data from yfinance, and we close with a demo of MuJoCo robotics environments to tie the whole stack together. #mlops #machinelearning #datascience #artificialintelligence
Reinforcement Learning: Continuous Control, Actor-Critic Off-Policy Methods youtu.be/BULfoeeYWT8 In this video, we'll explore continuous-control Reinforcement Learning algorithms, specifically Actor-Critic Off-Policy methods. We cover DDPG, TD3, and SAC, from Lillicrap's original baseline (2015) to Haarnoja's entropy regularization framework (2018), all within a self-contained, reproducible benchmark suite. The repository implements and compares three fundamental actor-critic off-policy algorithms for reinforcement learning with continuous action spaces. Each algorithm is documented line by line, evaluated on Pendulum-v1, and integrated into a shared toolchain for multi-seed aggregation and automatic report generation. #mlops #machinelearning #datascience #artificialintelligence

Agentic AI: AI Agents Evaluation. DeepEval, Inspect AI, and Azure AI Evaluation SDK youtu.be/-5yW3_EbnAw In this video, we'll see how to evaluate AI agents using three complementary ecosystems:
- DeepEval for test-first, CI-oriented evaluation
- Inspect AI for task-based pipelines (Dataset - Solver - Scorer)
- Azure AI Evaluation SDK for native Azure agent metrics and batch evaluation
The project demonstrates a practical, multi-framework approach to agent evaluation:
- Validating final responses and intermediate behavior
- Evaluating tool usability and adherence to user tasks
- Testing security behavior against explicit criteria
- Running unit and batch evaluations
- Preparing evaluation patterns suitable for CI/CD and technical presentations
#mlops #machinelearning #datascience #artificialintelligence
Reinforcement Learning: Continuous Control, Actor-Critic Off-Policy Methods youtu.be/nUCvGLtzZeQ (video in Spanish) In this video we'll explore continuous-control Reinforcement Learning algorithms, specifically the Actor-Critic Off-Policy methods DDPG, TD3, and SAC, from Lillicrap's original baseline (2015) to Haarnoja's entropy regularization framework (2018), all in a reproducible, self-contained benchmark suite. The repository implements and compares three fundamental actor-critic off-policy algorithms for reinforcement learning with continuous action spaces. Each algorithm is documented line by line, evaluated on `Pendulum-v1`, and integrated into a shared toolchain for multi-seed aggregation and automatic report generation. #mlops #machinelearning #datascience #artificialintelligence
youtu.be/b2ixu-lt1Ac (video in Spanish) Agentic AI: Agent Evaluation. DeepEval, Inspect AI, and Azure AI Evaluation SDK In this video we'll see how to evaluate AI agents with three complementary ecosystems:
- DeepEval for test-first, CI-oriented evaluation
- Inspect AI for task-structured pipelines (Dataset - Solver - Scorer)
- Azure AI Evaluation SDK for native Azure agentic metrics and batch evaluation
The project demonstrates a practical, multi-framework approach to agent evaluation:
- Validating final responses and intermediate behavior
- Evaluating the quality of tool use and adherence to user tasks
- Testing security behavior against explicit criteria
- Running unit evaluations and dataset-scale (batch) evaluations
- Preparing evaluation patterns suitable for CI/CD and technical presentations
#mlops #machinelearning #datascience #artificialintelligence

Reinforcement Learning: Advanced Policy Optimization. A2C, A3C, PPO, GRPO and TRPO youtu.be/JYGg_U3UVgg In this video, we'll explore the most advanced Policy Optimization algorithms: A2C, A3C, PPO, and TRPO. We'll also give a brief introduction to GRPO. In Part 1, we saw how REINFORCE learns directly from complete episodes but suffers from high variance. In this second session, we'll build the four algorithms that address this problem, one by one, starting from the same mathematical foundation. #mlops #machinelearning #datascience #artificialintelligence

Reinforcement Learning: Advanced Policy Optimization. A2C, A3C, PPO, TRPO and GRPO youtu.be/LxMhGWlckqA (video in Spanish) In this video we'll explore the most advanced Policy Optimization algorithms: A2C, A3C, PPO, and TRPO, along with a brief introduction to GRPO. #mlops #machinelearning #datascience #artificialintelligence
Reinforcement Learning: Introduction to Policy Optimization. REINFORCE to PPO to RLHF In this video, we'll explore RL Policy Optimization with REINFORCE from scratch: math, code, and the connection to RLHF. We'll build from the ground up how REINFORCE works, the policy gradient algorithm that forms the basis of PPO and of LLM fine-tuning with RLHF. No prior RL knowledge is required. We'll start with MDPs and work our way up to working code in PyTorch. #mlops #machinelearning #datascience #artificialintelligence youtu.be/mfjWWOsCNIo

Reinforcement Learning: Introduction to Policy Optimization. Reinforce #... youtu.be/CZgASJmRLvg?si=vXU_… via @YouTube (video in Spanish) In this video we'll explore RL Policy Optimization with REINFORCE from scratch: math, code, and the connection to RLHF. We build from the fundamentals how REINFORCE works, the policy gradient algorithm that is the basis of PPO and of LLM fine-tuning with RLHF. #mlops #machinelearning #datascience #artificialintelligence
Reinforcement Learning: Advanced algorithms Q-Learning, Rainbow DQN #art... youtu.be/rnw8qE0fisw?si=_Ekz… via @YouTube In this video, we explore distributional reinforcement learning (C51, QR-DQN), Rainbow DQN, and Replay of Hindsight Experience (PER). What if predicting a single Q-value isn't enough? In Part 3 of this series on deep reinforcement learning, we move beyond scalar returns and tackle four of the most powerful value-based methods. #mlops #machinelearning #datascience #artificialintelligence