olonok_ml_engineer

olonok_ml_engineer

@AI_MLengineer

LLMOps: Post-Training LLMs. SFT, DPO, ORPO, PPO, GRPO. Part 1: youtu.be/18eyk6_ZLZg In this video I tour the five techniques that take a pretrained language model from text completer to aligned assistant: SFT → DPO → ORPO → PPO → GRPO. Part 2 (demo): youtu.be/JnGS4_h0FgY Part 1: In previous sessions we built RL algorithms from scratch: MDPs and Bellman (1), the DQN family (2), PPO/A2C/TRPO (3), and continuous-control actor-critic methods DDPG, TD3, and SAC (4). Session 7 pivoted to libraries and an applied trading demo. This session pivots again, from agents acting in environments to agents acting in language. We have already covered PPO, but we have never seen what it looks like when "the environment" is a tokeniser, "the reward" is a learned classifier, and "the policy" is a 1.5-billion-parameter transformer. #mlops #machinelearning #datascience #artificialintelligence

olonok_ml_engineer

@AI_MLengineer

Jun 10

Reinforcement Learning: Libraries and RL in Financial Trading youtu.be/7Ja9TOv2nIg In this video we go through the ecosystem of Reinforcement Learning libraries available in Python, then use Stable-Baselines3 to rebuild a stock trading agent on live data from yfinance, and close with a demo of MuJoCo robotics environments that ties the entire stack together. #mlops #machinelearning #datascience #artificialintelligence

olonok_ml_engineer

@AI_MLengineer

Jun 6

Reinforcement Learning: Libraries and RL Applied to Trading youtu.be/_YTG3rBLmMM In this video we go through the ecosystem of Reinforcement Learning libraries available in Python, then use Stable-Baselines3 to build a stock trading agent on live data from yfinance, and close with a demo of MuJoCo robotics environments that ties the whole stack together. #mlops #machinelearning #datascience #artificialintelligence

olonok_ml_engineer

@AI_MLengineer

Jun 4

Reinforcement Learning: Continuous Control, Actor-Critic Off-Policy Methods youtu.be/BULfoeeYWT8 In this video, we explore continuous-control reinforcement learning algorithms, specifically off-policy actor-critic methods: DDPG, TD3, and SAC, from Lillicrap's original baseline (2015) to Haarnoja's entropy-regularization framework (2018), all within a self-contained, reproducible benchmark suite. The repository implements and compares these three fundamental off-policy actor-critic algorithms for reinforcement learning with continuous action spaces. Each algorithm is documented line by line, evaluated on Pendulum-v1, and integrated into a shared toolchain for multi-seed aggregation and automatic report generation. #mlops #machinelearning #datascience #artificialintelligence

olonok_ml_engineer

@AI_MLengineer

May 31

Agentic AI: AI Agents Evaluation. DeepEval, Inspect AI, and Azure AI Evaluation SDK youtu.be/-5yW3_EbnAw
In this video, we'll see how to evaluate AI agents using three complementary ecosystems:
- DeepEval for test-first, CI-oriented evaluation
- Inspect AI for task-based pipelines (Dataset - Solver - Scorer)
- Azure AI Evaluation SDK for native Azure agent metrics and batch evaluation
The project demonstrates a practical, multi-framework approach to agent evaluation:
- Validating final responses and intermediate behavior
- Evaluating tool usability and adherence to user tasks
- Testing security behavior against explicit criteria
- Running unit and batch evaluations
- Preparing evaluation patterns suitable for CI/CD and technical presentations
#mlops #machinelearning #datascience #artificialintelligence

olonok_ml_engineer

@AI_MLengineer

May 29

Reinforcement Learning: Continuous Control, Actor-Critic Off-Policy Methods youtu.be/nUCvGLtzZeQ In this video we explore continuous-control reinforcement learning algorithms, specifically the off-policy actor-critic methods DDPG, TD3, and SAC, from Lillicrap's original baseline (2015) to Haarnoja's entropy-regularization framework (2018), all in a reproducible, self-contained benchmark suite. The repository implements and compares three fundamental off-policy actor-critic algorithms for reinforcement learning with continuous action spaces. Each algorithm is documented line by line, evaluated on `Pendulum-v1`, and integrated into a shared toolchain for multi-seed aggregation and automatic report generation. #mlops #machinelearning #datascience #artificialintelligence

olonok_ml_engineer

@AI_MLengineer

May 20

youtu.be/b2ixu-lt1Ac Agentic AI: Agent Evaluation. DeepEval, Inspect AI, and Azure AI Evaluation SDK
In this video we'll see how to evaluate AI agents with three complementary ecosystems:
- 'DeepEval' for test-first, CI-oriented evaluation
- 'Inspect AI' for task-structured pipelines ('Dataset - Solver - Scorer')
- 'Azure AI Evaluation SDK' for native Azure agentic metrics and batch evaluation
The project demonstrates a practical, multi-framework approach to agent evaluation:
- validating final responses and intermediate behavior
- evaluating the quality of tool use and adherence to user tasks
- testing security behavior against explicit criteria
- running unit evaluations and dataset-scale (batch) evaluations
- preparing evaluation patterns suitable for CI/CD and technical presentations
#mlops #machinelearning #datascience #artificialintelligence

olonok_ml_engineer

@AI_MLengineer

May 18

Reinforcement Learning: Advanced Policy Optimization. A2C, A3C, PPO, GRPO and TRPO youtu.be/JYGg_U3UVgg In this video, we'll explore the most advanced Policy Optimization algorithms: A2C, A3C, PPO, and TRPO. We'll also give a brief introduction to GRPO. In Part 1, we saw how REINFORCE learns directly from complete episodes but suffers from high variance. In this second session, we'll build the four algorithms that solve this problem, one by one, starting from the same mathematical foundation. #mlops #machinelearning #datascience #artificialintelligence

olonok_ml_engineer

@AI_MLengineer

May 16

Reinforcement Learning: Advanced Policy Optimization. A2C, A3C, PPO, TRPO and GRPO youtu.be/LxMhGWlckqA In this video we explore the most advanced Policy Optimization algorithms: A2C, A3C, PPO, and TRPO, plus a brief introduction to GRPO. #mlops #machinelearning #datascience #artificialintelligence

olonok_ml_engineer

@AI_MLengineer

May 12

x.com/i/article/205408726818…

olonok_ml_engineer

@AI_MLengineer

May 12

Reinforcement Learning: Policy Optimization Introduction. Reinforce to PPO to RLHF In this video, we'll explore RL Policy Optimization with REINFORCE from scratch: math, code, and connection to RLHF. We'll build from the ground up how REINFORCE works, the policy gradient algorithm that forms the basis of PPO and LLM fine-tuning with RLHF. No prior RL knowledge is required. We'll start with MDPs and work our way up to functional code in PyTorch. #mlops #machinelearning #datascience #artificialintelligence youtu.be/mfjWWOsCNIo

olonok_ml_engineer

@AI_MLengineer

May 7

Reinforcement Learning: Introduction to Policy Optimization. Reinforce #... youtu.be/CZgASJmRLvg?si=vXU_… via @YouTube In this video we explore RL Policy Optimization with REINFORCE from scratch: the math, the code, and the connection to RLHF. We build from the fundamentals how REINFORCE works, the policy gradient algorithm that underlies PPO and LLM fine-tuning with RLHF. #mlops #machinelearning #datascience #artificialintelligence

olonok_ml_engineer

@AI_MLengineer

May 2

Reinforcement Learning: Advanced algorithms Q-Learning, Rainbow DQN #art... youtu.be/rnw8qE0fisw?si=_Ekz… via @YouTube In this video, we explore distributional reinforcement learning (C51, QR-DQN), Rainbow DQN, and Replay of Hindsight Experience (PER). What if predicting a single Q value isn't enough? In Part 3 of this series on deep reinforcement learning, we move beyond scalar returns and tackle four of the most powerful value-based methods in deep reinforcement learning. #mlops #machinelearning #datascience #artificialintelligence
