Instead of using the data provided in the BabyLM challenge, I opted for obtaining them from their sources, which added extra layers of filtering and complexity, revealing some discrepancies in the multimodal BabyLM data. I mention these in the paper.