DL captures human speech perception both *qualitatively* & *quantitatively* (R² > 96%) for over 400 combinations of exposure and test items. Yet previous DL models fail to capture important limitations. Specifically, we find that DL seems to proceed by remixing previous experience. 2/2