The article highlights the "Scientific Discovery Evaluation" (SDE) benchmark, led by Deep Principle. Unlike traditional evaluations, SDE is not a collection of isolated, difficult questions. Instead, it draws on 8 ongoing, real-world research projects whose data are unpublished, distilling them into 43 research scenarios comprising 1,125 interrelated tasks.