We can draw these lines because we run the whole loop: collect the data, train and evaluate models on it, then feed what we learn back into the data.
Better data takes a human willing to hold it to a standard. That's the difference between training a robot and filling a hard drive.