If you're just starting to track ML experiments, a well-maintained,
easily discoverable spreadsheet is enough for a small team that runs a few
experiments a month. Like most mechanisms, within a few months it will
look nothing like it did at the start (and you'll probably move to a dedicated tool such as Weights & Biases or AimStack). At the very least, track the following details:
• ID: a moniker or a version number to identify the experiment
• Headline: a single-line description of the experiment
• Contacts: directly responsible individual(s) to contact if needed
• Start and End Dates: to track the period during which the experiment is active
• Hardware: specific hardware configuration used to run the experiment
• Report URL: where the report and associated writeups can be found
• Code URL: where code lives (e.g., a permalink to a notebook in Git)
• Data URL: where relevant datasets can be found (the exact version)
• Logs URL: where logs (e.g., training logs) can be found
• Model URL: where model deployment artifacts can be found (again, the exact version)
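Before the spreadsheet lives in a shared tool, it can help to pin down its columns in code. The sketch below models one row per experiment using the fields listed above and writes rows as CSV, which most spreadsheet tools can import. The snake_case column names, the ISO date format, and the helper `append_rows` are illustrative assumptions, not a prescribed schema.

```python
import csv
import io
import sys
from dataclasses import asdict, dataclass, fields


@dataclass
class ExperimentRecord:
    """One spreadsheet row; column names are illustrative assumptions."""
    id: str           # moniker or version number, e.g., "exp-042"
    headline: str     # single-line description
    contacts: str     # directly responsible individual(s)
    start_date: str   # assumed ISO 8601, e.g., "2024-03-01"
    end_date: str     # left empty while the experiment is still active
    hardware: str     # e.g., "1x A100 40GB"
    report_url: str
    code_url: str     # pin to a commit permalink, not a moving branch
    data_url: str     # pin to the exact dataset version
    logs_url: str
    model_url: str    # pin to the exact model artifact version


# Column order follows the dataclass field order.
COLUMNS = [f.name for f in fields(ExperimentRecord)]


def append_rows(stream, records, write_header=False):
    """Write records as CSV rows to any text stream (file, StringIO, stdout)."""
    writer = csv.DictWriter(stream, fieldnames=COLUMNS)
    if write_header:
        writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))


if __name__ == "__main__":
    # Hypothetical example row; every value here is a placeholder.
    row = ExperimentRecord(
        "exp-042", "Try label smoothing 0.1", "alice@example.com",
        "2024-03-01", "", "1x A100 40GB",
        "https://example.com/reports/exp-042",
        "https://example.com/repo/blob/<commit>/train.ipynb",
        "https://example.com/data/v3", "https://example.com/logs/exp-042",
        "https://example.com/models/exp-042/v1",
    )
    append_rows(sys.stdout, [row], write_header=True)
```

Keeping the schema in one dataclass makes it obvious when someone adds a column. It also keeps the row format identical across everyone who appends to the sheet.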
If your experiments target a fairly standard set of experimental factors (e.g., hyperparameters) and performance metrics, you may include them in the spreadsheet for comparison. Limit the spreadsheet to commonly used details and leave sparsely used ones to the detailed experiment reports. Each report should also include, at the very least, the hypothesis to test, the null hypothesis, and, when possible, performance notches indicating baseline, oracle, and human-level performance. It must also describe how to reproduce the experiment. The reproduction steps need to be unambiguous enough that a new hire can run them and reproduce the results (within acceptable variance).
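A lightweight way to enforce the report minimums described above is a small template with a completeness check. The sketch assumes a report is captured as structured fields rather than free text. The class `ExperimentReport`, its field names, and the 1% relative tolerance in `within_tolerance` are hypothetical choices for illustration. The notches stay optional because they are not always available.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ExperimentReport:
    """Hypothetical report skeleton holding the minimum required sections."""
    hypothesis: str
    null_hypothesis: str
    reproduction_steps: List[str] = field(default_factory=list)
    # Performance notches: optional, since they are not always available.
    baseline: Optional[float] = None
    oracle: Optional[float] = None
    human_level: Optional[float] = None

    def missing_sections(self) -> List[str]:
        """Return the names of required sections that are empty."""
        missing = []
        if not self.hypothesis.strip():
            missing.append("hypothesis")
        if not self.null_hypothesis.strip():
            missing.append("null_hypothesis")
        if not self.reproduction_steps:
            missing.append("reproduction_steps")
        return missing

    def missing_notches(self) -> List[str]:
        """Return the performance notches that have not been filled in."""
        return [name for name in ("baseline", "oracle", "human_level")
                if getattr(self, name) is None]


def within_tolerance(reported: float, reproduced: float,
                     rel_tol: float = 0.01) -> bool:
    """Check a rerun against the reported metric; rel_tol is an assumed variance."""
    return math.isclose(reported, reproduced, rel_tol=rel_tol)
```

A reviewer could run `missing_sections()` before accepting a report. `within_tolerance` captures the "within acceptable variance" idea; what counts as acceptable depends on the metric and the team.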
Reproducibility, especially on GPUs, is extremely hard, but that's a post for another time.
#ShippingMachineLearningSystemsBook