Mohamed El-Geish

Mohamed El-Geish

Users
Tweets

Mohamed El-Geish

@elgeish

29 Oct 2025

To mitigate complex debugging of ML systems: Isolate phases of a model’s behavior as early as possible, and assert preconditions, postconditions, and invariants. Before weights are repeatedly updated, it's easier to debug issues such as initialization bugs: Were the weights initialized as expected? Since the model has not learned anything useful yet, are output values (e.g., classes) almost uniform for the first few training steps (given the assumptions in model architecture)? Instead of waiting for an entire epoch to smoke out surprises, curate a smoke-test dataset of extreme examples (e.g., the longest sequences) to exercise the training code and systems; for example, making sure these extreme batches fit in memory. Unless proven before, make sure your model can learn from the data by overfitting first; if it can't overfit, it can't generalize. For more ML practices to ship production systems, pre-order #ShippingMachineLearningSystemsBook shippingmlsystems.com

635

Mohamed El-Geish

Mohamed El-Geish

@elgeish

30 Jul 2025

Dealing with messy #data is common practice in the real world. Data can go missing for many reasons, such as human errors or data that were simply unavailable at the time; the more reasons why feature stores are critical (to monitor the health of features). Data can get corrupted in motion or at rest; even cosmic rays flip bits and wreak havoc! You may need to impute missing values — depending on how they got missing — or consider removing records or features with too many missing values (given that you have sufficiently useful data remaining afterward). Consider predicting house prices given features such as the number of rooms and square footage: simple heuristics can help impute missing values, within reasonable bounds, in either feature given the other. Alternatively, you may substitute missing values with a representative value (e.g., the median) or use a probabilistic model — only privy to observations in the train split — to predict the missing values. More importantly, you may need to consider the bias of the missing data, such as if a certain subgroup of the population is being underrepresented due to the missing data. You also need to understand why the data is missing in the first place and if this is a systematic issue that needs to be addressed. There are three types of missing data: Missing Completely at Random (MCAR), Missing at Random (MAR), and Missing Not at Random (MNAR). MCAR is when the probability of any value being missing is completely unrelated to anything else (observed or unobserved measurement). MAR is when missingness is related to another observed measurement but unrelated to the specific value of the data point per se. MAR and MNAR situations require careful consideration to mitigate systematic bias when imputing missing data. #ShippingMachineLearningSystemsBook

267

Mohamed El-Geish

Mohamed El-Geish

@elgeish

26 Jul 2025

Exploratory Data Analysis (EDA) A solid understanding of the data with which you work will help identify the most relevant features/representation and issues to handle. For example, by plotting the data with a histogram, you may identify potential outliers and assess data distributions, which can then be used to decide whether to use normalization or standardization to scale the data. You can also use a correlation matrix to determine correlations among features and identify redundant or irrelevant features; the same technique can also help to find relevant ones that correlate well with target labels. Box plots are a great tool to visualize outliers and skewness in data distributions across various features, among other statistics (e.g., the median and the interquartile range). #ShippingMachineLearningSystemsBook

818

Mohamed El-Geish

Mohamed El-Geish

@elgeish

25 Jul 2025

When developing #ML models, the quality and design choices of representation (how to represent data to the model) make a world of difference! Representation choices are influenced by domain knowledge and prior assumptions about the problem at hand. They need to be apt for the ML task and effectively capture relevant information conducive to success (as defined by training objectives and evaluation metrics). For example, which properties of a cake matter most when predicting appropriate packaging? Its dimensions and weight may suffice. Now imagine representing a cake to an ML system that predicts its ingredients? A chocolate cake and a carrot cake of the exact dimensions and weight have congruent representations when it comes to the first problem; however, the same cakes are different in the eyes of the latter. #ShippingMachineLearningSystemsBook

267

Mohamed El-Geish

Mohamed El-Geish

@elgeish

24 Jul 2025

If you're just starting to track ML experiments, a well-maintained, easily discoverable spreadsheet suffices for a small team that runs a few experiments monthly. Like most mechanisms, in a few months, it will look nothing like how it started (and you'll probably move to Weights & Biases or AimStack). At the very least, track the following details: • ID: a moniker or a version number to identify the experiment • Headline: a single-line description of the experiment • Contacts: directly responsible individual(s) to contact if needed • Start and End Dates: to track the period during which the experiment is active • Hardware: specific hardware configuration used to run the experiment • Report URL: where the report and associated writeups can be found • Code URL: where code lives (e.g., a permalink to a notebook in Git) • Data URL: where relevant datasets can be found (the exact version) • Logs URL: where logs (e.g., training logs) can be found • Model URL: where model deployment artifacts can be found (again, the exact version) Assuming your experiments target a somewhat standard set of experimental factors (e.g., hyperparameters) and performance metrics, you may include them in the spreadsheet for comparison. Limit the spreadsheet to commonly used details, leaving sparsely used ones to the detailed experiment reports. Each report should also include — at the very least — the hypothesis to test, the null hypothesis, and performance notches (when possible) indicating baseline, oracle, and human-level performances. It also needs to describe how to reproduce the experiment. Reproduction steps need to be unambiguous to the extent that a new hire can run them and reproduce the results (within acceptable variances). Reproducibility, especially on GPUs, is extremely hard, but that's a post for another time. #ShippingMachineLearningSystemsBook

281

Mohamed El-Geish

Mohamed El-Geish

@elgeish

13 May 2025

في فرق تطوير #ML، النجاح يتطلب تعزيز وتطوير "النماذج الذهنية" (Mental Models) لدى أفراد الفريق. أفضل الفِرَق التي تخصص جزءاً من وقتها لتبادل هذه النماذج الذهنية، واختبارها، وتقويتها بشكل منهجي. قد تأخذ هذه الممارسة شكل جلسات نقاش أسبوعية، أو تتطلب مجموعات قراءة منتظمة للبحوث، لتعزيز التفكير النقدي وتشجيع الفضول العلمي. هذا ما نراه في شركات كبرى مثل أمازون التي تعقد مؤتمرات داخلية لفرقها، أو شركات مثل Hugging Face و Weights & Biases التي تتيح مجموعات القراءة الخاصة بها علناً للجميع. المعادلة بسيطة: 70% من التعلم بالممارسة. 20% بالتفاعل مع الزملاء والتواصل معهم. 10% من المصادر الرسمية كالكتب والأبحاث العلمية. كمطور في مجال ML، من الرائع العمل في بيئة كهذه، تجمع بين الممارسة، والتفاعل، والتعليم المستمر. مقتطف من الفصل الثالث من كتابي القادم، نناقش فيه تحديات تطوير النماذج. #سلسلة_من_كتاب_جديد #ShippingMachineLearningSystemsBook

946