Before you analyze anything, you have to get the data in, and it never arrives in the format you'd choose.
New article on importing data in Python: CSV and Excel, pickle, SAS, HDF5, MATLAB, SQL databases, plus pulling data from the web with requests, BeautifulSoup, and JSON APIs.
#Python #DataScience #pandas
datalad.co.uk/importing-data…
Cookie loss, ITP, and consent refusals quietly undercount your conversions. Google's bidding algorithms then make worse decisions on incomplete data.
New article on #enhancedconversions in GTM: how hashed first-party data recovers lost conversions, fed cleanly through the dataLayer and gated behind consent.
#GoogleAds #GTM #GA4
datalad.co.uk/enhanced-conve…
Most teaching datasets are too clean, so the hardest part of the job never gets practised.
I wrote up how I built a simulated churn dataset: planted duplicates, three kinds of missing data, dirty country labels, and a leakage trap that fakes a 0.90 AUC.
You can download it for free here and practise on it.
#DataScience #MachineLearning #Python
datalad.co.uk/inside-the-bri…
Iframe checkout? Your purchase events are landing in a sealed room your GTM can't see into.
New article on fixing it with window.postMessage: send events from the iframe, validate the origin on the parent, and fire clean GA4 ecommerce events.
#GA4 #GoogleTagManager #Analytics
datalad.co.uk/iframe-trackin…
Most analytics can tell you a sale happened. Enhanced ecommerce tells you the story behind it: what got viewed, clicked, added, abandoned, and finally bought.
New guide on implementing GA4 ecommerce tracking in GTM: the dataLayer contract, standard events, and testing the full funnel.
#GA4 #GoogleTagManager #Ecommerce
datalad.co.uk/gtm-enhanced-e…
If you're learning data science and want a project that goes beyond "fit a model on clean data," I built a full churn prediction code-along: deliberately messy dataset, a hidden leakage trap, three missingness mechanisms, and a logistic regression that beats a random forest.
Everything is explained line by line, and the notebook plus data are free to download. The fun part: the "obvious" best feature is a trap, and spotting why is half the lesson.
Happy to answer questions if anyone works through it.
datalad.co.uk/churn-a-comple…
"Can you add our tags to your site?" doesn't have to be a risk conversation.
New article on GTM Zones: link partner containers, scope them with URL conditions, whitelist exactly what can fire, and audit the rest. Tag governance done properly.
#GTM #GoogleTagManager #MarTech
datalad.co.uk/gtm-zones-mana…
A decade on, XGBoost is still the king of tabular data.
New practical guide: fit and predict, DMatrix, cross-validation with early stopping, hyperparameter tuning, and building sklearn pipelines that don't leak.
#XGBoost #MachineLearning #Python
datalad.co.uk/xgboost-a-prac…
A class is a cookie cutter. Instances are the cookies. Once that clicks, Python OOP stops being intimidating.
New article covering classes, self, __init__, inheritance with super(), dunder methods, and custom exceptions that fail fast.
#Python #OOP #100DaysOfCode
datalad.co.uk/object-oriente…
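A minimal sketch of the cookie-cutter idea, not taken from the article itself. The names Cookie, IcedCookie, and InvalidRadiusError are hypothetical and chosen only to illustrate __init__, super(), a dunder method, and a custom exception that fails fast.

```python
class InvalidRadiusError(ValueError):
    """Hypothetical custom exception raised as soon as bad input is seen."""


class Cookie:
    """The class is the cutter: it describes what every cookie looks like."""

    def __init__(self, flavour, radius_cm):
        # Fail fast: reject bad data at construction time.
        if radius_cm <= 0:
            raise InvalidRadiusError(f"radius must be positive, got {radius_cm}")
        self.flavour = flavour
        self.radius_cm = radius_cm

    def __repr__(self):
        # Dunder method that controls how an instance is displayed.
        name = type(self).__name__
        return f"{name}(flavour={self.flavour!r}, radius_cm={self.radius_cm})"


class IcedCookie(Cookie):
    """A subclass reuses the parent's setup through super()."""

    def __init__(self, flavour, radius_cm, icing):
        super().__init__(flavour, radius_cm)
        self.icing = icing


# Each call stamps out a separate instance (a cookie) from the same class.
oat = Cookie("oat", 4)
ginger = Cookie("ginger", 5)
iced = IcedCookie("lemon", 3, "white")
```

Here oat and ginger are two independent objects built from one definition, and IcedCookie("lemon", 0, "white") raises InvalidRadiusError before a broken object can exist.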
The difference between [] and () in Python can be the difference between a script that streams 100 GB on a laptop and one that crashes.
New article on iterators, comprehensions, and generators: enumerate, zip, yield, and reading files too big for memory in chunks.
#Python #DataScience
datalad.co.uk/python-iterato…
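A small sketch of the [] versus () difference, under the assumption that a million integers stand in for a large file. The helper read_in_chunks is a hypothetical name for illustration, not a function from the article or the standard library.

```python
import sys


def read_in_chunks(lines, chunk_size):
    """Yield successive lists of at most chunk_size items from any iterable."""
    chunk = []
    for line in lines:
        chunk.append(line)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:  # emit the final, possibly shorter, chunk
        yield chunk


n = 1_000_000
squares_list = [i * i for i in range(n)]  # [] builds every value up front
squares_gen = (i * i for i in range(n))   # () produces values one at a time

# Size of the container objects themselves, in bytes.
list_bytes = sys.getsizeof(squares_list)
gen_bytes = sys.getsizeof(squares_gen)

# A generator can still be consumed by anything that iterates, such as sum().
total_from_gen = sum(squares_gen)

# Works the same way on an open file object, which is also an iterable of lines.
chunks = list(read_in_chunks(range(10), 4))
```

The generator object stays tiny no matter how large n gets, because it only holds the state needed to compute the next value. The trade-off is that it can be consumed once only.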
Most tutorials hand you clean data. This one doesn't.
A complete churn analysis in one notebook: messy labels, three kinds of missing data, a leakage trap that fakes 0.90 AUC, and a twist: logistic regression beats the random forest.
#DataScience #Python #MachineLearning
Free notebook and dataset:
datalad.co.uk/churn-a-comple…
Run a large language model on your own laptop. No API keys, no per-token costs, full data privacy.
New article on Llama 3 with llama-cpp-python: decoding parameters, prompt engineering, guaranteed-valid JSON output, and building a chatbot that remembers the conversation.
#Llama3 #LLM #Python
datalad.co.uk/working-with-l…
Text, images, audio, and video in one workflow.
New article on multi-modal models with Hugging Face: zero-shot classification with CLIP, voice conversion, ControlNet image editing, video generation, and scoring it all with CLIP score.
#HuggingFace #AI #MachineLearning
datalad.co.uk/multi-modal-mo…
State-of-the-art language models in 3 lines of Python.
New article covers the pipeline API, fine-tuning with the Trainer, and every evaluation metric you need: BLEU, ROUGE, perplexity, exact match, toxicity, and more.
#LLM #AI #Python
datalad.co.uk/introduction-t…
#HuggingFace puts state-of-the-art #AI into 3 lines of #Python.
New article: run text classification, zero-shot labeling, summarization, and document QA using pipeline() and the transformers library.
datalad.co.uk/working-with-h…
A #pointestimate tells you where the parameter likely is. A #confidenceinterval tells you how much to trust that answer. Full coverage of single means, proportions, two-sample comparisons, paired samples, and sample size planning.
datalad.co.uk/point-and-inte…
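A rough sketch of the point estimate versus interval idea for a single mean, using only the standard library. It assumes a sample large enough for the normal approximation; small samples usually call for a t-distribution instead. The function name mean_confidence_interval and the sample values are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(sample, confidence=0.95):
    """Return (point_estimate, (low, high)) using a normal approximation."""
    n = len(sample)
    point_estimate = mean(sample)
    standard_error = stdev(sample) / sqrt(n)
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * standard_error
    return point_estimate, (point_estimate - margin, point_estimate + margin)


# Made-up measurements, deterministic so the example is repeatable.
sample = [5 + 0.1 * ((i * 7) % 11 - 5) for i in range(40)]

estimate, (low, high) = mean_confidence_interval(sample)
_, (low99, high99) = mean_confidence_interval(sample, confidence=0.99)
```

The point estimate is the same in both calls. Only the interval changes, and asking for more confidence widens it.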
The #normaldistribution is not just a bell curve — it is the foundation of how #statistics reasons about populations, samples, and uncertainty. From expected values to the central limit theorem, a full walkthrough of the concepts that underpin statistical inference.
datalad.co.uk/the-normal-dis…
#Surveys ask. Observation watches. Knowing when to use which, and which survey format fits your #research context, is what separates a well-designed study from an expensive guess.
Full breakdown of methods, criteria, and trade-offs:
datalad.co.uk/survey-and-qua…