K-Means vs K-Means++: Smarter Centroids, Better Clusters
K-Means++ is a clever upgrade to K-Means that fixes its biggest flaw: random initialization. Instead of picking all k centroids at random, K-Means++ works as follows:
1. Pick the first centroid uniformly at random from the data points.
2. For each remaining point x, compute D(x), its distance to the nearest centroid chosen so far.
3. Choose the next centroid from the dataset with probability proportional to D(x)^2.
4. Repeat steps 2 and 3 until k centroids are selected.
This spreads centroids out more effectively and leads to: ...
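The seeding procedure described in the K-Means++ excerpt above can be sketched in a few lines of plain Python. This is a minimal illustration, not the post's own code: the names `squared_distance` and `kmeans_pp_init`, the `seed` parameter, and the uniform fallback for all-duplicate data are assumptions made for the example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeans_pp_init(points, k, seed=None):
    """Pick k starting centroids from `points` using K-Means++ seeding.

    `points` is a list of tuples. The function name, signature, and
    `seed` argument are hypothetical, chosen for this sketch.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    rng = random.Random(seed)

    # Step 1: the first centroid is drawn uniformly at random.
    centroids = [rng.choice(points)]

    # d2[i] holds D(x)^2: squared distance from points[i] to its
    # nearest centroid chosen so far.
    d2 = [squared_distance(p, centroids[0]) for p in points]

    while len(centroids) < k:
        if sum(d2) == 0:
            # Assumption: if every point coincides with a chosen
            # centroid, fall back to a uniform draw.
            nxt = rng.choice(points)
        else:
            # Step 3: sample with probability proportional to D(x)^2.
            nxt = rng.choices(points, weights=d2, k=1)[0]
        centroids.append(nxt)
        # Step 2 (incremental): only the new centroid can shrink D(x).
        d2 = [min(old, squared_distance(p, nxt)) for p, old in zip(points, d2)]

    return centroids


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
    print(kmeans_pp_init(data, k=2, seed=42))
```

Because an already chosen point has D(x) = 0, it gets zero weight and is not picked again unless the data contains duplicates. The returned centroids would then be handed to the usual K-Means assign-and-update loop.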
I Built an Open Source AI Powered SaaS Market Intelligence Tool for Marketing Teams. Here's How
So, the idea for this came from a chat I had with Dhruv, the CEO of Middleware. He pointed out that I should think of myself as a ‘builder-marketer,’ not just a marketer, and that I should build stuff that proves it. I thought about it, and he had a point. So I decided to build a project that would be useful for marketing and growth, but also something I could deploy myself. ...
ARIMA vs. SARIMA: A Practical Guide to Choosing the Right Time Series Model
When you’re working with time series data, things naturally point towards forecasting. Two of the most reliable classical tools for this are the ARIMA and SARIMA models. To be honest, I was quite confused when I first learnt about these a year ago, but they make a ton of intuitive sense once you understand them on a deeper level. The names are similar, and so is their underlying logic, but there is a key difference that more or less decides which one you pick in a given scenario. ...
Full Code Walkthrough - Reducing Churn in E-Commerce with Predictive Modelling
If you read part 1 of this series, à la Churn Prediction for E-Commerce with Predictive Modelling, you know I recently wrapped up a full end-to-end churn prediction project as part of my postgrad program. That article was the 30,000-foot view – the business problem, the segmentation insights, the high-level model results. In this one, I simply walk you through the code. But instead of just dumping code snippets for you to copy pasta, I want to walk you through what I actually did and, more importantly, why it matters. ...
A Primer to Framing Business Problems for Machine Learning
A stakeholder comes to your desk. They’re excited. “We need to use AI,” they say, “to improve customer retention.” You nod, open your editor, and you start thinking. Should I use XGBoost? Or maybe a neural network? How will I set up the pipeline? Stop. Right there. This is the single biggest mistake many of us make when we’re starting out: we jump straight to thinking about solutions and algorithms. ...
Why ARIMA and SARIMA Still Matter: A Technical Guide to Time Series Forecasting
Deep Learning Gets the Spotlight, But Time Series Still Solves Real Problems
In the machine learning landscape today, deep learning models - transformers, LSTMs, and other neural networks - steal the show. They’re impressive, powerful, celebrated, and make you feel smart when you use them. However, when it comes to forecasting business metrics like sales, demand, or inventory, deep learning isn’t always the answer. Traditional time series models, especially ARIMA (AutoRegressive Integrated Moving Average) and its seasonal extension SARIMA, are some of the most effective and interpretable methods for forecasting structured temporal data. ...
22 Lessons from 1 year in Data Science and Machine Learning
It’s been a year in data science and machine learning. Okay, I lied. Technically a full year and a few months since I officially splooted (wanted to show off my extensive vocabulary) into the world of data science and machine learning with my master’s program. In late 2023 I started learning data science through a Udemy course and in January of 2024 I gave up. Well, not exactly per se. ...
Reducing Churn in E-Commerce: My End-to-End Capstone Project in Predictive Modeling
Customer churn isn’t just a marketing problem - it’s a business survival issue. In competitive industries like e-commerce, losing one customer often means losing several revenue streams, especially when one account can represent multiple users. This post is a breakdown of my churn prediction capstone project for the postgraduate data science program at UT Austin - also tied to my master’s in data science at Deakin U. The project was closed-source, so I can’t release the full notebook, but I’ll walk you through everything I did including code snippets, results, charts, what I learned, and where this project fits in my larger journey into machine learning and MLOps. ...
A Beginner Explains Pointers in Go: The Thing That Made Me Quit Coding (And Why I'm Back Now)
I learned about pointers 10 years ago… and I hated them. I still hate them. But now at least I can explain them, which, by some strange software industry law, qualifies me for a senior dev role.
Flashback: Engineering School and Existential Crisis
Back in electronics engineering, we were handed C++ for one semester and told, “Here, go build your future.” By Day 2, we were doing pointers - not the helpful kind, but the kind that break your code, your compiler, and your soul. ...
Day 5–8: Fine-Tuning AI Models, Learning MLOps, and Structuring My Year of Projects
It’s been a few days since I posted an update here. Not because I wasn’t learning - in fact, the opposite. I was working through a mix of things, from structuring my year-long learning roadmap to actually fine-tuning large language models for the first time.
Why I Wasn’t Posting Daily
Simple: I needed to zoom out a bit. I took a couple of days to sketch out what I want to build in the next few months - not just random toy apps, but meaningful projects that actually challenge me to grow. ...