Predictive SQL: Running Machine Learning Models Directly in Your Database — The Complete In-Database ML Guide for BigQuery, Snowflake and Redshift
Your ML training data is already in your database. Your predictions belong there too. This guide shows you how to build, evaluate, and deploy machine learning models without ever leaving SQL.
The traditional machine learning workflow has a friction problem. Data lives in your warehouse. You export it to a notebook. You engineer features in Pandas. You train a model in scikit-learn. You write a deployment pipeline to get predictions back into the warehouse. For every model iteration you repeat this cycle — and each iteration takes hours of data engineering work that has nothing to do with the actual machine learning.
In-database ML removes most of this friction. BigQuery ML, Snowflake Cortex ML, and Amazon Redshift ML let you train machine learning models, evaluate their performance, and generate batch predictions using SQL, directly where your data already lives. No exports. No notebooks. No deployment pipelines. In BigQuery ML, for example, the whole loop is one CREATE MODEL statement, one ML.EVALUATE query, and one ML.PREDICT query.
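As a rough illustration of that train, evaluate, predict loop, here is a minimal BigQuery ML sketch. The dataset, table, and column names (`mydataset`, `customer_features`, `current_customers`, `churned`, and the feature columns) are hypothetical placeholders, not taken from the guide, and the date cutoff is just one possible way to hold out evaluation rows.

```sql
-- All dataset, table, and column names below are hypothetical placeholders.

-- 1. Train a logistic regression churn classifier on historical rows.
CREATE OR REPLACE MODEL `mydataset.churn_model`
OPTIONS (
  model_type = 'LOGISTIC_REG',
  input_label_cols = ['churned']
) AS
SELECT
  tenure_months,
  monthly_spend,
  support_tickets_90d,
  churned
FROM `mydataset.customer_features`
WHERE snapshot_date < DATE '2025-01-01';

-- 2. Evaluate on rows the model did not see during training.
SELECT *
FROM ML.EVALUATE(
  MODEL `mydataset.churn_model`,
  (
    SELECT tenure_months, monthly_spend, support_tickets_90d, churned
    FROM `mydataset.customer_features`
    WHERE snapshot_date >= DATE '2025-01-01'
  )
);

-- 3. Score current customers. Extra input columns such as customer_id
--    are passed through to the output alongside the predictions.
SELECT
  customer_id,
  predicted_churned,
  predicted_churned_probs
FROM ML.PREDICT(
  MODEL `mydataset.churn_model`,
  (
    SELECT customer_id, tenure_months, monthly_spend, support_tickets_90d
    FROM `mydataset.current_customers`
  )
);
```

For a classifier, ML.PREDICT names its output columns after the label, which is why the label `churned` yields `predicted_churned` and `predicted_churned_probs`.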
Predictive SQL: Running Machine Learning Models Directly in Your Database is the complete in-database machine learning guide for data scientists and advanced analysts who want to build production-quality ML models using SQL — covering classification, regression, clustering, recommendations, time series forecasting, feature engineering, hyperparameter tuning, and model evaluation across BigQuery ML, Snowflake Cortex ML, and Amazon Redshift ML.
Every model in this guide is built with working SQL you can run today. Every evaluation metric is explained in plain language. Every chapter includes real business use cases with actual predicted outputs.
What's Inside:
✅ Introduction — Why in-database ML changes the workflow for data scientists — a nine-row comparison table showing traditional Python ML versus in-database SQL ML across every workflow step, demonstrating 80 to 90 percent time savings per model iteration, plus a clear guide to which use cases belong in SQL ML and which still require Python
✅ Chapter 1 — BigQuery ML tutorial — the complete four-statement BigQuery ML workflow explained with purpose and example for each statement, a complete 13-model reference table showing every available model_type with use case and label type, and a full end-to-end customer churn prediction model from data preparation through CREATE MODEL, ML.EVALUATE, and ML.PREDICT with color-coded output showing churn probability and risk tier per customer
✅ Chapter 2 — Linear regression in SQL — 12-month customer lifetime value prediction model built entirely in SQL, feature weight analysis showing each dollar of first-90-day revenue predicts $2.84 in annual LTV, boosted tree regression comparison achieving 40 percent lower error than linear regression with R² of 0.89, and model evaluation result box interpreting every metric in business terms
✅ Chapter 3 — Classification models in SQL — fraud detection binary classifier with auto_class_weights for severe class imbalance achieving ROC AUC of 0.9812 and 92.3 percent recall, confusion matrix and ROC curve analysis for threshold optimization, and multi-class next product category prediction using random forest classifier with probability scores per category
✅ Chapter 4 — Machine learning using only SQL — k-means customer segmentation producing five named segments (Champions, Loyal, At-Risk, Lost, Big Spenders) with revenue and behavioral profiles, matrix factorization product recommendation engine generating top-5 recommendations per customer using implicit purchase feedback, and pure SQL IQR anomaly detection for daily revenue monitoring without any external library
✅ Chapter 5 — Forecasting sales in a SQL database — ARIMA_PLUS time series model with automatic seasonality detection, US holiday effects that capture a Black Friday lift of $187,234 as well as Christmas effects, 90-day forecast with 80 percent confidence intervals, multi-product forecasting that trains 100 separate SKU models in a single CREATE MODEL statement, and ML.ARIMA_EVALUATE for model comparison with MAPE benchmark guidance
✅ Chapter 6 — Feature engineering in SQL — complete temporal feature extraction including cyclical sin and cos encoding that prevents models from treating December and January as distant months, lag features at 1, 7, 14, 28, and 365-day intervals, rolling mean and standard deviation windows, exponentially weighted means, and entity embedding patterns for high-cardinality features like product_id and customer_id
✅ Chapter 7 — Model evaluation and hyperparameter tuning in SQL — a complete 10-metric reference table covering R², MAE, RMSE, MAPE, ROC AUC, precision, recall, F1, log loss, and MASE with good values and when to use each, automated hyperparameter search with BigQuery ML HPARAM_TUNING achieving 53 percent MAE improvement over untuned baseline, and SQL-based model A/B comparison across three model versions on the same holdout set
✅ Chapter 8 — Snowflake Cortex ML and Redshift ML — Snowflake Cortex FORECAST and DETECT_ANOMALIES functions requiring no separate training infrastructure, Cortex LLM functions for sentiment analysis, classification, and summarization of customer review text directly in SQL, Redshift ML training through SageMaker Autopilot from a CREATE MODEL SQL statement, and a complete 11-row platform comparison table across BigQuery ML, Snowflake Cortex, and Redshift ML
✅ Bonus — Complete SQL ML Reference — all 15 BigQuery ML functions in one copy-paste reference block from CREATE MODEL through DROP MODEL, a production automated daily batch scoring pipeline inserting churn predictions to a results table on a schedule, and a 10-row use case decision guide mapping business questions to ML approach and BigQuery ML model_type
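To give a flavor of the feature engineering covered in Chapter 6, the cyclical month encoding can be sketched in BigQuery SQL roughly as follows. The table `mydataset.daily_sales` and its `order_date` column are hypothetical, and this is one common way to write the encoding rather than necessarily the exact form used in the guide.

```sql
-- Hypothetical table `mydataset.daily_sales` with a DATE column order_date.
SELECT
  order_date,
  EXTRACT(MONTH FROM order_date) AS month_num,
  -- Place months 1..12 on a circle: angle = 2 * pi * (month - 1) / 12.
  -- ACOS(-1) evaluates to pi.
  SIN(2 * ACOS(-1) * (EXTRACT(MONTH FROM order_date) - 1) / 12) AS month_sin,
  COS(2 * ACOS(-1) * (EXTRACT(MONTH FROM order_date) - 1) / 12) AS month_cos
FROM `mydataset.daily_sales`;
```

With this mapping, January sits at 0 degrees and December at 330 degrees, so the two are 30 degrees apart on the circle, the same spacing as any other pair of neighboring months. A raw `month_num` feature would instead place them 11 units apart.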
This guide is perfect for:
- Data scientists who spend more time on data plumbing than on actual modeling and want to eliminate the export-notebook-pipeline cycle
- Advanced SQL analysts who want to build real machine learning models without learning Python or scikit-learn
- Analytics engineers building dbt models who want to add ML predictions to their transformation pipelines
- Data engineers who manage cloud data warehouses and want to offer ML capabilities to their analytics teams without building separate infrastructure
- Business analysts who understand SQL deeply and want to move into predictive analytics and ML
- Anyone whose data already lives in BigQuery, Snowflake, or Redshift and who wants predictions from that data with minimum friction
Your ML training data is in the warehouse. Your predictions belong there too.
The most valuable machine learning models your organization can build are sitting one CREATE MODEL statement away from your existing SQL skills. Customer churn models. Revenue forecasting models. Fraud detection models. Product recommendation engines. Customer segmentation. All of them — built, evaluated, and deployed in SQL — without a single Python import statement.
Predict Insights. Drive Decisions. All in SQL.
Instant digital download. Run your first ML model in SQL today.
Note: Primary examples use BigQuery ML on Google Cloud. Snowflake Cortex ML and Amazon Redshift ML examples require active accounts on those platforms. Under BigQuery on-demand pricing, BigQuery ML charges are based on data processed: evaluation and prediction queries are billed at the standard query rate, while CREATE MODEL training is billed at a separate, higher rate.
© 2026 Lexi Grace Products | LexiGraceProducts@gmail.com | payhip.com/LexiGrace