AI in Construction: Machine Learning

Why building your data foundation matters more than chasing algorithms, and what you can do today.

Collin Tsui

10/14/2025 · 3 min read

AI dominates today’s business headlines, but most of the coverage focuses on computer-centric industries: Tech, Finance, Software. What about Construction? Over a few blog posts, we’ll explore AI from a Construction perspective.

Hordes of vendors line up to sell you AI-powered solutions. The first question I ask is which type of AI, because AI can mean many things:

  • Machine Learning (ML)

  • Large Language Model (LLM)

  • Agentic AI

This post explores Machine Learning, as part of a series on AI in Construction.

ML uses algorithms to find patterns in large datasets and make predictions or decisions based on those patterns. The system improves its accuracy as it processes more data, learning from examples rather than following explicit, pre-defined business rules.

The Volume Threshold

ML’s biggest drawback is the requirement for massive volumes of historical information to produce reliable results. Like, enormous datasets. Not "a few years of project history" enormous. Think "hundreds of completed projects" enormous.

For companies that straddle the industrial / commercial / residential sectors, or build vastly different projects like hospitals vs parking garages, new build vs renos, that means hundreds of projects for each grouping. Those hundreds of projects only train a small model for simple predictions. More complex projections can require tens of thousands, or even millions of data points.

If you're running a regional contracting business or specialty subcontractor, you likely don't have anything approaching that volume. If you do, data quality becomes the next consideration. Did each project capture data in the same way? That means breaking costs down by the same categories, accounts, and cost codes, or having the same rules of credit for progress. Most construction companies have inconsistently captured data. In ML, that’s called a data quality problem.

The organizations that can realistically deploy any ML in Construction are supermajor owners, large international EPCs, and academic institutions or consultants aggregating data from numerous sources.

I worked with a major EPC that waded into ML. They had the quantity of data, but lacked quality. In their quest to trailblaze for the industry, they embarked on a data cleaning project that expended thousands of overhead hours and took more than a year to rectify decades-old data.

What Works with Smaller Datasets

I had the opportunity to peer behind the curtain of a leading ML vendor in the late 2010s. It was immediately obvious to me that their Construction ML solution could be replicated with some basic statistical analysis (it had to be basic for me to recognize it). There’s no need for the slick user interface and very expensive consultants.

The good news is that the statistical approach requires far less data, and is already in use in Construction:

  • Performance Benchmarking can validate estimates by comparing them against historical data broken into quartiles (four groups of 25% each, from the best to worst performers), and works with under a dozen projects.

  • Monte Carlo analysis simulates thousands of scenarios from a single data set, giving the probability of achieving a given result (e.g. P80, aka 80% chance of completing a project within contingency).

  • Time on Tools studies gather their datasets in short, intense bursts, and can produce valuable recommendations with as few as 500 to 700 data points.

  • Variance Analysis identifies schedule risks by highlighting patterns across your past 10+ projects.

These approaches only need small datasets, and tools like Excel (for one-time analysis) or Power BI (for repeat analytics, like dashboards), which you likely already have.
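If you prefer a script to a spreadsheet, the first two approaches fit in a few lines of Python. This is a rough sketch under stated assumptions, not a production tool: the unit costs and three-point estimates are made up, the function names are hypothetical, and the triangular distribution is just one common choice for three-point estimates.

```python
import random
import statistics


def quartile_of(value, history):
    """Return 1-4: the quartile of `history` that `value` falls into.

    1 is the lowest 25% of values (for costs, the best performers).
    """
    q1, q2, q3 = statistics.quantiles(history, n=4)
    if value <= q1:
        return 1
    if value <= q2:
        return 2
    if value <= q3:
        return 3
    return 4


def simulate_p_value(cost_ranges, percentile=80, runs=10_000, seed=1):
    """Monte Carlo total cost at the given percentile (e.g. 80 for P80).

    cost_ranges: list of (low, most_likely, high) per cost item.
    Each run draws every item from a triangular distribution and sums them.
    """
    rng = random.Random(seed)  # fixed seed so reruns give the same answer
    totals = sorted(
        sum(rng.triangular(low, high, likely) for low, likely, high in cost_ranges)
        for _ in range(runs)
    )
    index = max(0, int(runs * percentile / 100) - 1)
    return totals[index]


# Hypothetical unit costs ($/m2) from eight completed projects.
past_unit_costs = [410, 385, 450, 520, 395, 480, 430, 465]
print("New estimate sits in quartile", quartile_of(500, past_unit_costs))

# Hypothetical three-point estimates ($k) for three cost items.
items = [(90, 100, 130), (40, 50, 75), (20, 25, 30)]
print("P80 total cost ($k):", round(simulate_p_value(items)))
```

An estimate that lands in the top or bottom quartile is a prompt to ask why, not proof of an error. The P80 figure is the total that 80% of the simulated runs came in at or under.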

Your Strategic Focus

Before investing in ML capabilities, ensure you have:

  • Consistent data capture: Same metrics, same definitions, same timing across all projects. If your estimating team codes costs differently than your project managers, you don't have usable data—you have incompatible spreadsheets.

  • Centralized storage: Data scattered across project folders, superintendent notebooks, and individual spreadsheets can't be analyzed. Get it into a structured database or data warehouse. At the very least, get the data onto a computer and backed up, so it won’t be lost to time.

  • Basic analytics working: If you can't easily answer "What was our actual labor productivity on the last five concrete pours?" you're not ready for ML. Get those basic reports functioning first.

  • Proven decision-making process: Data only matters if someone acts on it. Build the habit of data-informed decisions with simpler analytics before adding ML complexity.
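Two of the checks above are easy to sketch in Python: whether projects share the same cost codes, and the concrete pour question. The record layout, field names, and sample values below are assumptions made for illustration; your cost system will export something different.

```python
from collections import defaultdict


def codes_missing_by_project(records):
    """Map each project to the cost codes used elsewhere but absent from it."""
    codes_by_project = defaultdict(set)
    for rec in records:
        codes_by_project[rec["project"]].add(rec["cost_code"])
    all_codes = set().union(*codes_by_project.values())
    return {
        project: sorted(all_codes - codes)
        for project, codes in codes_by_project.items()
        if all_codes - codes
    }


def recent_productivity(pours, last_n=5):
    """Labor hours per m3 across the most recent pours (ISO dates sort correctly)."""
    recent = sorted(pours, key=lambda p: p["date"])[-last_n:]
    hours = sum(p["labor_hours"] for p in recent)
    volume = sum(p["volume_m3"] for p in recent)
    return hours / volume


# Hypothetical cost records: project B coded concrete differently than project A.
records = [
    {"project": "A", "cost_code": "03-300"},
    {"project": "A", "cost_code": "05-100"},
    {"project": "B", "cost_code": "03-300"},
    {"project": "B", "cost_code": "CONC"},
]
print(codes_missing_by_project(records))

# Hypothetical pour log: date, labor hours, and volume placed.
pours = [
    {"date": "2025-01-10", "labor_hours": 100, "volume_m3": 50},
    {"date": "2025-02-14", "labor_hours": 120, "volume_m3": 60},
    {"date": "2025-03-03", "labor_hours": 110, "volume_m3": 50},
    {"date": "2025-04-22", "labor_hours": 130, "volume_m3": 65},
    {"date": "2025-05-09", "labor_hours": 90, "volume_m3": 45},
    {"date": "2025-06-18", "labor_hours": 150, "volume_m3": 80},
]
print("Labor hours per m3, last five pours:", recent_productivity(pours))
```

If the first function returns anything at all, the projects are not coded the same way, and the mismatch needs to be resolved before any comparison across them means much.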

The Real Competitive Advantage

ML might give you a marginal edge eventually. But right now, your competitors likely aren't using their existing data effectively either.

The contractor who systematically captures project data and analyzes it (even if it’s “just” basic statistics on a project dashboard) beats the competitor running fancy ML algorithms on bad data.

Build that foundation now. It delivers immediate value through better BI. And if ML becomes genuinely useful for your scale later, you'll have the prerequisite data infrastructure already in place.

Ready to extract real value from your project data?

Is your team capturing data but not using it to drive decisions? Does the analysis use up far too many effort hours every month? Let's talk!

I build custom Power BI solutions that transform scattered construction data into reliable metrics to keep projects on time and on budget. Contact me today to see how to make your data clear.