🔍 Rethinking Compliance with AI: Meet Jack Shi, the Mind Behind BuildThemis

Week 33 – Aug 18, 2025

When we think about AI in architecture, we often imagine flashy renderings or project scheduling tools. But what about the less glamorous — yet crucial — task of building code compliance?

This week, we’re spotlighting Jack Shi, a President’s Graduate Fellow and PhD researcher at the National University of Singapore. With a background spanning climate forecasting, AI model development, and hands-on work with Singapore’s CORENET X, Jack brings a rare cross-disciplinary lens to one of AEC’s toughest data challenges: translating dense, often ambiguous building regulations into machine-readable logic.

His research project, BuildThemis, explores how large language models (LLMs) can support—not replace—experts in making compliance faster, more transparent, and less manual. From fine-tuning open-source models to navigating the intricacies of legal-technical language, Jack’s work shows how technical curiosity and real-world complexity can drive breakthrough innovation in the built environment.

1. Quick Intro

DataDrivenAEC: How would you describe your research in one sentence?

Jack Shi: Using large language models for automated code compliance of building regulations.

DataDrivenAEC: What first inspired you to work at the intersection of AI and building regulations?

Jack Shi: My path to the intersection of AI and building regulations was an evolution of my interests. I started with a general passion for AI and its potential. My initial focus during my undergraduate was on climate science, where I used AI for complex tasks like weather forecasting. That experience was invaluable for honing my technical skills in machine learning research.

My perspective shifted during an internship with LeapThought. There, I was introduced to Singapore’s CORENET X and the intricate challenge of automating building code compliance. It was fascinating to see a real-world application where rigid, rule-based systems were essential. It happened just as large language models were gaining significant traction, and I wondered whether we could leverage the nuanced, language-based understanding of LLMs to enhance automated code compliance. This question became the foundation for my final-year dissertation project and has now grown into the core focus of my PhD research. I approach the problem from an AI/ML perspective, so I'm fortunate to be collaborating closely with experts in building regulations, whose domain knowledge is crucial to the success of this work.

2. BuildThemis & Impact

DataDrivenAEC: In simple terms, what is BuildThemis and why does it matter?

Jack Shi: In the paper, we view BuildThemis as a code compliance tool, similar to popular AI coding tools such as Cursor or Claude Code, that assists programmers in converting textual regulations into computer-processable rules. It matters because it can reduce the time spent on rule interpretation, which is a manual and time-consuming process.
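
To make "computer-processable rules" concrete, here is a minimal sketch of what such a rule might look like once written as code. It is not taken from BuildThemis: the clause, the 850 mm threshold, and the names `Door` and `check_min_door_width` are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical clause, invented for this example (not from any real code):
# "Every door on an accessible route shall have a clear width of at least 850 mm."
MIN_CLEAR_WIDTH_MM = 850  # assumed threshold


@dataclass
class Door:
    door_id: str
    clear_width_mm: float
    on_accessible_route: bool


def check_min_door_width(doors: List[Door]) -> List[str]:
    """Return the ids of doors that violate the hypothetical clause."""
    return [
        d.door_id
        for d in doors
        if d.on_accessible_route and d.clear_width_mm < MIN_CLEAR_WIDTH_MM
    ]


# Toy building data; in practice this would come from a BIM model.
doors = [
    Door("D1", 900, True),
    Door("D2", 800, True),   # too narrow and on an accessible route
    Door("D3", 700, False),  # narrow, but the clause does not apply
]
print(check_min_door_width(doors))
```

A draft like this is the kind of starting point a programmer could review and correct, rather than writing the rule from nothing.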

DataDrivenAEC: Who do you see benefiting the most from it in real-world use?

Jack Shi: I think the programmers tasked with converting textual regulations into computer-processable formats benefit the most, as they get a draft script to begin with instead of starting from a blank slate.

3. Data & AI

DataDrivenAEC: What makes building regulation data challenging for AI to work with?

Jack Shi: Several characteristics of building regulation data make it relatively hard for AI to handle: 1) it is often highly unstructured; 2) it is non-static, since building regulations are regularly amended and updated; 3) it can be very localized (regulations in Singapore differ from those in other countries), which introduces language and grammatical variations that can hinder standard NLP approaches from generalizing well; and 4) it carries implicit information, where ambiguous terms like ‘sufficient’ are hard for an AI to interpret without contextual understanding.

DataDrivenAEC: Can you share a moment when the AI surprised you — good or bad?

Jack Shi: My experience with surprise in AI is often two-sided. On one hand, I'm consistently impressed by the rapid pace of AI research. When a new model like DINOv3 is released (just a few days ago as of this writing), its capabilities are remarkable.

The 'bad' surprise comes when applying these tools to my work in building regulations. Seeing that even sophisticated models struggle with the domain's dense jargon, intricate cross-references, and implicit knowledge is a clear indicator of the problem's difficulty. So the real surprise for me is this recurring theme: the significant gap between a model's theoretical power and its practical performance in a specialized field. I hope to bridge this gap.

4. Fine-tuning vs Closed Models

DataDrivenAEC: Why did you choose to fine-tune an open-source model instead of using a closed one?

Jack Shi: I believe using open-source models is the way to go for research. I can have full control over the hyperparameter settings and how the data is inputted. Also, there are concerns regarding data privacy if I use closed-source models.

DataDrivenAEC: How did the fine-tuning process go, and what did you learn?

Jack Shi: The fine-tuning process went great. In fact, most of the work was data preparation and cleaning. What I learned is that for a specialized domain like this, fine-tuning is less about tweaking hyperparameters and more about good data curation. We did not do any hyperparameter tuning, since that isn’t the main focus, and we wouldn’t want our approach to work only because the hyperparameters were meticulously tweaked. A smaller, cleaner, and more accurate dataset (with a good system prompt, for example) can be far more effective than a larger, messier one. An interesting observation was the significant performance variation between different base models. I suspect this is due to differences in the quality of their pre-training data.

5. Advice & Future

DataDrivenAEC: What’s one common misconception about AI in compliance checking?

Jack Shi: I think one common misconception is that AI will replace rule experts (e.g., engineers and programmers) in compliance checking. I see it more as an augmentation tool, much like the code assistance tools I mentioned above.

DataDrivenAEC: Where should a newcomer start if they want to work in AI for AEC?

Jack Shi: In my opinion, AI in AEC has not really taken off yet from a practical standpoint, as opposed to a research standpoint. From what I see, AEC is one of the slowest sectors to adopt AI. LLM agents are already being used in other industries, whereas adoption is only just starting in AEC. I would advise a newcomer to focus on the technical aspects of AI first, before applying the work to AEC.

DataDrivenAEC: What’s next for BuildThemis?

Jack Shi: BuildThemis is just the start. We can further enhance the framework by introducing techniques that iteratively refine the code outputs, which is what we are working on at the moment. We would also like to look into other parts of automated code compliance, such as creating knowledge graphs for building representations and ensuring good BIM data for code compliance checking.
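
One generic way to picture "iteratively refining code outputs" is a generate, test, and feed-back loop. The sketch below is an assumption about the general pattern, not a description of the BuildThemis approach: `refine`, `run_checks`, and `fake_generate` are hypothetical names, and `fake_generate` is a hard-coded stand-in for a real model call.

```python
from typing import Callable, List, Optional, Tuple

Case = Tuple[dict, bool]  # (building element, expected compliance result)


def run_checks(script: str, cases: List[Case]) -> Optional[str]:
    """Run a candidate rule script; return an error message, or None if all cases pass."""
    namespace: dict = {}
    try:
        # Illustration only: untrusted generated code should be sandboxed in practice.
        exec(script, namespace)
        rule = namespace["check"]
        for element, expected in cases:
            got = rule(element)
            if got != expected:
                return f"check({element}) returned {got}, expected {expected}"
    except Exception as exc:
        return f"{type(exc).__name__}: {exc}"
    return None


def refine(
    generate: Callable[[str, Optional[str]], str],
    regulation: str,
    cases: List[Case],
    max_rounds: int = 3,
) -> Tuple[Optional[str], int]:
    """Ask for a script, test it, and pass any failure back as feedback."""
    feedback: Optional[str] = None
    for round_no in range(1, max_rounds + 1):
        script = generate(regulation, feedback)
        feedback = run_checks(script, cases)
        if feedback is None:
            return script, round_no
    return None, max_rounds


def fake_generate(regulation: str, feedback: Optional[str]) -> str:
    """Stand-in for an LLM: the first draft has a boundary bug, the retry fixes it."""
    if feedback is None:
        return "def check(e):\n    return e['width'] > 850\n"
    return "def check(e):\n    return e['width'] >= 850\n"


cases = [({"width": 900}, True), ({"width": 850}, True), ({"width": 800}, False)]
script, rounds = refine(fake_generate, "clear width of at least 850 mm", cases)
print(rounds)
```

The point of the loop is that the failure message, not a human, supplies the next prompt's context; an expert would still need to review whatever script comes out.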


Here are this week's top tech stories relevant to AEC professionals:

  1. 600 AEC Pros Spoke – Now Innovators Show the Way Forward
    A webinar discusses how industry innovators are using new data-driven approaches to change project delivery methods.

  2. How AI is Shaping the Future of AEC Design
    AI technology is revolutionizing the architecture, engineering, and construction (AEC) industries by offering unprecedented design capabilities.

  3. Around the Geospatial, 3D, and AEC Industries
    The Esri User Conference highlighted AI and 3D scanning as crucial trends that AEC leaders need to adopt.

  4. Building the Critical Facilities for the UK’s Life Sciences and Technology Ambitions
    Developments focus on sustainable, local sourcing and effective waste management in the UK’s facility expansions.

  5. Digital Transformation of Real Estate Hits Roadblock at AI Data
    Technical challenges with AI data management are slowing the digital transformation in real estate.

  6. A Newly Patented Design Can Cool Data Centers with 50% Less Energy
    Shumate Engineering introduces an energy-efficient cooling system for data centers.

  7. Buildots Expands Autodesk Integration for AI-Powered Construction Management
    The integration aims to streamline construction management by providing AI-driven insights.

  8. Race to Build AI Infrastructure Might be Hampered by Outdated Construction Tech
    As AI demands grow, construction technology needs to evolve to avoid slowdowns.

  9. A Book on Common-Sense Architecture
    This publication explores how people-centered design enhances architectural outcomes.

  10. Smart Parenting: Designing Human-Centered IoT Solutions
    Research applies Human-Centered Design approaches to managing energy use, through IoT, in the homes of parents with newborns.

  11. Integrating Behavioral Science Into Urban Planning
    A framework for integrating behavioral science aims to improve urban spatial design.

  12. The Must-Know PropTech Tools for Real Estate Developers in 2025
    New PropTech tools are transforming how real estate developers operate, leveraging AI and data analytics.
