House Stats: Real-Time UK Housing Market Analysis

For my Computer Science NEA, I set out to tackle a problem that frustrated me: despite the UK housing market being one of the most-analysed sectors in the economy, the public data is practically unusable for anyone who isn’t a data scientist or an investment firm. The Land Registry releases monthly CSVs with millions of transactions—but they’re difficult to visualise, painfully slow to process, and impossible to explore interactively.

So I built House Stats, a full-stack web platform for analysing UK house prices in real time.

Writeup    GitHub


System Overview

System outline

1. Data Ingest

  • A Golang + Python pipeline downloads the latest Price Paid Data from the Land Registry every month.
  • It validates, parses, and uploads millions of transactions to a PostgreSQL database using a Kafka message queue and asynchronous workers.
  • Updates are applied automatically (insert, amend, delete) to keep the dataset current with minimal downtime.
  • Parses postcodes into it’s constituent parts for grouping and searching

Postcode Breakdown

2. Data Storage and Caching

  • Data is normalised into third normal form to eliminate duplication and improve query efficiency.
  • PostgreSQL handles relational integrity while Redis provides an in-memory cache for frequent queries.
  • Full-text indexes and partial postcode searching enable sub-second lookups across millions of records.

Database Schema

3. Data Processing

  • Analytical workloads (standard deviation, outlier removal, and percentage change analysis) are executed using PolaRS a Rust drop-in for Pandas
  • The system automatically scales to use multiple workers, allowing for batch processing of many areas simultaneously
  • Results are cached in a MongoDB instance to allow for instant fetching of large areas that can take a few minutes to process Quick Stats

4. Web Interface

  • A Flask API serves data to a Svelte + Chart.js front-end for highly interactive visualisations.
  • Users can search by town, postcode, or county; view price trends over time; and compare regions side-by-side.
  • Uses write ahead search capability to help users find the area they are looking for
  • Change time period interval to look at 1mo, 3mo, 6mo ,1yr

Design Philosophy

I wanted House Stats to make large-scale data accessible to everyone—from first-time buyers to professional surveyors.

  • Speed: Sub-second queries are crucial for a seamless UX.
  • Clarity: Data is visualised, not buried in spreadsheets.
  • Openness: Unlike commercial tools from Rightmove or Zoopla, House Stats is built for the public.

Tech Stack Highlights

Layer Technology Purpose
Backend Flask, PostgreSQL, Redis, Dask Data ingestion, processing & caching
Frontend Svelte, Chart.js Dynamic UI and visualisations
Data Pipeline Python, Golang Parsing, validation, batch uploading
Infrastructure Docker, Proxmox, Linux (Ubuntu) Deployment & virtualisation

Performance and Testing

  • Inserted 26 million+ transactions at ~24 k inserts/sec.
  • Results fetched in milliseconds, even across millions of rows.
  • Continuous integration with GitHub Actions ensures reliability.
  • Sentry handles live error logging and telemetry.

Why It Matters

Large real-estate companies already exploit these datasets with proprietary software—but individuals don’t have that luxury. House Stats bridges that gap, offering open access to national-scale housing analytics. It’s a blend of distributed computing, web development, and data science, packaged into a single cohesive system.


Final Thoughts

What started as an A-level project ended up teaching me full-stack architecture, deployment, and large-scale data engineering. It was the project that led gave me an interest in large scale data analysis and building big systems.

Read the writeup I submitted and got 67/70 for Writeup
Also here is all the code from it GitHub