House Stats: Real-Time UK Housing Market Analysis

For my Computer Science NEA, I set out to tackle a problem that frustrated me: despite the UK housing market being one of the most-analysed sectors in the economy, the public data is practically unusable for anyone who isn’t a data scientist or an investment firm. The Land Registry releases monthly CSVs with millions of transactions—but they’re difficult to visualise, painfully slow to process, and impossible to explore interactively.

So I built House Stats, a full-stack web platform for analysing UK house prices in real time.

Writeup GitHub

System Overview

System outline

1. Data Ingest

A Golang + Python pipeline downloads the latest Price Paid Data from the Land Registry every month.
It validates, parses, and uploads millions of transactions to a PostgreSQL database using a Kafka message queue and asynchronous workers.
Updates are applied automatically (insert, amend, delete) to keep the dataset current with minimal downtime.
Parses postcodes into it’s constituent parts for grouping and searching

Postcode Breakdown

2. Data Storage and Caching

Data is normalised into third normal form to eliminate duplication and improve query efficiency.
PostgreSQL handles relational integrity while Redis provides an in-memory cache for frequent queries.
Full-text indexes and partial postcode searching enable sub-second lookups across millions of records.

Database Schema

3. Data Processing

Analytical workloads (standard deviation, outlier removal, and percentage change analysis) are executed using PolaRS a Rust drop-in for Pandas
The system automatically scales to use multiple workers, allowing for batch processing of many areas simultaneously
Results are cached in a MongoDB instance to allow for instant fetching of large areas that can take a few minutes to process

4. Web Interface

A Flask API serves data to a Svelte + Chart.js front-end for highly interactive visualisations.
Users can search by town, postcode, or county; view price trends over time; and compare regions side-by-side.
Uses write ahead search capability to help users find the area they are looking for
Change time period interval to look at 1mo, 3mo, 6mo ,1yr

Design Philosophy

I wanted House Stats to make large-scale data accessible to everyone—from first-time buyers to professional surveyors.

Speed: Sub-second queries are crucial for a seamless UX.
Clarity: Data is visualised, not buried in spreadsheets.
Openness: Unlike commercial tools from Rightmove or Zoopla, House Stats is built for the public.

Tech Stack Highlights

Layer	Technology	Purpose
Backend	Flask, PostgreSQL, Redis, Dask	Data ingestion, processing & caching
Frontend	Svelte, Chart.js	Dynamic UI and visualisations
Data Pipeline	Python, Golang	Parsing, validation, batch uploading
Infrastructure	Docker, Proxmox, Linux (Ubuntu)	Deployment & virtualisation

Performance and Testing

Inserted 26 million+ transactions at ~24 k inserts/sec.
Results fetched in milliseconds, even across millions of rows.
Continuous integration with GitHub Actions ensures reliability.
Sentry handles live error logging and telemetry.

Why It Matters

Large real-estate companies already exploit these datasets with proprietary software—but individuals don’t have that luxury. House Stats bridges that gap, offering open access to national-scale housing analytics. It’s a blend of distributed computing, web development, and data science, packaged into a single cohesive system.

Final Thoughts

What started as an A-level project ended up teaching me full-stack architecture, deployment, and large-scale data engineering. It was the project that led gave me an interest in large scale data analysis and building big systems.

Read the writeup I submitted and got 67/70 for Writeup
Also here is all the code from it GitHub

House Stats: Real-Time UK Housing Market Analysis#

System Overview#

1. Data Ingest#

2. Data Storage and Caching#

3. Data Processing#

4. Web Interface#

Design Philosophy#

Tech Stack Highlights#

Performance and Testing#

Why It Matters#

Final Thoughts#