The Daily Wollongong

Wollongong news, every day


The Numbers Behind Wollongong's Duplicate Image Problem: What the Data Reveals

Council records, property listings and university archives are riddled with duplicate and mismatched images — and the volume of the problem is larger than most residents realise.

By Wollongong News Desk · Published 5 July 2026, 5:23 am

Photo: Lucius Crick on Pexels

Wollongong City Council's digital asset library contains more than 47,000 image files accumulated across two decades of website migrations, grant applications and planning documents. A working audit completed in June 2026 found that roughly one in five of those files is either a direct duplicate or a near-duplicate: the same photograph cropped, resized or re-exported under a different filename. The finding has prompted a review of how public-sector organisations across the Illawarra manage visual records.

The timing matters. Three separate digitisation projects are running in the region at the same time: the Port Kembla Renewable Energy Zone precinct authority is building a public-facing document portal, the Illawarra Shoalhaven Regional Development Fund is midway through cataloguing funded project photography for its 2026 annual report, and the University of Wollongong's library is consolidating archival collections from its Innovation Campus on Squires Way. Each project risks inheriting the same structural problem: bloated, unverified image sets in which the same file lives in multiple folders under different names.

Why Duplicate Images Cost More Than Storage Space

Storage is the obvious cost. A single uncompressed TIFF from a professional shoot of the Illawarra Escarpment can run to 80 megabytes. Three hundred accidental duplicates at that size would take up about 24 gigabytes. But the deeper cost is labour. When a graphic designer on a council communications team, or one working on a Port Kembla precinct authority publication, needs a photograph, they typically run a manual search. If the library returns four versions of the same shot of the BlueScope Steel hot strip mill, each filed under a slightly different name, someone has to open all four, compare them and decide which is the master copy. Across a 10-person team doing this dozens of times a month, that dead time compounds.

The data science behind duplicate detection has matured considerably since the early 2010s. Perceptual hashing algorithms, which convert an image into a short numerical fingerprint based on its visual content rather than comparing files pixel by pixel, can now flag near-duplicates at scale in minutes. Open-source libraries such as ImageHash, first published in 2013, are now standard in government digital asset management projects in Australia. The key metric is Hamming distance, the number of bits that differ between two fingerprints. A distance of zero means the fingerprints match exactly, which usually indicates the same picture even if the files themselves differ; a distance below eight typically indicates the same photo with minor edits. In the council audit, 9,200 image pairs fell below that threshold.
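To make the fingerprint idea concrete, here is a minimal sketch in plain Python. It is not the council's audit code, and it does not use the ImageHash library. It assumes each image has already been shrunk to an 8x8 grid of greyscale values (0 to 255), and it uses a simple "average hash": one bit per pixel, set when the pixel is brighter than the grid's mean. The function names and sample grids are illustrative only; the threshold of eight is the one quoted above.

```python
from typing import List

Grid = List[List[int]]  # assumed: an 8x8 grid of greyscale values, 0-255

THRESHOLD = 8  # cut-off quoted in the article for "same photo, minor edits"


def average_hash(pixels: Grid) -> int:
    """Return a fingerprint with one bit per pixel (1 = brighter than the mean)."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions at which two fingerprints differ."""
    return bin(a ^ b).count("1")


def is_near_duplicate(a: Grid, b: Grid, threshold: int = THRESHOLD) -> bool:
    """Flag a pair whose fingerprints differ in fewer than `threshold` bits."""
    return hamming_distance(average_hash(a), average_hash(b)) < threshold


# Illustrative data: a smooth gradient, a slightly brightened copy of it,
# and a tonally inverted version that should not match.
original = [[(row * 8 + col) * 4 for col in range(8)] for row in range(8)]
brightened = [[value + 3 for value in row] for row in original]
inverted = [[252 - value for value in row] for row in original]

print(hamming_distance(average_hash(original), average_hash(brightened)))  # 0
print(hamming_distance(average_hash(original), average_hash(inverted)))    # 64
```

The brightened copy is a different file but produces the same fingerprint, which is why a distance of zero points to the same picture rather than the same bytes.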

Local Organisations Starting to Act

The University of Wollongong is the furthest ahead. Its library team, working out of the McKinnon Building on Northfields Avenue, began a structured deduplication pass in February 2026 using a combination of automated hashing and manual review. By the end of May, the team had reduced a 22,000-image collection to 14,800 verified unique files — a reduction of around 33 per cent. The process also surfaced a secondary problem: roughly 600 images had metadata listing the wrong location, with photographs taken at the Shoalhaven Campus in Nowra incorrectly tagged as Innovation Campus images, and vice versa.

For the Illawarra Shoalhaven Regional Development Fund, the stakes are accountability and transparency. Funded projects are required to submit photographic evidence of completed works. When an applicant submits the same photograph twice, once at milestone one and again at milestone two, the duplication can look like deliberate misrepresentation even when it is simply sloppy file management. A cleaner image pipeline, with automated duplicate checks at submission, would reduce the administrative burden on both applicants and assessment officers.
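What an automated check at submission might look like can be sketched in a few lines. The example below is hypothetical and does not describe any system the fund runs. It assumes uploads arrive as raw bytes and catches only byte-for-byte resubmissions, using a SHA-256 digest from Python's standard library; a cropped or resized copy would need a perceptual hash instead. The class name and milestone labels are invented for illustration.

```python
import hashlib
from typing import Dict, Optional


class SubmissionRegistry:
    """Hypothetical registry that remembers which upload first used each file."""

    def __init__(self) -> None:
        # Maps a SHA-256 hex digest to the label of the first submission.
        self._first_seen: Dict[str, str] = {}

    def check(self, data: bytes, label: str) -> Optional[str]:
        """Return the label of an earlier identical upload, or None if new."""
        digest = hashlib.sha256(data).hexdigest()
        earlier = self._first_seen.get(digest)
        if earlier is None:
            self._first_seen[digest] = label
        return earlier


# Illustrative use: the same bytes submitted at two milestones.
registry = SubmissionRegistry()
photo = b"stand-in for the bytes of an uploaded photograph"

first = registry.check(photo, "milestone-1")
second = registry.check(photo, "milestone-2")
if second is not None:
    print(f"Same file already submitted at {second}")
```

A warning at upload time lets the applicant swap in the right photograph before an assessment officer ever sees the duplicate.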

For residents and organisations lodging public submissions, whether planning applications on Crown Street, heritage listing requests in the Flagstaff Hill precinct, or grant acquittals, the practical advice is straightforward: name every image file with a date, a location and a unique project code before uploading. The free desktop application digiKam can scan a local folder for duplicates, often within a couple of minutes for a modest collection, while Google's reverse image search checks a single image against the web rather than against a local folder. The problem is almost always cheaper to fix before submission than after.
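The naming advice is easy to automate. The sketch below shows one possible convention, date first, then location, then project code, separated by underscores. The exact format, the function name and the sample project code "DA-0421" are assumptions made for illustration, not a scheme any of the organisations above has published.

```python
import re
from datetime import date


def build_image_filename(
    taken: date, location: str, project_code: str, extension: str = "jpg"
) -> str:
    """Build a filename from a date, a location and a project code."""
    # Lower-case the location and collapse anything non-alphanumeric to hyphens.
    slug = re.sub(r"[^a-z0-9]+", "-", location.lower()).strip("-")
    # Keep only letters, digits and hyphens in the project code.
    code = re.sub(r"[^A-Za-z0-9-]+", "", project_code).upper()
    suffix = extension.lower().lstrip(".")
    return f"{taken.isoformat()}_{slug}_{code}.{suffix}"


# Illustrative use with a made-up project code.
name = build_image_filename(date(2026, 7, 5), "Crown Street, Wollongong", "da-0421")
print(name)  # 2026-07-05_crown-street-wollongong_DA-0421.jpg
```

Putting the date first in ISO format means files sort chronologically in any folder view, which makes an accidental second copy easier to spot.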

About this article

Published by The Daily Wollongong

This article was produced by The Daily Wollongong editorial desk and covers news in Wollongong. See our editorial standards for how we use AI.
