Tens of thousands of duplicate image files are clogging the digital infrastructure of Wollongong's major institutions, inflating storage costs and complicating the public's ability to find reliable records online. The problem is measurable and well documented in the information management literature, but local organisations are only beginning to grapple with its scale.
The timing is not accidental. Across NSW, government agencies and universities have spent the past three years accelerating their shift to cloud-based document management systems. Every migration event — from legacy servers to platforms like SharePoint or purpose-built archives — creates duplication. Files get copied, renamed, re-uploaded. Images, which carry the heaviest file sizes of any document type, accumulate fastest.
What the Numbers Actually Look Like
Research published in peer-reviewed information science journals has consistently found that between 20 and 40 per cent of files stored in institutional digital repositories are exact or near-exact duplicates. Apply even the conservative 20 per cent lower bound to a mid-sized local government body like Wollongong City Council, which manages planning records, heritage photography, engineering documentation and community event imagery across multiple departments, and the redundancy problem becomes significant for both storage cost and search accuracy.
Cloud storage, while cheaper per gigabyte than on-premises servers were a decade ago, still carries real costs. Enterprise-tier cloud storage in Australia typically runs between $0.02 and $0.05 per gigabyte per month depending on the provider and access tier. A repository carrying 50,000 duplicate image files, each averaging 4 megabytes, represents roughly 200 gigabytes of wasted storage. That is a modest figure in isolation, but institutions rarely hold just one repository. Council, the University of Wollongong, TAFE NSW Illawarra, and the Illawarra Shoalhaven Local Health District each maintain separate digital asset systems, and the duplication problem compounds across all of them.
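For readers who want to check the arithmetic, the figures above can be worked through in a few lines of Python. This is a rough sketch only: the function names are illustrative, decimal units are assumed (1 gigabyte = 1,000 megabytes), and the rates are the indicative range quoted above rather than any provider's actual price list.

```python
# Figures quoted in the article; everything else here is illustrative.
DUPLICATE_FILES = 50_000
AVG_FILE_MB = 4
RATE_LOW = 0.02   # dollars per GB per month (lower end of quoted range)
RATE_HIGH = 0.05  # dollars per GB per month (upper end of quoted range)


def wasted_gb(n_files: int, avg_mb: float) -> float:
    """Storage taken up by duplicates, assuming 1 GB = 1,000 MB."""
    return n_files * avg_mb / 1000


def monthly_cost(gb: float, rate_per_gb: float) -> float:
    """Monthly storage cost in dollars at a flat per-gigabyte rate."""
    return gb * rate_per_gb


gb = wasted_gb(DUPLICATE_FILES, AVG_FILE_MB)
low = monthly_cost(gb, RATE_LOW)
high = monthly_cost(gb, RATE_HIGH)

print(f"Wasted storage: {gb:.0f} GB")
print(f"Monthly cost:   ${low:.2f} to ${high:.2f}")
print(f"Annual cost:    ${low * 12:.2f} to ${high * 12:.2f}")
```

On those assumptions, a single repository of this size wastes somewhere between $4 and $10 a month, or $48 to $120 a year, before the figure is multiplied across several systems.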
At the University of Wollongong's Innovation Campus on Squires Way, North Wollongong, researchers working with satellite and drone imagery for coastal monitoring and BlueScope Steel's industrial transition projects generate large image datasets routinely. Those datasets pass through multiple hands — researchers, HDR students, industry partners — and without enforced deduplication protocols, the same georeferenced image files frequently appear under different filenames across shared drives.
Local Programs Starting to Address the Problem
Wollongong City Council's Digital Transformation Strategy, referenced in Council meeting agendas from late 2024, identifies records integrity as a priority area. The Illawarra Shoalhaven Regional Development Fund has separately directed grant funding toward digital capacity-building for smaller regional organisations, though specific allocation figures for records management projects have not been made public.
The Port Kembla-based industrial precincts, where planning documentation related to the Renewable Energy Zone is expanding rapidly, are generating new image-heavy records at pace — aerial surveys, environmental impact photography, infrastructure mapping. Each planning application lodged with Council triggers a document package that can run to hundreds of files, and the current intake systems at the Council offices on Burelli Street do not automatically flag duplicates at lodgement.
Deduplication software — tools that hash image files and identify matches regardless of filename — has dropped substantially in price. Open-source options exist, and commercial platforms marketed to local government now offer per-seat licensing starting around $300 annually. The barrier is not primarily cost. It is workflow change and the staff training required to embed deduplication as a standard step before any migration or archive event.
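The underlying technique is simple enough to sketch with Python's standard library. The example below is a minimal illustration, not a description of any particular product: the choice of SHA-256 is an assumption, as are the function names and the sample filenames. It catches only byte-for-byte copies; resized or recompressed near-duplicates would need a different approach, such as perceptual hashing, which is not shown here.

```python
import hashlib
import tempfile
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    hasher = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def find_duplicates(root: Path) -> dict[str, list[Path]]:
    """Group files under root by content hash; keep only groups with 2+ files."""
    groups: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            groups[file_digest(path)].append(path)
    return {digest: paths for digest, paths in groups.items() if len(paths) > 1}


# Small demonstration with made-up files: two identical images under
# different names, plus one file with different content.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "survey_final.jpg").write_bytes(b"same image bytes")
    (root / "survey_final_copy2.jpg").write_bytes(b"same image bytes")
    (root / "heritage_photo.jpg").write_bytes(b"different image bytes")

    for digest, paths in find_duplicates(root).items():
        names = ", ".join(p.name for p in paths)
        print(f"{digest[:12]}: {names}")
```

Because the match is made on file contents rather than filenames, the two renamed copies are grouped together while the third file is left alone.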
For Wollongong organisations dealing with this right now, information management professionals recommend three practical steps: run a hash-based audit of existing repositories before any planned cloud migration, establish a single source-of-truth folder structure with naming conventions enforced at the point of file creation, and set a quarterly review schedule rather than waiting for a migration event to surface the problem. The audit stage alone typically reduces storage load by 15 to 30 per cent in institutional settings, based on published case studies from Australian university libraries. That figure, applied locally, suggests real budget savings are sitting unclaimed inside systems that organisations in the Illawarra already own.