Stop Dev Image Cache Pollution In Production R2

by Alex Johnson

Introduction

Hey there, development wizards and operations gurus! Today, we're diving deep into a critical issue that's been quietly causing headaches for our production environment. We're talking about image caching in development, and how it’s inadvertently polluting our production R2 bucket. This isn't just a minor glitch; it’s a situation that requires immediate attention to prevent further contamination of our production CDN and maintain the integrity of our data. The core of the problem lies in a lack of environment awareness within our image caching pipeline. Every time a scraper runs or an image caching service is triggered in a development environment, it’s attempting to upload assets to the very same R2 bucket that serves our live production content. This leads to a cascade of negative effects, from incorrect cache entries and wasted storage to the alarming potential for overwriting live production images. We need to implement a robust, layered defense strategy to ensure that development activities remain strictly isolated from production infrastructure. This article will break down the root cause, illustrate the flow of data causing the problem, and detail the recommended fixes, ensuring we can all sleep soundly knowing our production environment is secure and efficient. Let's get this sorted, so our production CDN, cdn.wombie.com and cdn2.wombie.com, remains pristine and our R2 storage costs are kept in check.

Understanding the Impact: Why Dev Image Caching is a Production Problem

When development image caching starts uploading to the same R2 bucket as production, the consequences can be severe and far-reaching:

- Dev data pollutes the production CDN. Users accessing cdn.wombie.com or cdn2.wombie.com might inadvertently be served development assets, such as a placeholder image or an internal development graphic instead of the intended product image, leading to a broken or inconsistent user experience.
- Orphaned files accumulate in R2 and never get cleaned up. Development processes often generate temporary assets or test data that aren't meant for long-term storage. Without isolation, these files pile up in the production bucket, consuming storage and making assets harder to manage and audit.
- Incorrect cache entries mix dev and prod data. The CDN may serve a cached version that actually came from a development run, causing unexpected behavior and debugging nightmares.
- Production images can be overwritten when entity IDs collide. If a development process uses an entity ID that happens to match one in production, the dev upload can overwrite the live image, causing data loss and service disruption.
- R2 storage costs are wasted. We're paying to store non-production data, an unnecessary drain on resources.

Addressing this isn't just about tidiness; it's about maintaining performance, security, and cost-effectiveness for our production systems.
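
The entity-ID collision risk follows directly from how object keys are derived. As a minimal sketch (the `ImageKeys` module, key layout, and `.jpg` suffix here are illustrative assumptions, not the project's actual key scheme), prefixing every object key with the runtime environment guarantees that a dev upload can never share a key with a production object, even when the entity IDs are identical:

```elixir
defmodule ImageKeys do
  # Hypothetical helper: derive the R2 object key from entity info.
  # Prefixing the key with the environment keeps dev uploads from ever
  # colliding with production objects that share an entity ID.
  def cache_key(env, entity_type, entity_id, position) do
    "#{env}/images/#{entity_type}/#{entity_id}/#{position}.jpg"
  end
end

IO.puts(ImageKeys.cache_key(:prod, "event", 42, 0))
# prod/images/event/42/0.jpg
IO.puts(ImageKeys.cache_key(:dev, "event", 42, 0))
# dev/images/event/42/0.jpg
```

With a scheme like this, the same entity ID 42 maps to two distinct keys, so even a misrouted dev job can only ever touch the `dev/` prefix.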

Root Cause Analysis: Where the Pipeline Went Wrong

The root cause analysis reveals a fundamental lack of environment awareness in our image caching pipeline, spread across three components.

1. The Oban queue runs in all environments. In config/runtime.exs, at line 112, we see image_cache: 3. This configuration makes the image_cache queue active, with three concurrent workers, in development and testing as well as production. Every job enqueued for image caching, regardless of its origin, gets picked up and processed by a worker.

2. ImageCacheService queues jobs unconditionally. In lib/eventasaurus_app/images/image_cache_service.ex, the cache_image function is defined along these lines:

```elixir
def cache_image(entity_type, entity_id, position, original_url, opts \\ []) do
  # ...
  ImageCacheJob.new(...)
  |> Oban.insert()
end
```

Noticeably absent is any check on the current environment. Every call to cache_image creates a CachedImage record in the connected database (including the development one) and then enqueues an ImageCacheJob into Oban, irrespective of whether the application is running locally or in the cloud.

3. R2Client uploads to the same bucket. The R2Client module, in lib/eventasaurus_app/services/r2_client.ex, has a bucket function that determines where files are uploaded. It's defined as `defp bucket do r2_config[:bucket] || System.get_env(`
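
The recommended fixes aren't shown in this excerpt, but the layered defense implied by the root cause would add an environment check at each of these points. Here is one possible sketch, not the project's actual code: the module name, the `:env` config key, and the bucket names are all assumptions for illustration.

```elixir
# Sketch of a layered environment guard (names are illustrative, not
# the project's actual code).
defmodule ImageCacheGuard do
  # Layer 1: refuse to enqueue caching jobs outside production.
  # The :env value would be set in application config at boot,
  # e.g. config :eventasaurus, env: config_env() in runtime.exs.
  def cache_image(entity_type, entity_id, original_url) do
    if Application.get_env(:eventasaurus, :env, :dev) == :prod do
      {:ok, {:enqueued, entity_type, entity_id, original_url}}
    else
      {:ok, :skipped_non_prod}
    end
  end

  # Layer 2: resolve an environment-specific bucket, so even a
  # misrouted upload cannot land in the production bucket.
  def bucket(:prod), do: "wombie-images"        # hypothetical name
  def bucket(env), do: "wombie-images-#{env}"   # e.g. "wombie-images-dev"
end

IO.inspect(ImageCacheGuard.cache_image("event", 42, "https://example.com/a.jpg"))
IO.puts(ImageCacheGuard.bucket(:dev))
# wombie-images-dev
```

A complementary third layer, following the queue observation above, is to include image_cache in the Oban queue configuration only when building the production config in runtime.exs, so dev and test nodes never start those workers at all.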