
Replacing Backgrounds With Diffusion Models

Written by Dr. Efrat Taig | Oct 2, 2024 10:15:21 PM

TL;DR

  • Blog covers naive inpainting and advanced AI solutions for background replacement

  • Advanced solutions require custom data and specialized model training

  • Explains implementation, custom dataset creation, and specialized model training

  • Compares BRIA and Stability AI models, highlighting BRIA's strong performance

  • Addresses gap between academic research and industry needs in AI

  • Introduces "Source Code Available" model balancing innovation and fair compensation

The Evolution of Background Replacement in Images: From Manual Labor to AI Artistry


Abstract
This article explores the evolution of background replacement in digital imagery, from the painstaking manual methods of the past to today's AI-powered solutions. We walk through a naive approach built on inpainting models, examine its limitations, and propose an advanced solution based on custom data creation and model training.

Furthermore, we address the complex challenges that arise in bridging the gap between academia and industry in the realm of AI development. We propose a novel "Source Code Available" market model, spearheaded by BRIA, which aims to strike a delicate balance between fostering innovation and ensuring fair compensation for content creators. This groundbreaking model seeks to democratize access to legally-sourced AI models while upholding the principles of proper attribution and royalty distribution.

The article concludes by introducing a revolutionary attribution model for AI-generated content, designed to guarantee the equitable distribution of royalties to the original artists and data providers who form the bedrock of this transformative technology.


How to Change Image Backgrounds with AI: From Challenges to Solutions

Changing the background of an image has long been a staple task in digital image editing. What was once a painstaking process involving careful selection, cutting, and pasting in software like Photoshop has evolved significantly with the advent of AI-powered tools. However, despite these advancements, achieving perfect results remains a challenge. In this blog series, we'll explore the journey of developing effective background-changing solutions and tackle the hurdles they present.


The Evolution of Background Changing Techniques

Traditionally, changing an image's background required:

  1. Precise pixel-by-pixel selection of foreground objects
  2. Careful cutting and pasting onto a new background
  3. Skillful harmonization of lighting, textures, and colors

This process demanded both technical precision and artistic finesse—a combination not always easy to achieve, especially for those of us more comfortable with algorithms than aesthetics.

The Promise and Challenges of AI Solutions

Today's AI-powered tools have streamlined this process considerably. However, they're not without their limitations:

  • Results can be inconsistent
  • There's no one-size-fits-all solution
  • Each project often requires customization and a tailored approach


From Research to Production: Bridging the Gap

While research papers and demos showcase impressive results, translating these into production-ready tools presents its own set of challenges. The rapid pace of innovation in generative AI adds another layer of complexity, making it difficult to determine the best approach.


Starting Simple: The Value of Experimentation

As we begin this journey, we'll start with a basic approach using inpainting models. While this method may seem naive and often produces imperfect results, it's a crucial first step in understanding the intricacies of the process.

In the upcoming posts, we'll explore:

  1. Experimenting with open-source and commercially available inpainting models
    • With both options (Bria & Stability), you can use the model weights: download them to your local environment, generate masks for the background, and then run the models in code to generate new backgrounds for your images.

  2. Generating masks for background replacement
  3. Implementing these models in a local environment
  4. Analyzing the results and identifying areas for improvement

By the end of this series, you'll have a comprehensive understanding of the challenges involved in AI-powered background changing and the strategies to overcome them.

Stay tuned as we dive deep into the world of AI-driven image manipulation!

Original image, background mask (Bria RMBG 1.4 alpha mask). To replace the background in my profile picture using the naive approach, I'll provide the inpainting model with a mask that isolates just the background—it's like instructing the model, "Focus only on replacing the background pixels."

Tip: Bria offers a legal and high-quality background removal (RMBG) model that can be trained on your own data for better performance on specific styles or tasks. There are other options available, like Remove.bg, DIS, and BiRefNet, each with its strengths. Personally, I believe Bria is the best due to its strong baseline and ease of further training — but full disclosure, I might be a bit biased. 😊 If you have your own data and want to enhance RMBG for a specific task, feel free to contact us! We've had great success stories from customers who tailored the model to meet their unique needs.

 

Step 1: Creating Masks

Masks play a crucial role in background replacement by defining which parts of the image should remain untouched and which areas should be modified. Without a proper mask, the model won't know how to differentiate between the foreground object and the background you want to replace. This is where background removal models come in handy, as they generate precise masks that help guide the inpainting model. By using these masks, you can ensure more accurate and controlled background swaps, leading to higher-quality results. Got it? It's simple! Want to swap a background? Just provide a mask for it!

Use Bria's Remove Background model or Bria's Remove Background demo to separate the background from the foreground object — the resulting mask will allow you to guide the inpainting model and swap out the background. You can download the model and run it locally (here's the link for download and setup instructions: Remove Background model). Note that the results are in black and white, so you might need to invert the colors — white to black and vice versa.
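To make this step concrete, here is a minimal sketch assuming the image-segmentation pipeline interface published on the briaai/RMBG-1.4 Hugging Face model card; the file names are placeholders:

```python
from PIL import ImageOps
from transformers import pipeline

# Load Bria's RMBG-1.4 as an image-segmentation pipeline (trust_remote_code is
# needed because the model ships its own pipeline code).
rmbg = pipeline("image-segmentation", model="briaai/RMBG-1.4", trust_remote_code=True)

# return_mask=True returns a PIL mask: white = foreground, black = background.
foreground_mask = rmbg("profile_picture.png", return_mask=True)  # placeholder file name

# Most inpainting pipelines repaint the WHITE region of the mask, so invert it:
# we want the background, not the person, to be regenerated.
background_mask = ImageOps.invert(foreground_mask.convert("L"))
background_mask.save("background_mask.png")
```

The inversion at the end is the "white to black and vice versa" step mentioned above: RMBG returns a foreground mask, while the inpainting step expects the region to repaint (the background) to be white.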


Step 2: Integrate the Background Mask with Inpainting Models

We started by creating a mask to separate the background from the object using tools like Bria’s Remove Background model. Once you have the mask, input it into the inpainting model of your choice. The quality of both the input image and mask is crucial, as it directly affects the final results. You can even apply various morphological operations to refine the mask for better outcomes.
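As one example of the morphological refinement mentioned above, here is a small sketch assuming OpenCV; the threshold and kernel size are arbitrary choices for illustration, not values recommended by Bria:

```python
import cv2
import numpy as np

# Load the black-and-white background mask from Step 1.
mask = cv2.imread("background_mask.png", cv2.IMREAD_GRAYSCALE)

# Binarize to remove soft alpha edges, then dilate a few pixels so the
# inpainted region slightly overlaps the object boundary and hides halo artifacts.
_, mask = cv2.threshold(mask, 127, 255, cv2.THRESH_BINARY)
kernel = np.ones((7, 7), np.uint8)  # arbitrary kernel size, tune per image resolution
refined = cv2.dilate(mask, kernel, iterations=1)

cv2.imwrite("background_mask_refined.png", refined)
```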

Now, with your mask and image ready, it's time to run the inpainting model to swap backgrounds. Below is a code example for performing background replacement using two leading models: Bria and Stability AI. While both perform inpainting, they use different underlying foundation models to achieve the result.

Original image, mask (Bria RMBG 1.4 alpha mask), image after RMBG

For my own test, I swapped the background of my new profile picture using Bria’s inpainting model first. Here's the code! (from here)

<Example Bria Code> 

<Example Stability Code>
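In broad strokes, both snippets follow the same flow, roughly like this minimal diffusers sketch. The checkpoint shown here is Stability's public SDXL inpainting model; for the Bria run you would load the BRIA 2.3 inpainting checkpoint from Bria's Hugging Face page instead (assuming it is used through the same diffusers inpainting interface):

```python
import torch
from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image

# Stability's public SDXL inpainting checkpoint; for the Bria version, replace
# the model ID with the BRIA 2.3 inpainting checkpoint from Bria's model card.
pipe = AutoPipelineForInpainting.from_pretrained(
    "stabilityai/stable-diffusion-xl-1.0-inpainting-0.1",
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")

image = load_image("profile_picture.png").resize((1024, 1024))
mask = load_image("background_mask.png").resize((1024, 1024))  # white = area to repaint

result = pipe(
    prompt="a bright modern office with large windows, soft natural light",
    image=image,
    mask_image=mask,
    num_inference_steps=30,
    guidance_scale=7.5,
    strength=0.99,  # repaint the masked region almost from scratch
    generator=torch.Generator("cuda").manual_seed(42),
).images[0]

result.save("profile_new_background.png")
```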

Here are some additional results I’ve generated. Take a look and judge for yourself how well this approach works.

Some more results from using an inpainting model for background replacement


Testing with Demo Applications: An Easy Alternative

As an alternative to running the code locally, you can test this through demo applications (just draw a mask using the brush tool available in the demo; it's easy). Here's a link to Bria's demo. Bria also has a fast implementation of the inpainting model (integrated with an LCM LoRA).

<You can check it out here>

<Here’s a link to Stability’s demo>
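For reference, this is roughly how an LCM LoRA can be attached to a diffusers inpainting pipeline to cut generation down to a handful of steps. The adapter shown is the public latent-consistency/lcm-lora-sdxl LoRA, which is not necessarily the one Bria's fast model uses, so treat this as a sketch rather than Bria's actual setup:

```python
import torch
from diffusers import AutoPipelineForInpainting, LCMScheduler
from diffusers.utils import load_image

pipe = AutoPipelineForInpainting.from_pretrained(
    "stabilityai/stable-diffusion-xl-1.0-inpainting-0.1",
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")

# Swap in the LCM scheduler and attach the LCM LoRA adapter.
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
pipe.load_lora_weights("latent-consistency/lcm-lora-sdxl")

image = load_image("profile_picture.png").resize((1024, 1024))
mask = load_image("background_mask.png").resize((1024, 1024))

# With LCM, a handful of steps and a low guidance scale are typical.
result = pipe(
    prompt="a quiet beach at golden hour",
    image=image,
    mask_image=mask,
    num_inference_steps=6,
    guidance_scale=1.5,
).images[0]
result.save("profile_fast_background.png")
```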


Results: A First Attempt at Background Replacement

Well… I tried swapping the background of my new profile picture using this approach, and the results from the code I shared above are below. Let's just say you don't need to be an artist to notice they weren't exactly stunning! It's heading in the right direction, but in its current state, it's not quite usable yet.

SDXL inpainting result

Bria 2.3 inpainting result

Challenges in Background Replacement: A Realistic Perspective

When attempting naive background replacement, you might encounter some challenges along the way. While the approach can yield decent results in about 1 out of 10 tries, it's important to keep in mind that the process may not always work flawlessly. Issues such as harmonization, lighting, alignment, gaps, data leakage, data duplication, and resolution differences between the foreground and background can cause the final output to appear slightly off.

However, don't let these challenges discourage you! The fact that good results are rare presents an exciting opportunity for growth and improvement. By understanding and addressing these issues, you can develop a more robust and reliable background replacement solution.

NOTE: The examples presented in research papers are often carefully selected to showcase the best outcomes. In contrast, a real-world product needs to work at scale and consistently satisfy customers. This highlights the importance of rigorous testing, refinement, and continuous iteration to bridge the gap between academic research and practical applications.

When evaluating the results of your background replacement efforts, it's crucial to look beyond the initial impression. Upon closer inspection, you may notice subtle imperfections such as slightly mismatched lighting, awkward shadows, color inconsistencies, distortions, duplicated objects, bent straight lines, asymmetry, overly bright areas, or elements that seem out of place. As algorithm developers, we may not always have a keen eye for these nuances, but that's where collaboration with artists and designers can be invaluable.

Remember, the journey to creating a seamless and effective background replacement solution is an iterative process. By acknowledging the challenges, learning from them, and continuously refining your approach, you can gradually improve the quality and consistency of your results. Embrace the opportunity to grow, experiment, and push the boundaries of what's possible in the field of background replacement.


The Challenge of Real-World Application: Why 10% Success Isn’t Enough

There’s no doubt — the idea is great. The technology is almost “there,” but in practice, getting a successful outcome in just 10% of the cases doesn’t really help us solve our real-world problems.

 

Tuning Parameters

There are numerous variables to consider here: What should the prompt be? Should you include the main foreground object in the description, or just focus on the background? How detailed should that description be? What about the seed value, the CFG Scale, and the number of steps? Although you could set up a benchmark and try optimizing the results by adjusting these parameters, it's likely not a good use of your time.
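If you do want to sanity-check a few settings before moving on, a small sweep like the sketch below is usually enough; it reuses the diffusers setup from the earlier example, and the prompt and file names are placeholders:

```python
import itertools
import torch
from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image

pipe = AutoPipelineForInpainting.from_pretrained(
    "stabilityai/stable-diffusion-xl-1.0-inpainting-0.1",
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")
image = load_image("profile_picture.png").resize((1024, 1024))
mask = load_image("background_mask.png").resize((1024, 1024))

# A small grid over the knobs mentioned above: seed, CFG scale, and step count.
for seed, cfg, steps in itertools.product([0, 1, 2], [5.0, 7.5, 9.0], [20, 30]):
    out = pipe(
        prompt="a bright modern office with large windows",
        image=image,
        mask_image=mask,
        guidance_scale=cfg,
        num_inference_steps=steps,
        generator=torch.Generator("cuda").manual_seed(seed),
    ).images[0]
    out.save(f"sweep_seed{seed}_cfg{cfg}_steps{steps}.png")
```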

While adjusting parameters might help, it's only worth the effort if you have a lot of time to spare. In practice, well-configured scripts are generally robust, and you probably won't achieve significant improvements just by tweaking parameters. There's a saying that goes: "Looking for your lost coin where there's light, not where you actually lost it." It implies focusing on what's easy or obvious rather than addressing the real issue. This is how I view spending more than minimal time on parameter optimization.

Playing with a parameter? Manually tweaking? That's highly likely to be a story that never ends, and probably a waste of time.

 

Advanced Solution: Custom Masks for Better Results
The initial approach involved using the inpainting models in their default state. For a more sophisticated solution, we will train the models to better suit our specific needs. Which models should we use? How do we train them? Let's explore these questions in detail.

Establishing a Solid Foundation: Utilizing Models as Intended

To achieve consistent results, it's crucial to use these models as foundational elements. They are called "foundation" models for a reason! Start with them as a base and then develop processes that enhance your specific task. This means understanding the strengths and limitations of each model and leveraging them to build a more tailored solution.

My strategy for this type of work is always to break down tasks into smaller, manageable parts. This approach not only makes the process more approachable but also allows for more precise adjustments and improvements at each step. For instance, you might start by focusing on creating high-quality masks before moving on to fine-tuning the inpainting model. By tackling one aspect at a time, you can ensure that each component is optimized, leading to better overall performance.

It might sound paradoxical, but the more you break down tasks into smaller, manageable pieces, the more robust your solution will be. So, what should you do? Tailor the process to the problem at hand. If your issue is background generation (or replacement — whatever you call it), then train a model specifically designed to excel at generating backgrounds.


Focusing on the Specific Task: Overcoming the Limitations of Inpainting Models

The inpainting model is a foundation model: trained on millions of images, masks, and prompts, it is designed to fill in missing parts of images, but it is trained on random masks. This means the model learns to fill in both backgrounds and objects in a somewhat scattered manner. While it can handle a variety of tasks, it often ends up being mediocre at both. Instead of relying on a model that tries to do it all, it's better to focus on the specific problem you're trying to solve. By creating targeted data and a precise workflow, you can achieve much better results.


How do we create targeted data and a precise workflow? We'll start by choosing our foundation model and continue by creating custom data.

Choosing a Foundation Model:

When choosing a foundation model, you can start with Bria or Stability, as both are compatible with this training code and both are available for download through the provided links. Bria offers a legally sourced model, free for academic use (with a paid option for commercial use; in both cases the weights can be downloaded). On the other hand, Stability's model is fully open, but it was trained on data scraped from the internet. Each model has its own pros and cons, so it's recommended to check this benchmark report to find the best fit for your needs.

Creating Custom Data for Successful Training


First and foremost —

We all know this, and honestly, it could be the subject of an entire blog series (or dozens of them!). Training a model is all about the “holy trinity”: data, algorithms, and compute power.

“holy trinity”: data, algorithms, and compute power.


And if you’ve been paying attention to the literature, you’ll know that data is the most critical component of the three. 

Speaking from experience, here’s my rule of thumb: 10% poor-quality data can lead to a 30% reduction in model performance. It’s that serious. So even though data might not be the most exciting part of the process (at least not for me), don’t underestimate it. It’s absolutely crucial. 

 

"Rule of thumb: 10% poor-quality data can lead to a 30% reduction in model performance. It’s that serious."

Now, let’s talk about one of the biggest data challenges: images.

It might sound funny, but this is probably one of the toughest challenges for us developers because it's a question that comes from a totally different world — legal and commercial concerns. How do we make sure the content we're training on is both legal and fair? It's a bit of a headache, right? (For more on that, check out this post.)


Choosing the Right Type of Data for the Task:

 If your goal is to replace backgrounds for retail products, then use retail images. If you’re working on replacing the background for stock photos from image banks, grab those — think perfect lighting and studio setups that follow the rule of thirds. But if you’re dealing with replacing the background in a personal user gallery, make sure to stick with the images that have natural lighting, phone-quality resolutions, and that charmingly amateur vibe. And if you’re diving into the world of replacing background in selfies, well, you’d better train with plenty of selfies.


Customizing Masks for Targeted Background Replacement

We don’t need a model that tries to do everything but ends up being average at best — “jack of all trades, master of none.” Our goal is clear: focus solely on background completion. To achieve this, we need to customize both our data and approach specifically for this task. The default masks used in training inpainting models are a legacy from groundbreaking papers, like this one and this. These papers were among the first to release code for generating masks and training models, and they laid the foundation for many of the inpainting tools that have since evolved into production-level products. However, these default masks were designed for general purposes, which means they may not always be the best fit for specific tasks like background replacement.

Instead of using random masks, like the ones shown in the figure below (taken from Free-Form Image Inpainting with Gated Convolution), we'll create custom masks tailored specifically to background replacement.


Bottom line:
The right type of mask for training completely depends on the use case of the model. If our use case is background completion, it’s highly recommended to use masks specifically designed for that purpose.

 

Random masks, taken from Free-Form Image Inpainting with Gated Convolution

 


Left: Original images, without any masks.

Middle: Images with illustrations of random masks—essentially, random scribbles.

Right: Images with background masks, specifically isolating the background. If we train the model using random masks, it will learn to complete random areas in the image, which might be suitable for cases where we need to fill arbitrary gaps. However, if our goal is background generation — it makes more sense to use masks that clearly define the background. This ensures the model will be more accurate and effective in completing background areas.


Creating Custom Masks for Background Replacement

As mentioned earlier, we need custom masks—not random ones—tailored to the specific task at hand. Let's use Bria's RMBG 1.4 model to effectively separate the background from the foreground.

We’ll focus on creating masks that describe only the background, ensuring the process is as targeted and effective as possible.

Customized background masks created using Bria RMBG 1.4
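As a rough sketch of what that dataset preparation can look like, the loop below runs the background-removal model over a folder of training images and stores inverted background masks next to them. The directory names are placeholders, and the RMBG call follows the interface on the RMBG-1.4 model card:

```python
from pathlib import Path
from PIL import ImageOps
from transformers import pipeline

rmbg = pipeline("image-segmentation", model="briaai/RMBG-1.4", trust_remote_code=True)

image_dir = Path("train_images")   # placeholder: your domain-specific training images
mask_dir = Path("train_bg_masks")  # background masks to pair with them during training
mask_dir.mkdir(exist_ok=True)

for image_path in sorted(image_dir.glob("*.jpg")):
    # RMBG returns a foreground mask (white = object); invert it so white = background.
    fg_mask = rmbg(str(image_path), return_mask=True)
    bg_mask = ImageOps.invert(fg_mask.convert("L"))
    bg_mask.save(mask_dir / f"{image_path.stem}.png")
```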


For Business: Make Sure to License the Remove Background Model
If you’re a business planning to use the model extensively or put it into production, remember that you’ll need to purchase a license—this is how we do business, after all. The model is available, but don’t steal it. You’re welcome to develop and integrate it into your pipelines, but if it goes into production, make sure to get the appropriate license.

 

Make Sure to License the Remove Background Model


Disclosure

Full disclosure: The author is the VP of Generative AI Technology at Bria, holds a Ph.D. in Computer Vision, and has many years of experience in generative AI. She has extensive expertise in training models (both fine-tuning and from scratch), writing pipelines, and conducting practical, theoretical, and applied research in various areas of computer vision.