Bria Blog

Bridging the Gap: From Academic AI to Ethical Business Models

Written by Dr. Efrat Taig | Oct 3, 2024 9:08:59 PM
  • In this insightful blog post, we explore the world of AI model development, focusing on quality data and effective training strategies.
  • The author, Dr. Efrat Taig, drawing from experience at Bria, shares valuable tips on curating datasets, implementing a two-phase training approach, and optimizing learning rates.
  • The post also touches on the importance of open source in AI development. It introduces Bria's innovative "Source Code Available" market concept, which aims to bridge the gap between academic research and commercial applications.
  • Additionally, the blog discusses copyright challenges in AI-generated content and presents Bria's unique solution: an Attribution Model that ensures fair compensation for original creators. This comprehensive piece offers both technical insights and thought-provoking ideas about the future of AI development and ethics.

In the fast-paced world of Artificial Intelligence, breakthrough technologies often struggle to bridge the gap between academic research and practical, commercial applications. As we've seen in our exploration of background replacement using diffusion models, the journey from concept to product is rarely straightforward. And what happens after we solve the technical challenges? How do we ensure cutting-edge AI technologies not only work but also respect ethical considerations and copyright laws? And how can businesses leverage these advancements while fairly compensating the creators and data providers who make it all possible?

This blog will explore these essential questions, examining the intersection of AI technology, industry needs, and ethical considerations.

As the VP of Generative AI Technology at Bria, I've had a front-row seat to these challenges, and I'm excited to share my insights on how we're working to create a more balanced and fair AI ecosystem. We'll explore the multifaceted world of AI commercialization, from the importance of open source in driving innovation to developing new business models that respect creator rights.

Whether you're a developer, a business leader, or simply curious about the future of AI, this discussion will illuminate the often-overlooked aspects of bringing AI from the lab to the real world.

Join me as we explore how companies like Bria are reimagining the AI industry, striving to create a future where technological advancement and ethical considerations coexist.

Let's start with the foundation of any good model: data. At Bria, we've learned that quality always trumps quantity when it comes to data.

Here's what we've discovered:

  1. Beware the Leaky Data: Your goal is to complete the background, not extend the foreground object. Invest time in curating your data carefully. If your background removal isn't precise, you might end up with a model that learns to fill gaps by extending the foreground – that's data leakage, and it's a real headache.
  2. Embrace the Multi-Aspect Ratio: Don't limit yourself to one aspect ratio. Incorporate various sizes in your dataset, but aim for consistency with about one million pixels per image. Fair warning: this might create chaos in your data loader, but the payoff in flexibility is worth it.
  3. Filter, Filter, Filter: Don't ignore those pesky residual errors from your background removal process. It's easier to curate than to generate, so don't skip this step. Remember, if you train on errors, your model will generate errors.
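The multi-aspect-ratio tip can be sketched as a simple bucketing helper that picks training dimensions for a given aspect ratio. This is an illustrative sketch, not Bria's actual pipeline: the target of roughly one million pixels comes from the tip above, while the function name and the multiple-of-64 rounding (a common constraint on latent-diffusion image sizes) are my assumptions.

```python
import math

def bucket_size(aspect_ratio, target_pixels=1_000_000, multiple=64):
    """Pick a (width, height) bucket for a given aspect ratio (w/h)
    with roughly `target_pixels` pixels, each side rounded to a
    multiple of `multiple`."""
    height = math.sqrt(target_pixels / aspect_ratio)
    width = height * aspect_ratio
    # Round each side to the nearest valid multiple, never below it.
    round_to = lambda x: max(multiple, round(x / multiple) * multiple)
    return round_to(width), round_to(height)
```

Grouping images into such buckets keeps each batch at a consistent shape while letting the dataset as a whole span many aspect ratios; this is exactly the part that complicates the data loader, since each batch must be drawn from a single bucket.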


The Two-Phase Training Strategy

We've found that splitting our training into two phases yields impressive results:

  1. Phase One: Train on all the data you have. This gives your model a broad foundation.

  2. Phase Two: Refine the model by training only on the highest-quality data. This phase is magical, elevating your model's performance to new heights.
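The two phases above can be expressed as a simple data-sampling rule. This is an illustrative toy, not our actual training loop; the function name, phase boundary, and batch size are all hypothetical.

```python
import random

def sample_batch(step, all_data, high_quality, phase_one_steps, batch_size=4):
    """Two-phase sampling: phase one draws from the full dataset for a
    broad foundation; phase two draws only from the curated
    high-quality subset to refine the model."""
    pool = all_data if step < phase_one_steps else high_quality
    return random.sample(pool, min(batch_size, len(pool)))
```

The practical upshot is that phase two never shows the model a mediocre example again, which is what lifts the final quality.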


The Foreground Preservation Trick

Here's a pro tip: if maintaining the integrity of the original image is crucial (and let's face it, when isn't it?), try this. After training, restore the original foreground. Models sometimes overlook masks, leading to unwanted changes. Our solution? Run a post-processing step to blend the original foreground into the generated image. It's like having your cake and eating it too!

Learning Rate: The Goldilocks Zone

When optimizing your learning rate, breaking the process into phases is useful. Start with a linear warm-up, gradually increasing the learning rate to help the model adjust to the data without making drastic updates. After the warm-up, you can switch to a constant or cyclical learning rate for more stable training. This approach helps prevent overfitting and allows the model to converge more efficiently.
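The warm-up-then-constant schedule described above can be sketched in a few lines. The step count and base rate here are placeholders, not the values we actually train with.

```python
def learning_rate(step, warmup_steps=1000, base_lr=1e-4):
    """Linear warm-up from ~0 to base_lr over `warmup_steps`, then a
    constant rate. (A cyclical schedule could replace the constant
    phase after warm-up.)"""
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    return base_lr
```

With PyTorch, the same schedule can be handed to an optimizer via `torch.optim.lr_scheduler.LambdaLR`.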

Fast and Effective: Training Bria’s 2.3 Inpainting Network with Impressive Results

I trained Bria’s 2.3 inpainting network using this concept, and here’s the result for my profile picture. It’s getting close to what I want. The coolest part? I got excellent results fast: it wasn’t a matter of one good output in ten attempts; the first four were already decent.

Does it look believable? Was the photo taken in a library? Not entirely, but it’s heading in that direction.


We’ll explore more complex topics in the upcoming blogs, including ControlNet Deep Dive—A Different POV and How to Modify ControlNet Input: A Step-by-Step Guide for Input Modification. These posts will cover how to make better adjustments for specific use cases, keep up with the state of the art (SOTA), and tailor the right methodology to your needs.

In today’s fast-paced world, keeping up with SOTA and figuring out which approach to use is challenging. When aiming for high-performing, tangible products that feel natural, every project requires custom adjustments and the right tools.



The Bria.ai Difference: Bridging Academia and Industry

From where I stand, this is the heart of the challenge for us, the algorithm developers and engineers responsible for turning cutting-edge technology into genuine, functional products. We’re often the bridge between academia’s groundbreaking theories and the practical needs of industry. But let’s be honest: how often do things work “off the shelf”? Rarely.


The Importance of Open Source

And that’s exactly why open source is so valuable. The ability to download model weights, tweak the technology, and mold it to our specific use cases is critical. We can retrain on customized data, add a loss function that makes more sense for our goals, and fine-tune to perfection. The numbers speak for themselves — look at the millions of downloads for open models like those from Stability AI or the runaway success of Hugging Face. There’s a natural, pressing need for this flexibility, and we all feel it.


The Challenge with Commercial Models

But here’s the catch: the moment commercial companies develop those models, we hit a wall. These companies have poured millions — sometimes billions — into their development; understandably, they must protect their investments. Their agenda isn’t open source; it’s about driving revenue from their products. And while that’s perfectly fair, it leaves us — the developers and researchers — out in the cold, unable to access, replicate, or build on the fantastic work.

In recent years, I have observed a notable shift at every conference I attend. I encounter remarkable research, yet without open access to the code, it feels akin to being given the keys to a treasure chest only to find it securely locked. The balance of power is transitioning from academia's open, collaborative spirit to the restricted domains of large corporations and their proprietary models. This shift poses a significant challenge for those who thrive on innovation and collaboration.


To be clear, I’m not an advocate for open source at all costs. I recognize the importance of protecting intellectual property and the business models of these companies. However, it’s essential to acknowledge the challenges this creates for the broader development community. There’s still hope for bridging this gap, and in the next section, I’ll introduce a solution we’ve been developing at Bria to address these challenges.


Introducing a New Market: 'Source Code Available'

This is precisely the challenge that Bria is working to solve. We’re trying to create a new market that bridges the gap between these two worlds, something we call “Source Code Available.” On one hand, we’re a business: a startup that needs to grow, innovate, and sell its products. (Fun fact: a company that doesn’t sell doesn’t have much of a future! It’s one of those hard lessons you learn after leaving the university halls.) And that’s precisely what we do: we sell models. We sell them directly to other companies (B2B, business-to-business), allowing them to integrate Bria’s models into their products or services. We provide foundational and auxiliary models that can be customized to fit their specific needs and incorporated into the solutions they develop.

We’ve developed models compatible with open-source frameworks (like SDXL and others), ensuring that all the advancements built upon them will also be compatible with Bria’s model. We offer a range of models like Text2Image, Control Nets, IP adaptors, and more. You can check out all our offerings on Hugging Face. Companies can purchase these models, adapt them, and train them to meet their specific needs. From Bria, you can get the weights, training codes, and evaluation scripts, which are all customized and ready for developers. And everything is 100% legal!


Why Pay for a Model?

Now, you might be asking yourself — why should we pay Bria XXX dollars for a model when we can download it for free with just a click from Stability?

Great question. And my answer? Even better. 😊

The free models available for download today are often trained on data scraped from the internet. The LAION dataset, for example, is well-known for being a broad sweep of internet content. Artists spent years making a living from their photography, illustrations, or advertising work; suddenly, their content was scraped from the internet, and now anyone can use it to train models. Remember that little © symbol? It was supposed to mean something. But overnight, its significance vanished, as did the respect for creators’ rights.

Or did it?

Respecting Copyrights in AI

Only time will tell whether this disregard for copyright will continue. I remember watching the news one day and seeing an artist named Asaf Hanuka, who had never shared his art with AI model-training companies and barely knew what AI was, sitting there, stunned, as the host showed him AI-generated images that mimicked his style using a specific prompt (“create an illustration in the style of Asaf Hanuka”). His reaction was inspiring (you can see it here): alongside the shock and complex emotions, Asaf realized that the “genie” was already out of the bottle and there was no turning back the clock on technology. Now the challenge is to move forward and figure out how to harness this technology in a way that also benefits those who created the data in the first place.

Whether this copyright concern resonates with you personally or not, it’s a serious issue for businesses. Companies that generate millions of dollars, legal entities that create visual content to push their business forward, can’t afford to gamble with copyright uncertainty. Whether it’s images for marketing, catalogs, branding, graphics for social media, or videos for promotion and training, these visual assets are crucial for campaigns, investor presentations, customer pitches, and sales support. A business manages and creates thousands of images at any given time. It can’t afford the risk of unclear copyright usage.

So, here’s the real question — why take shortcuts (or worse, steal) when you can do things correctly and make it a win for everyone? And yes, there’s a better path forward.

What did we do at Bria? It’s simple (in hindsight, of course; building it was anything but, so don’t try this at home!).


A Fair Business Model: Inspired by Spotify

So, what did we do? Well, we kind of “Spotify-ed” the visual Gen AI ecosystem, taking inspiration from what Spotify did for music. Remember the days of downloading music for free from sketchy sites, where the only ones making money were the folks running the servers and the ads plastered all over the screen? Spotify flipped the script and brought the profits back to the music creators.

We’ve done something similar. We created an alliance of data contributors, and our business model works like this:

First, we train our models on legal data — provided by our partner image suppliers. Then, we sell these models (and yes, the code is accessible to developers for further development — did we mention that already? 😉). And here’s where the real magic of our solution begins — when these models move into production and companies start generating revenue from them.


The Attribution Model: A Fair Share for Creators

Bria’s agent saves the embedding vector whenever an image is generated. Then, using a model we call the “Attribution Model,” we can trace back and identify which images in the training set contributed the most to creating the new image. Based on how much they contributed to the final result, we pay royalties to the image providers, artists, and creators who made it all possible.

The more the generated image resembles your original work, the more money you earn. If you want me to define “resemble,” I can — but that’s a topic for another blog post about our Attribution Model.
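To make the idea concrete, here is a toy sketch of attribution by embedding similarity. To be clear, this is not Bria’s actual Attribution Model (which is proprietary); the cosine-similarity scoring, the top-k cutoff, and every name here are my own illustrative assumptions about how such a royalty split could work in principle.

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def royalty_shares(generated_emb, training_embs, top_k=2):
    """Score each training image by similarity to the generated image,
    keep the top_k contributors, and normalize their scores into
    royalty shares that sum to 1."""
    scores = {name: max(cosine(generated_emb, emb), 0.0)
              for name, emb in training_embs.items()}
    top = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    total = sum(s for _, s in top) or 1.0
    return {name: s / total for name, s in top}
```

The key property this toy preserves is the one described above: the more a generated image resembles a creator’s work in embedding space, the larger that creator’s share.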


Balancing Innovation with Fairness

We believe in releasing our model to benefit research and want to give back to the community. However, we must do this while protecting the interests of our data providers and respecting the copyright of artists. That’s why we distinguish between the academic community — where we release and share the code for research purposes — and businesses to whom we sell the models for fair commercial use. And by fair, we mean genuinely fair (remember, we pay royalties — unlike many other companies).

If you’ve made it this far — congratulations, you deserve a medal. Or maybe the real prize is that you now understand how Bria works and that it’s possible to bring the concept of respecting copyrights and paying royalties into the world of generative AI. Shocking, right?

Wrapping Up: A Different Way Forward with Bria

I hope you got a good peek into how we’re doing things differently at Bria. Balancing innovation with fairness isn’t easy, but someone has to do it. Stick around for the upcoming blogs, where I’ll show you how to use ControlNet to swap out backgrounds in your images, because who doesn’t love better results?

Even after all the training and improvements, the advanced approach produces much better results than the naive method, but they’re still imperfect. One of our biggest challenges was data leakage, where parts of the foreground object “bleed” into the background, ruining the separation. We also ran into issues with texture consistency and lighting.

Most of these problems were resolved by applying the ideas discussed in this blog within the ControlNet framework, which gave us much finer control over the image generation process. I’ll dive deeper into that in the upcoming blogs ControlNet Deep Dive—A Different POV and How to Modify ControlNet Input: A Step-by-Step Guide for Input Modification. Stay tuned if you want to take your results to the next level!