zack

Docker

This post was contributed by Thierry Moreau, co-founder and head of DevRel at OctoAI.

Generative AI models have shown immense potential over the past year with breakthrough models like GPT3.5, DALL-E, and more. In particular, open source foundational models have gained traction among developers and enterprise users who appreciate how customizable, cost-effective, and transparent these models are compared to closed-source alternatives.

In this article, we’ll explore how you can compose an open source foundational model into a streamlined image transformation pipeline that lets you manipulate images with nothing but text to achieve surprisingly good results.

Banner docker and octoml a powerful duo for building creative generative ai apps

With this approach, you can create fun versions of corporate logos, bring your kids’ drawings to life, enrich your product photography, or even remodel your living room (Figure 1).

Figure 1: Examples of image transformation including, from left to right: Generating creative corporate logo, bringing children’s drawings to life, enriching commercial photography, remodeling your living room

Pretty cool, right? Behind the scenes, a lot needs to happen, and we’ll walk step by step through how to reproduce these results yourself. We call the multimodal GenAI pipeline OctoShop as a nod to the popular image editing software.

Feeling inspired to string together some foundational GenAI models? Let’s dive into the technology that makes this possible.

Architecture overview

Let’s look more closely at the open source foundational GenAI models that compose the multimodal pipeline we’re about to build.

Going forward, we’ll use the term “model cocktail” instead of “multimodal GenAI model pipeline,” as it flows a bit better (and sounds tastier, too). A model cocktail is a mix of GenAI models that can process and generate data across multiple modalities: text and images are examples of data modalities across which GenAI models consume and produce data, but the concept can also extend to audio and video (Figure 2).

To build on the analogy of crafting a cocktail (or mocktail, if you prefer), you’ll need to mix ingredients, which, when assembled, are greater than the sum of their individual parts.

2. OctoAI SDK setup

Let’s get started by installing the OctoAI SDK. You’ll use the SDK to invoke the different open source foundational models we’re using, like SDXL and Mistral-7B. You can install through pip:

1 2	# Install the OctoAI SDK !pip install octoai-sdk

In some cases, you may get a message about pip packages being previously imported in the runtime, causing an error. If that’s the case, selecting the Restart Session button at the bottom should take care of the package versioning issues. After this, you should be able to re-run the cell that pip-installs the OctoAI SDK without any

2024-02-24 15:08:51

zack