Pyramid Flow: Create AI Videos with a User-Friendly GUI Tool – A Hands-on Guide

AI video generation has been creating quite a buzz lately. If you’ve already tried your hand at image generation with tools like MidJourney or Stable Diffusion, you might be wondering what’s next. Well, today we’re taking it up a notch by diving into the world of AI video generation!

Imagine typing in a few words and watching them transform into a moving video, or seeing a single image come to life with motion… sounds like magic, right? Well, this magic is actually free to try, thanks to a tool called “Pyramid Flow.”

What makes this tool particularly exciting is its browser-based GUI interface. Even if you’re not too comfortable with programming, you can get started as long as you have the right environment set up.

Let’s jump in and explore the world of AI video generation with Pyramid Flow!


What is Pyramid Flow?

Pyramid Flow is a groundbreaking AI tool that can generate videos from either text descriptions or static images. Built on a technique called Flow Matching, it’s an efficient video generation model trained exclusively on open-source datasets, and the code and model weights are openly released – so anyone can try it for free.

Here’s what makes it special:

  • High-quality video generation (up to 10 seconds at 768p resolution, 24FPS)
  • Text-to-video generation capabilities
  • Image-to-video transformation
  • User-friendly browser-based GUI
  • Optimized memory usage features

Required Environment

To get started with Pyramid Flow, you’ll need:

  • Python 3.8.10 (recommended version)
  • A PC with an NVIDIA GPU
  • Visual Studio Code (recommended editor)
  • CUDA-compatible GPU drivers

Setting Up Your Environment

In this guide, I’ll walk you through the setup process on Windows using VSCode, which offers a great visual interface for managing your project.

Managing Python Versions

First things first – we need to make sure we’re using the right version of Python (3.8.10). I’m using pyenv for version management, which lets you pin a specific Python version per project without touching your system-wide install.

Let’s check our current Python version:

python --version

In my case, I was running Python 3.10.11, so I needed to switch to 3.8.10.
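If you ever want more detail than `python --version` gives you (handy later, when pyenv and VSCode disagree), a quick Python snippet shows both the version and which interpreter is actually being picked up:

```python
import sys

# Show the interpreter version and the path it runs from, so you can
# tell at a glance whether pyenv's pinned version is the one in use.
print(sys.version.split()[0])   # e.g. 3.8.10 or 3.10.11
print(sys.executable)           # path to the active interpreter
```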

Project Setup

First, let’s grab the project from GitHub:

git clone https://github.com/jy0205/Pyramid-Flow
cd Pyramid-Flow

Now, here’s where I ran into my first challenge. Using pyenv, I tried to set the local Python version:

pyenv local 3.8.10

Pro Tip: If you’re using VSCode like I am, you’ll need to completely close and reopen it for the version change to take effect. I learned this the hard way! After reopening VSCode, double-check your Python version:

python --version

Creating a Virtual Environment

Next up, let’s set up a Python virtual environment. This keeps our project dependencies nice and tidy:

python -m venv venv
venv\Scripts\activate

You’ll know it’s working when you see (venv) appear at the start of your command prompt.
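Beyond the (venv) prefix, you can ask Python itself whether the environment is active. This is a small sketch based on how venv works: inside an activated environment, sys.prefix points at the venv directory while sys.base_prefix still points at the base installation:

```python
import sys

# In an activated virtual environment, sys.prefix differs from
# sys.base_prefix (which still points at the base installation).
in_venv = sys.prefix != sys.base_prefix
print("virtual environment active:", in_venv)
```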

Package Installation

First, let’s update pip to its latest version:

python -m pip install --upgrade pip

Now we have two options for installing the required packages. The most straightforward way is to use the requirements file:

pip install -r requirements.txt

Alternatively, you can install the packages individually:

pip install gradio torch Pillow diffusers huggingface_hub

I tried both methods, and they both work fine. The requirements file is generally recommended as it ensures you get the exact versions that have been tested with the project.

Starting the GUI

After installing the packages, let’s try launching the GUI:

python app.py
diffusion_transformer_768p/config.json: 100%|█████████████████| 465/465 [00:00<00:00, 226kB/s] 
README.md: 100%|█████████████████████████████████████████| 9.38k/9.38k [00:00<00:00, 4.45MB/s] 
diffusion_transformer_image/config.json: 100%|████████████████| 465/465 [00:00<00:00, 233kB/s] 
text_encoder_2/config.json: 100%|█████████████████████████████| 782/782 [00:00<00:00, 391kB/s] 
text_encoder/config.json: 100%|███████████████████████████████| 613/613 [00:00<00:00, 204kB/s] 
(…)t_encoder_2/model.safetensors.index.json: 100%|███████| 19.9k/19.9k [00:00<00:00, 6.63MB/s] 
tokenizer/merges.txt: 100%|████████████████████████████████| 525k/525k [00:00<00:00, 1.21MB/s] 
tokenizer/special_tokens_map.json: 100%|██████████████████████| 588/588 [00:00<00:00, 235kB/s] 
tokenizer/tokenizer_config.json: 100%|████████████████████████| 705/705 [00:00<00:00, 276kB/s] 
tokenizer/vocab.json: 100%|██████████████████████████████| 1.06M/1.06M [00:00<00:00, 1.61MB/s] 
tokenizer_2/special_tokens_map.json: 100%|███████████████| 2.54k/2.54k [00:00<00:00, 1.26MB/s] 
spiece.model: 100%|████████████████████████████████████████| 792k/792k [00:00<00:00, 2.17MB/s] 
tokenizer_2/tokenizer.json: 100%|████████████████████████| 2.42M/2.42M [00:01<00:00, 1.68MB/s] 
tokenizer_2/tokenizer_config.json: 100%|█████████████████| 20.8k/20.8k [00:00<00:00, 5.93MB/s] 
model.safetensors: 100%|███████████████████████████████████| 246M/246M [00:49<00:00, 5.00MB/s] 
diffusion_pytorch_model.bin: 100%|███████████████████████| 1.34G/1.34G [02:03<00:00, 10.9MB/s]
Fetching 24 files:  17%|██████▌                                | 4/24 [02:03<12:53, 38.69s/it] 
diffusion_pytorch_model.safetensors:  18%|██▋            | 1.38G/7.89G [02:02<12:30, 8.66MB/s] 
diffusion_pytorch_model.safetensors:  40%|██████         | 3.16G/7.89G [04:16<04:17, 18.4MB/s]
diffusion_pytorch_model.safetensors:  32%|████▊          | 2.53G/7.89G [04:16<05:16, 16.9MB/s] 
diffusion_pytorch_model.safetensors:  32%|████▊          | 2.55G/7.89G [04:15<15:01, 5.92MB/s] 
model-00001-of-00002.safetensors:  29%|█████▏            | 1.43G/4.99G [02:01<03:59, 14.9MB/s] 
model-00001-of-00002.safetensors:  64%|███████████▌      | 3.22G/4.99G [04:15<02:12, 13.4MB/s] 
model-00002-of-00002.safetensors:  27%|████▉             | 1.24G/4.53G [01:59<06:22, 8.62MB/s] 
model-00002-of-00002.safetensors:  59%|██████████▋       | 2.69G/4.53G [04:14<03:21, 9.12MB/s]

Note: You might encounter this warning message:

[WARNING] CUDA is not available. Proceeding without GPU.

Don’t worry – we’ll address this in the GPU setup section.

GPU Configuration

To utilize your GPU, first check your CUDA version:

nvcc -V

In my case, I had CUDA 12.4 installed. Based on your CUDA version, install the corresponding PyTorch version. For CUDA 12.4:

pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124
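Once the CUDA-enabled build is installed, you can verify that PyTorch actually sees your GPU. This is a quick sanity-check sketch; it degrades gracefully if PyTorch isn’t installed yet:

```python
# Sanity check: does this PyTorch build see a CUDA-capable GPU?
try:
    import torch
    has_cuda = torch.cuda.is_available()
    print("CUDA available:", has_cuda)
    if has_cuda:
        print("GPU:", torch.cuda.get_device_name(0))
except ImportError:
    has_cuda = False
    print("PyTorch is not installed in this environment yet.")
```

If this prints False after installing the cu124 wheel, double-check that you ran the install inside the activated venv.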

Now that we’ve covered the basic setup steps, let me share some of the challenges I encountered along the way. Trust me, knowing these beforehand will save you some headaches…

The Real Challenges I Encountered

While I’ve made the setup process sound straightforward above, I actually hit quite a few roadblocks along the way. Let me share my experience – it might save you some time and frustration!

The VSCode Python Version Puzzle

Here’s a tricky situation I ran into. After setting Python 3.8.10 with pyenv:

python --version
Python 3.10.11

Wait, what? Even after running pyenv local 3.8.10, my Python version wasn’t changing. After some head-scratching and research, I discovered this was actually a VSCode quirk – you need to completely close and reopen VSCode for the version change to take effect. Nobody mentions this in the tutorials!

Detective Work: Finding the Version File

After restarting VSCode, I decided to investigate my project structure:

dir
2024/11/22  16:52    <DIR>          .
2024/11/22  16:51    <DIR>          ..
2024/11/22  16:51             1,446 .gitignore
2024/11/22  16:52                 8 .python-version
2024/11/22  16:51    <DIR>          annotation
2024/11/22  16:51            15,269 app.py
2024/11/22  16:51             5,619 app_multigpu.py
2024/11/22  16:51    <DIR>          assets
2024/11/22  16:51             8,105 causal_video_vae_demo.ipynb
2024/11/22  16:51    <DIR>          dataset
2024/11/22  16:51    <DIR>          diffusion_schedulers
2024/11/22  16:51    <DIR>          docs
2024/11/22  16:51             3,391 image_generation_demo.ipynb
2024/11/22  16:51             4,909 inference_multigpu.py
2024/11/22  16:51             1,086 LICENSE
2024/11/22  16:51    <DIR>          pyramid_dit
2024/11/22  16:51            16,508 README.md
2024/11/22  16:51               406 requirements.txt
2024/11/22  16:51    <DIR>          scripts
2024/11/22  16:51    <DIR>          tools
2024/11/22  16:51    <DIR>          train
2024/11/22  16:51    <DIR>          trainer_misc
2024/11/22  16:51            14,387 utils.py
2024/11/22  16:51             7,052 video_generation_demo.ipynb
2024/11/22  16:51    <DIR>          video_vae

Here’s where I made an interesting discovery – a .python-version file that simply contained “3.8.10”. You can spot this either through VSCode’s explorer or Windows File Explorer.

Running the version check again:

python --version
Python 3.8.10

Finally! The version had switched correctly.
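For the curious, that .python-version file is nothing magical – it’s a plain-text file holding the version string. The sketch below writes and reads a throwaway copy (with a demo filename, so it won’t clobber a real pyenv file) purely for illustration:

```python
from pathlib import Path

# .python-version is just a plain-text file containing a version string.
# A demo copy is written here for illustration, then removed again.
pin = Path(".python-version-demo")
pin.write_text("3.8.10\n")
pinned = pin.read_text().strip()
print("pinned version:", pinned)
pin.unlink()  # clean up the demo file
```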

Another Gotcha: The Missing Package Saga

Just when I thought I was ready to roll:

python app.py

Boom – another error:

Traceback (most recent call last):
  File "app.py", line 3, in <module>
    import gradio as gr
ModuleNotFoundError: No module named 'gradio'
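Rather than discovering missing packages one traceback at a time, you can check them all in one go. The names below mirror the pip packages from this guide (note that Pillow is imported as PIL):

```python
import importlib.util

# A module is missing when importlib can't find a spec for it.
needed = ["gradio", "torch", "PIL", "diffusers", "huggingface_hub"]
missing = [name for name in needed if importlib.util.find_spec(name) is None]
print("missing modules:", missing if missing else "none")
```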

One problem led to another. When I tried to install the missing packages:

pip install gradio torch Pillow diffusers huggingface_hub

I got this warning:

WARNING: You are using pip version 21.1.1; however, version 24.3.1 is available.

Well, might as well do things properly:

python -m pip install --upgrade pip

And for good measure:

pip install -r requirements.txt

Who knew setting up a virtual environment could be such an adventure? But step by step, we got there!

Continuing Our Setup Journey

After overcoming those initial hurdles, we were making progress. But there was still one crucial piece of the puzzle left: getting our GPU ready for action.

The Final Boss: GPU Configuration

When you run the application, you might see this warning:

[WARNING] CUDA is not available. Proceeding without GPU.

Don’t panic! This warning just means PyTorch can’t see your GPU yet. Since video generation requires some serious computational power, getting this right is super important.

Let’s check what CUDA version we’re working with:

nvcc -V

In my setup, I had CUDA 12.4:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Thu_Mar_28_02:30:10_Pacific_Daylight_Time_2024
Cuda compilation tools, release 12.4, V12.4.131
Build cuda_12.4.r12.4/compiler.34097967_0

Time to install the matching PyTorch version:

pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124
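After the install finishes, you can confirm the wheel you got is actually a CUDA build: torch.version.cuda reports the CUDA version the wheel was compiled against, or None for a CPU-only build. A hedged sketch (with a fallback in case PyTorch isn’t installed in the current environment):

```python
# Confirm the installed PyTorch is a CUDA build matching your toolkit.
try:
    import torch
    build = getattr(torch.version, "cuda", None)  # e.g. "12.4" for a cu124 wheel
    print("torch CUDA build:", build)
except ImportError:
    build = None
    print("PyTorch not installed.")
```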

And there we have it! Finally, we’ve got everything we need to start generating some videos. Trust me, all that setup work is about to pay off…

Time to Generate Some Videos!

Now that we’ve got everything set up, let’s dive into the fun part. First, launch the application:

python app.py

When you first run this, you’ll notice it starts downloading some pretty hefty model files. Don’t worry if this takes a while – grab a coffee and let it do its thing. Once the download is complete, Gradio prints a local URL in the terminal; open it in your browser (some setups open it automatically) to reach the interface.

Getting to Know the GUI

The interface is divided into two main tabs:

  1. Text-to-Video
    • Prompt: Where you describe your desired video
    • Duration: Video length (up to 16 frames for 384p, 31 frames for 768p)
    • Guidance Scale: Controls how closely it follows your description
    • Video Guidance Scale: Adjusts the intensity of motion
    • Resolution: Choose between 384p or 768p
  2. Image-to-Video
    • Input Image: Upload your starting image
    • Prompt: Describe how you want the image to animate
    • Other settings: Similar to text-to-video options

Pro Tips for Better Results

Through my experimentation, I’ve learned a few things that might help you:

  • Start with 384p resolution (it’s much easier on your GPU)
  • Be as specific as possible with your prompts – vague descriptions lead to vague results
  • If you’ve got 8GB GPU memory like me, you might hit the “CUDA out of memory” error after a few generations – just refresh your browser if this happens

Real-World Generation Examples

Text-to-Video: My First Attempt

Let’s start with this creative prompt:

A movie trailer featuring the adventures of the 30 year old space man wearing a red wool knitted motorcycle helmet, blue sky, salt desert, cinematic style, shot on 35mm film, vivid colors

Here’s how I configured it:

  • Resolution: 384p (playing it safe for the first try)
  • Duration: 16
  • Guidance Scale: 7.0
  • Video Guidance Scale: 5.0

The results were fascinating! The AI generated a scene featuring an astronaut with what looked like a red knitted helmet, walking across a desert landscape. It really captured that cinematic quality I was hoping for in the prompt.

Image-to-Video: Breathing Life into Still Images

For my next experiment, I tried the sample Great Wall image with this prompt:

FPV flying over the Great Wall

Settings used:

  • Resolution: 384p
  • Duration: 16
  • Video Guidance Scale: 4.0

The transformation was incredible – the static image smoothly transitioned into a dynamic sequence that really did look like drone footage flying over the Great Wall.

A Deep Dive into Memory Usage

Curious about resource consumption, I ran this diagnostic script:

import torch

# Total VRAM on the card, plus how much PyTorch is currently using
# (allocated) and how much it is holding in its cache (reserved)
print(f"Total Memory: {torch.cuda.get_device_properties(0).total_memory / 1024**2:.0f}MB")
print(f"Allocated: {torch.cuda.memory_allocated() / 1024**2:.0f}MB")
print(f"Cached: {torch.cuda.memory_reserved() / 1024**2:.0f}MB")

The results were enlightening:

Total Memory: 8188MB
Allocated: 0MB
Cached: 0MB

Note that Allocated and Cached read 0MB simply because the script ran between generations – the number that matters is the total. With only about 8GB of VRAM to work with, it’s no wonder higher resolutions were pushing my card to its limits!

Tips and Troubleshooting

The GPU Memory Challenge

The most significant limitation I encountered was GPU memory constraints. With my 8GB GPU setup, I faced several challenges:

  • 768p resolution was nearly impossible to work with
  • Even at 384p, I could only generate a few videos before running into memory issues
  • “CUDA out of memory” errors became a familiar sight

Effective Workarounds

After some trial and error, I found these strategies helped:

  1. Stick to 384p resolution for your initial work
  2. Reduce the duration (frame count) when memory gets tight
  3. Refresh your browser when errors occur
  4. Restart the application to clear GPU memory if things get sluggish
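Between full restarts, you can sometimes reclaim memory from within Python. This is a hedged sketch – it only frees memory PyTorch has cached but isn’t using, not memory still held by live objects, so a restart remains the reliable fix:

```python
import gc

# Release Python-side garbage first, then ask PyTorch to return any
# cached (but currently unused) GPU memory to the driver.
gc.collect()
cleared = False
try:
    import torch
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
        cleared = True
except ImportError:
    pass
print("GPU cache cleared:", cleared)
```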

Practical Usage Tips

Here’s how I’ve learned to work efficiently with limited resources:

  • Start with low resolution to test your concepts
  • Once you get the results you like, try bumping up to higher resolution
  • When memory issues occur, take a quick break and restart the application

Final Thoughts

While Pyramid Flow is an incredibly powerful tool, it does require decent GPU specifications to really shine. A GPU with 16GB+ memory would definitely provide a smoother experience.

That said, don’t let hardware limitations discourage you. Even with my modest 8GB setup, I was able to create some truly impressive videos. The key is understanding your system’s limits and working within them.

The world of AI video generation is evolving rapidly, and tools like Pyramid Flow are making it more accessible than ever. Whether you’re a content creator, an AI enthusiast, or just someone curious about the latest tech, it’s an exciting time to dive in and experiment.

Give it a try – you might be surprised at what you can create, even with basic hardware. And remember, today’s limitations are tomorrow’s laughable specs. The future of AI video generation is looking brighter every day!
