Simple Visual Question Answering Example

This example demonstrates how to use the framework for visual question answering (VQA) tasks. The example code can be found in the examples/step1_simpleVQA directory.

   cd examples/step1_simpleVQA

Overview

This example implements a simple Visual Question Answering (VQA) workflow that consists of two main components:

  1. Input Interface
    • Handles user input containing questions about images
    • Processes and manages image data
    • Extracts the user's questions/instructions

  2. Simple VQA Processing
    • Takes the user input and image
    • Analyzes the image based on the user's question
    • Generates appropriate responses to visual queries

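The workflow follows a straightforward sequence: the Input Interface runs first, and the question and image it collects are passed to Simple VQA Processing.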
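As a rough illustration of that sequence, the two stages can be sketched in plain Python. This is not the framework's API: the names VQARequest, input_interface, simple_vqa, run_workflow and answer_fn are hypothetical, and the model call is replaced by a placeholder function.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable


@dataclass
class VQARequest:
    """Hypothetical container for what the input stage hands to the VQA stage."""
    question: str
    image_path: Path


def input_interface(raw_text: str, image_path: str) -> VQARequest:
    """Stage 1: extract the user's question and record which image it refers to."""
    question = raw_text.strip()
    if not question:
        raise ValueError("the user input contains no question")
    return VQARequest(question=question, image_path=Path(image_path))


def simple_vqa(request: VQARequest, answer_fn: Callable[[str, Path], str]) -> str:
    """Stage 2: pass the question and image to a model and return its answer.

    answer_fn stands in for the multimodal LLM call, which this sketch
    does not implement.
    """
    return answer_fn(request.question, request.image_path)


def run_workflow(raw_text: str, image_path: str,
                 answer_fn: Callable[[str, Path], str]) -> str:
    """Run the two stages in order: input first, then VQA processing."""
    return simple_vqa(input_interface(raw_text, image_path), answer_fn)


if __name__ == "__main__":
    # Placeholder model: echoes its inputs instead of analyzing the image.
    def echo_model(question: str, image: Path) -> str:
        return f"(no model attached) question={question!r}, image={image.name}"

    print(run_workflow("  What is in this picture? ", "cat.png", echo_model))
```

Passing the model call in as a function keeps the sketch runnable without an API key; in the real example that role is played by the LLM configured in configs/llms/gpt.yml.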

Prerequisites

  • Python 3.10+
  • Required packages installed (see requirements.txt)
  • Access to OpenAI API or compatible endpoint (see configs/llms/gpt.yml)
  • Redis server running locally or remotely
  • Conductor server running locally or remotely
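Before running the example, it can help to confirm the prerequisites from a short script. The sketch below uses only the Python standard library; the hosts and ports are assumptions (6379 is the Redis default, and 8080 is assumed here for Conductor), so adjust them to your deployment. The helper names python_version_ok and port_is_open are illustrative.

```python
import socket
import sys


def python_version_ok(minimum=(3, 10)) -> bool:
    """Return True if the running interpreter is at least `minimum`."""
    return sys.version_info[:2] >= minimum


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Assumed local defaults; change these to match your own deployment.
    services = {"Redis": ("localhost", 6379), "Conductor": ("localhost", 8080)}
    print(f"Python 3.10+: {'ok' if python_version_ok() else 'too old'}")
    for name, (host, port) in services.items():
        state = "reachable" if port_is_open(host, port) else "not reachable"
        print(f"{name} at {host}:{port}: {state}")
```

An open port only shows that something is listening there; it does not prove the service is healthy or that credentials are correct.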

Configuration

The container.yaml file manages dependencies and settings for the system's components, including Conductor connections, Redis connections, and other service configurations. To set up your configuration:

  1. Generate the container.yaml file:

        python compile_container.py

     This will create a container.yaml file with default settings under examples/step1_simpleVQA.

  2. Configure your LLM settings in configs/llms/gpt.yml:
    • Set your OpenAI API key or compatible endpoint through environment variables or by directly modifying the yml file:

        export custom_openai_key="your_openai_api_key"
        export custom_openai_endpoint="your_openai_endpoint"

    • Configure other model settings, such as temperature, as needed through environment variables or by directly modifying the yml file

  3. Update settings in the generated container.yaml:
    • Modify Redis connection settings:
      • Set the host, port and credentials for your Redis instance
      • Configure both redis_stream_client and redis_stm_client sections
    • Update the Conductor server URL under the conductor_config section
    • Adjust any other component settings as needed
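If you rely on environment variables for the LLM settings, a quick check like the following can catch a missing export before the example starts. It is a minimal sketch: only the two variable names come from this guide, and the helper missing_env_vars is hypothetical. A variable reported as unset is not an error if you wrote the value directly into the yml file instead.

```python
import os

# Variable names taken from the export commands in the configuration steps.
REQUIRED_VARS = ("custom_openai_key", "custom_openai_endpoint")


def missing_env_vars(names=REQUIRED_VARS, environ=None):
    """Return the names that are unset or blank in the given environment."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Not set:", ", ".join(missing))
    else:
        print("Both OpenAI variables are set.")
```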

Running the Example

  1. Run the simple VQA example:

For terminal/CLI usage:

   python run_cli.py

For app/GUI usage:

   python run_app.py

Troubleshooting

If you encounter issues:

  • Verify Redis is running and accessible
  • Check your OpenAI API key is valid
  • Ensure all dependencies are installed correctly
  • Review logs for any error messages
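One way to verify that Redis is running and accessible, without installing a client library, is to send it a PING over a raw socket and inspect the reply. The sketch below assumes Redis listens on localhost:6379 and uses the Redis inline command form; redis_ping and describe_reply are illustrative helper names, not part of the example code.

```python
import socket


def describe_reply(reply: bytes) -> str:
    """Translate the first bytes of a Redis reply into a readable status."""
    if reply.startswith(b"+PONG"):
        return "Redis answered PONG"
    if reply.startswith(b"-NOAUTH"):
        return "Redis is reachable but requires a password"
    if reply.startswith(b"-"):
        return "Redis returned an error: " + reply[1:].decode(errors="replace").strip()
    return "unexpected reply (is this port really Redis?)"


def redis_ping(host: str = "localhost", port: int = 6379, timeout: float = 2.0) -> str:
    """Send an inline PING to host:port and describe what came back."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.sendall(b"PING\r\n")
            return describe_reply(conn.recv(128))
    except OSError as exc:
        return f"cannot reach {host}:{port} ({exc})"


if __name__ == "__main__":
    print(redis_ping())
```

A "requires a password" result means the server is up and the credentials in container.yaml are the next thing to check.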

Building the Example

Coming soon! This section will provide detailed instructions for building and packaging the step1_simpleVQA example step by step.