<aside>

This document provides a technical overview of the Image-to-Code API, which converts uploaded UI design images into functional HTML and Tailwind CSS code.

</aside>

Purpose and Scope

<aside>

The Image-to-Code API transforms screenshots of website designs into executable HTML and Tailwind CSS code using comprehensive image analysis and AI-powered code generation. This overview explains the components and the processing pipeline.

</aside>

System Architecture

High-Level Architecture

[Figure: high-level overview]

<aside>

The system implements a client-server architecture with a React frontend that communicates with a FastAPI backend. The backend integrates with external LLM services (Google Gemini and Anthropic Claude) to generate HTML and Tailwind CSS code from images.

</aside>

Core Processing Pipeline

Image-to-Code Conversion Flow

<aside>

Image-to-Code Conversion Pipeline

The pipeline processes an image in four stages: image segmentation, component detection, description generation, and code generation. The system supports both single and long images; a long image is split into segments, and the HTML generated for each segment is then combined in an additional step.

</aside>
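The split between single and long images can be illustrated with a small sketch. It assumes that "long" means taller than a fixed pixel threshold and that segments are cut top to bottom without overlap. The threshold value and the name `segment_bounds` are hypothetical and do not come from the backend.

```python
from typing import List, Tuple

# Hypothetical threshold: the real backend may use a different value or rule.
MAX_SEGMENT_HEIGHT = 2000


def segment_bounds(
    image_height: int, max_height: int = MAX_SEGMENT_HEIGHT
) -> List[Tuple[int, int]]:
    """Return (top, bottom) pixel ranges that cover the image from top to bottom.

    A short image yields a single range, so no HTML combining is needed later.
    """
    if image_height <= 0:
        raise ValueError("image_height must be positive")
    if max_height <= 0:
        raise ValueError("max_height must be positive")

    bounds: List[Tuple[int, int]] = []
    top = 0
    while top < image_height:
        # The last segment may be shorter than max_height.
        bottom = min(top + max_height, image_height)
        bounds.append((top, bottom))
        top = bottom
    return bounds
```

Under these assumptions, an image of height 1500 stays in one piece, while an image of height 4500 is cut into three segments whose outputs would later need combining.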

Pipeline Components and Processing Stages

<aside>

The backend follows a pipeline architecture with these processing stages:

  1. Image Segmentation: Determines whether segmentation is needed based on the image length, and performs segmentation if required.
  2. Component Detection: Identifies UI elements using Gemini.
  3. Description Generation: Creates detailed descriptions of detected components.
  4. Code Generation: Uses LLMs to generate HTML and Tailwind CSS code based on descriptions.
  5. HTML Combining (for multiple image segments): Combines individual HTML outputs using Gemini.

</aside>
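The stages above can be sketched as a small orchestration function. This is an illustration only: the stage functions are passed in as plain callables, the stubs stand in for the real Gemini and Claude calls, and none of the names (`run_pipeline`, `stub_detect`, and so on) are taken from the backend.

```python
from typing import Callable, List


def run_pipeline(
    segments: List[bytes],
    detect: Callable[[bytes], List[str]],
    describe: Callable[[List[str]], str],
    generate: Callable[[str], str],
    combine: Callable[[List[str]], str],
) -> str:
    """Run stages 2 to 4 on each segment, then stage 5 only for multiple segments.

    `segments` is the output of stage 1 (one entry for an image that was not split).
    """
    if not segments:
        raise ValueError("at least one image segment is required")

    html_parts: List[str] = []
    for segment in segments:
        components = detect(segment)             # stage 2: component detection
        description = describe(components)       # stage 3: description generation
        html_parts.append(generate(description))  # stage 4: code generation

    if len(html_parts) == 1:
        return html_parts[0]       # single image: no combining step
    return combine(html_parts)     # stage 5: HTML combining


# Hypothetical stubs so the sketch runs without any LLM service.
def stub_detect(segment: bytes) -> List[str]:
    return ["button"] if segment else []


def stub_describe(components: List[str]) -> str:
    return ", ".join(components)


def stub_generate(description: str) -> str:
    return f'<div class="p-4">{description}</div>'


def stub_combine(parts: List[str]) -> str:
    return "\n".join(parts)
```

Passing the stages as parameters keeps the control flow visible and makes it easy to swap one LLM provider for another in a sketch like this; the actual backend may wire its stages differently.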

API Endpoints

<aside>

The backend exposes two primary endpoints for image-to-code conversion:

</aside>

| Endpoint | HTTP Method | Function | Description |
| --- | --- | --- | --- |
| /convert_image | POST | convert_image() | Processes the image and returns the complete code |
| /stream | POST | convert_image_stream() | Processes the image and streams code chunks in real time |
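The practical difference between the two endpoints shows up on the client side: `/convert_image` yields one complete response, while `/stream` has to be assembled from chunks as they arrive. The sketch below shows one way a client might do that, assuming the chunks are raw UTF-8 bytes. The wire format is not specified in this overview, and `consume_code_stream` is a hypothetical helper, not part of the API.

```python
import codecs
from typing import Callable, Iterable, List, Optional


def consume_code_stream(
    chunks: Iterable[bytes],
    on_text: Optional[Callable[[str], None]] = None,
) -> str:
    """Decode streamed byte chunks incrementally and return the full code.

    An incremental decoder is used because a multi-byte UTF-8 character
    can be split across two chunks.
    """
    decoder = codecs.getincrementaldecoder("utf-8")()
    parts: List[str] = []

    for chunk in chunks:
        text = decoder.decode(chunk)
        if text:
            parts.append(text)
            if on_text is not None:
                on_text(text)  # e.g. update a live preview in the frontend

    # Flush the decoder; raises UnicodeDecodeError if the stream ended mid-character.
    tail = decoder.decode(b"", final=True)
    if tail:
        parts.append(tail)
        if on_text is not None:
            on_text(tail)

    return "".join(parts)
```

Any iterable of byte chunks can be passed in, such as the chunk iterator of an HTTP client library. The optional `on_text` callback receives each decoded piece as soon as it is available, which is what makes a streaming endpoint useful for showing progress.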