

How does TeleportHQ's Vision API work in converting hand-drawn wireframes?


TeleportHQ's Vision API converts hand-drawn wireframes into digital, editable UI designs, enabling rapid prototyping and code generation from paper sketches or whiteboard drawings. The process combines machine learning detection with a set of clearly defined drawing conventions, which keep the interpretation of each sketch accurate and unambiguous.

TeleportHQ built the Vision API on the observation that hand-drawing is a natural, intuitive way to structure visual and abstract ideas. The API automates the translation of these hand-drawn wireframes into production-ready code, making it possible to move quickly from a concept to a fully functioning digital prototype or website.

To build the Vision API, TeleportHQ first needed a consistent and annotated dataset of hand-drawn wireframes. They discovered that publicly available images from the open web were insufficient both in quantity and in the consistency of drawing conventions. To address this, the team created their own guidelines and collected wireframes drawn in a controlled manner by team members and friends, forming an initial coherent dataset. Each wireframe was manually annotated to label UI elements such as buttons, text, images, headings, links, inputs, and containers, providing the "ground truth" necessary for machine learning training.
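To make the annotation step concrete, here is a minimal sketch of what one annotated training example might look like. The field names and label set are illustrative assumptions for this article, not TeleportHQ's internal schema.

```typescript
// Hypothetical shape of one annotated wireframe sample; field names are
// illustrative, not TeleportHQ's internal schema.
type ElementLabel =
  | 'button' | 'text' | 'image' | 'heading'
  | 'link' | 'input' | 'container';

interface BoundingBox {
  x: number;      // left edge, in pixels
  y: number;      // top edge, in pixels
  width: number;
  height: number;
}

interface Annotation {
  label: ElementLabel; // the "ground truth" class for this region
  box: BoundingBox;    // where the element sits in the scanned image
}

interface WireframeSample {
  imagePath: string;         // scanned or photographed sketch
  annotations: Annotation[]; // one entry per hand-drawn UI element
}

// Example: a sketch containing a heading, an input, and a button.
const sample: WireframeSample = {
  imagePath: 'sketches/login-form.jpg',
  annotations: [
    { label: 'heading', box: { x: 40, y: 30, width: 300, height: 40 } },
    { label: 'input',   box: { x: 40, y: 100, width: 300, height: 36 } },
    { label: 'button',  box: { x: 40, y: 160, width: 120, height: 40 } },
  ],
};
```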

The annotations had to be carefully managed because many UI elements can visually overlap or appear ambiguous. For instance, it is challenging to distinguish between a button and a text input, or between a header and a paragraph, without clear conventions. This led to the establishment of strict drawing guidelines and conventions that users must follow in their wireframes for the Vision API to perform optimally.

These guidelines specify precise ways to represent common UI elements:

- Headers are prefixed with a hashtag (#).
- Text inputs are drawn as empty, thin, wide rectangles with no text inside.
- Links are enclosed within square brackets.
- Labels are detected only when explicitly associated with inputs.
- Images are represented as rectangles or circles with a cross inside.
- Containers are simple rectangles that can group elements together.
- Checkboxes and radio buttons are drawn as their icons only; labels are separate text elements.
- Text areas are drawn as containers with a small triangle in the bottom-right corner.
- Videos are rectangles with a play icon in the center.

These conventions reduce ambiguity, making it easier for the model to classify wireframe elements accurately.

The core functionality of the Vision API involves detecting bounding boxes around UI components in the wireframe image. The model outputs these bounding boxes with associated classifications, essentially identifying the location and type of each element. These raw detections are then converted into a structured format known as User Interface Definition Language (UIDL), a digital representation of the wireframe suitable for further processing.
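The exact UIDL schema is published by TeleportHQ, but a simplified, hypothetical version of this conversion step is enough to show the idea: flat bounding-box detections become a nested, typed document.

```typescript
// Simplified, hypothetical detection-to-UIDL conversion. The real UIDL
// schema and the Vision API's post-processing are more involved; this only
// illustrates turning flat detections into a nested document.
interface Detection {
  label: string;      // e.g. 'heading', 'input', 'button'
  confidence: number; // model score between 0 and 1
  box: { x: number; y: number; width: number; height: number };
}

interface UIDLNode {
  type: 'element';
  content: {
    elementType: string; // maps roughly to an HTML tag or component
    children?: UIDLNode[];
  };
}

// Assumed mapping from detector classes to element types.
const ELEMENT_TYPES: Record<string, string> = {
  heading: 'heading',
  text: 'text',
  input: 'input',
  button: 'button',
  image: 'image',
  container: 'container',
  link: 'link',
};

function detectionsToUIDL(detections: Detection[], minConfidence = 0.5): UIDLNode {
  const children = detections
    .filter((d) => d.confidence >= minConfidence)
    // Order elements top-to-bottom so the document reads like the sketch.
    .sort((a, b) => a.box.y - b.box.y)
    .map<UIDLNode>((d) => ({
      type: 'element',
      content: { elementType: ELEMENT_TYPES[d.label] ?? 'container' },
    }));

  // Wrap everything in a root container node.
  return { type: 'element', content: { elementType: 'container', children } };
}
```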

After conversion to UIDL, TeleportHQ's code generation engine takes over. This engine uses a combination of rule-based decisions and AI-generated code (such as from ChatGPT) to transform the UIDL into production-ready front-end code: HTML, CSS, and components for JavaScript frameworks. This allows users to quickly get a working prototype or even a deployable website with minimal manual coding.
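As a rough illustration of the rule-based side of that engine, the sketch below walks the simplified UIDLNode tree from the previous example and emits plain HTML. It is a toy pass under those same assumptions, not TeleportHQ's actual generator, which targets several frameworks and applies far more rules (styling, naming, props, and so on).

```typescript
// Toy rule-based pass from the simplified UIDL shape above to plain HTML.
// Reuses the UIDLNode type from the previous sketch.
const TAGS: Record<string, string> = {
  container: 'div',
  heading: 'h1',
  text: 'p',
  button: 'button',
  input: 'input',
  image: 'img',
  link: 'a',
};

function uidlToHTML(node: UIDLNode, indent = 0): string {
  const tag = TAGS[node.content.elementType] ?? 'div';
  const pad = '  '.repeat(indent);

  // Void elements such as <input> and <img> have no closing tag.
  if (tag === 'input' || tag === 'img') {
    return `${pad}<${tag} />`;
  }

  const children = (node.content.children ?? [])
    .map((child) => uidlToHTML(child, indent + 1))
    .join('\n');

  return children
    ? `${pad}<${tag}>\n${children}\n${pad}</${tag}>`
    : `${pad}<${tag}></${tag}>`;
}

// Usage: generate markup for detections converted earlier, e.g.
// console.log(uidlToHTML(detectionsToUIDL(detections)));
```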

The Vision API uses machine learning models trained on a curated dataset to detect wireframe elements. Initial versions had moderate accuracy (around 40-55%), primarily due to insufficient data and ambiguity in the drawings. However, through the use of synthetic data augmentation and more consistent annotation rules, the model's performance improved significantly, with some versions achieving up to 85% accuracy on internal test sets.
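Synthetic data augmentation of this kind can be illustrated, in a deliberately simplified way, by programmatically generating random layout annotations that respect the drawing conventions; in practice such layouts would also need to be rendered in a hand-drawn style before being fed to the detector. The helper below is hypothetical and only shows the layout-generation idea.

```typescript
// Hypothetical layout-level synthetic data generation: randomly stack a few
// known element types on a canvas and emit their ground-truth annotations.
const LABELS = ['heading', 'text', 'input', 'button', 'image'] as const;

interface SyntheticAnnotation {
  label: (typeof LABELS)[number];
  box: { x: number; y: number; width: number; height: number };
}

function randomInt(min: number, max: number): number {
  return Math.floor(Math.random() * (max - min + 1)) + min;
}

function generateSyntheticSample(canvasWidth = 800, canvasHeight = 600): SyntheticAnnotation[] {
  const count = randomInt(3, 8);
  const annotations: SyntheticAnnotation[] = [];
  let y = randomInt(20, 60); // stack elements vertically with random gaps

  for (let i = 0; i < count && y < canvasHeight - 60; i++) {
    const label = LABELS[randomInt(0, LABELS.length - 1)];
    const height = label === 'image' ? randomInt(100, 200) : randomInt(30, 60);
    const width = randomInt(150, canvasWidth - 80);
    annotations.push({ label, box: { x: randomInt(20, 60), y, width, height } });
    y += height + randomInt(20, 50); // keep clear spacing between elements
  }
  return annotations;
}
```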

Despite improvements, the model still faces challenges, particularly with wireframes containing many overlapping or complex elements, as well as rare UI components such as sliders or rating controls. TeleportHQ is continuously working on expanding the range of recognized elements and improving detection accuracy.

Users of the Vision API must draw wireframes on a clean, white surface, using sufficient spacing between elements and consistent drawing styles as per the guidelines. The API works best when elements are logically grouped and clearly separated, which helps to reduce detection errors and ensures the generated UIDL accurately reflects the intended UI structure.

The Vision API is integrated into TeleportHQ's overall platform, which provides a low-code visual editor for modifying and customizing the generated wireframes and code. This allows designers and developers to iterate quickly on their prototypes and seamlessly move from sketch to web-ready design. The platform also enables real-time collaboration, code export, and integration with other tools like Figma.

In summary, TeleportHQ's Vision API converts hand-drawn wireframes into digital UI definitions through a sophisticated pipeline of machine learning detection based on strict drawing conventions, annotation, UIDL conversion, and AI-driven code generation. This process bridges the gap between analog ideation and digital implementation, streamlining the journey from sketches on paper to dynamic, editable web prototypes and code. This innovation reduces the effort and technical barrier traditionally involved in translating design concepts into working applications, enabling rapid prototyping and enhancing productivity for designers and developers alike.