HVACvisor (Beta)

Upload an image to extract its text. Various document types are supported, including handwriting.

Model Sizes:

  • Tiny: Fastest, lower accuracy (512×512)
  • Small: Fast, good accuracy (640×640)
  • Base: Balanced performance (1024×1024)
  • Large: Best accuracy, slower (1280×1280)
  • Gundam (Recommended): Optimized for documents (1024 base, 640 image, crop mode)
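
The size options above can be read as a small table of presets. The following is a minimal sketch, not the app's actual configuration: the names `ModelSize`, `MODEL_SIZES`, `get_model_size`, and the fields `base_size`, `image_size`, and `crop_mode` are hypothetical, and it assumes that the four fixed sizes use the listed resolution for both values with cropping turned off.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSize:
    """Hypothetical preset record; field names are illustrative only."""
    base_size: int    # base resolution in pixels (one side)
    image_size: int   # image resolution in pixels (one side)
    crop_mode: bool   # whether crop mode is used


# Values taken from the list above. For Tiny through Large, using the same
# number for both sizes and crop_mode=False is an assumption.
MODEL_SIZES = {
    "Tiny": ModelSize(512, 512, False),
    "Small": ModelSize(640, 640, False),
    "Base": ModelSize(1024, 1024, False),
    "Large": ModelSize(1280, 1280, False),
    "Gundam": ModelSize(1024, 640, True),
}

DEFAULT_SIZE = "Gundam"  # marked "Recommended" in the list above


def get_model_size(name: str = DEFAULT_SIZE) -> ModelSize:
    """Return the preset for a size name, or raise ValueError if unknown."""
    try:
        return MODEL_SIZES[name]
    except KeyError:
        valid = ", ".join(MODEL_SIZES)
        raise ValueError(f"Unknown model size {name!r}; expected one of: {valid}")


if __name__ == "__main__":
    print(get_model_size())
```

Keeping the presets in one table means a new size needs only a new entry rather than another branch in the code.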

Task Types:

  • Convert to Markdown: Converts the document to structured Markdown, preserving layout
  • OCR this image: Standard OCR with grounding
  • Free OCR: Simple text extraction without layout
  • Parse the figure: Specialized for charts, diagrams, and figures
  • Describe this image in detail: Generates a detailed description of the image
  • Locate item: Finds a specific item named by the user (requires custom input)
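
Of the task types listed, only "Locate item" needs extra text from the user. A minimal sketch of how a request could be checked before processing is shown here; `TASK_TYPES`, `TASKS_REQUIRING_INPUT`, and `validate_task` are hypothetical names used for illustration, not part of the app.

```python
from typing import Optional, Tuple

# Task names copied from the list above.
TASK_TYPES = (
    "Convert to Markdown",
    "OCR this image",
    "Free OCR",
    "Parse the figure",
    "Describe this image in detail",
    "Locate item",
)

# Only this task is documented as requiring custom input.
TASKS_REQUIRING_INPUT = {"Locate item"}


def validate_task(task: str, custom_input: Optional[str] = None) -> Tuple[str, Optional[str]]:
    """Check a task selection and return (task, cleaned custom input or None)."""
    if task not in TASK_TYPES:
        raise ValueError(f"Unknown task type: {task!r}")
    if task in TASKS_REQUIRING_INPUT:
        cleaned = (custom_input or "").strip()
        if not cleaned:
            raise ValueError(f"{task!r} requires custom input")
        return task, cleaned
    # Custom input is ignored for tasks that do not use it.
    return task, None


if __name__ == "__main__":
    print(validate_task("Free OCR"))
    print(validate_task("Locate item", "thermostat label"))
```

Rejecting an empty custom input early avoids running the model on a request it cannot answer.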

Options:

  • Model Size: Select one of the model sizes listed above
  • Task Type: Select the type of processing to perform
  • Plain-text toggle: When enabled, returns plain text only (may be faster). Disable to also get an annotated image and Markdown.