Image tokens and embeddings

December 3, 2024

Encodings:

Calculating tokens from images:

Calculating token costs:

Dev:

  • Mime types
  • Base64 encoding
  • OpenAI API
    • utilities
      • use the stop and max_tokens parameters to avoid running out of tokens
    • inputs
      • either a link to the image or the base64
      • add the image as part of the user context
      • specify image fidelity
        • high
          • 512x512 tiles, represented as 170 tokens each
          • first scaled to 2048x2048 (if needed)
          • second scaled so that shortest side is 768px long
          • finally count number of 512x512 tiles the image can be divided into
        • low
          • 512x512 tile, represented as 85 tokens

Image Embeddings

  • option one is to generate a text caption of the image and embed that (although this is lossy)
  • option two is to directly embed the image using pre-trained models (ie CLIP-based)

Examples: