Image tokens and embeddings
December 3, 2024
Encodings:
- which encodings do tiktoken / openai support
Calculating tokens from images:
- tiktoken doesn't have support for tokenising images
- this library tries to solve that problem - https://github.com/pamelafox/openai-messages-token-helper
- takes the pricing logic presented here https://platform.openai.com/docs/guides/vision#calculating-costs and wraps it into a package
- requires base64-encoded images
- there is also this thread with calculation logic https://community.openai.com/t/how-do-i-calculate-image-tokens-in-gpt4-vision/492318/2
- takes the pricing logic presented here https://platform.openai.com/docs/guides/vision#calculating-costs and wraps it into a package
Calculating token costs:
- https://help.openai.com/en/articles/7127956-how-much-does-gpt-4-cost
- costs differ between prompt tokens and sampled tokens
- https://platform.openai.com/docs/guides/vision#calculating-costs
Dev:
- Mime types
- Base64 encoding
- OpenAI API
- utilities
- use the
stop
andmax_tokens
parameters to avoid running out of tokens
- use the
- inputs
- either a link to the image or the base64
- add the image as part of the
user
context - specify image fidelity
- high
- 512x512 tiles, represented as 170 tokens each
- first scaled to 2048x2048 (if needed)
- second scaled so that shortest side is 768px long
- finally count number of 512x512 tiles the image can be divided into
- low
- 512x512 tile, represented as 85 tokens
- high
- utilities
Image Embeddings
- option one is to generate a text caption of the image and embed that (although this is lossy)
- option two is to directly embed the image using pre-trained models (ie CLIP-based)
Examples: