imggen

manashmandal/imggen

Use this skill when users want to generate images, edit images, analyze/describe images, generate videos, or extract text from images using OpenAI's APIs. Invoke when users request AI-generated images, image editing, background removal, visual analysis, video generation, artwork, logos, illustrations, visual content from text prompts, or need to extract text/data from images.

SKILL.md

name: imggen
description: Use this skill when users want to generate images, edit images, analyze/describe images, generate videos, or extract text from images using OpenAI's APIs. Invoke when users request AI-generated images, image editing, background removal, visual analysis, video generation, artwork, logos, illustrations, visual content from text prompts, or need to extract text/data from images.
version: 1.2.0
allowed-tools: Bash(imggen:*), Read, Write
model: inherit

imggen - OpenAI Image Generation, Editing, Analysis, Video, and OCR CLI

Generate, edit, and analyze images, generate videos, and extract text from images using OpenAI's APIs.

Overview

imggen is a command-line tool that interfaces with OpenAI's image generation API. It supports multiple models (gpt-image-1.5, gpt-image-1, gpt-image-1-mini, dall-e-3, dall-e-2) and provides options for image size, quality, format, and style. It also supports image editing, vision-based image analysis, video generation, and OCR.

Prerequisites

imggen binary installed and available in PATH
OPENAI_API_KEY environment variable set with a valid OpenAI API key
Sufficient OpenAI API credits for image generation

Usage

imggen [flags] "prompt"

Available Flags

Flag	Short	Default	Description
`--model`	`-m`	`gpt-image-1.5`	Model: gpt-image-1.5, gpt-image-1, gpt-image-1-mini, dall-e-3, dall-e-2
`--size`	`-s`	`1024x1024`	Image dimensions
`--quality`	`-q`	`auto`	Quality level
`--count`	`-n`	`1`	Number of images (1-10 for GPT image models and dall-e-2, 1 for dall-e-3)
`--output`	`-o`	auto-generated	Output filename or directory
`--format`	`-f`	`png`	Output format: png, jpeg, webp
`--style`		`vivid`	Style for dall-e-3: vivid, natural
`--transparent`	`-t`	`false`	Transparent background (GPT image models, png/webp only)
`--compression`			Compression level 0-100 (GPT image models, jpeg/webp only)
`--moderation`		`auto`	Moderation level: auto, low (GPT image models only)
`--prompt`	`-P`		Prompt (can be specified multiple times)
`--parallel`	`-p`	`1`	Number of parallel workers for multiple prompts
`--api-key`		`$OPENAI_API_KEY`	Override API key

Model-Specific Parameters

gpt-image-1.5 (Default, Recommended)

Sizes: 1024x1024, 1536x1024 (landscape), 1024x1536 (portrait), auto
Quality: auto, low, medium, high
Max images: 10 per request
Supports: Transparent backgrounds, multiple output formats, editing

gpt-image-1

Sizes: 1024x1024, 1536x1024 (landscape), 1024x1536 (portrait), auto
Quality: auto, low, medium, high
Max images: 10 per request
Supports: Transparent backgrounds, multiple output formats

gpt-image-1-mini

Sizes: 1024x1024, 1536x1024 (landscape), 1024x1536 (portrait), auto
Quality: auto, low, medium, high
Max images: 10 per request
Supports: Transparent backgrounds, multiple output formats

dall-e-3

Sizes: 1024x1024, 1024x1792, 1792x1024
Quality: standard, hd
Max images: 1 per request
Supports: Style parameter (vivid/natural)

dall-e-2

Sizes: 256x256, 512x512, 1024x1024
Max images: 10 per request
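
The per-model limits above can be checked locally before a request is sent. The following is a minimal sketch that encodes only the sizes and image counts listed in this section; `valid_size` and `max_images` are hypothetical helper names, not part of imggen.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not imggen commands): they mirror the
# "Model-Specific Parameters" lists and nothing else.

# Succeeds when SIZE is listed for MODEL, fails otherwise.
valid_size() {
  local model="$1" size="$2"
  case "$model" in
    gpt-image-1.5|gpt-image-1|gpt-image-1-mini)
      case "$size" in
        1024x1024|1536x1024|1024x1536|auto) return 0 ;;
      esac ;;
    dall-e-3)
      case "$size" in
        1024x1024|1024x1792|1792x1024) return 0 ;;
      esac ;;
    dall-e-2)
      case "$size" in
        256x256|512x512|1024x1024) return 0 ;;
      esac ;;
  esac
  return 1
}

# Prints the documented maximum number of images per request.
max_images() {
  case "$1" in
    dall-e-3) echo 1 ;;
    gpt-image-1.5|gpt-image-1|gpt-image-1-mini|dall-e-2) echo 10 ;;
    *) return 1 ;;
  esac
}
```

A caller could run `valid_size "$model" "$size" || echo "invalid size for $model"` before building the imggen command.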

Instructions

Verify OPENAI_API_KEY is set in the environment
Construct the imggen command with appropriate flags based on user requirements
Execute the command using Bash tool
Report the generated filename and any revised prompt returned by the API
If the user wants to view the image, use Read tool on the generated file
For image editing, use the edit subcommand with the image path and prompt
For background removal, use edit --bg-remove (no prompt needed, output defaults to PNG)
For image analysis, use the describe subcommand with one or more image paths
For video generation, use the video subcommand with a prompt
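
The first step above (verifying the environment) can be sketched as a small preflight check. This is an illustration only; `preflight` is a hypothetical function name, and the error messages are not imggen's own.

```shell
#!/usr/bin/env bash
# Hypothetical preflight check: confirms the two prerequisites the
# skill relies on before any imggen command is run.
preflight() {
  if ! command -v imggen >/dev/null 2>&1; then
    echo "imggen not found in PATH" >&2
    return 1
  fi
  if [ -z "${OPENAI_API_KEY:-}" ]; then
    echo "OPENAI_API_KEY is not set" >&2
    return 1
  fi
}
```

Typical use would be `preflight && imggen -o hero-image.png "website hero banner with gradient"`.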

Output Format

The tool outputs:

Progress message: "Generating N image(s) with MODEL..."
Saved filename: "Saved: filename.png"
Cost information: "Cost: $X.XXXX (N image(s) @ $X.XXXX each)"
Revised prompt (if returned by API): "Revised prompt: ..."
Completion message: "Done!"

Generated files are saved to the current working directory with timestamp-based names (e.g., image-20251216-120000.png) unless --output is specified.
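
To report the filename and cost back to the user, the output lines described above can be picked out with `sed`. This sketch assumes each message is printed on its own line, starting at the first column, in exactly the form shown; `saved_files` and `total_cost` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: both read captured imggen output on stdin.

# Prints the filename from every "Saved: <name>" line.
saved_files() {
  sed -n 's/^Saved: //p'
}

# Prints the dollar amount from a "Cost: $X.XXXX (...)" line.
total_cost() {
  sed -n 's/^Cost: \$\([0-9.]*\).*/\1/p'
}
```

For example, `imggen "a sunset over mountains" | tee run.log | saved_files` would print only the saved filenames while keeping the full log in `run.log`.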

Cost Tracking

All image generation costs are automatically logged to ~/.imggen/sessions.db. View costs using the cost subcommand:

# View total costs
imggen cost

# View today's costs
imggen cost today

# View this week's costs (last 7 days)
imggen cost week

# View this month's costs (last 30 days)
imggen cost month

# View costs by provider
imggen cost provider

Interactive Mode Cost Commands

In interactive mode (imggen -i), use the cost or $ command:

cost today - Today's costs
cost week - This week's costs
cost month - This month's costs
cost total - All-time total
cost provider - Breakdown by provider
cost session - Current session's costs

Database Management

Manage the SQLite database storing sessions and cost data:

# Reset database (delete all data)
imggen db reset

# Reset with backup of old data
imggen db reset --backup

# Show database location and stats
imggen db info

Examples

Basic image generation

imggen "a sunset over mountains"

High-quality landscape with DALL-E 3

imggen -m dall-e-3 -s 1792x1024 -q hd "panoramic view of a futuristic city"

Multiple images with the default model

imggen -n 4 -q high "abstract geometric pattern"

Logo with transparent background

imggen -t -f png "minimalist tech company logo, flat design"

Custom output filename

imggen -o hero-image.png "website hero banner with gradient"

Natural style portrait

imggen -m dall-e-3 --style natural "professional headshot, studio lighting"

Multiple prompts via command line

# Generate multiple images with --prompt flag
imggen --prompt "a sunset" --prompt "a cat" --prompt "a dog" -o ./output

# Short form with parallel processing (3 workers)
imggen -P "sunset" -P "mountains" -P "ocean" -o ./images -p 3

Batch generation from file

# From a text file (one prompt per line)
imggen batch prompts.txt -o ./output

# From a JSON file with per-prompt options
imggen batch prompts.json -o ./output

# With parallel processing
imggen batch prompts.txt -o ./output -p 3

Multiple Prompts

Generate multiple images from command-line prompts using the --prompt/-P flag:

imggen --prompt "a sunset over mountains" --prompt "a cat playing piano" -o ./output

This processes all prompts and saves images to the output directory with indexed filenames:

001-a-sunset-over-mountains.png
002-a-cat-playing-piano.png

Use --parallel/-p to control concurrent processing (default: 1 = sequential).
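
When a script needs to predict the indexed filenames shown above, a slug can be derived from the prompt. The rule below (lowercase, non-alphanumeric runs collapsed to a hyphen, zero-padded index) is an assumption inferred from the two example names, not imggen's documented algorithm; `indexed_name` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: builds "NNN-slug.ext" from an index and a prompt.
# The slug rule is inferred from the examples and may differ from imggen's.
indexed_name() {
  local index="$1" prompt="$2" ext="${3:-png}"
  local slug
  slug=$(printf '%s' "$prompt" \
    | tr '[:upper:]' '[:lower:]' \
    | tr -cs 'a-z0-9' '-' \
    | sed 's/^-//; s/-$//')
  # Pass the index without leading zeros; printf would read "008" as octal.
  printf '%03d-%s.%s\n' "$index" "$slug" "$ext"
}
```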

Batch Generation

Generate multiple images from a file of prompts using the batch subcommand:

imggen batch <input-file> [flags]

Input File Formats

Text file (.txt) - One prompt per line (lines starting with # are ignored):

a sunset over mountains
a cat playing piano
abstract geometric art

JSON file (.json) - Array of objects with optional per-prompt settings:

[
  {"prompt": "a sunset over mountains"},
  {"prompt": "a cat playing piano", "model": "dall-e-3", "quality": "hd"},
  {"prompt": "abstract art", "size": "1792x1024"}
]
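
Before starting a large batch it can help to know how many prompts a text file will produce. A minimal sketch follows; skipping `#` lines matches the description above, while skipping blank lines is an assumption about how imggen treats them. `count_prompts` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: counts the prompts in a .txt batch file.
# Lines starting with "#" are ignored (documented); blank lines are
# also skipped here, which is an assumption.
count_prompts() {
  grep -v -e '^#' -e '^[[:space:]]*$' "$1" | wc -l | tr -d ' '
}
```

The count can then be multiplied by a per-image price to get a rough cost ceiling before running `imggen batch prompts.txt -o ./output`.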

Batch Flags

Flag	Short	Default	Description
`--output`	`-o`	current dir	Output directory
`--model`	`-m`	`gpt-image-1.5`	Default model
`--size`	`-s`	model default	Default image size
`--quality`	`-q`	model default	Default quality level
`--format`	`-f`	`png`	Output format
`--parallel`	`-p`	`1`	Number of parallel workers
`--stop-on-error`		`false`	Stop on first error
`--delay`		`0`	Delay between requests (ms)

Video Generation

Generate videos using OpenAI's Sora API:

Video Usage

imggen video <prompt> [flags]

Video Models (Deprecated -- shuts down Sep 24, 2026)

Model	Duration	Sizes	Cost
sora-2 (default, deprecated)	4, 8, 12 sec	720x1280, 1280x720, 1024x1792, 1792x1024	$0.10/sec
sora-2-pro (deprecated)	4, 8, 12, 16, 20 sec	Above + 1080x1920, 1920x1080	$0.30/sec
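
Because video is billed per second, the cost of a clip is the duration multiplied by the model's rate. The sketch below uses only the rates in the table above; `video_cost` is a hypothetical helper and does not check that the duration is one the model accepts.

```shell
#!/usr/bin/env bash
# Hypothetical helper: estimated USD cost of one video.
# Usage: video_cost MODEL SECONDS
video_cost() {
  local model="$1" seconds="$2" rate
  case "$model" in
    sora-2)     rate=0.10 ;;
    sora-2-pro) rate=0.30 ;;
    *) echo "unknown model: $model" >&2; return 1 ;;
  esac
  awk -v r="$rate" -v s="$seconds" 'BEGIN { printf "%.2f\n", r * s }'
}
```

For instance, `video_cost sora-2 8` prints `0.80` and `video_cost sora-2-pro 20` prints `6.00`.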

Video Flags

Flag	Short	Default	Description
`--model`	`-m`	`sora-2`	Model (sora-2, sora-2-pro)
`--duration`	`-d`	model default	Duration in seconds
`--size`	`-s`	`720x1280`	Video size (e.g., 1280x720)
`--output`	`-o`	auto-generated	Output filename
`--api-key`		`$OPENAI_API_KEY`	Override API key
`--verbose`	`-v`	`false`	Log HTTP requests

Video Examples

# Basic video generation
imggen video "a cat walking on a beach"

# With options
imggen video -m sora-2-pro -d 8 "sunset over mountains"
imggen video -s 1280x720 -o myvideo.mp4 "dancing robot"

Image Editing

Edit existing images with text instructions, inpainting with masks, and background removal.

Edit Usage

imggen edit <image> [prompt] [flags]

Edit Flags

Flag	Short	Default	Description
`--model`	`-m`	`gpt-image-1.5`	Model (gpt-image-1.5, gpt-image-1, gpt-image-1-mini, dall-e-2)
`--size`	`-s`	model default	Output size (e.g., 1024x1024)
`--quality`	`-q`		Quality level (auto, low, medium, high)
`--count`	`-n`	`1`	Number of edit variations
`--output`	`-o`	auto-generated	Output filename or directory
`--format`	`-f`	`png`	Output format (png, jpeg, webp)
`--mask`			Mask image for inpainting (PNG with alpha channel)
`--bg-remove`		`false`	Remove background (no prompt needed)
`--compression`			Compression level 0-100 (GPT image models, jpeg/webp only)
`--moderation`		`auto`	Moderation level: auto, low (GPT image models only)
`--show`	`-S`	`false`	Display result in terminal

Edit Examples

# Basic image editing
imggen edit photo.png "make the sky purple"

# Inpainting with mask
imggen edit photo.png --mask region.png "replace text with ACME"

# Background removal
imggen edit photo.png --bg-remove
imggen edit photo.png --bg-remove -o transparent.png

# Multiple variations
imggen edit photo.png -n 3 "make it look like a painting"
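
Background removal is often wanted for a whole folder rather than one file. The loop below is a sketch that uses only the `edit`, `--bg-remove`, and `-o` options shown above; `bg_remove_all` and the `-transparent` filename suffix are illustrative choices, not imggen behavior.

```shell
#!/usr/bin/env bash
# Hypothetical helper: runs background removal on every PNG in SRC
# and writes the results to DEST.
# Usage: bg_remove_all SRC DEST
bg_remove_all() {
  local src="$1" dest="$2" f name
  mkdir -p "$dest"
  for f in "$src"/*.png; do
    [ -e "$f" ] || continue   # nothing matched the pattern
    name=$(basename "$f" .png)
    imggen edit "$f" --bg-remove -o "$dest/$name-transparent.png" || return 1
  done
}
```

Each file is a separate API request, so the cost scales with the number of images in the folder.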

Image Analysis (Describe)

Analyze and describe images using AI vision. Unlike OCR (text extraction), describe provides general visual understanding: captioning, object identification, chart analysis, visual Q&A, and multi-image comparison.

Describe Usage

imggen describe <image...> [flags]

Describe Flags

Flag	Short	Default	Description
`--model`	`-m`	`gpt-5.2`	Model (gpt-5.2, gpt-5-mini, gpt-5-nano)
`--prompt`	`-p`	auto	Question or instruction about the image
`--output`	`-o`	stdout	Save output to file
`--url`			Image URL instead of file path
`--detail`		`false`	Request detailed analysis

Describe Examples

# Basic image description
imggen describe photo.png

# Ask a specific question
imggen describe photo.png -p "what color is the car?"

# Compare multiple images
imggen describe a.png b.png -p "compare these designs"

# Analyze from URL
imggen describe --url https://example.com/photo.png

# Detailed analysis saved to file
imggen describe photo.png --detail -o analysis.txt

Error Handling

Common errors and solutions:

"API key required": Set the OPENAI_API_KEY environment variable
"invalid size": Use a size supported by the selected model
"supports maximum N images": Reduce the --count value
"does not support --style": Only dall-e-3 supports the --style flag
"does not support --transparent": Only the GPT image models (gpt-image-1.5, gpt-image-1, gpt-image-1-mini) support transparency
"does not support editing": Use a model that supports editing (gpt-image-1.5, gpt-image-1, gpt-image-1-mini, dall-e-2)
"provider does not support vision analysis": Make sure the describe command is using the OpenAI provider

Pricing Reference

Costs per image (USD):

gpt-image-1.5

Size	Low	Medium	High
1024x1024	$0.011	$0.042	$0.167
1536x1024	$0.016	$0.063	$0.250
1024x1536	$0.016	$0.063	$0.250

gpt-image-1

Size	Low	Medium	High
1024x1024	$0.011	$0.042	$0.167
1536x1024	$0.016	$0.063	$0.250
1024x1536	$0.016	$0.063	$0.250

gpt-image-1-mini

Size	Low	Medium	High
1024x1024	$0.005	$0.011	$0.036
1536x1024	$0.006	$0.015	$0.052
1024x1536	$0.006	$0.015	$0.052

dall-e-3

Size	Standard	HD
1024x1024	$0.040	$0.080
1024x1792	$0.080	$0.120
1792x1024	$0.080	$0.120

dall-e-2

Size	Cost
256x256	$0.016
512x512	$0.018
1024x1024	$0.020
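
A request's cost is the per-image price multiplied by the image count, which matches the "N image(s) @ $X.XXXX each" line in the tool's output. The sketch below looks up prices copied from the tables above for two models only (gpt-image-1.5 and gpt-image-1-mini); `price_per_image` and `estimate_cost` are hypothetical helpers, and other models would need their rows added.

```shell
#!/usr/bin/env bash
# Hypothetical helpers. Rows are "model size quality price", copied from
# the pricing tables for gpt-image-1.5 and gpt-image-1-mini only.
price_per_image() {
  awk -v m="$1" -v s="$2" -v q="$3" '
    $1 == m && $2 == s && $3 == q { print $4; found = 1 }
    END { exit !found }
  ' <<'EOF'
gpt-image-1.5 1024x1024 low 0.011
gpt-image-1.5 1024x1024 medium 0.042
gpt-image-1.5 1024x1024 high 0.167
gpt-image-1.5 1536x1024 low 0.016
gpt-image-1.5 1536x1024 medium 0.063
gpt-image-1.5 1536x1024 high 0.250
gpt-image-1.5 1024x1536 low 0.016
gpt-image-1.5 1024x1536 medium 0.063
gpt-image-1.5 1024x1536 high 0.250
gpt-image-1-mini 1024x1024 low 0.005
gpt-image-1-mini 1024x1024 medium 0.011
gpt-image-1-mini 1024x1024 high 0.036
gpt-image-1-mini 1536x1024 low 0.006
gpt-image-1-mini 1536x1024 medium 0.015
gpt-image-1-mini 1536x1024 high 0.052
gpt-image-1-mini 1024x1536 low 0.006
gpt-image-1-mini 1024x1536 medium 0.015
gpt-image-1-mini 1024x1536 high 0.052
EOF
}

# Usage: estimate_cost MODEL SIZE QUALITY COUNT
estimate_cost() {
  local price
  price=$(price_per_image "$1" "$2" "$3") || return 1
  awk -v p="$price" -v n="$4" 'BEGIN { printf "%.3f\n", p * n }'
}
```

For example, `estimate_cost gpt-image-1-mini 1024x1024 low 4` prints `0.020`. The `auto` size and quality settings are not priced in the tables, so the lookup fails for them.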

OCR (Optical Character Recognition)

Extract text from images using OpenAI's vision API with optional structured output support.

OCR Usage

imggen ocr <image-path> [flags]

OCR Flags

Flag	Short	Default	Description
`--model`	`-m`	`gpt-5-mini`	Model: gpt-5.2, gpt-5-mini, gpt-5-nano
`--schema`	`-s`		JSON schema file for structured output
`--schema-name`		`extracted_data`	Name for the JSON schema
`--suggest-schema`		`false`	Suggest a JSON schema based on image content
`--prompt`	`-p`	auto	Custom extraction prompt
`--output`	`-o`	stdout	Output file
`--url`			Image URL instead of file path
`--api-key`		`$OPENAI_API_KEY`	Override API key
`--verbose`	`-v`	`false`	Log HTTP requests and responses

OCR Models

Model	Cost (Input)	Cost (Output)	Best For
gpt-5-nano	$0.05/1M tokens	$0.40/1M tokens	Ultra budget, simple text
gpt-5-mini	$0.25/1M tokens	$2.00/1M tokens	Cost-effective, most OCR tasks
gpt-5.2	$1.75/1M tokens	$14.00/1M tokens	Complex documents, highest accuracy
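
OCR is billed by tokens rather than per image, so the cost of a call is input tokens times the input rate plus output tokens times the output rate, each divided by one million. The sketch below applies the rates from the table above; `ocr_cost` is a hypothetical helper, and the token counts must come from elsewhere (for example, the API usage figures), since this document does not say how imggen reports them.

```shell
#!/usr/bin/env bash
# Hypothetical helper: estimated USD cost of one OCR call.
# Usage: ocr_cost MODEL INPUT_TOKENS OUTPUT_TOKENS
ocr_cost() {
  local model="$1" in_tokens="$2" out_tokens="$3" in_rate out_rate
  case "$model" in
    gpt-5-nano) in_rate=0.05; out_rate=0.40 ;;
    gpt-5-mini) in_rate=0.25; out_rate=2.00 ;;
    gpt-5.2)    in_rate=1.75; out_rate=14.00 ;;
    *) echo "unknown model: $model" >&2; return 1 ;;
  esac
  # Rates are USD per 1M tokens.
  awk -v nin="$in_tokens" -v nout="$out_tokens" \
      -v rin="$in_rate" -v rout="$out_rate" \
      'BEGIN { printf "%.4f\n", (nin * rin + nout * rout) / 1000000 }'
}
```

For instance, `ocr_cost gpt-5.2 2000 500` prints `0.0105`.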

OCR Examples

Basic text extraction

imggen ocr document.png

Extract from URL

imggen ocr --url https://example.com/image.png

Save output to file

imggen ocr receipt.jpg -o extracted.txt

Structured output with JSON schema

# Create a schema file (invoice_schema.json):
# {
#   "type": "object",
#   "properties": {
#     "vendor": {"type": "string"},
#     "date": {"type": "string"},
#     "total": {"type": "number"},
#     "items": {
#       "type": "array",
#       "items": {
#         "type": "object",
#         "properties": {
#           "name": {"type": "string"},
#           "price": {"type": "number"}
#         },
#         "required": ["name", "price"],
#         "additionalProperties": false
#       }
#     }
#   },
#   "required": ["vendor", "date", "total"],
#   "additionalProperties": false
# }

imggen ocr receipt.jpg --schema invoice_schema.json -o invoice.json

Auto-suggest a JSON schema

# Analyze image and suggest appropriate schema
imggen ocr document.png --suggest-schema

# Save suggested schema to file
imggen ocr document.png --suggest-schema -o suggested_schema.json

Use higher accuracy model

imggen ocr complex-document.pdf -m gpt-5.2

Custom extraction prompt

imggen ocr business-card.jpg -p "Extract the name, title, email, and phone number"

OCR Structured Output

When using the --schema flag, the output will be structured JSON matching your schema. This is useful for:

Extracting data from receipts, invoices, forms
Parsing business cards, ID documents
Converting tables and structured content to JSON
Data entry automation

The schema must follow JSON Schema draft-07 format with additionalProperties: false for strict validation.

OCR Tips

Use gpt-5-nano for simple text extraction (plain documents, basic receipts)
Use gpt-5-mini (default) for most OCR tasks (receipts, business cards, forms)
Use gpt-5.2 for complex documents (dense tables, handwriting, multi-language)
Suggest schema first if unsure about document structure
Custom prompts help when you need specific fields or formatting
Supported formats: PNG, JPEG, GIF, WEBP, PDF (first page)

Installation

Option 1: Use slash command in Claude Code

/install-skill https://github.com/manashmandal/imggen

Option 2: Clone to skills directory

# Global (all projects)
git clone https://github.com/manashmandal/imggen ~/.claude/skills/imggen

# Project-specific
git clone https://github.com/manashmandal/imggen .claude/skills/imggen

For Cursor, add the MCP server to .cursor/mcp.json:

{
  "mcpServers": {
    "skillz": {
      "command": "npx",
      "args": ["-y", "skillz-mcp", "https://github.com/manashmandal/imggen"]
    }
  }
}

Restart Cursor after adding the configuration.

Option 1: Use Gemini CLI command

gemini extensions install https://github.com/manashmandal/imggen

Option 2: Clone to extensions directory

git clone https://github.com/manashmandal/imggen ~/.gemini/extensions/imggen
