# desktopkit API — Agent Reference

Base: http://0.0.0.0:8083

Auth: POST /auth/login {"password":"..."} -> {"token":"..."}

All endpoints except /health and /auth/login require the header Authorization: Bearer <token>

## Auth

POST /auth/login
Body: {"password":"..."}
-> {"token":"abc123...","expires_in_s":28800}
Authenticate with Unix password

POST /auth/logout
-> {"status":"ok"}
Invalidate current token

GET /auth/token-info
-> {"valid":true,"remaining_ttl_s":28750}
Check token validity and remaining TTL
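
A minimal client sketch for the auth flow above, using only the Python standard library. The helper names (`build_request`, `send`, `login`) are hypothetical, and sending `Content-Type: application/json` is an assumption; this reference does not state which request headers the server expects beyond `Authorization`.

```python
import json
import urllib.request

BASE_URL = "http://0.0.0.0:8083"  # base address as listed in this reference


def build_request(method, path, token=None, body=None, base_url=BASE_URL):
    """Build (but do not send) a request for a desktopkit endpoint."""
    headers = {"Accept": "application/json"}
    data = None
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        # Assumption: the server accepts JSON bodies with this content type.
        headers["Content-Type"] = "application/json"
    if token is not None:
        # Required on every endpoint except /health and /auth/login.
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(
        base_url + path, data=data, headers=headers, method=method
    )


def send(request, timeout=10):
    """Send a request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def login(password, base_url=BASE_URL):
    """POST /auth/login and return the token string from the response."""
    request = build_request(
        "POST", "/auth/login", body={"password": password}, base_url=base_url
    )
    return send(request)["token"]
```

Typical use would be `token = login("...")` followed by `send(build_request("GET", "/auth/token-info", token=token))`.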

## Screenshot

GET /screenshot ?screen_id=0&cursor_ring=false
-> {"image":"<base64>","width":1920,"height":1080}
Capture full screen

GET /screenshot/region ?x=<integer>&y=<integer>&width=<integer>&height=<integer>
-> {"image":"<base64>","width":400,"height":300}
Capture screen region

GET /screenshot/cursor-position
-> {"x":512,"y":384}
Get current cursor position

GET /screens
-> {"screens":[{"id":0,"width":1920,"height":1080}]}
List available screens
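
A short sketch of building the region query and decoding the base64 `image` field. The function names are hypothetical. The image encoding (PNG, JPEG, or other) is not stated in this reference, so the bytes are returned as-is rather than parsed.

```python
import base64
from urllib.parse import urlencode


def region_path(x, y, width, height):
    """Return the path and query string for GET /screenshot/region."""
    query = urlencode({"x": x, "y": y, "width": width, "height": height})
    return f"/screenshot/region?{query}"


def decode_screenshot(reply):
    """Return (raw_bytes, width, height) from a screenshot response dict."""
    raw = base64.b64decode(reply["image"])
    return raw, reply["width"], reply["height"]
```

The raw bytes can be written to disk with `open(path, "wb")` once the image format is known.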

## Mouse

POST /mouse/move
Body: {"x":100,"y":200}
-> {"status":"ok"}
Move cursor to coordinates

POST /mouse/click
Body: {"button":"left"}
-> {"status":"ok"}
Click at current position

POST /mouse/double-click
Body: {"button":"left"}
-> {"status":"ok"}
Double-click at current position

POST /mouse/scroll
Body: {"x":500,"y":300,"delta_x":0,"delta_y":-3}
-> {"status":"ok"}
Scroll by delta_x/delta_y at the given position

POST /mouse/drag
Body: {"from_x":100,"from_y":200,"to_x":300,"to_y":400,"button":"left"}
-> {"status":"ok"}
Drag from one position to another
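
Because /mouse/click and /mouse/double-click act at the current cursor position, clicking a specific point takes two calls: a move, then a click. A sketch of that sequence follows; `click_at` is a hypothetical helper that only builds the calls and does not send them. Only "left" appears as a button value in this reference, so other values are untested assumptions.

```python
def click_at(x, y, button="left", double=False):
    """Return the (method, path, body) calls that click at (x, y)."""
    click_path = "/mouse/double-click" if double else "/mouse/click"
    return [
        # First position the cursor, since click endpoints take no coordinates.
        ("POST", "/mouse/move", {"x": x, "y": y}),
        ("POST", click_path, {"button": button}),
    ]
```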

## Keyboard

POST /keyboard/type
Body: {"text":"Hello, World!"}
-> {"status":"ok"}
Type text

POST /keyboard/hotkey
Body: {"keys":["ctrl",{"char":"s"}]}
-> {"status":"ok"}
Press key combination
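
The hotkey example mixes two key forms: a plain string for a named key ("ctrl") and an object for a character ({"char":"s"}). A sketch of a body builder under the assumption that single characters always use the object form and longer names always use the string form; the reference shows only this one example, so that rule is an inference. `hotkey_body` is a hypothetical name.

```python
def hotkey_body(*keys):
    """Build a /keyboard/hotkey body from names such as "ctrl" and "s"."""
    body_keys = []
    for key in keys:
        if len(key) == 1:
            # Assumption: single characters are sent as {"char": ...}.
            body_keys.append({"char": key})
        else:
            # Assumption: named keys (modifiers etc.) are sent as strings.
            body_keys.append(key)
    return {"keys": body_keys}
```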

## Clipboard

GET /clipboard
-> {"text":"clipboard content"}
Read clipboard contents

POST /clipboard
Body: {"text":"new content"}
-> {"status":"ok"}
Write to clipboard

## Windows

GET /windows
-> {"windows":[{"id":12345,"title":"Terminal","x":0,"y":0,"width":800,"height":600}]}
List all windows

GET /windows/focused
-> {"id":12345,"title":"Terminal","x":0,"y":0,"width":800,"height":600}
Get focused window

POST /windows/{id}/focus
-> {"status":"ok"}
Focus a window by ID

GET /windows/dialog
-> {"dialog":null}
Detect active dialog/modal
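
A sketch of picking a window from the GET /windows response and computing a point inside it, for example as a target for /mouse/move. Both helper names are hypothetical. It assumes x and y are the top-left corner of the window in screen coordinates, which this reference does not state.

```python
def find_window(windows_reply, title_part):
    """Return the first window whose title contains title_part, else None."""
    for window in windows_reply["windows"]:
        if title_part.lower() in window["title"].lower():
            return window
    return None


def window_center(window):
    """Return the (x, y) center, assuming x/y mark the top-left corner."""
    return (
        window["x"] + window["width"] // 2,
        window["y"] + window["height"] // 2,
    )
```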

## Application

POST /application/open
Body: {"path":"/usr/bin/gedit","args":["file.txt"]}
-> {"status":"ok","pid":12345}
Open an application

GET /application/running ?name=<string>
-> {"running":true,"pids":[12345]}
Check if application is running

POST /application/close
Body: {"window_id":12345}
-> {"status":"ok"}
Close an application window
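
This reference does not say whether /application/open returns before the process is visible to /application/running, so a caller may want to poll. A sketch follows, with the HTTP call injected as `check_running`, a hypothetical callable that takes a name and returns the parsed /application/running response; the timeout and interval defaults are arbitrary.

```python
import time


def wait_until_running(check_running, name, timeout_s=10.0, interval_s=0.5,
                       sleep=time.sleep, clock=time.monotonic):
    """Poll until the application reports running; return its pids or []."""
    deadline = clock() + timeout_s
    while True:
        reply = check_running(name)
        if reply.get("running"):
            return reply["pids"]
        if clock() >= deadline:
            # Timed out without seeing the application.
            return []
        sleep(interval_s)
```

Injecting `sleep` and `clock` keeps the loop testable without real delays.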

## Actions

GET /actions ?limit=100
-> [{"timestamp":"...","action":"mouse.click","params":"...","result":"ok","duration_ms":42}]
Get recent action log
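
A sketch of summarising the action log per action name. `summarize_actions` is a hypothetical helper, and treating any `result` other than "ok" as a failure is an assumption; the reference shows only the "ok" value.

```python
from collections import defaultdict


def summarize_actions(entries):
    """Return per-action counts, failure counts, and total duration in ms."""
    summary = defaultdict(lambda: {"count": 0, "failed": 0, "total_ms": 0})
    for entry in entries:
        stats = summary[entry["action"]]
        stats["count"] += 1
        stats["total_ms"] += entry["duration_ms"]
        if entry["result"] != "ok":
            # Assumption: anything other than "ok" indicates a failure.
            stats["failed"] += 1
    return dict(summary)
```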

## System

GET /health
-> {"status":"ok"}
Health check

