Open-source browser automation

A Python framework for browser agents.

Kernelsphere helps developers build browser agents that read pages, choose actions, execute through Playwright, and verify progress on real websites.

Use the Gemini API as the LLM provider.

Architecture

Modular browser-agent architecture.

Kernelsphere separates page understanding, planning, element selection, execution, state comparison, and goal validation into focused modules that can be composed or reused in custom automation workflows.

[Figure: Kernelsphere architecture diagram]
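
To make the idea of composable modules concrete, here is a minimal sketch of how separate planning, execution, and validation components might be wired together. The class and method names (`Planner`, `Executor`, `Validator`, `Agent`, `run`) are hypothetical and chosen for illustration only; they are not Kernelsphere's actual API, and the toy implementations stand in for a real model and browser.

```python
from dataclasses import dataclass
from typing import Protocol


class Planner(Protocol):
    def plan(self, goal: str, page_text: str) -> list[str]: ...


class Executor(Protocol):
    def execute(self, step: str) -> str: ...


class Validator(Protocol):
    def is_done(self, goal: str, page_text: str) -> bool: ...


@dataclass
class Agent:
    """Composes independent modules; each one can be swapped on its own."""
    planner: Planner
    executor: Executor
    validator: Validator

    def run(self, goal: str, page_text: str, max_rounds: int = 3) -> bool:
        for _ in range(max_rounds):
            if self.validator.is_done(goal, page_text):
                return True
            # Each executed step returns the new page text.
            for step in self.planner.plan(goal, page_text):
                page_text = self.executor.execute(step)
        return self.validator.is_done(goal, page_text)


# Toy stand-ins (assumptions) so the sketch runs without a browser or a model.
class EchoPlanner:
    def plan(self, goal: str, page_text: str) -> list[str]:
        return [f"search for {goal}"]


class FakeExecutor:
    def execute(self, step: str) -> str:
        return f"results after: {step}"


class ContainsValidator:
    def is_done(self, goal: str, page_text: str) -> bool:
        return goal in page_text


agent = Agent(EchoPlanner(), FakeExecutor(), ContainsValidator())
finished = agent.run("laptop", "blank page")
```

Because `Agent` depends only on the three small interfaces, any one module could be replaced (for example, a different planner) without touching the others.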

How It Works

1. Understand

Capture page context, DOM state, accessibility data, visible elements, and task inputs before acting.
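
As a rough illustration of this step, the sketch below snapshots a few basic page properties before any action is taken. It assumes a page object that behaves like a Playwright sync `Page` (a `url` attribute plus `title()` and `content()` methods). `PageContext`, `capture_context`, and `FakePage` are hypothetical names, and the sketch covers only a small part of the context described above; accessibility data and visible-element detection are left out.

```python
from dataclasses import dataclass, field


@dataclass
class PageContext:
    url: str
    title: str
    html: str
    task_inputs: dict = field(default_factory=dict)


def capture_context(page, task_inputs=None) -> PageContext:
    """Snapshot basic page state before the agent acts.

    `page` is expected to expose `url`, `title()`, and `content()`,
    as a Playwright sync Page does.
    """
    return PageContext(
        url=page.url,
        title=page.title(),
        html=page.content(),
        task_inputs=dict(task_inputs or {}),
    )


class FakePage:
    """Stand-in (assumption) so the sketch runs without launching a browser."""
    url = "https://example.com/"

    def title(self) -> str:
        return "Example Domain"

    def content(self) -> str:
        return "<html><body><h1>Example Domain</h1></body></html>"


ctx = capture_context(FakePage(), {"query": "example"})
```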

2. Plan

Convert the user goal into a sequence of browser steps, combining page context with model reasoning.
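
One way to picture planning is as a function that asks a model for steps and parses the reply into typed records. The sketch below assumes the model returns a JSON list; `Step`, `plan_steps`, and the prompt wording are hypothetical, and `fake_model` replaces a real LLM call so the example runs offline.

```python
import json
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    action: str      # e.g. "goto", "fill", "click"
    target: str      # URL or a description of the element
    value: str = ""  # text to type, if any


def plan_steps(goal: str, page_summary: str,
               complete: Callable[[str], str]) -> list[Step]:
    """Ask a model (any prompt -> text callable) for steps and parse them."""
    prompt = (
        "Return a JSON list of steps, each with keys action, target, value.\n"
        f"Goal: {goal}\nPage: {page_summary}"
    )
    raw = json.loads(complete(prompt))
    return [Step(s["action"], s["target"], s.get("value", "")) for s in raw]


def fake_model(prompt: str) -> str:
    # Assumption: a canned reply standing in for a real model response.
    return json.dumps([
        {"action": "fill", "target": "search box", "value": "usb-c cable"},
        {"action": "click", "target": "search button"},
    ])


steps = plan_steps("find usb-c cable prices", "store home page", fake_model)
```

Passing the model in as a plain callable keeps the planner independent of any particular provider client.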

3. Select and Execute

Match the intended action to the right page element, then run the browser action through Playwright, retrying on failure.
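
The following sketch shows one simple way selection and retried execution could fit together. The word-overlap scoring and the `select_element` and `with_retries` helpers are illustrative assumptions, not the framework's actual selection logic, and the flaky action simulates a browser call that fails twice before succeeding.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def select_element(intent: str, candidates: dict[str, str]) -> str:
    """Pick the selector whose label shares the most words with the intent."""
    wanted = set(intent.lower().split())

    def score(item: tuple[str, str]) -> int:
        _selector, label = item
        return len(wanted & set(label.lower().split()))

    selector, _label = max(candidates.items(), key=score)
    return selector


def with_retries(action: Callable[[], T], attempts: int = 3,
                 delay: float = 0.0) -> T:
    """Run `action`, retrying on any exception up to `attempts` times."""
    last_error = None
    for attempt in range(attempts):
        try:
            return action()
        except Exception as exc:
            last_error = exc
            if attempt < attempts - 1:
                time.sleep(delay * (attempt + 1))  # linear backoff
    raise RuntimeError(f"action failed after {attempts} attempts") from last_error


# Hypothetical selectors mapped to their visible labels.
candidates = {
    "#search-btn": "Search products",
    "#cart": "View cart",
    "#login": "Sign in",
}
chosen = select_element("click the search button", candidates)

calls = {"n": 0}


def flaky_click() -> str:
    # Simulates an element that is not ready on the first two attempts.
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("element not ready")
    return f"clicked {chosen}"


result = with_retries(flaky_click, attempts=3)
```

With a real Playwright page, the action passed to `with_retries` could be something like `lambda: page.click(chosen)`.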

4. Verify

Compare page states, validate goal progress, and loop until the task is complete or a new plan is needed.
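
The verification loop can be sketched as follows, assuming page state is compared by hashing the page text and that an unchanged page after an action signals the need for a new plan. `fingerprint`, `run_until_done`, and the `"done"` / `"replan"` return values are hypothetical choices for illustration.

```python
import hashlib
from typing import Callable


def fingerprint(page_text: str) -> str:
    """Cheap state comparison: equal text gives an equal fingerprint."""
    return hashlib.sha256(page_text.encode("utf-8")).hexdigest()


def run_until_done(act: Callable[[], None],
                   read_page: Callable[[], str],
                   goal_met: Callable[[str], bool],
                   max_steps: int = 5) -> str:
    text = read_page()
    if goal_met(text):
        return "done"
    before = fingerprint(text)
    for _ in range(max_steps):
        act()
        text = read_page()
        if goal_met(text):
            return "done"
        after = fingerprint(text)
        if after == before:
            return "replan"  # the action had no visible effect
        before = after
    return "replan"  # step budget exhausted


# Simulated browsing session: each action advances to the next page.
pages = iter(["home", "results", "product: $12.99"])
state = {"text": next(pages)}


def act() -> None:
    state["text"] = next(pages, state["text"])


outcome = run_until_done(act, lambda: state["text"], lambda t: "$" in t)
stuck = run_until_done(lambda: None, lambda: "same", lambda t: False)
```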

Use Cases

  • Search product prices across multiple retail sites and return structured results.
  • Extract data from pages such as specs, ratings, reviews, paper abstracts, and job listings.
  • Build custom browser agents using modules for planning, selection, execution, and validation.
  • Research and benchmark different approaches to task-driven web automation.

Open Source

Kernelsphere is released as an open-source Python framework. Contributions are welcome across framework modules, docs, examples, tests, and integrations.