Open-source browser automation
A Python framework for browser agents.
Kernelsphere helps developers build browser agents that read pages, choose actions, execute them through Playwright, and verify progress on real websites.
Kernelsphere uses the Gemini API as its LLM provider.
Architecture
Modular browser-agent architecture.
Kernelsphere separates page understanding, planning, element selection, execution, state comparison, and goal validation into focused modules that can be composed or reused in custom automation workflows.
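As a rough illustration of what composing focused modules could look like, the sketch below defines two small interfaces and plugs toy implementations into an agent. Every name here (Step, Planner, Selector, KeywordPlanner, LabelSelector, Agent) is hypothetical and chosen for illustration; none of it is taken from Kernelsphere's actual API.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Step:
    """One planned browser action (hypothetical shape, not Kernelsphere's)."""
    action: str        # e.g. "fill" or "click"
    target: str        # human-readable label of the element to act on
    value: str = ""    # text to type, if any


class Planner(Protocol):
    def plan(self, goal: str, page_text: str) -> list[Step]: ...


class Selector(Protocol):
    def select(self, step: Step, elements: dict[str, str]) -> str | None: ...


class KeywordPlanner:
    """Toy planner: if the page mentions search, plan a fill and a click."""

    def plan(self, goal: str, page_text: str) -> list[Step]:
        if "search" in page_text.lower():
            return [Step("fill", "search box", goal), Step("click", "search button")]
        return []


class LabelSelector:
    """Toy selector: match a step's target against element labels, ignoring case."""

    def select(self, step: Step, elements: dict[str, str]) -> str | None:
        for label, css in elements.items():
            if label.lower() == step.target.lower():
                return css
        return None


@dataclass
class Agent:
    """Composes a planner and a selector; either one can be swapped out."""
    planner: Planner
    selector: Selector

    def resolve(self, goal: str, page_text: str, elements: dict[str, str]) -> list[tuple[Step, str]]:
        """Return (step, css_selector) pairs ready to hand to an executor."""
        resolved = []
        for step in self.planner.plan(goal, page_text):
            css = self.selector.select(step, elements)
            if css is None:
                raise LookupError(f"no element matches {step.target!r}")
            resolved.append((step, css))
        return resolved


agent = Agent(KeywordPlanner(), LabelSelector())
pairs = agent.resolve(
    "usb-c hub",
    "Search our catalog",
    {"Search box": "#q", "Search button": "button[type=submit]"},
)
```

Because the agent depends only on the two interfaces, a model-backed planner or an accessibility-tree selector could replace the toy classes without changing the agent itself.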

How It Works
1. Understand
Capture page context, DOM state, accessibility data, visible elements, and task inputs before acting.
2. Plan
Convert the user goal into browser steps using task planning, page context, and model reasoning.
3. Select and Execute
Match intent to the right page element, then run browser actions through Playwright with retries.
4. Verify
Compare page states, validate goal progress, and loop until the task is complete or a new plan is needed.
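The four steps above can be sketched as a small control loop. This is a minimal illustration under stated assumptions, not Kernelsphere's implementation: FakePage is an in-memory stand-in for a browser page, the built-in TimeoutError stands in for a browser timeout, and with_retries, diff_states, and run_task are hypothetical helper names. In a real agent, the fill call would go through Playwright, for example page.locator(selector).fill(value).

```python
from __future__ import annotations

import time
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class FakePage:
    """In-memory stand-in for a browser page (not a Playwright object)."""
    fields: dict[str, str] = field(default_factory=dict)
    url: str = "https://example.test/"
    failures_left: int = 1  # simulate one transient failure before success

    def snapshot(self) -> dict[str, str]:
        """Return a plain dict describing the current page state."""
        return {"url": self.url, **self.fields}

    def fill(self, name: str, value: str) -> None:
        if self.failures_left > 0:
            self.failures_left -= 1
            raise TimeoutError(f"element {name!r} not ready")
        self.fields[name] = value


def with_retries(action: Callable[[], None], attempts: int = 3, delay: float = 0.0) -> int:
    """Run action, retrying on TimeoutError; return the number of attempts used."""
    for attempt in range(1, attempts + 1):
        try:
            action()
            return attempt
        except TimeoutError:
            if attempt == attempts:
                raise
            time.sleep(delay)
    raise ValueError("attempts must be at least 1")


def diff_states(before: dict, after: dict) -> dict:
    """Map each changed key to its (old, new) values."""
    keys = before.keys() | after.keys()
    return {k: (before.get(k), after.get(k)) for k in keys if before.get(k) != after.get(k)}


def run_task(page: FakePage, goal: dict[str, str], max_rounds: int = 5) -> bool:
    """Loop until every goal field holds its target value or rounds run out."""
    for _ in range(max_rounds):
        state = page.snapshot()                                   # 1. Understand
        pending = [(k, v) for k, v in goal.items() if state.get(k) != v]  # 2. Plan
        if not pending:
            return True                                           # goal validated
        name, value = pending[0]
        with_retries(lambda: page.fill(name, value))              # 3. Select and execute
        if not diff_states(state, page.snapshot()):               # 4. Verify
            break  # no visible effect; a real agent would re-plan here
    return all(page.snapshot().get(k) == v for k, v in goal.items())


page = FakePage()
done = run_task(page, {"email": "a@b.c", "name": "Ada"})
```

The first fill fails once and succeeds on retry, so the loop still reaches the goal. Comparing snapshots before and after each action is what lets the loop detect an action that had no effect.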
Use Cases
- Search product prices across multiple retail sites and return structured results.
- Extract data such as specs, ratings, reviews, paper abstracts, and job listings from web pages.
- Build custom browser agents using modules for planning, selection, execution, and validation.
- Research and benchmark different approaches to task-driven web automation.
Open Source
Kernelsphere is released as an open-source Python framework. Contributions are welcome across framework modules, docs, examples, tests, and integrations.