What is Kernelsphere
Kernelsphere is a web automation agent. You describe a task in plain English, give it a starting URL, and it works through that task in a real browser, reading pages, filling forms, clicking buttons, handling login screens, and returning a structured result.
Most automation tools are built around selectors and scripts written against a specific version of a page. When the site changes, the script breaks. Kernelsphere reads the page fresh at every step and decides what to do based on what is actually there, so a layout update does not break the task.
Kernelsphere is built for general use. It is not tied to a particular site or workflow: any task you can describe in plain English can be given to it.
How It Works
Every task runs as a sequence of steps. At each step, the agent takes a screenshot of the current page and collects all visible interactive elements: buttons, input fields, links, and dropdowns. Those elements, together with the screenshot and the original task, are sent to Google Gemini. The model returns the next action, specifying which element to interact with and what to do.
The browser carries out that action. Kernelsphere then checks whether the page changed in a meaningful way: the URL updated, the content shifted, or a new dialog appeared. If nothing changed, the agent recognises that the action had no effect and approaches the next step differently. This cycle continues until the task is complete or the step limit is reached. When the target information is on the page, Kernelsphere extracts it, checks it against the original task, and writes the final result.
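The change check described above can be pictured as a comparison of two page snapshots. The following is only a minimal sketch under stated assumptions: `PageSnapshot`, `snapshot`, and `describe_change` are hypothetical names, and hashing the visible text is one plausible way to detect a content shift, not necessarily what Kernelsphere does internally.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class PageSnapshot:
    """Hypothetical summary of a page taken before or after an action."""
    url: str
    content_hash: str
    dialog_open: bool


def snapshot(url: str, visible_text: str, dialog_open: bool = False) -> PageSnapshot:
    # Hash the visible text so two snapshots can be compared cheaply.
    digest = hashlib.sha256(visible_text.encode("utf-8")).hexdigest()
    return PageSnapshot(url=url, content_hash=digest, dialog_open=dialog_open)


def describe_change(before: PageSnapshot, after: PageSnapshot) -> list[str]:
    """Return the kinds of change observed; an empty list means no effect."""
    changes = []
    if before.url != after.url:
        changes.append("url")
    if before.content_hash != after.content_hash:
        changes.append("content")
    if after.dialog_open and not before.dialog_open:
        changes.append("dialog")
    return changes


before = snapshot("https://example.com/product", "Product page")
after = snapshot("https://example.com/product", "Product page\nReturns: 30 days")
print(describe_change(before, after))   # ['content']
print(describe_change(before, before))  # []
```

An empty list is the signal that the action had no effect and the next step should take a different approach.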
Getting stuck in a loop is one of the harder problems in browser automation. If the same action runs on the same element more than once without the page responding, or if several consecutive steps produce no change, a stagnation detector activates. Rather than repeating itself, the agent tries a different element, a different approach, or navigates back and starts from another point.
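As an illustration of the two triggers just described, here is a small sketch of a stagnation detector. The class name, method name, and both thresholds are assumptions made for the example; the text above only says "more than once" and "several consecutive steps".

```python
class StagnationDetector:
    """Illustrative detector; the thresholds are assumed, not documented values."""

    def __init__(self, max_repeats: int = 2, max_idle_steps: int = 3):
        self.max_repeats = max_repeats        # same action + element with no effect
        self.max_idle_steps = max_idle_steps  # consecutive steps with no change
        self._idle_steps = 0
        self._failed_actions: dict[tuple[str, str], int] = {}

    def record(self, action: str, element_id: str, page_changed: bool) -> bool:
        """Record one step; return True when the agent should change strategy."""
        key = (action, element_id)
        if page_changed:
            # Progress was made, so clear the idle counter and this action's history.
            self._idle_steps = 0
            self._failed_actions.pop(key, None)
            return False
        self._idle_steps += 1
        self._failed_actions[key] = self._failed_actions.get(key, 0) + 1
        return (
            self._failed_actions[key] >= self.max_repeats
            or self._idle_steps >= self.max_idle_steps
        )


detector = StagnationDetector()
print(detector.record("click", "submit-btn", page_changed=False))  # False
print(detector.record("click", "submit-btn", page_changed=False))  # True
```

When `record` returns True, the caller would pick a different element, try another approach, or navigate back, as described above.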
Getting Started
Requirements
Kernelsphere requires Python 3.10 or higher, a Google Gemini API key, and Chromium. Chromium is installed through Playwright as part of the setup.
Installation
git clone https://github.com/kernelsphere-ai/kernelsphere.git
cd kernelsphere
pip install -r requirements.txt
playwright install chromium
Configuration
Create a .env file in the project folder. The only required setting to get started is your Gemini API key:
GOOGLE_API_KEY=your_gemini_api_key
Settings for proxy rotation, cloud browser sessions, and login credentials are covered in their respective sections below. None of them are needed for basic use.
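Before a first run it can help to confirm that the required key is actually present. The sketch below is a hypothetical preflight check, not part of Kernelsphere: `parse_env` and `missing_settings` are illustrative names, and the parser handles only simple KEY=value lines.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_settings(values: dict[str, str], required=("GOOGLE_API_KEY",)) -> list[str]:
    """Return the names of required settings that are absent or empty."""
    return [name for name in required if not values.get(name)]


example = "# Gemini credentials\nGOOGLE_API_KEY=your_gemini_api_key\n"
print(missing_settings(parse_env(example)))  # []
print(missing_settings(parse_env("")))       # ['GOOGLE_API_KEY']
```

To check a real file, pass the contents of .env to `parse_env`, for example `parse_env(open(".env").read())`.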
Running Your First Task
python main.py run "Find the return policy on this product page" \
--start-url "https://example.com/product"
When the task finishes, the result is saved to final_output.json. The output format is described in the next section.
Writing Tasks
A task is a plain English instruction. Write it the way you would describe a goal to another person. The more specific you are about what you want back, the more reliable the result.
Consider the difference between these two:
The first gives the agent a clear goal with measurable criteria. The second does not define what a valid result looks like, so the agent has no way to know when it is done.
If the task involves conditions such as a minimum rating, a price ceiling, or a date range, include them in the task text. Kernelsphere parses those conditions and uses them when validating the final result.
Available Flags
| Flag | Default | Description |
|---|---|---|
| --task | required | Task description in plain English |
| --start-url | required | Where the agent begins |
| --max-steps | 30 | Steps before the agent stops |
| --headless | false | Run without a visible browser window |
| --model | gemini-2.0-flash-exp | Gemini model to use |
| --output | final_output.json | Path for the result file |
| --use-browserbase | false | Use a cloud browser with CAPTCHA handling |
| --browserbase-timeout | 600 | Session timeout in seconds, max 21600 |
| --use-proxy | false | Enable proxy rotation |
| --proxy-country | none | Preferred country code, e.g. US or DE |
| --viewport-width | 1280 | Browser width in pixels |
| --viewport-height | 720 | Browser height in pixels |
Output Format
Every completed task produces a JSON file with the answer and a full record of every action taken during the run.
{
  "task": "Find the return policy on this product page",
  "start_url": "https://example.com/product",
  "success": true,
  "total_steps": 5,
  "final": {
    "final_answer": "Returns accepted within 30 days with original receipt.",
    "reasoning": "Found in the Returns section at the bottom of the page"
  },
  "steps": [
    {
      "step": 1,
      "url": "https://example.com/product",
      "actions": [
        {
          "action": "scroll",
          "success": true,
          "dom_changed": true
        }
      ]
    }
  ]
}
The success field is true when the agent found and validated an answer. When it is false, the agent either reached the step limit or could not access the information. The steps array is most useful when a task fails, as it shows exactly what happened at each step.
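For failed runs, a short script can pull the relevant entries out of the steps array. This is a sketch that relies only on the fields shown in the example output; `load_result` and `summarize` are hypothetical helper names, and the sample data below is made up to show a failing action.

```python
import json


def load_result(path: str = "final_output.json") -> dict:
    """Read a result file from disk (the default name comes from the text above)."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def summarize(result: dict) -> dict:
    """Collect the answer plus any actions that failed or had no visible effect."""
    failed, no_effect = [], []
    for step in result.get("steps", []):
        for action in step.get("actions", []):
            label = f"step {step['step']}: {action['action']}"
            if not action.get("success", False):
                failed.append(label)
            elif not action.get("dom_changed", False):
                no_effect.append(label)
    return {
        "success": result.get("success", False),
        "answer": result.get("final", {}).get("final_answer"),
        "failed_actions": failed,
        "no_effect_actions": no_effect,
    }


# A trimmed result in the documented shape, with one made-up failing action.
sample = {
    "success": False,
    "total_steps": 2,
    "final": {"final_answer": None, "reasoning": "Step limit reached"},
    "steps": [
        {"step": 1, "url": "https://example.com/product",
         "actions": [{"action": "scroll", "success": True, "dom_changed": True}]},
        {"step": 2, "url": "https://example.com/product",
         "actions": [{"action": "click", "success": False, "dom_changed": False}]},
    ],
}
print(json.dumps(summarize(sample), indent=2))
```

With a real run, `summarize(load_result())` gives the same overview for final_output.json.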
Authentication
Many tasks involve sites that require a login. Kernelsphere handles this without needing special configuration in most cases.
When the agent reaches a login page, it detects the form, fills in the credentials you provide, and continues with the original task. It confirms the login succeeded by watching for changes on the page after submission. Once authenticated, it will not try to log in again during the same session even if a login form appears later.
Credentials are passed at runtime:
python main.py run "Download my latest invoice" \
--start-url "https://example.com/login" \
--email "you@example.com" \
--password "yourpassword"
For sites that send a verification code to your email after login, Kernelsphere can read that code and enter it automatically when IMAP access is configured. Add the following to your .env file:
OTP_EMAIL=you@example.com
OTP_EMAIL_PASSWORD=your_app_password
OTP_IMAP_SERVER=imap.example.com
The agent polls the inbox every few seconds and waits up to 60 seconds for the code to arrive. If your provider takes longer, extend this with --otp-timeout.
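The polling behaviour can be sketched as a loop with a deadline. Everything named here is an assumption for illustration: `fetch_latest_body` stands in for whatever reads the newest message over IMAP, the code is assumed to be six digits, and the 5-second interval is a guess at "every few seconds". Only the 60-second default comes from the text above.

```python
import re
import time
from typing import Callable, Optional

CODE_PATTERN = re.compile(r"\b(\d{6})\b")  # assumption: codes are 6 digits


def extract_code(body: str) -> Optional[str]:
    """Return the first 6-digit number in the message body, if any."""
    match = CODE_PATTERN.search(body)
    return match.group(1) if match else None


def wait_for_code(
    fetch_latest_body: Callable[[], Optional[str]],
    timeout: float = 60.0,
    interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Optional[str]:
    """Poll until a code arrives or the timeout would be exceeded."""
    deadline = clock() + timeout
    while True:
        body = fetch_latest_body()
        if body:
            code = extract_code(body)
            if code:
                return code
        if clock() + interval > deadline:
            return None  # another wait would run past the deadline
        sleep(interval)


# Fake inbox: empty on the first poll, then the message arrives.
inbox = iter([None, "Your verification code is 482913."])
print(wait_for_code(lambda: next(inbox), sleep=lambda seconds: None))  # 482913
```

Passing `sleep` and `clock` as parameters keeps the loop testable without real waiting.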
CAPTCHA Handling
Every browser session opens with a stealth configuration that reduces the likelihood of being flagged as automated. This covers browser flags, JavaScript properties that detection scripts commonly check, and user agent randomisation. For most sites this is sufficient and no CAPTCHA appears.
For sites that use active CAPTCHA services, Browserbase cloud sessions offer a more reliable path. These run on residential IP addresses with built-in CAPTCHA solving. Add your credentials to .env and pass --use-browserbase when running the task:
BROWSERBASE_API_KEY=your_key
BROWSERBASE_PROJECT_ID=your_project_id
python main.py run "your task" \
--start-url "https://example.com" \
--use-browserbase
Sessions default to a 600-second timeout. For longer tasks this can be extended up to 21600 seconds with --browserbase-timeout.
If you are running locally without Browserbase and a CAPTCHA does appear, the agent pauses and waits up to 120 seconds. During that window you can solve it manually in the visible browser. The wait time is adjustable with --captcha-max-wait.
Proxy Support
Kernelsphere supports proxy rotation per session. This is useful when a task requires a specific geographic location, or when running a large number of tasks against a site that rate-limits by IP address.
Supported providers include ProxyEmpire, Smartproxy, Oxylabs, Webshare, Proxy6, and custom configurations. Add your proxy details to .env:
PROXY_LIST=host1:port1:user1:pass1,host2:port2:user2:pass2
PROXY_PROVIDER=smartproxy
PROXY_TYPE=residential
Enable rotation with --use-proxy at runtime. A preferred country can be set with --proxy-country using a standard code such as US or DE. If no proxies are available for that country, the agent falls back to any healthy proxy in the pool.
The proxy manager tracks performance across sessions. A proxy that fails three times in a row is marked unhealthy and skipped. Pass --enable-proxy-health-check to run this monitoring in the background while tasks are running.
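The selection and health rules above can be illustrated with a small pool. This is a sketch, not the actual proxy manager: the `Proxy` class and helper names are hypothetical, and the `country` attribute is an assumption, since the PROXY_LIST format shown above carries no country information.

```python
from dataclasses import dataclass
from typing import Optional

MAX_CONSECUTIVE_FAILURES = 3  # from the text: three failures in a row


@dataclass
class Proxy:
    host: str
    port: int
    user: str
    password: str
    country: Optional[str] = None  # hypothetical: not part of PROXY_LIST
    consecutive_failures: int = 0

    @property
    def healthy(self) -> bool:
        return self.consecutive_failures < MAX_CONSECUTIVE_FAILURES


def parse_proxy_list(raw: str) -> list[Proxy]:
    """Parse comma-separated host:port:user:pass entries."""
    proxies = []
    for entry in raw.split(","):
        entry = entry.strip()
        if not entry:
            continue
        host, port, user, password = entry.split(":", 3)
        proxies.append(Proxy(host, int(port), user, password))
    return proxies


def record_result(proxy: Proxy, success: bool) -> None:
    # A success resets the streak; a failure extends it.
    proxy.consecutive_failures = 0 if success else proxy.consecutive_failures + 1


def pick_proxy(proxies: list[Proxy], country: Optional[str] = None) -> Optional[Proxy]:
    """Prefer a healthy proxy in the requested country, else any healthy one."""
    healthy = [p for p in proxies if p.healthy]
    preferred = [p for p in healthy if country and p.country == country]
    candidates = preferred or healthy
    return candidates[0] if candidates else None


proxies = parse_proxy_list("host1:8000:user1:pass1,host2:8001:user2:pass2")
proxies[1].country = "DE"
print(pick_proxy(proxies, "DE").host)  # host2
print(pick_proxy(proxies, "US").host)  # host1 (fallback to any healthy proxy)
for _ in range(3):
    record_result(proxies[0], success=False)
print(pick_proxy(proxies, "US").host)  # host2 (host1 is now unhealthy)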
Running at Scale
Multiple tasks can run concurrently using the parallel runner. The default is 3 simultaneous browser sessions. Each session uses roughly 300 to 500 MB of memory, so the practical limit depends on available RAM. Tasks that fail are retried automatically before being marked as failed.
python parallel_runner.py \
--tasks-file data/tasks.jsonl \
--output-dir results \
--concurrency 5 \
--max-steps 30
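The memory figure above gives a rough way to choose a --concurrency value. The helper below is a back-of-the-envelope sketch: `max_sessions` is a hypothetical function, and the 1024 MB reserved for the operating system and the runner itself is an assumption.

```python
def max_sessions(available_mb: int, per_session_mb: int = 500, reserve_mb: int = 1024) -> int:
    """Estimate how many browser sessions fit in memory.

    per_session_mb defaults to the upper end of the 300 to 500 MB range
    mentioned above; reserve_mb is an assumed safety margin.
    """
    usable = available_mb - reserve_mb
    if usable <= 0:
        return 0
    return usable // per_session_mb


print(max_sessions(8192))                      # 14
print(max_sessions(4096, per_session_mb=300))  # 10
```

Using the upper end of the range by default errs toward fewer sessions, which is the safer side when pages are heavy.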
For large task sets, splitting into batches before running is more manageable. Results from multiple runs can be merged into a single file afterward:
# split a large file into batches of 50
python batch_processor.py split data/tasks.jsonl --batch-size 50
# merge results after all batches finish
python batch_processor.py aggregate results/batch_1 results/batch_2 \
--output combined.json
A progress monitor is included for tracking active runs. Progress is written after each completed task, so monitoring can be started partway through and it will reflect the current state:
python progress_monitor.py --mode progress # current snapshot
python progress_monitor.py --mode monitor # live, updates every 10s
python progress_monitor.py --mode summary # final stats after completion
python progress_monitor.py --mode failures # failed tasks with reasons
python progress_monitor.py --mode eta # estimated time remaining
Logs and Task Records
Every task produces a log file with each step taken, the action performed, the element targeted, the reasoning the agent used, and the outcome. These files are organised by site name under the logs/ directory.
A cumulative summary is maintained in task_tracker.json, recording success and failure counts per site across all runs. When a particular site is causing consistent failures, this file surfaces it without needing to go through individual task logs.
Contributing
Kernelsphere is open source under the MIT License. Contributions are welcome. Bug fixes, improvements to extraction logic, better recovery behaviour, new site-specific handling, and documentation updates are all useful directions.
Getting Set Up
Fork the repository on GitHub, then clone your fork and set it up the same way as a regular installation:
git clone https://github.com/your-username/kernelsphere.git
cd kernelsphere
pip install -r requirements.txt
playwright install chromium
Making Changes
Create a branch before starting work:
git checkout -b your-branch-name
Keep each pull request focused on a single fix or feature. It makes review faster and keeps the commit history readable.
Where to Start
Open issues on GitHub are a reasonable starting point. Areas that commonly benefit from contribution include extraction logic, recovery behaviour, site-specific handling, and documentation.
Submitting a Pull Request
Push your branch to your fork and open a pull request against the main branch. Describe what the change does and reference any relevant issue. A short explanation of the problem and how the change addresses it makes the pull request easier to review.