Browser Agent Integration Documentation
Overview
The Browser Agent connects to an external API hosted by Prosus that provides web browsing and automation capabilities. This agent can navigate websites, interact with web pages, extract information, and perform automated browser tasks. It's designed for long-running tasks that require web automation.
Unlike the Native Agent, which runs client-side, the Browser Agent communicates with a remote REST API service that performs the browser automation using headless browsers or browser automation frameworks.
Architecture
High-Level Architecture
Key Characteristics
- External API: Connects to Prosus-hosted API endpoint
- Browser Automation: Uses headless browsers or automation frameworks
- Polling-Based: Uses polling mode for long-running tasks
- Task Management: Implements task initialization and status polling
- Timeout Handling: Maximum 10-minute timeout for task completion
- Streaming Support: Also supports streaming mode for quick responses
API Endpoints
The Browser Agent uses predefined endpoints hosted by Prosus:
Base URLs
- Non-Streaming: https://mmwnmruxd7.eu-west-1.awsapprunner.com/chat
- Streaming: https://mmwnmruxd7.eu-west-1.awsapprunner.com/chat/stream
- Polling: https://mmwnmruxd7.eu-west-1.awsapprunner.com/ (base URL)
Note: These URLs are hardcoded and cannot be modified by users. They are defined in lib/database/repositories/agent-config.repository.ts.
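The polling endpoints are formed by appending a path to the base URL ({pollingUrl}chat, {pollingUrl}{statusUrl}). Plain string concatenation depends on whether the base ends with "/" and whether the path starts with one. The following is a minimal sketch of a join that tolerates either form. joinUrl is a hypothetical helper, not a function from the codebase.

```typescript
// Hypothetical helper (not from the codebase): joins a base URL and a relative
// path with exactly one "/" at the seam, leaving the "//" after the scheme intact.
function joinUrl(base: string, path: string): string {
  const trimmedBase = base.replace(/\/+$/, ""); // drop trailing slashes
  const trimmedPath = path.replace(/^\/+/, ""); // drop leading slashes
  return `${trimmedBase}/${trimmedPath}`;
}

const POLLING_URL = "https://mmwnmruxd7.eu-west-1.awsapprunner.com/";

// Task initialization and status endpoints built from the polling base URL.
const initUrl = joinUrl(POLLING_URL, "chat");
const statusUrl = joinUrl(POLLING_URL, "/task/status/task-abc123");
console.log(initUrl); // https://mmwnmruxd7.eu-west-1.awsapprunner.com/chat
console.log(statusUrl); // https://mmwnmruxd7.eu-west-1.awsapprunner.com/task/status/task-abc123
```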
Communication Modes
The Browser Agent supports three communication modes, with polling being the primary mode for long-running tasks:
1. Polling Mode (Primary)
Endpoint: POST {pollingUrl}chat → GET {pollingUrl}{statusUrl}
Behavior:
- Initiates a task and returns a task ID and status URL
- Polls the status endpoint every 30 seconds
- Continues until task completes, fails, or times out (10 minutes max)
- Best for long-running browser automation tasks
Implementation: hooks/use-chat-polling-agent.ts - handlePollingAgentResponse()
2. Streaming Mode
Endpoint: /chat/stream
Behavior:
- Returns Server-Sent Events (SSE) format
- Content is streamed incrementally
- Used for quick browser tasks that complete immediately
Implementation: hooks/use-chat-custom-agent.ts - handleCustomAgentStreamingResponse()
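The event payload format is documented under Multi-Agent and is not reproduced here. The SSE wire format itself is simple: events are separated by a blank line, and payload lines are prefixed with "data:". A minimal parser for that format could look like the sketch below. parseSseEvents is a hypothetical name, and the sketch ignores the event:, id: and retry: fields.

```typescript
// Hypothetical helper: extracts the data payloads of complete SSE events from a
// text buffer. Returns the payloads plus any trailing partial event, which the
// caller should prepend to the next network chunk. Assumes a chunk boundary
// never splits a CRLF pair.
function parseSseEvents(buffer: string): { events: string[]; rest: string } {
  const normalized = buffer.replace(/\r\n?/g, "\n");
  const blocks = normalized.split("\n\n");
  const rest = blocks.pop() ?? ""; // the last block may be incomplete
  const events: string[] = [];

  for (const block of blocks) {
    const dataLines: string[] = [];
    for (const line of block.split("\n")) {
      if (line.startsWith(":")) continue; // comment / keep-alive line
      if (line === "data" || line.startsWith("data:")) {
        // Per the SSE spec, a single leading space after the colon is dropped.
        const value = line.slice(5);
        dataLines.push(value.startsWith(" ") ? value.slice(1) : value);
      }
    }
    // Multiple data lines in one event are joined with newlines.
    if (dataLines.length > 0) events.push(dataLines.join("\n"));
  }
  return { events, rest };
}
```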
3. Non-Streaming Mode
Endpoint: /chat
Behavior:
- Returns complete response in single JSON object
- Used for quick browser tasks
Implementation: hooks/use-chat-custom-agent.ts - handleCustomAgentNonStreamingResponse()
Polling Mode API
Initialize Task
Endpoint: POST {pollingUrl}chat
Headers:
| Header | Value / Description | Required |
|---|---|---|
| Content-Type | application/json | Yes |
| x-api-key | Internal API key for authentication | Yes |
| x-prosusai-user-email | User's email address | Optional |
Request Body:
typescript
{
message: string; // User's message/task description
history: HistoryMessage[]; // Chat history
}
Response:
typescript
{
taskId: string; // Unique task identifier
status: "pending" | "running";
message?: string; // Optional status message
statusUrl: string; // URL to poll for task status
}
Note: If statusUrl is not present in the response, the task completed immediately and the response contains the final result.
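The note above means the initialization response has two shapes. A type guard lets the caller branch on them without the (initResponse as any) casts used in Example 1. This is a sketch. The immediate-result shape is inferred from how Example 1 reads content and products, and Product is a placeholder for the app's real type.

```typescript
type Product = unknown; // placeholder for the app's Product type

// Task accepted and still running: poll statusUrl.
interface TaskStartedResponse {
  taskId: string;
  status: "pending" | "running";
  message?: string;
  statusUrl: string;
}

// Task finished immediately: no statusUrl, result fields inline (inferred shape).
interface ImmediateResultResponse {
  content?: string;
  products?: Product[];
}

type InitResponse = TaskStartedResponse | ImmediateResultResponse;

// A non-empty statusUrl means the task is still in progress.
function hasStatusUrl(res: InitResponse): res is TaskStartedResponse {
  return "statusUrl" in res && typeof res.statusUrl === "string" && res.statusUrl.length > 0;
}

// Hypothetical helper showing both branches.
function nextAction(res: InitResponse): string {
  if (hasStatusUrl(res)) {
    return `poll ${res.statusUrl}`;
  }
  return res.content ?? "Task completed.";
}
```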
Poll Task Status
Endpoint: GET {pollingUrl}{statusUrl}
Headers:
| Header | Description | Required |
|---|---|---|
| x-api-key | Internal API key for authentication | Yes |
Response:
typescript
{
taskId: string;
status: "pending" | "running" | "complete" | "failed";
createdAt?: string; // ISO timestamp
content?: string; // Final response content (when complete)
products?: Product[]; // Optional product items
}
Polling Configuration
- Polling Interval: 30 seconds
- Maximum Attempts: 20 attempts
- Total Timeout: 10 minutes (20 × 30s)
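The total timeout follows directly from the two constants: 20 × 30 s = 600 s = 10 minutes. The sketch below makes that arithmetic explicit. It also adds a hypothetical attemptsForBudget helper for deriving an attempt count from a different time budget. That helper is not part of the codebase.

```typescript
// Polling constants as documented above.
const POLLING_INTERVAL_MS = 30_000; // 30 seconds between polls
const MAX_ATTEMPTS = 20;

// Example 1 sleeps before each poll, so the last status request is sent about
// MAX_ATTEMPTS * POLLING_INTERVAL_MS after initialization (plus request latency).
const TOTAL_TIMEOUT_MS = MAX_ATTEMPTS * POLLING_INTERVAL_MS; // 600_000 ms = 10 min

// Hypothetical helper: derive an attempt count from a desired time budget,
// rounding up so the budget is never cut short.
function attemptsForBudget(budgetMs: number, intervalMs: number): number {
  if (intervalMs <= 0) throw new RangeError("intervalMs must be positive");
  return Math.ceil(budgetMs / intervalMs);
}
```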
Request Format
Polling Mode Request
bash
curl -X POST https://mmwnmruxd7.eu-west-1.awsapprunner.com/chat \
-H "Content-Type: application/json" \
-H "x-api-key: YOUR_API_KEY" \
-H "x-prosusai-user-email: user@example.com" \
-d '{
"message": "Navigate to example.com and extract the main heading",
"history": [
{
"role": "user",
"content": "Can you help me browse the web?"
},
{
"role": "assistant",
"content": "Sure! What would you like me to do?"
}
]
}'
Response:
json
{
"taskId": "task-abc123",
"status": "pending",
"statusUrl": "task/status/task-abc123"
}
Polling Status Request
bash
curl -X GET https://mmwnmruxd7.eu-west-1.awsapprunner.com/task/status/task-abc123 \
-H "x-api-key: YOUR_API_KEY"
Response (In Progress):
json
{
"taskId": "task-abc123",
"status": "running",
"createdAt": "2024-01-15T10:30:00Z"
}
Response (Complete):
json
{
"taskId": "task-abc123",
"status": "complete",
"content": "The main heading on example.com is: 'Welcome to Example'",
"products": []
}
Streaming Mode Request
Same format as Multi-Agent streaming requests. See Multi-Agent for details.
Response Format
Polling Mode Response
Task Initialization Response:
typescript
{
taskId: string;
status: "pending" | "running";
message?: string;
statusUrl: string;
}
Task Status Response:
typescript
{
taskId: string;
status: "pending" | "running" | "complete" | "failed";
createdAt?: string;
content?: string; // Available when status is "complete"
products?: Product[]; // Optional product items
}
Streaming Mode Response
Same SSE format as Multi-Agent. See Multi-Agent for details.
Integration Flow
Polling Mode Flow
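The polling flow has two phases. First the task is initialized. Then the client sleeps and polls until the status is terminal or the attempt limit is reached. That loop can be separated from React state and fetch by injecting the status fetcher and the sleep function. This is a minimal sketch under that assumption. pollUntilDone and PollOptions are hypothetical names, and the real handler in hooks/use-chat-polling-agent.ts also updates the chat UI.

```typescript
type TaskStatus = "pending" | "running" | "complete" | "failed";

interface TaskStatusResponse {
  taskId: string;
  status: TaskStatus;
  createdAt?: string;
  content?: string;
  products?: unknown[];
}

interface PollOptions {
  intervalMs: number;
  maxAttempts: number;
  // Injected so the loop can run without a network or real timers.
  fetchStatus: () => Promise<TaskStatusResponse>;
  sleep: (ms: number) => Promise<void>;
}

// Sleep, poll, and stop on a terminal status or when attempts run out.
async function pollUntilDone(opts: PollOptions): Promise<TaskStatusResponse> {
  for (let attempt = 1; attempt <= opts.maxAttempts; attempt++) {
    await opts.sleep(opts.intervalMs);
    const res = await opts.fetchStatus();
    if (res.status === "complete") return res;
    if (res.status === "failed") throw new Error("Task failed on server");
    // "pending" or "running": keep polling
  }
  throw new Error(`Polling timeout after ${opts.maxAttempts} attempts`);
}
```

Keeping timing and transport injectable means the 10-minute limit can be exercised in a unit test without actually waiting.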
Code Examples
Example 1: Polling Mode Implementation
typescript
// Inside handlePollingAgentResponse
export async function handlePollingAgentResponse(
content: string,
pollingUrl: string,
signal: AbortSignal,
assistantMessageId: string,
convId: string,
updateMessage: UpdateMessageFn,
handleStreamComplete: HandleStreamCompleteFn,
messages: Message[]
) {
try {
// Show tool call indicator
updateMessage(assistantMessageId, {
toolCall: {
toolName: "browser_polling",
query: content,
},
}, true, convId);
// Initialize the task
const initResponse = await initializePollingTask(
content,
pollingUrl,
messages,
signal
);
const { statusUrl } = initResponse;
// Check if response is already complete
if (!statusUrl) {
const finalContent = (initResponse as any).content || "Task completed.";
const products = (initResponse as any).products || [];
updateMessage(assistantMessageId, {
content: finalContent,
toolCall: undefined,
}, false);
await handleStreamComplete(assistantMessageId, convId, {
content: finalContent,
products,
});
return { content: finalContent, products };
}
// Poll for completion
const POLLING_INTERVAL = 30000; // 30 seconds
const MAX_ATTEMPTS = 20; // 10 minutes total
let attempts = 0;
while (attempts < MAX_ATTEMPTS) {
if (signal.aborted) {
throw new Error("Request aborted");
}
await sleep(POLLING_INTERVAL);
attempts++;
const statusResponse = await pollTaskStatus(
statusUrl,
pollingUrl,
signal
);
if (statusResponse.status === "complete") {
const finalContent = statusResponse.content || "Task completed.";
const products = statusResponse.products || [];
updateMessage(assistantMessageId, {
content: finalContent,
toolCall: undefined,
}, false);
await handleStreamComplete(assistantMessageId, convId, {
content: finalContent,
products,
});
return { content: finalContent, products };
} else if (statusResponse.status === "failed") {
throw new Error("Task failed on server");
}
// Status is still "pending" or "running", continue polling
}
// Timeout reached
throw new Error("Polling timeout: Task did not complete within 10 minutes");
} catch (error) {
updateMessage(assistantMessageId, {
toolCall: undefined,
}, false);
throw error;
}
}
Example 2: Task Initialization
typescript
async function initializePollingTask(
content: string,
pollingUrl: string,
messages: Message[],
signal: AbortSignal
): Promise<InitializeTaskResponse> {
const history = messagesToHistory(messages);
const userEmail = getUserEmail();
const response = await fetch(`${pollingUrl}chat`, {
method: "POST",
headers: {
"Content-Type": "application/json",
"x-api-key": process.env.EXPO_PUBLIC_INTERNAL_API_KEY || "",
...(userEmail && { "x-prosusai-user-email": userEmail }),
},
body: JSON.stringify({
message: content,
history,
}),
signal,
});
if (!response.ok) {
throw new Error(`Failed to initialize task: ${response.statusText}`);
}
return await response.json();
}
Example 3: Status Polling
typescript
async function pollTaskStatus(
statusUrl: string,
pollingUrl: string,
signal: AbortSignal
): Promise<TaskStatusResponse> {
// Join with exactly one "/" at the seam; the "//" after "https:" must stay intact.
const statusEndpoint = `${pollingUrl.replace(/\/+$/, "")}/${statusUrl.replace(/^\/+/, "")}`;
const response = await fetch(statusEndpoint, {
method: "GET",
headers: {
"x-api-key": process.env.EXPO_PUBLIC_INTERNAL_API_KEY || "",
},
signal,
});
if (!response.ok) {
throw new Error(`Failed to poll status: ${response.statusText}`);
}
return await response.json();
}
Configuration
Agent Type Selection
The Browser Agent is selected in the Agent Settings screen:
File: app/agent-settings.tsx in the main application
Description: "Web browsing agent that can navigate and interact with websites. Capable of performing automated browser tasks."
Communication Mode
The communication modes offered for the Browser Agent are defined here, and only polling is returned by default:
File: lib/database/repositories/agent-config.repository.ts
typescript
export function getAvailableCommunicationModes(agentType: AgentType): CommunicationMode[] {
if (agentType === 'browser-agent') {
return ['polling']; // Primary mode
}
// ...
}
However, the implementation also supports streaming and non-streaming modes when those endpoints are available.
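To make the relationship between modes and endpoints concrete, the sketch below maps each mode to the URL its first request targets. The three URL constants are taken from agent-config.repository.ts. The mode identifiers "streaming" and "non-streaming" are assumptions, since only "polling" appears in the source. initialEndpointFor is a hypothetical helper.

```typescript
// Endpoint constants as defined in agent-config.repository.ts.
const BROWSER_AGENT_API_URL = "https://mmwnmruxd7.eu-west-1.awsapprunner.com/chat";
const BROWSER_AGENT_API_URL_STREAM = "https://mmwnmruxd7.eu-west-1.awsapprunner.com/chat/stream";
const BROWSER_AGENT_POLLING_URL = "https://mmwnmruxd7.eu-west-1.awsapprunner.com/";

// Assumed mode identifiers: only 'polling' appears in the source.
type BrowserAgentMode = "polling" | "streaming" | "non-streaming";

// Hypothetical helper: the URL the first request in each mode is sent to.
function initialEndpointFor(mode: BrowserAgentMode): string {
  switch (mode) {
    case "polling":
      // Initializes a task at {pollingUrl}chat; later polls go to {pollingUrl}{statusUrl}.
      return `${BROWSER_AGENT_POLLING_URL}chat`;
    case "streaming":
      return BROWSER_AGENT_API_URL_STREAM;
    case "non-streaming":
      return BROWSER_AGENT_API_URL;
  }
}
```

With these constants, polling initialization and non-streaming requests target the same /chat URL. The modes differ in how the client treats the response: polling expects a statusUrl to follow.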
Environment Variables
| Variable | Description | Source |
|---|---|---|
| EXPO_PUBLIC_INTERNAL_API_KEY | Internal API key for Prosus-hosted endpoints | From .env file |
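Important: Never commit API keys to version control. All keys should be stored in .env.local (which is git-ignored). Note also that Expo inlines EXPO_PUBLIC_* variables into the JavaScript bundle at build time, so this key is readable by anyone who inspects the built app.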
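The examples in this document fall back to an empty string when the key is missing. That sends a request with an empty x-api-key header, and the failure only shows up as a server-side error. A minimal sketch of failing fast instead follows. requireApiKey is a hypothetical helper, not part of the codebase.

```typescript
// Hypothetical helper: validate the internal API key once instead of sending an
// empty x-api-key header when the variable is missing.
function requireApiKey(value: string | undefined): string {
  const key = value?.trim();
  if (!key) {
    throw new Error("EXPO_PUBLIC_INTERNAL_API_KEY is not set");
  }
  return key;
}

// Usage in the app. Expo only inlines statically written references such as
// process.env.EXPO_PUBLIC_INTERNAL_API_KEY, so pass the value itself rather
// than the process.env object:
// const apiKey = requireApiKey(process.env.EXPO_PUBLIC_INTERNAL_API_KEY);
```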
URL Configuration
The Browser Agent URLs are hardcoded and cannot be modified by users:
File: lib/database/repositories/agent-config.repository.ts
typescript
export const BROWSER_AGENT_API_URL = 'https://mmwnmruxd7.eu-west-1.awsapprunner.com/chat';
export const BROWSER_AGENT_API_URL_STREAM = 'https://mmwnmruxd7.eu-west-1.awsapprunner.com/chat/stream';
export const BROWSER_AGENT_POLLING_URL = 'https://mmwnmruxd7.eu-west-1.awsapprunner.com/';
Error Handling
Task Initialization Errors
- 4xx Errors: Bad request, invalid parameters
- 5xx Errors: Server errors during task initialization
Polling Errors
- Task Failed: Server reports task status as "failed"
- Polling Timeout: Task does not complete within 10 minutes
- Network Errors: Connection failures during polling
- Abort Signal: User cancels the request
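The error categories above can be carried as a typed error, so that callers can tell a user cancellation from a server failure without matching message strings. This is a hypothetical sketch: BrowserAgentError, errorForHttpStatus and classifyError are illustrative names. Note that the existing examples throw a plain Error("Request aborted"), which this classifier would report as "network" unless it is replaced by a BrowserAgentError with kind "aborted".

```typescript
// Hypothetical error model for the categories listed above.
type BrowserAgentErrorKind =
  | "client" // 4xx during task initialization
  | "server" // 5xx during task initialization
  | "task-failed" // status endpoint reported "failed"
  | "timeout" // attempt limit reached without a terminal status
  | "network" // request rejected before a response (connection failure)
  | "aborted"; // user cancelled via the AbortSignal

class BrowserAgentError extends Error {
  readonly kind: BrowserAgentErrorKind;

  constructor(kind: BrowserAgentErrorKind, message: string) {
    super(message);
    this.name = "BrowserAgentError";
    this.kind = kind;
    // Keep instanceof working when compiling to ES5.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

// For a non-OK response (status >= 400) from the initialization request.
function errorForHttpStatus(status: number, statusText: string): BrowserAgentError {
  const kind: BrowserAgentErrorKind = status >= 500 ? "server" : "client";
  return new BrowserAgentError(kind, `Failed to initialize task: ${status} ${statusText}`);
}

// Classify anything thrown by fetch or the polling loop.
function classifyError(err: unknown): BrowserAgentErrorKind {
  if (err instanceof BrowserAgentError) return err.kind;
  // An aborted fetch rejects with an error whose name is "AbortError".
  const name = typeof err === "object" && err !== null ? (err as { name?: unknown }).name : undefined;
  if (name === "AbortError") return "aborted";
  return "network";
}
```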
Implementation
typescript
try {
// Initialize task
const initResponse = await initializePollingTask(...);
// Poll for status until a terminal state or the attempt limit
let attempts = 0;
while (attempts < MAX_ATTEMPTS) {
if (signal.aborted) {
throw new Error("Request aborted");
}
await sleep(POLLING_INTERVAL);
attempts++;
const statusResponse = await pollTaskStatus(...);
if (statusResponse.status === "failed") {
throw new Error("Task failed on server");
}
if (statusResponse.status === "complete") {
// Success
return { content: statusResponse.content, products: statusResponse.products };
}
// Still "pending" or "running": continue polling
}
throw new Error("Polling timeout: Task did not complete within 10 minutes");
} catch (error) {
// Clear tool call indicator
updateMessage(assistantMessageId, { toolCall: undefined }, false);
throw error;
}
Task Status Indicators
UI Feedback
During polling, the UI shows:
- Tool Call Indicator: "browser_polling" with the user's query
- Status Updates: Polling continues in background
- Final Response: Tool indicator is cleared and content is displayed
Status States
- Pending: Task is queued, waiting to start
- Running: Task is executing browser automation
- Complete: Task finished successfully, content available
- Failed: Task encountered an error
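Because response.json() is untyped, a status value outside these four states would otherwise fall through to "keep polling" until the timeout. A small runtime check makes that case explicit. This is a sketch. parseStatusResponse, isTaskStatus and isTerminal are hypothetical helpers, and the sketch only validates the fields it reads.

```typescript
type TaskStatus = "pending" | "running" | "complete" | "failed";

const TASK_STATUSES: readonly TaskStatus[] = ["pending", "running", "complete", "failed"];

function isTaskStatus(value: unknown): value is TaskStatus {
  return typeof value === "string" && (TASK_STATUSES as readonly string[]).includes(value);
}

// "complete" and "failed" end polling; "pending" and "running" continue it.
function isTerminal(status: TaskStatus): boolean {
  return status === "complete" || status === "failed";
}

// Validate the fields the polling loop relies on; throw on anything unexpected.
function parseStatusResponse(json: unknown): { taskId: string; status: TaskStatus; content?: string } {
  if (typeof json !== "object" || json === null) {
    throw new Error("Status response is not an object");
  }
  const { taskId, status, content } = json as Record<string, unknown>;
  if (typeof taskId !== "string") throw new Error("Status response has no taskId");
  if (!isTaskStatus(status)) throw new Error(`Unknown task status: ${String(status)}`);
  return { taskId, status, content: typeof content === "string" ? content : undefined };
}
```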
Use Cases
Typical Browser Agent Tasks
- Web Scraping: Extract information from websites
- Form Filling: Automate form submissions
- Navigation: Navigate through multi-page workflows
- Data Extraction: Collect data from dynamic web pages
- Screenshot Capture: Take screenshots of web pages
- Content Analysis: Analyze web page content
Example Queries
- "Navigate to example.com and tell me what's on the homepage"
- "Search for 'React Native' on Google and summarize the first 3 results"
- "Fill out the contact form on example.com with my information"
- "Extract all product prices from this e-commerce page"
Limitations
- External Dependency: Requires Prosus-hosted API to be available
- Long-Running Tasks: Maximum 10-minute timeout
- Polling Overhead: 30-second polling interval adds latency
- Network Required: Requires internet connection for all requests
- Hardcoded URLs: Endpoints cannot be customized by users
- Resource Intensive: Browser automation is resource-intensive on the server side
Performance Considerations
Polling Interval
- 30 seconds: Balance between responsiveness and server load
- 20 attempts: Maximum 10 minutes total wait time
- Early Completion: Task may complete before timeout
Optimization Tips
- Use streaming mode for quick tasks when available
- Set appropriate timeouts based on expected task duration
- Handle abort signals to allow user cancellation
- Cache task results if applicable
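In Example 1, the abort check runs only before each 30-second sleep. A cancelled request therefore waits out the rest of the current interval before it stops. A sleep that listens to the AbortSignal removes that delay. This is a minimal sketch: abortableSleep is a hypothetical helper and could stand in for sleep(POLLING_INTERVAL) as abortableSleep(POLLING_INTERVAL, signal).

```typescript
// Hypothetical helper: resolves after `ms`, or rejects with an AbortError as
// soon as the signal aborts (clearing the pending timer).
function abortableSleep(ms: number, signal: AbortSignal): Promise<void> {
  return new Promise<void>((resolve, reject) => {
    const abortError = (): Error => {
      const err = new Error("Request aborted");
      err.name = "AbortError";
      return err;
    };
    if (signal.aborted) {
      reject(abortError());
      return;
    }
    const onAbort = () => {
      clearTimeout(timer);
      reject(abortError());
    };
    const timer = setTimeout(() => {
      signal.removeEventListener("abort", onAbort);
      resolve();
    }, ms);
    signal.addEventListener("abort", onAbort, { once: true });
  });
}
```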
Future Enhancements
Potential improvements:
- WebSocket support for real-time status updates
- Configurable polling intervals
- Task progress indicators (percentage complete)
- Support for task cancellation
- Enhanced error recovery and retry logic
- Support for multiple concurrent tasks
Related Documentation
- Native Agent - Native Agent documentation
- Multi-Agent - Multi-Agent documentation
- Custom Agent - Custom Agent documentation
- Agent Settings UI: app/agent-settings.tsx in the main application
- Polling Agent Handler: hooks/use-chat-polling-agent.ts in the main application
- Custom Agent Handler: hooks/use-chat-custom-agent.ts in the main application