BrowserAPI Query Methods
The BrowserAPI class provides six query methods through the
session.browser interface for extracting content and locating elements
within an adaptive mode BBRE session.
The getHtml() method returns the complete HTML source of the current page as a
string, including dynamically rendered content that would not appear in a plain
HTTP response. The getText(selector) method extracts the visible text of the first
element matching a CSS selector, which covers labels, headings, paragraphs, table
cells, and other text-bearing elements. The getTitle() method returns the current
page title from the <title> tag, useful for confirming that the correct page has
loaded. The getUrl() method returns the current browser URL, which lets you track
redirects, confirm navigation, and read URL parameters after form submissions. The
find(selector) method locates an element by CSS selector and returns the raw action
result, so you can check that an element exists before interacting with it. The
findByText(text, options) method searches for elements containing specific text,
with optional filtering, which helps when elements lack unique CSS selectors but
have identifiable text.
Together, these six methods form the data extraction layer of the BrowserAPI. Use
them to build scraping workflows, validate page state during automation sequences,
extract structured data from tables and lists, and verify that interactions produced
the expected results. This page documents each query method with its signature,
parameters, return type, and practical examples covering product data extraction,
table parsing, page verification, content monitoring, and structured data
collection, along with best practices, common issues, and links to related
documentation.
All BrowserAPI query methods require an active session running in
adaptive mode. Adaptive mode sessions launch a real browser instance
managed by the BBRE engine, which is necessary for querying the live DOM, extracting
rendered text content, and locating elements on the page. If you attempt to call query
methods on a passive mode session, the operation will fail with a
BROWSER_ACTION_FAILED error. Create your session with
mode: "adaptive" to use these methods. For simple HTTP requests where you
only need the raw HTML response without browser rendering, consider using passive mode
with session.request() instead, which is faster and consumes fewer
credits.
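The mode requirement above can be checked for in calling code. The following is a minimal sketch, not the SDK's documented error contract: it assumes the thrown error carries the BROWSER_ACTION_FAILED identifier either in a code property or in its message, and isBrowserActionFailed and explainQueryFailure are hypothetical helper names.

```javascript
// Hypothetical helper: detects the BROWSER_ACTION_FAILED error described above.
// Assumption: the SDK exposes the identifier in err.code or inside err.message.
function isBrowserActionFailed(err) {
  if (!err || typeof err !== "object") return false;
  const code = typeof err.code === "string" ? err.code : "";
  const message = typeof err.message === "string" ? err.message : "";
  return code === "BROWSER_ACTION_FAILED" || message.includes("BROWSER_ACTION_FAILED");
}

// Turns the low-level failure into a hint about the session mode.
function explainQueryFailure(err) {
  if (isBrowserActionFailed(err)) {
    return 'Query failed: check that the session was created with mode: "adaptive".';
  }
  return "Query failed: " + (err && err.message ? err.message : String(err));
}
```

A caller could wrap any query method in try/catch and log explainQueryFailure(err), so that a passive mode session is easy to spot in the logs.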
session.browser.getHtml()
The getHtml() method retrieves the complete HTML source of the current
page as rendered by the browser. Unlike a raw HTTP response that only contains the
initial server-sent HTML, getHtml() returns the fully rendered DOM
including all content generated by JavaScript execution, AJAX calls, and dynamic
rendering. This makes it the ideal method for extracting data from single-page
applications, JavaScript-heavy websites, and pages that load content asynchronously
after the initial page load. The method sends an html action to the BBRE
engine and returns the content as a string. If the page has no content or the action
fails to extract the HTML, an empty string is returned instead of throwing an error.
Method Signature
const html = await session.browser.getHtml();
Parameters
The getHtml() method does not accept any parameters. It captures the
current state of the entire page DOM at the moment of the call.
| Parameter | Type | Required | Description |
|---|---|---|---|
| (none) | | | This method does not accept any parameters. |
Return Type
The method returns a Promise that resolves to a string
containing the full HTML source of the current page. The returned HTML reflects the
current state of the DOM, including any modifications made by JavaScript after the
initial page load. If the content cannot be extracted, an empty string is returned.
| Return Type | Description |
|---|---|
| string | The complete HTML source of the current page. Returns an empty string if the content cannot be extracted. |
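Because getHtml() reports failure as an empty string rather than an exception, a scraper can silently continue with no data. A minimal sketch of a guard follows; requireHtml is a hypothetical helper name, not part of the BrowserAPI.

```javascript
// Hypothetical helper: treats an empty or whitespace-only getHtml() result as a failure
// so the problem surfaces immediately instead of producing an empty data set later.
function requireHtml(html, pageLabel) {
  if (typeof html !== "string" || html.trim() === "") {
    throw new Error("No HTML captured for " + pageLabel);
  }
  return html;
}

// Typical use inside an async function with an active adaptive session:
//   const html = requireHtml(await session.browser.getHtml(), "catalog page");
```

Throwing here is a design choice for pipelines where an empty page is always a bug; if empty pages are expected, return a flag instead.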
Basic Usage
The simplest usage of getHtml() navigates to a page and captures its
full HTML content. This is useful for archiving pages, feeding HTML into a parser,
or inspecting the rendered DOM structure.
const BBREClient = require("mydisctsolver-bbre");
const client = new BBREClient({ apiKey: "YOUR_API_KEY" });
async function capturePageHtml() {
const session = await client.createSession({ mode: "adaptive" });
try {
await session.start();
await session.browser.navigate("https://example.com/products");
const html = await session.browser.getHtml();
console.log("Page HTML length:", html.length, "characters");
console.log("First 200 characters:", html.substring(0, 200));
} finally {
await session.close();
}
}
capturePageHtml();
Extract Data from Rendered HTML
Combine getHtml() with a parsing library to extract structured data from
the rendered page. Since the HTML includes dynamically loaded content, you can extract
data that would be invisible in a simple HTTP response. This pattern is particularly
effective for single-page applications that render product listings, search results,
or data tables via JavaScript.
const BBREClient = require("mydisctsolver-bbre");
const cheerio = require("cheerio");
const client = new BBREClient({ apiKey: "YOUR_API_KEY" });
async function extractProductData() {
const session = await client.createSession({ mode: "adaptive" });
try {
await session.start();
await session.browser.navigate("https://shop.example.com/catalog");
await session.browser.waitForSelector(".product-card");
const html = await session.browser.getHtml();
const $ = cheerio.load(html);
const products = [];
$(".product-card").each(function () {
products.push({
name: $(this).find(".product-name").text().trim(),
price: $(this).find(".product-price").text().trim(),
rating: $(this).find(".product-rating").attr("data-score"),
inStock: $(this).find(".stock-badge").text().includes("In Stock")
});
});
console.log("Found", products.length, "products");
products.forEach(function (p) {
console.log(p.name, "-", p.price, "- Rating:", p.rating);
});
return products;
} finally {
await session.close();
}
}
extractProductData();
Compare HTML Before and After Interaction
Capture the page HTML before and after an interaction to verify that the expected changes occurred. This is useful for testing dynamic interfaces where clicking a button, submitting a form, or toggling a filter modifies the page content.
const BBREClient = require("mydisctsolver-bbre");
const client = new BBREClient({ apiKey: "YOUR_API_KEY" });
async function verifyFilterEffect() {
const session = await client.createSession({ mode: "adaptive" });
try {
await session.start();
await session.browser.navigate("https://shop.example.com/products");
await session.browser.waitForSelector(".product-list");
const htmlBefore = await session.browser.getHtml();
await session.browser.click(".filter-electronics");
await session.browser.waitForSelector(".product-list.filtered");
const htmlAfter = await session.browser.getHtml();
console.log("HTML before filter:", htmlBefore.length, "characters");
console.log("HTML after filter:", htmlAfter.length, "characters");
console.log("Content changed:", htmlBefore !== htmlAfter);
} finally {
await session.close();
}
}
verifyFilterEffect();
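When the before and after snapshots need to be stored or compared across runs, keeping a compact fingerprint is often more practical than keeping the full HTML. The sketch below uses Node's built-in crypto module; fingerprint and summarizeHtmlChange are hypothetical helper names, and SHA-256 is one reasonable choice of hash, not something the BBRE engine requires.

```javascript
const crypto = require("crypto");

// Returns a 64-character hex digest that identifies one HTML snapshot.
function fingerprint(html) {
  return crypto.createHash("sha256").update(html, "utf8").digest("hex");
}

// Compares two snapshots (for example htmlBefore and htmlAfter from getHtml()).
function summarizeHtmlChange(before, after) {
  return {
    changed: fingerprint(before) !== fingerprint(after),
    lengthDelta: after.length - before.length
  };
}
```

Note that any byte-level difference counts as a change, including timestamps or rotating tokens embedded in the page, so a changed fingerprint does not by itself prove that the filter was applied.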
session.browser.getText(selector)
The getText() method extracts the visible text content of the first
element matching the given CSS selector. It returns only the text that a user would
see on the page, stripping away all HTML tags, inline styles, and hidden elements.
This method is the most direct way to read specific pieces of information from a page
without parsing the full HTML. It sends a getText action to the BBRE
engine with the provided selector and returns the extracted text as a string. If the
element is not found or contains no text, an empty string is returned. The
getText() method is ideal for reading prices, product names, status
messages, error notifications, table cell values, and any other text content that you
need to capture from a specific element on the page.
Method Signature
const text = await session.browser.getText(selector);
Parameters
The getText() method accepts a single required parameter: the CSS
selector string that identifies the target element whose text content you want to
extract.
| Parameter | Type | Required | Description |
|---|---|---|---|
| selector | string | Required | A CSS selector string that identifies the element to extract text from. The method targets the first matching element. Supports standard CSS selectors including class selectors (.price), ID selectors (#total), attribute selectors ([data-field="name"]), and compound selectors (.product-card .title). |
Return Type
The method returns a Promise that resolves to a string
containing the visible text content of the matched element. The text includes content
from all child elements, concatenated together. If the element is not found or has no
text content, an empty string is returned.
| Return Type | Description |
|---|---|
| string | The visible text content of the matched element. Returns an empty string if the element is not found or contains no text. |
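Text returned by getText() usually needs light cleanup before it is stored or compared. The following sketch shows one way to normalize whitespace and pull a number out of a price string. cleanText and parsePrice are hypothetical helper names, and parsePrice assumes US-style formatting (comma as thousands separator, dot as decimal separator).

```javascript
// Collapses runs of whitespace (including newlines) and trims the ends.
function cleanText(text) {
  return String(text || "").replace(/\s+/g, " ").trim();
}

// Extracts the first number from a price string, or null when there is none.
// Assumption: commas are thousands separators and the dot is the decimal separator.
function parsePrice(text) {
  const cleaned = cleanText(text).replace(/,/g, "");
  const match = cleaned.match(/\d+(?:\.\d+)?/);
  return match ? Number(match[0]) : null;
}
```

Returning null for an empty string keeps the "element not found" case, which getText() reports as an empty string, distinct from a real price of 0.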
Basic Usage
The simplest usage of getText() reads the text content of a single
element identified by a CSS selector. This is useful for extracting headings, labels,
prices, or any other visible text on the page.
const BBREClient = require("mydisctsolver-bbre");
const client = new BBREClient({ apiKey: "YOUR_API_KEY" });
async function readProductPrice() {
const session = await client.createSession({ mode: "adaptive" });
try {
await session.start();
await session.browser.navigate("https://shop.example.com/product/12345");
await session.browser.waitForSelector(".product-price");
const price = await session.browser.getText(".product-price");
console.log("Product price:", price);
const name = await session.browser.getText("h1.product-title");
console.log("Product name:", name);
return { name, price };
} finally {
await session.close();
}
}
readProductPrice();
Extract Multiple Fields from a Page
Use multiple getText() calls to extract several pieces of information
from a single page. This pattern is common when scraping product details, article
metadata, or user profile information where each field lives in a different element.
const BBREClient = require("mydisctsolver-bbre");
const client = new BBREClient({ apiKey: "YOUR_API_KEY" });
async function extractArticleMetadata() {
const session = await client.createSession({ mode: "adaptive" });
try {
await session.start();
await session.browser.navigate("https://blog.example.com/article/web-scraping-guide");
await session.browser.waitForSelector("article");
const title = await session.browser.getText("article h1");
const author = await session.browser.getText(".author-name");
const publishDate = await session.browser.getText(".publish-date");
const readTime = await session.browser.getText(".read-time");
const category = await session.browser.getText(".article-category");
const article = { title, author, publishDate, readTime, category };
console.log("Article metadata:", article);
return article;
} finally {
await session.close();
}
}
extractArticleMetadata();
Verify Action Results with getText()
After performing an interaction such as submitting a form or clicking a button, use
getText() to verify that the expected result appeared on the page. This
is a reliable way to confirm that your automation workflow is progressing correctly.
const BBREClient = require("mydisctsolver-bbre");
const client = new BBREClient({ apiKey: "YOUR_API_KEY" });
async function submitAndVerify() {
const session = await client.createSession({ mode: "adaptive" });
try {
await session.start();
await session.browser.navigate("https://app.example.com/login");
await session.browser.fill("#email", "user@example.com");
await session.browser.fill("#password", "securepassword");
await session.browser.click("#login-button");
await session.browser.waitForSelector(".welcome-message");
const welcomeText = await session.browser.getText(".welcome-message");
console.log("Welcome message:", welcomeText);
if (welcomeText.includes("Welcome")) {
console.log("Login successful");
} else {
console.log("Unexpected response after login");
}
} finally {
await session.close();
}
}
submitAndVerify();
session.browser.getTitle()
The getTitle() method returns the current page title as a string. The
page title is the text defined in the <title> HTML tag, which
browsers display in the tab header. This method is useful for verifying that a
navigation action landed on the correct page, tracking page transitions during
multi-step workflows, and extracting the title for logging or reporting purposes.
The method sends a getTitle action to the BBRE engine and returns the
title value directly. If the page has no title tag or the title is empty, an empty
string is returned. Since many websites include the page name, product name, or
section name in the title, this method provides a quick way to confirm page identity
without parsing the full HTML or searching for specific elements.
Method Signature
const title = await session.browser.getTitle();
Parameters
The getTitle() method does not accept any parameters. It reads the title
of the currently loaded page in the browser.
| Parameter | Type | Required | Description |
|---|---|---|---|
| (none) | | | This method does not accept any parameters. |
Return Type
| Return Type | Description |
|---|---|
| string | The current page title as defined in the <title> tag. Returns an empty string if the page has no title. |
Verify Navigation with Page Title
After navigating to a page, use getTitle() to confirm that the browser
loaded the expected page. This is a lightweight verification that does not require
searching for specific elements or parsing the page content.
const BBREClient = require("mydisctsolver-bbre");
const client = new BBREClient({ apiKey: "YOUR_API_KEY" });
async function verifyPageNavigation() {
const session = await client.createSession({ mode: "adaptive" });
try {
await session.start();
await session.browser.navigate("https://dashboard.example.com");
const title = await session.browser.getTitle();
console.log("Page title:", title);
if (title.includes("Dashboard")) {
console.log("Successfully navigated to the dashboard");
} else {
console.log("Navigation may have been redirected");
console.log("Expected dashboard, got:", title);
}
} finally {
await session.close();
}
}
verifyPageNavigation();
Track Page Transitions in Multi-Step Workflow
In multi-step workflows such as checkout processes or registration flows, use
getTitle() at each step to log the page transitions and verify that the
workflow is progressing through the expected sequence of pages.
const BBREClient = require("mydisctsolver-bbre");
const client = new BBREClient({ apiKey: "YOUR_API_KEY" });
async function trackCheckoutFlow() {
const session = await client.createSession({ mode: "adaptive" });
try {
await session.start();
const steps = [];
await session.browser.navigate("https://shop.example.com/cart");
steps.push(await session.browser.getTitle());
console.log("Step 1:", steps[steps.length - 1]);
await session.browser.click("#proceed-to-checkout");
await session.browser.waitForSelector("#shipping-form");
steps.push(await session.browser.getTitle());
console.log("Step 2:", steps[steps.length - 1]);
await session.browser.fillForm({
"#address": "123 Main Street",
"#city": "San Francisco",
"#zip": "94102"
});
await session.browser.click("#continue-to-payment");
await session.browser.waitForSelector("#payment-form");
steps.push(await session.browser.getTitle());
console.log("Step 3:", steps[steps.length - 1]);
console.log("Checkout flow pages:", steps);
return steps;
} finally {
await session.close();
}
}
trackCheckoutFlow();
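The collected titles can also be checked against the sequence you expect, rather than only logged. A minimal sketch follows; findFirstMismatch is a hypothetical helper, and the keywords are placeholders for whatever the target site actually puts in its titles.

```javascript
// Compares collected page titles with the keyword expected at each step.
// Returns null when every step matches, otherwise a description of the first mismatch.
function findFirstMismatch(titles, expectedKeywords) {
  for (let i = 0; i < expectedKeywords.length; i++) {
    const title = (titles[i] || "").toLowerCase();
    if (!title.includes(expectedKeywords[i].toLowerCase())) {
      return { step: i + 1, expected: expectedKeywords[i], actual: titles[i] || "" };
    }
  }
  return null;
}

// Typical use with the steps array returned by a flow such as trackCheckoutFlow():
//   const problem = findFirstMismatch(steps, ["cart", "shipping", "payment"]);
//   if (problem) console.log("Unexpected page at step", problem.step, problem.actual);
```

A missing or empty title is reported as a mismatch, which matches the empty-string behavior of getTitle() for pages without a title.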
session.browser.getUrl()
The getUrl() method returns the current URL of the browser as a string.
This includes the full URL with protocol, domain, path, query parameters, and
fragment identifier. The method is essential for tracking redirects after form
submissions, verifying that navigation actions reached the intended destination,
extracting dynamic URL parameters generated by the target website, and detecting
unexpected redirects to login pages or error pages. The method sends a
getUrl action to the BBRE engine and returns the URL value directly.
Unlike getTitle(), which reads the document's <title> element, getUrl()
reflects the actual browser location, which may differ from the originally navigated
URL due to server-side redirects, client-side routing, or URL rewrites.
Method Signature
const url = await session.browser.getUrl();
Parameters
The getUrl() method does not accept any parameters. It reads the current
URL from the browser address bar.
| Parameter | Type | Required | Description |
|---|---|---|---|
| (none) | | | This method does not accept any parameters. |
Return Type
| Return Type | Description |
|---|---|
| string | The full current URL of the browser including protocol, domain, path, query parameters, and fragment. Returns an empty string if the URL cannot be retrieved. |
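To decide whether the browser ended up somewhere other than the requested page, the value from getUrl() can be compared with the URL you navigated to. The sketch below uses the standard URL class; wasRedirected is a hypothetical helper that ignores query strings, fragments, and a trailing slash, which is an assumption about what counts as "the same page".

```javascript
// Drops a single trailing slash so "/dashboard" and "/dashboard/" compare equal.
function normalizePath(pathname) {
  return pathname.length > 1 && pathname.endsWith("/") ? pathname.slice(0, -1) : pathname;
}

// Returns true when origin or path differ, false when they match,
// and null when currentUrl is empty (getUrl() could not retrieve the URL).
function wasRedirected(requestedUrl, currentUrl) {
  if (!currentUrl) return null;
  const requested = new URL(requestedUrl);
  const current = new URL(currentUrl);
  return (
    requested.origin !== current.origin ||
    normalizePath(requested.pathname) !== normalizePath(current.pathname)
  );
}
```

For example, a request for a dashboard page that lands on a login page with a next parameter is reported as a redirect, while the same path with extra query parameters is not.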
Detect Redirects After Form Submission
After submitting a form, the server often redirects the browser to a new URL. Use
getUrl() to capture the redirect destination and extract any dynamic
parameters such as order IDs, confirmation codes, or session tokens embedded in the
URL.
const BBREClient = require("mydisctsolver-bbre");
const client = new BBREClient({ apiKey: "YOUR_API_KEY" });
async function captureRedirectUrl() {
const session = await client.createSession({ mode: "adaptive" });
try {
await session.start();
await session.browser.navigate("https://app.example.com/login");
await session.browser.fill("#username", "testuser");
await session.browser.fill("#password", "testpass");
await session.browser.click("#submit-login");
await session.browser.waitForSelector(".dashboard-header");
const currentUrl = await session.browser.getUrl();
console.log("Redirected to:", currentUrl);
const urlObj = new URL(currentUrl);
console.log("Path:", urlObj.pathname);
console.log("Search params:", urlObj.search);
return currentUrl;
} finally {
await session.close();
}
}
captureRedirectUrl();
Verify URL After Navigation
Some websites use client-side routing that changes the URL without a full page reload.
Use getUrl() to verify that the URL matches the expected pattern after
clicking links or triggering navigation events within a single-page application.
const BBREClient = require("mydisctsolver-bbre");
const client = new BBREClient({ apiKey: "YOUR_API_KEY" });
async function verifyUrlAfterNavigation() {
const session = await client.createSession({ mode: "adaptive" });
try {
await session.start();
await session.browser.navigate("https://app.example.com");
await session.browser.click("a[href='/settings']");
await session.browser.waitForSelector(".settings-page");
const url = await session.browser.getUrl();
console.log("Current URL:", url);
if (url.includes("/settings")) {
console.log("Navigation to settings page confirmed");
} else {
console.log("URL does not match expected pattern");
}
} finally {
await session.close();
}
}
verifyUrlAfterNavigation();
Extract URL Parameters
Many web applications embed important data in URL parameters after actions like
searches, filters, or form submissions. Use getUrl() combined with the
URL constructor to parse and extract these parameters programmatically.
const BBREClient = require("mydisctsolver-bbre");
const client = new BBREClient({ apiKey: "YOUR_API_KEY" });
async function extractSearchParameters() {
const session = await client.createSession({ mode: "adaptive" });
try {
await session.start();
await session.browser.navigate("https://shop.example.com");
await session.browser.fill("#search-input", "wireless headphones");
await session.browser.click("#search-button");
await session.browser.waitForSelector(".search-results");
const url = await session.browser.getUrl();
const urlObj = new URL(url);
console.log("Search URL:", url);
console.log("Query parameter:", urlObj.searchParams.get("q"));
console.log("Page:", urlObj.searchParams.get("page") || "1");
console.log("Sort:", urlObj.searchParams.get("sort") || "relevance");
return {
query: urlObj.searchParams.get("q"),
page: urlObj.searchParams.get("page") || "1",
sort: urlObj.searchParams.get("sort") || "relevance"
};
} finally {
await session.close();
}
}
extractSearchParameters();
session.browser.find(selector)
The find() method locates an element on the page by CSS selector and
returns the raw action result from the BBRE engine. This method is primarily used to
check whether an element exists on the page before attempting to interact with it.
Unlike getText() which extracts text content, or click()
which performs an action, find() simply queries the DOM for the presence
of a matching element. The method accepts either a plain CSS selector string or an
object with a selector property, giving you flexibility in how you pass
the selector. The method sends a find action to the BBRE engine and
returns the complete action result object. This is useful for conditional logic in
automation workflows where you need to take different paths depending on whether
certain elements are present on the page, such as checking for error messages,
optional form fields, pagination buttons, or dynamic content that may or may not
have loaded.
Method Signature
const result = await session.browser.find(selector);
Parameters
The find() method accepts a single parameter that can be either a CSS
selector string or an object containing a selector property. When you
pass a string, it is used directly as the CSS selector. When you pass an object, the
selector property is extracted and used as the CSS selector.
| Parameter | Type | Required | Description |
|---|---|---|---|
| selector | string \| object | Required | A CSS selector string (e.g., ".product-card") or an object with a selector property (e.g., { selector: ".product-card" }). Both formats produce the same result. The object format is useful when building selectors dynamically or passing selector configurations from external sources. |
Return Type
The method returns a Promise that resolves to the raw action result
object from the BBRE engine. The structure of this object depends on the engine
response and typically contains information about whether the element was found and
its properties.
| Return Type | Description |
|---|---|
| object | The raw action result from the BBRE engine containing information about the found element. The result structure depends on the engine response. |
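Since the result structure is not fixed, conditional logic built on find() should interpret the result defensively. The sketch below is illustrative only: the field names found, exists, and count are assumptions and are not confirmed by this page, so log a real result from your engine version and adjust the checks. elementFound is a hypothetical helper name.

```javascript
// Hypothetical interpreter for the raw find() result.
// Returns true or false when the shape is recognized, and null when it is not,
// so that callers never mistake an unknown result object for "element exists".
function elementFound(result) {
  if (result === null || result === undefined || result === false) return false;
  if (result === true) return true;
  if (typeof result === "object") {
    // Assumed field names; verify against an actual engine response.
    if (typeof result.found === "boolean") return result.found;
    if (typeof result.exists === "boolean") return result.exists;
    if (typeof result.count === "number") return result.count > 0;
  }
  return null;
}
```

The three-way return value matters because a plain truthiness check treats every non-null object as a match, even one that describes a failed lookup.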
Check Element Existence Before Interaction
Use find() to verify that an element exists on the page before attempting
to click it, fill it, or extract its text. This prevents errors when dealing with
pages that may or may not contain certain elements depending on the user state,
page variant, or dynamic content loading.
const BBREClient = require("mydisctsolver-bbre");
const client = new BBREClient({ apiKey: "YOUR_API_KEY" });
async function conditionalInteraction() {
const session = await client.createSession({ mode: "adaptive" });
try {
await session.start();
await session.browser.navigate("https://shop.example.com/product/12345");
await session.browser.waitForSelector(".product-page");
const addToCartResult = await session.browser.find("#add-to-cart");
console.log("Add to cart button found:", addToCartResult);
const outOfStockResult = await session.browser.find(".out-of-stock-notice");
console.log("Out of stock notice found:", outOfStockResult);
} finally {
await session.close();
}
}
conditionalInteraction();
Using Object Selector Format
The find() method also accepts an object with a selector
property. This format is useful when you are building selectors dynamically, storing
selector configurations in data structures, or passing selectors from external
configuration files.
const BBREClient = require("mydisctsolver-bbre");
const client = new BBREClient({ apiKey: "YOUR_API_KEY" });
async function findWithObjectSelector() {
const session = await client.createSession({ mode: "adaptive" });
try {
await session.start();
await session.browser.navigate("https://shop.example.com/products");
const selectors = [
{ selector: ".product-grid" },
{ selector: ".pagination" },
{ selector: ".filter-sidebar" },
{ selector: ".sort-dropdown" }
];
for (const selectorObj of selectors) {
const result = await session.browser.find(selectorObj);
console.log("Selector:", selectorObj.selector, "Result:", result);
}
} finally {
await session.close();
}
}
findWithObjectSelector();
Detect Page State with find()
Use find() to detect the current state of a page by checking for the
presence of state-specific elements. This pattern is useful for handling pages that
can be in different states such as loading, loaded, error, or empty.
const BBREClient = require("mydisctsolver-bbre");
const client = new BBREClient({ apiKey: "YOUR_API_KEY" });
async function detectPageState() {
const session = await client.createSession({ mode: "adaptive" });
try {
await session.start();
await session.browser.navigate("https://app.example.com/dashboard");
const loadingResult = await session.browser.find(".loading-spinner");
const errorResult = await session.browser.find(".error-message");
const contentResult = await session.browser.find(".dashboard-content");
const emptyResult = await session.browser.find(".empty-state");
console.log("Loading spinner:", loadingResult);
console.log("Error message:", errorResult);
console.log("Dashboard content:", contentResult);
console.log("Empty state:", emptyResult);
} finally {
await session.close();
}
}
detectPageState();
session.browser.findByText(text, options)
The findByText() method searches for elements on the page that contain
the specified text content. This method is particularly valuable when elements do not
have unique CSS selectors, IDs, or class names, but can be identified by their visible
text. Many real-world web pages use generic class names or dynamically generated
selectors that are difficult to target with CSS selectors alone. In these cases,
findByText() provides a reliable alternative by matching elements based
on what the user actually sees on the page. The method sends a findByText
action to the BBRE engine with the search text and any additional options, and returns
the action result. The optional options parameter allows you to refine
the search by specifying additional criteria such as element type or matching behavior.
Method Signature
const result = await session.browser.findByText(text, options);
Parameters
The findByText() method accepts a required text string to search for and
an optional options object to refine the search behavior.
| Parameter | Type | Required | Description |
|---|---|---|---|
| text | string | Required | The text content to search for within page elements. The search matches elements that contain this text in their visible content. The matching behavior depends on the options provided. |
| options | object | Optional | An optional configuration object to refine the text search. The options are spread into the action parameters alongside the text value, allowing you to pass additional criteria supported by the BBRE engine such as element tag filtering or match type. |
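To make "spread into the action parameters" concrete, the sketch below shows how such a merge typically looks in JavaScript. It is an illustration of the described behavior, not the SDK's source: buildFindByTextParams is a hypothetical name, and the order of the spread (options applied after text) is an assumption.

```javascript
// Illustrative only: merges the search text with caller-supplied options
// into one flat parameter object, as the table above describes.
// Assumption: options are applied after text, so an options.text key would override it.
function buildFindByTextParams(text, options = {}) {
  return { text, ...options };
}

// buildFindByTextParams("Save Changes", { tag: "button" })
// produces { text: "Save Changes", tag: "button" }
```

One practical consequence of a flat merge is that option names share a namespace with text, so avoid passing an option called text unless you intend to replace the search string.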
Return Type
The method returns a Promise that resolves to the raw action result
object from the BBRE engine containing information about the found element or elements
matching the specified text.
| Return Type | Description |
|---|---|
| object | The raw action result from the BBRE engine containing information about elements matching the specified text content. |
Find Elements by Visible Text
The most common use of findByText() is locating elements by their visible
text when CSS selectors are not reliable. This is especially useful for buttons,
links, menu items, and labels that have descriptive text but lack unique identifiers.
const BBREClient = require("mydisctsolver-bbre");
const client = new BBREClient({ apiKey: "YOUR_API_KEY" });
async function findElementsByText() {
const session = await client.createSession({ mode: "adaptive" });
try {
await session.start();
await session.browser.navigate("https://shop.example.com/products");
const addToCartButton = await session.browser.findByText("Add to Cart");
console.log("Add to Cart button:", addToCartButton);
const viewDetailsLink = await session.browser.findByText("View Details");
console.log("View Details link:", viewDetailsLink);
const outOfStockLabel = await session.browser.findByText("Out of Stock");
console.log("Out of Stock label:", outOfStockLabel);
} finally {
await session.close();
}
}
findElementsByText();
Find with Options
Pass an options object as the second parameter to refine the text search. The options are merged with the text parameter and sent to the BBRE engine, allowing you to specify additional search criteria supported by the engine.
const BBREClient = require("mydisctsolver-bbre");
const client = new BBREClient({ apiKey: "YOUR_API_KEY" });
async function findWithOptions() {
const session = await client.createSession({ mode: "adaptive" });
try {
await session.start();
await session.browser.navigate("https://app.example.com/settings");
const saveButton = await session.browser.findByText("Save Changes", {
tag: "button"
});
console.log("Save button:", saveButton);
const deleteLink = await session.browser.findByText("Delete Account", {
tag: "a"
});
console.log("Delete link:", deleteLink);
} finally {
await session.close();
}
}
findWithOptions();
Verify Confirmation Messages
After performing an action, use findByText() to verify that the expected
confirmation or success message appeared on the page. This is more resilient than
using CSS selectors because confirmation messages often share generic class names
across different pages.
const BBREClient = require("mydisctsolver-bbre");
const client = new BBREClient({ apiKey: "YOUR_API_KEY" });
async function verifyConfirmationMessage() {
const session = await client.createSession({ mode: "adaptive" });
try {
await session.start();
await session.browser.navigate("https://app.example.com/profile");
await session.browser.fill("#display-name", "New Display Name");
await session.browser.click("#save-profile");
await session.browser.wait(2000);
const successResult = await session.browser.findByText("Profile updated successfully");
console.log("Success message found:", successResult);
const errorResult = await session.browser.findByText("Failed to update profile");
console.log("Error message found:", errorResult);
} finally {
await session.close();
}
}
verifyConfirmationMessage();
Data Extraction Patterns
The query methods work together to enable powerful data extraction workflows. By
combining getHtml() for full page content, getText() for
targeted element extraction, getUrl() for URL tracking, and
find() for element detection, you can build sophisticated scraping
pipelines that handle dynamic content, pagination, and complex page structures. The
following examples demonstrate common data extraction patterns that you can adapt to
your specific use case.
Scrape Product Listing with Pagination
This pattern navigates through multiple pages of a product listing, extracting product
data from each page and using getUrl() to track the current page and
find() to detect the presence of a next page button.
const BBREClient = require("mydisctsolver-bbre");
const cheerio = require("cheerio");

const client = new BBREClient({ apiKey: "YOUR_API_KEY" });

async function scrapeProductListing(startUrl, maxPages) {
  const session = await client.createSession({ mode: "adaptive" });
  const allProducts = [];
  try {
    await session.start();
    await session.browser.navigate(startUrl);
    await session.browser.waitForSelector(".product-grid");

    for (let page = 1; page <= maxPages; page++) {
      const currentUrl = await session.browser.getUrl();
      console.log("Scraping page", page, "-", currentUrl);

      // Fetch the rendered HTML once and parse every product card locally
      const html = await session.browser.getHtml();
      const $ = cheerio.load(html);
      $(".product-card").each(function () {
        allProducts.push({
          name: $(this).find(".product-name").text().trim(),
          price: $(this).find(".product-price").text().trim(),
          url: $(this).find("a.product-link").attr("href"),
          rating: $(this).find(".rating-value").text().trim()
        });
      });
      console.log("Total products collected:", allProducts.length);

      // Stop at the page limit instead of loading a page that would not be scraped
      if (page === maxPages) break;

      // find() returns the raw action result; a falsy result is treated as "no next page"
      const nextButton = await session.browser.find(".pagination .next-page");
      if (!nextButton) break;
      await session.browser.click(".pagination .next-page");
      await session.browser.waitForSelector(".product-grid");
    }
    return allProducts;
  } finally {
    await session.close();
  }
}

scrapeProductListing("https://shop.example.com/catalog", 5);
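The url field collected from each product card is whatever the href attribute contains, so it may be a relative path, and the same product can appear on more than one page. The sketch below shows one possible post-processing step that resolves each link against the page URL reported by getUrl() and drops duplicates. The helper name normalizeProductUrls is hypothetical and not part of the BBRE SDK; it relies only on the standard URL class.

```javascript
// Hypothetical helper (not part of the BBRE SDK): resolve scraped hrefs against
// the page URL they were found on and keep only the first copy of each product.
function normalizeProductUrls(products, pageUrl) {
  const seen = new Set();
  const result = [];
  for (const product of products) {
    if (!product.url) continue; // card had no link
    let absolute;
    try {
      // Relative hrefs such as "/product/1" are resolved against pageUrl
      absolute = new URL(product.url, pageUrl).href;
    } catch (error) {
      continue; // href could not be parsed as a URL
    }
    if (seen.has(absolute)) continue; // duplicate of an earlier product
    seen.add(absolute);
    result.push({ ...product, url: absolute });
  }
  return result;
}
```

In the pagination loop you could pass the products from one page together with the currentUrl value for that page, so that relative links are always resolved against the page they came from.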
Extract Table Data
Tables are one of the most common structured data formats on the web. This pattern
uses getHtml() to capture the full page content and then parses the
table rows and cells to extract structured data into an array of objects with named
fields.
const BBREClient = require("mydisctsolver-bbre");
const cheerio = require("cheerio");

const client = new BBREClient({ apiKey: "YOUR_API_KEY" });

async function extractTableData(url, tableSelector) {
  const session = await client.createSession({ mode: "adaptive" });
  try {
    await session.start();
    await session.browser.navigate(url);
    await session.browser.waitForSelector(tableSelector);

    const html = await session.browser.getHtml();
    const $ = cheerio.load(html);

    // Column names come from the header cells
    const headers = [];
    $(tableSelector + " thead th").each(function () {
      headers.push($(this).text().trim());
    });

    // Each body row becomes an object keyed by column name
    const rows = [];
    $(tableSelector + " tbody tr").each(function () {
      const row = {};
      $(this).find("td").each(function (index) {
        // Cells without a matching header are skipped
        if (headers[index]) {
          row[headers[index]] = $(this).text().trim();
        }
      });
      rows.push(row);
    });

    console.log("Headers:", headers);
    console.log("Rows extracted:", rows.length);
    rows.forEach(function (row) {
      console.log(row);
    });
    return { headers, rows };
  } finally {
    await session.close();
  }
}

extractTableData("https://data.example.com/reports", "#quarterly-report");
Extract Structured Content from Detail Pages
This pattern combines navigation with targeted text extraction to collect detailed
information from individual pages. It navigates to each detail page, extracts
specific fields using getText(), and builds a structured data object
for each item.
const BBREClient = require("mydisctsolver-bbre");

const client = new BBREClient({ apiKey: "YOUR_API_KEY" });

async function extractProductDetails(productUrls) {
  const session = await client.createSession({ mode: "adaptive" });
  const products = [];
  try {
    await session.start();

    // Reuse one session for every detail page
    for (const url of productUrls) {
      await session.browser.navigate(url);
      await session.browser.waitForSelector(".product-detail");

      const name = await session.browser.getText("h1.product-name");
      const price = await session.browser.getText(".current-price");
      const description = await session.browser.getText(".product-description");
      const sku = await session.browser.getText(".product-sku");
      const availability = await session.browser.getText(".stock-status");

      // Record the final URL, which may differ from the requested one after a redirect
      const currentUrl = await session.browser.getUrl();

      products.push({
        name,
        price,
        description,
        sku,
        availability,
        url: currentUrl
      });
      console.log("Extracted:", name, "-", price);
    }
    return products;
  } finally {
    await session.close();
  }
}

const urls = [
  "https://shop.example.com/product/1001",
  "https://shop.example.com/product/1002",
  "https://shop.example.com/product/1003"
];

extractProductDetails(urls);
Monitor Page Content Changes
Use query methods to monitor a page for content changes over time. This pattern periodically checks specific elements and compares their values to detect updates such as price changes, stock status changes, or new content appearing on the page.
const BBREClient = require("mydisctsolver-bbre");

const client = new BBREClient({ apiKey: "YOUR_API_KEY" });

async function monitorPriceChanges(productUrl, checkIntervalMs, maxChecks) {
  const session = await client.createSession({ mode: "adaptive" });
  const priceHistory = [];
  try {
    await session.start();
    await session.browser.navigate(productUrl);
    await session.browser.waitForSelector(".product-price");

    for (let i = 0; i < maxChecks; i++) {
      const price = await session.browser.getText(".product-price");
      const title = await session.browser.getTitle();
      const timestamp = new Date().toISOString();
      priceHistory.push({ timestamp, price, title });
      console.log("[" + timestamp + "]", title, "-", price);

      // Compare the new reading with the previous one
      if (priceHistory.length >= 2) {
        const previous = priceHistory[priceHistory.length - 2];
        if (previous.price !== price) {
          console.log("PRICE CHANGED from", previous.price, "to", price);
        }
      }

      // Sleep and reload between checks, but not after the last one
      if (i < maxChecks - 1) {
        await new Promise(resolve => setTimeout(resolve, checkIntervalMs));
        await session.browser.reload();
        await session.browser.waitForSelector(".product-price");
      }
    }
    return priceHistory;
  } finally {
    await session.close();
  }
}

monitorPriceChanges("https://shop.example.com/product/12345", 60000, 10);
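The comparison above is a plain string comparison, so a change in whitespace or formatting would be reported as a price change. If that matters for your use case, one option is to compare numeric values instead. The following is a minimal sketch under the assumption that prices use a dot as the decimal separator and commas only as thousands separators (for example "$1,299.00"); the helper names parsePrice and priceChanged are hypothetical.

```javascript
// Hypothetical helpers (not part of the BBRE SDK).
// Assumption: prices look like "$1,299.00" (dot = decimal separator).
function parsePrice(text) {
  // Keep digits, dots and minus signs; drop currency symbols, commas, whitespace
  const cleaned = String(text).replace(/[^0-9.\-]/g, "");
  const value = Number.parseFloat(cleaned);
  return Number.isNaN(value) ? null : value;
}

function priceChanged(previousText, currentText) {
  const previous = parsePrice(previousText);
  const current = parsePrice(currentText);
  if (previous === null || current === null) {
    // Fall back to a trimmed string comparison for non-numeric text
    return String(previousText).trim() !== String(currentText).trim();
  }
  return previous !== current;
}
```

With these helpers, the check inside the loop could become priceChanged(previous.price, price) so that only a different numeric value is reported.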
Error Handling
Error behavior differs by method. The getHtml(), getText(),
getTitle(), and getUrl() methods return an empty string when
the extraction fails rather than throwing, so extraction failures alone rarely
need a try-catch block, but an empty result must be checked explicitly. The
find() and findByText() methods return the raw action result, which
may indicate failure through the result structure. All six methods can still
throw when the underlying session is invalid, expired, or closed, or when the
BBRE engine encounters a critical failure. The following examples demonstrate
robust error handling patterns for query methods.
Error Types
| Error Scenario | Affected Methods | Solution |
|---|---|---|
| BROWSER_ACTION_FAILED | All query methods | Ensure the session is running in adaptive mode. Passive mode sessions do not support browser query methods. Create the session with mode: "adaptive". |
| SESSION_NOT_FOUND | All query methods | The session ID is invalid or the session was never created. Verify that session.start() completed successfully before calling query methods. |
| SESSION_EXPIRED | All query methods | The session has exceeded its maximum lifetime. Create a new session and re-navigate to the target page before retrying the query. |
| SESSION_CLOSED | All query methods | The session was explicitly closed. Do not call query methods after calling session.close(). Create a new session if you need to continue querying. |
| Empty string returned | getHtml, getText, getTitle, getUrl | The element was not found, the page has no content, or the extraction failed silently. Verify that the page has loaded completely and the selector is correct. Use waitForSelector() before calling getText(). |
| Element not found | find, findByText | The CSS selector or text does not match any element on the page. Verify the selector or text is correct, and ensure the page has finished loading dynamic content. |
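The error codes in the table map to two broad recovery strategies: fix the session mode, or create a new session. The sketch below illustrates one way to turn a caught error into such a decision. It assumes that a thrown error exposes its code either as an error.code property or somewhere in error.message; this page does not specify the error object shape, so verify it against your SDK version. The function names and the returned action labels are hypothetical.

```javascript
// Hypothetical helpers (not part of the BBRE SDK).
// Assumption: the error code is available as error.code or appears in error.message.
const SESSION_ERRORS = new Set([
  "SESSION_NOT_FOUND",
  "SESSION_EXPIRED",
  "SESSION_CLOSED"
]);

function getErrorCode(error) {
  if (error && typeof error.code === "string") return error.code;
  const message = error && error.message ? String(error.message) : "";
  const match = message.match(
    /\b(BROWSER_ACTION_FAILED|SESSION_NOT_FOUND|SESSION_EXPIRED|SESSION_CLOSED)\b/
  );
  return match ? match[1] : null;
}

function classifyQueryError(error) {
  const code = getErrorCode(error);
  // Per the table: this code points at a session that is not in adaptive mode
  if (code === "BROWSER_ACTION_FAILED") return "check-session-mode";
  // Per the table: these codes all require a new session
  if (SESSION_ERRORS.has(code)) return "recreate-session";
  return "unknown";
}
```

A caller could use the returned label inside a catch block to decide whether to create a new session and re-navigate, or to rethrow the error.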
Safe Query Pattern with Fallback Values
When extracting data from pages where some elements may not exist, use a safe query pattern that provides fallback values for missing data. This prevents your scraping pipeline from breaking when a page has a slightly different structure than expected.
const BBREClient = require("mydisctsolver-bbre");

const client = new BBREClient({ apiKey: "YOUR_API_KEY" });

// Returns the element text, or the fallback when the text is empty or the call throws
async function safeGetText(browser, selector, fallback) {
  try {
    const text = await browser.getText(selector);
    return text || fallback;
  } catch (error) {
    console.log("Failed to get text for", selector, ":", error.message);
    return fallback;
  }
}

async function extractWithFallbacks() {
  const session = await client.createSession({ mode: "adaptive" });
  try {
    await session.start();
    await session.browser.navigate("https://shop.example.com/product/12345");
    await session.browser.waitForSelector(".product-page");

    const product = {
      name: await safeGetText(session.browser, "h1.product-name", "Unknown Product"),
      price: await safeGetText(session.browser, ".current-price", "Price not available"),
      brand: await safeGetText(session.browser, ".brand-name", "Unknown Brand"),
      rating: await safeGetText(session.browser, ".rating-score", "No rating"),
      reviews: await safeGetText(session.browser, ".review-count", "0 reviews")
    };
    console.log("Product data:", product);
    return product;
  } finally {
    await session.close();
  }
}

extractWithFallbacks();
Best Practices
When you need to extract a few specific values from a page, use
getText() with precise CSS selectors for each field. This is faster
and more readable than parsing the full HTML. When you need to extract many values
from a complex page structure such as a product listing with dozens of items, use
getHtml() once and parse the HTML locally with a library like
cheerio. This reduces the number of round trips to the BBRE engine
from one per field to a single call for the entire page, significantly improving
performance for bulk extraction scenarios.
Dynamic pages load content asynchronously after the initial page load. If you call
getText() or getHtml() before the target content has
rendered, you will get empty strings or incomplete data. Always use
waitForSelector() or waitForText() before calling query
methods to ensure the content you want to extract is present in the DOM. For
example, after navigating to a product page, call
await session.browser.waitForSelector(".product-price") before
calling await session.browser.getText(".product-price").
After navigating to a page, call getUrl() to verify that the browser
is on the expected URL. Many websites redirect unauthenticated users to login
pages, redirect based on geographic location, or redirect to maintenance pages
during downtime. By checking the URL before extracting data, you can detect these
redirects early and handle them appropriately instead of extracting data from the
wrong page. Compare the current URL against the expected URL pattern and abort or
re-authenticate if the URL does not match.
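As an illustration of the URL check described above, the following sketch compares the string returned by getUrl() with the URL you expected to land on. The helper name checkLandingUrl and its reason labels are hypothetical; the comparison rule (same host, path starting with the expected path) is one reasonable choice among several and may need adjusting for your site.

```javascript
// Hypothetical helper (not part of the BBRE SDK): compare the URL reported by
// getUrl() with the URL you expected after navigation.
function checkLandingUrl(currentUrl, expectedUrl) {
  let current;
  let expected;
  try {
    current = new URL(currentUrl);
    expected = new URL(expectedUrl);
  } catch (error) {
    // Also covers the empty string that getUrl() returns when extraction fails
    return { ok: false, reason: "invalid-url" };
  }
  if (current.host !== expected.host) {
    return { ok: false, reason: "different-host" };
  }
  if (!current.pathname.startsWith(expected.pathname)) {
    return { ok: false, reason: "different-path" };
  }
  return { ok: true, reason: null };
}
```

A result with reason "different-path" after navigating to a protected page is a typical sign of a login redirect, while "different-host" often indicates a geographic or maintenance redirect.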
Some websites use dynamically generated class names (such as CSS-in-JS frameworks
that produce names like css-1a2b3c) or deeply nested structures that
make CSS selectors fragile and likely to break when the site updates. In these
cases, findByText() provides a more stable alternative because button
labels, link text, and heading content tend to remain consistent across site
updates. Use findByText() for elements with stable, descriptive text
and reserve CSS selectors for elements with reliable, semantic class names or IDs.
Before clicking a button, filling a form field, or performing any interaction, use
find() to check whether the target element exists on the page. This
is especially important for pages with conditional content such as cookie consent
banners, promotional popups, optional form sections, or A/B test variants. By
checking element existence first, you can write automation workflows that adapt to
different page states instead of failing when an expected element is missing.
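The existence check can be wrapped in a small helper so that optional elements such as cookie banners are clicked only when present. The sketch below is written against any object that exposes find() and click(), such as session.browser. Because find() returns the raw action result and this page does not define its structure, the isFound() interpretation is an assumption: it treats a falsy result, or an object with success: false or found: false, as "not found". Adjust it to the result shape your SDK version actually returns. Both helper names are hypothetical.

```javascript
// Hypothetical helpers (not part of the BBRE SDK).
// Assumption: a falsy result, or an object with success === false or
// found === false, means the element was not found.
function isFound(result) {
  if (!result) return false;
  if (typeof result === "object") {
    if (result.success === false) return false;
    if (result.found === false) return false;
  }
  return true;
}

// Clicks the element only if find() reports it; returns whether a click happened.
// Errors thrown by find() or click() (for example session errors) propagate.
async function clickIfPresent(browser, selector) {
  const result = await browser.find(selector);
  if (!isFound(result)) return false;
  await browser.click(selector);
  return true;
}
```

For example, await clickIfPresent(session.browser, "#accept-cookies") could run before the main workflow without failing on pages that do not show the banner.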
For robust navigation verification, check both the page title and the URL after each navigation step. The URL confirms the correct path, while the title confirms the correct page content. Some websites use the same URL for different content states (such as single-page applications), and some use different URLs for the same logical page (such as localized versions). Checking both values gives you the highest confidence that the browser is on the expected page before proceeding with data extraction or interactions.
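A combined title and URL check might look like the following sketch. It accepts any object that exposes getUrl() and getTitle(), such as session.browser. The helper name verifyPage and the shape of the expected argument (urlIncludes, titleIncludes) are hypothetical, and substring matching is only one possible comparison rule.

```javascript
// Hypothetical helper (not part of the BBRE SDK): check both the URL and the
// title before continuing with extraction or interaction.
async function verifyPage(browser, expected) {
  // Both methods return an empty string on failure, which simply fails the check
  const url = String((await browser.getUrl()) || "");
  const title = String((await browser.getTitle()) || "");
  const urlOk = url.includes(expected.urlIncludes);
  // Titles are compared case-insensitively
  const titleOk = title
    .toLowerCase()
    .includes(expected.titleIncludes.toLowerCase());
  return { ok: urlOk && titleOk, urlOk, titleOk, url, title };
}
```

The separate urlOk and titleOk flags make it possible to tell a redirect (URL mismatch) apart from a single-page application that kept the URL but rendered different content (title mismatch).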
Common Issues
Symptom: You call getText() with a valid selector
but receive an empty string, even though the element is visible when you view the
page in a regular browser.
Cause: The content is loaded asynchronously via JavaScript after
the initial page render. When getText() executes, the element either
does not exist yet or exists but has not been populated with text content.
Solution: Add a waitForSelector() or
waitForText() call before getText() to ensure the
content has loaded. For example:
await session.browser.waitForSelector(".product-price"); followed by
const price = await session.browser.getText(".product-price");. If
the content loads with a delay after the element appears, add a short
wait() call between the wait and the text extraction.
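When the element exists but its text arrives later, a bounded retry is an alternative to a single fixed delay. The following sketch polls getText() until it returns non-empty text or the attempt limit is reached. It uses only getText() and wait(), both of which appear elsewhere on this page; the helper name getTextWithRetry and its default values are hypothetical.

```javascript
// Hypothetical helper (not part of the BBRE SDK): poll getText() until the
// element has non-empty text, waiting delayMs between attempts.
async function getTextWithRetry(browser, selector, attempts = 5, delayMs = 500) {
  for (let attempt = 1; attempt <= attempts; attempt++) {
    const text = String((await browser.getText(selector)) || "").trim();
    if (text) return text;
    // Do not wait after the final attempt
    if (attempt < attempts) await browser.wait(delayMs);
  }
  return ""; // same "empty string on failure" convention as getText()
}
```

A call such as await getTextWithRetry(session.browser, ".product-price") would typically follow waitForSelector(".product-price"), so that the retry only covers the gap between the element appearing and its text being populated.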
Symptom: The HTML returned by getHtml() is missing
sections that you can see in the browser, such as product listings, comments, or
dynamically loaded widgets.
Cause: The page uses lazy loading, infinite scroll, or deferred
rendering that only loads content when the user scrolls to it or triggers a
specific interaction. The getHtml() method captures the DOM at the
moment of the call, which may not include content that has not been triggered
yet.
Solution: Before calling getHtml(), trigger the
content loading by scrolling to the relevant section using
session.browser.scrollDown() or
session.browser.scrollToBottom(), then wait for the content to
appear using waitForSelector(). For infinite scroll pages, you may
need to scroll and wait multiple times to load all content.
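For infinite scroll pages, the repeated scroll-and-wait step can be sketched as a loop that stops once the page stops growing. Note that this sketch swaps waitForSelector() for a fixed wait() followed by a comparison of the getHtml() length, because an infinite list often has no single new selector to wait for. The length comparison is a crude heuristic and the helper name loadAllLazyContent and its defaults are hypothetical.

```javascript
// Hypothetical helper (not part of the BBRE SDK): scroll to the bottom until
// the HTML stops growing or maxScrolls is reached. Returns the number of scrolls.
async function loadAllLazyContent(browser, maxScrolls = 10, settleMs = 1000) {
  let previousLength = (await browser.getHtml()).length;
  for (let i = 0; i < maxScrolls; i++) {
    await browser.scrollToBottom();
    await browser.wait(settleMs); // give lazy content time to render
    const currentLength = (await browser.getHtml()).length;
    // Heuristic: an unchanged HTML length means no new content was loaded
    if (currentLength === previousLength) return i + 1;
    previousLength = currentLength;
  }
  return maxScrolls;
}
```

After the helper returns, a single getHtml() call captures the fully loaded DOM. Each iteration costs one extra getHtml() round trip, so keep maxScrolls small for very large pages.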
Symptom: Calling any query method throws an error with the code
BROWSER_ACTION_FAILED.
Cause: The session was created in passive mode, which does not
launch a browser instance. All BrowserAPI methods, including query methods, require
an adaptive mode session with an active browser.
Solution: Create the session with mode: "adaptive":
const session = await client.createSession({ mode: "adaptive" });.
If you only need the raw HTML response without browser rendering, use
session.request() in passive mode instead, which returns the HTTP
response body directly.
Symptom: The text returned by getText() contains
more text than expected, including content from child elements that you did not
intend to capture.
Cause: The getText() method returns the text content
of the matched element and all its descendants. If your selector matches a parent
container, the returned text will include text from all child elements
concatenated together.
Solution: Use a more specific CSS selector that targets the exact
element containing the text you want. For example, instead of
getText(".product-card") which returns all text in the card, use
getText(".product-card .product-name") to get only the product name.
Alternatively, use getHtml() and parse the HTML to extract specific
nested elements.
Symptom: You call findByText() with a text string
that you can see on the page, but the method does not find the element.
Cause: The text matching behavior depends on the BBRE engine
implementation. The visible text may contain extra whitespace, line breaks, or
hidden characters that differ from the search string you provided. Additionally,
the text may be split across multiple child elements rather than contained in a
single element.
Solution: Try using a shorter, more distinctive portion of the
text that is less likely to be affected by whitespace or formatting differences.
If the text is split across elements, use find() with a CSS selector
instead, or use getHtml() and search the HTML string directly.
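Searching the HTML string directly can be sketched as follows. The helper strips tags, collapses whitespace, and compares case-insensitively, which handles text that is split across inline child elements or padded with line breaks. It is a rough approximation, not an HTML parser: it removes tags with regular expressions and decodes only the &nbsp; entity, so treat it as a diagnostic aid. The helper name htmlContainsText is hypothetical.

```javascript
// Hypothetical helper (not part of the BBRE SDK): check whether the visible
// text of an HTML string contains searchText, ignoring whitespace differences.
function htmlContainsText(html, searchText) {
  const collapse = (text) => text.replace(/\s+/g, " ").trim().toLowerCase();
  const visibleText = String(html)
    // Script and style bodies are not visible text
    .replace(/<script[\s\S]*?<\/script>/gi, " ")
    .replace(/<style[\s\S]*?<\/style>/gi, " ")
    // Replace every remaining tag with a space so adjacent blocks do not merge
    .replace(/<[^>]+>/g, " ")
    // Only &nbsp; is decoded; other entities are left as-is
    .replace(/&nbsp;/gi, " ");
  return collapse(visibleText).includes(collapse(String(searchText)));
}
```

If htmlContainsText(await session.browser.getHtml(), "Profile updated successfully") returns true while findByText() finds nothing, the text is most likely split across elements or formatted differently from your search string.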