Puppeteer is a Node.js library that lets you control the Chrome browser from JavaScript code. Most things that you can do manually in the browser can be done using Puppeteer. Here are a few examples to get you started:
- Generate screenshots and PDFs of pages.
- Crawl a SPA and generate pre-rendered content (i.e. "SSR").
- Automate form submission, UI testing, keyboard input, etc.
- Create an up-to-date, automated testing environment. Run your tests directly in the latest version of Chrome using the latest JavaScript and browser features.
- Capture a timeline trace of your site to help diagnose performance issues.
If you need to scrape an SPA or a website that relies heavily on JavaScript, Puppeteer is in many cases the way to go. Its simple, high-level API can also save a lot of time in many other situations.
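As an illustration of the first item in the list, here is a minimal sketch that saves a page as both a screenshot and a PDF. The helper names `outputPaths` and `capture`, the `networkidle2` wait condition and the A4 page format are choices made for this example, not requirements.

```javascript
// Hypothetical helper: derive the output file names from a base name.
function outputPaths(baseName) {
  return { png: `${baseName}.png`, pdf: `${baseName}.pdf` };
}

// Open the URL in headless Chrome and save a full-page screenshot and a PDF.
async function capture(url, baseName) {
  // Loaded here so that outputPaths can be used without Puppeteer installed.
  const puppeteer = require('puppeteer');
  const { png, pdf } = outputPaths(baseName);
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    // Wait until network activity has mostly stopped, which helps with JS-heavy pages.
    await page.goto(url, { waitUntil: 'networkidle2' });
    await page.screenshot({ path: png, fullPage: true });
    await page.pdf({ path: pdf, format: 'A4' });
  } finally {
    // Always close the browser, even if navigation fails.
    await browser.close();
  }
}
```

To try it, call something like `capture('http://httpbin.org/html', 'example').catch(console.error);`, which should write `example.png` and `example.pdf` to the current directory.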
Here is an example of how Puppeteer can be used in combination with our rotating proxy server. Note that you don't need the proxy authentication code if your server has a static IP: you can whitelist that IP in the proxy user settings, which are accessible through your account dashboard. The code snippet below also shows how to set additional headers to control BotProxy (please refer to our docs for full information about all supported control headers and APIs).
const puppeteer = require('puppeteer');

puppeteer.launch({
  // Route all browser traffic through the rotating proxy
  args: ['--proxy-server=x.botproxy.net:8080']
}).then(async browser => {
  const page = await browser.newPage();
  // Proxy credentials; wait for them to be registered before navigating
  await page.authenticate({
    username: 'proxy-user',
    password: 'proxy-password'
  });
  // Extra headers are sent with every request made by this page
  await page.setExtraHTTPHeaders({
    'SomeHeader': 'test'
  });
  await page.goto('http://httpbin.org/ip');
  // other actions...
  let content = await page.content();
  console.log(content);
  await browser.close();
});
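If your server IP is whitelisted, the authentication step can simply be skipped. The sketch below makes that step optional; the helper names `launchOptions`, `openPage` and `main` are made up for this example, and passing `null` as credentials is just one possible convention.

```javascript
// Build launch options that route all browser traffic through the proxy.
function launchOptions(proxyAddress) {
  return { args: [`--proxy-server=${proxyAddress}`] };
}

// Open a new page and authenticate only when credentials are supplied.
// With a whitelisted IP, pass null and no proxy credentials are sent.
async function openPage(browser, credentials) {
  const page = await browser.newPage();
  if (credentials) {
    await page.authenticate(credentials);
  }
  return page;
}

async function main() {
  // Loaded here so that the helpers above can be reused without Puppeteer.
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch(launchOptions('x.botproxy.net:8080'));
  try {
    // null: rely on the whitelisted IP instead of a user name and password
    const page = await openPage(browser, null);
    await page.goto('http://httpbin.org/ip');
    console.log(await page.content());
  } finally {
    await browser.close();
  }
}
```

Run it with `main().catch(console.error);`. To go back to password authentication, pass `{ username: 'proxy-user', password: 'proxy-password' }` instead of `null`.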
To set the outgoing country and session, use the username API or configure the desired settings for the proxy user in your account dashboard.
The above approach works in most cases, but one special case needs extra setup. Puppeteer's page.authenticate works by setting basic authentication headers, so it will not work if you need to access a page that is protected by its own basic authentication. To work around this, we will use an additional NPM package, proxy-chain. Make sure to install it before running the snippet below:
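const puppeteer = require('puppeteer');
const proxyChain = require('proxy-chain');

(async () => {
  // "+DE" after the user name selects the outgoing country; proxy_password is a placeholder
  const oldProxyUrl = 'http://proxy_user+DE:proxy_password@x.botproxy.net:8080';
  // Start a local proxy that forwards to the upstream proxy and handles its authentication
  const newProxyUrl = await proxyChain.anonymizeProxy(oldProxyUrl);
  // Prints something like "http://127.0.0.1:45678"
  console.log(newProxyUrl);
  const browser = await puppeteer.launch({
    args: [`--proxy-server=${newProxyUrl}`],
  });
  const page = await browser.newPage();
  // You can now use page.authenticate to access a protected page
  await page.goto('http://httpbin.org/ip');
  // other actions...
  let content = await page.content();
  console.log(content);
  await browser.close();
  // Shut down the local forwarding proxy
  await proxyChain.closeAnonymizedProxy(newProxyUrl, true);
})();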
To change the outgoing country or use a proxy session in this case, you will need to use our username API. In the example above, we specified DE as the outgoing location.
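The sketch below puts the two ideas together: it builds the proxy URL with a country suffix and then uses page.authenticate for the target site's own basic authentication. The helper names `buildProxyUrl` and `fetchProtectedPage` are hypothetical, the two-letter upper-case country check is an assumption based on the DE example, and the credentials are assumed to contain no characters that need URL escaping. Session selection is not shown here; see the username API docs for its syntax.

```javascript
// Build a proxy URL with the country code appended to the user name,
// following the "proxy_user+DE" pattern. Host and port default to the
// values used elsewhere in this article.
function buildProxyUrl({ user, password, country, host = 'x.botproxy.net', port = 8080 }) {
  // Assumption: country codes are two upper-case letters, like "DE".
  if (country !== undefined && !/^[A-Z]{2}$/.test(country)) {
    throw new Error(`Expected a two-letter upper-case country code, got "${country}"`);
  }
  const userName = country ? `${user}+${country}` : user;
  return `http://${userName}:${password}@${host}:${port}`;
}

// Fetch a page protected by basic authentication through the proxy.
async function fetchProtectedPage(proxyUrl, siteCredentials, url) {
  // Loaded here so that buildProxyUrl can be used on its own.
  const puppeteer = require('puppeteer');
  const proxyChain = require('proxy-chain');
  // The local proxy takes care of authenticating against the upstream proxy.
  const localProxyUrl = await proxyChain.anonymizeProxy(proxyUrl);
  const browser = await puppeteer.launch({ args: [`--proxy-server=${localProxyUrl}`] });
  try {
    const page = await browser.newPage();
    // page.authenticate is now free to carry the site's own credentials.
    await page.authenticate(siteCredentials);
    await page.goto(url);
    return await page.content();
  } finally {
    await browser.close();
    await proxyChain.closeAnonymizedProxy(localProxyUrl, true);
  }
}
```

For example, `fetchProtectedPage(buildProxyUrl({ user: 'proxy_user', password: 'proxy_password', country: 'DE' }), { username: 'user', password: 'passwd' }, 'http://httpbin.org/basic-auth/user/passwd')` requests httpbin's basic authentication test endpoint through a German exit location.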
As you can see, it is very easy to start using rotating proxies in your existing Puppeteer projects: it requires only a couple of lines of additional code.