Capturing full page screenshots with puppeteer and Architect (arc.codes)
⚡ Let's build a serverless app that browses any site we provide in the URL, takes a full page screenshot of the site, and returns it to our browser! Sound like fun? ⚡
This turned out to be a little more challenging than I originally thought, but I got it done! ☺️
In this tutorial, we will walk through the steps to create a serverless endpoint that takes a URL as a query param and uses puppeteer to launch a browser. The browser navigates to the passed-in URL and takes a picture of the full page.
Setting up Architect
Architect is a framework for building serverless functions on top of AWS Lambda. It provides a great boundary between just writing a function and AWS.
Check out the quickstart guide: https://arc.codes/docs/en/guides/get-started/quickstart
npm install --global @architect/architect
Create a new folder called screenshoter
mkdir screenshoter
cd screenshoter
touch app.arc
Modify your app.arc file to build an app with a single endpoint:
@app
screenshoter

@http
/
  method get
  src src
Save the file, then run:
arc init
This will create the src folder in your project directory with an index.js file in it.
You can run a local sandbox and test out your new serverless function by running the command:
arc sandbox
Point a browser to http://localhost:3333 and you should see the Architect demo page.
Setup NodeJS Project
In your terminal, change into the src directory and run npm init -y. This will initialize your src folder as an npm project.
cd src
npm init -y
Each endpoint in Architect is a separate Lambda function, so any dependencies need to reside in the same folder as the index.js file. By initializing the folder as an npm project, you create a package.json file which will contain your project manifest. There is more information at arc.codes.
Let's install the dependencies we will need in our project:
Installing puppeteer for lambda
We need to install some special dependencies for puppeteer to run in AWS Lambda:
npm install puppeteer-core
npm install puppeteer-extra
npm install chrome-aws-lambda
npm install puppeteer-extra-plugin-stealth
npm install puppeteer-full-page-screenshot
npm install -D puppeteer
These modules will allow us to create a browser on AWS Lambda and capture a full page screenshot. The next thing we need is an image tool to convert the image into a base64 string.
Installing Jimp
npm install jimp
Jimp is a NodeJS package that allows you to manipulate images and then write them either to disk or to a buffer.
Creating our handler function
The easiest way to do this is to remove the current index.js and create a new index.js file.
rm index.js
touch index.js
Then let's create our handler function:
require('puppeteer-core')
const chromium = require('chrome-aws-lambda')
const { addExtra } = require('puppeteer-extra')
const puppeteer = addExtra(chromium.puppeteer)
const StealthPlugin = require('puppeteer-extra-plugin-stealth')
// Used later to capture the page; assumes the package exposes its function as the default export
const fullPageScreenshot = require('puppeteer-full-page-screenshot').default
const Jimp = require('jimp')

puppeteer.use(StealthPlugin())

exports.handler = async function(req) {
}
Get the url query parameter
We need to get the url parameter from the request's queryStringParameters:
...
exports.handler = async function(req) {
  const { url } = req.queryStringParameters
  ...
}
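If the endpoint is called without a query string, `req.queryStringParameters` may be missing or null, and destructuring it would throw. A minimal sketch of a guard, assuming a hypothetical helper named `getTargetUrl` (it is not part of Architect or puppeteer) that returns the URL string, or `null` when the parameter is absent or is not an http(s) address:

```javascript
// Hypothetical helper: returns a normalized http(s) URL string, or null when
// the `url` query parameter is missing or is not a valid web address.
function getTargetUrl(req) {
  const params = (req && req.queryStringParameters) || {}
  if (typeof params.url !== 'string' || params.url === '') return null
  try {
    const parsed = new URL(params.url)
    // Only let the browser visit web pages, not file: or other schemes
    if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') return null
    return parsed.href
  } catch (err) {
    // new URL() throws on malformed input
    return null
  }
}
```

Inside the handler, a `null` result could be turned into a 400 response before any browser is launched.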
Create the puppeteer browser
...
exports.handler = async function(req) {
  ...
  const browser = await puppeteer.launch({
    args: chromium.args,
    defaultViewport: chromium.defaultViewport,
    executablePath: await chromium.executablePath,
    headless: chromium.headless
  })
  ...
}
Create a new page (like a browser tab)
...
exports.handler = async function(req) {
  ...
  const page = await browser.newPage()
  page.setDefaultNavigationTimeout(0)
  ...
}
We set the navigation timeout to zero, which disables it, so puppeteer waits for the page no matter how long it takes to load.
Go to the url
...
exports.handler = async function(req) {
  ...
  await page.goto(url)
  ...
}
Get the screenshot
...
exports.handler = async function(req) {
  ...
  const img = await fullPageScreenshot(page)
  ...
}
Convert to base64
...
exports.handler = async function(req) {
  ...
  const base64 = (await Jimp.read(img.bitmap).then(
    i => i.getBufferAsync(Jimp.AUTO))).toString('base64')
  ...
}
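The last part of that expression is plain Node.js: `getBufferAsync` resolves to a `Buffer`, and calling `toString('base64')` on a Buffer produces the string we return in the response body. A small standalone sketch of that step, using the 8-byte PNG file signature as a stand-in for a real screenshot buffer:

```javascript
// Stand-in for the screenshot buffer: the 8-byte PNG file signature
const pngSignature = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a])

// Binary -> text, safe to place in a JSON response body
const encoded = pngSignature.toString('base64')

// Text -> binary, which is what the receiving side does to recover the image
const decoded = Buffer.from(encoded, 'base64')

console.log(encoded) // iVBORw0KGgo=
console.log(decoded.equals(pngSignature)) // true
```

This is also why base64 output of any PNG starts with the characters `iVBORw0KGgo`, which can be a quick sanity check when debugging.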
Close the browser
...
exports.handler = async function(req) {
  ...
  await browser.close()
}
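If `page.goto` or the screenshot step throws, the `browser.close()` call is never reached and the Chromium process can linger. One way to guard against that is a `try`/`finally` wrapper. The sketch below assumes a hypothetical helper named `withBrowser`; its `launch` argument stands in for a function that calls `puppeteer.launch` with the options shown earlier, and `work` is whatever you want to do with the browser:

```javascript
// Hypothetical helper: runs `work(browser)` and always closes the browser,
// whether `work` resolves or throws.
async function withBrowser(launch, work) {
  const browser = await launch()
  try {
    return await work(browser)
  } finally {
    await browser.close()
  }
}

// Possible use inside the handler (not executed here):
// const base64 = await withBrowser(
//   () => puppeteer.launch({ /* options from above */ }),
//   async browser => { /* newPage, goto, screenshot, convert */ }
// )
```

The value returned by `work` is passed through, and an error thrown by `work` still reaches the caller after the browser has been closed.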
Return a Response Object
...
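exports.handler = async function(req) {
  ...
  return {
    statusCode: 200,
    headers: {
      'Content-Type': 'image/png'
    },
    // Marks the body as base64 so it is decoded back to binary before it reaches the browser
    isBase64Encoded: true,
    body: base64
  }
}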
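The handler so far assumes everything goes well. Below is a minimal sketch of two response builders, one for the image and one for failures such as a missing url parameter. The names `imageResponse` and `errorResponse` are made up for this example. The `isBase64Encoded` flag is part of the Lambda proxy response format and marks the body as base64-encoded binary:

```javascript
// Hypothetical helper: wraps a base64-encoded PNG in a Lambda proxy response.
function imageResponse(base64) {
  return {
    statusCode: 200,
    headers: { 'Content-Type': 'image/png' },
    // Marks the body as base64 so the gateway can decode it back to binary
    isBase64Encoded: true,
    body: base64
  }
}

// Hypothetical helper: a small JSON error body for requests we cannot serve.
function errorResponse(statusCode, message) {
  return {
    statusCode,
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ error: message })
  }
}
```

For example, the handler could return `errorResponse(400, 'url query parameter is required')` when no url is passed, and `imageResponse(base64)` otherwise.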
Run in the sandbox
cd ..
arc sandbox
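To try the endpoint, pass the page you want captured in the `url` query parameter of a request to the sandbox. Because the target address can itself contain `?` and `&`, it should be percent-encoded. A small sketch using the built-in `URL` class; the helper name `screenshotRequestUrl` and the example target are made up for illustration:

```javascript
// Hypothetical helper: builds the address to open in the browser, with the
// target page safely encoded into the `url` query parameter.
function screenshotRequestUrl(endpoint, target) {
  const requestUrl = new URL(endpoint)
  requestUrl.searchParams.set('url', target)
  return requestUrl.href
}

// Prints an address you can paste into the browser while the sandbox is running
console.log(screenshotRequestUrl('http://localhost:3333', 'https://example.com/?a=1&b=2'))
```

For a simple target without its own query string, typing `http://localhost:3333/?url=https://example.com` directly into the address bar should work as well.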
Deploy to AWS
arc deploy
Debug errors in logs
arc logs src
Summary
This post shows the power of AWS Lambda and how easy tools like Architect (arc.codes) make it to get up and running, even when that means running a browser in the cloud! It also shows how to use a tool like Jimp to convert an image to base64 so it can be sent in an HTTP response. Finally, it shows the power of puppeteer: you can do just about anything you can do in a browser with puppeteer!