Capturing full page screenshots with puppeteer and Architect (arc.codes)

⚡ Let's build a server-less app that browses to any site we provide in the url, takes a full page screenshot of it, and returns the image to our browser! Sound like fun? ⚡

This turned out to be a little more challenging than I originally thought, but I got it done! ☺️

In this tutorial, we will walk through the steps to create a server-less endpoint that takes a url as a query param and uses puppeteer to launch a browser. The browser will navigate to the passed-in url and take a picture of the full page.

Setting up Architect

Architect is a framework for building server-less functions on top of AWS Lambda. It sits between your function code and AWS: you describe the app in a small app.arc file, and Architect takes care of provisioning and deploying the AWS resources.

Check out https://arc.codes/docs/en/guides/get-started/quickstart

npm install --global @architect/architect

Create a new folder called screenshoter

mkdir screenshoter
cd screenshoter
touch app.arc

Modify your app.arc file to build an app with a single endpoint:

@app
screenshoter

@http
/
  method get
  src src

Save the file, then run:

arc init

This will create a new src folder in your project directory with an index.js file in it.

You can run a local sandbox and test out your new server-less function by running the command:

arc sandbox

Point a browser to http://localhost:3333 and you should see the Architect demo page.

Setup NodeJS Project

In your terminal, change into the src directory and run npm init -y. This will initialize your src folder as an npm project.

cd src
npm init -y

Each endpoint in Architect is a separate Lambda function, so any dependencies it has need to reside in the same folder as its index.js file. By initializing the folder as an npm project, you create a package.json file which will contain your project manifest. There is more information at arc.codes.

Let's install some dependencies we will need in our project:

Installing puppeteer for lambda

We need to install some special dependencies to use puppeteer in AWS Lambda:

npm install puppeteer-core
npm install puppeteer-extra
npm install chrome-aws-lambda
npm install puppeteer-extra-plugin-stealth
npm install puppeteer-full-page-screenshot
npm install -D puppeteer

These modules will allow us to create a browser on AWS Lambda and capture a full page screenshot. The next thing we need is an image tool to convert the image into a base64 string.

Installing Jimp

npm install jimp

Jimp is a NodeJS package that allows you to manipulate images and then write them either to disk or to a buffer.

Creating our handler function

The easiest way to do this is to remove the current index.js and create a new index.js file.

rm index.js
touch index.js

Then let's create our handler function:

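require('puppeteer-core')
const chromium = require('chrome-aws-lambda')
const { addExtra } = require('puppeteer-extra')
const puppeteer = addExtra(chromium.puppeteer)
const StealthPlugin = require('puppeteer-extra-plugin-stealth')
// fullPageScreenshot is called later in the handler; this accepts either a
// default export or a direct function export from the module
const fps = require('puppeteer-full-page-screenshot')
const fullPageScreenshot = fps.default || fps
const Jimp = require('jimp')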

puppeteer.use(StealthPlugin())

exports.handler = async function(req) {
    
}

Get the url query parameter

We need to get the url parameter from the queryStringParameters

...
exports.handler = async function(req) {
  // queryStringParameters can be missing when the request has no query string
  const { url } = req.queryStringParameters || {}
  ...
}
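
The destructuring above trusts whatever arrives in the query string. A minimal sketch of a guard follows, assuming a hypothetical helper named parseTargetUrl (it is not part of Architect or puppeteer) that uses Node's built-in URL class and accepts only http and https addresses:

```javascript
// Hypothetical helper: returns a normalized URL string, or null when the
// url query parameter is missing, malformed, or not http/https.
function parseTargetUrl(req) {
  // queryStringParameters may be null or undefined when no query string is sent
  const params = (req && req.queryStringParameters) || {}
  if (typeof params.url !== 'string' || params.url === '') return null
  try {
    const parsed = new URL(params.url)
    if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') return null
    return parsed.href
  } catch (err) {
    // new URL() throws on input it cannot parse
    return null
  }
}
```

In the handler, a null result could be turned into a 400 response before a browser is ever launched.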

Create the puppeteer browser

...
exports.handler = async function(req) {
  ...
  
  const browser = await puppeteer.launch({
    args: chromium.args,
    defaultViewport: chromium.defaultViewport,
    executablePath: await chromium.executablePath,
    headless: chromium.headless
  })
  
  ...
  
}

Create a new page (like a browser tab)

...
exports.handler = async function(req) {
  ...
  
  const page = await browser.newPage()
  page.setDefaultNavigationTimeout(0) 
  
  ...
  
}

Setting the timeout to 0 disables puppeteer's navigation timeout, so the page will wait indefinitely for the site to load.
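
With the navigation timeout disabled, a page that never finishes loading keeps the function running until Lambda's own timeout stops it. If a hard upper bound is wanted instead, one option is to race the work against a timer. This is only a sketch: withTimeout is a hypothetical helper, and it stops waiting without cancelling the underlying navigation.

```javascript
// Hypothetical helper: rejects if `promise` has not settled within `ms`.
function withTimeout(promise, ms) {
  let timer
  const deadline = new Promise((resolve, reject) => {
    timer = setTimeout(() => reject(new Error(`Timed out after ${ms} ms`)), ms)
  })
  // Whichever settles first wins; always clear the timer so it cannot
  // keep the process alive after the race is over.
  return Promise.race([promise, deadline]).finally(() => clearTimeout(timer))
}
```

It could be used as await withTimeout(page.goto(url), 20000), where the 20 second limit is an arbitrary value chosen for illustration.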

Go to the url

...
exports.handler = async function(req) {
  ...
     
  await page.goto(url)
  
  ...
}

Get the screenshot

...
exports.handler = async function(req) {
  ...
  
  const img = await fullPageScreenshot(page)
  
  ...
}

Convert to base64

...
exports.handler = async function(req) {
  ...
  
  const base64 = (await Jimp.read(img.bitmap).then(
    i => i.getBufferAsync(Jimp.AUTO))).toString('base64')
    
  ...
}
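
Base64 here is just a text encoding of the image bytes so they can be returned in the body of the response object. The small sketch below uses Node's built-in Buffer to show the encoding step and a sanity check that a base64 string decodes to bytes starting with the PNG signature. looksLikePng is a hypothetical helper name and is not part of Jimp.

```javascript
// The 8-byte signature that every PNG file starts with.
const PNG_SIGNATURE = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a])

// Hypothetical helper: decode a base64 string and check for the PNG signature.
function looksLikePng(base64) {
  const bytes = Buffer.from(base64, 'base64')
  return bytes.length >= PNG_SIGNATURE.length &&
    bytes.subarray(0, PNG_SIGNATURE.length).equals(PNG_SIGNATURE)
}

// Encoding goes the other way: bytes in, base64 text out.
const encoded = PNG_SIGNATURE.toString('base64')
```

A check like this can help while debugging, since the response later declares the body as image/png.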

Close the browser

...
exports.handler = async function(req) {
  ...
  
  await browser.close()
  
}

Return a Response Object

...

exports.handler = async function(req) {
  ...
  
  return {
    statusCode: 200,
    headers: {
      'Content-Type': 'image/png'
    },
    // tells API Gateway that the body is a base64-encoded binary payload
    isBase64Encoded: true,
    body: base64
  }
}
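
Put together, the steps are: read the url, launch the browser, open a page, navigate, capture, encode, close, and respond. The sketch below shows one way they could be arranged so that the browser is closed even when navigation or capture throws. To keep it runnable without Chromium, the puppeteer and Jimp calls are passed in as functions. makeHandler, launch, capture and encode are hypothetical names used only for illustration, and the isBase64Encoded flag is an assumption about what API Gateway needs for a binary body.

```javascript
// Hypothetical factory: the real puppeteer/Jimp calls are injected so the
// control flow can be exercised without launching Chromium.
function makeHandler({ launch, capture, encode }) {
  return async function handler(req) {
    const { url } = req.queryStringParameters || {}
    if (!url) {
      return {
        statusCode: 400,
        headers: { 'Content-Type': 'text/plain' },
        body: 'Missing url query parameter'
      }
    }
    const browser = await launch()
    try {
      const page = await browser.newPage()
      await page.goto(url)
      const img = await capture(page)
      const base64 = await encode(img)
      return {
        statusCode: 200,
        headers: { 'Content-Type': 'image/png' },
        isBase64Encoded: true, // assumption: marks the body as base64-encoded binary
        body: base64
      }
    } finally {
      // Runs on success and on failure, so the browser is never left running
      await browser.close()
    }
  }
}
```

In the real function, launch would wrap the puppeteer.launch call from earlier, capture would be fullPageScreenshot, and encode would be the Jimp conversion.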

Run in the sandbox

cd ..
arc sandbox
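
With the sandbox running, the target site goes in the url query parameter. It should be percent-encoded so that its own ? and & characters are not mistaken for part of the request to the endpoint. A minimal sketch using Node's built-in URL class follows; buildScreenshotUrl is a hypothetical helper name and the example target is a placeholder.

```javascript
// Hypothetical helper: builds the address to request from the endpoint.
function buildScreenshotUrl(endpoint, target) {
  const requestUrl = new URL(endpoint)
  // searchParams.set percent-encodes the value
  requestUrl.searchParams.set('url', target)
  return requestUrl.href
}

const example = buildScreenshotUrl('http://localhost:3333', 'https://example.com/a?b=c')
```

Pasting the resulting address into a browser should show the screenshot once the function finishes.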

Deploy to AWS

arc deploy

Debug errors in logs

arc logs src

Summary

This post shows the power of AWS Lambda and how easy it is to use tools like Architect (arc.codes) to get up and running, even to run a browser in the cloud! It also shows how to use tools like Jimp to convert an image to base64 for sending via an HTTP response. Finally, it shows the power of puppeteer: you can do just about anything you can do in a browser with puppeteer!