Common Screens

Registry of Open Data on AWS

AWS data layout

The Common Screens entry in the Registry of Open Data on AWS lists the S3 bucket common-screens in us-west-2. The dataset is described as a corpus of web screenshots and associated metadata covering over 70 million websites, updated monthly, and licensed under CC BY 4.0.

S3 bucket

ARN: arn:aws:s3:::common-screens
Region: us-west-2

Observed top-level data prefixes

Prefix        Purpose
data/jpeg/    JPEG screenshots, addressed by flat host-derived filenames.
data/png/     PNG screenshot objects when available.
data/tiff/    TIFF screenshot objects when available.
data/ocr/     OCR and derived text data when available.
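
As a rough illustration of how one of these prefixes might be explored without credentials, the Python sketch below builds an anonymous S3 ListObjectsV2 request URL for a prefix and extracts the object keys from the XML response. The helper names (list_url, parse_keys, list_keys) are hypothetical and not part of the dataset's documentation. The sketch assumes the bucket answers unsigned HTTPS list requests the same way it answers an unsigned CLI listing, which is not confirmed here for every prefix.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

BUCKET = "common-screens"   # bucket name from the registry entry
REGION = "us-west-2"        # region from the registry entry
# XML namespace used by S3 list responses
S3_NS = "{http://s3.amazonaws.com/doc/2006-03-01/}"


def list_url(prefix, max_keys=5):
    """Build an unsigned ListObjectsV2 URL for one prefix (hypothetical helper)."""
    query = urllib.parse.urlencode(
        {"list-type": "2", "prefix": prefix, "max-keys": str(max_keys)}
    )
    return f"https://{BUCKET}.s3.{REGION}.amazonaws.com/?{query}"


def parse_keys(xml_text):
    """Return every <Key> value found in a ListObjectsV2 XML response."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.iter(f"{S3_NS}Key")]


def list_keys(prefix, max_keys=5):
    """Fetch up to max_keys object keys under prefix (needs network access)."""
    with urllib.request.urlopen(list_url(prefix, max_keys), timeout=30) as resp:
        return parse_keys(resp.read())
```

Calling list_keys("data/jpeg/") would return at most five keys if the request succeeds. The small max_keys default is deliberate, since a prefix in a corpus of this size may hold a very large number of objects.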

Public access examples

aws s3 ls --no-sign-request s3://common-screens/
https://common-screens.s3.us-west-2.amazonaws.com/data/jpeg/a427-com--.jpeg
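
The HTTPS example above follows the S3 virtual-hosted-style URL pattern (bucket, then region, then object key). A minimal Python sketch of building such a URL from an object key and downloading the object might look like the following. The function names object_url and download are hypothetical. The sketch does not derive a filename from a hostname, because the host-to-filename mapping is not specified here, so the caller is assumed to already know the key.

```python
import urllib.parse
import urllib.request


def object_url(key, bucket="common-screens", region="us-west-2"):
    """Build the public virtual-hosted-style URL for an object key."""
    # quote() percent-encodes unsafe characters but keeps "/" separators intact
    return f"https://{bucket}.s3.{region}.amazonaws.com/{urllib.parse.quote(key)}"


def download(key, dest):
    """Download one public object to a local path (needs network access)."""
    urllib.request.urlretrieve(object_url(key), dest)
```

For instance, download("data/jpeg/a427-com--.jpeg", "a427-com.jpeg") would attempt to save the screenshot from the example URL to a local file.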