Registry of Open Data on AWS
AWS data layout
The Common Screens entry in the Registry of Open Data on AWS publishes the S3 bucket common-screens in us-west-2. The entry describes the dataset as a corpus of web screenshots and associated metadata covering over 70 million websites, updated monthly and licensed under CC BY 4.0.
S3 bucket
arn:aws:s3:::common-screens
region: us-west-2
Observed top-level data prefixes
| Prefix | Purpose |
|---|---|
| data/jpeg/ | JPEG screenshots, addressed by flat host-derived filenames. |
| data/png/ | PNG screenshot objects when available. |
| data/tiff/ | TIFF screenshot objects when available. |
| data/ocr/ | OCR and derived text data when available. |
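The prefixes in the table can be explored without AWS credentials through the S3 ListObjectsV2 REST endpoint. The following is a minimal sketch using only the Python standard library. The helper names list_url and list_keys are hypothetical, and the sketch assumes the bucket permits anonymous listing over HTTPS, which this document has only observed through the AWS CLI.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

BUCKET = "common-screens"
REGION = "us-west-2"
# XML namespace used by S3 list responses.
S3_NS = "{http://s3.amazonaws.com/doc/2006-03-01/}"


def list_url(prefix, max_keys=10):
    """Build an unsigned ListObjectsV2 URL for one prefix (hypothetical helper)."""
    query = urlencode(
        {"list-type": "2", "prefix": prefix, "max-keys": str(max_keys)}
    )
    return f"https://{BUCKET}.s3.{REGION}.amazonaws.com/?{query}"


def list_keys(prefix, max_keys=10):
    """Return up to max_keys object keys under prefix.

    Performs a network request; assumes anonymous listing is allowed.
    """
    with urlopen(list_url(prefix, max_keys), timeout=30) as resp:
        root = ET.fromstring(resp.read())
    return [el.text for el in root.iter(f"{S3_NS}Key")]


if __name__ == "__main__":
    for key in list_keys("data/jpeg/", max_keys=5):
        print(key)
```

Keeping max_keys small is deliberate: the prefixes may hold a very large number of objects, so a first look should request only a handful of keys rather than paginate through the whole listing.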
Public access examples
aws s3 ls --no-sign-request s3://common-screens/
https://common-screens.s3.us-west-2.amazonaws.com/data/jpeg/a427-com--.jpeg
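The HTTPS example above follows the S3 virtual-hosted-style URL pattern (bucket, then region, then object key). As a rough sketch, the same URL can be assembled and fetched from Python with the standard library alone. The function names object_url and download are hypothetical, and the key used in the usage section is simply the one from the example URL. How filenames are derived from hostnames is not specified here, so the sketch takes a complete key as input instead of guessing that mapping.

```python
from urllib.parse import quote
from urllib.request import urlopen


def object_url(key, bucket="common-screens", region="us-west-2"):
    """Build the public virtual-hosted-style URL for an object key."""
    # quote() leaves "/" intact and percent-encodes unsafe characters.
    return f"https://{bucket}.s3.{region}.amazonaws.com/{quote(key)}"


def download(key, dest_path):
    """Fetch one object anonymously and write it to dest_path.

    Performs a network request; assumes the object is publicly readable.
    Returns the number of bytes written.
    """
    with urlopen(object_url(key), timeout=60) as resp:
        data = resp.read()
    with open(dest_path, "wb") as fh:
        fh.write(data)
    return len(data)


if __name__ == "__main__":
    key = "data/jpeg/a427-com--.jpeg"  # key taken from the example URL
    print(object_url(key))
    print(download(key, "a427-com--.jpeg"), "bytes written")
```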