
Amazon Redshift

What is Redshift?

Fully-managed, petabyte scale data warehouse service
Up to 10x better performance than other data warehouses (AWS's claim), via:
- machine learning
- massively parallel processing (MPP) query execution
- columnar storage
Designed for OLAP (analytical queries), not OLTP (transactional workloads)
Cost effective
SQL, ODBC, JDBC interfaces
Scale up or down on demand
Built-in replication & backups
Monitoring via CloudWatch; API call logging via CloudTrail
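A toy sketch of the MPP idea: rows are split across "slices", each slice computes a partial aggregate, and a leader combines the partial results. The function names and the round-robin split are illustrative assumptions, and threads stand in for compute nodes, so this shows the data flow rather than real parallel speedup or Redshift's internals.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy model of MPP (hypothetical helpers, not Redshift code): each "slice"
# holds a partition of the rows and computes a partial aggregate locally;
# the "leader" then combines the partial results.

def partition(rows, num_slices):
    """Split rows round-robin across num_slices partitions."""
    slices = [[] for _ in range(num_slices)]
    for i, row in enumerate(rows):
        slices[i % num_slices].append(row)
    return slices

def partial_sum(rows):
    """Work done on one slice: sum the 'amount' column locally."""
    return sum(r["amount"] for r in rows)

def mpp_sum(rows, num_slices=4):
    """Scatter the work to all slices, then gather and combine on the leader."""
    slices = partition(rows, num_slices)
    # Threads stand in for compute nodes here (an illustrative assumption).
    with ThreadPoolExecutor(max_workers=num_slices) as pool:
        partials = list(pool.map(partial_sum, slices))
    return sum(partials)

sales = [{"region": "EU", "amount": a} for a in range(1, 101)]
print(mpp_sum(sales))  # → 5050
```

Because each partial sum only touches its own slice, adding slices spreads the scan work; only the small partial results travel back to the leader.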

Use cases

Accelerate analytics workloads
Unified data warehouse & data lake
Data warehouse modernization
Analyze global sales data
Store historical stock trade data
Analyze ad impressions & clicks
Aggregate gaming data
Analyze social trends

Architecture

[Figure: Redshift architecture diagram]

Redshift Spectrum

Query exabytes of structured and semi-structured data in S3 without loading it into Redshift
Limitless concurrency
Horizontal scaling
Separate storage & compute resources
Wide variety of data formats
Supports Gzip and Snappy compression

Performance

Massively Parallel Processing (MPP)
Columnar Data Storage
Column Compression
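A toy sketch of why columnar storage and column compression go together: pivoting rows into per-column lists lets a query read only the columns it needs, and a single column often has long runs of repeated values that compress well. Run-length encoding is shown because Redshift offers a RUNLENGTH encoding; the helper names and sample data are hypothetical, and this is not how Redshift lays out its blocks.

```python
from itertools import groupby

# Toy contrast of row vs. columnar layout, plus run-length encoding (RLE).
# Helper names and data are hypothetical illustrations.

rows = [
    ("2024-01-01", "EU", 10),
    ("2024-01-01", "EU", 12),
    ("2024-01-01", "US", 7),
    ("2024-01-02", "US", 9),
]

def to_columns(rows, names):
    """Pivot row tuples into one list per column (columnar layout)."""
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}

def rle_encode(values):
    """Collapse runs of repeated values into (value, run_length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]

def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original column."""
    return [value for value, count in pairs for _ in range(count)]

columns = to_columns(rows, ["sale_date", "region", "amount"])
# A query summing 'amount' only needs to read this one column.
print(sum(columns["amount"]))            # → 38
print(rle_encode(columns["sale_date"]))  # → [('2024-01-01', 3), ('2024-01-02', 1)]
```

RLE only pays off on columns with long runs (dates, low-cardinality codes); a column of mostly distinct values would grow rather than shrink, which is why encodings are chosen per column.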

Durability

Replication within cluster
Backup to S3
Backups can be asynchronously replicated to another region
Automated snapshots
Failed drives / nodes automatically replaced
~~However, limited to a single availability zone (AZ)~~
Multi-AZ for RA3 clusters now available

Distribution Styles

AUTO: Redshift chooses the style based on the size of the table
EVEN: Rows distributed across slices in round-robin order
KEY: Rows distributed based on the values of one column (rows with the same value land on the same slice)
ALL: Entire table is copied to every node

EVEN Distribution

[Figure: EVEN distribution]
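A minimal sketch of EVEN distribution, assuming a simple "deal rows in order" round-robin; the function name is hypothetical and Redshift's exact assignment order is not modeled. The point is that placement ignores row contents, so slices stay balanced but related rows can end up on different slices.

```python
def distribute_even(rows, num_slices):
    """EVEN style (toy model): deal rows to slices round-robin, ignoring content."""
    slices = [[] for _ in range(num_slices)]
    for i, row in enumerate(rows):
        slices[i % num_slices].append(row)
    return slices

slices = distribute_even(list(range(10)), 4)
print([len(s) for s in slices])  # → [3, 3, 2, 2]
```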

KEY Distribution

[Figure: KEY distribution]
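A minimal sketch of KEY distribution: each row's slice is derived from a hash of its distribution-key value, so equal keys always land together. `zlib.crc32` is a stand-in hash chosen for illustration (Redshift's internal hash function is not modeled), and the function and column names are hypothetical.

```python
import zlib

def slice_for_key(value, num_slices):
    """Map a distribution-key value to a slice.
    crc32 is a stand-in hash; Redshift's real hash is internal."""
    return zlib.crc32(str(value).encode("utf-8")) % num_slices

def distribute_key(rows, key, num_slices):
    """KEY style (toy model): rows with equal key values share a slice."""
    slices = [[] for _ in range(num_slices)]
    for row in rows:
        slices[slice_for_key(row[key], num_slices)].append(row)
    return slices

orders = [{"customer_id": c, "total": t}
          for c, t in [(1, 20), (2, 35), (1, 15), (3, 50), (2, 5)]]
slices = distribute_key(orders, "customer_id", num_slices=4)
# Every order for a given customer sits on one slice, so a join or
# GROUP BY on customer_id can be computed without moving rows between slices.
```

The trade-off is skew: if one key value is very common, its slice holds a disproportionate share of the rows.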

ALL Distribution

[Figure: ALL distribution]
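A minimal sketch of ALL distribution: every node receives its own full copy of the table, so joins against it never need to fetch rows from another node. The function name and sample table are hypothetical; the model ignores which slice within a node holds the copy.

```python
import copy

def distribute_all(rows, num_nodes):
    """ALL style (toy model): every node gets an independent full copy."""
    return [copy.deepcopy(rows) for _ in range(num_nodes)]

regions = [{"region_id": 1, "name": "EU"}, {"region_id": 2, "name": "US"}]
nodes = distribute_all(regions, num_nodes=3)
print(len(nodes), [len(n) for n in nodes])  # → 3 [2, 2, 2]
```

Since storage and load work scale with the number of nodes, this style suits small, slowly changing tables such as lookup or dimension tables.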