Introduction to Amazon Redshift
Amazon Redshift is a fast, fully managed, petabyte-scale data warehouse. It makes it easy and cost-effective to analyze all of an organization's data using existing business intelligence tools. A data warehouse brings datasets from across an organization together in one place, and Redshift makes it easy to run queries against them because it natively supports distributed workloads. It also includes a handy feature called parameter groups, which is useful when an organization has many different users sharing the same Redshift cluster. Redshift is a good choice when a database is overloaded by OLAP queries. It is designed for OLAP, which makes it easy to combine multiple complex queries to produce answers, whereas traditional relational (SQL) databases are row-based.
It also supports petabytes of data, with capacity primarily controlled by adding nodes, including the newer RA3 series. It optimizes extensive queries that span multiple tables, and it tunes the database for user queries. Amazon Redshift provides a massively parallel, shared-nothing, columnar architecture.
Why use Redshift for an Organization
Elastic Scaling: Redshift offers elastic scaling, so users can add nodes to or remove nodes from their cluster at any point.
Managed, Almost Zero Maintenance: Redshift is a managed service, so users mostly need to set up a few alarms on sizing and CPU performance.
Optimized Query Performance: It provides consistent and reliable performance for frequently run queries.
Supports Thousands of Users with a Single Cluster: It can serve thousands of users from a single cluster by scaling the cluster up and adding more nodes.
Flexible Pricing Model: On-demand pricing is more expensive than reserved instances, which must be purchased with a commitment such as one year.
Compression
Goal: Allow more data to be stored within an Amazon Redshift cluster, and improve the performance of analytics queries by reducing I/O.
Impact: Allows two to four times more data to be stored within the cluster.
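To see why a column of similar values can shrink so much, here is a minimal sketch of run-length encoding in Python. Run-length is one of the column encodings Redshift offers, but this is not Redshift's implementation: the sample column and the function names are assumptions made up for illustration.

```python
from itertools import groupby

def rle_encode(values):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]

def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original list."""
    return [value for value, count in pairs for _ in range(count)]

# Hypothetical column of country codes, as it might look after sorting.
country_column = ["DE"] * 4 + ["FR"] * 3 + ["US"] * 5

encoded = rle_encode(country_column)
print(encoded)  # [('DE', 4), ('FR', 3), ('US', 5)]
print(len(country_column), "values stored as", len(encoded), "pairs")
```

The fewer distinct runs a column has, the fewer pairs need to be stored and read, which is where the I/O saving comes from.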
Data Sorting
Goal: Make queries run faster by increasing the effectiveness of zone maps and reducing I/O.
Impact: Enables range-restricted scans to prune blocks by leveraging zone maps.
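The effect of sorting on zone maps can be sketched in a few lines of Python. This is a simplified model rather than Redshift's actual storage engine: the block size of four values, the sample data, and the names `Block`, `make_blocks`, and `range_scan` are all assumptions chosen for the example.

```python
from dataclasses import dataclass

@dataclass
class Block:
    values: list  # values of one column stored in this block

    @property
    def zone(self):
        # Zone map entry: the minimum and maximum value in the block.
        return min(self.values), max(self.values)

def make_blocks(column, block_size):
    """Split a column into fixed-size blocks (block_size is hypothetical)."""
    return [Block(column[i:i + block_size]) for i in range(0, len(column), block_size)]

def range_scan(blocks, low, high):
    """Return matching values and the number of blocks actually read."""
    hits, blocks_read = [], 0
    for block in blocks:
        block_min, block_max = block.zone
        if block_max < low or block_min > high:
            continue  # block cannot contain a match, so skip it entirely
        blocks_read += 1
        hits.extend(v for v in block.values if low <= v <= high)
    return hits, blocks_read

unsorted_col = [17, 3, 42, 8, 29, 1, 35, 12, 40, 5, 22, 31]
sorted_col = sorted(unsorted_col)

for name, col in (("unsorted", unsorted_col), ("sorted", sorted_col)):
    hits, read = range_scan(make_blocks(col, 4), 30, 45)
    print(name, sorted(hits), "blocks read:", read)
```

With the unsorted column every block's min/max range overlaps the predicate, so all three blocks are read. After sorting, only the last block can match, and the other two are pruned.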
Columnar Storage
Unlike row-based databases, Redshift is a column-based database. Columnar data is stored sequentially on disk, so fewer reads are needed to fetch all the values of a column, and the data can be compressed. Columnar data compresses more effectively than row storage because all the values in a column share the same data type and are stored together in sequence. Redshift reads only the columns a query requires, without scanning the entire dataset.
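The difference between the two layouts can be illustrated with a small Python sketch. The table, its column names, and the "fields touched" counter are hypothetical stand-ins for real disk reads, so this shows the idea rather than how Redshift stores data internally.

```python
# Hypothetical sales table with three columns.
rows = [
    {"order_id": 1, "region": "EU", "amount": 120.0},
    {"order_id": 2, "region": "US", "amount": 75.5},
    {"order_id": 3, "region": "EU", "amount": 210.0},
]

def to_columns(rows):
    """Pivot a list of row dicts into one list per column."""
    return {name: [row[name] for row in rows] for name in rows[0]}

columns = to_columns(rows)

# Row layout: every field of every row is touched to total one column.
fields_touched_row = sum(len(row) for row in rows)
total_from_rows = sum(row["amount"] for row in rows)

# Column layout: only the "amount" list is touched.
fields_touched_col = len(columns["amount"])
total_from_columns = sum(columns["amount"])

print(total_from_rows, total_from_columns)      # same answer either way
print(fields_touched_row, fields_touched_col)   # 9 vs 3 fields touched
```

Both layouts give the same total, but the columnar one touches a third of the fields here, and the gap widens as tables gain more columns.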
Nodes
An Amazon Redshift data warehouse is a collection of computing resources called nodes, which are organized into a group called a cluster. Each cluster runs an Amazon Redshift engine and contains one or more databases.
Limitations
The main limitation of Redshift is that it is not highly available, as a cluster runs in only one Availability Zone. The reasoning is that business intelligence workloads are generally not viewed as business-critical.
Workload Management (WLM)
Amazon Redshift allows different query workloads to be separated:
- Prioritize important queries
- Throttle or abort less important queries
- Control the number of concurrently executing queries
- Divide cluster memory
- Set query timeouts to abort long-running queries
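The controls listed above could be expressed as a set of queues, sketched here in Python. The key names (`user_group`, `query_concurrency`, `memory_percent_to_use`, `max_execution_time` in milliseconds) follow the general shape of Redshift's manual WLM JSON configuration, but they should be treated as assumptions and checked against the current documentation. The group names, the numbers, and the `validate` helper are hypothetical.

```python
import json

# Hypothetical queues; names, groups, and numbers are illustrative only.
queues = [
    {"user_group": ["dashboards"], "query_concurrency": 5,
     "memory_percent_to_use": 50, "max_execution_time": 60_000},
    {"user_group": ["adhoc"], "query_concurrency": 2,
     "memory_percent_to_use": 30, "max_execution_time": 300_000},
    # Final entry without a group acts as the default queue in this sketch.
    {"query_concurrency": 3, "memory_percent_to_use": 20},
]

def validate(queues):
    """Basic sanity checks before the configuration is used anywhere."""
    total_memory = sum(q["memory_percent_to_use"] for q in queues)
    if total_memory > 100:
        raise ValueError(f"memory adds up to {total_memory}%, above 100%")
    for q in queues:
        if q["query_concurrency"] < 1:
            raise ValueError("each queue needs at least one slot")
    return total_memory

total = validate(queues)
wlm_json = json.dumps(queues)
print(total)
print(wlm_json)
```

In this sketch the important dashboard queries get the most memory and slots, ad hoc queries are limited to two at a time, and both queues carry a timeout that would abort long-running queries.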
Amazon Redshift is also significantly faster in a VPC than in EC2-Classic.
Redshift Cost
Redshift charges for the number of compute node hours used. This does not include the leader node, which is not chargeable. Users are also charged for the backups they store.
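As a rough illustration of that billing model, the arithmetic can be sketched in Python. Every rate and size below is a placeholder, not a real AWS price, and the function name `monthly_cost` is hypothetical.

```python
def monthly_cost(compute_nodes, hours, node_hourly_rate,
                 backup_gb=0.0, backup_rate_per_gb=0.0):
    """Estimate a bill from compute node hours plus stored backups.

    The leader node is deliberately absent: per the text it is not charged.
    All rates are placeholders, not real AWS prices.
    """
    compute = compute_nodes * hours * node_hourly_rate
    backups = backup_gb * backup_rate_per_gb
    return round(compute + backups, 2)

# Hypothetical cluster: 4 compute nodes running 720 hours at $0.25 per node hour,
# plus 500 GB of stored backups at $0.02 per GB.
print(monthly_cost(4, 720, 0.25, backup_gb=500, backup_rate_per_gb=0.02))  # 730.0
```

Here compute accounts for 720.00 of the total and backups for 10.00, which shows how the node count and running hours dominate the bill in this made-up scenario.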
Author: SVCIT Editorial
Copyright Silicon Valley Cloud IT, LLC.