Table Size Optimization for Small Datasets in Amazon Redshift

1 mins read

Amazon Redshift is a fast, fully managed, petabyte-scale Data Warehouse (DWH) solution. It is a columnar Database and facilitates massive parallel processing (MPP). In any DWH solution, we always have facts, Dimensions, Xref, etc. types of datasets which vary significantly in size. Hence, It is not wise to use default redshift setting for all sizes of datasets. 

To design an efficient redshift based DWH solution one need to be aware of various Static and dynamic variables which control the size and performance of the environment.

Type and numbers of Nodes with slices:  An Amazon Redshift data warehouse is a collection of computing resources called Nodes and Each Nodes has dedicated memory and disk storage (slices). The Master node manages the distribution of data and query processing tasks to the Slave Nodes. The disk storage for a Slave (Compute) node is divided into a number of slices. The number of slices per node depends on the node type. For example, each DS1.XL compute node has two slices, and each DS1.8XL compute node has 16 slices. Based on the distribution style data get stored into these slices as block store. Hence, the utilization of these nodes majorly depends on upon distribution style.

Distribution style:  When a query is executed, the query optimizer redistributes the rows to the compute nodes as needed to perform any joins and aggregations. The goal in selecting a table distribution style is to minimize the impact of the redistribution step by physically locating the data where it needs to be before the query is executed.  There is three type of distribution style available in the Redshift.

Compression and Vacuum strategy:   Compression and Vacuum strategy is very important for petabyte scale DWH system. To ensure the maximum performance and up to the mark space utilization, a periodic vacuum needs to be strategized.

A data warehouse solution always consists of a variety of datasets and Redshift is designed for larger datasets hence smaller datasets need to be treated differently and this paper highlights that aspect.

Click here to read the complete whitepaper and learn about the table size optimization in Amazon Redshift

For Questions, Contact Us Now

Recommended Blogs

The commercial ecosystem of the medical device industry is an intricate one. Medical device sales teams constantly watch out for opportunities across the sales process, beyond the overarching life ...

6 mins read

Rid your binge-watching boredom through artificial intelligence and machine learning – a leading cause of the pharmaceutical industry's best innovations and advancements. As the projected weeks in ...

4 mins read

Omnichannel Marketing is Widely Used With Clear Benefits and Will Become More Essential in the Months to Come Today, omnichannel marketing is largely accepted as a necessary and effective way of ...

6 mins read

Let Us Show You What Axtria’s 

Solutions Can Do For You