Wednesday, May 21, 2014

DataWarehousing with Redshift: Benefits of a columnar data storage



This a short article stating the benefits of a columnar storage architecture.
In traditional database systems, records are stored into disk blocks by row. Each row is stored one after other. Each row sits in a block. Within those blocks each column sits one after another.
Following diagram illustrates this row-based storage architecture:




This kind of data storage architecture has been created keeping in mind that often a complete record is queried. This is a good fit for scenarios where in CRUD operations are in picture, and data is sought often for a complete row. This architecture is good for OLTP type of applications. 


However, for data analytics and data warehousing use-cases this type of storage architecture isn't a good fit. Amazon's Redshift takes up this issue and have come up with a columnar storage which helps in optimizing data warehousing related queries. In this architecture values for each column are stored sequentially into disk blocks.

Following diagram illustrates this column-based storage architecture:




Row-wise storage disadvantages:

  • If block size is smaller than the size of a record, storage for an entire record may take more than one block.
  • If block size is larger than the size of a record, storage for an entire record may take less than one block, resulting in an inefficient use of disk space.

So, if you can see, storing records in row-wise fashion isn't storage space optimized.


Column-wise storage advantages:


  • Each data block holds column field values for as many as three times as many records as row-based storage.
  • Requires a third of the I/O operations compared to row-wise storage. In practice, using tables with very large numbers of columns and very large row counts, storage efficiency is even greater.
  • Since each block holds the same type of data, it can use a compression scheme selected specifically for the column data type, further reducing disk space and I/O.
Hence, by using a columnar storage architecture Redshift has reduced the disk space being used to store same data resulting in reduced number of seeks it would be required to read the stored data in order to process an user query, ultimately increasing the query performance on large datasets.

Redshift achieves high performance over queries on very huge amount of data. Along with the columnar architecture there are other features Redshift has added in order to achieve this performance, they are:

  • Massively parallel processing
  • Data compression
  • Query optimization
  • Compiled code

I will take these up in detail in one of future posts. Try out Redshift if you need to analyze your data using some of the popular BI tools like Tableau and Jaspersoft. If your dataset size is huge and you are struggling to get good query performace, then I would certainly suggest you to try out Redshift once. Read more about it here.

Disclaimer: Many details including diagrams were taken from official AWS documentations.

36 comments:

  1. Thanks for sharing this concept in here. This was very useful to me and other too, :)

    Informatica training in chennai

    ReplyDelete
  2. Very useful information about the data warehousing thanks for sharing this information.
    packers and movers madhapur

    ReplyDelete
  3. Nice post. Big data is a term that portrays the substantial volume of information; both organized and unstructured that immerses a business on an everyday premise. To know more details please visit Big Data Training in Chennai | Primavera Training in Chennai |

    ReplyDelete
  4. Nice blog.Thanks for the information.Informatica is a data integration/ETL tool that provides functionality for data transformation and loading of data. Informatica gets data from various sources and it loads the data into different targets. Informatica does not store, it just moves data from one place to another.

    Informatica training in chennai

    ReplyDelete
  5. awful piece of information, I had come to know about your blog from my friend vimal, mumbai,i have read atleast 3 posts of yours by now, and let me tell you, your blog gives the best and the most interesting information. This is just the kind of information that i had been looking for, i'm already your rss reader now and i would regularly watch out for the new posts, once again hats off to you! Thanks a million once again, RegardsGuidewire training in hyderabad

    ReplyDelete
  6. Informatica is one of the most trusted and effective ETL software which is known globally. Informatica is one data warehousing ETL tool. Informatica ha very go opportuities for job across the world.
    informatica training in chennai | informatica training institute in Chennai | informatica classes in Chennai

    ReplyDelete
  7. How to write stored procedure in redshift. Is there any alternative.

    ReplyDelete
  8. How to write stored procedure in redshift. Is there any alternative.

    ReplyDelete
  9. excellent piece of information, and let me tell you, your site gives the best and the most interesting information. This is just the kind of information that i had been looking for, i'm already your rss reader now and i would regularly watch out for the new posts, once again hats off to you! Thanx a lot once again, Regards, devops training in hyderabad

    ReplyDelete
  10. Much obliged to you for requiring significant investment to give us a portion of the valuable and restrictive data with us.
    Thanks,
    Informatica Training center in Chennai | Best Informatica Training Institute in Chennai | Informatica Training in Chennai

    ReplyDelete
  11. Very good write-up. I definitely appreciate this website. Continue the good work!
    Devops Online Training
    Adobe cq5 Training
    Dell Boomi Training

    ReplyDelete
  12. The blog gave me idea about columnar data storage Thanks for sharing the information
    Hadoop Training in Chennai

    ReplyDelete
  13. You have done really great job. Your blog is very unique and informative. Thanks. Devops Training | Data Science Training

    ReplyDelete
  14. I love to read your articles because your writing style is too good
    Selenium Training in Anna Nagar | Selenium Training

    ReplyDelete
  15. This blog explains the details of most popular technological details. This helps to learn about what are all the different method is there. And the working methods all of that are explained here. Informative blog.For more details about oracle fusion financial please check our website.

    Oracle Fusion Financial Training Institute

    ReplyDelete
  16. The blog gives creative idea about columnar data storage. Thanks for sharing this information.

    Mobile Training in Chennai |Android Training in Chennai | ios Training in Chennai

    ReplyDelete

  17. This content is so informatics and it was motivating all the programmers and beginners to switch over the career into the Big Data Technology. This article is so impressed and keeps updating us regularly.
    Hadoop Training in Chennai | Hadoop Training Chennai | Big Data Training in Chennai

    ReplyDelete
  18. There are lots of information about hadoop have spread around the web, but this is a unique one according to me. The strategy you have updated here will make me to get to the next level in big data. Thanks for sharing this.Hadoop Training in Chennai | Big Data Training in Chennai

    ReplyDelete


  19. I think this is interesting articles and Business ethics for new information's, and i like that kind of information.So the i like that post,because all of given information was very excellent.devops openings in hyderabad

    ReplyDelete
  20. Thank you for explaining this in detail. Now I have a much better idea about the Benefits of a columnar data storage.Can refer you some of the which are related to etl testing.

    ReplyDelete
  21. Excellent Blog very imperative good content, this article is useful to beginners and real time Employees. DevOps Online Training

    ReplyDelete
  22. Good one, very informative.. thanks for sharing your views and ideas.. it is very useful to me.. thanks once again...

    Java Training in chennai |
    Software Testing Training in chennai

    ReplyDelete
  23. Thank you for your nice post. It is really useful..The best thing is that your blog really informative thanks for your great information!
    ITIL Certification Exam Center in Chennai | ITIL Certification Training in Chennai | Online Training in Chennai

    ReplyDelete
  24. Thanks For Your valuable posting, it was very informative
    Schulungsangebote

    ReplyDelete

Any feedback, good or bad is most welcome.

Name

Email *

Message *