Row-based and Column-based
Row store and column store are two ways to store data in a database management system (DBMS). Each has its own strengths and weaknesses, and understanding both makes it easier to see why a given database suits some workloads better than others.
Row-based storage
First, consider the following data:
id | username | tel |
---|---|---|
1 | nam.hoang | 0233444555 |
2 | minhit.96 | 0111222333 |
In row-based storage, this data looks like this:
1: nam.hoang, 0233444555
2: minhit.96, 0111222333
As we can see, each record is stored as a single row, with all of its columns kept together. This is known as row-based storage, the model used by most relational database management systems (RDBMS).
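As a minimal sketch of this layout in Python, each record can be modeled as one tuple stored under its id. The `rows` dictionary and the `get_record` helper are hypothetical names used only for illustration, not the API of any real DBMS:

```python
# Row-based layout: every record is kept together as one unit.
# `rows` and `get_record` are illustrative names, not a real database API.
rows = {
    1: ("nam.hoang", "0233444555"),
    2: ("minhit.96", "0111222333"),
}


def get_record(record_id):
    """Return all columns of one record with a single lookup."""
    username, tel = rows[record_id]
    return {"id": record_id, "username": username, "tel": tel}


print(get_record(1))
```

One lookup by id returns the whole record, which is why this layout fits workloads that read or write complete rows.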
Column-based storage
In contrast, the same data in column-based storage looks like this:
nam.hoang:1, minhit.96:2
0233444555:1, 0111222333:2
In this layout, the values of each column are stored together, and each value is associated with the key (here, the id) of the row it belongs to.
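The same idea can be sketched in Python, mirroring the value-and-id pairs shown above. The `columns` dictionary and the `read_column` and `rebuild_record` helpers are hypothetical names for illustration only; real column stores use far more compact encodings:

```python
# Column-based layout: each column is stored on its own,
# as a list of (value, row id) pairs.
# All names here are illustrative, not a real database API.
columns = {
    "username": [("nam.hoang", 1), ("minhit.96", 2)],
    "tel": [("0233444555", 1), ("0111222333", 2)],
}


def read_column(name):
    """Read one column without touching any other column."""
    return [value for value, _row_id in columns[name]]


def rebuild_record(record_id):
    """Reassemble a full record by visiting every column."""
    record = {"id": record_id}
    for name, entries in columns.items():
        for value, row_id in entries:
            if row_id == record_id:
                record[name] = value
                break
    return record


print(read_column("username"))
print(rebuild_record(2))
```

Reading a single column touches only that column's list, while rebuilding a full record has to visit every column.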
Advantages and Disadvantages
Row-based and column-based storage are two distinct approaches to data storage and retrieval, each with its own advantages and disadvantages.
Row-based storage
Pros
- ✔ Suitable for OLTP applications (applications focused on transaction management).
- ✔ Easy to approach and understand.
- ✔ Suitable for retrieving all columns in a row.
Cons
- ❌ Queries that filter on columns without an index are slow, because every row has to be scanned.
- ❌ All columns of a row are read even if only a subset of columns is needed.
- ❌ When a new column is added to a row-based table, it is added to every row in the table, so the operation can be slow if there are many rows.
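The cost of adding a column can be sketched with the same toy row layout. This is a simplified model under the assumption that every record is physically rewritten; `add_column` is a hypothetical helper, and real engines may defer or avoid this rewrite:

```python
# Toy row layout: adding a column means rewriting every stored record.
# `rows` and `add_column` are illustrative names, not a real database API.
rows = {
    1: ("nam.hoang", "0233444555"),
    2: ("minhit.96", "0111222333"),
}


def add_column(table, default=None):
    """Append a new column value to every record.

    Returns the number of records that had to be rewritten.
    """
    rewritten = 0
    for record_id in list(table):
        table[record_id] = table[record_id] + (default,)
        rewritten += 1
    return rewritten


print(add_column(rows, default=""))
print(rows[1])
```

The number of rewritten records grows with the table, which is the slowdown described in the last point above.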
Column-based storage
Pros
- ✔ Suitable for OLAP applications (online analytical processing: applications that store and analyze rapidly growing data, such as data warehouses).
- ✔ Retrieving a small set of columns is much cheaper than retrieving all columns, because only the requested columns are read.
Cons
- ❌ Retrieving all columns of a record takes a significant amount of time, because the record has to be reassembled from every column.
- ❌ To ensure query performance, the table structure needs to be predefined.
Applicability
You should use row store in the following cases:
- 📝 When you need to focus on insert/update speed or require a balance between insert/update and select operations.
- 🚫 When fast querying on large datasets is not necessary.
On the other hand, you should use column store in these cases:
- 📊 When querying large datasets. Imagine a table with hundreds of columns where you only need a few of them.
- ⚡ When you need high select and aggregation performance.
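To make the aggregation case concrete, the sketch below sums one column in both layouts and counts how many values each layout has to read. The `orders` column and its numbers are made up for illustration, the function names are hypothetical, and the column store is simplified to position-aligned lists:

```python
# The example table in both layouts, extended with a made-up
# numeric `orders` column so there is something to aggregate.
row_store = [
    (1, "nam.hoang", "0233444555", 3),
    (2, "minhit.96", "0111222333", 4),
]
column_store = {
    "id": [1, 2],
    "username": ["nam.hoang", "minhit.96"],
    "tel": ["0233444555", "0111222333"],
    "orders": [3, 4],
}


def sum_orders_row_store(table):
    """Sum `orders` record by record; returns (total, values_read)."""
    total = 0
    values_read = 0
    for record in table:
        values_read += len(record)  # the whole record is read
        total += record[3]          # but only `orders` is needed
    return total, values_read


def sum_orders_column_store(table):
    """Sum `orders` from its own column; returns (total, values_read)."""
    values = table["orders"]
    return sum(values), len(values)


print(sum_orders_row_store(row_store))        # (7, 8)
print(sum_orders_column_store(column_store))  # (7, 2)
```

Both layouts produce the same total, but the row layout reads every value of every record while the column layout reads only the one column. The gap widens as the table gains more columns.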
Conclusion
Row-based and column-based storage each serve distinct purposes, and choosing the right one depends on your workload and your data. Understanding the strengths of each approach helps you optimize database performance and process data efficiently for different use cases. 🚀