DBMS: Evaluation of Relational Operations
INTRO 🙌
저번 시간에는 Sorting & Relational Operations 에 대하여 알아보았다.
이번 시간에는 Evaluation of Relational Operations: Other Techniques에 대해 알아보자.
Selection Operation (Simple Selection)
- Select (R.attr
<op>
value) R (i.e., \(age = 20\)).
No index, unsorted (file scan)
- must scan whole table
No Index, Sorted Data (sorted-file scan)
\[O(log_2M)+(scan\ cost)\]기말 출제 X
- utilize the sort order through a binary search (\(O(log_2M)\)).
- locate the first tuple that satisfies the selection condition
- retrieve all tuples that satisfy the selection condition by scanning R until the selection condition is no longer satisfied.
- 한계:
- 현실에서 정렬된 데이터 보장 X
- 해결:
Alternative (1
):- allow data records to be stored as index data entries.
- If the ordering of data records is important \(\rightarrow\) B+ tree index that uses
Alternative (1)
.
B+ Tree Index
STEP 1
: locate the first index entry that points to a qualifying tuple of R- Cost: 2 or 3 I/Os
STEP 2
: Scan the leaf pages to retrieve all entries in which the key value satisfies the selection condition- Cost: Depends on the number of qualifying tuples, the employed alternative and whether the index is clustered
기존 index보다 더 좋은 성능의 index 구현 가능하다.
- 기존 index: 파일이 커질수록 많은 overflow block 발생 \(\rightarrow\) 성능 하락 \(\rightarrow\) 주기적 재구성할 필요
- B+ Tree: 삽입/삭제 시 자동으로 구조 유지 \(\rightarrow\) 주기적 재구성할 필요 X.
B+ Tree Index 기말에서 이것만 알면 됨
Btree: good for \(>=, <=\)
Hash Index (Best)
- hashing을 통해 데이터 위치 index 저장
- hashing 함수 = key 값을 일정한 범위의 수로 변환
- Cost = a few (typically one or two) l/Os
Hash Index 기말에서 이것만 알면 됨
hash: good for \(=\)
Projection Operation 🍜
SELECT DISTINCT R.sid, R.bid
FROM Reserves R
1) Sort-based Projection
An approach based on sorting:
- Pass 0 (external sort): eliminate unwanted fields.
-
Eliminate duplicates.
- only need
sid
andbid
- external sorting
- Pass 0: remove unwanted fields (except for
sid
andbid
). - if duplicates in merging \(\rightarrow\) discard them
2) Hash-based Projection
- only need
sid
andbid
. - external sorting
- \(h1()\): generate \(B-1\) partitions (each partition = small enough to fit in memory)
- remove unwatned fields
- \(h2\ (<>\ h1)\) (if partition is still too big \(\rightarrow\) go to 2 (recursive)).
- remove duplicates in each partition (duplicates already in the hash table); 서로 다른 partition 간 duplicate 존재 X (hashing 거쳤기 때문)
- write out all tuples in the hash table \(\rightarrow\) no duplicates
Purpose of Hash table : keep track of which tuple is on focus
Set Operations 🍘
Intersection and cross-product
- Special cases of join.
Union (Distinct) and Except
- Union \(\Leftrightarrow\) Except.
- removes duplicates within the same partition
- put in memory (hash table)
- if already exist \(\rightarrow\) duplicates.
Sorting based approach to union:
SELECT DISTINCT R.sid, R.bid
FROM Reserves R
- Sort both relations (on combination of all attributes).
- Scan sorted relations and merge them.
- Alternative: Merge runs from Pass 0 for both relations.
Hash based approach to union:
- Partition \(R\) and \(S\) using hash function \(h()\).
- For each S-partition:
- build in-memory hash table (using h2)
- scan corresponding R-partition
- add tuples to table while discarding duplicates.
기말출제!! ★★★★★★
Sort-based intersection to union 어떻게 해?
Hash-based intersection to union 어떻게 해?
Aggregate Operations (AVG, MIN, etc.) 🥚
- Expensive!! (\(\hookleftarrow\) must touch all tuples)
index-only scan:
SELECT 칼럼 및 WHERE 조건을 모두 포함하는 인덱스 기준 scan; avoid looking at the table data entirely
Without grouping:
SELECT max(sal)
FROM Emp
- In general, requires scanning the whole relation.
- Given index whose search key includes all attributes in the \(SELECT\) or \(WHERE\) clauses, can do index-only scan.
With grouping:
SELECT max(sal), qty
FROM Emp
GROUP BY qty
- Sort on group-by attributes, then scan relation and compute aggregate for each group.
- Can improve upon this by combining sorting and aggregate computation
- Sorting 과정에서 mean 같은 수치 구할 수 있음.
- Can improve upon this by combining sorting and aggregate computation
- Similar approach based on hashing on group-by attributes.
- Given tree index whose search key includes all attributes in \(SELECT\), \(WHERE\) and \(GROUP\ BY\) clauses, can do index-only scan
- if group-by attributes form prefix of search key, can retrieve data entries/tuples in group-by order.
Summary 👌
- A virtue of relational DBMSs:
- queries are composed of a few basic operators; the implementation of these operators can be carefully tuned (and it is important to do this!).
- Many alternative implementation techniques for each operator
- no universally superior technique for most operators.
- Must consider available alternatives for each operation in a query and choose best one based on system statistics, etc.
- This is part of the broader task of optimizing a query composed of several ops.
Reference
Database Management Systems by Raghu Ramakrishnan and Johannes Gehrke
댓글남기기