3 분 소요

# INTRO 🙌

저번 시간에는 Sorting & Relational Operations 에 대하여 알아보았다.

이번 시간에는 Evaluation of Relational Operations: Other Techniques에 대해 알아보자.

# Selection Operation (Simple Selection)

• Select (R.attr <op> value) R (i.e., $$age = 20$$).

## No index, unsorted (file scan)

• must scan whole table

## No Index, Sorted Data (sorted-file scan)

기말 출제 X

$O(log_2M)+(scan\ cost)$
• utilize the sort order through a binary search ($$O(log_2M)$$).
• locate the first tuple that satisfies the selection condition
• retrieve all tuples that satisfy the selection condition by scanning R until the selection condition is no longer satisfied.
• 한계:
• 현실에서 정렬된 데이터 보장 X
• 해결:
• Alternative (1):
• allow data records to be stored as index data entries.
• If the ordering of data records is important $$\rightarrow$$ B+ tree index that uses Alternative (1).

## B+ Tree Index

• STEP 1: locate the first index entry that points to a qualifying tuple of R
• Cost: 2 or 3 I/Os
• STEP 2: Scan the leaf pages to retrieve all entries in which the key value satisfies the selection condition
• Cost: Depends on the number of qualifying tuples, the employed alternative and whether the index is clustered

기존 index보다 더 좋은 성능의 index 구현 가능하다.

• 기존 index: 파일이 커질수록 많은 overflow block 발생 $$\rightarrow$$ 성능 하락 $$\rightarrow$$ 주기적 재구성할 필요
• B+ Tree: 삽입/삭제 시 자동으로 구조 유지 $$\rightarrow$$ 주기적 재구성할 필요 X.

B+ Tree Index 기말에서 이것만 알면 됨

Btree: good for $$>=, <=$$

## Hash Index (Best)

• hashing을 통해 데이터 위치 index 저장
• hashing 함수 = key 값을 일정한 범위의 수로 변환
• Cost = a few (typically one or two) l/Os

Hash Index 기말에서 이것만 알면 됨

hash: good for $$=$$

# Projection Operation 🍜

        SELECT DISTINCT R.sid, R.bid
FROM Reserves R


## 1) Sort-based Projection

An approach based on sorting:

1. Pass 0 (external sort): eliminate unwanted fields.
2. Eliminate duplicates.

3. only need sid and bid
4. external sorting
5. Pass 0: remove unwanted fields (except for sid and bid).
6. if duplicates in merging $$\rightarrow$$ discard them

## 2) Hash-based Projection

1. only need sid and bid.
2. external sorting
3. $$h1()$$: generate $$B-1$$ partitions (each partition = small enough to fit in memory)
4. remove unwatned fields
5. $$h2\ (<>\ h1)$$ (if partition is still too big $$\rightarrow$$ go to 2 (recursive)).
6. remove duplicates in each partition (duplicates already in the hash table); 서로 다른 partition 간 duplicate 존재 X (hashing 거쳤기 때문)
7. write out all tuples in the hash table $$\rightarrow$$ no duplicates

Purpose of Hash table : keep track of which tuple is on focus

# Set Operations 🍘

## Intersection and cross-product

• Special cases of join.

## Union (Distinct) and Except

• Union $$\Leftrightarrow$$ Except.
• removes duplicates within the same partition
• put in memory (hash table)
• if already exist $$\rightarrow$$ duplicates.

### Sorting based approach to union:

        SELECT DISTINCT R.sid, R.bid
FROM Reserves R

• Sort both relations (on combination of all attributes).
• Scan sorted relations and merge them.
• Alternative: Merge runs from Pass 0 for both relations.

### Hash based approach to union:

• Partition $$R$$ and $$S$$ using hash function $$h()$$.
• For each S-partition:
• build in-memory hash table (using h2)
• scan corresponding R-partition

기말출제!! ★★★★★★

Sort-based intersection to union 어떻게 해?

Hash-based intersection to union 어떻게 해?

# Aggregate Operations (AVG, MIN, etc.) 🥚

• Expensive!! ($$\hookleftarrow$$ must touch all tuples)

index-only scan:

SELECT 칼럼 및 WHERE 조건을 모두 포함하는 인덱스 기준 scan; avoid looking at the table data entirely

## Without grouping:

    SELECT max(sal)
FROM Emp

• In general, requires scanning the whole relation.
• Given index whose search key includes all attributes in the $$SELECT$$ or $$WHERE$$ clauses, can do index-only scan.

## With grouping:

    SELECT max(sal), qty
FROM Emp
GROUP BY qty

• Sort on group-by attributes, then scan relation and compute aggregate for each group.
• Can improve upon this by combining sorting and aggregate computation
• Sorting 과정에서 mean 같은 수치 구할 수 있음.
• Similar approach based on hashing on group-by attributes.
• Given tree index whose search key includes all attributes in $$SELECT$$, $$WHERE$$ and $$GROUP\ BY$$ clauses, can do index-only scan
• if group-by attributes form prefix of search key, can retrieve data entries/tuples in group-by order.

# Summary 👌

• A virtue of relational DBMSs:
• queries are composed of a few basic operators; the implementation of these operators can be carefully tuned (and it is important to do this!).
• Many alternative implementation techniques for each operator
• no universally superior technique for most operators.
• Must consider available alternatives for each operation in a query and choose best one based on system statistics, etc.
• This is part of the broader task of optimizing a query composed of several ops.

# Reference

Database Management Systems by Raghu Ramakrishnan and Johannes Gehrke

Relational Operators

태그:

카테고리:

업데이트: