3 분 소요

INTRO 🙌

저번 시간에는 Sorting & Relational Operations 에 대하여 알아보았다.

이번 시간에는 Evaluation of Relational Operations: Other Techniques에 대해 알아보자.


Selection Operation (Simple Selection)

  • Select (R.attr <op> value) R (i.e., \(age = 20\)).

No index, unsorted (file scan)

  • must scan whole table

No Index, Sorted Data (sorted-file scan)

기말 출제 X

\[O(log_2M)+(scan\ cost)\]
  • utilize the sort order through a binary search (\(O(log_2M)\)).
    • locate the first tuple that satisfies the selection condition
    • retrieve all tuples that satisfy the selection condition by scanning R until the selection condition is no longer satisfied.
  • 한계:
    • 현실에서 정렬된 데이터 보장 X
  • 해결:
    • Alternative (1):
      • allow data records to be stored as index data entries.
    • If the ordering of data records is important \(\rightarrow\) B+ tree index that uses Alternative (1).

B+ Tree Index

  • STEP 1: locate the first index entry that points to a qualifying tuple of R
    • Cost: 2 or 3 I/Os
  • STEP 2: Scan the leaf pages to retrieve all entries in which the key value satisfies the selection condition
    • Cost: Depends on the number of qualifying tuples, the employed alternative and whether the index is clustered

기존 index보다 더 좋은 성능의 index 구현 가능하다.

  • 기존 index: 파일이 커질수록 많은 overflow block 발생 \(\rightarrow\) 성능 하락 \(\rightarrow\) 주기적 재구성할 필요
  • B+ Tree: 삽입/삭제 시 자동으로 구조 유지 \(\rightarrow\) 주기적 재구성할 필요 X.

B+ Tree Index 기말에서 이것만 알면 됨

Btree: good for \(>=, <=\)

Hash Index (Best)

  • hashing을 통해 데이터 위치 index 저장
    • hashing 함수 = key 값을 일정한 범위의 수로 변환
  • Cost = a few (typically one or two) l/Os

Hash Index 기말에서 이것만 알면 됨

hash: good for \(=\)


Projection Operation 🍜

        SELECT DISTINCT R.sid, R.bid
        FROM Reserves R

1) Sort-based Projection

An approach based on sorting:

  1. Pass 0 (external sort): eliminate unwanted fields.
  2. Eliminate duplicates.

  3. only need sid and bid
  4. external sorting
  5. Pass 0: remove unwanted fields (except for sid and bid).
  6. if duplicates in merging \(\rightarrow\) discard them

2) Hash-based Projection

  1. only need sid and bid.
  2. external sorting
  3. \(h1()\): generate \(B-1\) partitions (each partition = small enough to fit in memory)
  4. remove unwatned fields
  5. \(h2\ (<>\ h1)\) (if partition is still too big \(\rightarrow\) go to 2 (recursive)).
  6. remove duplicates in each partition (duplicates already in the hash table); 서로 다른 partition 간 duplicate 존재 X (hashing 거쳤기 때문)
  7. write out all tuples in the hash table \(\rightarrow\) no duplicates

Purpose of Hash table : keep track of which tuple is on focus


Set Operations 🍘

Intersection and cross-product

  • Special cases of join.

Union (Distinct) and Except

  • Union \(\Leftrightarrow\) Except.
  • removes duplicates within the same partition
    • put in memory (hash table)
    • if already exist \(\rightarrow\) duplicates.

Sorting based approach to union:

        SELECT DISTINCT R.sid, R.bid
        FROM Reserves R
  • Sort both relations (on combination of all attributes).
  • Scan sorted relations and merge them.
  • Alternative: Merge runs from Pass 0 for both relations.

Hash based approach to union:

  • Partition \(R\) and \(S\) using hash function \(h()\).
  • For each S-partition:
    • build in-memory hash table (using h2)
    • scan corresponding R-partition
    • add tuples to table while discarding duplicates.

기말출제!! ★★★★★★

Sort-based intersection to union 어떻게 해?

Hash-based intersection to union 어떻게 해?


Aggregate Operations (AVG, MIN, etc.) 🥚

  • Expensive!! (\(\hookleftarrow\) must touch all tuples)

index-only scan:

SELECT 칼럼 및 WHERE 조건을 모두 포함하는 인덱스 기준 scan; avoid looking at the table data entirely

Without grouping:

    SELECT max(sal)
    FROM Emp
  • In general, requires scanning the whole relation.
  • Given index whose search key includes all attributes in the \(SELECT\) or \(WHERE\) clauses, can do index-only scan.

With grouping:

    SELECT max(sal), qty
    FROM Emp
    GROUP BY qty
  • Sort on group-by attributes, then scan relation and compute aggregate for each group.
    • Can improve upon this by combining sorting and aggregate computation
      • Sorting 과정에서 mean 같은 수치 구할 수 있음.
  • Similar approach based on hashing on group-by attributes.
  • Given tree index whose search key includes all attributes in \(SELECT\), \(WHERE\) and \(GROUP\ BY\) clauses, can do index-only scan
    • if group-by attributes form prefix of search key, can retrieve data entries/tuples in group-by order.

Summary 👌

  • A virtue of relational DBMSs:
    • queries are composed of a few basic operators; the implementation of these operators can be carefully tuned (and it is important to do this!).
  • Many alternative implementation techniques for each operator
    • no universally superior technique for most operators.
  • Must consider available alternatives for each operation in a query and choose best one based on system statistics, etc.
    • This is part of the broader task of optimizing a query composed of several ops.

Reference

Database Management Systems by Raghu Ramakrishnan and Johannes Gehrke

Relational Operators

댓글남기기