DBMS: Evaluation of Relational Operations

3 분 소요

INTRO 🙌

저번 시간에는 Sorting & Relational Operations 에 대하여 알아보았다.

이번 시간에는 Evaluation of Relational Operations: Other Techniques에 대해 알아보자.

Selection Operation (Simple Selection)

Select (R.attr <op> value) R (i.e., \(age = 20\)).

No index, unsorted (file scan)

must scan whole table

No Index, Sorted Data (sorted-file scan)

기말 출제 X

\[O(log_2M)+(scan\ cost)\]

utilize the sort order through a binary search (\(O(log_2M)\)).
- locate the first tuple that satisfies the selection condition
- retrieve all tuples that satisfy the selection condition by scanning R until the selection condition is no longer satisfied.
한계:
- 현실에서 정렬된 데이터 보장 X
해결:
- Alternative (1):
  - allow data records to be stored as index data entries.
- If the ordering of data records is important \(\rightarrow\) B+ tree index that uses Alternative (1).

B+ Tree Index

STEP 1: locate the first index entry that points to a qualifying tuple of R
- Cost: 2 or 3 I/Os
STEP 2: Scan the leaf pages to retrieve all entries in which the key value satisfies the selection condition
- Cost: Depends on the number of qualifying tuples, the employed alternative and whether the index is clustered

기존 index보다 더 좋은 성능의 index 구현 가능하다.

기존 index: 파일이 커질수록 많은 overflow block 발생 \(\rightarrow\) 성능 하락 \(\rightarrow\) 주기적 재구성할 필요
B+ Tree: 삽입/삭제 시 자동으로 구조 유지 \(\rightarrow\) 주기적 재구성할 필요 X.

B+ Tree Index 기말에서 이것만 알면 됨

Btree: good for \(>=, <=\)

Hash Index (Best)

hashing을 통해 데이터 위치 index 저장
- hashing 함수 = key 값을 일정한 범위의 수로 변환
Cost = a few (typically one or two) l/Os

Hash Index 기말에서 이것만 알면 됨

hash: good for \(=\)

Projection Operation 🍜

        SELECT DISTINCT R.sid, R.bid
        FROM Reserves R

1) Sort-based Projection

An approach based on sorting:

Pass 0 (external sort): eliminate unwanted fields.
Eliminate duplicates.
only need sid and bid
external sorting
Pass 0: remove unwanted fields (except for sid and bid).
if duplicates in merging \(\rightarrow\) discard them

2) Hash-based Projection

only need sid and bid.
external sorting
\(h1()\): generate \(B-1\) partitions (each partition = small enough to fit in memory)
remove unwatned fields
\(h2\ (<>\ h1)\) (if partition is still too big \(\rightarrow\) go to 2 (recursive)).
remove duplicates in each partition (duplicates already in the hash table); 서로 다른 partition 간 duplicate 존재 X (hashing 거쳤기 때문)
write out all tuples in the hash table \(\rightarrow\) no duplicates

Purpose of Hash table : keep track of which tuple is on focus

Set Operations 🍘

Intersection and cross-product

Special cases of join.

Union (Distinct) and Except

Union \(\Leftrightarrow\) Except.
removes duplicates within the same partition
- put in memory (hash table)
- if already exist \(\rightarrow\) duplicates.

Sorting based approach to union:

        SELECT DISTINCT R.sid, R.bid
        FROM Reserves R

Sort both relations (on combination of all attributes).
Scan sorted relations and merge them.
Alternative: Merge runs from Pass 0 for both relations.

Hash based approach to union:

Partition \(R\) and \(S\) using hash function \(h()\).
For each S-partition:
- build in-memory hash table (using h2)
- scan corresponding R-partition
- add tuples to table while discarding duplicates.

기말출제!! ★★★★★★

Sort-based intersection to union 어떻게 해?

Hash-based intersection to union 어떻게 해?

Aggregate Operations (AVG, MIN, etc.) 🥚

Expensive!! (\(\hookleftarrow\) must touch all tuples)

index-only scan:

SELECT 칼럼 및 WHERE 조건을 모두 포함하는 인덱스 기준 scan; avoid looking at the table data entirely

Without grouping:

    SELECT max(sal)
    FROM Emp

In general, requires scanning the whole relation.
Given index whose search key includes all attributes in the \(SELECT\) or \(WHERE\) clauses, can do index-only scan.

With grouping:

    SELECT max(sal), qty
    FROM Emp
    GROUP BY qty

Sort on group-by attributes, then scan relation and compute aggregate for each group.
- Can improve upon this by combining sorting and aggregate computation
  - Sorting 과정에서 mean 같은 수치 구할 수 있음.
Similar approach based on hashing on group-by attributes.
Given tree index whose search key includes all attributes in \(SELECT\), \(WHERE\) and \(GROUP\ BY\) clauses, can do index-only scan
- if group-by attributes form prefix of search key, can retrieve data entries/tuples in group-by order.

Summary 👌

A virtue of relational DBMSs:
- queries are composed of a few basic operators; the implementation of these operators can be carefully tuned (and it is important to do this!).
Many alternative implementation techniques for each operator
- no universally superior technique for most operators.
Must consider available alternatives for each operation in a query and choose best one based on system statistics, etc.
- This is part of the broader task of optimizing a query composed of several ops.

Reference

Database Management Systems by Raghu Ramakrishnan and Johannes Gehrke

Relational Operators

Twitter Facebook LinkedIn

쭌스🎄

DBMS: Evaluation of Relational Operations

INTRO 🙌

Selection Operation (Simple Selection)

No index, unsorted (file scan)

No Index, Sorted Data (sorted-file scan)

B+ Tree Index

Hash Index (Best)

Projection Operation 🍜

1) Sort-based Projection

2) Hash-based Projection

Set Operations 🍘

Intersection and cross-product

Union (Distinct) and Except

Sorting based approach to union:

Hash based approach to union:

Aggregate Operations (AVG, MIN, etc.) 🥚

Without grouping:

With grouping:

Summary 👌

Reference

공유하기

댓글남기기

참고

2024.10.02
Evaluating on Image Hallucination for TTI Generative Models in I-HallA via PaliGemma

2023.12.04
[논문분석] Saliency as Pseudo-Pixel Supervision for Weakly and Semi-Supervised Semantic Segmentation (PAMI 2023)

2023.12.03
[논문분석] Segment Anything (ICCV 2023)

2023.12.03
[논문분석] Learning Transferable Visual Models From Natural Language Supervision (ICMR 2021)

2023.12.03
[논문분석] Learning Pixel-level Semantic Affinity with Image-level Supervision for Weakly Supervised Semantic Segmentation (CVPR 2018)

2023.12.01
[논문분석] Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization (ICCV 2017)

2023.12.01
[논문분석] Entropy regularization for weakly supervised object localization (PRL 2023)

2023.11.29
[논문분석] Railroad is not a Train: Saliency as Pseudo-pixel Supervision for Weakly Supervised Semantic Segmentation (CVPR 2021)

2023.11.25
[논문분석] Weaker Than You Think: A Critical Look at Weakly Supervised Learning (ACL 2023)

2023.08.03
[논문분석] PETR: Position Embedding Transformation for Multi-View 3D Object Detection (ECCV, 2022)

쭌스🎄

INTRO 🙌

Selection Operation (Simple Selection)

No index, unsorted (file scan)

No Index, Sorted Data (sorted-file scan)

B+ Tree Index

Hash Index (Best)

Projection Operation 🍜

1) Sort-based Projection

2) Hash-based Projection

Set Operations 🍘

Intersection and cross-product

Union (Distinct) and Except

Sorting based approach to union:

Hash based approach to union:

Aggregate Operations (AVG, MIN, etc.) 🥚

Without grouping:

With grouping:

Summary 👌

Reference

공유하기

댓글남기기

참고

2024.10.02 Evaluating on Image Hallucination for TTI Generative Models in I-HallA via PaliGemma

2023.12.04 [논문분석] Saliency as Pseudo-Pixel Supervision for Weakly and Semi-Supervised Semantic Segmentation (PAMI 2023)

2023.12.03 [논문분석] Segment Anything (ICCV 2023)

2023.12.03 [논문분석] Learning Transferable Visual Models From Natural Language Supervision (ICMR 2021)

2023.12.03 [논문분석] Learning Pixel-level Semantic Affinity with Image-level Supervision for Weakly Supervised Semantic Segmentation (CVPR 2018)

2023.12.01 [논문분석] Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization (ICCV 2017)

2023.12.01 [논문분석] Entropy regularization for weakly supervised object localization (PRL 2023)

2023.11.29 [논문분석] Railroad is not a Train: Saliency as Pseudo-pixel Supervision for Weakly Supervised Semantic Segmentation (CVPR 2021)

2023.11.25 [논문분석] Weaker Than You Think: A Critical Look at Weakly Supervised Learning (ACL 2023)

2023.08.03 [논문분석] PETR: Position Embedding Transformation for Multi-View 3D Object Detection (ECCV, 2022)

2024.10.02
Evaluating on Image Hallucination for TTI Generative Models in I-HallA via PaliGemma

2023.12.04
[논문분석] Saliency as Pseudo-Pixel Supervision for Weakly and Semi-Supervised Semantic Segmentation (PAMI 2023)

2023.12.03
[논문분석] Segment Anything (ICCV 2023)

2023.12.03
[논문분석] Learning Transferable Visual Models From Natural Language Supervision (ICMR 2021)

2023.12.03
[논문분석] Learning Pixel-level Semantic Affinity with Image-level Supervision for Weakly Supervised Semantic Segmentation (CVPR 2018)

2023.12.01
[논문분석] Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization (ICCV 2017)

2023.12.01
[논문분석] Entropy regularization for weakly supervised object localization (PRL 2023)

2023.11.29
[논문분석] Railroad is not a Train: Saliency as Pseudo-pixel Supervision for Weakly Supervised Semantic Segmentation (CVPR 2021)

2023.11.25
[논문분석] Weaker Than You Think: A Critical Look at Weakly Supervised Learning (ACL 2023)

2023.08.03
[논문분석] PETR: Position Embedding Transformation for Multi-View 3D Object Detection (ECCV, 2022)