DBMS: Relational Algebra

3 분 소요

INTRO 🙌

저번 시간에는 Relational Algebra Operations 에 대하여 알아보았다.

이번 시간에는 Relational Algebra가 무엇인지 알아보자.

RA란?

다섯 개의 기본 연산자; 기본 연산자로부터 파생물 多
쿼리 생성을 위해 연산자 조합
- RA expressions, 주로 트리 형태

Complex Expressions (복잡한 표현식)

RA는 대수학과 비슷하게 동작
- 가령 $(x + 4)*(y - 3)$
3가지 notations (just as in arithmetic):
1. Sequences of assignment statements
2. Expressions with several operators
3. Expression trees

1) Sequences of Assignment

Theta-Join: $R3\ :=\ R1\ JOIN_C\ R2$

\[R4:= R1 * R2 \\ R3 :=\ SELECT_C\ (R4)\]

임시 relation 이름 생성 $R4$
Renaming can be implied by giving relations a list of attributes

2) Expressions with Several Operators

theta-join: $R3\ :=\ R1\ JOIN_C\ R2$.

\[R3\ :=\ SELECT_C\ (R1 * R2)\]

관계 연산자(relational operators) 우선 순위:
1. Unary operators (i.e., select, project, rename)
2. Products and joins
3. Intersection
4. Union and set difference

괄호() 사용하여 항상 우선 순위를 변경 가능

3) Expression Trees

가지(Leaves) = operands
- 관계 변수 (variables standing for relations)
- 특정 관계 (particular, constant relations)
Interior nodes = operators
- applied to their child or children.

응용

“Maple St.” 이름을 가졌거나 “Bud” 가격이 $3 보다 작은 모든 bars 이름 가져오기

\[Bars(name, addr)\] \[Sells(bar, beer, price)\]

서로 다른 두 개의 맥주를 같은 가격에 파는 bars 가져오기

\[Sells(bar, beer, price)\]

Renaming을 통해 Sells의 복사본 $S(bar, beer1, price)$을 정의하는 모습이다.

bar, price 열 기준으로 자연조인 –> beer, beer1만 cross-product 진행 후 중복 제거

이후, Sells과 S을 Natural Join으로 묶어, 주어진 조건을 만족하는 $quadruples(bar, beer, beer1, price)$를 생성한다.

Complex Queries (복잡한 쿼리)

    Product ( pid, name,  price, category, maker-cid)
    Purchase (buyer-ssn,  seller-ssn,  store,  pid)
    Company (cid, name, stock price, country)
    Person(ssn, name, phone number, city)

Purchase
- buyer-ssn, seller-ssn: foreign keys in Person
- pid: foreign key in Product
Product
- maker-cid: a foreign key in Company

하기 예시들을 통해 응용 이해를 도모하기를 바란다.

Ex #0: Find phone numbers of people who bought gizmos from Fred.

Ex #1: Find people who bought telephony products.

Ex #2: Find names of people who bought American products

Ex #3: Find telephony products that somebody bought

Ex #4: Find names of people who bought American products and did not buy French products

Ex #5: Find names of people who bought American products and they live in Champaign.

Ex #6: Find people who bought stuff from Joe or bought products from a company whose stock prices is more than $$50.

Operations on Bags

Union: $\{a,b,b,c\}\ U\ \{a,b,b,b,e,f,f\}\ =\ \{a,a,b,b,b,b,b,c,e,f,f\}$.
- add the number of occurrences
Difference: $\{a,b,b,b,c,c\}\ –\ \{b,c,c,c,d\}\ =\ \{a,b,b,d\}$.
- subtract the number of occurrences
Intersection: $\{a,b,b,b,c,c\}\ \cap\ \{b,b,c,c,c,c,d\}\ =\ \{b,b,c,c\}$.
- minimum of the two numbers of occurrences
Selection: 특정 요소 선택; preserve the number of occurrences (중복 포함)
Projection: 특정 요소 선택; preserve the number of occurrences (중복 제거)
Cartesian product, join: 중복 포함; no duplicate elimination

상기 개념들이 익숙하지 않다면

이전 글 참조

혹은 Textbook 참조 요망

Glimpse Ahead

Efficient Implementations of Operators

$\sigma_{s(age >= 30\ AND\ age <= 35)}(Employees)$.

Method 1: scan the file, test each employee
Method 2: use an index on age
Which one is better ? Depends a lot…

$Employees\ \infty\ Relatives$

이중 반복문
- 1) Iterate over Employees, then over Relatives
- 2) Iterate over Relatives, then over Employees
Sort Employees, Relatives, do *merge-join
*hash-join
etc.

MERGE-JOIN

조회의 범위가 많을 때 주로 사용하는 조인 방법
양쪽 테이블을 각각 Access 하여 그 결과를 정렬하고, 그 정렬한 결과를 차례로 Scan 해 나가면서 연결고리의 조건으로 Merge

사용 목적

연결 고리에 인덱스가 전혀 없는 경우
대용량의 자료를 조인할때 유리한 경우
조인 조건으로 <, >, <=, >=와 같은 범위 비교 연산자가 사용된 경우
인덱스 사용에 따른 랜덤 액세스의 오버헤드가 많은 경우

HASH-JOIN

조인될 두 테이블 중 하나를 해시 테이블로 선정
조인될 테이블의 조인 키 값을 해시 알고리즘으로 비교하여 매치되는 결과값을 얻는 방식

사용 목적

JOIN 컬럼에 적당한 인덱스가 없어 NL JOIN이 비효율적일 때
JOIN Access량이 많아 Random Access 부하가 심하여 NL JOIN이 비효율적일 때
Sort Merge Join을 하기에는 두 테이블이 너무 커 Sort 부하가 심할 때
수행빈도가 낮고 쿼리 수행 시간이 오래 걸리는 대용량 테이블을 JOIN 할 때

Optimizations

    Product ( pid, name,  price, category, maker-cid)
    Purchase (buyer-ssn,  seller-ssn,  store,  pid)
    Person(ssn, name, phone number, city)

\[\sigma_{price>100}(Product)\ \infty\ (Purchase\ \infty\ \sigma_{city=sea}Person)\] \[(\sigma_{price>100}(Product)\ \infty\ Purchase)\ \infty\ \sigma_{city=sea}Person\]

상기 두 가지 중 어떤 선택이 더 나은 선택인지는 상황에 따라 가변적이다.

Optimizer의 역할..

Summary of Relational Algebra

사용 목적

Operator 활용 복잡한 구현 표현 가능 (i.e., $\infty\ or\ \sigma_{C}$).
RA expressions 재활용 가능 $\rightarrow$ 최적화

한계

“transitive closure” 계산 불가능.

Transitive closure

어떤 정점 A에서 C로 가는 직접경로는 없고, 우회경로가 있을 때 A$\rightarrow$C로의 간선을 연결한 그래프

가령, 상기 테이블에서 Fred와 직간접 친족을 찾는 것은 RA 표현 대신, C program을 작성해야 한다.

Reference

Database Management Systems by Raghu Ramakrishnan and Johannes Gehrke

Relational Operators

Twitter Facebook LinkedIn

쭌스🎄

DBMS: Relational Algebra

INTRO 🙌

RA란?

Complex Expressions (복잡한 표현식)

1) Sequences of Assignment

2) Expressions with Several Operators

3) Expression Trees

응용

Complex Queries (복잡한 쿼리)

Operations on Bags

Glimpse Ahead

Efficient Implementations of Operators

MERGE-JOIN

사용 목적

HASH-JOIN

사용 목적

Optimizations

Summary of Relational Algebra

사용 목적

한계

Reference

공유하기

댓글남기기

참고

2024.10.02
Evaluating on Image Hallucination for TTI Generative Models in I-HallA via PaliGemma

2023.12.04
[논문분석] Saliency as Pseudo-Pixel Supervision for Weakly and Semi-Supervised Semantic Segmentation (PAMI 2023)

2023.12.03
[논문분석] Segment Anything (ICCV 2023)

2023.12.03
[논문분석] Learning Transferable Visual Models From Natural Language Supervision (ICMR 2021)

2023.12.03
[논문분석] Learning Pixel-level Semantic Affinity with Image-level Supervision for Weakly Supervised Semantic Segmentation (CVPR 2018)

2023.12.01
[논문분석] Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization (ICCV 2017)

2023.12.01
[논문분석] Entropy regularization for weakly supervised object localization (PRL 2023)

2023.11.29
[논문분석] Railroad is not a Train: Saliency as Pseudo-pixel Supervision for Weakly Supervised Semantic Segmentation (CVPR 2021)

2023.11.25
[논문분석] Weaker Than You Think: A Critical Look at Weakly Supervised Learning (ACL 2023)

2023.08.03
[논문분석] PETR: Position Embedding Transformation for Multi-View 3D Object Detection (ECCV, 2022)

쭌스🎄

INTRO 🙌

RA란?

Complex Expressions (복잡한 표현식)

1) Sequences of Assignment

2) Expressions with Several Operators

3) Expression Trees

응용

Complex Queries (복잡한 쿼리)

Operations on Bags

Glimpse Ahead

Efficient Implementations of Operators

MERGE-JOIN

사용 목적

HASH-JOIN

사용 목적

Optimizations

Summary of Relational Algebra

사용 목적

한계

Reference

공유하기

댓글남기기

참고

2024.10.02 Evaluating on Image Hallucination for TTI Generative Models in I-HallA via PaliGemma

2023.12.04 [논문분석] Saliency as Pseudo-Pixel Supervision for Weakly and Semi-Supervised Semantic Segmentation (PAMI 2023)

2023.12.03 [논문분석] Segment Anything (ICCV 2023)

2023.12.03 [논문분석] Learning Transferable Visual Models From Natural Language Supervision (ICMR 2021)

2023.12.03 [논문분석] Learning Pixel-level Semantic Affinity with Image-level Supervision for Weakly Supervised Semantic Segmentation (CVPR 2018)

2023.12.01 [논문분석] Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization (ICCV 2017)

2023.12.01 [논문분석] Entropy regularization for weakly supervised object localization (PRL 2023)

2023.11.29 [논문분석] Railroad is not a Train: Saliency as Pseudo-pixel Supervision for Weakly Supervised Semantic Segmentation (CVPR 2021)

2023.11.25 [논문분석] Weaker Than You Think: A Critical Look at Weakly Supervised Learning (ACL 2023)

2023.08.03 [논문분석] PETR: Position Embedding Transformation for Multi-View 3D Object Detection (ECCV, 2022)

2024.10.02
Evaluating on Image Hallucination for TTI Generative Models in I-HallA via PaliGemma

2023.12.04
[논문분석] Saliency as Pseudo-Pixel Supervision for Weakly and Semi-Supervised Semantic Segmentation (PAMI 2023)

2023.12.03
[논문분석] Segment Anything (ICCV 2023)

2023.12.03
[논문분석] Learning Transferable Visual Models From Natural Language Supervision (ICMR 2021)

2023.12.03
[논문분석] Learning Pixel-level Semantic Affinity with Image-level Supervision for Weakly Supervised Semantic Segmentation (CVPR 2018)

2023.12.01
[논문분석] Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization (ICCV 2017)

2023.12.01
[논문분석] Entropy regularization for weakly supervised object localization (PRL 2023)

2023.11.29
[논문분석] Railroad is not a Train: Saliency as Pseudo-pixel Supervision for Weakly Supervised Semantic Segmentation (CVPR 2021)

2023.11.25
[논문분석] Weaker Than You Think: A Critical Look at Weakly Supervised Learning (ACL 2023)

2023.08.03
[논문분석] PETR: Position Embedding Transformation for Multi-View 3D Object Detection (ECCV, 2022)