[Dev] PART 3: Voice Recognition

2 min read

PART 1: STT(Speech to Text)

!pip install SpeechRecognition
!pip install PyAudio

import speech_recognition as sr

def transform():
    r = sr.Recognizer()

    with sr.Microphone() as source:
        r.pause_threshold = 0.8 # stop recording after 0.8 seconds of silence
        said = r.listen(source) # save the recording into 'said'
        try:
            q = r.recognize_google(said, language="ko") # recognize Korean speech with Google's Web Speech API
            return q
        except sr.UnknownValueError:
            return "무슨 말인지 이해를 잘 못 했어요" # "I couldn't understand what you said"
        except Exception: # any other failure, e.g. sr.RequestError when the API is unreachable
            return "대기중입니다." # "Standing by"

transform() # Start STT

Turn on your microphone and speak. The recognized speech is printed as a text message.
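Because transform() returns an apology string when recognition fails, the caller cannot easily tell a failure from real speech. One possible refinement, sketched here, retries a few times and returns None when nothing was recognized. The names listen_with_retries, recognize_once and max_attempts are hypothetical, and the recognizer is passed in as a plain callable so the sketch runs without a microphone.

```python
from typing import Callable, Optional


def listen_with_retries(
    recognize_once: Callable[[], Optional[str]],
    max_attempts: int = 3,
) -> Optional[str]:
    """Call recognize_once until it yields non-empty text or attempts run out."""
    for _ in range(max_attempts):
        text = recognize_once()
        if text:  # None or "" means this attempt recognized nothing
            return text
    return None


# Stand-in recognizer for illustration only: fails twice, then "hears" a phrase.
_fake_results = iter([None, "", "몇 시"])


def fake_recognizer() -> Optional[str]:
    return next(_fake_results, None)


heard = listen_with_retries(fake_recognizer)
print(heard)
```

In a real run, recognize_once would be a small wrapper around the microphone code above that returns None instead of a message string on failure.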

PART 2: TTS(Text to Speech)

!pip install pyttsx3 # TTS library

import pyttsx3

def speaking(message):
    engine = pyttsx3.init() # start the TTS engine
    engine.say(message) # queue the message in the engine
    engine.runAndWait() # speak the queued message and block until it finishes

speaking("이제 STT가 준비가 되었습니다. 계속 해서 Voice Assistance를 만들어 보겠습니다.") # "STT is now ready. Next, we will build the voice assistant."

The message above is played through your speaker.

TTS Exercise 1: Audio Output (Day/Time)

import datetime

def query_day():
    day = datetime.date.today()
    weekday = day.weekday()
    week_mapping = {
        0: "월요일", # Monday
        1: "화요일", # Tuesday
        2: "수요일", # Wednesday
        3: "목요일", # Thursday
        4: "금요일", # Friday
        5: "토요일", # Saturday
        6: "일요일", # Sunday
    }

    # "Today is <weekday>. Thanks for studying hard even on a <weekday>!"
    speaking(f"오늘은 {week_mapping[weekday]}입니다. {week_mapping[weekday]}에도 공부하느라 고생이시네요!")

query_day()

You should hear today's day of the week spoken aloud.
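To check the weekday lookup without a speaker, the sentence can be built in a separate function and only then handed to speaking(). This is a minimal sketch; day_message and WEEKDAYS_KO are hypothetical names, and the sentence is shortened for the example.

```python
import datetime

# Index matches datetime.date.weekday(): Monday is 0, Sunday is 6.
WEEKDAYS_KO = ("월요일", "화요일", "수요일", "목요일", "금요일", "토요일", "일요일")


def day_message(day: datetime.date) -> str:
    """Return the Korean sentence announcing the weekday of `day`."""
    name = WEEKDAYS_KO[day.weekday()]
    return f"오늘은 {name}입니다."


# 2 October 2024 was a Wednesday (수요일).
message = day_message(datetime.date(2024, 10, 2))
print(message)
```

Calling speaking(day_message(datetime.date.today())) would then speak the same kind of sentence that query_day() produces.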

def query_time():
    now = datetime.datetime.now().strftime("%H:%M:%S") # e.g. "14:05:09"
    speaking(f"현재는 {now[:2]}시 {now[3:5]}분입니다.") # "It is now HH hours MM minutes."

query_time()

You should hear the current time spoken aloud.
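query_time() passes zero-padded 24-hour digits to the engine, which a TTS voice may read awkwardly. As one possible variant, the sketch below builds a 12-hour sentence with 오전 (AM) and 오후 (PM) from the numeric fields. The name time_message is hypothetical, and the 12-hour wording is an assumption about what sounds natural, not something the original code does.

```python
import datetime


def time_message(now: datetime.datetime) -> str:
    """Return a 12-hour Korean sentence for the given time."""
    period = "오전" if now.hour < 12 else "오후"  # AM / PM
    hour12 = now.hour % 12 or 12  # hours 0 and 12 are both spoken as 12
    return f"현재는 {period} {hour12}시 {now.minute}분입니다."


afternoon = time_message(datetime.datetime(2024, 10, 2, 14, 5))
print(afternoon)
```

Passing datetime.datetime.now() instead of the fixed timestamp gives the live version.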

TTS Exercise 2: STT + TTS

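import webbrowser
import pyautogui # installed in the Bonus section below (pip install pyautogui)

while True:
    q = transform()

    if "무슨 요일" in q: # "what day"
        query_day()
    elif "몇 시" in q: # "what time"
        query_time()
    elif "유튜브 시작" in q: # "start YouTube"
        speaking("유튜브를 시작하겠습니다. 잠시만 기다려주세요!") # "Starting YouTube. Please wait a moment!"
        webbrowser.open("https://www.youtube.com")
    elif "네이버 시작" in q: # "start Naver"
        speaking("네이버를 시작하겠습니다. 잠시만 기다려주세요!") # "Starting Naver. Please wait a moment!"
        webbrowser.open("https://www.naver.com")
    elif "이동" in q: # "move"
        speaking("윈도우키를 실행하겠습니다. 잠시만 기다려주세요!") # "Pressing the Windows key. Please wait a moment!"
        pyautogui.moveTo(700, 1050, 3) # move the cursor to (700, 1050) over 3 seconds
        pyautogui.click(button="left")
    elif "이제 그만" in q: # "stop now"
        speaking("아쉽지만 다음에 또 뵙겠습니다") # "Sadly, that's all for now. See you next time."
        break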

For example, if you say "네이버 시작" ("start Naver") into the microphone, the speaker announces that Naver is starting, and the webbrowser library then opens Naver in your browser.
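The keyword checks in the loop depend on exact spacing, so a transcript such as "몇시" without the space would not match "몇 시". A minimal sketch of one way around this is to strip whitespace before matching and keep the keywords in a lookup table. The names normalize, find_command and commands are hypothetical, and the handlers here only record a label so the example runs without audio hardware.

```python
from typing import Callable, Dict, List, Optional


def normalize(text: str) -> str:
    """Remove all whitespace so "몇 시" and "몇시" compare equal."""
    return "".join(text.split())


def find_command(
    query: str, commands: Dict[str, Callable[[], None]]
) -> Optional[Callable[[], None]]:
    """Return the handler of the first keyword contained in the query, if any."""
    q = normalize(query)
    for keyword, handler in commands.items():
        if normalize(keyword) in q:
            return handler
    return None


log: List[str] = []
commands = {
    "무슨 요일": lambda: log.append("day"),   # stand-in for query_day
    "몇 시": lambda: log.append("time"),      # stand-in for query_time
}

handler = find_command("지금 몇시야", commands)  # "what time is it now"
if handler is not None:
    handler()
print(log)
```

In the assistant loop, the table values would be query_day, query_time and the browser-opening actions, and a None result would simply continue listening.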

Bonus: Keyboard I/O

!pip install pyautogui

import pyautogui

screenWidth, screenHeight = pyautogui.size() # screen resolution

pyautogui.moveTo(700, 1050, 3) # move the mouse cursor to (x=700, y=1050) over 3 seconds
pyautogui.click(button="left") # left-click at the current cursor position
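The fixed coordinates (700, 1050) assume a particular screen resolution; on a display shorter than 1050 pixels that point lies below the bottom edge. As a rough sketch, the target can be expressed as fractions of the screen and converted with the size that pyautogui.size() reports. The helper scaled_position is hypothetical, and the pyautogui calls appear only in comments so the example runs without a display.

```python
from typing import Tuple


def scaled_position(
    x_frac: float, y_frac: float, width: int, height: int
) -> Tuple[int, int]:
    """Convert screen fractions (0.0 to 1.0) to pixel coordinates inside the screen."""
    x = min(max(int(x_frac * width), 0), width - 1)
    y = min(max(int(y_frac * height), 0), height - 1)
    return x, y


# With pyautogui this could be used as follows (assumption, not run here):
#   width, height = pyautogui.size()
#   pyautogui.moveTo(*scaled_position(0.5, 0.5, width, height), 3)
center = scaled_position(0.5, 0.5, 1920, 1080)
print(center)
```

Clamping keeps out-of-range fractions on the screen, so the same call works across resolutions.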

쭌스🎄
