Class Project : MR for Hearing Disability

Presentation : FRA500 MR for Hearing Disability

Problem

  1. Hearing-impaired individuals face communication barriers because they cannot perceive sound clearly.
  2. They have limited access to audio-based information and educational content.

Solution

The MR system captures spoken voice and converts it into subtitles displayed on HoloLens 2 in real time, allowing hearing-impaired users to read spoken content directly.

System Scenario

  • The user wears a HoloLens 2 device that listens to nearby speech.
  • Speech is processed into text and displayed directly on the HoloLens display.
  • This enables real-time visual transcription of conversations.

System Data Flow

  1. Input: Microphone
    • Captures real-time audio input via the HoloLens 2 microphone.
    • Raw audio data is collected.
  2. Processing: Speech-to-Text (Azure)
    • Converts audio to text using Microsoft Azure Speech SDK.
    • Supports multiple languages, such as “en-US” and “th-TH”.
  3. Output: HoloLens 2
    • Displays transcribed text in Mixed Reality on the headset screen.
    • Real-time display, with a timeout that clears old text.
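To make the flow concrete, here is a minimal, non-Unity sketch of the same pipeline using the Azure Speech SDK. It recognizes a single utterance rather than running the continuous mode the project uses, and the key and region are placeholders:


using Microsoft.CognitiveServices.Speech;
using System;
using System.Threading.Tasks;

class Program
{
    static async Task Main()
    {
        // Placeholder credentials: substitute your own Azure key and region.
        var config = SpeechConfig.FromSubscription("Your_Key", "southeastasia");
        config.SpeechRecognitionLanguage = "en-US"; // or "th-TH"

        // With no audio config given, the default microphone is used.
        using var recognizer = new SpeechRecognizer(config);

        // Capture one utterance and print the transcript.
        var result = await recognizer.RecognizeOnceAsync();
        if (result.Reason == ResultReason.RecognizedSpeech)
            Console.WriteLine(result.Text);
    }
}

The project script shown later follows the same pattern, but keeps the recognizer running continuously and routes results to the HoloLens UI.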

System Flow

  1. The system initializes with the default language (“en-US”).
  2. The user can switch languages via a button, which reinitializes the recognizer.
  3. The system transcribes both partial and final speech into text on the UI.
  4. It maintains the most recent lines of dialogue and discards older entries.

Software Processing : Microsoft Azure Speech-to-Text with Unity

Tools Used:

  • Unity 2020.3+
  • Visual Studio 2019+

Steps:

  1. Download and import:
    • Azure Speech SDK for Unity
    • NuGet for Unity

      Link for Download

  2. Create a Unity project and import MRTK3 via the Mixed Reality Feature Tool.

MRTK features imported into the project:
  • MRTK3: Select All
  • Platform Support: Mixed Reality OpenXR Plugin

  3. Place MRTK XR Rig and InputSimulator in the Hierarchy.

MRTK prefabs in the Hierarchy

  4. Import the Speech SDK for Unity:

    • Assets > Import Package > Custom Package.

  5. Import NuGetForUnity:

    • Open NuGet > Manage NuGet Packages and install Azure.Core.

  6. Create the script.

Flow chart of RealTimeSpeechWithLanguageSwitch.cs

Script Overview (RealTimeSpeechWithLanguageSwitch.cs)

Main Components:

  • Language Button: toggles recognition language
  • TextMeshProUGUI (labelText, outputText):
    labelText: Shows “EN” or “TH” based on current language
    outputText: Displays recognized speech
  • SpeechRecognizer: Azure SDK recognizer for real-time transcription
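For reference, the snippets below assume roughly the following field declarations (they are not shown in the original excerpts; types and defaults are inferred from the code, with maxLines = 2 and speechTimeout = 8 matching the values cited later, and speechRecognized added so the background-thread callback never reads Unity's Time.time directly):


using System.Collections.Generic;
using Microsoft.CognitiveServices.Speech;
using TMPro;
using UnityEngine;

public class RealTimeSpeechWithLanguageSwitch : MonoBehaviour
{
    public GameObject languageButtonObject;    // button that toggles the language
    public TextMeshProUGUI labelText;          // shows "EN" / "TH"
    public TextMeshProUGUI outputText;         // shows the transcript

    private SpeechRecognizer recognizer;       // Azure SDK recognizer
    private string currentLang = "en-US";      // active recognition language
    private string partialText = "";           // in-progress (Recognizing) text

    private readonly Queue<string> lastLines = new Queue<string>();
    private const int maxLines = 2;            // history lines kept on screen
    private float lastSpokenTime;              // Time.time of the last final result
    private const float speechTimeout = 8f;    // seconds of silence before clearing
    private volatile bool speechRecognized;    // set by OnRecognized, read in Update()
}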

Key Functions

Start()


async void Start()
{
    UpdateLabel();
    var button = languageButtonObject.GetComponent<Button>();
    if (button != null)
    {
        button.onClick.AddListener(SwitchLanguage);
    }

    await InitRecognizer();
}
  • Updates the language label
  • Adds click listener to language button
  • Initializes the speech recognizer

UpdateLabel()


void UpdateLabel()
{
    if (labelText != null)
    {
        labelText.text = (currentLang == "en-US") ? "EN" : "TH";
    }
}

Updates the UI language label based on currentLang.

SwitchLanguage() and SwitchLanguageAsync()


public void SwitchLanguage()
{
    _ = SwitchLanguageAsync();  // Fire and forget
}

private async System.Threading.Tasks.Task SwitchLanguageAsync()
{
    currentLang = (currentLang == "en-US") ? "th-TH" : "en-US";
    UpdateLabel();
    await InitRecognizer();
}
  • Toggles the language between “en-US” and “th-TH”
  • Reinitializes the recognizer; assigning to _ discards the returned Task so the click handler stays synchronous
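Because the Task is discarded, a failure inside SwitchLanguageAsync() would pass unnoticed. A hedged alternative (a sketch, not part of the project script) logs such failures instead:


public async void SwitchLanguage()
{
    try
    {
        await SwitchLanguageAsync();
    }
    catch (System.Exception ex)
    {
        // Surface recognizer-reinitialization errors instead of dropping them.
        Debug.LogError($"Language switch failed: {ex}");
    }
}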

InitRecognizer()


private async System.Threading.Tasks.Task InitRecognizer()
{
    if (recognizer != null)
    {
        await recognizer.StopContinuousRecognitionAsync();
        recognizer.Recognizing -= OnRecognizing;
        recognizer.Recognized -= OnRecognized;
        recognizer.Dispose();
    }

    var config = SpeechConfig.FromSubscription(
        "Your_Key", 
        "southeastasia"
    );
    config.SpeechRecognitionLanguage = currentLang;

    recognizer = new SpeechRecognizer(config);
    recognizer.Recognizing += OnRecognizing;
    recognizer.Recognized += OnRecognized;

    await recognizer.StartContinuousRecognitionAsync();
}
  • Disposes previous recognizer (if any)
  • Creates new SpeechConfig based on language
  • Creates new recognizer and binds events:
    Recognizing → partial text
    Recognized → final result
  • Starts continuous recognition

OnRecognizing()


void OnRecognizing(object sender, SpeechRecognitionEventArgs e)
{
    partialText = e.Result.Text;
}
  • Captures in-progress speech into partialText (a simple string assignment, which is safe even though the event fires on a background thread)

OnRecognized()


void OnRecognized(object sender, SpeechRecognitionEventArgs e)
{
    Debug.Log($"[Recognized] Reason: {e.Result.Reason}, Text: {e.Result.Text}");

    if (e.Result.Reason == ResultReason.RecognizedSpeech)
    {
        var newLines = e.Result.Text.Split('\n');

        // Recognized fires on a background thread, so guard the queue
        // that Update() reads on the main thread.
        lock (lastLines)
        {
            foreach (var line in newLines)
            {
                lastLines.Enqueue(line.Trim());

                while (lastLines.Count > maxLines)
                    lastLines.Dequeue();
            }
        }

        partialText = "";
        speechRecognized = true; // Time.time is main-thread-only; stamped in Update()
    }
}
  • Receives the full recognized sentence
  • Splits it into lines and enqueues them in lastLines (max 2), holding a lock because the event fires on a background thread
  • Clears partialText and sets a flag so Update() can refresh lastSpokenTime on the main thread

Update()


void Update()
{
    // Timestamp fresh speech here: the recognizer callbacks run on a
    // background thread, where Unity's Time.time may not be read.
    if (speechRecognized)
    {
        speechRecognized = false;
        lastSpokenTime = Time.time;
    }

    // Check for timeout: clear stale text after speechTimeout seconds of silence.
    if (Time.time - lastSpokenTime > speechTimeout)
    {
        if (lastLines.Count > 0 || !string.IsNullOrEmpty(partialText))
        {
            lock (lastLines) lastLines.Clear();
            partialText = "";
            lastSpokenTime = Time.time;
        }
    }

    string trimmedHistory;
    lock (lastLines)
        trimmedHistory = string.Join("\n", lastLines);

    if (outputText != null)
    {
        // Dim the in-progress partial text with a TextMeshPro color tag.
        outputText.text = trimmedHistory + "\n<color=#888888>" + partialText + "</color>";
    }
}
  • Clears all text if nothing has been spoken for longer than speechTimeout (8 s)
  • Displays the recent lines plus the current partial result (dimmed with a TextMeshPro color tag) in the UI

UI Design

Main UI
  • Text Display Box: shows live speech as text
  • Language Button (LN): toggles the language and updates the label (“EN” / “TH”)
  • Settings Button: toggles the font-size slider; theme options are planned
Settings UI
  • Settings Gear Button: toggles the settings panel's visibility
  • Font Size Slider: adjusts the main text size via an MRTK3 slider (see the sketch below)
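As an illustration of the slider wiring, here is a minimal sketch using a standard Unity UI Slider; the project itself uses an MRTK3 slider, whose value-changed event is wired analogously. The class name and size range below are hypothetical:


using TMPro;
using UnityEngine;
using UnityEngine.UI;

// Hypothetical helper: maps a 0..1 slider value onto a font-size range.
public class FontSizeController : MonoBehaviour
{
    [SerializeField] private Slider fontSizeSlider;      // slider on the settings panel
    [SerializeField] private TextMeshProUGUI outputText; // main caption text
    [SerializeField] private float minSize = 24f;        // assumed smallest font size
    [SerializeField] private float maxSize = 72f;        // assumed largest font size

    void Start()
    {
        fontSizeSlider.onValueChanged.AddListener(value =>
            outputText.fontSize = Mathf.Lerp(minSize, maxSize, value));
    }
}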

User Experience

User Experience Testing
  • Convenience:
    • HoloLens weight may cause fatigue during prolonged use
    • Multi-speaker environments reduce recognition accuracy
  • Real-Time Feedback:
    • Small delay due to cloud processing
    • More accurate in quiet environments
  • Recognition Accuracy:
    • High for clear English speech
    • Moderate for Thai, affected by speed and accent
  • UI Design Notes:
    • The dark-blue background may reduce readability
    • The LN button is small and hard to press while wearing the headset

Improvements

  • Added font-size adjustment for better readability
  • Separated the language button from the output panel
  • Reduced the display to 2 lines for clarity

Future Plan

  • Convenience:
    • Directional filtering to focus on main speaker
    • Speaker identification for multi-user support
  • Real-Time Feedback:
    • Multi-mic array for broader coverage
    • Beamforming mics to reduce background noise
    • UI feedback: “Listening…”, “Silent”, “Please speak again”
  • Recognition Accuracy:
    • Fine-tuned models for specific scenarios (classroom, meeting)
  • UI Enhancements:
    • Customizable background color and themes
    • High-contrast mode for visual accessibility
    • Gesture or voice command control (e.g., raise hand = switch language)
    • Feedback sounds for events (e.g., beep on listen/start/error)

Conclusion

The developed system successfully transcribes speech into real-time text displayed on HoloLens 2.

Key Strengths:

  • Multi-language support with live switching
  • Continuous recognition with real-time text display
  • Adjustable font size for accessibility
  • Auto-clear mechanism after inactivity

Limitations:

  • Accuracy drops with unclear or fast speech
  • Minor delay due to cloud processing
  • Not robust to overlapping speakers
  • Long-term use may cause headset discomfort

Responsibilities

1. Ms. Apichaya Sriwong (Student ID: 65340500059)
Responsibilities:

  1. Designed the Speech-to-Text system
    • Researched and selected the Microsoft Azure Speech SDK
  2. Developed the UI using Unity and MRTK3
    • Designed an interface suitable for HoloLens devices
    • Built UI elements for real-time text display, the language-switch button, and the settings panel
  3. Scripted components for audio-input handling, language switching, and font-size adjustment

2. Ms. Chananachida Prongjit (Student ID: 65340500066)
Responsibilities:

  1. Designed the Speech-to-Text system
    • Researched Google Speech SDK
    • Compared SDKs and summarized the selection process
  2. Collected experimental data and summarized results