Class Project : MR for Hearing Disability

Presentation : FRA500 MR for Hearing Disability

Problem

  1. Hearing-impaired individuals face communication barriers because they cannot perceive sound clearly.
  2. They have limited access to audio-based information and educational content.

Solution

The MR system captures spoken voice and converts it into subtitles displayed on HoloLens 2 in real time, allowing hearing-impaired users to read spoken content directly.

System Scenario

  • The user wears a HoloLens 2 device that listens to nearby speech.
  • Speech is processed into text and displayed directly on the HoloLens display.
  • This enables real-time visual transcription of conversations.

System Data Flow

  1. Input: Microphone
    • Captures real-time audio input via the HoloLens 2 microphone.
    • Raw audio data is collected.
  2. Processing: Speech-to-Text (Azure)
    • Converts audio to text using Microsoft Azure Speech SDK.
    • Supports multiple languages, such as “en-US” and “th-TH”.
  3. Output: HoloLens 2
    • Displays transcribed text in Mixed Reality on the headset screen.
    • Real-time display, with a timeout that clears old text.
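To make the flow concrete, here is a minimal, non-Unity sketch of the same pipeline using the Azure Speech SDK. It recognizes a single utterance rather than running the continuous mode the project uses, and the key and region are placeholders:


using Microsoft.CognitiveServices.Speech;
using System;
using System.Threading.Tasks;

class Program
{
    static async Task Main()
    {
        // Placeholder credentials: substitute your own Azure key and region.
        var config = SpeechConfig.FromSubscription("Your_Key", "southeastasia");
        config.SpeechRecognitionLanguage = "en-US"; // or "th-TH"

        // With no audio config given, the default microphone is used.
        using var recognizer = new SpeechRecognizer(config);

        // Capture one utterance and print the transcript.
        var result = await recognizer.RecognizeOnceAsync();
        if (result.Reason == ResultReason.RecognizedSpeech)
            Console.WriteLine(result.Text);
    }
}

The project script shown later follows the same pattern, but keeps the recognizer running continuously and routes results to the HoloLens UI.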

System Flow

  1. The system initializes with the default language (“en-US”).
  2. The user can switch languages via a button, which reinitializes the recognizer.
  3. The system transcribes both partial and final speech into text on the UI.
  4. It maintains the most recent lines of dialogue and discards older entries.

Software Processing : Microsoft Azure Speech-to-Text with Unity

Tools Used:

  • Unity 2020.3+
  • Visual Studio 2019+

Steps:

  1. Download and import:
    • Azure Speech SDK for Unity
    • NuGet for Unity

      Link for Download

  2. Create a Unity project and import MRTK3 via the Mixed Reality Feature Tool.

MRTK features imported into the project:
  • MRTK3: Select All
  • Platform Support: Mixed Reality OpenXR Plugin

  3. Place MRTK XR Rig and InputSimulator in the Hierarchy.

MRTK prefabs in the Hierarchy

  4. Import the Speech SDK for Unity:

    • Assets > Import Package > Custom Package.

  5. Import NuGetForUnity:

    • Open NuGet > Manage NuGet Packages and install Azure.Core.

  6. Create the script.

Flow chart of RealTimeSpeechWithLanguageSwitch.cs

Script Overview (RealTimeSpeechWithLanguageSwitch.cs)

Main Components:

  • Language Button: toggles recognition language
  • TextMeshProUGUI (labelText, outputText):
    labelText: Shows “EN” or “TH” based on current language
    outputText: Displays recognized speech
  • SpeechRecognizer: Azure SDK recognizer for real-time transcription
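For reference, the snippets below assume roughly the following field declarations (they are not shown in the original excerpts; types and defaults are inferred from the code, with maxLines = 2 and speechTimeout = 8 matching the values cited later, and speechRecognized added so the background-thread callback never reads Unity's Time.time directly):


using System.Collections.Generic;
using Microsoft.CognitiveServices.Speech;
using TMPro;
using UnityEngine;

public class RealTimeSpeechWithLanguageSwitch : MonoBehaviour
{
    public GameObject languageButtonObject;    // button that toggles the language
    public TextMeshProUGUI labelText;          // shows "EN" / "TH"
    public TextMeshProUGUI outputText;         // shows the transcript

    private SpeechRecognizer recognizer;       // Azure SDK recognizer
    private string currentLang = "en-US";      // active recognition language
    private string partialText = "";           // in-progress (Recognizing) text

    private readonly Queue<string> lastLines = new Queue<string>();
    private const int maxLines = 2;            // history lines kept on screen
    private float lastSpokenTime;              // Time.time of the last final result
    private const float speechTimeout = 8f;    // seconds of silence before clearing
    private volatile bool speechRecognized;    // set by OnRecognized, read in Update()
}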

Key Functions

Start()


async void Start()
{
    UpdateLabel();
    var button = languageButtonObject.GetComponent<Button>();
    if (button != null)
    {
        button.onClick.AddListener(SwitchLanguage);
    }

    await InitRecognizer();
}
  • Updates the language label
  • Adds click listener to language button
  • Initializes the speech recognizer

UpdateLabel()


void UpdateLabel()
{
    if (labelText != null)
    {
        labelText.text = (currentLang == "en-US") ? "EN" : "TH";
    }
}

Updates the UI language label based on currentLang.

SwitchLanguage() and SwitchLanguageAsync()


public void SwitchLanguage()
{
    _ = SwitchLanguageAsync();  // Fire and forget
}

private async System.Threading.Tasks.Task SwitchLanguageAsync()
{
    currentLang = (currentLang == "en-US") ? "th-TH" : "en-US";
    UpdateLabel();
    await InitRecognizer();
}
  • Toggles the language between “en-US” and “th-TH”
  • Reinitializes the recognizer; assigning to _ discards the returned Task so the click handler stays synchronous
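Because the Task is discarded, a failure inside SwitchLanguageAsync() would pass unnoticed. A hedged alternative (a sketch, not part of the project script) logs such failures instead:


public async void SwitchLanguage()
{
    try
    {
        await SwitchLanguageAsync();
    }
    catch (System.Exception ex)
    {
        // Surface recognizer-reinitialization errors instead of dropping them.
        Debug.LogError($"Language switch failed: {ex}");
    }
}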

InitRecognizer()


private async System.Threading.Tasks.Task InitRecognizer()
{
    if (recognizer != null)
    {
        await recognizer.StopContinuousRecognitionAsync();
        recognizer.Recognizing -= OnRecognizing;
        recognizer.Recognized -= OnRecognized;
        recognizer.Dispose();
    }

    var config = SpeechConfig.FromSubscription(
        "Your_Key", 
        "southeastasia"
    );
    config.SpeechRecognitionLanguage = currentLang;

    recognizer = new SpeechRecognizer(config);
    recognizer.Recognizing += OnRecognizing;
    recognizer.Recognized += OnRecognized;

    await recognizer.StartContinuousRecognitionAsync();
}
  • Disposes previous recognizer (if any)
  • Creates new SpeechConfig based on language
  • Creates new recognizer and binds events:
    Recognizing → partial text
    Recognized → final result
  • Starts continuous recognition

OnRecognizing()


void OnRecognizing(object sender, SpeechRecognitionEventArgs e)
{
    partialText = e.Result.Text;
}
  • Captures in-progress speech into partialText (a simple string assignment, which is safe even though the event fires on a background thread)

OnRecognized()


void OnRecognized(object sender, SpeechRecognitionEventArgs e)
{
    Debug.Log($"[Recognized] Reason: {e.Result.Reason}, Text: {e.Result.Text}");

    if (e.Result.Reason == ResultReason.RecognizedSpeech)
    {
        var newLines = e.Result.Text.Split('\n');

        // Recognized fires on a background thread, so guard the queue
        // that Update() reads on the main thread.
        lock (lastLines)
        {
            foreach (var line in newLines)
            {
                lastLines.Enqueue(line.Trim());

                while (lastLines.Count > maxLines)
                    lastLines.Dequeue();
            }
        }

        partialText = "";
        speechRecognized = true; // Time.time is main-thread-only; stamped in Update()
    }
}
  • Receives the full recognized sentence
  • Splits it into lines and enqueues them in lastLines (max 2), holding a lock because the event fires on a background thread
  • Clears partialText and sets a flag so Update() can refresh lastSpokenTime on the main thread

Update()


void Update()
{
    // Timestamp fresh speech here: the recognizer callbacks run on a
    // background thread, where Unity's Time.time may not be read.
    if (speechRecognized)
    {
        speechRecognized = false;
        lastSpokenTime = Time.time;
    }

    // Check for timeout: clear stale text after speechTimeout seconds of silence.
    if (Time.time - lastSpokenTime > speechTimeout)
    {
        if (lastLines.Count > 0 || !string.IsNullOrEmpty(partialText))
        {
            lock (lastLines) lastLines.Clear();
            partialText = "";
            lastSpokenTime = Time.time;
        }
    }

    string trimmedHistory;
    lock (lastLines)
        trimmedHistory = string.Join("\n", lastLines);

    if (outputText != null)
    {
        // Dim the in-progress partial text with a TextMeshPro color tag.
        outputText.text = trimmedHistory + "\n<color=#888888>" + partialText + "</color>";
    }
}
  • Clears all text if nothing has been spoken for longer than speechTimeout (8 s)
  • Displays the recent lines plus the current partial result (dimmed with a TextMeshPro color tag) in the UI

UI Design

Main UI
  • Text Display Box: shows live speech as text
  • Language Button (LN): toggles the language and updates the label (“EN” / “TH”)
  • Settings Button: toggles the font-size slider; theme options are planned
Settings UI
  • Settings Gear Button: toggles the settings panel's visibility
  • Font Size Slider: adjusts the main text size via an MRTK3 slider (see the sketch below)
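As an illustration of the slider wiring, here is a minimal sketch using a standard Unity UI Slider; the project itself uses an MRTK3 slider, whose value-changed event is wired analogously. The class name and size range below are hypothetical:


using TMPro;
using UnityEngine;
using UnityEngine.UI;

// Hypothetical helper: maps a 0..1 slider value onto a font-size range.
public class FontSizeController : MonoBehaviour
{
    [SerializeField] private Slider fontSizeSlider;      // slider on the settings panel
    [SerializeField] private TextMeshProUGUI outputText; // main caption text
    [SerializeField] private float minSize = 24f;        // assumed smallest font size
    [SerializeField] private float maxSize = 72f;        // assumed largest font size

    void Start()
    {
        fontSizeSlider.onValueChanged.AddListener(value =>
            outputText.fontSize = Mathf.Lerp(minSize, maxSize, value));
    }
}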

User Experience

User Experience Testing
  • Convenience:
    • HoloLens weight may cause fatigue during prolonged use
    • Multi-speaker environments reduce recognition accuracy
  • Real-Time Feedback:
    • Small delay due to cloud processing
    • More accurate in quiet environments
  • Recognition Accuracy:
    • High for clear English speech
    • Moderate for Thai, affected by speed and accent
  • UI Design Notes:
    • The dark-blue background may reduce readability
    • The LN button is small and hard to press while wearing the headset

Improvements

  • Added font-size adjustment for better readability
  • Separated the language button from the output panel
  • Reduced the display to 2 lines for clarity

Future Plan

  • Convenience:
    • Directional filtering to focus on main speaker
    • Speaker identification for multi-user support
  • Real-Time Feedback:
    • Multi-mic array for broader coverage
    • Beamforming mics to reduce background noise
    • UI feedback: “Listening…”, “Silent”, “Please speak again”
  • Recognition Accuracy:
    • Fine-tuned models for specific scenarios (classroom, meeting)
  • UI Enhancements:
    • Customizable background color and themes
    • High-contrast mode for visual accessibility
    • Gesture or voice command control (e.g., raise hand = switch language)
    • Feedback sounds for events (e.g., beep on listen/start/error)

Conclusion

The developed system successfully transcribes speech into real-time text displayed on HoloLens 2.

Key Strengths:

  • Multi-language support with live switching
  • Continuous recognition with real-time text display
  • Adjustable font size for accessibility
  • Auto-clear mechanism after inactivity

Limitations:

  • Accuracy drops with unclear or fast speech
  • Minor delay due to cloud processing
  • Not robust to overlapping speakers
  • Long-term use may cause headset discomfort

Responsibilities

1. Ms. Apichaya Sriwong (Student ID: 65340500059)
Responsibilities:

  1. Designed the Speech-to-Text system
    • Researched and selected the Microsoft Azure Speech SDK
  2. Developed the UI using Unity and MRTK3
    • Designed an interface suitable for HoloLens devices
    • Built UI elements for real-time text display, the language-switch button, and the settings panel
  3. Scripted components for audio-input handling, language switching, and font-size adjustment

2. Ms. Chananachida Prongjit (Student ID: 65340500066)
Responsibilities:

  1. Designed the Speech-to-Text system
    • Researched Google Speech SDK
    • Compared SDKs and summarized the selection process
  2. Collected experimental data and summarized results