Presentation: FRA500 MR for Hearing Disability
Problem
- Hearing-impaired individuals face communication barriers as they cannot perceive sound clearly.
- Limited access to audio-based information or educational content.
Solution
The MR system captures spoken voice and converts it into subtitles displayed on the HoloLens 2 in real time, allowing hearing-impaired users to read spoken content directly.
System Scenario

- The user wears a HoloLens 2 device that listens to nearby speech.
- Speech is processed into text and displayed directly on the HoloLens display.
- This enables real-time visual transcription of conversations.
System Data Flow

- Input: Microphone
• Captures real-time audio input via HoloLens 2 microphone.
• Raw audio data is collected.
- Processing: Speech-to-Text (Azure)
• Converts audio to text using Microsoft Azure Speech SDK.
• Supports multiple languages such as “en-US” and “th-TH”.
- Output: HoloLens 2
• Displays transcribed text in Mixed Reality on the headset screen.
• Real-time display with timeout to clear old text.
System Flow
- Initializes with default language (“en-US”).
- User can switch languages via button, reinitializing the recognizer.
- The system transcribes both partial and final speech into text on the UI.
- Maintains recent lines of dialogue, discards older entries.
Software Processing: Microsoft Azure Speech-to-Text with Unity
Tools Used:
- Unity 2020.3+
- Visual Studio 2019+
Steps:
1. Download and import
- Azure Speech SDK for Unity
- NuGetForUnity
Download links:
- Speech SDK for Unity: https://aka.ms/csspeech/unitypackage
- NuGetForUnity: https://github.com/GlitchEnzo/NuGetForUnity/tree/master/src/NuGetForUnity
2. Create Unity Project and import MRTK3 via Mixed Reality Feature Tool.

- MRTK3: Select All
- Platform Support: Mixed Reality OpenXR Plugin
3. Place MRTK XR Rig and InputSimulator in the hierarchy.

4. Import Speech SDK for Unity
- Assets > Import Package > Custom Package.
5. Import NuGetForUnity
- From the NuGet menu, open Manage NuGet Packages and install Azure.Core.


6. Create Script

Script Overview (RealTimeSpeechWithLanguageSwitch.cs)
Main Components:
- Language Button: toggles recognition language
- TextMeshProUGUI (labelText, outputText):
• labelText: shows “EN” or “TH” based on the current language
• outputText: displays the recognized speech
- SpeechRecognizer: Azure SDK recognizer for real-time transcription
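The report lists the method bodies below but not the script's field declarations. A minimal sketch of those fields, reconstructed from how the code uses them (the two-line history and the 8-second timeout come from the notes below; speechRecognizedFlag is an extra field added here for the thread-safety adjustment described under OnRecognized()):

using System.Collections.Generic;
using Microsoft.CognitiveServices.Speech;
using TMPro;
using UnityEngine;
using UnityEngine.UI;

public class RealTimeSpeechWithLanguageSwitch : MonoBehaviour
{
    public GameObject languageButtonObject;  // button that toggles EN/TH
    public TextMeshProUGUI labelText;        // shows "EN" or "TH"
    public TextMeshProUGUI outputText;       // shows the live transcript

    private SpeechRecognizer recognizer;     // Azure SDK recognizer
    private string currentLang = "en-US";    // default language
    private string partialText = "";         // in-progress (partial) result

    private readonly Queue<string> lastLines = new Queue<string>();
    private const int maxLines = 2;          // keep only the two most recent lines
    private float lastSpokenTime;            // Time.time of the last final result
    private const float speechTimeout = 8f;  // seconds of silence before clearing

    // Set on the SDK's background thread, consumed in Update() on the main
    // thread (see OnRecognized()). Note: lastLines and partialText are also
    // touched from both threads; a lock or ConcurrentQueue would be safer.
    private volatile bool speechRecognizedFlag;

    // ... Start(), UpdateLabel(), SwitchLanguage(), InitRecognizer(),
    // OnRecognizing(), OnRecognized() and Update() as shown below ...
}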
Key Functions
Start()
async void Start()
{
    UpdateLabel();
    var button = languageButtonObject.GetComponent<Button>();
    if (button != null)
    {
        button.onClick.AddListener(SwitchLanguage);
    }
    await InitRecognizer();
}
- Updates the language label
- Adds click listener to language button
- Initializes the speech recognizer
UpdateLabel()
void UpdateLabel()
{
    if (labelText != null)
    {
        labelText.text = (currentLang == "en-US") ? "EN" : "TH";
    }
}
- Updates the UI language display based on currentLang
SwitchLanguage() and SwitchLanguageAsync()
public void SwitchLanguage()
{
    _ = SwitchLanguageAsync(); // Fire and forget
}

private async System.Threading.Tasks.Task SwitchLanguageAsync()
{
    currentLang = (currentLang == "en-US") ? "th-TH" : "en-US";
    UpdateLabel();
    await InitRecognizer();
}
- Toggles language between “en-US” and “th-TH”
- Reinitializes recognizer
InitRecognizer()
private async System.Threading.Tasks.Task InitRecognizer()
{
    if (recognizer != null)
    {
        await recognizer.StopContinuousRecognitionAsync();
        recognizer.Recognizing -= OnRecognizing;
        recognizer.Recognized -= OnRecognized;
        recognizer.Dispose();
    }
    var config = SpeechConfig.FromSubscription(
        "Your_Key",
        "southeastasia"
    );
    config.SpeechRecognitionLanguage = currentLang;
    recognizer = new SpeechRecognizer(config);
    recognizer.Recognizing += OnRecognizing;
    recognizer.Recognized += OnRecognized;
    await recognizer.StartContinuousRecognitionAsync();
}
- Disposes previous recognizer (if any)
- Creates new SpeechConfig based on language
- Creates new recognizer and binds events:
• Recognizing → partial text
• Recognized → final result
- Starts continuous recognition
OnRecognizing()
void OnRecognizing(object sender, SpeechRecognitionEventArgs e)
{
    partialText = e.Result.Text;
}
- Captures ongoing speech into partialText
OnRecognized()
void OnRecognized(object sender, SpeechRecognitionEventArgs e)
{
    Debug.Log($"[Recognized] Reason: {e.Result.Reason}, Text: {e.Result.Text}");
    if (e.Result.Reason == ResultReason.RecognizedSpeech)
    {
        var newLines = e.Result.Text.Split('\n');
        foreach (var line in newLines)
        {
            lastLines.Enqueue(line.Trim());
            while (lastLines.Count > maxLines)
                lastLines.Dequeue();
        }
        partialText = "";
        // This event fires on the SDK's background thread, where Unity's
        // Time.time cannot be read; flag the new speech instead and let
        // Update() refresh lastSpokenTime on the main thread.
        speechRecognizedFlag = true;
    }
}
- Receives the full recognized sentence
- Splits it into lines and enqueues them in lastLines (max 2)
- Clears partialText and flags the new speech so Update() can refresh lastSpokenTime on the main thread
Update()
void Update()
{
    // New final results are flagged on the SDK's background thread;
    // refresh the timestamp here on the main thread, where Time.time is available.
    if (speechRecognizedFlag)
    {
        speechRecognizedFlag = false;
        lastSpokenTime = Time.time;
    }
    // Check for timeout
    if (Time.time - lastSpokenTime > speechTimeout)
    {
        if (lastLines.Count > 0 || !string.IsNullOrEmpty(partialText))
        {
            lastLines.Clear();
            partialText = "";
            lastSpokenTime = Time.time;
        }
    }
    string trimmedHistory = string.Join("\n", lastLines);
    if (outputText != null)
    {
        // History lines, then the current partial result rendered in gray.
        outputText.text = trimmedHistory + "\n<color=#888888>" + partialText + "</color>";
    }
}
- Refreshes lastSpokenTime on the main thread when new speech has been flagged
- Clears all text if there is no speech for longer than the timeout (8 sec)
- Displays the last lines plus the current partial result in the UI
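The script does not show a teardown path; if the app closes while recognition is running, the recognizer still holds the microphone and an open service session. A minimal cleanup sketch, assuming the same recognizer field and event handlers as above:

async void OnDestroy()
{
    // Stop the continuous session and release the SDK's native resources
    // when the script is destroyed.
    if (recognizer == null) return;
    var r = recognizer;
    recognizer = null;
    await r.StopContinuousRecognitionAsync();
    r.Recognizing -= OnRecognizing;
    r.Recognized -= OnRecognized;
    r.Dispose();
}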
UI Design

- Text Display Box: shows live speech as text
- Language Button (LN): toggles language, updates label (“EN” / “TH”)
- Settings Button: toggles font size slider, future theme options

- Settings Gear Button: toggles panel visibility
- Font Size Slider: adjusts main text size via MRTK3 slider
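The slider-to-font-size script itself is not shown in the report. A minimal sketch of one way it could work; the component name FontSizeController, the 20 to 60 size range, and the assumption that the slider's value-changed callback passes its normalized value are illustrative, not the report's actual code:

using TMPro;
using UnityEngine;

public class FontSizeController : MonoBehaviour
{
    public TextMeshProUGUI outputText;  // the main transcript text
    public float minSize = 20f;         // assumed smallest font size
    public float maxSize = 60f;         // assumed largest font size

    // Hook this up to the MRTK3 slider's value-changed event so it
    // receives the slider's normalized value (0..1).
    public void SetFontSize(float normalized)
    {
        if (outputText != null)
        {
            outputText.fontSize = Mathf.Lerp(minSize, maxSize, normalized);
        }
    }
}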
User Experience

- Convenience:
• HoloLens weight may cause fatigue during prolonged use
• Multi-speaker environments reduce recognition accuracy
- Real-Time Feedback:
• Small delay due to cloud processing
• More accurate in quiet environments
- Recognition Accuracy:
• High for clear English speech
• Moderate for Thai, affected by speaking speed and accent
- UI Design Notes:
• Dark blue background may reduce readability
• The LN button is small and hard to press in the headset
Improvements
- Font size adjustment for better readability
- Separate language button from output panel
- Reduced display to 2 lines for clarity
Future Plan
- Convenience:
• Directional filtering to focus on the main speaker
• Speaker identification for multi-user support
- Real-Time Feedback:
• Multi-mic array for broader coverage
• Beamforming mics to reduce background noise
• UI feedback: “Listening…”, “Silent”, “Please speak again”
- Recognition Accuracy:
• Fine-tuned models for specific scenarios (classroom, meeting)
- UI Enhancements:
• Customizable background color and themes
• High-contrast mode for visual accessibility
• Gesture or voice command control (e.g., raise hand = switch language)
• Feedback sounds for events (e.g., beep on listen/start/error)
Conclusion
The developed system successfully transcribes speech into real-time text displayed on HoloLens 2.
Key Strengths:
- Multi-language support with live switching
- Continuous recognition with real-time text display
- Adjustable font size for accessibility
- Auto-clear mechanism after inactivity
Limitations:
- Accuracy drops with unclear or fast speech
- Minor delay due to cloud processing
- Not robust to overlapping speakers
- Long-term use may cause headset discomfort
Responsibilities
1. Ms. Apichaya Sriwong (Student ID: 65340500059)
Responsibilities:
- Designed the Speech-to-Text system
• Researched and selected the Microsoft Azure Speech SDK
- Developed the UI using Unity and MRTK3
• Designed a suitable interface for HoloLens devices
• Developed the UI to display real-time text, the language switch button, and the settings panel
- Scripted components for audio input handling, language switching, and font size adjustment
2. Ms. Chananachida Prongjit (Student ID: 65340500066)
Responsibilities:
- Designed the Speech-to-Text system
• Researched Google Speech SDK
• Compared SDKs and summarized the selection process
- Collected experimental data and summarized results