The Ultimate XR Device - A Dream? Part 1
Introduction
In my previous articles, I have often emphasised that a central aspect of my XR and Metaverse Strategy and Reference Architecture is the "Software Strategic, Hardware Tactical" approach; this is my guiding principle. This strategy also requires endpoint agnosticism. The rationale is straightforward: while I am an advocate for XR and the Metaverse, I do not believe the current market's XR hardware is sufficiently advanced for widespread deployment in large enterprises. Nevertheless, it is crucial to begin utilising today's available XR hardware to build the prerequisites and experience needed to develop a strategic software layer. I discussed this extensively in my previous post XR and (Industrial) Metaverse Now or Later? By choosing Software Partners who share our vision, we ensure that our investment in the Software tier—comprising money, time, and expertise—remains secure. As more advanced XR hardware emerges, we can more easily adapt our software to leverage its enhanced capabilities.
This article is part one in an extended series, and this initial part exceeds the length of my previous posts, as I am deeply passionate about the evolution of XR devices. I firmly believe in the potential of XR and the Metaverse, and I am convinced that enhancements to the available devices will pave the way for a truly wearable, spatial computer that will surpass desktops, laptops, and mobile devices in ubiquity. I have a dream, a large, complex dream! Please bear with me.
The evolution of XR hardware is crucial for its broader deployment by large enterprises. This article will explore the necessary advancements from my viewpoint. Your viewpoint may differ, shaped largely by the needs of the type of enterprise you work for or have experience of; though I think some of my ideas will be applicable to any type of enterprise, or indeed to consumers. I'm going to start with one that is sure to be on everyone's list. One final note before getting into the weeds of this article: while the primary focus is on hardware, it's important to note that the success of hardware is intrinsically linked to software. The software improvements needed to unlock the advantages of hardware will also be addressed in the following sections.
This article focuses on the first two, and most important, elements of my dream: Form Factor and User Experience.
Form Factor
The current form factors need to improve! They are uncomfortable, they diminish User Experience and Immersion (to different degrees), they are socially isolating, and in some environments they raise safety issues. We all want a better form factor. What could those form factors be? At a high level, let's think wide (even though it may be somewhat science fiction!)
Key Milestones on the XR Form Factor Roadmap
Today, we find ourselves with chunks of metal, glass, and plastic attached to our faces. However, we are gradually moving towards a sleeker, spectacle-like form factor. In my view, the path between the first two images in the first row is clear, and the industry is aware of the technical and social shifts required before we can advance further towards a significantly improved form factor.
In a more whimsical vein, we might transition from glasses to augmented reality contact lenses, and perhaps even to brain implants that enhance our vision directly, as predicted by Hollywood. Nonetheless, I sincerely hope we steer clear of a future where we're all connected to the Matrix!
As previously mentioned, we are advancing on the path between the two images in the first row. Yet there are many lanes on the highway leading to our XR and Metaverse aspirations.
Creating XR devices that are more like spectacles involves overcoming several technological challenges:
Miniaturisation
…is crucial for XR devices to become as compact and lightweight as regular glasses. This demands substantial progress in reducing the size of components such as processors, displays, microphones, speakers, sensors, and batteries, while still maintaining their performance.
Display Technology
High-resolution displays that integrate into compact form factors are crucial. Presently, XR devices frequently grapple with challenges in display resolution, field of view, and brightness—key elements of an immersive, comfortable user experience.
Battery Life
Achieving extended battery life in compact devices is challenging. XR spectacles demand efficient power management for extended usage without constant recharging. AR glasses must be light and comfortable, constraining battery size. Thus, components should be optimised for high power efficiency to secure adequate battery life.
Optics and Visual Quality
Achieving clear and undistorted visuals in a compact device is challenging. It necessitates advanced optics to deliver a broad field of view and superior visual quality while preserving a slender profile. Moreover, these optics must accommodate individuals with vision impairments, necessitating specific prescription lenses for proper sight. More on this later in this article.
Heat Dissipation
Components in XR devices produce heat, and typically, the more powerful the components, the greater the heat generated. Managing this heat in compact devices is essential to prevent discomfort and ensure user safety. Therefore, efficient cooling solutions are necessary to maintain a comfortable temperature for the device.
Connectivity and Processing Power
XR devices demand strong connectivity and considerable processing power for complex operations such as real-time rendering and tracking. Achieving a balance between these requirements and the limitations of a compact, portable device presents a significant challenge. It is crucial to manage the appropriate mix of cloud-based and on-device processing to ensure optimal performance, as well as to safeguard privacy and security.
User Comfort and Ergonomics
The device should be designed for comfort during extended use. This requires accommodating a range of head shapes and sizes, balancing the weight distribution, and selecting materials that offer both durability and comfort.
Cost
Developing and manufacturing these advanced technologies at a cost that makes the devices affordable for enterprises and consumers is another significant hurdle.
Tackling these challenges necessitates continuous innovation and cooperation across various disciplines, such as materials science, optics, electronics, and software engineering. Software must be designed with a strong emphasis on privacy and security to withstand the threat of sophisticated cyber attacks. As wearable devices become more prevalent for daily tasks, they generate significant amounts of valuable data, making it imperative to guarantee data security.
User Experience
In one of my previous blog posts I discussed User Experience (UX) and its importance; I recommend reading that article before continuing here.
User Interface
My dream XR device must have an amazing user experience and, most importantly, a User Interface (UI) that employs great Human Interface Devices (HIDs) that are natural and easy to use.
Who wants a stylus?
Before the iPhone, smartphones used a physical keyboard and a stylus. Steve Jobs announced the iPhone with a Multi-Touch screen and your finger as the pointing device. This revolutionised the UI for smartphones; every smartphone on the market today uses this as its primary UI.
Currently, most XR devices are operated with controllers, but some incorporate eye tracking and gestures, which vary in quality and precision. For instance:
The Apple Vision Pro's hand and eye tracking combined with gesture controls represent a significant advancement over the Microsoft HoloLens, where accuracy is influenced by environmental conditions and specific use cases.
The Meta Quest 3 lacks eye tracking capabilities but offers basic hand tracking and gesture control.
The Magic Leap 2 includes both features; however, the Apple Vision Pro's implementation is notably more refined and integrated, providing a smoother experience.
Apple's approach to the user interface with the Vision Pro mirrors the transformative impact they achieved with the iPhone, showcasing their expertise in harmonising hardware and software to create an exceptional user experience.
Despite these advancements, there is still room for improvement; the journey between the two milestones on the first row in the image below is not yet complete.
Let's look at the User Interface (UI), and in particular the Human Interface Devices (HIDs): the devices and methods a person uses to interact with a given XR device. These include physical devices such as controllers, hardware keyboards, mice, and sensors. We should also consider non-physical methods such as hand/eye tracking, gestures, virtual keyboards, voice, and expression.
As stated above, today we find ourselves using controllers with many XR devices. However, we are rapidly moving towards a UI that uses hand and eye tracking together with gesture control and other non-physical input methods.
Key Milestones on the XR User Interface (HID) Roadmap
In my view, the path between the first two images on the first row is clear, and the industry is aware of the technical and social shifts required before we can advance further towards a significantly improved user interface.
The path to achieving the two milestones depicted in the second row of the image might seem like science fiction to some. Yet, there is a significant number of intriguing advancements in Brain-Computer Interaction (BCI) and Neuroprosthetics suggesting that the pace of technological progress could be swifter than commonly perceived. Nonetheless, numerous non-technical challenges and practical considerations could impede the widespread adoption of these technological capabilities. These capabilities and issues are elaborated upon further below.
The user experience of XR devices can vary significantly depending on the types of UI that are available, their ease of use, effectiveness, and accuracy. Sometimes more than one method is available; in such cases it's important to consider how the different UI methods interact together to form a single UI experience.
Physical HIDs (Controllers, Physical Keyboards, Physical Mice and other Hardware Devices)
There are positive and negative aspects to the use of physical HIDs, as outlined below:
Positive
Precision and Control: They offer precise input, which is particularly useful for tasks requiring fine motor skills, such as selecting small objects or navigating complex menus.
Familiarity: Many users are already familiar with these devices from traditional computer use, gaming, etc., making them easier to adopt for certain demographics.
Haptic Feedback: These devices often come with haptic feedback, providing tactile sensations that can enhance immersion.
Negative
Holding these devices can detract from the immersive experience
They are another item to keep charged and connected to the right things
They are difficult to combine with non-physical methods because the hands are busy using/holding the device
Users need to learn how to use them in the context of the application or task at hand
Their use can vary across applications
When writing this article I researched the advances taking place in the area of controllers and other physical HIDs. There are indeed advances in controllers that increase the immersive experience; however, you still have to hold and manipulate physical devices with your hands, which detracts from the overall immersion. Today, controllers do offer more precision and control for tasks that require fine motor skills, and they do provide some haptic feedback. In my opinion this is the only reason they will still be needed for some use cases.
There are some interesting developments in technologies like SenseGlove’s Nova 2 Haptic Gloves.
Nova 2 Haptic Gloves
The pros of these devices are often also cons, as described below:
Enhanced Realism:
Pro : The gloves provide a level of realism in XR experiences by simulating the feel of shapes, textures, stiffness, impact, and resistance.
Con : It's much better than no feedback; however, the feel of the feedback is still somewhat artificial.
Comfort and Usability:
Pro : The Nova gloves are designed to be more comfortable and easier to wear compared to their predecessors. They can be put on in just five seconds.
Con : They are a bit clumsy to put on and somewhat fragile. I do have concerns around durability in industrial environments.
Wireless Design:
Pro : Unlike the earlier models, the Nova gloves are wireless, making them more convenient to use with standalone VR headsets.
Price:
Pro : Priced at around $5000, they are more affordable than many competitors.
Con : $5000 is still a large bill for what is effectively a peripheral! This will be a barrier for most use cases.
Although this is an interesting device, better than many of its competitors, I still class it as an experimental prototype; it needs to move along its roadmap and become much cheaper! This is one to keep an eye on.
Recent advancements in Gesture Control, Hand and Eye Tracking, along with associated technologies such as Voice Control, have been significant. The integration of these methods with AI and sensor data has the potential to elevate these control mechanisms to new heights.
Gestures and Hand/Eye Tracking
Natural Interaction
Enhanced Precision and Responsiveness
Personalised Experiences
Improved Accessibility
Realistic Avatars and Social Interactions
Seamless Integration
Combining eye tracking, hand tracking, and gesture control with data from sensors and AI significantly enhances the user experience in XR devices.
Natural Interaction
Eye Tracking: AI algorithms analyse eye movements to discern the user's focus, enabling more intuitive interactions. For instance, objects can be selected or emphasised just by gazing at them.
Hand Tracking and Gesture Control: AI interprets hand movements and gestures, allowing users to engage with virtual objects in a manner akin to real-world interaction. In many cases this eliminates the need for physical controllers, enhancing the immersive quality of the experience (a minimal sketch of this "look and pinch" pattern follows below).
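To make this concrete, here is a minimal sketch of the gaze-plus-pinch selection pattern popularised by devices like the Vision Pro. This is an illustrative toy, not any vendor's API: the frame dictionary, object representation, and pinch threshold are all invented for the example.

```python
import numpy as np

PINCH_THRESHOLD_M = 0.015  # hypothetical: thumb-index distance that counts as a pinch

def gaze_target(gaze_origin, gaze_dir, objects):
    """Return the scene object whose bounding sphere the gaze ray hits first.

    gaze_dir is assumed to be a unit vector.
    """
    best, best_t = None, float("inf")
    for obj in objects:
        # Ray-sphere test: project the centre onto the ray, check the miss distance.
        to_centre = obj["centre"] - gaze_origin
        t = np.dot(to_centre, gaze_dir)          # distance along the ray
        if t < 0:
            continue                              # object is behind the user
        closest = gaze_origin + t * gaze_dir
        if np.linalg.norm(obj["centre"] - closest) <= obj["radius"] and t < best_t:
            best, best_t = obj, t
    return best

def is_pinching(thumb_tip, index_tip):
    """A pinch is simply thumb and index fingertips nearly touching."""
    return np.linalg.norm(thumb_tip - index_tip) < PINCH_THRESHOLD_M

def update(frame, objects):
    """Per-frame loop: the eyes point, the hand 'clicks'."""
    target = gaze_target(frame["gaze_origin"], frame["gaze_dir"], objects)
    if target is not None and is_pinching(frame["thumb_tip"], frame["index_tip"]):
        target["on_select"]()  # fire the object's selection callback
```

The design point is the division of labour: the eyes are fast and precise at pointing, while the pinch provides an unambiguous confirmation, so neither channel has to do both jobs.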
Enhanced Precision and Responsiveness
AI Algorithms: Machine learning models, including Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) networks, enhance tracking accuracy by predicting and rectifying errors in real-time.
Sensor Fusion: Integrating data from various sensors, such as cameras, depth sensors, and accelerometers, provides reliable and accurate tracking, even in challenging environments (see the sketch below).
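As an illustration of the sensor-fusion idea, here is a toy complementary filter that blends a fast-but-drifting gyroscope estimate of head yaw with a slower, drift-free camera-based estimate. The blend factor and the sample values are invented for the example; real headsets use far more sophisticated filters.

```python
def fuse_yaw(prev_yaw_deg, gyro_rate_dps, dt_s, camera_yaw_deg, alpha=0.98):
    """Complementary filter: trust the gyro for fast motion, the camera for drift correction.

    prev_yaw_deg   -- last fused estimate of head yaw (degrees)
    gyro_rate_dps  -- angular velocity from the IMU (degrees/second)
    dt_s           -- time since the last sample (seconds)
    camera_yaw_deg -- absolute yaw from camera-based tracking (degrees)
    alpha          -- weight on the integrated gyro path (hypothetical tuning)
    """
    gyro_yaw = prev_yaw_deg + gyro_rate_dps * dt_s  # integrate the gyro (drifts over time)
    return alpha * gyro_yaw + (1 - alpha) * camera_yaw_deg

# Example: the gyro says we are turning at 30 deg/s; the camera disagrees slightly.
yaw = 10.0
yaw = fuse_yaw(yaw, gyro_rate_dps=30.0, dt_s=0.011, camera_yaw_deg=10.5)
print(f"fused yaw: {yaw:.2f} degrees")
```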
Personalised Experiences
Adaptive Interfaces: AI possesses the ability to customise the XR environment according to user behaviours and preferences, as well as the surrounding context. For instance, it can adjust the complexity of instructions and data, or personalise the layout of virtual elements to accommodate particular needs or circumstances.
Biometric AI: Eye and hand tracking technologies can serve in biometric authentication, enhancing security and personalising (or contextualising) the user experience.
Improved Accessibility
Inclusive Design: These technologies make VR/AR more accessible to people with disabilities. For example, eye tracking can assist users who have limited mobility by allowing them to navigate and interact using only their gaze.
Realistic Avatars and Social Interactions
Facial Expressions and Gestures: Eye and hand tracking enable the creation of avatars that mimic the user’s facial expressions and gestures, making social interactions in virtual environments more lifelike.
Seamless Integration
Ergonomic Design: Advances in hardware, such as lightweight and comfortable headsets, combined with AI-driven tracking, reduce user fatigue and enhance the overall experience.
These technologies, powered by AI, are transforming XR from novel experiences into practical, everyday tools with a wide range of applications across many industries and consumer markets. The improvements we need in XR UX can only be powered by AI.
The introduction of the Apple Vision Pro to the market has highlighted the benefits of these technologies for the user experience; compared to other devices on the market it is truly a quantum leap in User Experience, and as Apple Intelligence develops and reaches the Vision Pro, I expect this will only get better. Of course, other device manufacturers will also improve in these areas in the coming months and years. For more information on the Vision Pro's UX please see my previous blog post.
Considerations
Accuracy: While gestures and eye tracking are becoming more accurate, they may still lag behind controllers in terms of precision for certain tasks.
Complexity: Implementing effective gesture and eye-tracking systems can be more complex and may require more advanced hardware and software.
Cost: Devices with advanced gesture and eye-tracking capabilities can be more expensive than those relying on traditional controllers.
Overall, the choice between controllers and gesture/eye tracking depends on the specific use case and user preferences. Controllers might be better for tasks requiring high precision, while gestures and eye tracking offer a more natural and immersive experience.
Non-Invasive Brain-Computer Interaction (BCI)
The field of brain-computer interfaces (BCIs) has made significant strides in recent years. BCIs enable direct communication between the brain and external devices, allowing users to control computers, prosthetics, and other equipment using their thoughts.
There are a number of types of Non-Invasive BCI:
Electroencephalography (EEG): The most prevalent non-invasive BCI technology, EEG records the brain's electrical activity through electrodes on the scalp. It has extensive applications in both research and clinical environments, including neurofeedback, gaming, and the operation of prosthetic devices.
Functional Near-Infrared Spectroscopy (fNIRS): This method gauges brain activity by observing changes in blood flow. It employs near-infrared light to track oxygen concentrations in the brain, indicative of neural activity. fNIRS is instrumental in cognitive neuroscience and the creation of BCIs that manage external devices.
Magnetoencephalography (MEG): MEG captures the magnetic fields generated by neural activity. It offers high temporal resolution, making it valuable in research for understanding brain functions and developing brain-computer interfaces (BCIs).
Functional Magnetic Resonance Imaging (fMRI): While commonly employed in brain imaging, fMRI is also applicable in BCI technologies. It gauges brain activity by tracking fluctuations in blood flow and serves as a research tool for examining brain functions and advancing BCI development.
P300 Spellers: Devices utilising the P300 wave, an event-related potential in EEG signals, enable users to spell words and sentences. They are especially beneficial for individuals with severe communication impairments.
Motor Imagery BCIs: These systems are designed to detect brain signals linked to imagined movements. They serve to operate prosthetic limbs, wheelchairs, and various assistive devices (a toy example of the kind of signal processing involved follows this list).
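To give a flavour of how EEG-based BCIs extract usable signals, the sketch below computes the power in the mu band (roughly 8 to 12 Hz), which typically drops over the motor cortex when a movement is imagined. The signal here is synthetic, and the sampling rate and baseline threshold are illustrative values, not taken from any real system.

```python
import numpy as np

FS = 250  # hypothetical sampling rate in Hz, typical of consumer EEG headsets

def band_power(signal, fs, low_hz, high_hz):
    """Average spectral power of `signal` between low_hz and high_hz."""
    spectrum = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    mask = (freqs >= low_hz) & (freqs <= high_hz)
    return spectrum[mask].mean()

# Synthetic one-second EEG epoch: a strong 10 Hz mu rhythm buried in noise,
# which in this paradigm represents a resting (not imagining) state.
t = np.arange(FS) / FS
epoch = 2.0 * np.sin(2 * np.pi * 10 * t) + np.random.randn(FS)

mu_power = band_power(epoch, FS, 8, 12)
baseline = 50.0  # illustrative resting-state value; calibrated per user in practice
# Motor imagery suppresses the mu rhythm, so LOW power suggests imagined movement.
print("imagined movement detected" if mu_power < baseline else "resting")
```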
These non-invasive BCIs are making significant strides in various fields, from medical applications to enhancing user experiences in gaming and virtual reality.
Here are some key developments:
Medical Applications: BCIs are being used to help people with disabilities. For example, they can enable individuals with paralysis to control computer cursors or robotic limbs, improving their ability to communicate and interact with their environment.
Consumer and Research Applications: Non-invasive BCIs, such as EEG headsets, are available for consumer use. These devices can assist with tasks like meditation and focus. Research is also exploring the use of BCIs for controlling drones and other machinery.
Future Prospects: The potential for BCIs is vast. Future advancements could lead to more sophisticated control of devices, enhanced cognitive functions, and even telepathic communication. However, there are still many technical and ethical challenges to address.
Here are some examples of non-invasive Brain-Computer Interaction (BCI) devices:
Emotiv EPOC+: This is a high-resolution, multi-channel EEG headset used for research and consumer applications. It can be used for cognitive performance monitoring, brain-computer interface applications, and neurofeedback.
Muse Headband: This EEG headband is designed for meditation and mindfulness. It provides real-time feedback on brain activity, helping users improve their meditation practice.
OpenBCI: This open-source platform provides EEG headsets and software for research and development. It’s popular among researchers and hobbyists for creating custom BCI applications.
NeuroSky MindWave: This affordable EEG headset is used for educational purposes, gaming, and research. It measures brainwave signals and provides insights into mental states.
Brain-Computer Interfaces (BCIs) are an exciting and rapidly evolving field with the potential to revolutionise various facets of our daily lives. However, current iterations are not yet a substitute for physical Human Interface Devices (HIDs) or hand/eye tracking and gesture control, though the field remains an area worth monitoring closely.
Implant-Based Brain-Computer Interaction (BCI)
The use of brain implants or Neuro-prosthetics to control computer equipment is an exciting and rapidly advancing field. Here are some key points about the current state of this technology:
Clinical Trials and Research: Companies like Neuralink and Synchron are leading the way with brain implants designed to help people with severe disabilities. Neuralink, for example, has developed an implant with over 1,000 electrodes that can record and transmit brain signals to control a computer or other devices.
Practical Applications: Brain implants are being used in clinical trials to help paralysed individuals control computer cursors, type on virtual keyboards, and even operate robotic limbs. These implants capture neural signals and translate them into commands for external devices.
Technological Advances: Recent advancements have improved the precision and reliability of these implants. For instance, participants in the BrainGate project have been able to achieve significant control over computer interfaces, allowing them to perform tasks like typing and navigating the web.
Although challenges remain, such as ensuring long-term safety and enhancing the integration of these devices with biological tissues, the advancements made are encouraging. Brain implants have potential applications that go beyond medical purposes. Currently, researchers are investigating how these technologies might enhance cognitive abilities, manage smart home devices, and interact with virtual reality settings. Nevertheless, today's research is primarily concentrated on medical applications and is not yet considered a practical Human-Computer Interaction (HCI) approach for Extended Reality (XR) devices. It is certainly another technology to monitor closely!
Visual Interface
There are two main aspects of the visual interface that I would like to address: Display and Optics.
Display
Display specifications are vital in defining the user experience on XR devices. The display is the central component of the visual experience, designed to blur the boundary between the virtual and the physical. To achieve this, displays are engineered to create visuals that mirror the range and sharpness of human vision. Consider the following key factors:
Resolution
Higher resolution displays deliver crisper and more detailed visuals, improving the authenticity and depth of the XR experience. Conversely, lower resolution may result in pixelation, diminishing the overall experience.
Refresh Rate
A higher refresh rate, measured in hertz (Hz), guarantees smoother motion and less motion blur. Most XR headsets target a minimum of 90Hz to ensure a pleasant experience. Conversely, lower refresh rates may result in stuttering and discomfort, including nausea.
Field of View (FOV)
A broader Field of View (FOV) enables users to observe more of the virtual environment simultaneously, enhancing the immersion. Conversely, a narrower FOV may seem limiting and diminish engagement. The field of view, typically around 120 degrees horizontally for binocular vision, is an element of human sight replicated in XR design. This width in the field of view can facilitate immersive experiences that feel boundless.
Pixel Density (PPD)
A higher pixel density, measured in Pixels Per Degree (PPD), diminishes the screen door effect, which is the visibility of gaps between pixels. This improvement in pixel density enhances visual quality and immersion for the user. XR displays aim to replicate human visual acuity, quantified in arcminutes. On a standard eye test chart, such as the Snellen chart, the smallest detail discernible by a person with 20/20 vision is 1 arcminute in width, which is 1/60th of a degree in angular measurement.
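The arithmetic is worth making explicit: an acuity of 1 arcminute per detail corresponds to roughly 60 pixels per degree, so a display's PPD (approximately horizontal pixels per eye divided by horizontal FOV) shows how far today's headsets sit from "retinal" resolution. The resolution and FOV figures below are rough, invented illustrations, not official specifications of any product.

```python
RETINAL_PPD = 60  # ~1 arcminute per pixel: the 20/20 visual acuity benchmark

def pixels_per_degree(horizontal_pixels_per_eye, horizontal_fov_deg):
    """Crude average PPD; real optics vary PPD across the lens."""
    return horizontal_pixels_per_eye / horizontal_fov_deg

# Rough, illustrative figures only (per-eye horizontal pixels, horizontal FOV):
headsets = {
    "hypothetical mainstream headset": (2064, 110),
    "hypothetical premium headset": (3660, 100),
}
for name, (pixels, fov) in headsets.items():
    ppd = pixels_per_degree(pixels, fov)
    print(f"{name}: {ppd:.0f} PPD ({ppd / RETINAL_PPD:.0%} of retinal resolution)")
```

Even generous assumptions land well below 60 PPD, which is why foveated rendering and ever-denser panels remain active areas of development.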
Latency
Display latency, also known as "motion-to-photon" latency, is the time delay between a user's physical movement and the corresponding visual update on the XR display. This latency has a significant impact on the user experience in various aspects as described below:
Motion Sickness: High latency can lead to a discrepancy between a user's movements and the corresponding visual feedback, resulting in motion sickness/nausea. This occurs as the brain anticipates instantaneous visual updates in response to head movements, and any lag can lead to disorientation.
Immersion: Low latency is essential for preserving immersion. Any delay between head movements and the corresponding visual updates can disrupt the sense of presence within a virtual environment, rendering the experience less convincing and captivating.
Interaction Precision: Accurate interactions in XR, like grasping objects or moving through environments, depend on low latency. Elevated latency can obstruct these activities, leading to a frustrating and less intuitive user experience.
Comfort and Usability: High latency may result in eye strain and fatigue because the eyes and brain exert more effort to compensate for the delay. Consequently, prolonged use of XR headsets can become uncomfortable.
Performance Perception: High latency is frequently perceived by users as suboptimal performance, potentially diminishing their overall satisfaction with the XR device.
To achieve a truly immersive XR experience, the latency should ideally be less than 20 milliseconds. Minimising latency requires optimisation of both hardware and software, which includes the use of high-performance GPUs, the implementation of efficient coding practices, and the maintenance of stable network connections.
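A back-of-the-envelope budget makes the 20 millisecond target tangible: at a 90 Hz refresh rate a single frame already consumes about 11 ms, leaving only single-digit milliseconds for sensing, prediction, and scan-out. The per-stage costs below are invented for illustration; real pipelines overlap stages and use techniques like late-stage reprojection to cheat this budget.

```python
REFRESH_HZ = 90
TARGET_MS = 20.0  # common motion-to-photon target for comfortable XR

frame_time_ms = 1000.0 / REFRESH_HZ  # ~11.1 ms just to render and display one frame

# Hypothetical per-stage costs in milliseconds:
budget = {
    "sensor sampling + fusion": 2.0,
    "pose prediction": 0.5,
    "render (one frame)": frame_time_ms,
    "display scan-out": 4.0,
}

total_ms = sum(budget.values())
for stage, ms in budget.items():
    print(f"{stage:>26}: {ms:5.1f} ms")
print(f"{'total':>26}: {total_ms:5.1f} ms "
      f"({'within' if total_ms <= TARGET_MS else 'over'} the {TARGET_MS:.0f} ms target)")
```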
Brightness and Contrast
Optimal brightness and contrast levels are crucial for producing vibrant and lifelike visuals, which are fundamental to an authentic XR experience. Key aspects are mentioned below:
Immersion: High brightness and strong contrast levels are crucial for producing vivid and lifelike images, which are vital for an authentic XR experience that enhances the feeling of presence.
Eye Comfort: Proper brightness and contrast settings can help minimise eye strain and fatigue. Displays with insufficient brightness or inadequate contrast may lead to users squinting and straining their eyes, resulting in discomfort over prolonged periods.
Detail Visibility: Good contrast is essential for discerning fine details within complex scenes, which is vital for tasks demanding precision.
Reduced Motion Sickness: Appropriate brightness and contrast settings may contribute to reducing motion sickness. Displays that are dimly lit or have low contrast can challenge the brain's ability to process visual information, potentially causing disorientation and nausea.
Each of these factors contributes to the overall quality and comfort of the XR experience; however, there are many more, such as colour, uniformity, light leakage, mura, and image sticking. For a more in-depth discussion of these aspects and more, please refer to Understanding Novel Methods for Extended Reality (XR) Optical Testing. That article also addresses aspects of Optics that I discuss in the section below.
Optics
The quality of optics in XR devices is fundamental to the user experience. Here are some key ways it impacts the experience:
Image Clarity: High-quality optics are crucial for producing images that are crisp and distinct. Conversely, substandard optics can result in blurry and distorted images, disrupting the immersive experience and making it challenging to concentrate on the finer details.
Field of View (FOV): Superior optics can offer an extended field of view, enhancing the expansiveness and immersion of the virtual environment. Conversely, a limited field of view may cause a sensation of confinement and diminish the feeling of presence.
Distortion and Aberrations: High-quality lenses are designed to minimise optical distortions and aberrations, including chromatic aberration (colour fringing) and geometric distortion, which can be distracting and detract from the overall visual quality.
Comfort: Well-designed optics can alleviate eye strain and discomfort. Conversely, substandard optics may lead to eye fatigue, headaches, and motion sickness, particularly during prolonged use.
Focus and Alignment: Advanced optics enhance the focus and alignment of images for each user's eyes, accommodating varying inter-pupillary distances (IPD). Such customisation is essential for ensuring a comfortable and crisp viewing experience.
Light Management: High-quality optics typically feature coatings that reduce reflections and glare, thereby improving image clarity and contrast. This is especially crucial in brightly lit settings or in high-contrast scenarios.
Overall, the quality of optics is essential for creating a comfortable, immersive, and natural user experience with XR devices.
I wear glasses, and with the Apple Vision Pro I ordered lenses so that I could get the best experience with the device. However, this creates challenges for wide-scale deployments of such devices, and particularly for shared devices. Wouldn't it be great if the device were able to measure my prescription and adapt accordingly?
Recent advancements in XR optic technologies have introduced the capability to automatically adjust to a user's prescription. Notable developments include:
Adaptive Lenses: Several companies are in the process of creating adaptive lenses capable of altering their optical properties to match the user's prescription. These innovative lenses can dynamically adjust their focus, ensuring a crisp image without requiring additional prescription inserts.
Mobile App Integration: Certain smart glasses can be synchronised with mobile applications, allowing users to enter their prescription information. The glasses then adjust their optics to align with the user's specific visual requirements (a hypothetical sketch of such a handshake follows this list).
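As a sketch of what an app-to-glasses handshake might look like, here is a hypothetical prescription payload and adjustment call. The field names and the AdaptiveLens class are entirely invented; no standard API for this exists yet.

```python
from dataclasses import dataclass

@dataclass
class EyePrescription:
    """A standard optical prescription for one eye."""
    sphere: float    # spherical correction in dioptres (negative = myopia)
    cylinder: float  # astigmatism correction in dioptres
    axis: int        # astigmatism axis in degrees (0-180)

@dataclass
class AdaptiveLens:
    """Hypothetical driver for a focus-tunable lens element."""
    current_dioptres: float = 0.0

    def apply(self, rx: EyePrescription) -> None:
        # Many focus-tunable prototypes can only correct the spherical component;
        # cylinder/axis would need a more complex or hybrid optical design.
        self.current_dioptres = rx.sphere
        print(f"lens set to {self.current_dioptres:+.2f} D (sphere only)")

# The mobile app sends the user's prescription; the glasses adjust per eye.
left = EyePrescription(sphere=-2.25, cylinder=-0.50, axis=90)
AdaptiveLens().apply(left)
```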
These advancements in XR are enhancing accessibility and comfort for users needing vision correction, thereby improving the overall user experience. They also aim to eliminate many of the challenges and complexities linked to device sharing and enterprise adoption.
Audio Interface
As mentioned in my previous article My Apple Vision Pro Journey, Part 3 : Realism and Immersion, audio is an important element of the realism and immersion associated with XR devices; there I focused on audio output. In my article XR and Metaverse are enablers for a new era of Collaboration I touched on audio input when I discussed a Spatial Collaboration Roadmap based on the four pillars of collaboration. A reminder of that is shown in the picture below:
Audio Features for my Ultimate XR Device
Spatial Audio
Gen AI Audio
Translate
Optimised Audio
Transcribe
Haptic Audio
Spatial Audio
The Spatial Audio capabilities of the Apple Vision Pro are truly impressive, setting a standard I hope to see in other headsets. For general audio output, it's exactly what I've been dreaming of. However, as an audiophile, it doesn't quite match the quality of my home hi-fi setup or my high-end headphones—so in terms of music, there's still room for the dream to grow.
Gen AI Audio
Adaptive Soundscapes: AI algorithms dynamically adapt the audio environment in response to user interactions and the virtual context, thus improving the immersive quality and responsiveness of the experience.
Personalised Audio Profiles: AI has the capability to analyse individual hearing preferences and conditions to customise audio output, guaranteeing the best sound quality for each listener. This technology is currently available with Nura headphones, which have been acquired by Denon. It has the potential to be developed further to improve audio for individuals with hearing impairments, functioning as an integrated hearing aid, akin to the features Apple is introducing with the AirPods Pro 2.
Real-time Noise Cancellation: Advanced AI-powered noise cancellation techniques can remove unwanted background noise, enhancing clarity and the immersive experience.
Spatial Audio Enhancement: AI improves the accuracy of spatial audio, making it easier to pinpoint the direction and distance of sounds in a virtual environment. Additionally, it can identify sounds that indicate specific events, alerting industrial users. For example, certain sounds from industrial equipment may signal the need for an inspection to ascertain if the machinery is operating normally or if there's a problem that needs addressing.
Voice Recognition and Commands: AI-powered voice recognition enables seamless interactions in the XR environment through voice commands, enhancing both usability and accessibility. Although this technology is available today, it needs further development to more effectively understand technical terms, adjust to noisy environments, and recognise speech from non-native speakers.
Environmental Sound Simulation: AI has the capability to generate realistic environmental audio, such as echoes and reverberations, that can adapt to alterations in virtual environments and user interactions.
Interactive Audio Elements: AI enables dynamic audio responses to user interactions and environmental shifts in virtual environments, thereby increasing interactivity and engagement levels.
From the imagination of Douglas Adams, the Babel Fish!
Translate
Ever since reading Douglas Adams' "The Hitchhiker's Guide to the Galaxy," I've longed for the existence of the Babel Fish. In the novel, this fictional creature is a small, yellow fish that, once placed in one's ear, allows the person to understand and speak any language instantly by translating it directly into their brain. This fanciful concept is humorously used throughout the series to explore themes of communication and the peculiarities of the universe. Imagine if an XR headset could achieve this! Today's Smart Home assistants are nearing this capability, though they still lack accuracy and a grasp of linguistic subtleties, and can be somewhat awkward to use. However, I believe it won't be long before this aspect of my dream becomes reality.
Optimised Audio
Optimised audio refers to audio that has been enhanced or improved through various techniques to achieve better sound quality. This can include noise reduction, equalisation, compression, and other audio processing methods.
The goal is to make the audio clearer, more balanced, and pleasant to listen to with a wide sound stage. It’s often used in music production, podcasting, and broadcasting.
Common techniques include dynamic range compression, equalisation, reverb, and noise gating.
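As a tiny example of one of these techniques, the sketch below applies a crude noise gate: frames whose energy falls below a threshold are muted, suppressing low-level background noise between louder passages. The frame size and threshold are arbitrary illustrative values; production audio chains use smoother attack/release behaviour rather than hard muting.

```python
import numpy as np

def noise_gate(samples, frame_len=256, threshold=0.02):
    """Mute any frame whose RMS energy falls below `threshold`.

    samples   -- mono audio as a numpy array of floats in [-1, 1]
    frame_len -- samples per analysis frame (arbitrary choice)
    threshold -- RMS level below which a frame is treated as noise
    """
    out = samples.copy()
    for start in range(0, len(samples), frame_len):
        frame = out[start:start + frame_len]
        rms = np.sqrt(np.mean(frame ** 2))
        if rms < threshold:
            out[start:start + frame_len] = 0.0  # gate closed: silence the frame
    return out

# Quiet hiss followed by a louder tone: the gate keeps only the tone.
t = np.linspace(0, 1, 8000)
audio = np.concatenate([0.005 * np.random.randn(4000),
                        0.5 * np.sin(2 * np.pi * 440 * t[:4000])])
gated = noise_gate(audio)
print(f"non-silent samples: {np.count_nonzero(gated)} of {len(gated)}")
```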
Transcribe
The desired improvements here are in the accuracy of transcription and voice commands, akin to the enhancements mentioned in the Gen AI Audio section above.
Haptic Audio
Haptic audio involves the combination of haptic feedback and audio experiences, enabling users to experience sound not just audibly but also tactilely. This technology enriches the auditory experience by introducing a tactile aspect, thereby making it more immersive and captivating.
Key aspects of Haptic Audio include:
Haptic Feedback: This method utilises vibrations or other tactile sensations to create an experience akin to hearing sound. For instance, the bass of a song may pulse through your body, or the strumming of a guitar might translate into gentle vibrations (a toy version of this mapping is sketched after this list).
Applications: Haptic audio technology finds its application across multiple domains such as music, gaming, virtual reality (VR), and assistive devices. It serves to deepen the immersion in music, augment the authenticity of extended reality (XR) experiences, and offer tactile feedback for individuals with hearing disabilities.
Technology: The system utilises advanced actuators and sensory technology to produce lifelike tactile sensations. It can replicate an extensive array of textures, vibrations, and movements.
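A toy version of the bass-to-vibration mapping mentioned in the first bullet above: isolate the low frequencies, take their energy envelope per frame, and use that envelope to drive actuator intensity. The cutoff, frame size, and 0-1 intensity scale are all invented for the example.

```python
import numpy as np

def bass_to_haptics(samples, fs, cutoff_hz=120, frame_len=512):
    """Map low-frequency audio energy to a 0-1 actuator intensity per frame."""
    # Isolate the bass with a blunt FFT brick-wall filter (fine for a demo).
    spectrum = np.fft.rfft(samples)
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / fs)
    spectrum[freqs > cutoff_hz] = 0.0
    bass = np.fft.irfft(spectrum, n=len(samples))

    # One intensity value per frame: normalised RMS of the bass band.
    intensities = [np.sqrt(np.mean(bass[i:i + frame_len] ** 2))
                   for i in range(0, len(bass), frame_len)]
    peak = max(intensities) or 1.0
    return [v / peak for v in intensities]

# A 60 Hz "bass drum" burst followed by silence: the actuator pulses, then stops.
fs = 8000
t = np.arange(fs) / fs
audio = np.where(t < 0.25, np.sin(2 * np.pi * 60 * t), 0.0)
print([round(v, 2) for v in bass_to_haptics(audio, fs)[:6]])
```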
Fundamentally, haptic audio serves as a conduit between the virtual and physical realms, enhancing the interaction with audio content to be more instinctive and emotionally resonant.
Summary and Conclusions (so far)
Form Factor
In the short to medium term, my ideal Form Factor would encompass all the features of an Apple Vision Pro in a Spectacles or "Goggles" format, the latter for a more immersive VR experience. This would necessitate advancements in the critical areas outlined in the Form Factor section. Moreover, I envision a price point of $500. Although the initial costs may be higher once the technology becomes feasible, I anticipate that over time, the price will decrease. I am confident that this vision is attainable within the next two to three years.
The contact lens form factor is a very futuristic concept, and the idea of brain implants enabling XR experiences through our own eyes truly belongs to the realm of science fiction, presenting significant social and medical issues. While it's all hypothetical, for me it's closer to a nightmare: I could never insert contact lenses!
User Experience
In the User Experience section of this article I focused on three areas:
User Interface
Visual Interface (Display and Optics)
Audio Interface
User Interface
In the short to medium term, my dream is to focus on improvements in eye/hand tracking and gesture control. The Apple Vision Pro is almost there, and it is expected to improve shortly with software updates. Hopefully, Apple will release a new, more affordable edition of the device next year without significantly compromising the user experience.
During my research for this article, I discovered that the field of Brain-Computer Interaction (BCI) is far more advanced than I anticipated, with intriguing developments that could enhance the lives of individuals with disabilities in the near future. While I believe that widespread adoption of BCI as an alternative to conventional User Interface methods may be a medium to long-term prospect, I am hopeful that progress in Non-Invasive BCI might soon complement existing Hand/Eye/Gesture controls, augmented by Gen AI, in the medium term. Indeed, it's a dream of mine!
Visual Interface (Display and Optics)
The conclusion is straightforward regarding display and optics: they should be better, faster, and more affordable. The achievements of the Apple Vision Pro and various Varjo headsets are remarkable, with their displays being particularly impressive. However, the high quality of these displays does affect the overall cost, with the Apple Vision Pro's display being the most expensive component of the device. Additionally, I would like to see substantial improvements in the Field of View to enhance immersion and the overall user experience.
Meta Quest 3: approximately 110° horizontal.
Apple Vision Pro: an estimated 110° horizontal and 96° vertical FoV.
Varjo XR-4: a wider FoV of 120° horizontal and 105° vertical.
Current headsets do not extend into peripheral vision; hence, for true immersion, the field of view (FoV) must improve significantly without compromising affordability, processing power, and battery life. Achieving a 220-degree horizontal FoV that encompasses monocular peripheral vision is challenging, yet it remains my ultimate dream for immersive experiences.
In the realm of optics, I would appreciate having adaptive optics that can accommodate my prescription and evolving eye condition, thus improving my experience and enhancing the accessibility and shareability of XR devices. Additionally, optics that support a field of view of 220 degrees and eliminate the need for Foveated Rendering would be ideal.
Audio Interface
Regarding Audio Output, Apple has nearly perfected spatial audio for general use, yet audiophiles would welcome enhancements in music quality. As for Audio Input, advancements in AI-generated audio, translation, and voice recognition are necessary. I anticipate these improvements will materialise in the short to mid-term, as the requisite technologies already exist across various hardware and software platforms, awaiting integration into a seamless user experience.
So, Dear Santa Claus, please consider this your letter for a gift this Christmas ;)
In upcoming articles, I will explore additional features of my Ultimate XR Device.