Binding Vision and Touch: A Step Towards Touch-Aware Digital Twin / Stefani, Antonio Luigi. - (2026 Apr 29).
Binding Vision and Touch: A Step Towards Touch-Aware Digital Twin
Stefani, Antonio Luigi
2026-04-29
Abstract
Digital twins of real scenarios are gaining increasing interest for immersive XR experiences. However, most pipelines reconstruct geometry and appearance (and sometimes acoustics) while leaving touch behind. Moreover, haptic feedback is usually handcrafted for each specific application and device, limiting transfer across scenes and interaction conditions. In this thesis, we investigate touch-aware digital twins, where a user can query a reconstructed 3D environment at an arbitrary location and under arbitrary interaction conditions (e.g., direction, speed, and force) and receive a plausible vibrotactile stimulus consistent with the touched surface. A key challenge is the gap between the representations used to model haptic properties. On the one hand, haptic maps, produced by vision-based tactile sensors, are convenient for spatial binding and learning; on the other hand, vibrotactile signals are the representation closest to human perception. First, we study whether haptic maps encode haptic information that is meaningful for human perception. We introduce a multimodal learning framework that embeds RGB cues and haptic maps into a shared latent space, and we analyze its organization through classification and perception-inspired evaluation. Results show that supervised multimodal features structure material relationships along perceptually relevant attributes (e.g., roughness and stiffness). Second, we make tactile cues a scene-level attribute that can be queried consistently in reconstructed environments. We propose SplatTouch, an explicit multimodal 3D representation that binds sparse tactile observations to a scene via 3D Gaussian Splatting and a viewpoint-consistent coordinate parameterization, enabling touch localization and dense tactile querying. Building on this representation, we introduce Haptic Neural Fields, a geometry- and action-conditioned model that translates local scene cues and interaction parameters into time-varying vibrotactile signals, enabling interactive synthesis of material-specific feedback. Finally, we ground these methodological contributions in a controlled XR manipulation framework. We analyze the combination of wearable haptic devices with performance-driven Dynamic Difficulty Adjustment (DDA). The study suggests that haptic feedback and dynamic adaptation can jointly improve manipulation accuracy and perceived presence and control, but that comfort and fatigue trade-offs are critical for real-world deployment.
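
To make the first contribution concrete, here is a minimal sketch of a two-branch multimodal embedding in the spirit the abstract describes: an RGB branch and a haptic-map branch projected into a shared latent space, supervised by material labels plus an alignment term. The encoder sizes, loss weighting, and all names below are illustrative assumptions, not the thesis implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    """Small convolutional encoder mapping an image-like input
    (an RGB patch or a single-channel haptic map) to a latent vector."""
    def __init__(self, in_ch: int, dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, dim),
        )

    def forward(self, x):
        return F.normalize(self.net(x), dim=-1)  # unit-norm embeddings

class SharedSpace(nn.Module):
    def __init__(self, dim: int = 128, n_materials: int = 10):
        super().__init__()
        self.rgb_enc = Encoder(3, dim)           # RGB branch
        self.hap_enc = Encoder(1, dim)           # haptic-map branch
        self.cls = nn.Linear(dim, n_materials)   # supervision head

    def forward(self, rgb, hmap):
        return self.rgb_enc(rgb), self.hap_enc(hmap)

def loss_fn(model, rgb, hmap, labels, w_align=1.0):
    z_rgb, z_hap = model(rgb, hmap)
    # Supervised material classification on both branches.
    ce = F.cross_entropy(model.cls(z_rgb), labels) \
       + F.cross_entropy(model.cls(z_hap), labels)
    # Alignment term pulling paired RGB/haptic embeddings together.
    align = (1 - (z_rgb * z_hap).sum(-1)).mean()
    return ce + w_align * align

A supervised objective of this kind is what allows the latent space to be probed afterwards: classification accuracy and perception-inspired analyses (e.g., whether distances track roughness or stiffness) are computed on the learned embeddings.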
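
For the second contribution, the following conceptual sketch shows what "dense tactile querying" over a Gaussian-splatted scene could look like: each Gaussian carries a tactile descriptor alongside its geometry, and a query point receives a distance-weighted blend of nearby descriptors. The isotropic covariances, feature dimension, and function names are assumptions for illustration, not the SplatTouch pipeline.

import numpy as np

rng = np.random.default_rng(0)
N, D = 5000, 16                       # number of Gaussians, tactile feature dim
means = rng.uniform(-1, 1, (N, 3))    # Gaussian centers (scene geometry)
scales = np.full(N, 0.02)             # isotropic std-dev per Gaussian (simplified)
tactile = rng.normal(size=(N, D))     # tactile descriptor bound to each Gaussian

def query_tactile(p: np.ndarray, k: int = 32) -> np.ndarray:
    """Blend the tactile features of the k Gaussians nearest to query point p."""
    d2 = np.sum((means - p) ** 2, axis=1)
    idx = np.argpartition(d2, k)[:k]
    w = np.exp(-0.5 * d2[idx] / scales[idx] ** 2)  # isotropic Gaussian weights
    w = w / (w.sum() + 1e-12)
    return w @ tactile[idx]

feat = query_tactile(np.array([0.1, -0.3, 0.5]))
print(feat.shape)  # (16,) tactile descriptor at the touched location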
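
The Haptic Neural Fields idea can likewise be summarized as a conditioned signal generator: an MLP maps a local scene descriptor (such as the queried tactile feature above) plus interaction parameters and time to vibrotactile samples. The architecture, conditioning scheme, and sampling rate below are illustrative assumptions only.

import torch
import torch.nn as nn

class HapticField(nn.Module):
    def __init__(self, feat_dim=16, act_dim=5, hidden=256):
        super().__init__()
        # act_dim: e.g., a 3-D sliding direction plus speed and normal force.
        self.net = nn.Sequential(
            nn.Linear(feat_dim + act_dim + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),  # one vibrotactile (acceleration) sample
        )

    def forward(self, feat, action, t):
        # feat: (B, feat_dim) local scene descriptor at the touch point
        # action: (B, act_dim) interaction parameters
        # t: (B, T) normalized time stamps of the requested window
        B, T = t.shape
        cond = torch.cat([feat, action], dim=-1)
        cond = cond.unsqueeze(1).expand(B, T, -1)        # broadcast over time
        x = torch.cat([cond, t.unsqueeze(-1)], dim=-1)
        return self.net(x).squeeze(-1)                   # (B, T) waveform

model = HapticField()
feat = torch.randn(2, 16)
action = torch.randn(2, 5)
t = torch.linspace(0, 1, 480).expand(2, -1)  # e.g., a 10 ms window at 48 kHz
wave = model(feat, action, t)                # (2, 480) vibrotactile samples
print(wave.shape)

Conditioning on time stamps rather than generating a fixed buffer is one way to make synthesis interactive: as the user's direction, speed, or force changes, the conditioning vector is updated and the next window is regenerated on the fly.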
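
Finally, performance-driven DDA, as studied in the XR manipulation experiments, can be illustrated by a simple feedback loop that steers task difficulty toward a target success rate. The target, gains, and bounds below are illustrative; the study's actual adaptation rules may differ.

class DDAController:
    """Hedged sketch of performance-driven Dynamic Difficulty Adjustment."""
    def __init__(self, target=0.7, gain=0.1, lo=0.0, hi=1.0):
        self.target = target   # desired success rate
        self.gain = gain       # adaptation step size
        self.lo, self.hi = lo, hi
        self.difficulty = 0.5  # normalized task difficulty
        self.ema = target      # exponential moving average of recent success

    def update(self, success: bool, alpha=0.2) -> float:
        self.ema = (1 - alpha) * self.ema + alpha * float(success)
        # Raise difficulty when the user outperforms the target, lower it otherwise.
        self.difficulty += self.gain * (self.ema - self.target)
        self.difficulty = min(self.hi, max(self.lo, self.difficulty))
        return self.difficulty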



