Manhole VR

Jayesh Pillai | Abhishek Verma | Banda Shiva Teja | Ananda Bathena

View from manhole

Synopsis: “Compelled by the exigencies of poor economic life and caste identity, Amitabh, a young law graduate, becomes a manual scavenger. One day, to earn some extra money to support his family, he decides to get inside a large unsafe sewer to clear a blockage. He neither has protective gear, nor the accompanying engineer to check for poisonous gases. Will Amitabh come out safely?”


Official Website: manholecollective.com
VR Narrative - Trailer: Into The Manhole
VR Narrative - IMDb Page: Manhole 2021 IMDb

This project is part of a larger narrative by the Manhole Collective. The collective attempts to create experiences that communicate the difficulties manual scavengers face in India and spread awareness about this topic.

Introduction

### Project Overview

Into the Manhole is a virtual reality (VR) six degrees of freedom (6DoF) narrative that immerses the viewer in the unsettling world of manual scavenging in India. The story centers on Amitabh, a young law graduate who, driven by economic hardship and systemic caste-based oppression, takes up the work of a manual scavenger. In an attempt to support his family, he decides to enter a large, unsafe sewer without protective equipment or any safety checks. The experience raises a critical question: Will Amitabh come out safely?

Manual scavenging, the practice of manually cleaning human waste from septic tanks, sewers, and manholes, remains a grim reality for thousands in India. This film attempts to shed light on the brutal, often invisible, lives of those trapped in this caste-driven occupation.

Although the viewer is not a character in the narrative, the VR experience is designed to make one feel like a close spectator present within the scene, moving freely through the environment and sensing the atmosphere as events unfold. The experience is entirely unguided, yet deeply immersive. The user can walk, sit, crouch, and look around within the space, just as they would in the real world. This unrestrained spatial freedom enhances presence and embodiment, even though the user remains a silent observer.

By blending cinematic storytelling with the spatial affordances of VR, Into the Manhole seeks to raise awareness among the general public and influence policy conversations around sanitation labor, caste inequality, and human dignity. The project uses immersion not to entertain, but to provoke empathy, reflection, and discourse around one of India’s most dehumanizing social issues.

#### Project Details (Duration, Platform, Core Team)

Manhole Collective is a multidisciplinary team focused on creating immersive storytelling experiences that highlight the harsh realities faced by manual scavengers in India. Through film and virtual media, the collective aims to initiate critical conversations around caste, labor, and human dignity.

The Manhole project is envisioned in three progressive phases:
1. An Animated Short Film
2. A VR Narrative (Into the Manhole)
3. An Animated Feature Film

Into the Manhole forms Phase 2 of this initiative, the virtual reality component. Production began in February 2024, with the Hindi version completed in November 2024, followed by the English version in March 2025, crafted specifically for international audiences.

Though the experience was designed for the Meta Quest 3, it currently runs as a PC VR experience, with active plans for porting it to a standalone Meta Quest 3 build and releasing it via the Meta Store.

Core Team

Abhishek Verma: Writer & Director
Jayesh Pillai: VR Director
Banda Shiva Teja: Technical Director
Ananda Bathena: Sound & Background Score

The core team combines strengths across narrative, immersive direction, real-time technology, and sound design to deliver a deeply moving and technically advanced VR Narrative.

Tools and Technologies used

Into the Manhole was developed primarily using Unreal Engine 5.2, which formed the backbone of our entire production pipeline. Unreal’s powerful rendering capabilities, real-time performance, and native support for VR made it the ideal platform for crafting a highly immersive 6DoF experience. To support our development process, we integrated several essential plugins such as Meta XR and OpenXR for cross-platform compatibility, Oculus VR for Quest-specific deployment, and Metahuman for generating realistic digital characters. Assets and tools from the Unreal Marketplace (now part of Fab) also played a significant role in accelerating our prototyping and environment building.

For 3D asset creation, we relied on a combination of Blender and Autodesk Maya for modeling and UV unwrapping, while Substance Painter was used extensively for texturing and material baking. This allowed us to maintain both visual richness and performance optimization across platforms.

Sound design and audio production were handled using Logic Pro, which gave us the flexibility to design rich soundscapes and process recordings with professional-grade tools. Spatial audio, a core component of immersion in this project, was implemented and programmed directly within Unreal Engine, taking full advantage of its 3D sound engine and listener-based audio positioning.
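As a conceptual illustration of what listener-based positioning involves (a toy sketch, not the project’s actual Unreal implementation, which uses the engine’s attenuation and spatialization systems), the snippet below derives a distance-based gain and a simple left/right balance from the listener’s position and facing direction:

```python
import math

def spatialize(listener_pos, listener_forward, source_pos,
               min_radius=100.0, max_radius=2000.0):
    """Toy listener-based spatialization: returns (left_gain, right_gain).

    Distances are in Unreal units (cm). Inside min_radius the source plays
    at full volume, beyond max_radius it is silent, and in between the gain
    falls off linearly, roughly mirroring a linear attenuation curve.
    """
    dx = source_pos[0] - listener_pos[0]
    dy = source_pos[1] - listener_pos[1]
    dist = math.hypot(dx, dy)

    # Distance attenuation (linear falloff between the two radii).
    if dist <= min_radius:
        gain = 1.0
    elif dist >= max_radius:
        gain = 0.0
    else:
        gain = 1.0 - (dist - min_radius) / (max_radius - min_radius)

    # Simple stereo balance from the signed angle between the listener's
    # forward vector and the direction to the source (+Y is "right").
    angle = math.atan2(dy, dx) - math.atan2(listener_forward[1], listener_forward[0])
    pan = math.sin(angle)              # -1 = hard left, +1 = hard right
    left = gain * (1.0 - pan) * 0.5
    right = gain * (1.0 + pan) * 0.5
    return left, right

# Example: a sound source two metres to the listener's right.
print(spatialize((0.0, 0.0), (1.0, 0.0), (0.0, 200.0)))
```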

From a pipeline and coordination standpoint, we used Figma for collaborative visual planning, storyboarding, and layout design. Tools like Microsoft Excel helped us track production schedules and asset lists, while Microsoft Word was used throughout for script writing and documentation. This combination of creative and organizational tools ensured a smooth workflow across the various stages of production, from early concept to final packaging.
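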

Pre-Production

Story Development

The story of Into the Manhole was conceived by Abhishek Verma, who has long been engaged with themes of caste, labor, and social injustice through animation and storytelling. The narrative is inspired by real-world incidents involving manual scavengers and sanitation workers who are often forced to clean manholes and sewers without protective gear, leading to injury or death. These stories are drawn from news reports, RTI filings, and ground-level research.

Rather than fictionalizing the subject for drama, the story attempts to bear witness, placing the viewer in a virtual environment that resembles the lived spaces of sanitation workers. This isn’t a hero’s journey; it’s an invitation to observe, to feel the discomfort, and to reflect. The story intentionally keeps the viewer as an invisible observer, not part of the story but unavoidably present inside it. This allows emotional proximity without altering the power dynamics or authenticity of the narrative.

Screenplay

The screenplay for Into the Manhole was written with the spatial and temporal possibilities of VR in mind. While the narrative unfolds linearly, it does so in a spatially continuous environment, allowing users to explore, move, and observe from different vantage points. This required rethinking traditional screenwriting, focusing not just on dialogue and action, but also on space, timing, and user agency.

To maintain immersion, visual storytelling took precedence over exposition. Actions, ambient cues, and blocking were carefully structured to guide attention without forcing it. The screenplay functioned more like a sequence blueprint, marked by environmental beats and physical staging, rather than fixed camera angles or cuts. Tools like Word and Figma were used to build and iterate on the scene logic and layout in tandem with the writing process.

Production Pipeline

The production pipeline was designed with modularity and iteration in mind. Since multiple workflows, including motion capture, face capture, modeling, texturing, audio, and level design, had to come together inside Unreal Engine, the team followed a parallel and integrated production model. Each department worked independently, but with clearly defined points of integration.

MoCap and FaceCap planning

Mocap and Facecap


Motion Capture

Motion capture played a critical role in Into the Manhole, allowing us to capture lifelike performances and translate them into the VR space with realism and emotional weight. The story’s sensitivity required subtle, grounded body language from characters, especially since the viewer shares the same spatial environment in VR. We aimed to avoid artificiality, and for that, a robust motion capture pipeline was essential.

Tools used

For full-body tracking, we used the Xsens Motion Capture suit, known for its high precision and reliability in markerless mocap. In addition to body movement, we captured detailed hand gestures using Xsens Gloves by Manus. Both systems were synced and monitored live via Xsens MVN Animate, which allowed us to view real-time performance data and ensure accurate tracking throughout the sessions.

Calibration and suit workflow

Each motion capture session began with a careful calibration process. The actor would assume an A-pose stance, after which they would be asked to walk forward, backward, and then in a circle to recalibrate orientation and motion stability. It was important to configure the actor’s body dimensions precisely in the MVN software, as well as ensure we had an unobstructed calibration space to minimize tracking drift and artifacts.

Capture sessions

We conducted multiple capture sessions over several days, often working extended hours, with one of our sessions lasting nearly 21 hours straight. Most sessions were recorded indoors, with the exception of a particularly complex shoot involving an actor climbing from the first floor to the ground floor using a ladder, which was crucial for a story beat. That session posed logistical challenges, as the ladder had to be held vertically, with limited room for support or rehearsal.

In total, two actors performed across the film. We recorded their movements individually, even for scenes involving multiple characters, including both protagonists and several background roles. The choice to capture separately allowed us flexibility in scheduling and gave us more control during post-processing.

Data cleanup and retargeting

The captured data from Xsens MVN was exported in FBX format for further processing. We imported the FBX files directly into Unreal Engine and performed retargeting onto Metahuman skeletons. Since Metahumans use a more complex bone structure than Xsens rigs, this required careful mapping to ensure accurate motion transfer. All motion cleanup, including trimming, adjusting foot contacts, and smoothing transitions, was done inside Unreal Engine’s Sequencer, without relying on external cleanup tools.
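For readers curious about how such FBX takes can be brought into the engine in bulk, below is a hedged sketch using Unreal’s editor Python API (it assumes the Python Editor Script Plugin is enabled); the skeleton path, take file, and destination folder are illustrative placeholders rather than the project’s actual assets, and the retargeting onto Metahuman skeletons still happens afterwards in the editor:

```python
import unreal

# Hypothetical paths -- substitute the project's actual skeleton and takes.
SKELETON_PATH = "/Game/Characters/Amitabh/SK_Amitabh_Skeleton"
MOCAP_TAKES = ["D:/Mocap/Takes/sewer_entry_take03.fbx"]
DEST_PATH = "/Game/Animations/Mocap"

skeleton = unreal.load_asset(SKELETON_PATH)

def make_task(fbx_file):
    """Build an automated FBX import task that brings a take in as an
    animation-only asset targeted at an existing skeleton."""
    options = unreal.FbxImportUI()
    options.import_mesh = False
    options.import_animations = True
    options.skeleton = skeleton
    options.automated_import_should_detect_type = False
    options.mesh_type_to_import = unreal.FBXImportType.FBXIT_ANIMATION

    task = unreal.AssetImportTask()
    task.filename = fbx_file
    task.destination_path = DEST_PATH
    task.automated = True
    task.replace_existing = True
    task.save = True
    task.options = options
    return task

tasks = [make_task(f) for f in MOCAP_TAKES]
unreal.AssetToolsHelpers.get_asset_tools().import_asset_tasks(tasks)
```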

Integration in unreal engine

We didn’t rely on any specific third-party plugins for importing mocap data into Unreal Engine. However, integration posed several challenges. The bone structure mismatch between Xsens rigs and Metahumans required manual attention during retargeting. Moreover, since our motion capture was not done in a real-time virtual production setup, aligning the actions with props in the final scene required trial and error. This post-synchronization effort, ensuring that actors’ movements lined up perfectly with virtual ladders, walls, and spatial cues, became one of the more time-consuming but necessary steps in maintaining immersion.

Face Capture

Facial capture was crucial in Into the Manhole to preserve the emotional weight of the characters’ performances. Since VR puts the viewer close to the characters’ faces and actions, we needed expressions that felt authentic, well-synced with speech, and subtle enough to support immersion without becoming uncanny or exaggerated.

Tools used

We used Epic Games’ Live Link Face app on an iPhone 16 Pro to capture facial expressions. The app is designed to work seamlessly with Metahumans, ensuring high-quality face rigging and expression mapping. The iPhone’s TrueDepth camera provided us with detailed face tracking data that could be streamed or recorded for later use.

Setup and sync strategy

To minimize syncing issues later, we planned and executed facial capture and dialogue (audio) recording sessions simultaneously. This allowed us to maintain natural lip-sync and expressive alignment with the dialogue being delivered by the actors. Once the facial data was captured, we integrated it into Unreal Engine’s Sequencer, where the body motion and audio were already laid out. Sequencer acted as our central timeline and editing tool, allowing for precise coordination of all three performance layers.

Data cleanup and processing

We used Unreal Engine’s Metahuman Animator to process and refine the captured facial data. While most of the data was accurate, a few minor issues did crop up, particularly with mouth and tongue animation, usually in longer or more intense shots. These errors were traced back to take duration and fatigue during capture. For those, we opted to reshoot the shots rather than attempt detailed manual cleanup. In general, the captured data required minimal intervention.

Integration in unreal engine

The raw facial animation data was imported directly into Unreal Engine and applied onto Metahuman characters. Since we were working entirely within the UE5 ecosystem, the transition from capture to final output was relatively smooth. The facial animation, once synced with body motion and voice, brought the characters to life in a grounded and expressive way. Despite some initial challenges in getting all three performance layers to sit perfectly in sync, we found Sequencer a powerful tool for fine-tuning everything in one place.

Spatial sound

Asset Development

Realism was at the heart of Into the Manhole, and to achieve this, we placed strong emphasis on custom-built assets that would recreate the texture and chaos of a street in Delhi with high fidelity. The asset development pipeline involved end-to-end control over modeling, UV unwrapping, texturing, and shader development, using a combination of industry-standard tools.

Tools used

We primarily used Blender and Autodesk Maya for modeling and UV unwrapping tasks. For texturing, Substance Painter was our mainstay tool, complemented by Photoshop for hand-crafted texture maps, adjustments, and overlays. Baking including normal maps and ambient occlusion was also carried out inside Substance Painter, enabling an efficient and unified texture creation workflow. All these assets were then brought into Unreal Engine 5.2, where further optimization and shader development took place.

3D Modeling pipeline

Given the highly specific setting of a back-alley street and manhole system inspired by Delhi, most of the 3D models were created from scratch. This helped us closely match the required realism and scale, which wouldn’t have been possible with ready-made assets alone. However, to save time on non-hero props and background clutter, we did make selective use of assets from Sketchfab, CGTrader, Unreal Marketplace, and Quixel Megascans. Every model, whether built or sourced, was carefully reviewed and adjusted to maintain visual consistency and contextual accuracy.

UV Unwrapping

UV unwrapping was handled manually for nearly all assets to ensure maximum control and quality. Since we were aiming for realistic lighting and texture behavior, we needed clean, non-overlapping UV maps with minimal stretching. Several challenges arose during the pipeline, especially in complex props with irregular geometry, but these were addressed iteratively across modeling and texture review stages.

Texturing & Baking

We exported high-quality 4K textures from Substance Painter to preserve surface detail and realism. These included base color, roughness, normal, metallic, and ambient occlusion maps. After exporting, we carried out compression and optimization inside Unreal Engine, adjusting texture resolution based on platform needs and importance of the asset in the scene.

Materials and Shaders in UE5

For Into the Manhole, we relied primarily on custom materials built inside Unreal Engine. Given the diversity of environmental elements, from wet cement and rusted metal to skin shaders and fabric, we developed a library of specialized materials. Advanced features such as opacity masks, emissive materials, and translucent shaders were employed wherever necessary. To optimize performance and simplify parameter adjustments, we made extensive use of material instances, especially for props that appeared frequently with minor variations.
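As an example of how such variants can be authored in bulk, the sketch below uses Unreal’s editor Python API to create Material Instance Constants from a master material; the parent material path and the “Roughness” and “Tint” parameter names are assumptions for illustration, not the project’s actual shader library:

```python
import unreal

# Hypothetical master material and parameter names -- the real parent
# material and its exposed parameters will differ per project.
PARENT = unreal.EditorAssetLibrary.load_asset("/Game/Materials/M_Master_Surface")
DEST = "/Game/Materials/Instances"

def make_variant(name, roughness, tint):
    """Create a Material Instance Constant and override a few parameters,
    so frequently reused props can share one master shader."""
    tools = unreal.AssetToolsHelpers.get_asset_tools()
    factory = unreal.MaterialInstanceConstantFactoryNew()
    mi = tools.create_asset(name, DEST, unreal.MaterialInstanceConstant, factory)

    unreal.MaterialEditingLibrary.set_material_instance_parent(mi, PARENT)
    unreal.MaterialEditingLibrary.set_material_instance_scalar_parameter_value(
        mi, "Roughness", roughness)
    unreal.MaterialEditingLibrary.set_material_instance_vector_parameter_value(
        mi, "Tint", tint)
    unreal.EditorAssetLibrary.save_loaded_asset(mi)
    return mi

# Two rusted-metal variations sharing the same master material.
make_variant("MI_Metal_Rusted_A", 0.85, unreal.LinearColor(0.35, 0.18, 0.10, 1.0))
make_variant("MI_Metal_Rusted_B", 0.70, unreal.LinearColor(0.42, 0.25, 0.14, 1.0))
```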

Optimization

When we transitioned Into the Manhole from an animated short film pipeline to a real-time VR experience, optimization became one of the most critical tasks. The original environment and assets were not designed with performance constraints in mind; they were built for cinematic quality, not hardware efficiency. Adapting that level of detail to run smoothly in VR, particularly on standalone devices like the Meta Quest 3, required deep iteration and creative compromises.

Polycount

Since the project initially targeted a non-real-time animation workflow, we did not follow any fixed polycount budget during asset creation. Once we moved into VR development, we had to optimize aggressively. The biggest changes were implemented via Level of Detail (LOD) management. Protagonist characters were retained at LOD 0 or 1, maintaining their detail, while pedestrian NPCs were brought down to LOD 3–5. For props and background elements, we performed mesh decimation using Unreal Engine’s native modeling tools, which helped in reducing vertex count without sacrificing visible quality.
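The kind of LOD rebuild described above can also be scripted; below is a hedged sketch using Unreal’s editor Python API, assuming the Editor Scripting Utilities plugin is enabled. The asset path and triangle percentages are illustrative and not the budgets actually used in the project:

```python
import unreal

def decimate_prop(asset_path, lod_percents=(1.0, 0.5, 0.25, 0.1)):
    """Rebuild a static mesh's LOD chain with progressively fewer triangles.
    Assumes the Editor Scripting Utilities plugin is enabled."""
    mesh = unreal.EditorAssetLibrary.load_asset(asset_path)

    options = unreal.EditorScriptingMeshReductionOptions()
    options.auto_compute_lod_screen_size = True

    settings = []
    for percent in lod_percents:
        s = unreal.EditorScriptingMeshReductionSettings()
        s.percent_triangles = percent   # 1.0 keeps LOD 0 at full detail
        s.screen_size = 0.0             # ignored when auto-compute is on
        settings.append(s)
    options.reduction_settings = settings

    lod_count = unreal.EditorStaticMeshLibrary.set_lods(mesh, options)
    unreal.EditorAssetLibrary.save_loaded_asset(mesh)
    return lod_count

# Hypothetical background prop -- hero characters kept their detailed LODs.
decimate_prop("/Game/Props/Background/SM_Crate_Old")
```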

Texture compression

We originally imported 4K textures for all assets to ensure high-quality detail. However, to match the performance demands of VR, we employed a dual texture compression approach using DXT1 and ASTC formats. Textures for background elements and low-priority assets were downscaled to 512x512, while textures for main characters and interactable or close-proximity elements were reduced to 1K or 2K. We also implemented dynamic resolution techniques where feasible to maintain performance consistency without visual degradation.
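One way such a tiered policy can be applied in bulk is sketched below with Unreal’s editor Python API; the folder paths and size caps are illustrative assumptions, and clamping the Maximum Texture Size is only one lever alongside the format-level compression (DXT1/ASTC) mentioned above:

```python
import unreal

# Illustrative tiers -- actual per-asset decisions were made by priority.
TIERS = {
    "/Game/Textures/Background": 512,   # low-priority clutter
    "/Game/Textures/Props":      1024,  # mid-priority set dressing
    "/Game/Textures/Characters": 2048,  # protagonists and close-up assets
}

def cap_texture_sizes(folder, max_size):
    """Clamp every Texture2D under `folder` to `max_size` by setting its
    Maximum Texture Size, leaving the 4K source data intact on disk."""
    for path in unreal.EditorAssetLibrary.list_assets(folder, recursive=True):
        asset = unreal.EditorAssetLibrary.load_asset(path)
        if isinstance(asset, unreal.Texture2D):
            asset.set_editor_property("max_texture_size", max_size)
            unreal.EditorAssetLibrary.save_loaded_asset(asset)

for folder, size in TIERS.items():
    cap_texture_sizes(folder, size)
```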

Lightmap optimization

While we initially planned to use baked lighting for performance reasons, certain constraints in our development timeline led us to adopt a hybrid approach: a mix of static and stationary lighting. We closely monitored lightmap density on a per-asset basis and manually adjusted UV channels to avoid common pitfalls like overlapping or seams. Although not fully baked yet, the lighting pipeline is prepped for further optimization in future iterations.

Audio Optimization

Sound, especially in a spatial VR experience, plays a crucial role, but it also adds to the performance load. We initially used 16-bit and 32-bit WAV files, but quickly realized that wasn’t sustainable for Quest-level deployment. We therefore optimized audio by compressing non-essential sound elements to 8-bit, while maintaining 16-bit quality for key audio elements like dialogue tracks from the protagonist and other narrative-critical moments. This tiered approach helped us balance quality and performance while ensuring emotional moments retained their impact.
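To make the arithmetic of that downconversion concrete, here is a small offline sketch in Python (using numpy and the standard wave module); the file names are hypothetical, and this is an illustration rather than the pipeline the project actually used:

```python
import wave
import numpy as np

def to_8bit(src_path, dst_path):
    """Downconvert a 16-bit PCM WAV to 8-bit unsigned PCM.
    16-bit samples are signed (-32768..32767); 8-bit WAV is unsigned (0..255),
    so the samples are shifted into positive range and rescaled."""
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        assert params.sampwidth == 2, "expected 16-bit source audio"
        frames = src.readframes(params.nframes)

    samples = np.frombuffer(frames, dtype=np.int16)
    reduced = ((samples.astype(np.int32) + 32768) >> 8).astype(np.uint8)

    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(params.nchannels)
        dst.setsampwidth(1)                  # 8-bit output
        dst.setframerate(params.framerate)
        dst.writeframes(reduced.tobytes())

# Hypothetical ambience cue; dialogue tracks stay at 16-bit.
to_8bit("amb_street_loop_16.wav", "amb_street_loop_8.wav")
```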

Platform-specific compromises (Quest)

The heaviest bottlenecks came from using MetaHumans and Megascans assets, both of which were too resource-intensive for VR. As a result, we undertook extensive asset reductions, optimizing every model for real-time rendering. Several features typically used in high-end rendering, such as Lumen, bloom, and complex post-process effects, were selectively disabled or limited. While this affected visual fidelity, these compromises were essential for maintaining frame stability on the Quest’s mobile GPU. The current PC VR build does reflect some visual compromises, a tradeoff we’ve accepted in favor of smoother performance.

Level Design & Interaction

Scene layout and navigation

The environment of Into the Manhole was designed as a single, large, continuous level that holds the main narrative space. Apart from this, a secondary level was created specifically for the home screen, a minimal environment used for launching the experience. Since the world size was relatively compact and self-contained, we did not opt for world partitioning in Unreal Engine. Instead, the entire narrative takes place within one carefully crafted space.

Despite the absence of direct interaction or teleportation features, we put considerable effort into guiding the viewer’s attention organically. Carefully placed audio and visual cues act as soft nudges to direct the user’s gaze and maintain narrative pacing. This was particularly important given that the experience is unguided and non-interactive; the user is free to move physically, but their understanding of the story is shaped by how their attention is subtly drawn to key moments. There are no physical barriers or explicit boundaries, and users are allowed to roam the space naturally, enhancing the sense of immersion without overwhelming them with freedom.

Gaze based interactions

Gaze-based interaction was used in Into the Manhole as a minimal yet effective way to transition the viewer from the home screen into the main narrative. The interaction begins when the user looks at the project’s animated logo, which triggers both a visual animation and accompanying sound to establish presence and set the tone. Following this, a “Start” button appears. The button is designed to activate on sustained gaze: when the user looks at it for a short duration, it plays a hover animation and initiates the loading of the main level, seamlessly beginning the VR narrative.

This form of interaction was intentionally kept subtle and symbolic. The aim was to avoid gamifying the experience while still offering a gentle nudge into the story. It respects the contemplative and observational nature of the film, ensuring that the user doesn’t feel like a player or protagonist but rather a silent witness to the unfolding story.
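For clarity, the dwell-based activation described above boils down to a small piece of logic: accumulate time while the head’s forward vector stays within a narrow cone around the button, reset when the gaze drifts away, and fire once a threshold is crossed. The sketch below illustrates this in engine-agnostic Python (the actual implementation lives inside Unreal Engine); the dwell time and cone angle are illustrative values:

```python
import math

DWELL_SECONDS = 2.0                    # how long the gaze must be held
CONE_HALF_ANGLE = math.radians(8.0)    # tolerance around the gaze ray

class GazeButton:
    """Minimal dwell-timer logic for a gaze-activated button."""

    def __init__(self, position):
        self.position = position
        self.dwell = 0.0
        self.activated = False

    def tick(self, head_pos, head_forward, dt):
        """Call once per frame; head_forward is assumed to be a unit vector.
        Returns "start" on the frame the button activates, else None."""
        to_button = tuple(b - h for b, h in zip(self.position, head_pos))
        length = math.sqrt(sum(c * c for c in to_button)) or 1e-6
        cos_angle = sum(f * t for f, t in zip(head_forward, to_button)) / length
        looking_at = math.acos(max(-1.0, min(1.0, cos_angle))) < CONE_HALF_ANGLE

        if looking_at and not self.activated:
            self.dwell += dt
            if self.dwell >= DWELL_SECONDS:
                self.activated = True
                return "start"         # e.g. play hover animation, load main level
        elif not looking_at:
            self.dwell = 0.0           # gaze broke -- start the dwell over
        return None
```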

Lighting

The lighting setup in Into the Manhole is a mix of static and stationary lights, carefully balanced to meet the performance requirements of VR while still maintaining visual authenticity. While baked lighting was initially considered, we eventually decided against it due to technical constraints and time limitations. Instead, we manually adjusted the lightmap density for each asset in the environment to ensure that the shadows and highlights looked consistent across the scene.

The lighting wasn’t used as a storytelling or mood-setting device in the traditional cinematic sense; our focus was more on realism and clarity. However, we did encounter a few technical issues, especially with shadows behaving inconsistently in certain parts of the environment. These were resolved through trial and error and by fine-tuning individual light components. While the lighting remains functional and atmospheric, it deliberately avoids drawing attention to itself, allowing the narrative and performances to stay in focus.

Post process volumes

In Into the Manhole, post-processing was implemented with restraint, keeping in mind the performance limitations of standalone VR devices like the Meta Quest. A global post-process volume was used across the entire level to maintain consistency in tone and rendering quality. In addition to this, a few localized post-process volumes were placed in specific areas where we needed subtle control over visual elements, for instance slightly adjusting exposure or contrast based on spatial context. These were strictly bounded volumes, affecting only the immediate space around them.

We intentionally avoided heavy visual treatments like LUTs (Look-Up Tables), motion blur, or real-time global illumination effects like Lumen, which could negatively impact performance on lower-end devices. The post-process choices we made aimed to strike a balance between realism and efficiency, enhancing the visual experience without drawing focus away from the story or compromising frame rates.
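A minimal sketch of setting up such a bounded volume through Unreal’s editor Python API is shown below (via the legacy EditorLevelLibrary, still available in 5.2); the location, blend radius, and exposure values are illustrative assumptions rather than the project’s actual settings:

```python
import unreal

# Spawn a bounded post-process volume over a hypothetical alley section
# and give it a modest exposure tweak; all values here are illustrative.
location = unreal.Vector(1200.0, -300.0, 0.0)
ppv = unreal.EditorLevelLibrary.spawn_actor_from_class(
    unreal.PostProcessVolume, location)

ppv.set_editor_property("unbound", False)       # affect only its own bounds
ppv.set_editor_property("blend_radius", 150.0)  # soft falloff at the edges
ppv.set_editor_property("priority", 1.0)        # wins over the global volume

# Post-process settings are a struct: fetch a copy, edit it, write it back.
settings = ppv.get_editor_property("settings")
settings.set_editor_property("override_auto_exposure_bias", True)
settings.set_editor_property("auto_exposure_bias", -0.3)
ppv.set_editor_property("settings", settings)
```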

Packaging

Build Pipeline

Target Platform

From the beginning, our primary target platform for Into the Manhole has been Meta Quest 3. All development iterations, testing, and exhibitions have been conducted with this standalone VR device in mind to ensure optimal performance and user experience. However, the project can also be experienced on any XR device that supports PC VR.

Packaging constraints

Packaging the project presented significant challenges, especially with file size and performance. Initially, the executable file was around 15 GB, which was too large to run smoothly on most high-end systems and often crashed during execution. This made it impossible to submit the project to film festivals until substantial optimization was completed. Through extensive effort, we reduced the file size to approximately 2.5 GB while improving the frame rate from an average of 18–20 fps to around 60–70 fps. Our ongoing goal is to further reduce the file size to under 2 GB and achieve a stable 90+ fps, though this will require additional time and optimization.

Distribution

Currently, Into the Manhole has not been released through any public channels. Our plans include distributing the project via the Meta Store and SteamVR to ensure broader accessibility. Meanwhile, the VR film is being showcased at various film festivals and conferences such as Alpavirama, Visual Discourse, and TechFest (IIT Bombay), where it has already received recognition, including winning the Best XR Short Film award at the FICCI Best Animation Awards 2025.

People Involved

Full Team

Collaborators