Optimization and Manipulation of Contextual Mutual Spaces for Multi-User Virtual and Augmented Reality Interaction

Mohammad Keshavarzi, Woojin Ko, Allen Y. Yang, Luisa Caldas
University of California, Berkeley, USA
{mkeshavarzi, woojin_ko, allenyang, lcladas}@berkeley.edu

Spatial computing experiences are physically constrained by the geometry and semantics of the local user environment. This limitation becomes more pronounced in remote multi-user interaction scenarios, where finding a common virtual ground that is physically accessible to all participants is challenging: users themselves can hardly locate such a space, particularly when they are unaware of the spatial properties of the other participants' environments. In this paper, we introduce a framework that generates an optimal mutual virtual space for a multi-user interaction setting. The framework further recommends movements of surrounding furniture objects that expand the mutual space with minimal physical effort. Finally, we demonstrate the performance of our solution on real-world datasets and in a real HoloLens application. Results show that the proposed algorithm can effectively discover an optimal shareable space for multi-user virtual interaction and hence facilitate remote spatial computing communication in various collaborative workflows.

Spatial Computing, Augmented Reality, Virtual Reality, Tele-presence, Generative Design, Optimization

© 2020 ACM. ISBN 978-1-4503-6708-0/20/04. DOI: https://doi.org/10.1145/3313831.XXXXXXX




CCS Concepts: Human-centered computing → Collaborative and social computing; Virtual reality; Collaborative and social computing systems and tools; Mixed / augmented reality; Contextual design. Applied computing → Multi-criterion optimization and decision-making. Theory of computation → Evolutionary algorithms.


1 Introduction

The emerging fields of augmented reality (AR) and virtual reality (VR) have introduced a large number of exciting applications in tele-communication, immersive collaboration, and social media where multiple users can share a virtual environment. While much work has been done on 3D capturing methods, real-life avatar modeling, and virtual social platforms, one key challenge in AR/VR immersion is the scene understanding of the users’ surrounding spaces and the question of how to optimally utilize them for immersion tasks.

More specifically, acquiring an accessible 3D workspace is a prerequisite for a virtual or augmented immersion experience. Furthermore, the augmentation of virtual data in the physical space must be compatible with the contextual properties of that space, such as a floor that is standable, a chair that is sittable, and a wall that is a physical barrier to virtual interactions. For many 6-degrees-of-freedom (DOF) VR applications, the user is often asked to manually initiate a block of free space where the VR immersion can be assumed to be safe. Such a space is called a VR workspace and is typically assumed to be standable. Inferring the above contextual information for both AR and VR can be readily done using several well-established 3D modeling algorithms in computer vision. Current AR devices, such as the HoloLens or Magic Leap, integrate such algorithms to estimate the layout of the space, including floors, walls, and ceilings, and typical furniture objects such as tables and chairs. In this paper, we assume such contextual information about individual spaces is available via either a manual or algorithmic process.

However, in scenarios where an immersive experience involves multiple users, the understanding of spatial constraints is elevated to that of all involved users. Since different users may participate in the immersive experience from their own spaces, which can have very contrasting contextual properties, a consensus must be established to identify a mutual space that respects the spatial constraints of all the participants. Yet, having users manually identify such a mutual space would be imprecise and labor-intensive, especially since it is difficult for a user to be aware of the contextual properties of the other users' spaces. Without more effective and efficient solutions, the establishment of a contextual mutual space will be a bottleneck for multi-user immersion experiences.

Motivated by this challenge, we present in this paper a novel method to optimize contextual mutual space in a multi-user immersion setting. Our method relies on existing semantic scene maps to identify shareable functional spaces. For illustration purposes, we will use standable and sittable as the two functions to walk through our method, although the solution is also compatible with other contextual functions. The method formulates an optimization problem to seek the maximal mutual spaces. Additionally, if one can assume the users have the freedom to rearrange furniture objects on the floor, we introduce a more delicate optimization problem to further increase the mutual space’s size while considering the users’ effort to physically move the objects as another constraint.

In this paper, we propose a genetic algorithm approach to solve the two optimization problems above, although we believe other comparable algorithms for these NP-hard problems can be equally effective. The end result is a new solution capable of automatically recommending a contextual mutual space to multiple participants of virtual immersion experiences in AR/VR applications.

2 Related Work

Immersive AR/VR systems have been widely explored for remote tele-presence applications, providing real-time capture, transmission, and display between participants of the platform [16, 6, 30]. Using an array of cameras [56, 29, 54] or depth sensors monitoring the capture space [48, 22, 41, 39], holographic replicas or avatars of the virtual participants are projected into pre-defined local spaces. Such projections have been extensively developed using situated autostereo [43, 41], volumetric [21], lightfield [4], cylindrical [26], and holographic [9] displays. However, participants of such systems are mainly stationed in pre-defined spaces [61, 8, 38, 58] to avoid any geometrical conflicts with surrounding features in the projected space. Such an approach limits free-form motion of the participants within each other's locations, an important factor for achieving co-located presence.

The importance of free-form user movement and the ability to preserve mobility-based communication features such as walking, gestures, and head movement have been studied greatly in the context of co-located collaboration [5, 36, 25]. Another vital aspect of sharing mutual space is described in Clark’s work as grounding [14]. Grounding in communication (or common ground) is a concept that comprises the collection of "mutual knowledge, mutual beliefs, and mutual assumptions" that is essential for communication between two people. Successful grounding in communication requires parties to coordinate both the content and process [31]. As content in spatial computing can also involve the surrounding space itself, providing a common virtual ground can be critical to allow all communication features to be reflected correctly.

More recent examples have explored how tele-presence can be conducted with fewer spatial constraints, allowing fluid user motion at both ends of the communication. The works of [17] and [7] are examples of such systems, where users and their local interaction spaces are continuously captured using a cluster of registered depth and color cameras. However, these systems use stereoscopic projection, which limits the ability of remote and local users to access each other's space. Instead, the spaces are virtually disconnected and interaction occurs through a window from one space into the other. Meanwhile, the Holoportation system introduced by Orts-Escolano et al. allows bilateral tele-presence in which participants share a common virtual ground [46]. Their system renders the remote user into the local user's space as an avatar, while the local user likewise appears as an avatar in the remote user's space. Such an approach is also seen in [40], where the remote and local rooms do not share the same functional layout, but are calibrated to provide the required mutual virtual ground between users.

Figure 1: Abstract illustration of our proposed framework: (a) initial settings with different spatial restrictions; (b) semantic segmentation defining standable (yellow boundaries) and sittable (orange boundaries) areas; (c) search for mutual sittable space (this step can occur before, after, or simultaneously with object repositioning); (d) virtual arrangement of avatars with a deterministic line of sight for all participants.

While tele-presence systems via shared spaces present novel workflows for capturing and projecting virtual avatars, avoiding physical and virtual conflicts within the shared spaces is still an open challenge. In this regard, the work of Lehment et al. [32] may be the closest to this paper; it proposes an automated method to align remote environments so that they minimize discrepancies in room obstacles and physical barriers. However, that method is limited to two spaces and uses a brute-force search to calculate the consensus space between participants. Our method formulates rigorous optimization problems to search and manipulate a potentially unlimited number of spaces in order to find a mutual spatial boundary.

The practice of determining an optimal arrangement of discrete spatial elements is often referred to as floorplanning [15]. Automated floorplanning methodologies have been widely investigated in architectural space layout, construction [13, 55, 47], electronic design [44, 11, 19], and industrial operations research [2]. Floorplanning aims to achieve a defined functional goal by efficiently generating and evaluating possible spatial combinations while addressing the geometrical and topological constraints of the spatial elements [20]. In electronic physical design floorplanning, proposed methodologies mostly aim at optimizing chip area and wirelengths to reduce interconnections and improve timing [23]. In construction site layout and planning, optimizing the interaction between facilities, such as total inter-facility transportation costs and frequency of inter-facility trips, can also be implemented as objective functions [47]. In our proposed framework, we similarly integrate an objective function whose goal is to minimize the effort required to move surrounding furniture while maximizing the area of the mutual virtual ground among all participants.

In floorplanning, various representation methods of spatial arrangements are coupled with optimization engines to efficiently search through possible combinations of spatial elements. Floorplanning representations are generally divided into two main categories: slicing and non-slicing [57]. In slicing methodologies, the floor plan is recursively bisected until each part consists of a single module [59]. Non-slicing representations are utilized for more general use cases where no recursive bisection of a certain area takes place [18, 37, 34]. Multiple studies have integrated these representations with various optimization algorithms such as Simulated Annealing (SA) [27, 28, 59], Genetic Algorithms (GA) [51, 45, 33, 19, 60], and Particle Swarm Optimization (PSO) [53, 12, 24, 52, 42]. More recently, applying learning-based algorithms, hybrid neural networks [7] and annealed neural networks [8] have been used to identify optimal site layouts and solve construction site-level problems.

3 Methodology

Our solution consists of the following four steps: (i) semantic segmentation of surrounding environments; (ii) topological scene graph generation; (iii) mutual space identification; and (iv) optionally, manipulation of ground objects to further maximize the mutual space. In this section, we elaborate on the details of these four steps. To start, we define the terminologies and notations used in the paper.

Given a closed 3D room space in $\mathbb{R}^3$, one can project its enclosure, i.e., floors, ceilings, and walls, via an orthographic projection to form a 2D projection, commonly known as the floor plan of the space. If we assign the $(x, y)$ coordinates to the floor-plan plane and the $z$ coordinate perpendicular to it, simplifying our optimization problems onto the $(x, y)$-plane significantly reduces the complexity of our algorithms. It also implies the assumption that no two objects overlap on the $(x, y)$-plane at different $z$ values. Nevertheless, we believe such simplification is reasonable for analyzing the majority of room structures and thus does not compromise the generality of our analysis.

Hence, for each user $i$ we define their own room space, expressed as a 2D floor plan, as $R_i \subset \mathbb{R}^2$. Each $j$-th object (e.g., furniture) in $R_i$ is denoted as $O_{i,j}$. The collection of all objects in $R_i$ is denoted as $\mathcal{O}_i = \{O_{i,1}, O_{i,2}, \dots\}$. $\partial O_{i,j}$ represents the boundary of the object $O_{i,j}$. Similarly, $\partial R_i$ represents the boundary of the room $R_i$. Finally, we define the area function as $A(\cdot)$.

Figure 2: Comparison between available (a) standing-only and (b) standing-and-sitting areas in rooms.

3.1 Semantic Segmentation

Given the measurement of the surrounding physical environments as large sets of point cloud data, one can take advantage of the semantic segmentation methods widely investigated in the computer vision literature [49, 35, 3] to segment their spatial boundaries and obtain their geometric properties, such as dimensions, position and orientation, object classification, functional shapes, and weights. In doing so, we can convert the 3D point cloud data into labeled objects, each with a bounding box.

Additionally, in this paper we exclude lightweight objects (such as pillows, alarm clocks, laptops, etc.) positioned on larger furniture. This is to simplify our calculations in the next steps as we assume these lightweight objects can be easily moved by the users and do not need to be considered in the optimization criteria. Such classification is dependent on the output labeled object categories above.

In the experiment section below, since the implementation of a computer vision algorithm for semantic segmentation is not the main focus of this paper, we will directly integrate a modified version of MatterPort3D [10] object classifier in our system. This module can be replaced with any other robust semantic segmentation system, as long as it provides bounding box coordinates for each object category. In a companion MatterPort3D [10] dataset, out of 1,659 unique text labels, we classify 134 of the labels as lightweight objects and filter their corresponding bounding box from our workflow.
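As a minimal sketch of this filtering step, the snippet below drops lightweight objects by label before any further processing. The label names and dictionary-based object records are illustrative assumptions, not the actual MatterPort3D categories:

```python
# Hypothetical lightweight-object labels; the real pipeline classifies
# 134 of MatterPort3D's 1,659 labels this way.
LIGHTWEIGHT_LABELS = {"pillow", "alarm clock", "laptop", "book", "cup"}

def filter_objects(objects):
    """Drop objects whose label is classified as lightweight.

    `objects` is a list of dicts with at least a "label" key and a
    2D bounding box (xmin, ymin, xmax, ymax).
    """
    return [o for o in objects if o["label"] not in LIGHTWEIGHT_LABELS]

scene = [
    {"label": "sofa", "bbox": (0.0, 0.0, 2.0, 0.9)},
    {"label": "pillow", "bbox": (0.2, 0.1, 0.6, 0.5)},
    {"label": "table", "bbox": (3.0, 1.0, 4.2, 1.8)},
]
heavy = filter_objects(scene)
# heavy now contains only the sofa and the table
```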

Figure 2(a) illustrates the result of semantic segmentation of two room spaces projected onto the $(x, y)$-plane.

3.2 Topological Scene Graph

After identifying the bounding box, orientation, and category type of each object in the scene, a topological graph is readily generated that describes the relationships and constraints among the objects within the room. This step allows us to identify usable spatial functions, such as standing in virtual immersion, located between the objects. We categorize this type of function as a standalone spatial function, and its spaces are called standalone spaces.

A topological scene graph also allows us to identify other spatial functions on the objects themselves, such as sitting on a chair or working at a table. Note that functions such as sitting or working are also constrained by the distances between the object that performs the function and its adjacent objects. For example, a side of a table cannot be utilized for working purposes if that side is adjacent to other furniture or building elements (such as walls, doors, etc.). We categorize this type of function as an auxiliary spatial function, and its spaces are called auxiliary spaces.

In this paper, we will use two spatial functions, standable and sittable, as examples to demonstrate how to integrate both standalone spatial functions and auxiliary spatial functions in the optimization of contextual mutual spaces for multi-user interaction in AR/VR.

Finally, we emphasize that standalone spaces and auxiliary spaces are not mutually exclusive. For example, in this paper we classify a standable space as sittable as well. However, the reverse may not be true: a portion of a sittable space may involve part of a bed object, which we will not assume to be standable. Such contextual constraints can be highly customized based on the content of the AR/VR application, but the framework we introduce in this paper is general enough to accommodate other contextual interpretations of standalone and auxiliary spatial functions.

In our implementation, we use a doubly-linked data structure to construct the graph. For each side face of an object's bounding box, we determine the closest adjacent object to that face and calculate the distance between the object and the specified face. This information is stored at the object level, where topological distances and constraints are referenced using pointers.
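A minimal sketch of such a doubly-linked scene graph, assuming axis-aligned 2D bounding boxes and hypothetical side-face names ("x+", "x-", "y+", "y-"):

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

@dataclass
class SceneObject:
    """Node in the topological scene graph (illustrative sketch).

    `bbox` is an axis-aligned (xmin, ymin, xmax, ymax) footprint;
    `neighbors` maps each side face to a (closest adjacent object,
    clearance distance) pair, so both ends of each link can be
    traversed from either object.
    """
    label: str
    bbox: Tuple[float, float, float, float]
    neighbors: Dict[str, tuple] = field(default_factory=dict)

def link(a: SceneObject, b: SceneObject, side: str, dist: float):
    """Doubly link two objects across a side face of `a`."""
    opposite = {"x+": "x-", "x-": "x+", "y+": "y-", "y-": "y+"}[side]
    a.neighbors[side] = (b, dist)
    b.neighbors[opposite] = (a, dist)

table = SceneObject("table", (1.0, 1.0, 2.5, 2.0))
chair = SceneObject("chair", (2.9, 1.2, 3.4, 1.8))
link(table, chair, "x+", 0.4)  # chair sits 0.4 m to the table's +x side
```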

Mathematically, for each object $O_{i,j}$, we define the function $d^{x+}(O_{i,j})$ as the shortest distance between the points in $O_{i,j}$ that have the maximal $x$ value and the other objects, including $\partial R_i$. Similarly, we define the functions $d^{x-}(O_{i,j})$, $d^{y+}(O_{i,j})$, and $d^{y-}(O_{i,j})$.
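One such clearance can be sketched for axis-aligned rectangles as below; the face naming follows the text, while the overlap test and fallback to the room wall are simplifying assumptions of this illustration:

```python
def clearance_x_plus(obj, others, room):
    """Shortest horizontal gap between an object's +x face and
    whatever lies in front of it: another object whose y-extent
    overlaps, or else the room boundary. Rectangles are
    (xmin, ymin, xmax, ymax) tuples.
    """
    x_face = obj[2]
    gaps = [room[2] - x_face]  # fall back to the room wall
    for (x0, y0, x1, y1) in others:
        overlaps_y = not (y1 <= obj[1] or y0 >= obj[3])
        if overlaps_y and x0 >= x_face:
            gaps.append(x0 - x_face)
    return min(gaps)

room = (0, 0, 6, 4)
table = (1, 1, 3, 2)
shelf = (4, 0.5, 5, 3)
gap = clearance_x_plus(table, [shelf], room)
# the shelf front sits 1 m from the table's +x face
```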

3.3 Mutual Space Identification

In this step, we will identify the geometrical boundaries of available spaces in each room and then align the calculated boundaries of all rooms to achieve maximum consensus on mutual spaces.

First, using the geometrical and topological properties extracted in previous steps, we are ready to calculate available spaces in each room based on two categories, namely, the standalone spaces and auxiliary spaces. Specifically, we will formulate the calculation of the two most typical spatial functions as examples again, namely, standable and sittable.

3.3.1 Standable Spaces

Standing spaces consist of the volume of the room in which no object located within a human user’s height range is present. In such spaces, user movement can be performed freely without any risk of colliding with an object in the surrounding physical environment. Activities such as intense gaming or performative arts can be safely executed within these boundaries. Such spaces are also suitable for virtual reality experiences, where users may not be aware of the physical surroundings.

We calculate the available standing space $S^{stand}_i$ for room $R_i$ simply as follows:

$$S^{stand}_i = R_i \setminus \bigcup_{j} O_{i,j}.$$
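The standable-space computation can be approximated on a discretized floor plan. The grid-based sketch below assumes axis-aligned rectangular footprints and is an illustration, not the paper's exact boundary computation:

```python
def standable_area(room, objects, cell=0.05):
    """Grid-based approximation of the standable area (m^2).

    `room` and each object are axis-aligned rectangles
    (xmin, ymin, xmax, ymax); a cell counts as standable when its
    centre lies inside the room and inside no object footprint.
    """
    rx0, ry0, rx1, ry1 = room
    nx = round((rx1 - rx0) / cell)
    ny = round((ry1 - ry0) / cell)
    count = 0
    for i in range(nx):
        for j in range(ny):
            px = rx0 + (i + 0.5) * cell
            py = ry0 + (j + 0.5) * cell
            if not any(x0 <= px <= x1 and y0 <= py <= y1
                       for (x0, y0, x1, y1) in objects):
                count += 1
    return count * cell * cell

# a 4 m x 3 m room with a 2 m x 1 m table leaves ~10 m^2 standable
area = standable_area((0, 0, 4, 3), [(1, 1, 3, 2)])
```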
3.3.2 Sittable Spaces

The calculation of maximal sittable spaces is more involved than that of the standable spaces above. As mentioned before, sittable spaces normally extend the standable spaces by adding areas where humans are able to sit. Furniture types such as sofas, chairs, and beds include sitting areas that can extend the usable spaces of a room for social functions such as general meetings, design reviews, and conference calls.

To start, we define a sittable threshold $\epsilon_{i,j}$ to calculate the sittable area within the bounding box of the object $O_{i,j}$. In other words, $\epsilon_{i,j}$ is the maximum distance inward from an edge of the object's bounding box that can be comfortably sat on. We use measurements from [50] to define the $\epsilon_{i,j}$ of each furniture type. If an object is classified as non-sittable, then $\epsilon_{i,j} = 0$.

Therefore, we can first calculate the non-sittable area of an object as

$$\bar{S}_{i,j} = \{\, p \in O_{i,j} : B(p, \epsilon_{i,j}) \subseteq O_{i,j} \,\},$$

where $B(p, \epsilon_{i,j})$ is a sphere in $\mathbb{R}^2$ centered at $p$ and with radius $\epsilon_{i,j}$.

We note that sittable spaces do not necessarily comprise only objects to be sat on, but rather describe an area where a sittable object can be placed. For example, while an individual may not be able to comfortably sit on the top of a table, the foot space below the table can be considered sittable space. Therefore, in this context the sittable area of a room is always larger than its standable area.

Moreover, the sittable area of each object in the room is constrained by the topological positioning of the object. If any of the object's boundaries is adjacent to a non-sittable object (such as a wall, bookshelf, etc.) or does not have enough standable area between itself and a non-sittable object, the sittable area on that side of the face should be excluded. For instance, if a table is positioned in the center of a room with no other non-sittable object around it, the sittable area is calculated by applying the sittable threshold to all four sides of the table's boundaries. However, if the table is positioned in the corner of the room, no sittable area is accumulated for the sides that are adjacent to the walls.

To simplify our calculation, we define a surrounding boundary threshold $\delta_{i,j}$ for object $O_{i,j}$, which measures the distance outward from any of the object's boundary points that allows that point to remain part of the sittable space of the object. In other words, if a boundary point is within distance $\delta_{i,j}$ of other objects or the room boundary, then that point cannot be sat on. The set $E_{i,j}$ defined below collects all such points for exclusion from the sittable space in room $R_i$:

$$E_{i,j} = \Big\{\, p \in \partial O_{i,j} : B(p, \delta_{i,j}) \cap \big(\partial R_i \cup \textstyle\bigcup_{k \neq j} O_{i,k}\big) \neq \emptyset \,\Big\}.$$
Therefore, the sittable space of each object is simply defined as

$$S^{sit}_{i,j} = O_{i,j} \setminus \big(\bar{S}_{i,j} \cup E_{i,j}\big).$$
Finally, the total sittable space for the room is

$$S^{sit}_i = S^{stand}_i \cup \bigcup_{j} S^{sit}_{i,j}.$$
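As an illustrative sketch, a single object's sittable strip can be derived from its bounding box, the sittable threshold, and the per-side clearances. The parameter names are assumptions of this sketch, and corner overlaps between two adjacent free strips are ignored for brevity:

```python
def sittable_area(bbox, eps, clearances, delta=0.6):
    """Approximate sittable area of one object (m^2).

    `bbox` = (xmin, ymin, xmax, ymax); `eps` is the sittable
    threshold (strip width that can be sat on from each free side);
    `clearances` maps "x+", "x-", "y+", "y-" to the distance to the
    nearest non-sittable neighbour on that side. A side contributes
    its eps-wide strip only when its clearance is at least `delta`.
    """
    x0, y0, x1, y1 = bbox
    w, h = x1 - x0, y1 - y0
    area = 0.0
    for side, length in (("x+", h), ("x-", h), ("y+", w), ("y-", w)):
        if clearances.get(side, 0.0) >= delta:
            area += eps * length
    return area

# a 2 m x 1 m table in the middle of a room: all four sides free
mid = sittable_area((0, 0, 2, 1), eps=0.45,
                    clearances={"x+": 2, "x-": 2, "y+": 2, "y-": 2})
# the same table pushed into a corner: two sides blocked by walls
corner = sittable_area((0, 0, 2, 1), eps=0.45,
                       clearances={"x+": 2, "y+": 2, "x-": 0, "y-": 0})
# mid is roughly 2.7 m^2, corner roughly 1.35 m^2
```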
Figure 3 illustrates two example rooms and compares their standing and sitting areas.

Figure 3: Standable (green) and sittable spaces (yellow) for two sample scenes from the MatterPort3D dataset

3.3.3 Maximizing Mutual Spaces

Now we consider an immersive experience with $n$ subjects and therefore $n$ room spaces $R_1, \dots, R_n$, respectively. Then, in the $(x, y)$-coordinates, we define a rigid-body motion in $SE(2)$ as $T = (t, \theta)$, where $t \in \mathbb{R}^2$ describes a translation and $\theta$ a rotation.

If we want to maximize a mutual standable space, we can apply one rigid-body motion $T_i$ to each individual standable space $S^{stand}_i$ of the $i$-th user. The optimal rigid-body motions then maximize the area of the interaction space:

$$(T_1^*, \dots, T_n^*) = \arg\max_{T_1, \dots, T_n} A\Big(\bigcap_{i=1}^{n} T_i\big(S^{stand}_i\big)\Big). \quad (6)$$

Then the maximal mutual standable space can be calculated as

$$M^{stand} = \bigcap_{i=1}^{n} T_i^*\big(S^{stand}_i\big). \quad (7)$$
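This intersection objective can be illustrated with a brute-force search over translations of discretized free-space grids. Rotations, which the paper also optimizes, are omitted for brevity, and the grid representation is an assumption of this sketch:

```python
def mutual_cells(free_a, free_b, shifts):
    """Find the translation of room B's free-space grid that
    maximises overlap with room A's free-space grid.

    `free_a`/`free_b` are sets of (i, j) grid cells; `shifts` is the
    list of candidate (dx, dy) translations to try.
    """
    best, best_shift = set(), (0, 0)
    for dx, dy in shifts:
        moved = {(i + dx, j + dy) for (i, j) in free_b}
        overlap = free_a & moved
        if len(overlap) > len(best):
            best, best_shift = overlap, (dx, dy)
    return best, best_shift

a = {(i, j) for i in range(4) for j in range(4)}       # 4x4 free block
b = {(i + 10, j) for i in range(3) for j in range(3)}  # 3x3 block, offset
shifts = [(dx, 0) for dx in range(-12, 1)]
overlap, shift = mutual_cells(a, b, shifts)
# shifting b by (-10, 0) aligns the blocks: 9 mutual cells
```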
Similarly, one can calculate the maximal mutual sittable space by substituting into (7) the rigid-body motions that maximize the intersection area function in (6) for the sittable spaces.

3.4 Furniture movement optimization

In the event where individual spaces include movable furniture, additional optimization can be considered to potentially increase the maximal mutual spaces. Diverging from merely considering rigid-body motions to transform just the coordinate representation of the spaces, we consider moving furniture objects in space, which has an additional cost of human effort. Consequently, we will formulate this effort as part of our optimization objective.

More specifically, given a rigid-body motion $T = (t, \theta)$, we define $\|t\|$ as the Euclidean length of its translation vector. Then we define

$$f(T_{i,j}) = w_{i,j}\, \|t_{i,j}\|,$$

where $w_{i,j}$ is a given parameter that approximates the weight of object $O_{i,j}$. Note that such weight estimates can be looked up using architecture standards such as in [50]. Hence, if a room space $R_i$ has $m$ objects, then the total effort to re-arrange the space is

$$F_i = \sum_{j=1}^{m} f(T_{i,j}) = \sum_{j=1}^{m} w_{i,j}\, \|t_{i,j}\|,$$

where $\{T_{i,j}\}$ denotes the collection of rigid-body motion parameters.
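In code, this total effort is just a weighted sum of translation distances. A minimal sketch, with illustrative weights in kilograms:

```python
import math

def total_effort(moves):
    """Total re-arrangement effort for one room: the sum over objects
    of weight * Euclidean translation distance. Rotation in place is
    treated as free, matching the text's effort definition.

    `moves` is a list of (weight, (dx, dy)) pairs.
    """
    return sum(w * math.hypot(dx, dy) for w, (dx, dy) in moves)

# moving a 30 kg table 1 m and a 5 kg chair 2 m
effort = total_effort([(30.0, (1.0, 0.0)), (5.0, (0.0, 2.0))])
# effort == 40.0
```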

Since solving for the optimal object transformations is an NP-hard problem, in this paper we demonstrate a heuristic-based but practical algorithm that optimizes it in a step-by-step greedy fashion:

$$\{T_{i,j}\}^{(k)} = \arg\min_{\{T_{i,j}\}} \sum_{i} F_i \quad \text{s.t.} \quad A^{(k)} \geq (1 + \alpha)\, A^{(k-1)},$$

where $A^{(k)}$ indicates the area value of the mutual space at the $k$-th step with respect to the transformation coefficients $\{T_{i,j}\}$, and $\alpha$ is the per-step growth target. The iteration stops when the optimization cannot further increase the area of the mutual space.

Figure 4: Furniture optimization and manipulation. In each step, a 10% increase of the mutual space area ($A$) is determined, while minimizing the overall effort ($F$) needed for the required transformations ($T$).

4 Experiments

We created two sets of experiments to evaluate the performance of our workflow. First, to comprehensively observe how the search and recommendation system performs given various room types with different spatial organizations, we take advantage of available 3D datasets to experiment with large quantities of real-world case studies. We randomly sample subsets of varying sizes of 3D scanned scenes from the MatterPort3D dataset and perform the search and recommendation on each subset to observe how the mutual spaces are identified and maximized by our system. Second, we integrate our system into an augmented reality experience on the Microsoft HoloLens. This allows us to demonstrate how AR users can take advantage of our proposed system in a real-world scenario.

4.1 3D Scanned Datasets

Matterport3D [10] is a large-scale RGB-D dataset containing 90 building-scale scenes. The dataset consists of various building types with diverse architecture styles, each including numerous spatial functionalities and furniture layouts. Annotations of building elements and furniture are provided with surface reconstructions as well as 2D and 3D semantic segmentation. For our experiments, we initially exclude spaces that are not generally used for multi-user interaction (bathrooms, small corridors, stairs, closets, etc.). Furthermore, we randomly group the available rooms into groups of 2, 3, and 4. We utilize the object category labels (mpcat40) as the ground truth for our semantic labeling purposes.

We implement our framework using the Rhinoceros3D (R3D) platform and its development libraries. For each room, we convert the labeling data structure provided by the dataset to our proposed topological scene graph. This provides the system with bounding boxes for each object and the topological constraints for their potential rearrangement. Using such a structure, we are able to extract the standable and sittable spaces for each room based on our proposed methodology. Figure 3 illustrates the available standable and sittable boundaries for two sample rooms processed by our system. We define a constant sittable threshold for all sittable objects.

Next, we integrate our system with a robust Strength Pareto Evolutionary Algorithm 2 (SPEA 2) [62] available through the Octopus multi-objective optimization tool in R3D. The fitness function (6) is used to maximize the mutual space for the calculated standable spaces. Our genotype comprises the transformation parameters of each room, allowing free movement and orientation to achieve maximum spatial consensus; a set of translation and rotation genes is therefore allocated to each room in the search process. This process results in the shape, position, and orientation of the maximum mutual boundary of the assigned rooms. We use a population size of 100, mutation probability of 10%, mutation rate of 50%, and crossover rate of 80% for our search. As our system integrates a genetic search, we expect the solution to gradually converge toward the global optimum. Figure 4 shows how the mutual space boundary is progressively expanded as the generations of our search increase.

Expanding further, we extend our search by manipulating the scene with alternative furniture arrangements. As the objective is to achieve an increased mutual spatial boundary area with minimum effort, we calculate the total effort based on the transformation parameters assigned to each object present in the room. However, in our current implementation, the genetic algorithm integrated in our system is not capable of adapting dynamic genotype values, and therefore cannot update the topological distance values of each object during the search process. Hence, to avoid transformations that result in physical conflicts between manipulated furniture, we penalize phenotypes that contain intersecting furniture within the scene. This penalty is added to the effort value, lowering the probability that such phenotypes are selected or survive throughout the genetic generations.

Figure 5: Screenshots from HoloLens illustrating the identified mutual boundaries as augmented overlays for three rooms: A) kitchen; B) conference room; C) robotic laboratory. Blue color indicates mutual boundaries, green color indicates standable spaces and red color indicates non-standable spaces.

The optimization can either be (i) triggered in separate attempts for each step, where the mutual area value is constrained by that step's target, or (ii) executed in a single attempt where minimizing the total effort and maximizing the mutual area are both set as objective functions. In the latter, the solution for each step is defined as the one that holds the largest mutual area while satisfying the step's effort constraint. Executing the optimization as a one-time event is also likely to require additional computational cost due to the added complexity of the solution space.

Figure 4 illustrates our results for a furniture manipulation optimization task applied to three sample rooms. A total of 34 objects are located in the rooms. To shorten our gene length, we do not apply rotation transformations to objects. We use a population size of 250, mutation probability of 10%, mutation rate of 50%, and crossover rate of 80% for the scene manipulation search. We visualize the standable, sittable, and mutual boundaries for each spatial expansion step. Moreover, we report the corresponding effort for each room in the alternative furniture layout. Our results in this sample indicate that the system can identify solutions which increase the maximum mutual boundary area by up to 65% over its initial state before furniture movement.

4.2 Augmented Reality Visualization

To explore the usability aspect of our system in real-world scenarios, we deploy the resulting spatial segmentation in augmented reality using the Microsoft HoloLens, a mixed reality HMD. In this experiment, three types of rooms were defined as potential tele-communication spaces: (i) a conventional meeting room, where a large conference table is placed in the middle of the room and unused spaces are located around the table; (ii) a robotics laboratory, where working desks and equipment are mainly located around the perimeter of the room, while some larger equipment and a few tables are disorderly positioned around the central section of the lab; and (iii) a kitchen space, where surrounding appliances and cabinets are present in the scene.

After the initial scan of the surrounding environment by the user of each room, the geometric mesh data is sent to a central server for processing. This happens in an offline manner, as the current HoloLens hardware is incapable of performing the computations that our system requires. In addition, we scan the space using a MatterPort camera and perform the semantic segmentation step using MatterPort classifications to locate the bounding boxes of all the furniture in the room. We then feed the bounding box data to our system for the mutual boundary search. The system outputs spatial coordinates for standable and sittable areas, which are automatically updated in the Unity game engine to be rendered on the HoloLens devices.

Figure 5 shows how the spatial boundary properties are visualized within the HoloLens AR experience. The red spaces indicate non-standable objects, the green spaces indicate standable boundaries, and the blue spaces indicate mutual boundaries that are accessible to all users. The visualized boundaries are positioned slightly above floor level, allowing users to identify the mutually accessible ground between their local surroundings and the remote participants' spatial constraints.

5 Discussions

The optimization process was able to generate a well-defined Pareto front, as seen at the bottom of Figure 4, locating both of the two extreme points and numerous intermediate trade-off points representing non-dominated solutions. The bottom region of the curve is flat, indicating that for a similar amount of effort, a significant increase in mutual standable area can be achieved. The trade-off frontier starts at the minimum-effort extreme and is very densely populated along its initial soft slope. This shows that for each modest increase in physical effort (that is, in moving furniture) there can be extensive gains in mutual shareable area, which is an interesting result. Beyond this region, the Pareto front becomes increasingly steep, signaling that the user would have to significantly increase physical effort for only modest gains in shareable area; the knee of the curve thus marks a breaking point of diminishing returns.

With smaller furniture-optimization steps, the algorithm finds solutions that depend strongly on the transformation parameters of the room itself, whereas with larger steps we observe the algorithm correctly moving objects toward the more populated side of the room to increase the available empty space. In rooms where objects face the center and empty areas initially lie in the middle of the space, we see objects being pushed toward the corners or outer perimeter of the room to enlarge the initial unoccupied areas.

Due to the smaller gene size, calculating the optimal mutual space without furniture manipulation executes much faster than the full optimization, whose search complexity radically increases with the additional object-transformation parameters. The speed of the optimization is also highly dependent on the transformation range of each object: objects in larger rooms have more movement options to choose from than those in small, constrained rooms. We observe an example of this effect in the latter experiment, where the smaller space (the kitchen) dominates the search process, causing the final mutual outcome between the rooms to maintain a shape very similar to the open boundaries of the smaller space. While such an effect still yields a well-constrained problem for medium-sized rooms with multiple objects (such as the conference room), there are many possible ways of fitting the smaller space into larger, open rooms (such as the robotics laboratory), resulting in an under-constrained optimization problem.

Visualizing the mutual ground within the space itself using the HoloLens shows how complex the problem can be when tackled manually. Corner spaces that are not typically used as default social areas of a room may become the only available common ground for interaction with other rooms. The algorithm overcomes this spatial bias easily; individuals left to negotiate it on their own may not do so as easily or as quickly.

However, due to the limited field of view of the HoloLens, non-physical boundaries rendered at a low visual height are difficult to follow. This proved more challenging when walking close to the non-orthogonal edges of the mutual bounding area, where an individual could easily step outside the designated area. The shareable area also included a number of voids, which resulted in an inconsistent walking path inside the standable spaces. Moreover, the accuracy of the real-time mesh reconstruction in the HoloLens played a critical role in computing the rendering occlusions for the visualized boundaries. Because the visualization was placed close to the floor, with many objects resting on it, the reconstruction often failed to detect occluding objects, which misled users in judging whether a space was mutually accessible.

6 Conclusions

We introduce a novel optimization and manipulation framework to generate an optimal common virtual space for interactions that mostly involve standing and sitting. Our framework further recommends movement of surrounding furniture objects that can expand the size of the mutual space with minimal physical effort. We integrated our system with the Strength Pareto Evolutionary Algorithm for an efficient search and optimization process. The multi-criteria optimization process was able to generate a well-defined Pareto front of trade-offs between maximizing mutual space and minimizing physical effort. The Pareto front is more densely populated in some sections of the frontier than others, clearly identifying the best trade-off region and the onset of diminishing returns.
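The non-dominated filtering at the heart of this process can be illustrated with a minimal dominance check over the two objectives (minimize effort, maximize area). This is only a sketch of the Pareto filter, not the full SPEA2 machinery (fitness strength, density estimation, archive truncation) used in the paper, and the sample population is invented.

```python
def dominates(a, b):
    """a, b: (effort, area) pairs. a dominates b if it is no worse in
    both objectives (less-or-equal effort, greater-or-equal area) and
    strictly better in at least one."""
    return (a[0] <= b[0] and a[1] >= b[1]) and (a[0] < b[0] or a[1] > b[1])

def pareto_front(solutions):
    """Keep only solutions not dominated by any other candidate."""
    return [s for s in solutions
            if not any(dominates(o, s) for o in solutions if o != s)]

# Hypothetical candidate layouts as (effort, mutual area) pairs.
population = [(1, 5), (2, 8), (3, 8), (2, 4), (4, 9)]
front = pareto_front(population)
# (3, 8) is dominated by (2, 8); (2, 4) is dominated by (1, 5).
```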

Furthermore, we demonstrated how the output solutions can be visualized in a HoloLens application. Results show that the proposed framework can effectively discover optimal shareable space for multi-user virtual interaction and thus provides a better user experience than manually labeling shareable space, which would be a labor-intensive and imprecise workflow. In this context, if all participants stand within the calculated mutual spatial boundaries, the line of sight between all participants is deterministic. In addition, no remote participant will be positioned in a location that conflicts with any local user, and each will comply with the spatial constraints of all other participants.

There are, of course, limitations to this work. First, furniture with fixed positions is not automatically detected by our system. We believe such a feature can be integrated through further improvements in semantic segmentation methodologies, or the user can optionally specify whether an object is fixed. In addition, furniture weight is calculated based on standard assumptions. We envision that, as spatial computing practices mature, such metadata about the surrounding environment will be customizable by users themselves and can be loaded upon each mutual spatial search execution. Future work can include integrating robust floorplanning representations with the current search mechanism to minimize computational cost and complexity. Lastly, usability studies can be conducted on how to improve the visualization strategies so participants can experience the required tele-communication functionalities while preserving the mutual spatial ground.


References
  • [2] Sue Abdinnour-Helm and Scott W Hadley. 2000. Tabu search based heuristics for multi-floor facility layout. International Journal of Production Research 38, 2 (2000), 365–383.
  • [3] Iro Armeni, Ozan Sener, Amir R. Zamir, Helen Jiang, Ioannis Brilakis, Martin Fischer, and Silvio Savarese. 2016. 3D Semantic Parsing of Large-Scale Indoor Spaces. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2016), 1534–1543. DOI:http://dx.doi.org/10.1109/CVPR.2016.170 
  • [4] Tibor Balogh and Péter Tamás Kovács. 2010. Real-time 3D light field transmission. In Real-Time Image and Video Processing 2010, Vol. 7724. International Society for Optics and Photonics, 772406.
  • [5] E Bardram. 2005. Activity-based computing: support for mobility and collaboration in ubiquitous computing. Personal and Ubiquitous Computing 9, 5 (2005), 312–322.
  • [6] Stephan Beck, Andŕe Kunert, Alexander Kulik, and Bernd Froehlich. 2013a. Immersive group-to-group telepresence. IEEE Transactions on Visualization and Computer Graphics 19, 4 (2013), 616–625. DOI:http://dx.doi.org/10.1109/TVCG.2013.33 
  • [7] Stephan Beck, Andre Kunert, Alexander Kulik, and Bernd Froehlich. 2013b. Immersive group-to-group telepresence. IEEE Transactions on Visualization and Computer Graphics 19, 4 (2013), 616–625.
  • [8] Hrvoje Benko, Ricardo Jota, and Andrew Wilson. 2012. MirageTable: freehand interaction on a projected augmented reality tabletop. In Proceedings of the SIGCHI conference on human factors in computing systems. ACM, 199–208.
  • [9] P-A Blanche, A Bablumian, R Voorakaranam, C Christenson, W Lin, T Gu, D Flores, P Wang, W-Y Hsieh, M Kathaperumal, and others. 2010. Holographic three-dimensional telepresence using large-area photorefractive polymer. Nature 468, 7320 (2010), 80.
  • [10] Angel Chang, Angela Dai, Thomas Funkhouser, Maciej Halber, Matthias Niebner, Manolis Savva, Shuran Song, Andy Zeng, and Yinda Zhang. 2018. Matterport3D: Learning from RGB-D data in indoor environments. In Proceedings - 2017 International Conference on 3D Vision, 3DV 2017. 667–676. DOI:http://dx.doi.org/10.1109/3DV.2017.00081 
  • [11] Yun-Chih Chang, Yao-Wen Chang, Guang-Ming Wu, and Shu-Wei Wu. 2000. B*-trees: a new representation for non-slicing floorplans. In Proceedings 37th Design Automation Conference. 458–463. DOI:http://dx.doi.org/10.1109/DAC.2000.855354 
  • [12] Guolong Chen, Wenzhong Guo, Hongju Cheng, Xiang Fen, and Xiaotong Fang. 2008. VLSI floorplanning based on particle swarm optimization. In 2008 3rd International Conference on Intelligent System and Knowledge Engineering, Vol. 1. IEEE, 1020–1025.
  • [13] M Y Cheng. 1992. Automated Site Layout of Temporary Construction Facilities Using-Enhanced Geographic Information Systems (GIS). Ph. D. Disst., Depart. of Civil Engineering, University of Texas at Austin, Texas, USA (1992).
  • [14] Herbert H Clark, Susan E Brennan, and others. 1991. Grounding in communication. Perspectives on socially shared cognition 13, 1991 (1991), 127–149.
  • [15] Hans Eisenmann, Frank M Johannes, and Frank M Johannes. 1998. Generic global placement and floorplanning. In Proceedings of the 35th annual Design Automation Conference. ACM, 269–274.
  • [16] Henry Fuchs, Andrei State, and Jean Charles Bazin. 2014. Immersive 3D telepresence. Computer 47, 7 (2014), 46–52. DOI:http://dx.doi.org/10.1109/MC.2014.185 
  • [17] Markus Gross, Stephan Würmlin, Martin Naef, Edouard Lamboray, Christian Spagno, Andreas Kunz, Esther Koller-Meier, Tomas Svoboda, Luc Van Gool, and others. 2003. blue-c: a spatially immersive display and 3D video portal for telepresence. In ACM Transactions on Graphics (TOG), Vol. 22. ACM, 819–827.
  • [18] Pei-Ning Guo, Chung-Kuan Cheng, and Takeshi Yoshimura. 2003. An O-tree representation of non-slicing floorplan and its applications. In Proceedings 1999 Design Automation Conference (Cat. No. 99CH36361). IEEE, 268–273. DOI:http://dx.doi.org/10.1145/309847.309928 
  • [19] Bah-Hwee Gwee and Meng-Hiot Lim. 1999. A GA with heuristic-based decoder for IC floorplanning. INTEGRATION, the VLSI journal 28, 2 (1999), 157–172.
  • [20] Jun H. Jo and John S. Gero. 1998. Space layout planning using an evolutionary approach. Artificial Intelligence in Engineering 12, 3 (Jul 1998), 149–162. DOI:http://dx.doi.org/10.1016/S0954-1810(97)00037-X 
  • [21] Andrew Jones, Magnus Lang, Graham Fyffe, Xueming Yu, Jay Busch, Ian McDowall, Mark Bolas, and Paul Debevec. 2009. Achieving eye contact in a one-to-many 3D video teleconferencing system. In ACM Transactions on Graphics (TOG), Vol. 28. ACM, 64.
  • [22] Brett Jones, Rajinder Sodhi, Michael Murdock, Ravish Mehra, Hrvoje Benko, Andrew Wilson, Eyal Ofek, Blair MacIntyre, Nikunj Raghuvanshi, and Lior Shapira. 2014. RoomAlive: magical experiences enabled by scalable, adaptive projector-camera units. In Proceedings of the 27th annual ACM symposium on User interface software and technology. ACM, 637–644.
  • [23] Andrew B Kahng, Jens Lienig, Igor L Markov, and Jin Hu. 2011. VLSI physical design: from graph partitioning to timing closure. Springer Science & Business Media.
  • [24] Prabhjit Kaur. 2014. An enhanced algorithm for floorplan design using hybrid ant colony and particle swarm optimization. Int. J. Res. Appl. Sci. Eng. Technol 2 (2014), 473–477.
  • [25] Mohammad Keshavarzi, Michael Wu, Michael N Chin, Robert N Chin, and Allen Y Yang. 2019. Affordance Analysis of Virtual and Augmented Reality Mediated Communication. arXiv preprint arXiv:1904.04723 (2019).
  • [26] Kibum Kim, John Bolton, Audrey Girouard, Jeremy Cooperstock, and Roel Vertegaal. 2012. TeleHuman: effects of 3d perspective on gaze and pose estimation with a life-size cylindrical telepresence pod. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM, 2531–2540.
  • [27] Scott Kirkpatrick, C Daniel Gelatt, and Mario P Vecchi. 1983. Optimization by simulated annealing. science 220, 4598 (1983), 671–680.
  • [28] Koji Kiyota and Kunihiro Fujiyoshi. 2005. Simulated annealing search through general structure floor plans using sequence-pair. Electronics and Communications in Japan (Part III: Fundamental Electronic Science) 88, 6 (2005), 28–38.
  • [29] Gregorij Kurillo, Ruzena Bajcsy, Klara Nahrsted, and Oliver Kreylos. 2008. Immersive 3d environment for remote collaboration and training of physical activities. In 2008 IEEE Virtual Reality Conference. IEEE, 269–270.
  • [30] C Kuster, N Ranieri, Agustina, H Zimmer, J. C. Bazin, C Sun, T Popa, and M Gross. 2012. Towards next generation 3D teleconferencing systems. In 3DTV-Conference. IEEE, 1–4. DOI:http://dx.doi.org/10.1109/3DTV.2012.6365454 
  • [31] Benny P H Lee. 2001. Mutual knowledge, background knowledge and shared beliefs: Their roles in establishing common ground. Journal of Pragmatics 33, 1 (2001), 21–44. DOI:http://dx.doi.org/https://doi.org/10.1016/S0378-2166(99)00128-9 
  • [32] Nicolas H. Lehment, Daniel Merget, and Gerhard Rigoll. 2014. Creating automatically aligned consensus realities for AR videoconferencing. ISMAR 2014 - IEEE International Symposium on Mixed and Augmented Reality - Science and Technology 2014, Proceedings September (2014), 201–206.
  • [33] Chang-Tzu Lin, De-Sheng Chen, and Yi-Wen Wang. 2002. An efficient genetic algorithm for slicing floorplan area optimization. In 2002 IEEE International Symposium on Circuits and Systems. Proceedings (Cat. No. 02CH37353), Vol. 2. IEEE, II–II.
  • [34] Jai-Ming Lin and Yao-Wen Chang. 2005. TCG: A transitive closure graph-based representation for general floorplans. IEEE transactions on very large scale integration (VLSI) systems 13, 2 (2005), 288–292.
  • [35] Chen Liu, Jiaye Wu, and Yasutaka Furukawa. 2018. FloorNet: A Unified Framework for Floorplan Reconstruction from 3D Scans. (2018), 1–18. http://arxiv.org/abs/1804.00090
  • [36] Paul Luff and Christian Heath. 1998. Mobility in Collaboration.. In CSCW, Vol. 98. 305–314.
  • [37] Yuchun Ma, Sheqin Dong, Xianiong Hong, Yici Cai, Chung-Kuan Cheng, and Jun Gu. 2001. VLSI floorplanning with boundary constraints based on corner block list. In Proceedings of the 2001 Asia and South Pacific Design Automation Conference. ACM, 509–514.
  • [38] Andrew Maimone and Henry Fuchs. 2011. Encumbrance-free telepresence system with real-time 3D capture and display using commodity depth cameras. In 2011 10th IEEE International Symposium on Mixed and Augmented Reality. IEEE, 137–146.
  • [39] Andrew Maimone and Henry Fuchs. 2012. Real-time volumetric 3D capture of room-sized scenes for telepresence. In 2012 3DTV-Conference: The True Vision-Capture, Transmission and Display of 3D Video (3DTV-CON). IEEE, 1–4.
  • [40] Andrew Maimone, Xubo Yang, Nate Dierk, Andrei State, Mingsong Dou, and Henry Fuchs. 2013. General-purpose telepresence with head-worn optical see-through displays and projector-based lighting. In 2013 IEEE Virtual Reality (VR). IEEE, 23–26.
  • [41] Wojciech Matusik and Hanspeter Pfister. 2004. 3D TV: a scalable system for real-time acquisition, transmission, and autostereoscopic display of dynamic scenes. In ACM Transactions on Graphics (TOG), Vol. 23. ACM, 814–824.
  • [42] D Jackuline Moni and S Arumugam. 2009. VLSI Floorplanning based on Hybrid Particle Swarm Optimization. Karunya Journal of Research 1, 1 (2009), 111–121.
  • [43] Koki Nagano, Andrew Jones, Jing Liu, Jay Busch, Xueming Yu, Mark Bolas, and Paul Debevec. 2013. An autostereoscopic projector array optimized for 3D facial display. In ACM SIGGRAPH 2013 Emerging Technologies. ACM, 3.
  • [44] Shigetoshi Nakatake, Kunihiro Fujiyoshi, Hiroshi Murata, and Yoji Kajitani. 1997. Module placement on BSG-structure and IC layout applications. In Proceedings of the 1996 IEEE/ACM international conference on Computer-aided design. IEEE Computer Society, 484–491.
  • [45] Shingo Nakaya, Tetsushi Koide, and Si Wakabayashi. 2000. An adaptive genetic algorithm for VLSI floorplanning based on sequence-pair. In 2000 IEEE International Symposium on Circuits and Systems. Emerging Technologies for the 21st Century. Proceedings (IEEE Cat No. 00CH36353), Vol. 3. IEEE, 65–68.
  • [46] Sergio Orts-Escolano, Mingsong Dou, Vladimir Tankovich, Charles Loop, Qin Cai, Philip A. Chou, Sarah Mennicken, Julien Valentin, Vivek Pradeep, Shenlong Wang, Sing Bing Kang, Christoph Rhemann, Pushmeet Kohli, Yuliya Lutchyn, Cem Keskin, Shahram Izadi, Sean Fanello, Wayne Chang, Adarsh Kowdle, Yury Degtyarev, David Kim, Philip L. Davidson, and Sameh Khamis. 2017. Holoportation. 741–754. DOI:http://dx.doi.org/10.1145/2984511.2984517 
  • [47] Hesham M. Osman, Maged E. Georgy, and Moheeb E. Ibrahim. 2003. A hybrid CAD-based construction site layout planning system using genetic algorithms. Automation in Construction 12, 6 (2003), 749–764. DOI:http://dx.doi.org/10.1016/S0926-5805(03)00058-X 
  • [48] Tomislav Pejsa, Julian Kantor, Hrvoje Benko, Eyal Ofek, and Andrew Wilson. 2016. Room2Room: Enabling Life-Size Telepresence in a Projected Augmented Reality Environment. In Proceedings of the 19th ACM Conference on Computer-Supported Cooperative Work & Social Computing (CSCW ’16). ACM, New York, NY, USA, 1716–1725. DOI:http://dx.doi.org/10.1145/2818048.2819965 
  • [49] Charles R. Qi, Hao Su, Matthias Niessner, Angela Dai, Mengyuan Yan, and Leonidas J. Guibas. 2016. Volumetric and Multi-View CNNs for Object Classification on 3D Data. (2016). DOI:http://dx.doi.org/10.1109/CVPR.2016.609 
  • [50] Charles George Ramsey. 2007. Architectural graphic standards. John Wiley & Sons.
  • [51] Maurizio Rebaudengo and Matteo Sonza Reorda. 1996. GALLO: A genetic algorithm for floorplan area optimization. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 15, 8 (1996), 943–951.
  • [52] B Sowmya and MP Sunil. 2013. Minimization of floorplanning area and wire length interconnection using particle swarm optimization. International Journal of Emerging Technology and Advanced Engineering 3, 8 (2013).
  • [53] Tsung-Ying Sun, Sheng-Ta Hsieh, Hsiang-Min Wang, and Cheng-Wei Lin. 2006. Floorplanning based on particle swarm optimization. In IEEE Computer Society Annual Symposium on Emerging VLSI Technologies and Architectures (ISVLSI’06). IEEE, 5–pp.
  • [54] Tomohiro Tanikawa, Yasuhiro Suzuki, Koichi Hirota, and Michitaka Hirose. 2005. Real world video avatar: real-time and real-size transmission and presentation of human figure. In Proceedings of the 2005 international conference on Augmented tele-existence. ACM, 112–118.
  • [55] I. D. Tommelein, R. E. Levitt, B. Hayes-Roth, and T. Confrey. 1991. SightPlan experiments: alternate strategies for site layout design. Computing in Civil Engineering 5, 1 (1991), 42–63. DOI:http://dx.doi.org/10.1007/BF01927759 
  • [56] Herman Towles, Wei-Chao Chen, Ruigang Yang, Sang-Uok Kum, , Henry Fuchs, Nikhil Kelshikar, Jane Mulligan, Kostas Daniilidis, Loring Holden, Bob Zeleznik, Amela Sadagic, and Jaron Lanier. 2002. 3D Tele-Collaboration Over Internet2. In In: International Workshop on Immersive Telepresence, Juan Les Pins. Citeseer.
  • [57] Laung-Terng Wang, Yao-Wen Chang, and Kwang-Ting (Tim) Cheng (Eds.). 2009. Electronic Design Automation: Synthesis, Verification, and Test. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA.
  • [58] Wei-Chao Wen, Herman Towles, Lars Nyland, Greg Welch, and Henry Fuchs. 2000. Toward a Compelling Sensation of Telepresence: Demonstrating a portal to a distant (static) office. In Proceedings Visualization 2000. VIS 2000 (Cat. No. 00CH37145). IEEE, 327–333.
  • [59] D F Wong and C L Liu. 1986. A New Algorithm for Floorplan Design. In Proceedings of the 23rd ACM/IEEE Design Automation Conference (DAC ’86). IEEE Press, Piscataway, NJ, USA, 101–107. http://dl.acm.org/citation.cfm?id=318013.318030
  • [60] Wang Xiaogang and others. 2002. VLSI Floorplanning Method Based on Genetic Algorithms [J]. Microprocessors 1 (2002), 1.
  • [61] Cha Zhang, Qin Cai, Philip A Chou, Zhengyou Zhang, and Ricardo Martin-Brualla. 2013. Viewport: A distributed, immersive teleconferencing system with infrared dot pattern. IEEE MultiMedia 20, 1 (2013), 17–27.
  • [62] Eckart Zitzler, Marco Laumanns, and Lothar Thiele. 2001. SPEA2: Improving the strength Pareto evolutionary algorithm. TIK-report 103 (2001).