PHOSA — SOTA Neural Network That Extracts 3D Models of People and Objects From an Image
PHOSA is the new state of the art in reconstructing 3D models of humans and surrounding objects from a single 2D image
Researchers from Carnegie Mellon University, Facebook AI Research, Argo AI, and the University of California have developed a neural network model that generates 3D models of people and surrounding objects from a single 2D image. The model also takes into account the spatial relationships between the people and objects in the scene.
PHOSA — More about the model
PHOSA (Perceiving 3D Human-Object Spatial Arrangements) works without scene- or object-level annotations. The model extracts relationships between the person in the image and other objects and lifts them into 3D space. The researchers introduced constraints into the model's optimization process that resolve ambiguous configurations during 3D model generation. For this, the loss function combines several terms responsible for:
- Scale: the estimated size of each object;
- Silhouette: agreement between the projected 3D model and the object's 2D silhouette in the image;
- Interaction: the spatial relationship between the person and other objects in the image.
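To make the three terms concrete, here is a minimal sketch (not the authors' code; all function names, inputs, and weights are hypothetical placeholders) of how such loss terms could be defined and combined into a single objective:

```python
import numpy as np

def scale_loss(object_scale, prior_scale):
    # Penalize deviation of the object's estimated size from a
    # per-category prior size (squared log-scale difference).
    return (np.log(object_scale) - np.log(prior_scale)) ** 2

def silhouette_loss(rendered_mask, instance_mask):
    # Penalize mismatch between the projected 3D model's silhouette
    # and the 2D instance-segmentation mask (1 - IoU).
    inter = np.logical_and(rendered_mask, instance_mask).sum()
    union = np.logical_or(rendered_mask, instance_mask).sum()
    return 1.0 - inter / union

def interaction_loss(human_points, object_points):
    # Pull interacting human and object surfaces together by
    # penalizing the distance between their closest points.
    dists = np.linalg.norm(
        human_points[:, None, :] - object_points[None, :, :], axis=-1
    )
    return dists.min()

def total_loss(object_scale, prior_scale, rendered_mask, instance_mask,
               human_points, object_points,
               w_scale=1.0, w_sil=1.0, w_inter=1.0):
    # Weighted sum of the individual terms; the weights are illustrative.
    return (w_scale * scale_loss(object_scale, prior_scale)
            + w_sil * silhouette_loss(rendered_mask, instance_mask)
            + w_inter * interaction_loss(human_points, object_points))
```

In an optimization-based setup, an objective like this would be minimized over each object's scale and placement, so that the terms trade off against each other rather than being satisfied in isolation.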
The proposed framework combines a 3D human pose estimation model, an instance segmentation model, and a differentiable 3D renderer.
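At a high level, these components fit together in a staged flow: per-instance 3D estimates are produced first, then their spatial arrangement is refined jointly. The sketch below illustrates that flow with placeholder stubs (the function names and returned fields are assumptions for illustration, not the authors' API):

```python
def estimate_human_pose(image):
    # Placeholder for a 3D human pose/mesh estimation model.
    return {"kind": "human", "scale": 1.0, "translation": [0.0, 0.0, 2.0]}

def segment_and_fit_objects(image):
    # Placeholder for instance segmentation followed by fitting a
    # per-category 3D mesh to each detected object.
    return [{"kind": "bicycle", "scale": 1.0, "translation": [0.5, 0.0, 2.0]}]

def jointly_optimize(instances):
    # Placeholder for the joint refinement step, where a differentiable
    # renderer would let image-space losses update each instance's scale
    # and translation. Here the instances are returned unchanged.
    return instances

def reconstruct(image):
    # Staged flow: independent per-instance estimates, then joint refinement.
    instances = [estimate_human_pose(image)] + segment_and_fit_objects(image)
    return jointly_optimize(instances)
```

The key design point is that the differentiable renderer sits in the last stage: because rendering is differentiable, image-space losses can propagate gradients back to the 3D placement parameters.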
PHOSA model performance evaluation
The researchers assessed the model's results with qualitative and quantitative metrics. The framework was tested on the COCO-2017 dataset. PHOSA produces results comparable to the state of the art on images in which people interact with common objects.