PINNOCHIO: predicting the post-operative face in orthognathic surgery with a physics-informed network, as accurate as finite elements but in seconds (Lee et al. 2026, arXiv)
Jungwook Lee, Daeseung Kim, Kevin Gu, Zhangfeng Hu, Tianshu Kuang, Finn Hopeman, Michael A.K. Liebschner, Jaime Gateno and Pingkun Yan (Rensselaer Polytechnic Institute, Houston Methodist Research Institute and Baylor College of Medicine) posted PINNOCHIO on arXiv on 1 June 2026, ahead of the MICCAI 2026 conference. PINNOCHIO is a physics-informed neural network (PINN) that predicts, patient by patient, how facial soft tissue deforms after surgical repositioning of the jaws. The authors tested it on 40 real clinical cases, using pre-operative CT for the geometry and the post-operative 3dMD facial surface as ground truth. On surface fidelity the model beats the reference finite-element simulator: mean Chamfer distance of 1.12 mm vs 1.30 mm, and 86.55% of facial points within 2 mm of target vs 80.90%. It also runs in 3.24 seconds instead of 3.5 hours. That speed gain makes iterative trial of surgical plans genuinely practical. The result should nonetheless be read against four limits: a cohort of only 40 patients, supervision that measures the outer surface alone, mechanical parameters identical for every patient, and unreleased code.
The context
Orthognathic surgery corrects dentofacial deformities — misaligned jaws, a receding or protruding chin, asymmetries — by cutting and repositioning the facial bones. The aesthetic and functional outcome depends on how the soft tissues (muscle, fat, skin) follow the bone movement, a strongly non-linear relationship: moving the bone by one millimetre does not move the skin by one millimetre, and the effect depends on location. To plan, the surgeon would like to try several candidate bone movements and see, for each, the predicted face. That is exactly what a good soft-tissue simulation must provide.
Until now, two families of tools have competed. On one side is the finite element method (FEM). FEM divides the tissue into a mesh of small elements and solves the equations of mechanics on each one. It is biomechanically rigorous but slow: several hours per case, which rules out interactive trial in the clinic. On the other side are fast deep-learning models that often produce biomechanically inconsistent deformations, such as a self-intersecting face or volumes that fold over. PINNOCHIO targets the gap between them: keep the speed of a neural network without giving up physical consistency.
The method
The preprint (arXiv:2606.01572, DOI 10.48550/arXiv.2606.01572, posted on 1 June 2026 under the arXiv non-exclusive license) rests on two ideas. The first is a sequential decomposition that separates two phenomena of different natures. At the interface between bone and soft tissue, the displacement is discontinuous: tissue attached to a repositioned bone segment moves with that segment, while the rest does not. In the volume, by contrast, the deformation is continuous. PINNOCHIO first predicts the interface displacement (the "Boundary Displacement Prediction" module). It then uses that result as a boundary condition to propagate the deformation through the whole volume ("Physics-Constrained Volumetric Propagation"). Decoupling the two stabilizes training.
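The two-stage logic can be illustrated with a deliberately simple stand-in. This sketch is not PINNOCHIO's propagation module, which is a trained graph attention network with a physics loss. It assumes stage 1 has already produced displacements for the interface nodes. It then fills in the interior by harmonic interpolation on the mesh graph, solving a discrete Laplace equation with the interface displacements as Dirichlet boundary conditions. The helper `propagate_from_boundary` and the five-node chain are hypothetical.

```python
import numpy as np

def propagate_from_boundary(n_nodes, edges, boundary_idx, boundary_disp):
    """Hypothetical helper: harmonic interpolation of displacements on a mesh graph.

    Interface nodes keep their prescribed displacement (Dirichlet condition);
    each interior node ends up at the average of its neighbours, i.e. the
    discrete Laplace equation L_II u_I = -L_IB u_B is solved.
    """
    # Combinatorial graph Laplacian L = D - A (dense here; real meshes need sparse solvers).
    L = np.zeros((n_nodes, n_nodes))
    for i, j in edges:
        L[i, j] -= 1.0
        L[j, i] -= 1.0
        L[i, i] += 1.0
        L[j, j] += 1.0

    boundary_idx = np.asarray(boundary_idx)
    interior_idx = np.setdiff1d(np.arange(n_nodes), boundary_idx)

    u = np.zeros((n_nodes, boundary_disp.shape[1]))
    u[boundary_idx] = boundary_disp
    # Stage 2 stand-in: solve for interior displacements given the fixed interface values.
    L_II = L[np.ix_(interior_idx, interior_idx)]
    L_IB = L[np.ix_(interior_idx, boundary_idx)]
    u[interior_idx] = np.linalg.solve(L_II, -L_IB @ boundary_disp)
    return u

# Toy example: a chain of 5 nodes. Node 0 is attached to a bone segment moved
# 3 mm along x; node 4 is attached to bone that stays in place.
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
disp = propagate_from_boundary(
    5, edges, boundary_idx=[0, 4],
    boundary_disp=np.array([[3.0, 0.0, 0.0], [0.0, 0.0, 0.0]]),
)
print(disp[:, 0])  # x-displacements 3.0, 2.25, 1.5, 0.75, 0.0
```

Harmonic interpolation is the simplest propagator with a physical flavour. A hyperelastic model such as the Neo-Hookean law used in the preprint would respond differently to large or non-uniform bone movements, which is why the authors add an explicit mechanical loss.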
The second idea is physical anchoring. Soft tissue is modeled as a Neo-Hookean hyperelastic material, a classic constitutive law for biological tissue that describes how strain energy rises as the material is stretched or compressed. From this energy, internal forces are computed at every mesh node. A physics loss then penalizes configurations that are not in equilibrium, meaning a non-zero net force on a node. The network therefore does not merely learn to imitate examples: it is penalized whenever its prediction violates mechanics. Both modules rely on graph neural networks (GNNs, which treat a mesh as a graph of connected nodes) with attention (graph attention networks, GAT), an architecture well suited to irregular meshes.
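To make the physics loss concrete, here is a minimal NumPy sketch for linear tetrahedral elements. It assumes one common compressible Neo-Hookean form, Ψ(F) = μ/2 (tr(FᵀF) − 3) − μ ln J + λ/2 (ln J)². The first Piola–Kirchhoff stress of that energy is P = μ(F − F⁻ᵀ) + λ ln J F⁻ᵀ. The preprint's exact energy, element type and loss weighting are not reproduced here. Lamé parameters are derived from the muscle values the paper states (E = 6 kPa, ν = 0.49). The helper names `lame`, `nodal_forces` and `equilibrium_residual` are hypothetical.

```python
import numpy as np

def lame(E, nu):
    """Convert Young's modulus and Poisson's ratio to Lamé parameters (mu, lambda)."""
    mu = E / (2.0 * (1.0 + nu))
    lam = E * nu / ((1.0 + nu) * (1.0 - 2.0 * nu))
    return mu, lam

def nodal_forces(X, x, tets, mu, lam):
    """Internal elastic forces at each node of a tetrahedral mesh (hypothetical helper).

    X: (n, 3) rest positions, x: (n, 3) deformed positions, tets: (m, 4) node indices.
    Assumed energy: compressible Neo-Hookean, P = mu (F - F^-T) + lam ln(J) F^-T.
    J must stay positive: an inverted element makes ln(J) undefined.
    """
    f = np.zeros_like(x)
    for t in tets:
        Dm = (X[t[1:]] - X[t[0]]).T          # 3x3 rest edge matrix (edges as columns)
        Ds = (x[t[1:]] - x[t[0]]).T          # 3x3 deformed edge matrix
        Dm_inv = np.linalg.inv(Dm)
        F = Ds @ Dm_inv                      # deformation gradient
        J = np.linalg.det(F)
        F_invT = np.linalg.inv(F).T
        P = mu * (F - F_invT) + lam * np.log(J) * F_invT
        vol = abs(np.linalg.det(Dm)) / 6.0   # rest volume of the tetrahedron
        H = -vol * P @ Dm_inv.T              # columns: forces on nodes 1..3
        f[t[1:]] += H.T
        f[t[0]] -= H.sum(axis=1)             # node 0 balances the other three
    return f

def equilibrium_residual(f, free_mask):
    """Physics loss: mean squared net force on nodes without a prescribed displacement."""
    return float(np.mean(np.sum(f[free_mask] ** 2, axis=1)))

mu, lam = lame(6.0, 0.49)  # kPa: mu is about 2.01, lam about 98.7
X = np.array([[0, 0, 0], [10, 0, 0], [0, 10, 0], [0, 0, 10]], dtype=float)  # mm
tets = np.array([[0, 1, 2, 3]])
free = np.array([False, True, True, True])  # node 0 treated as bonded to bone

print(equilibrium_residual(nodal_forces(X, X, tets, mu, lam), free))  # undeformed mesh: zero residual
stretched = X * np.array([1.1, 1.0, 1.0])  # 10% stretch along x
print(equilibrium_residual(nodal_forces(X, stretched, tets, mu, lam), free) > 0)  # True
```

With ν = 0.49 the tissue is nearly incompressible, so λ/μ = 2ν/(1 − 2ν) = 49. Volume changes are therefore penalized far more heavily than shape changes. In a trained model, this residual would be computed on the predicted positions of the whole mesh and added to the data loss.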
An important methodological point: supervision covers only the outer surface. In a real patient, the post-operative position of each point inside the tissue is unknown. Only the facial surface, captured by 3dMD (a 3D photography system), is available. There is no point-to-point correspondence between predicted and real surfaces, so their agreement is measured by the Chamfer distance: the average distance from each point of one surface to the nearest point of the other, usually taken in both directions. The inside of the volume is therefore constrained only by physics, not by measurements. The authors first pre-train the model on FEM-simulated data, where a volumetric ground truth exists. They then fine-tune it on the real cases with surface supervision alone, a sim-to-real strategy. The cohort comprises 40 clinical cases evaluated with five-fold cross-validation. Each case includes the planned movement of four bone segments: LeFort I, the mandibular distal segment and the two proximal segments. The mechanical parameters are identical for all patients (muscle: Young's modulus 6 kPa; superficial layer: 4 kPa; Poisson's ratio 0.49).
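Chamfer distance comes in several variants in the literature: squared or unsquared distances, and the sum or the mean of the two directions. The preprint's exact variant is not restated here. Below is a minimal brute-force sketch of one common form, the mean of the two directional average nearest-neighbour distances. It applies to point sets sampled from the predicted and the 3dMD surfaces. Function names are hypothetical.

```python
import numpy as np

def directional_nn_distances(A, B):
    """For each point of A (n, 3), the Euclidean distance to its nearest point in B (m, 3)."""
    diff = A[:, None, :] - B[None, :, :]          # (n, m, 3) pairwise differences
    return np.sqrt((diff ** 2).sum(axis=-1)).min(axis=1)

def chamfer_distance(A, B):
    """Symmetric Chamfer distance: average of the A->B and B->A mean nearest-neighbour distances."""
    return 0.5 * (directional_nn_distances(A, B).mean() + directional_nn_distances(B, A).mean())

# No correspondence is needed: the two sets even have different sizes.
pred = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
target = np.array([[0.0, 0.0, 1.0], [1.0, 0.0, 1.0], [2.0, 0.0, 1.0]])
print(chamfer_distance(pred, target))  # about 1.069
```

The target has an extra point at x = 2 that the prediction does not cover, so the two directions differ (1.0 vs about 1.138). A one-directional measure would miss that uncovered region, which is why the symmetric form is generally preferred. For real facial scans with tens of thousands of points, the brute-force pairwise matrix would normally be replaced by a k-d tree query (for example SciPy's `cKDTree`).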
The results
PINNOCHIO is compared with three references: FEM-RLSE (the reference finite-element simulator) and two earlier deep-learning models, ACMT-Net (Fang et al. 2024) and DGCFP. Over the whole face it achieves the best fidelity on every metric:

- Chamfer distance of 1.12 ± 0.26 mm (vs 1.30 mm for FEM, and 1.71 mm and 2.19 mm for the two learning models).
- Hausdorff distance of 2.73 ± 0.69 mm (vs 3.16 mm for FEM). Hausdorff measures the worst gap rather than the average.
- Above all, 86.55% of points within 2 mm of the target surface, vs 80.90% for FEM.

On speed, the gap spans more than three orders of magnitude: 3.24 seconds per case vs 1.26 × 10⁴ seconds (3.5 hours) for FEM, roughly 3,900 times faster. An ablation study shows that both ingredients matter: removing either the decomposition or the physics loss degrades accuracy or mechanical validity. Without the physics constraint, the equilibrium residual rises from 0.20 to 1.73, and mesh quality measured by the Jacobian drops from 0.87 to 0.68.
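The gap between average and worst-case error is easiest to see on a toy example. The sketch below uses brute-force nearest neighbours and hypothetical helper names. It computes the symmetric Hausdorff distance and the share of predicted points within 2 mm. Two details are assumptions here: whether the preprint measures that share from prediction to target or the reverse, and whether it uses sampled points or the continuous surface.

```python
import numpy as np

def nn_distances(A, B):
    """Distance from each point of A to its nearest neighbour in B (brute force)."""
    diff = A[:, None, :] - B[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1)).min(axis=1)

def hausdorff_distance(A, B):
    """Symmetric Hausdorff distance: the worst nearest-neighbour gap in either direction."""
    return max(nn_distances(A, B).max(), nn_distances(B, A).max())

def fraction_within(A, B, tol_mm=2.0):
    """Share of predicted points A lying within tol_mm of the target points B."""
    return float((nn_distances(A, B) <= tol_mm).mean())

# Target points along x (mm); predictions offset in z by 0.5, 1.0, 1.5 and 3.0 mm.
target = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [2.0, 0.0, 0.0], [3.0, 0.0, 0.0]])
pred = np.array([[0.0, 0.0, 0.5], [1.0, 0.0, 1.0], [2.0, 0.0, 1.5], [3.0, 0.0, 3.0]])
print(nn_distances(pred, target).mean())  # 1.5
print(hausdorff_distance(pred, target))   # 3.0
print(fraction_within(pred, target))      # 0.75
```

The mean gap from prediction to target is 1.5 mm, yet one point sits 3 mm away. That single point sets the Hausdorff distance and pulls the within-2 mm share down to 75%.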
Clinical translation. The 2 mm threshold is not arbitrary: on a face, a gap of that order is roughly the limit of what the eye perceives. Yet if 86.55% of points fall below this threshold, about one facial point in seven remains more than 2 mm from target. Because the metric is averaged over the whole face, that residual error may concentrate, unnoticed, in expressive zones (lips, folds, nose tip). The decisive gain lies elsewhere: going from 3.5 hours to about 3 seconds per simulation transforms planning. Trying ten candidate surgical plans took about 35 hours of finite-element computation; it now takes about half a minute, making iterative optimization possible within a single planning session. This is pre-operative decision support, not an autonomous act.
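The figures quoted in this paragraph follow directly from the reported per-case numbers:

```latex
\begin{aligned}
1 - 0.8655 &= 0.1345 \approx \tfrac{1}{7.4} && \text{share of points beyond 2 mm}\\
10 \times 1.26\times10^{4}\ \text{s} &= 1.26\times10^{5}\ \text{s} = 35\ \text{h} && \text{ten plans with FEM}\\
10 \times 3.24\ \text{s} &= 32.4\ \text{s} && \text{ten plans with PINNOCHIO}\\
\frac{1.26\times10^{4}\ \text{s}}{3.24\ \text{s}} &\approx 3.9\times10^{3} && \text{speed-up per case}
\end{aligned}
```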
What works well
The physics is in the loss function, not just in the data. By explicitly penalizing configurations outside Neo-Hookean equilibrium, the model produces mechanically plausible deformations, and the ablation backs this with figures: without the physics constraint, the equilibrium residual is multiplied by nearly nine and mesh quality degrades. This is precisely what purely data-driven models lack: they can fit the surface while producing an aberrant volume.
The interface/volume decomposition addresses a real problem. Separating the discontinuous jump at the bone–tissue interface from the continuous deformation of the volume models the actual phenomenon faithfully, and it measurably improves learning. It is a methodological contribution that can be reused beyond the facial case, wherever a sharp boundary condition drives a volumetric deformation.
The evaluation uses the real post-operative surface, and PINNOCHIO beats FEM on it. The model is not merely compared with another simulation: the ground truth is the 3dMD surface actually observed after surgery. Beating the reference FEM on this criterion (86.55% vs 80.90% of points under 2 mm) while being thousands of times faster is a concrete result, not a promise. Comparing against two recent learning models, rather than an obsolete straw man, is also a fair choice.
What works less well
Forty patients, a single cohort: generalization remains open. A cohort of 40 cases is narrow, even with five-fold cross-validation, and the preprint does not specify the institution or country of origin of the data. This is where population bias can creep in. Nothing guarantees that the accuracy holds for other morphologies, other types of deformity, or other CT scanners and surface-capture systems. Without external multi-center validation, the 1.12 mm figure describes this cohort, not the population of orthognathic surgery candidates.
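Here is a patient-level five-fold split of 40 cases, to show how thin each evaluation round is. It is an illustrative sketch with a hypothetical helper `patient_kfold`. How the authors actually assigned patients to folds is not described here.

```python
import numpy as np

def patient_kfold(n_patients, k=5, seed=0):
    """Hypothetical helper: yield (train_ids, test_ids), holding each patient out exactly once."""
    rng = np.random.default_rng(seed)
    ids = rng.permutation(n_patients)
    folds = np.array_split(ids, k)
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, test

for train, test in patient_kfold(40):
    print(len(train), len(test))  # prints "32 8" for each of the five folds
```

Each fold trains on 32 patients and tests on 8, so the reported figures pool only 40 held-out faces in total. Splitting by patient rather than by point matters: points from the same face must never fall in both the training and the test set.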
Only the surface is measured; the interior is never verified. Supervision covers only the skin, and the inside of the volume is constrained by physics alone. Yet a model may reproduce the surface correctly while getting the deformation of the deep layers wrong. This is a variant of shortcut learning: the network learns what suffices to minimize the surface loss, not necessarily the true internal mechanics. Moreover, the mechanical parameters are identical for all patients, whereas tissue stiffness varies from one person to another; the authors acknowledge this and defer patient-specific estimation to future work. The FEM "ground truth" used in pre-training is itself a model, with its own approximations.
An averaged metric, and no released code. The distances are averaged over the whole face. This is the classic case of a misleading aggregate metric, where a good average can hide errors localized in clinically decisive regions. The preprint does not report accuracy region by region (lips, nose, chin). Finally, neither code nor weights are announced, and the text itself is distributed under the arXiv non-exclusive license rather than an open reuse license. As it stands, independent reproduction is not assured. Funding (NIH grants R01DE027251 and R01DE021863) and the absence of declared conflicts of interest are, for their part, properly stated.
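The averaging problem is easy to demonstrate. In the hypothetical example below, the region labels (`lips`, `nose`, `cheeks`) and per-point errors are invented for illustration, not taken from the preprint. The whole-face share of points within 2 mm looks acceptable while one region fails completely. A region-by-region report only needs one label per surface point.

```python
import numpy as np

def within_by_region(errors_mm, labels, tol_mm=2.0):
    """Share of points within tol_mm, for the whole face and for each labelled region."""
    errors_mm = np.asarray(errors_mm)
    labels = np.asarray(labels)
    report = {"whole face": float((errors_mm <= tol_mm).mean())}
    for region in np.unique(labels):
        mask = labels == region
        report[str(region)] = float((errors_mm[mask] <= tol_mm).mean())
    return report

# Hypothetical per-point errors (mm): 16 cheek points, 2 nose points, 2 lip points.
errors = [0.8] * 16 + [1.2, 1.5] + [2.6, 3.1]
labels = ["cheeks"] * 16 + ["nose"] * 2 + ["lips"] * 2
print(within_by_region(errors, labels))
# {'whole face': 0.9, 'cheeks': 1.0, 'lips': 0.0, 'nose': 1.0}
```

Because large, smooth regions such as the cheeks contribute most of the sampled points, they dominate any face-wide average. Small but clinically decisive regions, such as the lips, barely move it.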
What this changes
For the research community, the message reaches beyond facial surgery. PINNOCHIO illustrates a recipe that generalizes: inject a mechanical law into the loss function of a graph network, and decompose the problem by the nature of the displacements (discontinuous at the interface, continuous in the volume). The sim-to-real strategy could transfer to other tissue-deformation problems: pre-train on FEM simulations, where the volumetric ground truth exists, then fine-tune on real data, where only the surface is measured. The natural next steps are multi-center validation, patient-specific estimation of mechanical properties (the authors mention ultrasound) and extension to other procedures such as genioplasty.
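The sim-to-real recipe amounts to changing which data term sits next to the physics term between the two training phases. The following is a schematic sketch. The function names and the weight `w_phys` are hypothetical, and the preprint's actual weights and term definitions are not reproduced here. The physics residual is passed in as a precomputed scalar for simplicity.

```python
import numpy as np

def chamfer(A, B):
    """Symmetric Chamfer distance between point sets A (n, 3) and B (m, 3), brute force."""
    d = np.sqrt(((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1))
    return 0.5 * (d.min(axis=1).mean() + d.min(axis=0).mean())

def pretrain_loss(pred_disp, fem_disp, residual, w_phys=0.1):
    """Phase 1 (simulated cases): every node has an FEM displacement to match."""
    data = np.mean(np.sum((pred_disp - fem_disp) ** 2, axis=1))
    return data + w_phys * residual

def finetune_loss(pred_surface, scan_surface, residual, w_phys=0.1):
    """Phase 2 (real cases): only the 3dMD surface is known, so the data term is a Chamfer distance."""
    return chamfer(pred_surface, scan_surface) + w_phys * residual

# Phase 1: node-wise supervision; phase 2: correspondence-free surface supervision.
print(pretrain_loss(np.zeros((3, 3)), np.ones((3, 3)), residual=0.2))
print(finetune_loss(np.zeros((1, 3)), np.array([[0.0, 0.0, 2.0]]), residual=0.0))
```

The physics term is the constant across both phases. It is the only signal that constrains the interior once the volumetric FEM targets disappear during fine-tuning.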
For surgeons, the potential contribution is tangible. A simulation that runs in a few seconds would, in principle, allow several plans to be compared interactively during the consultation, whereas finite elements required overnight computation. But this is a research prototype: no CE marking, FDA clearance or opinion from France's Haute Autorité de Santé currently covers such a tool for guiding a surgical decision. And a residual error above 2 mm on part of the face is not trivial in facial aesthetic surgery.
For patients and the public, the value lies in a better pre-operative dialogue: being able to visualize the expected face faster and more faithfully helps set realistic expectations. Caution remains warranted, though. A prediction is not a guarantee of outcome, and the surgical decision remains the responsibility of the care team, which weighs many factors beyond tissue geometry.
Further reading
The preprint is openly available on arXiv: arxiv.org/abs/2606.01572 (DOI 10.48550/arXiv.2606.01572). On deep learning applied to medical imaging and the question of the comparator, see our analysis of Liu 2026 on a mixture-of-experts model for rectal MRI. On translating an imaging performance into clinical value, see our analysis of the Brzus 2026 prognostic neuroimaging pipeline after stroke.