How we compare: next-best-view methods, honestly tagged

a comparative survey · Wave 6 NBV literature · tagged by what's actually comparable to our interface

There is no shared public benchmark for next-best-view over BIM/MEP scenes: the NBV field spans object-centric turntables, RL drone policies, and NeRF-uncertainty planners, each with its own scene model, action space, and metric. A raw number-vs-number ranking would therefore compare apples to oranges. Below, every published method is tagged by how well it maps onto our exact interface: a partial point cloud + a candidate pose set → a per-pose score. The tags are:
comparable = same interface, with a reusable head or a runnable baseline
partial = same spirit, but a mismatched scene or action model
off-pipeline = different inputs (RGB/NeRF) or a different output (RL actions, not candidate scores)
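To make the interface concrete, here is a minimal sketch of the (partial cloud, candidate poses) → per-pose-score contract. All names (Pose, CandidateScorer, NearbyPointCountScorer, pick_next_best_view) are hypothetical and chosen for illustration; the toy scorer is only a placeholder that satisfies the contract, not our ranker and not a meaningful NBV heuristic.

```python
import math
from dataclasses import dataclass
from typing import Protocol, Sequence

Point = tuple[float, float, float]


@dataclass(frozen=True)
class Pose:
    """A candidate sensor pose (hypothetical minimal form)."""
    position: Point
    yaw: float = 0.0  # radians; ignored by the toy scorer below


class CandidateScorer(Protocol):
    """Anything tagged 'comparable' should fit this shape."""

    def score(
        self, partial_cloud: Sequence[Point], candidates: Sequence[Pose]
    ) -> list[float]:
        """Return one score per candidate, in candidate order."""
        ...


class NearbyPointCountScorer:
    """Placeholder scorer: counts scanned points within `radius` of each pose."""

    def __init__(self, radius: float = 2.0) -> None:
        self.radius = radius

    def score(
        self, partial_cloud: Sequence[Point], candidates: Sequence[Pose]
    ) -> list[float]:
        return [
            float(sum(1 for p in partial_cloud
                      if math.dist(p, c.position) <= self.radius))
            for c in candidates
        ]


def pick_next_best_view(
    scorer: CandidateScorer,
    partial_cloud: Sequence[Point],
    candidates: Sequence[Pose],
) -> int:
    """Index of the highest-scoring candidate (first one wins ties)."""
    scores = scorer.score(partial_cloud, candidates)
    if len(scores) != len(candidates):
        raise ValueError("scorer must return exactly one score per candidate")
    return max(range(len(scores)), key=scores.__getitem__)
```

Under this framing, a method that emits RL actions or needs RGB/NeRF inputs cannot implement score() without changing its inputs or outputs, which is what the off-pipeline tag is meant to capture.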

Where we sit. We do not claim SOTA on a public NBV benchmark, because none exists for this setting. What we contribute is (a) a clean BIM/MEP eval harness over 297 manifest scenes + IFC-Bench v2, (b) the OctoMap-IG baseline integration (Bircher et al., ICRA 2016), which our hybrid beats by +5.4σ / +5.0σ / +5.6σ on test_locked / held_out / OOD respectively, (c) a metric-pathology fix that is a caution for the whole subfield, and (d) the corpus > target finding: three target-engineering tricks regressed, while one data-diversity fix won.

The metric caution for the subfield. Our compute_mep_recall originally divided captured instances by the instances visible in the partial cloud, which uses the scanned cloud as its own ground truth. Observing a single instance scored 1.0, and oracle scores were inflated 5-50×. Re-anchoring to scene-full GT (scene.instance_class) dropped the oracle score on gni_model_173 from 1.0 to 0.0093. Any NBV result that scores recall/coverage against a partial reconstruction's own labels is exposed to the same inflation, which is worth checking before trusting cross-paper numbers.
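The pathology is easy to reproduce in a few lines. The sketch below is illustrative only: the function names are hypothetical, and the shape of the ground-truth mapping (instance id → class label) is an assumption about what scene.instance_class holds, not the actual compute_mep_recall code.

```python
from typing import Iterable, Mapping


def recall_against_partial(
    captured_ids: Iterable[int], partial_visible_ids: Iterable[int]
) -> float:
    """Pathological variant: the denominator is whatever the scan already saw."""
    visible = set(partial_visible_ids)
    if not visible:
        return 0.0
    return len(set(captured_ids) & visible) / len(visible)


def recall_against_scene(
    captured_ids: Iterable[int], scene_instance_class: Mapping[int, str]
) -> float:
    """Re-anchored variant: the denominator is every instance in the full scene.

    `scene_instance_class` is assumed to map instance id -> class label.
    """
    if not scene_instance_class:
        return 0.0
    hits = set(captured_ids) & set(scene_instance_class)
    return len(hits) / len(scene_instance_class)


# Hypothetical scene with 200 instances, of which the scan captured one.
scene_gt = {i: "pipe" for i in range(200)}
captured = {7}

inflated = recall_against_partial(captured, partial_visible_ids={7})
honest = recall_against_scene(captured, scene_gt)
```

Here inflated is 1.0 while honest is 1/200, the same kind of gap as the 1.0 vs 0.0093 drop reported above. The instance count of 200 is made up for the example and says nothing about gni_model_173.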

The NBV field: tagged by comparability to our (partial cloud, candidate poses) → per-pose-score interface

The one we reuse, and the one we beat. Our joint candidate-scoring head is PC-NBV-style (Zeng et al., IROS 2020): partial cloud in, per-candidate value out. That line of work is therefore genuinely comparable: it has the same interface, and we cite and reuse the design. The only method we run head-to-head on identical scenes is OctoMap-IG / nbvplanner (Bircher et al., ICRA 2016), integrated as our octomap_ig baseline; the learned-ranker-plus-lookahead hybrid beats it on every split. Everything else is either the right idea on the wrong scene model (NBV-Net's 32³ grid + fixed 14-view sphere; MA-SCVP's object-centric turntable) or a different problem entirely (NeU-NBV needs a per-scene NeRF; GenNBV samples RL actions, not candidate scores).

Corpus sources surveyed for expansion (Wave 6)

The OOD win came from more diverse buildings, not a cleverer loss, so Wave 6 surveyed open BIM/IFC corpora. IFC-Bench v2 was used (93 sub-scenes added); the rest are documented leads for further expansion.

IFC-Bench v2 · HuggingFace sylvainHellin/ifc-bench · CC-BY · USED (11 MEP IFCs → 93 sub-scenes)
IfcOpenShell/files
OpenIFC Auckland
BIMData R&D
buildingSMART Sample-Test-Files