Autonomous Driving Datasets 01: NAVSIM: Data-Driven Non-Reactive Autonomous Vehicle Simulation and Benchmarking

Daniel Dauner 1,2, Marcel Hallgarten 1,5, Tianyu Li 3, Xinshuo Weng 4, Zhiyu Huang 4,6, Zetong Yang 3, Hongyang Li 3, Igor Gilitschenski 7,8, Boris Ivanovic 4, Marco Pavone 4,9, Andreas Geiger 1,2, Kashyap Chitta 1,2

1 University of Tübingen, 2 Tübingen AI Center, 3 OpenDriveLab at Shanghai AI Lab, 4 NVIDIA Research, 5 Robert Bosch GmbH, 6 Nanyang Technological University, 7 University of Toronto, 8 Vector Institute, 9 Stanford University

Abstract

Benchmarking vision-based driving policies is challenging. On one hand, open-loop evaluation with real data is easy, but these results do not reflect closed-loop performance. On the other, closed-loop evaluation is possible in simulation, but is hard to scale due to its significant computational demands. Further, the simulators available today exhibit a large domain gap to real data. This has resulted in an inability to draw clear conclusions from the rapidly growing body of research on end-to-end autonomous driving. In this paper, we present NAVSIM, a middle ground between these evaluation paradigms, where we use large datasets in combination with a non-reactive simulator to enable large-scale real-world benchmarking. Specifically, we gather simulation-based metrics, such as progress and time to collision, by unrolling bird's eye view abstractions of the test scenes for a short simulation horizon. Our simulation is non-reactive, i.e., the evaluated policy and environment do not influence each other. As we demonstrate empirically, this decoupling allows open-loop metric computation while being better aligned with closed-loop evaluations than traditional displacement errors. NAVSIM enabled a new competition held at CVPR 2024, where 143 teams submitted 463 entries, resulting in several new insights. On a large set of challenging scenarios, we observe that simple methods with moderate compute requirements such as TransFuser can match recent large-scale end-to-end driving architectures such as UniAD. Our modular framework can potentially be extended with new datasets, data curation strategies, and metrics, and will be continually maintained to host future challenges. Our code is available at https://github.com/autonomousvision/navsim.

1 Introduction

Autonomous vehicles (AVs) have gained immense research interest due to their potential to change transportation and improve traffic safety [23, 9]. This has created a large community working on the development of AV algorithms, which map high-dimensional sensor data to desired vehicle control outputs. Therefore, measuring and comparing the performance of AV algorithms is a crucial task.

Unfortunately, it is extremely challenging to evaluate driving performance, and the most widely-used benchmarks today fall short in several respects: (1) The datasets used, such as nuScenes [5], were created for perception tasks such as object detection. As such, they focus on visual diversity and label quality instead of the relevance of the data for research on planning. Often, most frames have a trivial solution of extrapolating the historical driving behavior, leading to "blind" driving policies that observe only the vehicle's past trajectory obtaining state-of-the-art performance [56, 32, 16]. (2) Because driving is an inherently multifaceted task where the algorithm must coordinate several desired properties such as safety, comfort, and progress, the evaluation metric must also balance potentially conflicting objectives. However, as shown in Fig. 1, existing metrics such as the average displacement error (ADE) between a predicted and recorded human trajectory often misrepresent the relative accuracy of trajectories. (3) Since driving involves interactions among multiple agents, evaluation must ideally be interactive, e.g., in simulation. Unfortunately, existing simulators with synthetic sensor data exhibit a significant domain gap to real-world driving. (4) Besides, the lack of a standardized evaluation setup has led to subtle inconsistencies between metrics in existing work, leading to unfair comparisons and inaccurate conclusions [50, 32]. Collectively, these problems hinder progress in the development of AVs, emphasizing the need for more principled benchmarks.

Figure 1: NAVSIM. Traditional metrics such as the average displacement error (ADE) overlook the multi-modality of driving. They penalize trajectories that deviate from a recorded human driving log, even if such a trajectory is safe. Our benchmark evaluates trajectory outputs of sensor-based driving policies with simulation-based metrics, considering collisions and map compliance.
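The way ADE can misrank trajectories is easy to reproduce on a toy scene. The sketch below is illustrative only: the scene geometry, the static obstacle, the 2 m collision distance, and the helper names `average_displacement_error` and `collides` are all assumptions made for this example and are not taken from the NAVSIM code base.

```python
import numpy as np


def average_displacement_error(pred: np.ndarray, ref: np.ndarray) -> float:
    """Mean Euclidean distance between matching waypoints of two (T, 2) trajectories."""
    return float(np.linalg.norm(pred - ref, axis=1).mean())


def collides(traj: np.ndarray, obstacle_xy: np.ndarray, radius: float) -> bool:
    """True if any waypoint comes closer than `radius` to a static obstacle."""
    return bool((np.linalg.norm(traj - obstacle_xy, axis=1) < radius).any())


# Hypothetical scene: 8 waypoints at 2 Hz (4 s); the human log drives straight at 5 m/s.
t = np.arange(1, 9) * 0.5
human_log = np.stack([5.0 * t, np.zeros_like(t)], axis=1)
parked_car = np.array([22.5, 0.0])  # assumed static obstacle ahead in the ego lane
radius = 2.0                        # assumed collision distance between centers

# Candidate A: keeps the logged speed but moves to a free lane at y = 3.5 m.
lane_change = np.stack([5.0 * t, np.linspace(0.5, 3.5, len(t))], axis=1)
# Candidate B: stays in lane, drives 10% faster, and reaches the parked car.
too_fast = np.stack([5.5 * t, np.zeros_like(t)], axis=1)

for name, traj in [("lane_change", lane_change), ("too_fast", too_fast)]:
    ade = average_displacement_error(traj, human_log)
    print(f"{name}: ADE = {ade:.3f} m, collision = {collides(traj, parked_car, radius)}")
```

In this toy scene the collision-free lane change has the larger ADE (2.0 m versus 1.125 m), so a displacement metric alone would prefer the trajectory that hits the obstacle.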

In this work, we take steps towards alleviating these issues. First, we propose a strategy for sampling interesting driving scenarios and apply it to the largest publicly-available driving dataset [26]. We obtain, for the first time, over 100k challenging real-world driving scenarios for training and evaluating sensor-based driving policies. We show that in these scenarios, "blind" driving policies fail to compete with more principled sensor-based policies. Second, we draw inspiration from the literature of rule-based planning for AVs [41, 18, 39, 16] to identify a set of diverse, efficient, and principled metrics that cover multiple facets of the autonomous driving task. Third, we circumvent the need for inaccurate sensor simulation with domain gaps by simplifying our simulation to a non-reactive one. Given an observed real-world sensor input, the agent under test commits to a set of actions for a specific time horizon. Further, these actions are assumed to not affect the future behavior of other agents in the scene. Under this setting, it is possible to simulate the expected motion of all agents over this time horizon in a simplified bird's-eye-view (BEV) abstraction of the scene, and incorporate metrics that involve interactions, as we observe in Fig. 1. Empirically, we demonstrate that our selected metrics are well-correlated to the outcomes of closed-loop simulations. Finally, we establish an official evaluation server on the open-source HuggingFace platform, which is free, has a low maintenance overhead, and enables future scaling to more challenging datasets and metrics.

We combine these ideas to propose NAVSIM, a comprehensive tool for AV data curation, simulation, and benchmarking. We instantiate standardized training and evaluation splits for NAVSIM with the OpenScene dataset [15], though our framework can be extended to other datasets. With these splits, we present a detailed analysis of popular end-to-end driving models previously benchmarked either exclusively on CARLA [17] or nuScenes [5], providing the first direct comparison between these families of approaches in an independent evaluation setting. Interestingly, we find that the performances of the best methods developed in both settings are similar, despite a vast difference in computational requirements for their training. Finally, we review the insights gained through the 2024 NAVSIM challenge, hosted in conjunction with the CVPR 2024 Workshop on Foundation Models for Autonomous Systems. For the challenge, 143 teams from 13 countries developed diverse methods that competed on the proposed benchmark. The top methods ranged from multi-billion parameter vision language models [45, 33, 53, 58] to more efficient and recently overlooked approaches based on trajectory sampling and scoring [40, 20, 10], demonstrating the remarkable ability of the broader community to advance AV research when provided with the right tools.

Contributions. (1) We build NAVSIM, a framework for non-reactive AV simulation, with standardized protocols for training and testing, data curation tools ensuring broad accessibility, and an official public evaluation server used for the inaugural NAVSIM challenge. (2) We develop configurable simulation-based metrics that are well suited for evaluating sensor-based motion planning. (3) We reimplement a collection of end-to-end approaches for NAVSIM including TransFuser, UniAD, and PARA-Drive, showcasing the surprising potential of simple models in our challenging scenarios.

2 Related Work

End-to-End Driving. End-to-end driving streamlines the entire stack from perception to planning into a single optimizable network. This eliminates the need for manually designing intermediate representations. Following pioneering work [35, 4, 27], a diverse landscape of end-to-end models has emerged. For instance, an extensive body of end-to-end approaches focuses on closed-loop simulators, utilizing single-frame cameras, LiDAR point clouds, or a combination of both for expert imitation [7, 11, 36, 8, 52, 43, 44, 12, 24, 57, 22]. More recently, developing end-to-end models on open-loop benchmarks has gained traction [20, 21, 25, 54, 32, 50]. Our work introduces a new evaluation scheme with which we compare end-to-end models from both communities.

Closed-Loop Benchmarking with Simulation. Driving simulators allow us to evaluate autonomous systems in a closed-loop manner and collect downstream driving statistics, including collision rates, traffic-rule compliance, or comfort. A broad body of research conducts evaluations in simulators, such as CARLA [17] or MetaDrive [29] with sensor simulation, or nuPlan [26] and Waymax [19] for data-driven simulation. Unfortunately, ensuring realism when simulating traffic behavior or sensor data remains a challenging task. To simulate camera or LiDAR sensors, most established simulators rely on graphics-based rendering methods, leading to an inherent domain gap in terms of visual fidelity and sensor characteristics. Data-driven simulators for motion planning incorporate traffic recordings but do not support image or LiDAR-based methods [26, 19, 13]. Data-driven sensor simulation leverages and adapts real-world sensor data to create new simulations where the vehicle may move differently, but the rendering quality of existing tools is subpar [1, 2, 49]. Further, while promising image [47] or LiDAR [34] synthesis approaches exist, efficiently simulating sensors entirely from data remains an open problem. In this work, we provide an approach for the evaluation of real sensor data with simulation-based metrics by making a simplifying assumption that the agent and environment do not influence each other over a short simulation horizon. Despite this strong assumption, when benchmarking on real data, NAVSIM better reflects planning performance than established evaluation protocols, as demonstrated through our systematic experimental analysis.

Open-Loop Benchmarking with Displacement Errors. Open-loop evaluation protocols commonly measure displacement errors between trajectories of a recorded expert (i.e., of a human driver) and a motion planner. However, several issues concerning evaluation with displacement errors have surfaced recently, particularly on the nuScenes dataset [5]. Given that nuScenes does not provide standardized planning metrics, prior work relied on independent implementations, which led to inconsistencies when reporting or comparing results [50, 32]. Next, most planning models in nuScenes receive the human trajectory endpoint as a discrete direction command [20, 21, 25, 32, 50], thereby leaking ground-truth information into inputs. Moreover, about 75% of the scenarios in nuScenes involve trivial straight driving [32], leading to simple solutions when extrapolating the ego-motion. For instance, AD-MLP demonstrates that an MLP on the kinematic ego status (ignoring perception completely) can achieve state-of-the-art displacement errors [56]. Such blind agents are undeniably dangerous, which highlights a broader concern: displacement metrics are not correlated to closed-loop driving [14, 17, 3, 16]. In this work, we address prevalent issues of nuScenes and propose a standardized driving benchmark with challenging scenarios and an official evaluation server. We derive a navigation goal from the lane graph instead of the human trajectory to prevent label leakage, and propose principled simulation-based metrics as an alternative to displacement errors.
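The effect of trivial straight driving on displacement errors can be sketched with a "blind" constant-velocity extrapolation. This is not AD-MLP and not a NAVSIM baseline: the function names, the 2 Hz sampling, and both toy futures are assumptions chosen only to illustrate why ego-motion extrapolation scores well when most scenes are straight.

```python
import numpy as np


def extrapolate_constant_velocity(past_xy: np.ndarray, dt: float, horizon_s: float) -> np.ndarray:
    """Continue the last observed velocity; returns (N, 2) future waypoints."""
    velocity = (past_xy[-1] - past_xy[-2]) / dt
    steps = np.arange(1, int(round(horizon_s / dt)) + 1)
    return past_xy[-1] + steps[:, None] * dt * velocity


def ade(pred: np.ndarray, ref: np.ndarray) -> float:
    """Average displacement error between two (N, 2) trajectories."""
    return float(np.linalg.norm(pred - ref, axis=1).mean())


dt, horizon_s = 0.5, 4.0
past = np.array([[0.0, 0.0], [2.5, 0.0], [5.0, 0.0]])  # ego history at 5 m/s, no sensors used
blind_plan = extrapolate_constant_velocity(past, dt, horizon_s)

t = np.arange(1, 9) * dt
# Toy future 1: the human keeps driving straight at the same speed.
straight_future = np.stack([5.0 + 5.0 * t, np.zeros_like(t)], axis=1)
# Toy future 2: the human turns left on a 10 m radius arc at the same speed.
theta = 5.0 * t / 10.0
turn_future = np.stack([5.0 + 10.0 * np.sin(theta), 10.0 * (1.0 - np.cos(theta))], axis=1)

ade_straight = ade(blind_plan, straight_future)
ade_turn = ade(blind_plan, turn_future)
print(f"ADE straight: {ade_straight:.2f} m, ADE turn: {ade_turn:.2f} m")
```

The blind plan matches the straight future exactly and only fails on the turn, so a test set dominated by straight driving rewards it with a low average error.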

3 NAVSIM: Non-Reactive Autonomous Vehicle Simulation

NAVSIM combines the ease of use of open-loop benchmarks such as nuScenes [5] with metrics based on closed-loop simulators such as nuPlan [26]. In the following, we give a detailed introduction to the task and metrics that driving agents are challenged with in NAVSIM. Subsequently, we propose a filtering method to obtain standardized train and test splits covering challenging scenes.

Task description. Driving agents in NAVSIM must plan a trajectory, defined as a sequence of future poses, over a horizon of $h$ seconds. Their input contains streams of past frames from onboard sensors, such as cameras and LiDAR, as well as the vehicle's current speed, acceleration, and navigation goal, jointly termed the ego status. For compatibility with prior work [20, 21, 25, 50], we provide the navigation goal as a one-hot vector with three categories: left, straight, or right.
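The task interface described above can be sketched with a few plain data containers. Every name here (`NavigationGoal`, `EgoStatus`, `AgentInput`, `Trajectory`, `constant_speed_agent`) is hypothetical and chosen for illustration, as are the scalar speed and acceleration fields and the choice of 8 poses per trajectory; the actual NAVSIM devkit may structure these differently.

```python
from dataclasses import dataclass, field
from enum import Enum

import numpy as np


class NavigationGoal(Enum):
    LEFT = 0
    STRAIGHT = 1
    RIGHT = 2


def one_hot(goal: NavigationGoal) -> np.ndarray:
    """Encode the navigation goal as a one-hot vector over (left, straight, right)."""
    vec = np.zeros(len(NavigationGoal), dtype=np.float32)
    vec[goal.value] = 1.0
    return vec


@dataclass
class EgoStatus:
    speed: float                 # m/s (scalar here for simplicity, an assumption)
    acceleration: float          # m/s^2
    navigation_goal: np.ndarray  # one-hot vector of length 3


@dataclass
class AgentInput:
    ego_status: EgoStatus
    camera_frames: list = field(default_factory=list)  # past frames, newest last
    lidar_sweeps: list = field(default_factory=list)


@dataclass
class Trajectory:
    poses: np.ndarray      # (N, 3): x [m], y [m], heading [rad] in the ego frame
    horizon_s: float = 4.0


def constant_speed_agent(agent_input: AgentInput, num_poses: int = 8,
                         horizon_s: float = 4.0) -> Trajectory:
    """Toy agent that ignores the sensors and drives straight at the current speed."""
    times = np.linspace(horizon_s / num_poses, horizon_s, num_poses)
    poses = np.zeros((num_poses, 3))
    poses[:, 0] = agent_input.ego_status.speed * times
    return Trajectory(poses=poses, horizon_s=horizon_s)


status = EgoStatus(speed=5.0, acceleration=0.0,
                   navigation_goal=one_hot(NavigationGoal.STRAIGHT))
plan = constant_speed_agent(AgentInput(ego_status=status))
print(status.navigation_goal, plan.poses.shape)
```

Keeping the agent's output as a fixed-length array of poses is what lets a single query per scene be scored without any further interaction.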

Non-Reactive Simulation. Traditional closed-loop benchmarks normally infer planners at high frequencies, e.g., 10 Hz [17, 26]. However, this requires efficient simulation of all input modalities for the driving agent, including high-dimensional sensor streams in the case of sensor-based approaches. To sidestep this, the core idea of NAVSIM is to evaluate driving agents using a non-reactive simulation. This means driving agents are only queried in the initial frame of each scene. Afterwards, the planned trajectory is kept fixed for the entire trajectory duration. Over this short horizon, no environmental feedback is provided to the driving agent, and the NAVSIM evaluation is purely based on the initial real-world sensor sample. This makes the agent's task more challenging, limiting simulations to short horizons. We select a horizon of $h = 4$ seconds, which has been shown in prior work to be adequate for closed-loop planning [16]. Despite this limitation, non-reactive simulation offers a key advantage: unlike traditional open-loop benchmarks, which mainly compare the planned trajectory to the human driver's trajectory in a similar setting, it enables the use of simulation outcomes to compute metrics reflecting safety, comfort, and progress. An LQR controller [28] is applied at each simulation iteration to calculate steering and acceleration values, and a kinematic bicycle model [37] propagates the ego vehicle. We execute this pipeline at 10 Hz over the 4 s trajectory horizon. In Sec. 4.1, we show that despite our simplifying assumption, our evaluation results in a much better alignment with closed-loop metrics than traditional open-loop metrics achieve.
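The propagation step can be sketched as a kinematic bicycle model unrolled at 10 Hz over 4 s. This is a minimal illustration under stated assumptions, not the NAVSIM implementation: the LQR tracker is swapped for a placeholder `controller` callable that returns fixed commands, integration is plain forward Euler, and the 3.0 m wheelbase and all names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class EgoState:
    x: float        # m
    y: float        # m
    heading: float  # rad
    speed: float    # m/s


def bicycle_step(s: EgoState, accel: float, steer: float, dt: float, wheelbase: float) -> EgoState:
    """One forward-Euler step of the kinematic bicycle model (rear-axle reference)."""
    return EgoState(
        x=s.x + s.speed * math.cos(s.heading) * dt,
        y=s.y + s.speed * math.sin(s.heading) * dt,
        heading=s.heading + s.speed / wheelbase * math.tan(steer) * dt,
        speed=max(0.0, s.speed + accel * dt),  # no reversing in this sketch
    )


def rollout(initial: EgoState,
            controller: Callable[[EgoState, int], Tuple[float, float]],
            horizon_s: float = 4.0, rate_hz: float = 10.0,
            wheelbase: float = 3.0) -> List[EgoState]:
    """Unroll the ego vehicle; the controller returns (acceleration, steering angle)."""
    dt = 1.0 / rate_hz
    states = [initial]
    for step in range(int(round(horizon_s * rate_hz))):
        accel, steer = controller(states[-1], step)
        states.append(bicycle_step(states[-1], accel, steer, dt, wheelbase))
    return states


def gentle_left(state: EgoState, step: int) -> Tuple[float, float]:
    """Placeholder for a tracking controller: hold speed, steer slightly left."""
    return 0.0, 0.05


states = rollout(EgoState(0.0, 0.0, 0.0, 5.0), gentle_left)
straight = rollout(EgoState(0.0, 0.0, 0.0, 5.0), lambda s, k: (0.0, 0.0))
print(len(states), round(straight[-1].x, 3), round(states[-1].y, 3))
```

Because the ego vehicle is the only thing being propagated and other agents follow their recorded motion, the whole rollout is 40 cheap state updates per scene, with no sensor rendering involved.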

PDM Score. NAVSIM scores driving agents in two steps. First, subscores in the range $[0, 1]$ are computed after simulation. Second, these subscores are aggregated into the PDM Score (PDMS) $\in [0, 1]$. It is named after the Predictive Driver Model (PDM) [16], a state-of-the-art rule-based planner which uses this scoring function to evaluate trajectory proposals during closed-loop simulation in nuPlan. The metric is also an efficient reimplementation of the nuPlan closed-loop score metric [26]. In NAVSIM, the PDMS can be adapted by adding or removing subscores, changing aggregation parameters, or making subscores more challenging, e.g., by adapting their internal thresholds. It is calculated per frame and averaged across frames. In this work, we use the following aggregation of subscores:
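Going by the description of penalties and weighted terms that follows, the aggregation has the general shape below. This is a sketch of the structure only: the index sets $\mathcal{P}$ (penalty subscores) and $\mathcal{W}$ (weighted subscores) and the weights $w_m$ are notation introduced here, and no specific weight values are implied.

$$
\mathrm{PDMS} = \Bigg( \prod_{m \in \mathcal{P}} \mathrm{score}_m \Bigg) \times \frac{\sum_{m \in \mathcal{W}} w_m \, \mathrm{score}_m}{\sum_{m \in \mathcal{W}} w_m}
$$

Since every subscore lies in $[0, 1]$ and the weights are positive, both factors lie in $[0, 1]$, and so does their product.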

Subscores are categorized by their importance as penalties or terms in a weighted average. A penalty punishes inadmissible behavior such as collisions with a factor $< 1$. The weighted average aggregates subscores for other objectives such as progress and comfort. In the following, we briefly describe each subscore. More details can be found in the supplementary material.
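The two-step aggregation can be sketched as a product of penalty factors times a weighted average. This is a minimal illustration, assuming penalties multiply the weighted average; the function names, the subscore keys, and in particular the weight values are placeholders for this example and not the ones used by NAVSIM.

```python
from math import prod
from statistics import mean
from typing import Dict, List


def pdm_score(penalties: Dict[str, float], weighted: Dict[str, float],
              weights: Dict[str, float]) -> float:
    """Aggregate per-frame subscores in [0, 1] into a single score in [0, 1]."""
    for name, value in {**penalties, **weighted}.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"subscore {name!r} must lie in [0, 1], got {value}")
    penalty_factor = prod(penalties.values())  # any zero penalty zeroes the score
    total_weight = sum(weights[name] for name in weighted)
    weighted_average = sum(weights[name] * s for name, s in weighted.items()) / total_weight
    return penalty_factor * weighted_average


def average_over_frames(frame_scores: List[float]) -> float:
    """The score is computed per frame and then averaged across frames."""
    return mean(frame_scores)


# Placeholder weights, for illustration only.
example_weights = {"progress": 3.0, "comfort": 1.0}

clean = pdm_score({"NC": 1.0, "DAC": 1.0}, {"progress": 0.8, "comfort": 1.0}, example_weights)
crash = pdm_score({"NC": 0.0, "DAC": 1.0}, {"progress": 0.8, "comfort": 1.0}, example_weights)
print(clean, crash, average_over_frames([clean, crash]))
```

Multiplying by the penalties rather than averaging them in means that good progress or comfort cannot compensate for a collision or for leaving the drivable area.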

Penalties. Avoiding collisions and staying on the road is imperative for motion planning as it ensures traffic rule compliance and the safety of pedestrians and road users. Thus, failing to drive with no collisions (NC) with road users (vehicles, pedestrians, and bicycles) or infractions with regard to drivable area compliance (DAC) result in hard penalties of $\mathrm{score}_{\mathrm{NC}} = 0$
