Run-to-Run Control: Practical Guide to R2R in Semiconductor Manufacturing
Key Takeaway
Run-to-run (R2R) control automatically adjusts process recipe parameters between wafer runs to compensate for tool drift, reducing lot-to-lot variation by 40–60%. R2R controllers use EWMA filtering on metrology or VM feedback to update setpoints without engineer intervention. MST NeuroBox E3200S deploys R2R on any SECS/GEM tool in 2 weeks, closing the loop from VM prediction to recipe update automatically.
Introduction: Why Process Drift Is the Silent Yield Killer
Semiconductor manufacturing operates at tolerances measured in nanometers, yet the physical world refuses to hold still. Chamber walls accumulate polymer films run after run. RF match networks age and shift their impedance characteristics. Photoresist batches arrive from suppliers with subtle viscosity differences that alter spin uniformity. Ambient humidity changes between morning and afternoon shifts. Each of these disturbances, individually small, compounds into measurable drift that pushes critical dimensions, film thicknesses, and etch depths away from target.
Without a corrective mechanism, process engineers must rely on scheduled preventive maintenance intervals, manual recipe adjustments triggered by statistical process control (SPC) alarms, and engineering judgment accumulated over years of tool ownership. This reactive model is expensive, slow, and fundamentally limited by human bandwidth. A single process engineer may own dozens of tools across multiple process steps, making continuous manual oversight impossible during high-volume production.
Run-to-run (R2R) control solves this problem by closing a feedback loop between metrology or virtual metrology measurements and recipe setpoints, automatically compensating for drift before it reaches a magnitude that triggers an SPC alarm or causes a yield excursion. This guide covers the complete engineering picture: the mathematics of R2R controllers, tuning methodology, integration architecture, and the capabilities of the MST NeuroBox E3200S platform that brings production-ready R2R to any fab.
What Is Run-to-Run Control?
Run-to-run control is an advanced process control (APC) strategy that updates recipe parameters between discrete processing events—typically between wafer lots or individual wafers—based on measured process outputs from previous runs. Unlike real-time feedback control, which adjusts actuators during a single process event (for example, adjusting RF power while a wafer is being etched), R2R control operates on the inter-run timescale.
The core idea is straightforward. After processing lot N, a metrology tool measures the output—say, gate oxide thickness or CMP removal amount. The R2R controller computes the deviation from target, applies a filtering algorithm to separate real drift from measurement noise, and calculates a new recipe setpoint—for example, an adjusted deposition time or polish pressure—to be applied when lot N+1 enters the tool. The engineer sets the control objective and tuning parameters once during commissioning; the controller then runs autonomously, lot after lot, shift after shift.
This approach directly addresses the two dominant sources of process variation in high-volume manufacturing: systematic drift caused by tool aging and consumable degradation, and run-to-run noise caused by incoming material variation and environmental disturbances. R2R control cannot eliminate all sources of variation, but studies across CMP, CVD, lithography, and etch processes consistently show lot-to-lot Cpk improvements of 40 to 60 percent following R2R deployment.
Why Wafers Drift Without R2R Control
Understanding the physical mechanisms driving drift is essential for designing an effective R2R strategy. The most common drift sources in typical unit processes are:
- Chamber conditioning effects: In CVD and ALD processes, film deposits on chamber walls between wet cleans. This changes the effective surface area available for gas-phase reactions, altering deposition rate even when gas flows and temperature setpoints are unchanged. Plasma etch chambers accumulate polymer on quartz components, shifting etch rate and selectivity progressively across the seasoning cycle.
- Consumable wear: CMP polish pads lose surface texture with usage hours and number of wafers processed. As the pad glazes, its ability to transport slurry to the wafer-pad interface degrades, reducing removal rate. Conditioning operations partially restore the surface, but the restoration is never perfect, creating a sawtooth removal rate profile that requires continuous compensation.
- RF component aging: Match network capacitors and transmission line connectors drift in their impedance characteristics over months of operation. The delivered power to a plasma chamber at a given RF generator setpoint can vary by several percent across a maintenance cycle.
- Incoming material variation: Photoresist spin-on thickness depends on resist viscosity, which varies slightly between batches. Pre-diffusion oxide thickness entering an implant anneal step carries run-to-run variation from the upstream furnace. These upstream variations propagate to downstream metrology unless the receiving process actively compensates.
- Environmental disturbances: Clean room temperature and humidity affect resist bake conditions, develop rates, and spin uniformity. These disturbances have both slow seasonal components and faster diurnal cycles that require continuous correction.
EWMA Controller Mathematics
The exponentially weighted moving average (EWMA) filter is the most widely deployed R2R controller in semiconductor manufacturing because it provides a mathematically sound balance between disturbance rejection and noise sensitivity with a single tunable parameter.
The EWMA update equation for the estimated process disturbance d after run k is:
d(k) = λ · [y(k) − ŷ(k)] + (1 − λ) · d(k−1)
Here y(k) is the measured output at run k, ŷ(k) is the output that the nominal process model predicts for the setpoint u(k) actually applied at run k, and λ is the EWMA gain, a value between 0 and 1. The new recipe setpoint u(k+1) is then computed as:
u(k+1) = u_nominal − d(k) / G
Here G is the process gain, the sensitivity of the output to the manipulated recipe parameter, in units such as angstroms per second of deposition time or nanometers per watt of RF power, and u_nominal is the setpoint at which the nominal model predicts an on-target output.
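As a minimal sketch of the two update equations above, the following Python class assumes a linear nominal model in which the output equals the target when the recipe sits at u_nominal. The class name, method names, and all numbers in the usage lines are illustrative assumptions, not part of any vendor API.

```python
from dataclasses import dataclass


@dataclass
class EwmaController:
    """EWMA R2R controller; assumes y = target + G*(u - u_nominal) + disturbance."""
    target: float       # desired output, e.g. film thickness in angstroms
    gain: float         # G: output change per unit of recipe change
    u_nominal: float    # setpoint that hits target when there is no disturbance
    lam: float          # EWMA gain, 0 < lam <= 1
    d_hat: float = 0.0  # d(k): filtered disturbance estimate

    def predict(self, u: float) -> float:
        """Nominal-model prediction y_hat for setpoint u."""
        return self.target + self.gain * (u - self.u_nominal)

    def update(self, u_applied: float, y_measured: float) -> float:
        """Fold one measurement into d_hat and return the next setpoint u(k+1)."""
        residual = y_measured - self.predict(u_applied)
        self.d_hat = self.lam * residual + (1.0 - self.lam) * self.d_hat
        return self.u_nominal - self.d_hat / self.gain


def simulate_step(ctrl: EwmaController, true_offset: float, runs: int) -> list[float]:
    """Noise-free toy process with a constant unmodeled offset (hypothetical)."""
    u, outputs = ctrl.u_nominal, []
    for _ in range(runs):
        y = ctrl.predict(u) + true_offset  # what the tool actually produces
        outputs.append(y)
        u = ctrl.update(u, y)              # setpoint for the next run
    return outputs


ctrl = EwmaController(target=1000.0, gain=25.0, u_nominal=40.0, lam=0.4)
outputs = simulate_step(ctrl, true_offset=10.0, runs=30)
print(round(outputs[0], 3), round(outputs[1], 3), round(outputs[-1], 3))
```

In this noise-free toy case the deviation from target shrinks by a factor of (1 − λ) on every run, which is why a larger λ recovers from a step disturbance in fewer runs.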
The EWMA gain λ controls the controller’s memory. A value close to 1 makes the controller aggressive: it weights recent measurements heavily and responds quickly to step changes in disturbance, but it also amplifies measurement noise into recipe corrections. A value close to 0 makes the controller conservative: it responds slowly to real drift but is largely immune to measurement noise. The optimal choice depends on the ratio of the disturbance magnitude to the metrology noise, a tradeoff that can be formalized using minimum variance analysis.
For most CMP and CVD applications in production, λ values between 0.3 and 0.6 provide a good starting point. Etch processes with high inter-run variability often require λ values in the range of 0.5 to 0.7, while relatively stable furnace processes may use values as low as 0.2.
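The tradeoff behind these starting values can be made concrete with a small calculation. The sketch below assumes a disturbance made of a linear ramp (a fixed drift per run) plus white measurement noise, and assumes the process gain is known exactly. Under those assumptions an EWMA controller settles at a steady-state offset of drift/λ, while the output variance is 2σ²/(2 − λ). The function names and numbers are illustrative only.

```python
def steady_state_mse(lam: float, drift_per_run: float, noise_sigma: float) -> float:
    """Mean squared deviation from target for EWMA control of ramp drift plus white noise."""
    offset = drift_per_run / lam                      # lag behind a steady ramp
    variance = 2.0 * noise_sigma ** 2 / (2.0 - lam)   # noise passed into the output
    return offset ** 2 + variance


def best_lambda(candidates, drift_per_run, noise_sigma):
    """Pick the candidate gain with the smallest steady-state MSE."""
    return min(candidates,
               key=lambda lam: steady_state_mse(lam, drift_per_run, noise_sigma))


grid = [round(0.1 * i, 1) for i in range(1, 10)]  # candidate gains 0.1 .. 0.9
print(best_lambda(grid, drift_per_run=0.2, noise_sigma=3.0))  # slow drift, noisy metrology
print(best_lambda(grid, drift_per_run=2.0, noise_sigma=3.0))  # fast drift, same noise
```

With the same metrology noise, the slowly drifting case favors a small gain and the fast drifting case a large one, matching the qualitative guidance above. Real disturbances are rarely a clean ramp, so a calculation like this is a starting point for tuning rather than a substitute for it.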
R2R Control vs. SPC: Complementary, Not Competing
A common misconception is that R2R control replaces SPC monitoring. In reality, the two strategies operate at different timescales and serve different purposes within a complete process control architecture.
| Dimension | SPC | R2R Control |
|---|---|---|
| Primary purpose | Detect out-of-control signals and trigger investigation | Continuously compensate for known drift patterns |
| Action taken | Alarm and hold for engineer review | Automatic recipe update before next run |
| Responds to | Special cause variation (shifts, trends, outliers) | Common cause drift within the control envelope |
| Human involvement | Required for every alarm response | Minimal after initial commissioning |
| Effect on process capability | Does not actively reduce variation | Directly improves Cpk by reducing systematic offset |
In a well-designed APC architecture, R2R control operates continuously to compensate for predictable drift, while SPC charts monitor the residual variation after R2R correction. Because R2R removes the systematic drift component, the remaining SPC signal is predominantly noise and true special cause events—making SPC alarms both less frequent and more meaningful when they do occur.
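To illustrate this division of labor, the sketch below applies a plain Shewhart 3-sigma rule to the residuals left after R2R correction. The function name, the sigma value, and the sample residuals are assumptions for illustration; production SPC systems typically apply richer rule sets.

```python
def shewhart_violations(residuals, sigma, k=3.0):
    """Return indices of runs whose post-R2R residual lies outside +/- k*sigma."""
    limit = k * sigma
    return [i for i, r in enumerate(residuals) if abs(r) > limit]


# Deviation from target after R2R correction, in output units (made-up values).
residuals = [0.4, -1.1, 0.7, -0.2, 6.5, 0.3, -0.9]
print(shewhart_violations(residuals, sigma=1.5))  # only the 6.5 excursion exceeds 4.5
```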
Input Sources: Physical Metrology vs. Virtual Metrology
The quality and latency of the feedback signal fundamentally determine what an R2R controller can achieve. There are two primary input sources for R2R: physical metrology and virtual metrology (VM).
Physical Metrology Feedback
Physical metrology—ellipsometry for film thickness, CD-SEM for critical dimensions, four-point probe for sheet resistance—provides the highest accuracy measurements available. However, physical metrology has two significant limitations for R2R control. First, it is typically sampled: not every wafer or every lot receives a measurement, so the controller must operate on sparse, delayed feedback. Second, measurement latency can range from minutes to hours depending on the metrology tool queue, meaning the R2R correction is applied several runs after the drift that caused it.
Despite these limitations, physical metrology remains the gold standard for long-term R2R bias correction and for controller validation. Most mature R2R deployments combine physical metrology for periodic recalibration with virtual metrology for run-by-run updates.
Virtual Metrology Feedback
Virtual metrology predicts process outputs from in-situ sensor data—chamber pressure traces, RF power delivered, OES spectral signatures, temperature profiles—using machine learning models trained on historical run data. VM provides a predicted output value for every wafer, with latency measured in seconds rather than hours, enabling single-wafer R2R control without the cost and throughput impact of 100 percent physical metrology.
The tradeoff is prediction accuracy. VM models achieve mean absolute errors of 2 to 5 percent of the process window for well-characterized processes, which is sufficient to capture the major drift trends that R2R control targets. For critical applications where absolute accuracy matters most, VM predictions can be periodically recalibrated against physical metrology measurements through an automatic bias correction mechanism.
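One simple way to picture such a bias correction is an EWMA on the difference between physical and virtual measurements, updated only for wafers that receive both. The sketch below is an assumption about how this could be done, not a description of any specific product; the class name, weight, and values are hypothetical.

```python
class VmBiasCorrector:
    """Tracks the offset between VM predictions and physical metrology with an EWMA."""

    def __init__(self, weight: float = 0.3):
        self.weight = weight  # EWMA weight on the newest physical measurement
        self.bias = 0.0       # running estimate of (physical - VM)

    def correct(self, vm_prediction: float) -> float:
        """Bias-corrected VM value, available for every wafer."""
        return vm_prediction + self.bias

    def recalibrate(self, vm_prediction: float, physical_value: float) -> None:
        """Call only for wafers that were also measured on a physical tool."""
        error = physical_value - vm_prediction
        self.bias = self.weight * error + (1.0 - self.weight) * self.bias


corrector = VmBiasCorrector(weight=0.5)
for _ in range(20):  # VM reads 3 units low on every sampled wafer
    corrector.recalibrate(vm_prediction=997.0, physical_value=1000.0)
print(round(corrector.correct(997.0), 3))
```

The corrected value is what would be handed to the R2R controller on every wafer, while the sparse physical measurements only adjust the bias term.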
Controller Tuning: Gain, Dead Band, and Correction Limits
Deploying an R2R controller in production requires careful tuning of three parameters beyond the EWMA gain: the correction dead band, the per-run correction limit, and the cumulative correction limit.
Dead Band
The dead band defines a range around the current recipe setpoint within which no correction is applied. If the controller’s estimated disturbance would call for a recipe change smaller than the dead band threshold, the setpoint remains unchanged. Dead bands prevent the controller from making trivially small recipe adjustments in response to measurement noise, which would multiply recipe versions in the MES without providing meaningful process improvement. Typical dead band values are set at 1 to 2 times the metrology tool repeatability (1-sigma), converted into recipe units by dividing by the process gain.
Per-Run Correction Limit
The per-run correction limit caps the maximum recipe change that the controller can apply in a single step. This guard prevents an aggressive correction from overshooting the target in response to a large measurement outlier or a sudden large disturbance. Per-run limits are typically set at 30 to 50 percent of the total allowable recipe excursion window.
Cumulative Correction Limit
The cumulative correction limit defines the maximum total deviation of the recipe from its nominal value. When the controller reaches this limit, it generates an alert for engineering review rather than continuing to apply corrections. A recipe that has drifted to its cumulative limit almost always indicates a hardware maintenance event—pad replacement, chamber clean, RF component service—rather than a disturbance that further recipe adjustment can compensate. Treating cumulative limit alerts as maintenance triggers rather than alarms to be acknowledged and ignored is a key discipline in mature APC programs.
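The three guards can be sketched as one function applied to each proposed setpoint. This is a minimal illustration under stated assumptions: all limits are expressed in recipe units, the function name and argument names are hypothetical, and the choice to hold the recipe at the cumulative boundary while raising the alert is one possible policy.

```python
import math


def apply_guards(u_current, u_proposed, u_nominal,
                 dead_band, per_run_limit, cumulative_limit):
    """Return (next_setpoint, alert) after dead band, per-run and cumulative checks."""
    delta = u_proposed - u_current
    if abs(delta) < dead_band:
        return u_current, False                       # too small to act on
    if abs(delta) > per_run_limit:
        delta = math.copysign(per_run_limit, delta)   # cap the single-step move
    u_next = u_current + delta
    excursion = u_next - u_nominal
    if abs(excursion) > cumulative_limit:
        # Hold at the boundary and flag for engineering review.
        u_next = u_nominal + math.copysign(cumulative_limit, excursion)
        return u_next, True
    return u_next, False


print(apply_guards(40.0, 40.05, 40.0, 0.1, 0.5, 2.0))  # inside dead band: no change
print(apply_guards(40.0, 41.0, 40.0, 0.1, 0.5, 2.0))   # clipped by per-run limit
print(apply_guards(41.8, 42.5, 40.0, 0.1, 0.5, 2.0))   # held at cumulative limit, alert
```

Applying the guards in this order means a single outlier can move the recipe by at most the per-run limit, and the alert flag gives the maintenance trigger described above a concrete hook.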
Cascade R2R for Multi-Step Processes
Modern CMOS and 3D NAND processes involve long chains of interdependent unit operations where the output of one step becomes the input for the next. A lithography exposure step, for example, depends on the incoming resist thickness and pre-bake uniformity, which in turn depend on the resist spin step. The subsequent etch step then depends on both the patterned CD from lithography and the incoming film thickness from deposition.
Cascade R2R control addresses this complexity by propagating upstream metrology measurements forward as feedforward inputs to downstream controllers, in addition to the standard feedback correction from the step’s own output measurements.
In a litho-etch cascade, the etch R2R controller receives not only the post-etch CD measurement from the previous lot but also the pre-etch resist CD measurement from the current lot—enabling it to proactively correct for incoming lithography variation before the etch step is run. Studies on 28 nm gate CD control show that cascade R2R with litho feedforward achieves a further 15 to 25 percent reduction in final CD variation compared to standalone etch R2R with feedback only.
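A litho-etch cascade of this kind can be sketched as a controller with two entry points: a feedforward calculation that uses the current lot's pre-etch CD, and an EWMA feedback update that uses the post-etch CD once it is measured. The linear CD-loss model, the class name, and every number below are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class CascadeEtchController:
    """Feedforward on incoming resist CD plus EWMA feedback on etch bias (linear sketch)."""
    target_cd: float     # desired post-etch CD, nm
    bias_nominal: float  # CD loss (pre-etch minus post-etch) at u_nominal, nm
    gain: float          # change in CD loss per unit of recipe change, e.g. nm per second
    u_nominal: float     # nominal recipe value, e.g. trim time in seconds
    lam: float           # EWMA gain for the feedback term
    d_hat: float = 0.0   # filtered estimate of unmodeled etch-bias disturbance

    def setpoint(self, incoming_cd: float) -> float:
        """Feedforward: choose u so the predicted CD loss lands this lot on target."""
        required_bias = incoming_cd - self.target_cd
        return self.u_nominal + (required_bias - self.bias_nominal - self.d_hat) / self.gain

    def feedback(self, incoming_cd: float, u_applied: float, post_etch_cd: float) -> None:
        """Feedback: compare measured CD loss with the nominal model, filter the residual."""
        measured_bias = incoming_cd - post_etch_cd
        predicted_bias = self.bias_nominal + self.gain * (u_applied - self.u_nominal)
        residual = measured_bias - predicted_bias
        self.d_hat = self.lam * residual + (1.0 - self.lam) * self.d_hat


ctrl = CascadeEtchController(target_cd=40.0, bias_nominal=10.0, gain=0.5,
                             u_nominal=30.0, lam=0.4)
u = ctrl.setpoint(incoming_cd=52.0)  # this lot came in 2 nm wider than nominal
ctrl.feedback(incoming_cd=52.0, u_applied=u, post_etch_cd=39.0)  # etch took 1 nm extra
print(u, round(ctrl.d_hat, 3), round(ctrl.setpoint(incoming_cd=50.0), 3))
```

The feedforward term reacts to the current lot immediately, while the feedback term only learns about etch-side drift after the fact, which is why the two are combined rather than used as alternatives.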
Integration with MES and SECS/GEM
An R2R controller that cannot reliably deliver updated recipes to tools and receive metrology data from measurement systems is purely theoretical. Production deployment requires robust integration with the factory’s manufacturing execution system (MES) and the SECS/GEM communication layer that connects the host computer to process tools.
The standard integration architecture involves four components: the MES as the system of record for recipe versions and lot routing, the SECS/GEM interface for tool communication (recipe download and equipment data collection), a metrology data management system that receives measurement results and associates them with specific lots and tools, and the R2R controller itself, which subscribes to metrology events, computes new setpoints, and pushes updated recipes to the MES.
Latency in this chain directly affects R2R effectiveness. A controller that takes 30 minutes to receive a metrology result, compute a correction, and confirm recipe delivery to the tool may find that several lots have already been processed at the uncorrected setpoint before the correction takes effect. Minimizing integration latency—through direct MES API calls rather than batch file transfers, and through co-location of the controller with the MES rather than routing through multiple middleware layers—is as important as the controller algorithm itself.
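The cost of loop latency can be estimated with simple arithmetic. The sketch below assumes lots start on the tool at a fixed cadence, with the first one starting one interval after the measured lot finishes; the function name and the 8-minute cadence are hypothetical.

```python
import math


def lots_at_stale_setpoint(loop_latency_min: float, lot_interval_min: float) -> int:
    """Lots that start on the tool before a correction from the measured lot arrives.

    Assumes a fixed start cadence, with the first lot starting lot_interval_min
    after the measured lot finishes (a simplification).
    """
    return math.floor(loop_latency_min / lot_interval_min)


print(lots_at_stale_setpoint(30.0, 8.0))  # 30-minute loop, a lot every 8 minutes
print(lots_at_stale_setpoint(0.5, 8.0))   # 30-second loop, same cadence
```

Under these assumptions a 30-minute loop lets three lots run on the uncorrected setpoint, while a loop faster than the lot cadence lets none through.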
NeuroBox E3200S R2R Module
MST NeuroBox E3200S is a production-ready APC platform engineered specifically for the integration challenges of high-volume semiconductor manufacturing. The R2R module within E3200S combines EWMA and model-based controllers with an integrated virtual metrology engine, enabling closed-loop control from in-situ sensor data without requiring a dedicated metrology tool in the feedback loop for every wafer.
Key capabilities of the E3200S R2R module include:
- SECS/GEM native integration: E3200S connects directly to SECS/GEM-compliant tools via HSMS, subscribing to equipment status and data collection reports without requiring MES middleware for the data path. This reduces feedback latency from minutes to under 30 seconds for most tool configurations.
- VM-to-R2R automatic loop closure: The virtual metrology engine inside E3200S generates a predicted output value within 10 seconds of run completion. This prediction is automatically fed to the R2R controller, which computes and delivers the updated recipe to the tool before the next wafer or lot begins processing. No engineer action is required in the loop.
- Multi-controller cascade support: E3200S supports up to 16 controllers in a cascade chain, with automatic feedforward signal routing between upstream and downstream steps. The cascade configuration is defined in a graphical editor with no code required.
- Correction guard rails: Dead band, per-run limit, and cumulative limit parameters are enforced at the platform level with full audit logging. All recipe changes are recorded with the controller state, measurement inputs, and computed correction, enabling rapid root cause analysis when process excursions occur.
- MES recipe version management: E3200S integrates with major MES platforms including Camstar, WorkStream, and FactoryWorks to create properly versioned recipe instances for each correction, maintaining full traceability between recipe versions and the metrology data that drove each change.
Deployment timeline for E3200S R2R on a new process step is typically 10 to 14 days from tool connection to first autonomous correction cycle, including model training on historical run data, integration testing with the MES, and controller commissioning with production lots.
Real Results: What R2R Delivers in Production
Production R2R deployments using NeuroBox E3200S have delivered the following results across multiple customer fabs:
| Process | Controlled Parameter | Before R2R (3-sigma) | After R2R (3-sigma) | Cpk Improvement |
|---|---|---|---|---|
| CMP (STI) | Post-polish oxide thickness | ±18 Å | ±8 Å | +55% |
| PECVD (gate oxide) | Film thickness | ±12 Å | ±5 Å | +58% |
| Etch (polysilicon gate) | Post-etch CD | ±4.2 nm | ±2.1 nm | +50% |
| Implant anneal | Sheet resistance | ±3.8 Ω/sq | ±1.9 Ω/sq | +48% |
Beyond the direct process capability improvements, customers have reported a 30 to 40 percent reduction in unplanned maintenance events triggered by process drift, because the R2R cumulative correction limits provide an early warning signal for hardware degradation before it causes a yield excursion. The number of SPC alarms requiring engineer response typically falls by 50 to 70 percent in the first three months after deployment, because R2R absorbs the systematic drift component that would otherwise trigger control chart signals.
Getting Started with R2R on NeuroBox E3200S
Implementing R2R control on a new process step begins with three assessments: identifying the dominant drift mechanisms and their timescales, confirming the availability and latency of metrology or VM feedback, and mapping the SECS/GEM data items available from the target tool.
MST provides a structured APC readiness assessment that covers all three areas in a two-day on-site engagement. The assessment output is a deployment plan that specifies controller architecture, VM model training requirements, MES integration scope, and expected process capability improvement based on the fab’s historical metrology data.
For fabs that have not previously deployed APC, NeuroBox E3200S includes a built-in data collection mode that passively records all tool sensor and metrology data for 30 days before controller activation, building the historical dataset needed to train VM models and characterize process gain without requiring any changes to production operation during the observation period.
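As a rough illustration of how process gain might be characterized from such a historical dataset, the sketch below fits a least-squares line of output against recipe setpoint. It assumes the record contains runs at more than one setpoint and that the response is roughly linear over that range; the function name and the data are made up.

```python
def estimate_gain(setpoints, outputs):
    """Least-squares slope and intercept of output versus recipe setpoint."""
    n = len(setpoints)
    u_mean = sum(setpoints) / n
    y_mean = sum(outputs) / n
    s_uu = sum((u - u_mean) ** 2 for u in setpoints)
    if s_uu == 0.0:
        raise ValueError("setpoint never varied; gain cannot be estimated from this data")
    s_uy = sum((u - u_mean) * (y - y_mean) for u, y in zip(setpoints, outputs))
    slope = s_uy / s_uu
    return slope, y_mean - slope * u_mean


# Made-up history: deposition time in seconds vs. film thickness in angstroms.
times = [38.0, 39.0, 40.0, 41.0, 42.0]
thickness = [951.0, 974.0, 1001.0, 1024.0, 1050.0]
gain, intercept = estimate_gain(times, thickness)
print(round(gain, 2))  # estimated G in angstroms per second
```

The slope plays the role of G in the setpoint equation. If the recipe was never changed during the observation period, the fit is undefined, which the function reports explicitly instead of returning a misleading number.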
Conclusion
Run-to-run control is the foundational layer of advanced process control in any competitive semiconductor fab. By automatically compensating for tool drift, consumable aging, and incoming material variation between processing runs, R2R controllers deliver consistent improvements in process capability that no amount of manual engineering intervention can match at scale. The combination of EWMA feedback control, virtual metrology for dense run-by-run feedback, and cascade architecture for multi-step processes provides a complete solution for maintaining critical dimensions and film properties within tolerance across an entire technology node’s production life.
MST NeuroBox E3200S makes this capability accessible on any SECS/GEM-compliant tool with a two-week deployment timeline and a no-code configuration interface, removing the barriers that have historically limited R2R adoption to the largest and most sophisticated global fabs. Contact MST to schedule an APC readiness assessment and identify the highest-value R2R opportunities in your process flow.