MES and EAP System Architecture: The Nervous System of a Semiconductor Factory
Anyone who has worked on semiconductor fab automation knows that what makes a fab run is not how advanced any single tool is, but whether the entire system can coordinate hundreds of tools, thousands of lots, and tens of thousands of recipes into a coherent operation. Having spent years in this field, I have seen fabs evolve from manual to semi-automated to fully automated, and I have run into plenty of MES and EAP integration pitfalls along the way. Today I want to discuss the “nervous system” of a semiconductor factory: what MES and EAP actually do, how they relate to each other, and where AI should fit in.
1. The System Hierarchy: From ERP to Equipment Controllers
Let me start with the big picture. A semiconductor factory’s information systems are generally organized into four layers, from top to bottom: ERP, MES, EAP, and equipment controllers. Many people blur the boundaries between these layers, especially the one between MES and EAP.
The ERP layer sits at the top and handles enterprise-level concerns: orders, materials, and financials. In essence, ERP tells the factory “how many wafers to ship this month, what materials to use, and who the customer is” — but it has no interest in the current temperature of an AMAT CVD chamber.
The MES layer is the Manufacturing Execution System and serves as the fab’s central hub. It manages decisions like “which tool should this wafer go to next, what recipe should it run, and where should the results be recorded.” You can think of MES as the factory’s brain — it makes decisions, issues commands, and keeps the historical record.
The EAP layer is the Equipment Automation Program, serving as the translator between MES and the equipment. When MES says “dispatch Lot A to CVD-03 to run recipe XYZ,” the EAP translates this into SECS/GEM messages that the equipment controller can understand. When the tool finishes processing, it is the EAP that collects the data and feeds it back to MES and the database.
Equipment controllers are the lowest layer — the proprietary software systems built into each tool. Every equipment vendor has its own control software — TEL, Lam, AMAT — all different. The EAP’s job is to abstract away these differences and present a uniform interface to MES.
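The idea of hiding vendor differences behind one interface can be sketched in a few lines of Python. Everything here is hypothetical: the class names, the two made-up vendors, and the command strings are illustrative placeholders, not real SECS/GEM messages or any vendor's actual API.

```python
from abc import ABC, abstractmethod


class EquipmentDriver(ABC):
    """Uniform interface the MES-facing side of the EAP relies on (hypothetical)."""

    @abstractmethod
    def start(self, lot_id: str, recipe: str) -> str:
        """Return the vendor-specific command that would be sent to the tool."""


class VendorADriver(EquipmentDriver):
    # Hypothetical vendor that selects the recipe first, then starts the lot.
    def start(self, lot_id: str, recipe: str) -> str:
        return f"PP-SELECT {recipe}; START {lot_id}"


class VendorBDriver(EquipmentDriver):
    # Hypothetical vendor that expects a single combined command.
    def start(self, lot_id: str, recipe: str) -> str:
        return f"RUN lot={lot_id} ppid={recipe}"


def dispatch(driver: EquipmentDriver, lot_id: str, recipe: str) -> str:
    # The caller uses the same method regardless of which vendor built the tool.
    return driver.start(lot_id, recipe)


commands = [dispatch(d, "LOT-A", "XYZ") for d in (VendorADriver(), VendorBDriver())]
```

The caller never branches on vendor; adding a new tool type means adding one driver class, which is roughly where the tool-by-tool integration effort ends up living.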
I like to use an analogy for these four layers: ERP is the executive, MES is the production manager, EAP is the shift supervisor, and the equipment controller is the machine operator. The executive sets targets, the production manager creates the schedule, the shift supervisor oversees execution, and the operator does the work. Remove any layer and the whole operation breaks down.
2. Core MES Functions: WIP, Dispatching, Recipe Management, and Lot History
MES handles a great many things, but the four most critical are:
First: WIP tracking (Work-In-Process tracking). A 300mm fab typically has several thousand lots in process simultaneously, and MES must know the real-time location, status, and next destination of every lot. In one project I worked on, the fab had over 1,200 lots online at any given time. Query any lot ID in MES, and within seconds you would know which tool it was on, what its current step was, and how long it had been waiting. This capability sounds simple, but the underlying data model and real-time requirements are demanding.
Second: Dispatching. Which lot has priority? Which tool is idle? Which route will complete fastest? The MES dispatcher module must weigh priorities, equipment status, process constraints, cycle time targets, and more. In real fabs, dispatch rules can be extraordinarily complex. Sometimes the rule is first-come-first-served, sometimes urgent lots take priority, and sometimes the dispatcher has to reason about changeovers: “this tool just ran Product A, and switching to Product B requires qualification, so can we queue more Product A instead?” All of this logic lives in the MES dispatch engine.
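To make the interplay of these rules concrete, here is a minimal sketch of a ranking function that combines the three rules mentioned above. The field names and the strict ordering (urgency, then avoiding a changeover, then longest wait) are assumptions chosen for illustration; a production dispatcher weighs far more factors.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    lot_id: str
    product: str
    urgent: bool
    wait_minutes: float


def pick_next(candidates: list[Candidate], last_product: str) -> Candidate:
    """Pick the next lot for a tool using a simple, illustrative ranking."""

    def rank(c: Candidate) -> tuple[bool, bool, float]:
        # Tuples compare left to right: urgency first, then staying on the
        # product the tool just ran, then longest wait as a FIFO tiebreak.
        return (c.urgent, c.product == last_product, c.wait_minutes)

    return max(candidates, key=rank)


queue = [
    Candidate("LOT-1", "B", urgent=False, wait_minutes=90),
    Candidate("LOT-2", "A", urgent=False, wait_minutes=30),
    Candidate("LOT-3", "B", urgent=True, wait_minutes=5),
]
first = pick_next(queue, last_product="A")
second = pick_next([c for c in queue if c is not first], last_product="A")
```

In this sample queue the urgent lot goes first even though it forces a changeover, and the Product A lot then beats the longer-waiting Product B lot.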
Third: Recipe management. Every process step in semiconductor manufacturing is defined by a recipe: how many nanometers to etch, what film to deposit, and what the temperature, pressure, and gas flows should be. A mature fab may have tens of thousands of recipes. The mapping from each product and step to its recipe, recipe version control, and golden copy management of recipe parameters all fall under MES responsibility. You absolutely do not want someone manually changing a parameter on a tool without recording it, only to have an entire batch of wafers scrapped. I have seen it happen.
Fourth: Lot history. From the moment a lot enters the fab until it ships out, every tool it visited, every recipe it ran, every process parameter at each step, and every metrology result is recorded in the lot history. This is critical when problems arise. If a batch of wafers suddenly loses 5 percentage points of yield, you need the lot history to trace back step by step: which step went wrong? Which tool was used? Was that tool’s preventive maintenance (PM) status normal that day? Did other lots processed at the same time show similar issues? This traceability is the lifeline of semiconductor manufacturing quality control.
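The trace-back described above often starts with a commonality question: which step and tool did all of the affected lots pass through? A minimal sketch follows, assuming a flat list of history records; the record layout and the function name are hypothetical, and a real lot history would also carry recipes, parameters, and timestamps.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HistoryRecord:
    lot_id: str
    step: str
    tool: str


def shared_by_all(history: list[HistoryRecord], bad_lots: set[str]) -> list[tuple[str, str]]:
    """Return every (step, tool) pair that all low-yield lots passed through."""
    visited: dict[tuple[str, str], set[str]] = {}
    for rec in history:
        if rec.lot_id in bad_lots:
            visited.setdefault((rec.step, rec.tool), set()).add(rec.lot_id)
    return [key for key, lots in visited.items() if lots == bad_lots]


history = [
    HistoryRecord("LOT-1", "ETCH", "ETCH-01"),
    HistoryRecord("LOT-1", "CVD", "CVD-03"),
    HistoryRecord("LOT-2", "ETCH", "ETCH-02"),
    HistoryRecord("LOT-2", "CVD", "CVD-03"),
    HistoryRecord("LOT-3", "ETCH", "ETCH-01"),  # healthy lot, for contrast
    HistoryRecord("LOT-3", "CVD", "CVD-02"),
]
suspects = shared_by_all(history, {"LOT-1", "LOT-2"})
```

Here the two low-yield lots only have CVD-03 in common, which is where an engineer would look first.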
3. The Role of EAP: The Bridge Between Equipment and MES
Now let me discuss EAP. Many people understand EAP simply as “software that talks to equipment.” That is correct but incomplete: the EAP does far more than communication.
Think of it this way: MES is the decision layer, determining “what to do”; EAP is the execution layer, responsible for “how to do it.” When MES issues an instruction to “execute Step 5 for Lot B on PVD-07,” the EAP must perform a sequence of actions: first verify that PVD-07 is in a normal state, confirm that the recipe has been downloaded to the tool, verify that Lot B’s previous step has been completed, cross-check recipe parameters against the golden copy, and only then send the start command to the equipment via the SECS/GEM protocol. While the tool is running, the EAP continuously collects process data (temperature, pressure, power, gas flows) and stores it in the database. When processing is complete, the EAP reports the completion signal back to MES.
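The pre-start sequence in the paragraph above can be sketched as an ordered list of checks, where the start command is sent only if none of them fails. The field names and the "IDLE" state label are assumptions for illustration; a real EAP would gather these facts from the tool and from MES rather than receive them as booleans.

```python
from dataclasses import dataclass


@dataclass
class StartRequest:
    tool_state: str              # assumed labels such as "IDLE" or "DOWN"
    recipe_on_tool: bool         # recipe already downloaded to the tool
    previous_step_done: bool     # lot has completed its prior step
    recipe_matches_golden: bool  # parameters agree with the golden copy


def pre_start_checks(req: StartRequest) -> list[str]:
    """Return the names of failed checks, in the order the text lists them."""
    checks = [
        ("tool not in normal state", req.tool_state == "IDLE"),
        ("recipe not downloaded", req.recipe_on_tool),
        ("previous step incomplete", req.previous_step_done),
        ("recipe differs from golden copy", req.recipe_matches_golden),
    ]
    return [name for name, ok in checks if not ok]


def may_start(req: StartRequest) -> bool:
    # The start command is sent only when every check passes.
    return not pre_start_checks(req)


ok_request = StartRequest("IDLE", True, True, True)
bad_request = StartRequest("IDLE", True, False, False)
```

Returning the full list of failures, rather than stopping at the first one, makes the eventual alarm message more useful to whoever has to clear it.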
Each tool has its own EAP instance. A fab with 300 to 500 tools runs 300 to 500 EAP processes. This scale is substantial, making EAP stability and resource consumption a significant engineering concern.
4. Key EAP Capabilities
Specifically, EAP provides several critical capabilities:
SECS/GEM communication. This is the fundamental skill of any EAP. SECS (SEMI Equipment Communications Standard) and GEM (Generic Equipment Model) are the industry-standard protocols for semiconductor equipment communication. Virtually all mainstream tools support SECS/GEM interfaces, and EAP exchanges messages with equipment through this protocol. It sounds simple, but anyone who has done this work knows that each equipment vendor implements SECS/GEM differently — the same message may be parsed completely differently by different tools. EAP engineers spend enormous amounts of time “interfacing with equipment” — a polite term for painstaking tool-by-tool debugging.
Recipe download and verification. The EAP is responsible for downloading the MES-specified recipe to the equipment and performing parameter verification. If MES requests “recipe ABC,” the EAP compares the parameters of recipe ABC as loaded on the tool against the golden copy in the database; any discrepancy triggers an alarm and blocks the tool from starting. This is called recipe body verification, and it is a critical safeguard against operator errors.
Data collection. While the tool is running, it generates large volumes of process data — trace data. For example, a single etch step might sample RF power, chamber pressure, and gas flows every 100 milliseconds, producing hundreds of thousands of data points per step. The EAP collects this data in real time and stores it in a time-series database or file system. This data feeds downstream FDC (Fault Detection and Classification) and VM (Virtual Metrology) systems.
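A quick back-of-the-envelope calculation shows how trace volume scales. The step duration and parameter counts below are assumptions picked for illustration, not figures from a specific tool; the point is that the total depends heavily on how many parameters are traced.

```python
def trace_points(step_seconds: float, sample_interval_ms: float, n_parameters: int) -> int:
    """Number of data points one step produces at a fixed sampling interval."""
    samples_per_parameter = int(step_seconds * 1000 // sample_interval_ms)
    return samples_per_parameter * n_parameters


# Assumed 120-second step sampled every 100 ms (10 samples per second).
three_signals = trace_points(120, 100, 3)      # only the three signals named above
full_sensor_set = trace_points(120, 100, 200)  # assumed 200 traced parameters
```

With only three signals a two-minute step yields a few thousand points; the hundreds-of-thousands range is reached once a tool reports on the order of a hundred or more parameters, or when steps run much longer.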
Interlock. This is the EAP’s most critical safety function. An interlock prohibits the tool from executing an operation when certain conditions are not met. For example, if the previous step’s metrology data was out of spec, the EAP blocks the lot from proceeding to the next step; or if a tool has just completed PM but has not yet run qualification wafers, the EAP prevents production lots from entering. Interlocks are the last line of defense against batch scrap in a fab. I have seen cases where interlock configuration gaps led to losses of hundreds of thousands of dollars. You only need that lesson once.
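The two interlock examples above can be modeled as a small state machine on the tool plus a condition on the lot. This is a sketch under simplifying assumptions: the two state names and the method names are hypothetical, and qualification wafers are always admitted here.

```python
from enum import Enum


class ToolState(Enum):
    PRODUCTION = "production"
    AWAITING_QUAL = "awaiting_qual"


class InterlockedTool:
    def __init__(self, name: str) -> None:
        self.name = name
        self.state = ToolState.PRODUCTION

    def complete_pm(self) -> None:
        # After PM the tool must be requalified before production resumes.
        self.state = ToolState.AWAITING_QUAL

    def record_qual_result(self, passed: bool) -> None:
        if passed:
            self.state = ToolState.PRODUCTION

    def admits(self, production_lot: bool, prev_metrology_in_spec: bool = True) -> bool:
        if not production_lot:
            return True  # qualification wafers are always allowed in this sketch
        return self.state is ToolState.PRODUCTION and prev_metrology_in_spec


tool = InterlockedTool("CVD-03")
tool.complete_pm()
blocked_after_pm = not tool.admits(production_lot=True)
tool.record_qual_result(passed=True)
released_after_qual = tool.admits(production_lot=True)
blocked_out_of_spec = not tool.admits(production_lot=True, prev_metrology_in_spec=False)
```

The configuration gaps mentioned above correspond to a missing state or a missing condition in logic like this, which is why interlock rules deserve the same review discipline as recipes.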
5. Where Does AI Fit in the Architecture?
This is a question many people ask. Everyone in the semiconductor industry is talking about AI, but which layer of the system architecture does AI actually run on?
The answer is: most AI models run at the EAP layer or on dedicated APC (Advanced Process Control) servers.
Why not in the MES layer? Because MES is a transactional system prioritizing high availability and strong consistency — it is not suited for compute-intensive inference tasks. Moreover, the input data that AI models need — equipment trace data, sensor data, feedforward metrology data — arrives first at the EAP layer, where inference latency is minimized.
Typical AI use cases include: VM (Virtual Metrology) models using trace data to predict wafer metrology results, saving actual measurement time; FDC models monitoring equipment status in real time and raising early warnings when anomalies are detected; and R2R (Run-to-Run) control models that automatically adjust recipe parameters for the next batch based on the previous batch’s results. VM and FDC in particular need inference at the second or even millisecond level, so these models must be deployed close to the equipment.
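As a point of reference for what the FDC use case does at its simplest, here is a univariate control-limit check: limits are derived from a known-good reference run, and samples outside them are flagged. This is a deliberately basic statistical stand-in, not the models discussed in this article; real FDC is usually multivariate and often model-based, and the three-sigma width is an assumption.

```python
from statistics import mean, stdev


def control_limits(reference: list[float], k: float = 3.0) -> tuple[float, float]:
    """Mean plus or minus k sample standard deviations of a known-good run."""
    m, s = mean(reference), stdev(reference)
    return m - k * s, m + k * s


def out_of_control(trace: list[float], limits: tuple[float, float]) -> list[int]:
    """Indices of samples that fall outside the control limits."""
    lo, hi = limits
    return [i for i, x in enumerate(trace) if not lo <= x <= hi]


reference_rf_power = [500.0, 501.0, 499.0, 500.5, 499.5]  # made-up watts
limits = control_limits(reference_rf_power)
flagged = out_of_control([500.2, 499.8, 503.0, 500.1], limits)
```

Only the third sample is flagged in this made-up trace; a check this cheap can run on every sample as it arrives.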
6. Data Flow: From Equipment to AI and Back
The end-to-end data pipeline looks like this:
Equipment -> EAP -> Database -> AI Model -> Control Command -> EAP -> Equipment
Using R2R control as an example: the tool finishes processing a lot, and the EAP collects trace data and stores it in the database. The metrology tool measures the wafers, and results are also stored. The AI model reads trace data and metrology results from the database and calculates recipe adjustments for the next lot. The adjustment commands are sent through MES to the EAP, which modifies the recipe parameters on the tool. The next lot runs with the updated parameters. In a mature system, the entire loop from metrology results to recipe update completes within two to three minutes.
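To show the shape of this loop in code, the sketch below uses an EWMA (exponentially weighted moving average) controller, a classic run-to-run scheme, in place of the AI model. It assumes a linear process, output = intercept + gain * setting, with a known gain and a drifting intercept; the numbers and the simulated tool are invented for the example.

```python
class EwmaR2R:
    """EWMA run-to-run controller for output = intercept + gain * setting."""

    def __init__(self, target: float, gain: float, intercept0: float, weight: float = 0.3) -> None:
        self.target = target
        self.gain = gain
        self.intercept = intercept0  # current estimate of the process intercept
        self.weight = weight         # how strongly the latest lot moves the estimate

    def recommend(self) -> float:
        # Setting that would hit the target if the intercept estimate were exact.
        return (self.target - self.intercept) / self.gain

    def update(self, setting: float, measured: float) -> None:
        observed = measured - self.gain * setting
        self.intercept = self.weight * observed + (1 - self.weight) * self.intercept


ctrl = EwmaR2R(target=100.0, gain=2.0, intercept0=10.0)
errors = []
for _ in range(10):
    u = ctrl.recommend()            # recipe adjustment for the next lot
    y = 12.0 + 2.0 * u              # simulated tool whose intercept drifted to 12
    errors.append(y - ctrl.target)  # metrology result relative to target
    ctrl.update(u, y)               # feed the result back for the following lot
```

Each pass through the loop corresponds to one trip around the pipeline above, and the error shrinks lot by lot as the controller learns the drift. A learned model would replace the fixed linear assumption, but the loop structure stays the same.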
A key question here is: where does the AI model run? If it runs on a remote cloud server, network latency and data transfer become bottlenecks. If it runs on local servers in the fab, compute capacity and operations become concerns. This is why “edge AI” is gaining increasing attention in semiconductor manufacturing — placing AI inference as close to the equipment as possible ensures low latency without adding burden to MES and EAP.
7. Comparison of Major MES/EAP Products
Globally, the MES and EAP markets have long been dominated by a handful of vendors.
On the MES side, IBM’s SiView (later part of Applied Materials’ acquisition) and Camstar (now part of Siemens) are widely used in large fabs. Domestic fabs historically purchased foreign MES systems, but with the trend toward localization in recent years, domestic vendors such as Cimetrix Manufacturing (with Critical Manufacturing as its technology source) are rapidly catching up. Its cmNavigo platform has been deployed in several new domestic fabs, with functionality and stability continuously improving.
On the EAP side, Cimetrix is the name that cannot be avoided. Its CIMConnect and EquipmentConnect products are virtually industry standards, used by a large number of fabs worldwide. Brooks (now Azenta) FabWorks also has a significant customer base. Domestically, some teams are developing EAP products, but frankly they still lag behind Cimetrix, primarily in the accumulated experience of equipment protocol integration — Cimetrix has simply interfaced with far more equipment models than any domestic vendor.
A notable trend is the blurring boundary between EAP and APC. Traditionally, EAP handled communication and basic automation while APC performed advanced control. Increasingly, APC functionality is being pushed down into the EAP layer, with some solutions running lightweight AI models directly within the EAP. The driving logic is to reduce data transfer latency between systems.
8. MST Semiconductor NeuroBox and EAP Collaboration
When we at MST Semiconductor were developing NeuroBox E3200, we initially debated whether to build a completely new EAP replacement or to create an “intelligent extension” for existing EAP. We chose the latter.
The reasoning was practical: the EAP already running in a fab has undergone extensive equipment interfacing and validation. Recklessly replacing it carries too much risk. No customer will swap out their entire EAP simply because your AI features are compelling. So NeuroBox E3200 is positioned as an edge AI box, deployed alongside the existing EAP. It receives equipment data from the EAP through standard interfaces, performs AI inference locally (VM prediction, FDC detection, R2R calculations), and sends control commands back to the EAP for execution.
This architecture offers several advantages. First, it does not intrude on existing MES and EAP systems, minimizing deployment risk. Second, AI inference is completed at the edge, achieving millisecond-level latency. Third, data does not leave the fab, satisfying semiconductor enterprises’ data security requirements. In actual projects, we have found that customers are often less concerned about how precise the model is (accuracy differences across most scenarios are small) and more concerned about “whether connecting this thing will affect the stability of my existing systems.” E3200’s non-intrusive architecture effectively addresses that concern.
From a data flow perspective, E3200 simply adds a branch to the “Equipment -> EAP -> Database” pipeline: the EAP simultaneously pushes a copy of collected trace data to E3200, which runs its models and writes results back to the database or pushes them directly to the EAP. The intrusion on the existing pipeline is minimal.
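The branching idea can be illustrated with a small fan-out: every sample goes to the existing sink first, and copies go to any additional consumers, whose failures are swallowed so they cannot disturb the primary path. This is a generic sketch of the pattern, not the actual E3200 interface; the class and function names are hypothetical.

```python
from typing import Callable

Sample = dict[str, float]


class TraceFanOut:
    def __init__(self, primary: Callable[[Sample], None]) -> None:
        self._primary = primary
        self._branches: list[Callable[[Sample], None]] = []

    def add_branch(self, sink: Callable[[Sample], None]) -> None:
        self._branches.append(sink)

    def publish(self, sample: Sample) -> None:
        self._primary(sample)  # existing Equipment -> EAP -> Database path
        for sink in self._branches:
            try:
                sink(dict(sample))  # each branch receives its own copy
            except Exception:
                # A failing branch must never disturb the primary path.
                pass


database: list[Sample] = []
edge_inbox: list[Sample] = []


def broken_branch(sample: Sample) -> None:
    raise RuntimeError("edge box offline")


fan = TraceFanOut(database.append)
fan.add_branch(broken_branch)
fan.add_branch(edge_inbox.append)
fan.publish({"rf_power_w": 500.0})
```

Even with one branch raising on every call, the database still receives the sample, which is the property customers care about when they ask whether an add-on will affect the stability of existing systems.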
Ultimately, MES and EAP are the cornerstones of semiconductor factory automation, having operated reliably for decades. AI is not here to replace them, but to work alongside them. Understanding this system architecture is how you find where AI can truly deliver value — rather than making empty proclamations about “AI disrupting manufacturing.”