A plant I worked with wanted anomaly detection across a four-step process. The machines were instrumented, the historian was full, the team was ready. The project stalled on something nobody had budgeted for: step two logged local machine time, step three logged server time, and the two clocks had drifted eleven minutes apart. Every cross-step analysis was quietly joining events that never happened together.
The least glamorous fields in your data (timestamp, lot number, batch ID) are the ones that decide whether analysis across machines, shifts and steps is possible at all.
Why join keys beat new sensors
Most root-cause questions are join questions. Did this defect lot run on that material batch? Did the drift start before or after the tool change? Was the night shift's parameter set actually different? None of these need a new sensor. All of them need events that can be matched: one clock, one lot identity, carried through every step.
A plant with mediocre sensors and clean join keys can answer cause questions. A plant with beautiful sensor data and broken keys can only draw charts.
The three usual breaks
Clocks. Machine controllers, MES and quality systems each keep their own time. Drift of minutes is common and stays invisible until you join across systems. One synchronized time source (an NTP server that every system syncs to, for example), plus a habit of checking each system against it, is cheap insurance.
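One way to make the clock check a habit is to measure the offset from events that both systems log, such as a part handoff between two steps. The sketch below is a minimal illustration, not a prescribed method: the function names, the five-second tolerance and the sample timestamps are all made up for the example.

```python
from datetime import datetime, timedelta
from statistics import median


def estimate_clock_offset(pairs):
    """Median of (time_b - time_a) over events that both systems logged.

    The median keeps one late or mistyped record from skewing the estimate.
    """
    deltas = [(b - a).total_seconds() for a, b in pairs]
    return timedelta(seconds=median(deltas))


def exceeds_tolerance(offset, tolerance=timedelta(seconds=5)):
    """True if the clocks disagree by more than the tolerance (assumed: 5 s)."""
    return abs(offset) > tolerance


# Hypothetical handoff events: (machine-controller time, server time).
handoffs = [
    (datetime(2024, 3, 4, 6, 0, 0), datetime(2024, 3, 4, 6, 11, 2)),
    (datetime(2024, 3, 4, 7, 30, 0), datetime(2024, 3, 4, 7, 40, 58)),
    (datetime(2024, 3, 4, 9, 15, 0), datetime(2024, 3, 4, 9, 26, 0)),
]

offset = estimate_clock_offset(handoffs)
if exceeds_tolerance(offset):
    print(f"Clocks disagree by {offset}; fix the time source before joining.")
```

Run weekly on a handful of shared events, a check like this turns invisible drift into a visible number. It measures the problem; the fix is still the synchronized time source.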
Lot identity. The lot number gets retyped at a manual station, truncated by an interface, or merged when material is combined. Every manual re-entry is a future unjoinable record. Follow one lot physically through the plant and write down every place its identity is touched by hand; that list is your fix backlog.
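The physical walk can be paired with a quick look at the data: list the lot IDs at one step that have no exact match at the step before, then guess what happened to each. The following is a rough sketch under stated assumptions. The lot ID format, the sample values and the two heuristics (case or whitespace changes, prefix truncation) are illustrative, and a real plant will have its own failure patterns.

```python
def unmatched_lots(upstream_ids, downstream_ids):
    """Downstream lot IDs with no exact match upstream (an anti-join)."""
    known = set(upstream_ids)
    return [lot for lot in downstream_ids if lot not in known]


def likely_cause(lot, upstream_ids):
    """Guess why a lot ID failed to match. Heuristics only, not proof."""
    normalized = {u.strip().upper(): u for u in upstream_ids}
    key = lot.strip().upper()
    if key in normalized:
        return "retyped (case or whitespace)", normalized[key]
    # A unique upstream ID that starts with this one suggests truncation.
    candidates = [u for u in upstream_ids if key and u.strip().upper().startswith(key)]
    if len(candidates) == 1:
        return "truncated", candidates[0]
    return "unknown", None


# Hypothetical lot IDs as logged at two consecutive steps.
step2 = ["A1047-0031", "A1047-0032", "B2210-0007"]
step3 = ["A1047-0031", "a1047-0032 ", "B2210-00", "C9999-0001"]

unmatched = unmatched_lots(step2, step3)
report = {lot: likely_cause(lot, step2) for lot in unmatched}
for lot, (cause, match) in report.items():
    print(f"{lot!r}: {cause}, probable upstream lot: {match}")
```

The guessed matches are for finding the station or interface that damages the ID, not for silently repairing records. An "unknown" entry is a reason to go back to the floor.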
Batch boundaries. "Batch" means different things to production, QA and the ERP. If the start and end of a batch aren't recorded the same way everywhere, batch-level analysis silently compares different material. Agree on one definition and log the boundaries explicitly.
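Once boundaries are logged explicitly, assigning an event to a batch becomes a lookup instead of a guess. Here is a minimal sketch, assuming each batch is recorded as a half-open interval (start included, end excluded) so that an event on the boundary belongs to exactly one batch. The batch IDs, times and function name are hypothetical.

```python
from datetime import datetime


def assign_batch(event_time, batches):
    """Return the ID of the batch whose [start, end) interval contains the event.

    Returns None if no batch covers the event, and raises if more than one
    does, since overlapping boundaries mean the definition is not yet agreed.
    """
    hits = [batch_id for batch_id, start, end in batches if start <= event_time < end]
    if len(hits) > 1:
        raise ValueError(f"Overlapping batches at {event_time}: {hits}")
    return hits[0] if hits else None


# Hypothetical batch log: (batch ID, start, end).
batches = [
    ("B-101", datetime(2024, 3, 4, 6, 0), datetime(2024, 3, 4, 9, 30)),
    ("B-102", datetime(2024, 3, 4, 9, 30), datetime(2024, 3, 4, 13, 0)),
]

for t in (
    datetime(2024, 3, 4, 9, 29, 59),
    datetime(2024, 3, 4, 9, 30, 0),
    datetime(2024, 3, 4, 13, 5, 0),
):
    print(t, "->", assign_batch(t, batches))
```

Events that come back as None, or that raise on overlap, are the ones where production, QA and the ERP still disagree about what a batch is.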
Treat it as improvement work, not an IT project
The instinct is to commission a data-quality program. The leaner move: fix the keys on the one value stream where you're doing causal work right now, driven by the questions that work is asking. Each unjoinable record is a small defect with a process cause, findable with the same 5-why discipline you'd use on a quality problem.
Tools like Jidokai can flag where joins fail and where identities go missing, which shortens the search. The fixes themselves are process fixes: a scanning step, an interface field, a standard for clock checks. Floor work, not platform work.
Three boring fields. Fix them on one stream and most of your "we'd need better data for that" sentences quietly disappear.