Enterprises trying to use the internet of things already face a deluge of data and a dizzying array of ways to analyze it. But what happens if the information is wrong?
Bad data is common in IoT, and though it’s hard to get an estimate of how much information streaming in from connected devices can’t be used, a lot of people are thinking about the problem.
About 40 percent of all data from the edges of IoT networks is “spurious,” says Harel Kodesh, vice president of GE’s Predix software business and CTO of GE Digital. Much of that data isn’t wrong, just useless: duplicate information that employees accidentally uploaded twice, or repetitive messages that idle machines send automatically.
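To make the idea concrete, here is a minimal Python sketch of filtering out those two kinds of spurious records before analysis. It assumes a hypothetical record layout (`device_id`, `timestamp`, `value`) rather than any GE Predix format. It also treats an unchanged value from the same device as redundant, which is a simplification: in some systems a repeated reading still carries information.

```python
from typing import NamedTuple


class Reading(NamedTuple):
    # Hypothetical record layout; real IoT payloads vary by platform.
    device_id: str
    timestamp: int
    value: float


def drop_spurious(readings):
    """Drop exact duplicates and collapse runs of unchanged values per device."""
    seen = set()
    last_value = {}
    kept = []
    for r in readings:
        if r in seen:
            continue  # same device, time and value uploaded twice
        seen.add(r)
        if last_value.get(r.device_id) == r.value:
            continue  # repetitive message, e.g. an idle machine's status ping
        last_value[r.device_id] = r.value
        kept.append(r)
    return kept
```

Given a duplicated upload and an idle device repeating the same value, only the first reading and the later change survive.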
In addition, building a new IoT platform on top of old industrial reporting systems can cause problems because the legacy tools format data in their own way, Kodesh said. “You’re not taking the real, elemental data, you’re taking some translation of that.”
But sometimes devices just generate stuff that’s false or misleading.
Measuring the wrong thing
For example, if a worm crawls over a temperature and humidity sensor in a field, the farmer will get a reading on how warm and moist the worm is, which doesn’t help to run a farm. If a sensor gets covered with dirt or factory grime, or if it’s damaged by vandals, that can tweak the data it produces, too.
The harsher the surrounding conditions and the more isolated the device, the worse the bad-data problem is likely to be. In addition to agriculture, industries such as oil and gas and energy distribution face this problem. But it’s not just far-flung sensors that have problems. Even in a hospital, a blood oxygen sensor clamped on a patient’s finger can start giving bad data if it gets bumped into the wrong position.
On top of that, some IoT devices malfunction on their own and start spewing out bad data, or stop reporting at all. In many other cases, human error is the culprit: The wrong settings mess up what the device generates.
One way to cut down on bad data is to make sure the gear is set correctly.
John Deere equips its giant farm tools with sensors that detect whether the machines are working right. The company’s ExactEmerge planter, which rolls behind a tractor planting seeds across a field, has three sensors per row of crops to detect how many seeds are being planted and at what rate. At least once a year, before planting time, the farmer or a Deere dealer will manually calibrate those sensors so they’re accurate, said Lane Arthur, Deere’s director of digital solutions.
More is better
But many IoT sensors are too hard to reach for regular calibration and maintenance. In those cases, redundancy may be the answer, though it’s not a silver bullet.
Duplicates of the same sensor on a machine, in a mine, or in a field generate more inputs, which can be helpful in itself. Weather Underground, part of IBM’s Weather Company business, creates its reports partly with data from uncalibrated, low-cost sensors in consumers’ back yards. For not much money, they give Weather Underground more data points, but quality is a big issue. One sensor may malfunction and report several inches of rain while the one next to it senses none, said John Cohn, the IBM Fellow for Watson IoT.
“The great thing is, if you have enough density of these kinds of sensors, you can … mathematically find the outliers and reason, from that, that one requires work,” Cohn said.
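One common way to do that math is a robust outlier test against the neighborhood median. The sketch below is an illustration, not Weather Underground’s or IBM’s actual method: it uses the median absolute deviation (MAD) with the conventional modified z-score cutoff of 3.5, and the function name, station IDs and rainfall figures are made up.

```python
from statistics import median


def flag_outliers(readings, k=3.5):
    """Return IDs of stations whose reading is far from the neighborhood median.

    Uses a modified z-score based on the median absolute deviation (MAD),
    which a single bad gauge cannot drag around the way it can drag a mean.
    """
    values = list(readings.values())
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # Most stations agree exactly; flag anything that differs at all.
        return sorted(s for s, v in readings.items() if v != med)
    # 0.6745 scales the MAD so the score is comparable to a standard z-score.
    return sorted(
        s for s, v in readings.items()
        if 0.6745 * abs(v - med) / mad > k
    )
```

For rainfall of `{"s1": 0.0, "s2": 0.1, "s3": 0.0, "s4": 0.2, "s5": 3.0}` inches, only `"s5"` is flagged: the gauge reporting three inches while its neighbors report almost none.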
Companies can also use different sensing devices, especially cameras, to check on sensors that may be having trouble. A video camera combined with image analysis software can detect whether a remote device has gotten dirty, damaged or vandalized, said Doug Bellin, senior manager of global private sector industries at Cisco Systems. Sometimes security cameras already installed for another purpose can do this job.
One technique for verifying different kinds of sensors against each other is called sensor fusion. It weighs inputs from two or more sensors to come to a conclusion.
Sensor fusion is now being implemented in hospitals, where false alarms are rampant, said Stan Schneider, president and CEO of IoT software company Real-Time Innovations (RTI). For example, rather than setting off an alarm every time the blood oxygen sensor on a patient’s finger showed low oxygen, a sensor fusion system would constantly compare that reading with those from other sensors on the patient, like respiration and heart rate monitors.
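A weighted vote is one simple way to express that kind of fusion. The Python sketch below is hypothetical: the weights, the 90 percent oxygen cutoff and the normal ranges for breathing and heart rate are placeholders chosen for illustration, not RTI’s system and not clinical guidance. The weights are set so that a low blood oxygen reading on its own cannot trigger the alarm.

```python
from dataclasses import dataclass


@dataclass
class Vitals:
    spo2: float              # blood oxygen saturation, percent
    respiration_rate: float  # breaths per minute
    heart_rate: float        # beats per minute


# Hypothetical weights and thresholds, for illustration only (not clinical guidance).
WEIGHTS = {"spo2": 0.5, "respiration": 0.3, "heart_rate": 0.3}
ALARM_THRESHOLD = 0.7


def low_oxygen_alarm(v: Vitals) -> bool:
    """Fuse three sensors into one alarm decision via a weighted vote."""
    evidence = {
        "spo2": v.spo2 < 90.0,
        "respiration": not (10.0 <= v.respiration_rate <= 24.0),
        "heart_rate": not (50.0 <= v.heart_rate <= 120.0),
    }
    score = sum(WEIGHTS[name] for name, abnormal in evidence.items() if abnormal)
    # SpO2 alone (0.5) cannot reach the threshold, so a clip bumped out of
    # position does not raise an alarm unless another sensor agrees.
    return score >= ALARM_THRESHOLD
```

A reading of 82 percent oxygen with normal breathing and pulse stays silent; the same reading with 30 breaths per minute raises the alarm.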
The phantom sensor
Other sources can also stand in for a sensor that isn’t even there anymore. GE tests each jet engine that comes out of its factories for exhaust gas temperature, a figure that reflects its efficiency, Kodesh said. GE puts one sensor right in the path of the exhaust even though it will always burn up after a few minutes. Meanwhile, sensors in safer spots around the engine collect data at the same time, and by comparing their readings with what the doomed device recorded before it was destroyed, GE can recreate the direct sensor as a virtual one – a mathematical function.
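GE’s exact function isn’t described. As a sketch under a loud assumption, a linear least-squares fit from the surviving sensors to the sacrificial one looks like this in Python with NumPy. Real engine behavior is likely nonlinear, and the function names here are hypothetical.

```python
import numpy as np


def fit_virtual_sensor(proxy_readings, direct_readings):
    """Fit a linear map from surviving sensors to the sacrificial sensor.

    proxy_readings: shape (n_samples, n_proxies), from sensors in safe spots.
    direct_readings: shape (n_samples,), from the exhaust-path sensor,
    recorded before it burned up.
    """
    X = np.asarray(proxy_readings, dtype=float)
    y = np.asarray(direct_readings, dtype=float)
    # Append a column of ones so the model can learn a constant offset.
    A = np.column_stack([X, np.ones(len(X))])
    coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)

    def virtual_sensor(proxies):
        """Estimate the direct reading from safe-spot readings alone."""
        p = np.asarray(proxies, dtype=float)
        return float(np.dot(p, coeffs[:-1]) + coeffs[-1])

    return virtual_sensor
```

After fitting on the minutes when both kinds of sensors were alive, `virtual_sensor` needs only the safe-spot readings to estimate what the burned-up sensor would have reported.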
Drawing conclusions from multiple information streams takes the data-quality problem into the realm of machine learning. That’s where the most interesting stuff is happening, IBM’s Cohn says.
For example, IBM uses its Watson analytics platform to understand energy use at IBM facilities in Ireland. Watson can flag a discrepancy if an air conditioner reports that it’s off while the total power draw is too high for that to be true. Over time, it can also learn the particular way that air conditioner draws power when it comes on. With that knowledge, a unit that claims it’s off can be caught red-handed.
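A toy version of the signature-learning idea is sketched below in Python. It assumes a hypothetical setup in which the building meter reports the jump in total power draw (in kilowatts) at each moment, and it reduces the air conditioner’s learned “signature” to the median jump seen when the unit reported switching on. The 20 percent tolerance and the function names are placeholders; Watson’s actual models are not described and are surely more elaborate.

```python
from statistics import median


def learn_on_signature(labelled_steps):
    """Learn the typical jump in total power (kW) when the unit switches on.

    labelled_steps: power changes observed at moments the unit reported
    turning on. The median keeps a few noisy samples from skewing it.
    """
    return median(labelled_steps)


def contradicts_report(power_step_kw, reported_on, signature_kw, tolerance=0.2):
    """True if a power jump looks like the unit starting while it claims to be off."""
    matches = abs(power_step_kw - signature_kw) <= tolerance * signature_kw
    return matches and not reported_on
```

With a learned signature of 3.5 kW, a 3.45 kW jump while the unit reports being off is flagged as a contradiction; the same jump while it reports being on is not.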
Unlike added sensors or cameras, though, machine learning takes time to get up to speed as a check on faulty data.
“It gets smarter the more it runs. The first time it runs, I wouldn’t trust it,” Cisco’s Bellin said. “The thousandth time it runs, it’s … probably smarter than I am.”
The more critical the IoT system is, the more important it is to deal with bad data. Sensor fusion, for example, is necessary for things like patient health and missile detection because reliability is a big issue when the stakes are that high, RTI’s Schneider said.
But some forms of IoT can probably get by without multiple sources of data, he said. “You don’t need that in the thermostat in your house.”