By Noisy Data, we mean conditions where the input data have errors or dropouts, and sometimes arrive too late to inform the decision.
Perhaps no complexity factor separates the academic from the real world as much as Noisy Data. Many brilliant academics, when confronted with the possibility of Noisy Data, will shrug their shoulders and say “Garbage In, Garbage Out.” Translated, this means: if you give my model Noisy Data, don’t be surprised when you get a terribly wrong answer.
In our more than 30 years in business, we have yet to meet a company or an industry with perfectly complete and clean data, so we have spent a considerable amount of our time developing practical ways to clean data reliably.
- For the asset management company, we provided a sophisticated automated data-cleaning facility built on our proprietary “best current hypothesis” (BCH) techniques, which piece together missing data and then revise earlier conclusions as new data (new pieces of the puzzle) arrive.
- For the high-frequency hedge fund, real-time data-cleaning techniques are necessary because market participants, motivated by competitive gaming, purposely obscure their true intentions with storms of “out of the money” quotes and other techniques.
- For the freight railroad, the large number of parties in the network contributes to Noisy Data. These parties include multiple shippers, trucking companies, facilities, and even multiple railroads sharing responsibility for different legs of the same transcontinental movement. Data-cleaning techniques at multiple levels are used to pinpoint potential service failures while avoiding excessive “false positives.”
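The BCH approach described above is proprietary, but its core idea, keep a revisable best guess for each missing field and replace it when stronger evidence arrives, can be sketched loosely. Everything here (the class, field names, and confidence scores) is hypothetical illustration, not the actual implementation:

```python
from dataclasses import dataclass, field

@dataclass
class FieldHypothesis:
    """Best current guess for a missing data field, revisable as evidence arrives."""
    value: object = None
    confidence: float = 0.0
    sources: list = field(default_factory=list)

    def update(self, value, confidence, source):
        # A later, higher-confidence observation replaces the earlier guess;
        # weaker evidence is recorded but does not overturn the hypothesis.
        if confidence >= self.confidence:
            self.value, self.confidence = value, confidence
        self.sources.append(source)
        return self.value

# As pieces of the puzzle arrive, the hypothesis is revised:
h = FieldHypothesis()
h.update("ACME Corp", 0.4, "trade ticket")      # low-confidence first guess
h.update("ACME Corporation", 0.9, "custodian")  # stronger evidence revises it
```

The point of the sketch is the revision step: earlier cleaned values are not final, and downstream consumers must tolerate data that changes as better evidence arrives.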
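For the quote-storm problem, one simple real-time filter is to discard quotes priced far from the current mid, on the assumption that they are gaming noise rather than genuine intent. The 5% threshold below is an arbitrary illustration, not a recommended setting:

```python
def is_noise_quote(quote_price, mid_price, max_rel_distance=0.05):
    """Flag quotes far from the current mid as likely gaming noise.

    max_rel_distance is a hypothetical cutoff: quotes more than 5% away
    from the mid are treated as "out of the money" noise, not signal.
    """
    return abs(quote_price - mid_price) / mid_price > max_rel_distance

# Quotes near the mid survive; far-away quote-storm entries are dropped.
quotes = [100.1, 99.8, 120.0, 100.3, 80.0]
mid = 100.0
clean = [q for q in quotes if not is_noise_quote(q, mid)]
```

A real filter would combine price distance with quote size, lifetime, and per-participant behavior; the single-threshold version only conveys the shape of the idea.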
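For the multi-party railroad network, one common way to suppress false positives, offered here as a generic sketch rather than the method actually deployed, is to flag a service failure only when independent parties corroborate it:

```python
def confirmed_failure(reports, min_sources=2):
    """Flag a service failure only when independent parties corroborate it.

    reports: mapping of party name -> bool (did this party report a failure?)
    Requiring agreement from min_sources parties suppresses false positives
    caused by noisy data from any single reporter. Threshold is illustrative.
    """
    return sum(reports.values()) >= min_sources

# Two independent parties agree -> treat as a real service failure:
confirmed_failure({"shipper": True, "railroad": False, "terminal": True})
# A single uncorroborated report -> likely a false positive, do not alert:
confirmed_failure({"shipper": True, "railroad": False, "terminal": False})
```

The trade-off is latency: waiting for corroboration delays the alert, so the threshold must be tuned against how costly a missed failure is versus a spurious one.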