Data versus information: What does it take to get usable analysis to your traders?

We often hear in surveys and at conferences that ‘access to good data and analysis’ is the number one priority and challenge for many buy-side dealing desks.

Given the sheer scale of data and the growing complexity of market microstructure, the enormity of the task cannot be disputed. However, developments in technology and commercial offerings have made it easier and more cost-effective than ever to acquire independent, reliable and informative analytics capabilities for both supervision of, and research into, the trading process.

Before we discuss data, let’s start with a few statistics. In European equities alone, our firm captures around 600 million market data updates (quotes and trades) on a typical day, or 20,000 per second. Putting that into context, the storage of one day’s data requires the equivalent of all the hard drive space on 30,000 laptops of the type on which this article has been written. If you piled them on top of each other, they would reach a third of a mile into the air.

In liquid names, the average number of price updates to the touch is 20 per second, and many more if you include the full depth of the order book. Over the last 12 months, we have seen that figure average above 400 on some days, with over 1.2 billion total updates. By the time you add ETFs and futures, and then North America and Asia Pacific, these figures more than treble.

Acquiring the capability to fully access this data requires investment in historical tick data, either by purchasing a licence (which normally carries a high fee) or by implementing co-located tick data recording devices and a substantial storage plant. For the required level of accuracy, for example to calculate benchmarks, the data needs to be captured at sub-millisecond granularity. The technical challenge of gathering the data is just the beginning.

A critical part of the process is ‘curation’. Once the data is captured, it needs to be cross-checked against alternative sources and historical patterns to highlight and fix errors before being stored. For example, with the trades data we receive from our clients for TCA processing, we first cross-check all values against public data so that they either match (with 97% accuracy) or are quickly highlighted. Without these steps, the information and analysis produced later are at risk of spurious results, which waste time when they have to be investigated and can skew conclusions, potentially leading to bad decisions.
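The cross-check described above can be sketched roughly as follows. This is a minimal illustration, not the process used in production: the `Trade` fields, the tolerances, and the rule that a client fill may be part of a larger public print are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trade:
    symbol: str
    timestamp_us: int  # microseconds since midnight (assumed convention)
    price: float
    quantity: int


def cross_check(client_trades, public_trades, price_tol=1e-9, time_tol_us=1_000_000):
    """Split client trades into (matched, flagged) against the public tape.

    A client trade counts as matched if some public trade in the same symbol
    has the same price, a timestamp within the tolerance, and at least the
    same quantity (assumption: a fill may be part of a larger public print).
    """
    by_symbol = {}
    for p in public_trades:
        by_symbol.setdefault(p.symbol, []).append(p)

    matched, flagged = [], []
    for c in client_trades:
        ok = any(
            abs(p.price - c.price) <= price_tol
            and abs(p.timestamp_us - c.timestamp_us) <= time_tol_us
            and p.quantity >= c.quantity
            for p in by_symbol.get(c.symbol, [])
        )
        (matched if ok else flagged).append(c)
    return matched, flagged


# Hypothetical data: "ABC" is a made-up symbol.
public = [
    Trade("ABC", 36_000_000_000, 72.50, 1000),
    Trade("ABC", 36_005_000_000, 72.60, 500),
]
client = [
    Trade("ABC", 36_000_000_500, 72.50, 400),  # agrees with the first public print
    Trade("ABC", 36_000_100_000, 72.55, 400),  # no public print at this price
]

matched, flagged = cross_check(client, public)
print(len(matched), "matched;", len(flagged), "flagged for review")
```

Flagged trades are kept rather than discarded, so that they can be investigated and corrected before anything is stored.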

Then the data needs to be categorised and stored in a meaningful way, while retaining the flexibility to support a broad scope of subsequent enquiries. For example, to align equivalent trades on different venues so that they are classified in a comparable way, over 1,000 trade conditions need to be mapped across the many different trading venues and trade reporting mechanisms, with all their nuances such as frequent batch auctions or trade amendments and cancellations. This requires deep knowledge of the market microstructure, and the mapping must be constantly maintained.
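One simple way to picture the mapping of trade conditions is a lookup table keyed by venue and venue-specific code. The venue names, condition codes and category labels below are hypothetical placeholders, not real exchange flags; a real table would hold far more entries and would need ongoing maintenance.

```python
from collections import Counter

# Hypothetical (venue, venue-specific code) -> common category.
CONDITION_MAP = {
    ("VENUE_A", "AT"): "LIT_CONTINUOUS",
    ("VENUE_A", "UT"): "AUCTION",
    ("VENUE_B", "1"): "LIT_CONTINUOUS",
    ("VENUE_B", "PA"): "PERIODIC_AUCTION",
    ("VENUE_B", "X"): "CANCELLED",
}


def classify(venue, code):
    """Return the common category, or 'UNMAPPED' so new codes get reviewed."""
    return CONDITION_MAP.get((venue, code), "UNMAPPED")


# Hypothetical prints as (venue, code, quantity).
prints = [
    ("VENUE_A", "AT", 500),
    ("VENUE_B", "1", 300),
    ("VENUE_B", "PA", 200),
    ("VENUE_B", "ZZ", 100),  # a code the table does not know yet
]

# Aggregate volume by common category, regardless of venue.
volume_by_category = Counter()
for venue, code, qty in prints:
    volume_by_category[classify(venue, code)] += qty

print(dict(volume_by_category))
```

Routing unknown codes to an explicit `UNMAPPED` bucket, rather than guessing a category, makes gaps in the table visible as venues introduce new conditions.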

The data is now almost ready to be used. However, the analytics processes need to run quickly, so the metrics (for example spreads, benchmarks, fragmentation, price impact, averages and standard deviations) must be defined in advance, then computed and stored ready for use. For some of the most complex metrics that require full order book depth, it can take several days to analyse a month’s worth of data.

By adopting this approach, flexibility can be enabled by parameterising the metrics in anticipation of users’ needs – for example adding options to the VWAP benchmark such as ‘exclude the close’. This fine tuning is a vital step when huge amounts of information need to be accessed in many different dimensions, such as by portfolio, by sector, by country, or by benchmark over any time period given by the user.
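As an illustration of a parameterised metric, here is a minimal sketch of a VWAP (volume-weighted average price, i.e. total traded value divided by total traded volume) with an ‘exclude the close’ option. The trade tuples and the category labels are assumptions made for the example.

```python
def vwap(trades, exclude_categories=frozenset()):
    """Volume-weighted average price over (price, quantity, category) tuples.

    Trades whose category is in exclude_categories are skipped.
    Returns None if no volume remains after filtering.
    """
    notional = 0.0
    volume = 0
    for price, qty, category in trades:
        if category in exclude_categories:
            continue
        notional += price * qty
        volume += qty
    return notional / volume if volume else None


# Hypothetical day of trading in one stock.
trades = [
    (100.0, 200, "CONTINUOUS"),
    (101.0, 100, "CONTINUOUS"),
    (104.0, 700, "CLOSING_AUCTION"),
]

full_day = vwap(trades)
ex_close = vwap(trades, exclude_categories={"CLOSING_AUCTION"})
print(f"VWAP full day: {full_day:.2f}, excluding the close: {ex_close:.2f}")
```

In this made-up example the closing auction carries most of the volume, so the two variants of the benchmark differ noticeably. That is why the option is worth exposing to users rather than fixing one definition.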

Finally, the visualisation of information into timelines, cluster charts, rankings, reports or streaming analytics must be organised for use during the trading day. This can be accomplished using packages that do most of the heavy lifting when it comes to interacting with the data, empowering traders with little or no programming or data science background.

The data has now been transformed into information that serves two main use cases: Execution Telemetry and Execution Research. Execution Telemetry is the exercise of continuous supervision over trading performance, for example in broker selection, best execution monitoring, strategy behaviour and algo wheel results.

These are reasonably stable processes where the key drivers are data reliability, independent results, and ease of use, for example in detecting and explaining outliers. Execution Research gives a firm a far-reaching ability to investigate patterns and trends in its execution, for example the impact of timing and speed of execution during major events such as transitions, inflows, outflows, and trading around index events and corporate activity. This is an important feedback loop to portfolio managers and a key part of the value added by the trading desk.

This standard architecture for an analytics framework is now more accessible than ever before. Recent advances in hardware and software, and in commercial offerings in this space, have substantially reduced the cost of acquiring comprehensive and flexible execution analytics capabilities. All four stages of the process described can be outsourced at much lower cost. This frees firms of all sizes to focus their in-house expertise on the analysis, rather than spending time on acquiring and creating accurate data.

By Richard Hills, head of client engagement, big xyt
