A balancing act – Performance and reliability in trading matching engines

The matching engine is the core technological pillar of any trading venue: it is the engine room where all the action happens, driving global markets that exchange trillions of dollars daily. With this in mind, Sergey Samushin, head of exchange solutions at Devexperts, delves into the intricate balance between performance and reliability in matching engines and how to engineer a system that excels at both.

A matching engine acts as a sophisticated state machine: every input changes its internal state and produces outputs. It processes orders from clients and commands from exchanges, producing outcomes such as filled or rejected orders and various updates related to trades and instruments.
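To make the state-machine idea concrete, here is a minimal sketch in Python, not a description of any production engine. It assumes a single instrument, limit orders only, integer prices and price-time priority; the names `Order`, `MatchingEngine` and `submit` are hypothetical and chosen for illustration.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Order:
    order_id: int
    side: str   # "buy" or "sell"
    price: int  # integer ticks avoid floating-point rounding
    qty: int


class MatchingEngine:
    """Toy single-instrument engine: the state is the book, inputs are orders."""

    def __init__(self):
        self.bids = {}  # price -> deque of resting buy orders (FIFO)
        self.asks = {}  # price -> deque of resting sell orders (FIFO)

    def submit(self, order):
        """Apply one input to the state and return the output events."""
        if order.qty <= 0 or order.side not in ("buy", "sell"):
            return [("rejected", order.order_id)]
        events = []
        is_buy = order.side == "buy"
        book, opposite = (self.bids, self.asks) if is_buy else (self.asks, self.bids)
        while order.qty > 0 and opposite:
            # Best opposite price: lowest ask for a buy, highest bid for a sell.
            best = min(opposite) if is_buy else max(opposite)
            crosses = order.price >= best if is_buy else order.price <= best
            if not crosses:
                break
            queue = opposite[best]
            resting = queue[0]  # oldest order at that price goes first
            traded = min(order.qty, resting.qty)
            order.qty -= traded
            resting.qty -= traded
            events.append(("trade", resting.order_id, order.order_id, best, traded))
            if resting.qty == 0:
                queue.popleft()
            if not queue:
                del opposite[best]
        if order.qty > 0:
            # Any unfilled remainder becomes part of the engine's state.
            book.setdefault(order.price, deque()).append(order)
            events.append(("resting", order.order_id, order.price, order.qty))
        return events


engine = MatchingEngine()
engine.submit(Order(1, "sell", 101, 5))
engine.submit(Order(2, "sell", 100, 5))
events = engine.submit(Order(3, "buy", 101, 7))
# events == [("trade", 2, 3, 100, 5), ("trade", 1, 3, 101, 2)]
```

Because the output depends only on the current book and the incoming order, two engines fed the same inputs in the same order end up with the same book. Replication designs rely on that property.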

However, it’s the networking and storage elements that equip the matching engine with the capability to manage vast order volumes and maintain a durable record of trading sessions.

Performance and reliability 

Performance and reliability should not conflict in a well-designed exchange. Whether there are three or five working nodes, users should not experience any type of performance dip. 

The additional nodes exist to keep performance consistent if the primary node fails. A heavily replicated system may take more effort to maintain, but because the primary node operates independently, the additional nodes do not slow the system down.

For example, a single-instance matching engine might suffice for a demo or test environment of a retail exchange with moderate latency requirements, but a production system cannot rely on one node, because that node becomes a single point of failure. If it fails, everything fails.

Replication as a solution 

To prioritise reliability, a replicated design is adopted, in which multiple instances of gateways, matching engines, and databases run simultaneously. Such an architecture improves resilience, as replicated components can take over when an individual component fails.

However, replication requires more resources, such as additional hosts and increased disk storage, because of the overhead involved in maintaining multiple synchronised datasets.

Addressing latency

Latency is a critical factor, especially for institutional players who engage in algorithmic trading and require swift order processing. Crypto exchanges and retail-centric trading platforms may operate comfortably with latencies ranging from 200-500 microseconds, often hosted in cloud environments for their cost-effectiveness and ease of setup. 

In contrast, institutional venues lean towards bare-metal installations with hardware acceleration to minimise latency further.

High-performance and high-reliability systems

The most demanding trading applications expect both stellar performance and robust reliability. To achieve this, state-of-the-art matching engines operate entirely in RAM, avoiding latency introduced by disk or solid-state drives.

For enhanced reliability, these systems use replication techniques, running multiple engine instances in parallel and employing consensus algorithms to ensure synchronised states across replicas. 

Throughput and scalability 

Exchanges must also be designed to handle sudden surges in trading activity, such as those seen during “black swan” events or market movements driven by social media.

Clusters of independent order processing units and strategies like horizontal scaling, where instrument lists are segmented and managed by individual engine instances, are deployed to ensure scalability and high throughput.
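One way to picture segmenting instrument lists across engine instances is a router that maps each symbol to a fixed engine. The sketch below is illustrative only: `shard_for` and `Router` are hypothetical names, and a hash-based assignment is an assumption; a real venue might instead assign instruments by a static table or by expected load.

```python
import zlib


def shard_for(symbol, num_engines):
    """Map an instrument symbol to an engine index.

    zlib.crc32 is stable across processes, unlike the built-in hash(),
    which is salted per interpreter run for strings.
    """
    return zlib.crc32(symbol.encode("utf-8")) % num_engines


class Router:
    """Forwards each order to the queue of the engine that owns its symbol."""

    def __init__(self, num_engines):
        self.num_engines = num_engines
        self.queues = [[] for _ in range(num_engines)]

    def route(self, symbol, order):
        idx = shard_for(symbol, self.num_engines)
        self.queues[idx].append((symbol, order))
        return idx


router = Router(num_engines=4)
first = router.route("EURUSD", {"side": "buy", "qty": 10})
second = router.route("EURUSD", {"side": "sell", "qty": 5})
# Both orders for the same symbol land on the same engine, so that engine
# sees the complete book for the instrument and needs no cross-engine locks.
```

A plain modulo assignment reshuffles most symbols when the engine count changes, so it suits a fixed fleet; resizing without disruption would need a different scheme.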

The consensus challenge

Maintaining consensus across distributed systems, especially under high loads, is a complex task. The Raft protocol is currently a leading option for achieving consensus across a matching engine cluster, in other words for ensuring that all engine replicas agree on the sequence of inputs.

In Raft, this involves electing a “leader” node responsible for propagating inputs, with a mechanism for electing a new leader if the current one fails, thus maintaining system consistency and reliability.
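The leader-and-majority idea can be sketched in a few lines of Python. This is a deliberately simplified illustration and not an implementation of Raft: it omits terms, heartbeats, randomised election timeouts, voting and log catch-up for lagging replicas. The classes `Replica` and `Cluster` are hypothetical.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.log = []      # ordered inputs accepted so far
        self.alive = True

    def append(self, index, entry):
        """Accept an entry only if it extends the log without gaps."""
        if not self.alive or index != len(self.log):
            return False
        self.log.append(entry)
        return True


class Cluster:
    def __init__(self, names):
        self.replicas = [Replica(n) for n in names]
        self.leader = self.replicas[0]
        self.committed = 0  # number of entries acknowledged by a majority

    def majority(self):
        return len(self.replicas) // 2 + 1

    def elect(self):
        """Simplification: pick the live replica with the longest log."""
        live = [r for r in self.replicas if r.alive]
        if len(live) < self.majority():
            raise RuntimeError("no quorum: cannot elect a leader")
        self.leader = max(live, key=lambda r: len(r.log))

    def propose(self, entry):
        """The leader sequences the input; it commits once a majority has it."""
        if not self.leader.alive:
            self.elect()
        index = len(self.leader.log)
        acks = sum(r.append(index, entry) for r in self.replicas)
        if acks >= self.majority():
            self.committed = index + 1
            return True
        return False


cluster = Cluster(["a", "b", "c"])
cluster.propose("order-1")         # all three replicas accept
cluster.replicas[0].alive = False  # the leader fails
cluster.propose("order-2")         # a new leader is chosen, two of three accept
```

With three replicas the cluster keeps accepting inputs after one failure, because two nodes still form a majority. After a second failure `propose` returns `False`, since an entry held by only one node cannot be treated as agreed.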

Persistence, recovery, and storage needs

Exchange venues often have to fulfil extensive reporting obligations, necessitating a system that stores event histories without impairing performance. Regular snapshots of the matching engine’s state complement a full event log, allowing for quick recovery and state resumption.

Additionally, separate storage solutions cater to the extensive querying needs without taxing the matching engine.
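The interplay between snapshots and the event log can be shown with a small sketch. It assumes a toy state (net traded quantity per symbol) and uses in-memory lists as stand-ins for durable storage; `JournaledEngine` and its methods are hypothetical names for illustration.

```python
import copy


class JournaledEngine:
    """Keeps state in memory; durability comes from a log plus snapshots."""

    def __init__(self):
        self.traded_qty = {}     # stand-in for engine state: symbol -> net quantity
        self.event_log = []      # append-only; stands in for durable storage
        self.snapshot = ({}, 0)  # (copy of state, number of events it covers)

    def apply(self, event):
        symbol, qty = event
        self.traded_qty[symbol] = self.traded_qty.get(symbol, 0) + qty

    def handle(self, event):
        self.event_log.append(event)  # record first, then mutate state
        self.apply(event)

    def take_snapshot(self):
        self.snapshot = (copy.deepcopy(self.traded_qty), len(self.event_log))

    @classmethod
    def recover(cls, snapshot, event_log):
        """Rebuild state from a snapshot, replaying only the later events."""
        engine = cls()
        state, covered = snapshot
        engine.traded_qty = copy.deepcopy(state)
        engine.event_log = list(event_log)
        engine.snapshot = snapshot
        for event in event_log[covered:]:
            engine.apply(event)
        return engine


live = JournaledEngine()
live.handle(("EURUSD", 10))
live.handle(("BTCUSD", 2))
live.take_snapshot()
live.handle(("EURUSD", -4))

restored = JournaledEngine.recover(live.snapshot, live.event_log)
# restored.traded_qty == {"EURUSD": 6, "BTCUSD": 2}
```

Recovery replays only the events recorded after the snapshot, which is why frequent snapshots shorten restart time. The full log stays available for reporting and can be copied to separate query storage.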

In conclusion, designing a matching engine that marries high performance with unwavering reliability is a complex yet achievable goal. It requires an understanding of how latency and throughput are connected: as an exchange grows in popularity, the throughput it must sustain increases, and to sustain that throughput the engineering team should aim for the lowest latency possible.

Other key technology considerations are state synchronisation alongside sophisticated replication and consensus strategies.

As the financial trading landscape continues to evolve, so too must the technological backbone that supports it, ensuring that trading venues can withstand the tests of both time and volume without sacrificing speed or stability.
