FPGAs and the future of high-frequency trading technology

How has client demand for ultra-low latency FPGA solutions evolved in recent years?

John Courtney: Ultra-low latency solutions are a specific part of the fintech trading market, where a special technology solution is co-located at the exchange. The consumers of this technology are generally electronic market makers and proprietary trading firms seeking a competitive advantage in trade execution. We are, however, seeing a change in the asset classes that companies are trading. Traditionally, it has been equities and options, but increasingly firms are trading foreign exchange and cryptocurrencies in this marketplace.

Ultra-low latency FPGA solutions can optimize the full trading pipeline that high-frequency traders use, but there are additional advantages in pre-processing market data to filter out irrelevant information before it reaches the trading algorithm. FPGAs can provide inline processing at network speeds, filtering data with virtually no lag time, so compute resources can execute the trading algorithm faster and more efficiently.
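As a rough illustration of the pre-filtering idea, the Python sketch below drops market data messages for symbols the strategy does not trade before they reach the algorithm. It is only a software analogy: on an FPGA this logic would run in hardware, inline with the network feed. The message layout, the `filter_relevant` helper, and the symbol names are all hypothetical and do not represent any exchange protocol or AMD API.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Iterator


@dataclass(frozen=True)
class MarketDataMessage:
    """Hypothetical, simplified market data update (not a real feed format)."""
    symbol: str
    price_cents: int
    size: int


def filter_relevant(
    messages: Iterable[MarketDataMessage],
    watchlist: FrozenSet[str],
) -> Iterator[MarketDataMessage]:
    """Yield only the messages whose symbol is on the watchlist."""
    for msg in messages:
        if msg.symbol in watchlist:
            yield msg


# Made-up feed: only AAA and CCC matter to this imaginary strategy.
feed = [
    MarketDataMessage("AAA", 10_000, 200),
    MarketDataMessage("BBB", 5_050, 100),
    MarketDataMessage("CCC", 7_525, 300),
    MarketDataMessage("DDD", 1_210, 50),
    MarketDataMessage("AAA", 10_001, 150),
]
watchlist = frozenset({"AAA", "CCC"})

relevant = list(filter_relevant(feed, watchlist))
relevant_symbols = [msg.symbol for msg in relevant]
print(relevant_symbols)
```

In this sketch the trading algorithm would only ever see three of the five messages, which is the sense in which filtering frees compute resources for the decision itself.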

Another consideration for traders is the many regulations they must follow. The trading solution – an ultra-low latency card in this case – must comply with the relevant trading regulations and provide accurate, up-to-date records in a compliant manner. This increases the processing and latency burden for a sector that’s extremely sensitive to trade execution performance. Additionally, we’re now seeing machine learning (ML) on these cards, further increasing the computing demand.

Micheal McGuirk: In addition, as the arms race to be the fastest-performing technology continues, we’ve seen an increase in HFT firms shifting away from CPUs to FPGAs. In situations where every nanosecond counts, you can do inline trades on FPGAs at the nanosecond level, which is not quite possible for CPUs.

What are the driving factors behind this?

Courtney: One factor is that firms are simply trading higher volumes of stocks, but there is also a growing number of retail investors, placing orders via brokerages, who require ultra-low latency trade execution. This trend is a potential target for HFT and a growth driver for ultra-low latency (ULL) technology.

McGuirk: With that growth, there’s a need to stay competitive. Anyone in this space needs to review all their technology options and, importantly, make sure they stay ahead of the curve. This continues to drive the need for new and faster technology in this space.

What role is AI playing in this space and how is this developing?

Courtney: Initially, AI is enabling the discovery and development of better trading algorithms. That’s typically by recognizing patterns and discovering features in the data streams which can be manipulated at the trading level. But increasingly, it’s become possible to attach an ML processing solution directly to the pipeline on the trading card. That can assist the trading or it can even perform the buy/sell decision.

McGuirk: If you think about it from the perspective of everything going towards lower latency, we’re reaching the point where we can’t get any lower in latency from a hardware perspective. If you can’t get any faster, can you get smarter? That’s where AI comes into the trading pipeline to help you make smarter decisions and give you the advantage, particularly when you’re already operating at maximum speed.

How do demand and use cases for ultra-low latency trading differ between firm types, e.g. asset managers vs. hedge funds?

Courtney: Not everybody uses low latency trading, and the question might be why? Asset managers typically focus on asset growth, stability and wealth preservation, so they are less focused on low latency trading. Hedge funds and proprietary trading firms are the opposite. They do engage in low latency HFT, where technology choices and the associated investment will determine returns and overall competitive advantage. They’re looking for gaps in the market where they can essentially insert themselves in the middle of the transaction and make a profit. An example is latency arbitrage, where traders look for a price difference between two exchanges, allowing them to buy at the lowest price on one exchange and sell to the highest bidder on the other exchange before prices are aligned. Different firms use different strategies linked to their trading technology.
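The latency arbitrage example can be made concrete with a small Python sketch. It checks whether the best bid on one exchange is above the best ask on the other, which is the brief window a fast trader would try to capture. The `Quote` type, the `arbitrage_edge` function, the exchange names, and the prices are all hypothetical, and the sketch ignores fees, order sizes, and the risk that prices move before both orders fill.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Quote:
    """Hypothetical top-of-book quote; prices in cents to avoid float rounding."""
    exchange: str
    bid_cents: int  # highest price a buyer on this exchange will pay
    ask_cents: int  # lowest price a seller on this exchange will accept


def arbitrage_edge(a: Quote, b: Quote) -> Optional[Tuple[str, str, int]]:
    """Return (buy_exchange, sell_exchange, edge_cents), or None if no gap exists."""
    if b.bid_cents > a.ask_cents:
        # Buy at a's ask, sell at b's higher bid.
        return (a.exchange, b.exchange, b.bid_cents - a.ask_cents)
    if a.bid_cents > b.ask_cents:
        # Buy at b's ask, sell at a's higher bid.
        return (b.exchange, a.exchange, a.bid_cents - b.ask_cents)
    return None


# Made-up quotes: exchange Y has already moved up, exchange X has not yet caught up.
quote_x = Quote("X", bid_cents=10_000, ask_cents=10_002)
quote_y = Quote("Y", bid_cents=10_005, ask_cents=10_007)

opportunity = arbitrage_edge(quote_x, quote_y)
print(opportunity)

# Once prices align, the gap disappears.
aligned = arbitrage_edge(quote_x, Quote("Y", bid_cents=10_000, ask_cents=10_002))
print(aligned)
```

Because the gap closes as soon as the slower exchange updates, whoever detects it and sends both orders first captures the edge, which is why execution latency matters so much for this strategy.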

How can an ultra-low latency FPGA solution landscape be further developed through open APIs and interoperability?

Courtney: At AMD we supply the FPGA cards and there’s an ecosystem of vendors who provide other technology solutions to trading firms. One example is APIs and interoperability solutions which are important for firms interacting with compliance platforms.

All trading solutions consist of trading algorithms and networking. To make a trade you must send trade orders to the exchange, and that’s a networking function. You do have partners or vendors in the ecosystem who will supply network protocol solutions so traders can focus on their core business of trading. Traders don’t need to focus on networking technologies. This is something they should use an outside partner for, as these solutions offer standard interfaces for networking functions.

McGuirk: As competitive as the trading industry is, the main players want to focus on how to add value. The partner ecosystem is primarily where this is done, as well as at conferences where the industry gets together to discuss the advancement of the technology. While it is competitive, there’s general interest in moving the technology along and encouraging interoperability between technologies and solutions.

How does the latest AMD FPGA fit into a trading firm’s overall technology stack, particularly in terms of integration with existing trading systems, data feeds, and software applications?

Courtney: Our latest FPGA, the AMD Alveo™ UL3422 FinTech accelerator, offers two use cases. The first is where the financial institution is already engaged in HFT. The Alveo cards are built on a 16-nanometer silicon node, which is well known and widely deployed, and are designed to fit seamlessly into existing server and trading stacks. The main innovation in this card was around the transceivers and reducing the latency by up to 7X over the previous generation AMD FPGA FinTech accelerator¹, which is not easy to do in a single generation.

We also introduced an ultra-low latency integrated Ethernet MAC as part of the solution, which is a standard IEEE networking function.

The second use case is more nuanced—CPU offload. If you are not already an FPGA user, there’s a well-trodden route you can still go down. You can gradually offload your software stack onto hardware. There is value in doing the simple protocol operations in an FPGA to offload your CPU, and that would be the place to start. There’s a big ecosystem of partners AMD works with for traders who want to migrate from software-based trading to AMD hardware acceleration.

McGuirk: FPGAs perhaps have the perception that they’re very difficult to use, and that’s because if you’re not familiar with the technology, you do need FPGA developers to implement the trading algorithm (or portions of it) in hardware. We do, however, have many ecosystem partners that can ease that transition with pre-built trading frameworks and FPGA IP to accelerate the design flow.

Given the diverse compute technologies available for algorithmic trading, such as CPUs and high-speed NICs, how does one choose a technology platform and where does FPGA trading fit in this spectrum of solutions?

Courtney: Software-based solutions are by far the easiest and most flexible to deploy and are low cost. But software-based solutions also have limited performance when compared to hardware accelerators. Even if you have highly optimized software solutions with a high-speed NIC, the latencies are in the milliseconds (compared to the nanoseconds of hardware accelerators). Additionally, as the CPU becomes busy with processing, you get poor repeatability and scalability, and that’s unavoidable. If you want to get the best performance and the rewards from winning the low latency race, you have to invest in a hardware solution. FPGA card-based trading is the most flexible way of doing that.

The alternative is to use a full ASIC solution, but that is a very large investment in both time and money, with limited flexibility as modern trading algorithms evolve. AMD has developed an FPGA solution to the point where it stays competitive with an ASIC solution. It gives you ASIC-like hardware-accelerated performance, relatively short development times, and total flexibility. If you need to change your trading algorithm quickly, you can do that immediately.

McGuirk: At AMD, we’re lucky to be purveyors of all these types of technologies. Most customers are choosing CPUs and NICs for the applications where they offer the best cost-performance benefit, and choosing FPGAs where they want to compete in the ultra-low latency race. It’s important to look at your overall trading strategy and what applications you want to run and where, and to choose the appropriate technology across CPUs, NICs, FPGAs and GPUs as well.

1: Based on an AMD comparison of simulated latency in February 2024, using the Synopsys VCS 2019.06-SP2 ultra-low latency Virtex UltraScale+ VU2P GTF transceivers versus GTY transceivers (ALV-15).
