Machine Learning in Finance

“Replicants are just like any other machine: they’re either a benefit or a hazard. If they’re a benefit, it’s not my problem.”

If you’ve spent any time worrying about whether robots are going to take over the world, you’ve probably come across the seminal 1982 film Blade Runner, in which Harrison Ford – as Rick Deckard, hardboiled detective – makes the remark above about the human-like machines that alternately serve or torment his post-apocalyptic Earth. A shoutout to the 2017 sequel, Blade Runner 2049, too.

In our own 2024 Earth, the proliferation of wide-ranging applications of artificial intelligence is about to redefine our patterns of work, interaction, and organisation – and we don’t quite know yet if it will be a benefit or a hazard.


For readers of this newsletter, one of the most significant areas to be impacted by this development is, of course, finance. Algorithmic trading has been deeply invested (excuse the pun) in machine learning for years, if not decades. Companies like The D. E. Shaw Group and Renaissance Technologies LLC have likely been working on AI techniques as far back as the 1990s, and many other firms are now keen to hire the best AI talent from leading firms like OpenAI, Google DeepMind, and Anthropic.

In 2024, algo trading is expected to be a $16bn industry, and with NVIDIA aiming for a 1,000,000X computing speedup across the stack in the next decade, it’s clear that we’re on the cusp of a huge transformation for this sector. Yet as AI models take up an increasingly significant proportion of market activity, there is a risk that they will adapt to each other’s behaviour, leading to collusive outcomes that are not well understood even by the creators of those models – just one possible danger in a whole unpredictable set of them.

It’s always helpful to refresh on a few key definitions. Artificial intelligence refers to the broad area of making computers think like humans. This includes machine learning, but also other approaches like rule-based methods (i.e., if X condition is true, then give Y response). Machine learning is a set of techniques emerging from statistics, whereby a computer discerns patterns in sets of data, and works out predictions or classifications based on those patterns. The structure and specific content of that data will determine which ML technique you use to model it.
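The contrast between the two approaches can be sketched in a few lines of Python – the numbers below are entirely made up, purely for illustration:

```python
# A rule-based system hard-codes the condition; a (very) minimal "learned"
# approach estimates its threshold from data instead. Hypothetical returns.
past_returns = [-0.02, 0.01, 0.03, -0.01, 0.02, 0.04]

# Rule-based: a human chose the threshold by hand.
def rule_based_signal(r):
    return "buy" if r > 0.02 else "hold"

# "Learned": the threshold comes from historical data (here, just the mean).
learned_threshold = sum(past_returns) / len(past_returns)

def learned_signal(r):
    return "buy" if r > learned_threshold else "hold"

print(rule_based_signal(0.015), learned_signal(0.015))  # -> hold buy
```

The rule-based version encodes a human’s judgement directly; the learned version derives its threshold from the data – which is the essence of the ML approach, even if real models learn far richer structure than a single mean.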

A further subset of techniques is, of course, deep learning, whereby data is passed through a neural network with three or more layers (essentially stacked groups of functions). We also have reinforcement learning, in which there is no predefined dataset, merely a problem and an agent that carries out tasks over and over; desirable responses receive a reward that reinforces a certain trend of behaviour, until this becomes a clearly defined policy that can be applied in live environments.
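As a toy illustration of that reward-reinforces-behaviour loop, here is a minimal epsilon-greedy bandit in Python – a hypothetical two-armed problem, not a trading example:

```python
import random

random.seed(0)

# Toy two-armed bandit: arm 1 pays off far more often than arm 0.
true_payoff = [0.2, 0.8]

def pull(arm):
    return 1.0 if random.random() < true_payoff[arm] else 0.0

values = [0.0, 0.0]   # running reward estimate per arm
counts = [0, 0]
epsilon = 0.1         # fraction of the time spent exploring

for _ in range(2000):
    if random.random() < epsilon:
        arm = random.randrange(2)                      # explore
    else:
        arm = max(range(2), key=lambda a: values[a])   # exploit
    reward = pull(arm)
    counts[arm] += 1
    values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean

policy = max(range(2), key=lambda a: values[a])
print(policy)  # the learned policy: favour the better-paying arm
```

After enough trials, the rewards have reinforced pulling the better arm – a clearly defined policy that emerged with no predefined dataset, only trial, error, and reward.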


The range of applications for machine learning in finance is immense, and due to the secrecy of the industry, many of the possible scenarios are not well known to outsiders. In the simplest form, quantitative researchers apply basic machine learning algorithms like linear regression (basically drawing a line through points on a graph) to discern trends in the movement of stock prices. This gives them a signal that they can build into a trading strategy. A more niche area is market microstructure: when you send a trade to the market, its appearance on the order book might cause a change in the price, negatively affecting the performance of your overall strategy.
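A minimal sketch of that simplest form – fitting a line through hypothetical closing prices with closed-form least squares – might look like this:

```python
# Ordinary least squares in closed form: fit price = slope * t + intercept.
prices = [100.0, 101.5, 101.2, 102.8, 103.5, 104.1, 105.0]  # hypothetical closes
t = list(range(len(prices)))

n = len(prices)
mean_t = sum(t) / n
mean_p = sum(prices) / n

# slope = covariance(t, price) / variance(t)
slope = sum((ti - mean_t) * (pi - mean_p) for ti, pi in zip(t, prices)) / \
        sum((ti - mean_t) ** 2 for ti in t)
intercept = mean_p - slope * mean_t

# A crude trend signal from the fitted slope.
signal = "long" if slope > 0 else "flat"
print(round(slope, 3), signal)
```

In practice the fitted slope would be just one feature feeding into a strategy, not a strategy by itself – but the mechanics of “drawing the line” are exactly this.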

Consequently, trading firms spend a lot of time understanding the dynamics of fluctuating prices at a micro level – a research problem with an immense amount of data, which is well-suited to machine learning.

Another interesting problem area is NLP and sentiment analysis. Natural language processing applies ML to textual data, and modern NLP models rely on a fascinating technique known as the attention mechanism (working out which part of the input to focus on). Sentiment analysis means understanding what people are saying about a particular stock or other security, and extracting predictions for future trades accordingly. NLP allows for very large-scale sentiment analysis: one data firm, for example, takes in 500 million tweets daily, filters out 90% of them to eliminate “noise from duplicates and spam re-tweets”, computes an average sentiment for each remaining tweet, and aggregates these into a sentiment score for a given stock.
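A heavily simplified sketch of such a pipeline – with a made-up lexicon and made-up tweets – could look like the following (real systems use learned models rather than word lists):

```python
# Hypothetical mini-pipeline: filter noisy tweets, score the rest with a tiny
# sentiment lexicon, and aggregate into a per-stock sentiment score.
POSITIVE = {"beat", "surge", "upgrade", "bullish"}
NEGATIVE = {"miss", "plunge", "downgrade", "bearish"}

tweets = [
    ("ACME", "earnings beat, analysts upgrade"),
    ("ACME", "earnings beat, analysts upgrade"),   # duplicate -> filtered out
    ("ACME", "guidance miss, looking bearish"),
    ("ACME", "RT spam spam spam"),                 # spam re-tweet -> filtered out
]

def is_noise(text, seen):
    return text in seen or text.startswith("RT ")

def score(text):
    words = set(text.replace(",", "").split())
    return len(words & POSITIVE) - len(words & NEGATIVE)

seen, per_stock = set(), {}
for ticker, text in tweets:
    if is_noise(text, seen):
        continue
    seen.add(text)
    per_stock.setdefault(ticker, []).append(score(text))

# Aggregate per-tweet scores into a single sentiment figure per stock.
sentiment = {tick: sum(s) / len(s) for tick, s in per_stock.items()}
print(sentiment)  # -> {'ACME': 0.0}
```

Here the positive and negative tweets cancel out to a neutral score – the filtering step matters, since counting the duplicate would have biased the result upwards.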


In terms of the adoption of AI/ML across the industry, again the level of secrecy makes it difficult to know who is doing what. Nonetheless, the above-mentioned companies like DE Shaw, Renaissance Technologies, and Two Sigma (one of the world-leading firms in AI for finance) as well as other US firms like PDT Partners and The Voleon Group, have been engaged in this research for a long time.

Citadel and Citadel Securities, of course, do everything under the sun to maintain Ken Griffin’s dominance over competitors, while other hedge funds like WorldQuant and Balyasny Asset Management L.P. are further integrating ML into their approach. The HFT space has a particular interest in deep learning, because it is well suited to the gargantuan datasets produced when trading at subsecond speeds, and even a scroll through LinkedIn demonstrates that firms like Jump Trading Group have an interest in AI specialists. The most famous example of a successful ML-oriented HFT firm is surely XTX Markets, which has made owner Alex Gerko the UK’s largest taxpayer. Otherwise in the UK, G-Research is perhaps the largest AI-oriented trading firm, though its activities are shrouded in secrecy, maintained by NDAs and a complex corporate structure.


When thinking about machine learning, it’s easy to get preoccupied with exotic-sounding algorithms and neural network architectures, but in fact, the most important thing to grasp is the data itself. Preparing a dataset for analysis is often the most time-consuming part of an ML researcher’s job. Missing values, unidentified outliers, inconsistent formatting, and irrelevant data are significant problems that can leave an ML model uselessly inaccurate. Some researchers may end up spending more than half their time addressing such issues, with only a minority of their workday allocated to the much more fun and exciting job of finding useful patterns in the data. One of the compelling reasons to join firms like G-Research or DE Shaw is that they usually have vast repositories of clean, ready-to-use data, so incoming quants have the luxury of spending most of their day on interesting problems – hunting for the 1%+ improvements that will benefit the company’s trading strategies.
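A tiny example of the kind of cleaning involved – handling missing values, an inconsistent decimal separator, and an obvious outlier in hypothetical price ticks:

```python
# Hypothetical raw price ticks exhibiting the problems described above:
# a missing value, a European-style decimal comma, and a wild outlier.
raw = ["101.2", "101,4", None, "101.5", "999999", "101.3"]

def clean(values, lo=0.0, hi=10_000.0):
    out = []
    for v in values:
        if v is None:                      # drop missing values
            continue
        x = float(v.replace(",", "."))     # normalise the decimal separator
        if not (lo <= x <= hi):            # drop out-of-range outliers
            continue
        out.append(x)
    return out

prices = clean(raw)
print(prices)  # -> [101.2, 101.4, 101.5, 101.3]
```

Real cleaning pipelines are vastly more involved (and the outlier bounds here are an arbitrary assumption), but even this sketch shows why a model trained on the raw column would be hopelessly skewed by that single bad tick.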

Another crucial part of the process is, of course, the GPU. Graphics Processing Units are commonly employed for ML computations due to their capacity to run thousands of operations in parallel – far more than the handful of cores in a typical CPU. This is essential when dealing with vast ML problems – think about the difference between completing a modelling task in minutes rather than days or weeks. Time is money for every kind of business, but perhaps none more so than for trading firms, especially those running high-frequency strategies, where the smallest improvement in speed can give them a significant advantage.
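The underlying idea is data parallelism: a big job split into independent chunks that can be processed concurrently. The sketch below illustrates the structure on a CPU with a thread pool – a GPU applies the same principle across thousands of cores, at vastly greater scale:

```python
from concurrent.futures import ThreadPoolExecutor

# An elementwise task over a large array: square 100,000 numbers. Each chunk
# is independent of the others, so the chunks can be processed concurrently --
# the same data-parallel structure a GPU exploits across thousands of cores.
data = list(range(100_000))

def square_chunk(chunk):
    return [x * x for x in chunk]

n_workers = 4
size = len(data) // n_workers
chunks = [data[i * size:(i + 1) * size] for i in range(n_workers)]

with ThreadPoolExecutor(max_workers=n_workers) as pool:
    results = pool.map(square_chunk, chunks)   # preserves chunk order

squared = [x for part in results for x in part]
print(len(squared), squared[:3])  # -> 100000 [0, 1, 4]
```

(Python threads won’t actually speed up this CPU-bound loop, because of the interpreter lock – the point is the shape of the decomposition, not the timing; in ML practice the chunks are tensor operations dispatched to GPU hardware.)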

No wonder, then, that XTX Markets boasts of having 20,000 GPUs – more than Tesla and perhaps just behind Meta. This report by State of AI offers a very interesting insight into the firms holding massive numbers of GPUs as of 2023, though, like everything else in the area of cutting-edge finance, it’s probably quite incomplete: we have no idea how many GPUs are employed by Two Sigma, DE Shaw, and others. We do know of at least one quant firm that has recently placed an order with NVIDIA running into the hundreds of millions of dollars, and it is hard to imagine that they are alone.

Another concept, less well publicised outside of the specialist community but of no less importance, is explainability. Advanced machine learning models are often “black boxes”, operating at such a level of complexity that even their creators may be unsure why certain outputs are being delivered. Just last month, the Bank of England warned that AI developments could pose dangers to the UK’s financial stability, in part because the reliability of these “black box” outputs is uncertain. This, coupled with the herding behaviour that would result from widespread adoption, could push the economy in a negative direction before financial professionals can properly understand what is happening. Another study points out that a lack of explainability in AI models makes it difficult to tell whether those models are compliant with regulations.

But what exactly is explainability? It incorporates both the ability of a system to reveal its inner workings and the capability of humans to understand the factors and knowledge contained within that system. This is somewhat different to interpretability (which involves tracking the exact workings of a model through each step); explainability is concerned more with post-hoc understanding of why a result was achieved. The Alan Turing Institute, for example, is working on a research project on explainability methods for financial applications, because such insight “into model behaviour is essential for safety-critical applications (e.g. finance, healthcare)”. It may be that computers think faster than us, but we still need to know how they’re thinking.
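One simple post-hoc explainability technique is permutation importance: shuffle one input feature at a time and measure how much the model’s error grows. A self-contained sketch, on a made-up “black box” that (secretly) only uses one of its two features:

```python
import random

random.seed(1)

# Toy data: two features, but the target depends on feature 0 only.
X = [[random.random(), random.random()] for _ in range(200)]
y = [2.0 * x0 for x0, _ in X]

def model(row):
    # Treat this as a black box whose behaviour we want to explain.
    return 2.0 * row[0]

def mse(X, y):
    return sum((model(r) - t) ** 2 for r, t in zip(X, y)) / len(y)

baseline = mse(X, y)

# Permutation importance: shuffle each feature column and record the
# increase in error. An important feature hurts accuracy when scrambled.
importance = []
for j in range(2):
    shuffled = [row[:] for row in X]
    col = [row[j] for row in shuffled]
    random.shuffle(col)
    for row, v in zip(shuffled, col):
        row[j] = v
    importance.append(mse(shuffled, y) - baseline)

print(importance)  # feature 0's importance dwarfs feature 1's
```

Scrambling feature 0 destroys the model’s accuracy, while scrambling feature 1 changes nothing – so, without ever opening the black box, we learn which input actually drives its decisions.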


One particularly interesting intersection for readers of this newsletter may be that between AI and DeFi. Whether it is for rapid credit scoring and risk assessment, fraud detection, or automating the creation of smart contracts, machine learning tools are a way to massively upgrade the performance of DeFi activities. Equally, however, that brings some dangers: as we have already seen with prompt injection attacks on ChatGPT, AI models are vulnerable to adversarial attacks, which is of some concern in an ecosystem built around the idea of security.


Artificial intelligence is just like any other technology: it’s either a tool or a weapon. If it’s a tool, we don’t have a problem. Yet the whole history of technology is one of pushing boundaries and seeking competitive advantage, and it is that very urge to get ahead, to find a more powerful technical solution, that can push us into an unknown environment, populated with both unimaginable opportunities and dangers.

Once again, we find that Marshall McLuhan was right: the consequences of a new medium are always far greater than the specific content of that medium. We can be certain only of one thing: talented technicians in industries like finance will always strive to leverage new developments, be it AI or anything else, to build ever-more efficient systems.

Written by Calvin Duffy

📅 This Week in Crypto 📅

After a brutal crypto winter in 2022, the resurgence that swept through the crypto asset class in 2023 was more than refreshing. With signs that a bull market might be looming ahead, 2024 is shaping up to be a year for the history books: It’s still Bitcoin’s year, Ethereum is overdue for a run, and the Bitcoin renaissance continues.

SBF, who once ran one of the world’s biggest cryptocurrency exchanges and is facing decades in jail, will not face another trial, US prosecutors say. The 31-year-old was found guilty of fraud and money laundering last month. Prosecutors said the “strong public interest” in a resolution of their case against the former billionaire outweighed benefits of a second trial. He had faced six charges that had been separated from his first trial.

This guide explores the best-performing DeFi crypto assets with high liquidity and great communities. The DeFi total value locked (TVL) rallied to a whopping $179 billion in November 2021 and dropped significantly through 2022–23, but a recent market-wide rally has helped DeFi projects gain momentum again. Per DefiLlama, the total DeFi TVL has been rising constantly since mid-October 2023, reaching $52 billion. Before deciding which coins are the best DeFi tokens to buy, do your research and consider the risks.