Context is everything: financial data ontology in an AI world
Shoutout to Palantir for making the word poppin’ again
I. Let's take it back
Think back to your sophomore-year Finance 101 class. Some things in that class were presented to you as facts: the inverse relationship between bonds and stocks, jobs growth leading to a rise in stock prices, etc. These were basic assumptions that, history has shown us time and time again, dictate the relationships between assets. But recently we've all noticed that some of these relationships don't always hold. For example, compare the average weekly returns of equity indices and gold when job numbers are increasing versus decreasing: the average return of the S&P 500 is higher than that of gold futures when jobs decrease week over week.
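For the curious, that kind of conditional-average analysis is a few lines of pandas. This is a sketch on synthetic data; the columns are hypothetical stand-ins for weekly S&P 500 and gold futures returns and a weekly jobs series:

```python
# Sketch: average weekly returns conditional on jobs rising vs. falling.
# The data here is synthetic; swap in real weekly returns and a jobs series.
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
weeks = 500
df = pd.DataFrame({
    "spx_ret": rng.normal(0.0015, 0.02, weeks),    # weekly S&P 500 returns
    "gold_ret": rng.normal(0.0008, 0.015, weeks),  # weekly gold futures returns
    "jobs": 150_000 + rng.normal(0, 50_000, weeks).cumsum(),  # jobs level
})

# Bucket each week by whether the jobs number rose or fell week over week
df["jobs_up"] = df["jobs"].diff() > 0
print(df.groupby("jobs_up")[["spx_ret", "gold_ret"]].mean())
```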
Well, this analysis, like a lot of the financial analysis in the news, lacks one crucial thing: context. You know who else lacks context? Large language models.
II. Why does every finance x AI startup go through the same use cases?
Go ahead and ask ChatGPT what Fed Funds futures are trading at as of today. As you can imagine, ChatGPT will lack both context and data. Plenty of Medium articles, LinkedIn posts, and YouTube tutorials will teach you how to use RAG to solve this problem, but the same underlying problem persists: the LLM has no idea why you're asking for Fed Funds futures. The pretraining step for most SOTA providers runs through much of the same financial literature available on the public web, and most of it is centered on the most common use cases: equity valuations, technical indicator calculations, DCF models, etc. Put simply, ChatGPT with no data at least knows why you're asking for NVIDIA's revenue numbers; it's not dumb. But for a Fed Funds futures curve, it has no idea what you're talking about.
Let's take building a Treasury yield curve as an example: taking the actively traded Treasuries, calculating their yields from traded prices, and tying them together into a curve is tough when the notion of the actively traded issue is constantly updated and never announced on a board. It's either inferred from liquidity or given outright by a data vendor. The same notion applies to building futures curves; you get the idea. The point is that LLMs by default lack asset-class context and relationships: the same market conventions we spent years learning on the street. Fine-tuning only exacerbates the problem through overfitting: your post-training examples exist only in a world where history's truths held, so when something unprecedented happens, the model may fail to flag it correctly. If you look at the correlation between the S&P 500 and 10-year Treasuries over the past nearly 24 years, you can see it for yourself.
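To see how much convention is baked into even this "simple" task, here's a minimal sketch of the mechanics, assuming you already know which issues are on the run. The issues, coupons, and prices below are hypothetical placeholders; deciding which bonds actually count as actively traded is exactly the step the code can't do for you.

```python
# Minimal sketch: building a rough Treasury yield curve from traded prices.
# All issues and prices are hypothetical placeholders.
import numpy as np
from scipy.optimize import brentq

def ytm(price, coupon, maturity, face=100.0, freq=2):
    """Solve for the yield that reprices a coupon bond at its traded price."""
    n = int(maturity * freq)   # number of coupon periods
    c = coupon * face / freq   # coupon payment per period
    def pv_minus_price(y):
        t = np.arange(1, n + 1)
        return np.sum(c / (1 + y / freq) ** t) + face / (1 + y / freq) ** n - price
    return brentq(pv_minus_price, 1e-9, 1.0)  # search yields between ~0% and 100%

# Hypothetical on-the-run issues: (coupon, maturity_years, traded_price)
issues = [(0.045, 2, 99.10), (0.044, 5, 98.25), (0.0425, 10, 96.80), (0.045, 30, 94.50)]
curve = {maturity: ytm(price, coupon, maturity) for coupon, maturity, price in issues}

# Linearly interpolate between the quoted tenors to fill out the curve
tenors = np.array(sorted(curve))
yields = np.array([curve[t] for t in tenors])
dense_yields = np.interp(np.linspace(2, 30, 57), tenors, yields)
print({t: round(y, 4) for t, y in curve.items()})
```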
III. Ok… so what?
Why does all this matter? Well, people are getting unhappy with GenAI x finance apps, and I agree. We went through years of finance experience in the market, yet we still have to teach LLMs the basics. The problem is that the things LLMs struggle with the most are what we learned on the job, not in the textbooks: market convention is a nuance that can't simply be figured out by RAG.
But it can be figured out by something just as simple: data ontology, defined as a formal representation and classification of concepts, entities, and their relationships within a specific domain or system, providing a shared vocabulary and structure for organizing and understanding data. Recently regaining momentum thanks to Palantir, data ontology is making its way through all the data engineering blogs as the foundation for how AI agents should reason about their decision-making. The assumed relationship between gold and the dollar can be backed up by data, but it's also a market convention.
Ontology makes sense in a world where you understand the nuance. Macroeconomic datasets are the best example of this: a country can both issue debt and run a trade surplus or deficit, but it's also a domicile for equities and home to the exchanges where those equities trade.
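As a toy illustration (the entity and relation names here are made up, not any vendor's actual schema), an ontology can start as a set of typed triples that let an agent see every role an entity plays:

```python
# Illustrative only: a toy financial ontology as (subject, relation, object) triples.
from collections import defaultdict

triples = [
    ("Japan", "issues", "JGB 10Y"),
    ("Japan", "runs", "trade surplus"),
    ("Japan", "domiciles", "Toyota"),
    ("Toyota", "trades_on", "Tokyo Stock Exchange"),
    ("JGB 10Y", "is_a", "sovereign bond"),
    ("Toyota", "is_a", "equity"),
]

# Index by subject so an agent can ask "what roles does Japan play?"
by_subject = defaultdict(list)
for subject, relation, obj in triples:
    by_subject[subject].append((relation, obj))

print(by_subject["Japan"])
# [('issues', 'JGB 10Y'), ('runs', 'trade surplus'), ('domiciles', 'Toyota')]
```

The point is that "Japan" is not one thing: it's an issuer in a rates workflow, an economy in a macro workflow, and a domicile in an equities workflow, and the ontology is what tells the agent which role applies.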
IV. Ah…so how does Avanzai fit into all this?
This is probably the part where you think I'm going to say Graph RAG can solve it all. Nah, I won't do that. What I will do is explain where Avanzai fits into all of this.
A workflow is simply a set of tasks, where each step requires nuance and context for the action taking place. Where Avanzai can help, though, is when you're building a workflow (universe screen, factor exposure, etc.): building an ontology of the relationships between your datasets is crucial for giving your agents an understanding of that nuance.
Take calculating factor exposure: determining the metric alone requires nuance, as the LLM needs to understand where to get the data from (quality requires debt-to-equity data), how to calculate it (pull debt data, pull equity data, compute the ratio), and then how to run a regression between that factor and your portfolio data. As you can imagine, a lot can be lost between each step; a sketch of the three steps is below. Avanzai lets you define the relationships between each agent's task and how they fit into the overall context of the workflow.
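Here's a minimal sketch of those three steps on synthetic data. Everything below is a hypothetical stand-in for what an agent would pull from a vendor, and the regression is plain OLS, not anything Avanzai-specific:

```python
# Sketch of the factor-exposure workflow: fetch data, compute the metric, regress.
# All inputs are synthetic placeholders.
import numpy as np

# Steps 1-2: pull debt and equity data, compute debt-to-equity (a quality proxy).
total_debt = np.array([50.0, 120.0, 30.0])    # hypothetical, $bn per company
total_equity = np.array([100.0, 80.0, 90.0])  # hypothetical, $bn per company
debt_to_equity = total_debt / total_equity    # in practice you'd rank a universe
                                              # on this to build the factor portfolio

# Step 3: regress your portfolio's returns on the factor's returns (OLS).
rng = np.random.default_rng(0)
factor_returns = rng.normal(0.0002, 0.01, 250)                    # daily, synthetic
portfolio_returns = 0.6 * factor_returns + rng.normal(0, 0.005, 250)

X = np.column_stack([np.ones_like(factor_returns), factor_returns])
(alpha, beta), *_ = np.linalg.lstsq(X, portfolio_returns, rcond=None)
print(f"quality beta (factor exposure): {beta:.2f}")  # ~0.6 by construction
```

Each of those steps is a separate agent task, and the hand-off between them is where the context gets lost.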
Soon we'll have a full demo of the new Avanzai and how we plan on letting you automate investment operations by defining your own context, relationships, and nuance between agents as they complete your workflow, saving you tons of time and letting you dictate how each AI agent solves a crucial task within the context of your end goal. Subscribe below to stay tuned!