Building a multi-agent AI financial analyst using LlamaIndex

FYI: this is basically the original Avanzai code that I applied to YC with (and got denied).

Guillermo Malena
October 09, 2024

I. Introduction

When I started building Avanzai full-time in April, I was captivated by the idea of multiple AI agents collaborating to answer financial queries. After several iterations to perfect the UX and ICP (more on that next week), I'm excited to present the initial version today. This is basically the code I used for the initial demos of Avanzai. While speaking to users, I realized that although this code is very helpful, it is not enough to build a scalable product or a venture-backed business (frankly, that could be its own blog post). While many Medium articles and YouTube videos cover RAG, function calling, and structured outputs, there's a lack of documentation on architectures that combine these techniques. That's what we'll build here, using Python.

This post will guide you through creating a Financial AI Assistant using LlamaIndex, Tavily, and OpenAI's GPT-4o models. We'll focus on a scenario where an investor evaluates portfolio performance, tracks financial news, and compares results with FRED macroeconomic indicators.

We'll compare a simple "chat with my financial data" RAG setup to a complex multi-agent framework, showcasing how advanced AI can offer deeper insights for individual investors. Here's a high-level flow of our project:

II. Installing Required Python Packages

Before we dive into building our Financial AI Assistant, we need to set up our development environment. The following code block imports the packages we'll use: pandas and yfinance for market data, fredapi for FRED series, the Tavily client for news search, and LlamaIndex with OpenAI for the agents. Each of these needs to be installed in your environment first.

import os
from datetime import datetime, timedelta

import pandas as pd
import yfinance as yf
from fredapi import Fred
from openai import OpenAI
from tavily import TavilyClient

from llama_index.agent.openai import OpenAIAgent
from llama_index.core import VectorStoreIndex
from llama_index.core.llms import ChatMessage
from llama_index.core.objects import ObjectIndex, SimpleToolNodeMapping
from llama_index.core.tools import FunctionTool, QueryEngineTool, ToolMetadata
# Aliased so it does not clash with the openai.OpenAI client imported above
from llama_index.llms.openai import OpenAI as LlamaOpenAI

III. Downloading S&P500 tickers and pricing data

To analyze portfolio performance, we first need market data. In this section, we'll scrape the list of S&P 500 constituents (tickers, industry, etc.) from Wikipedia with pandas, then use the yfinance library to download their historical closing prices.

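url = 'https://en.wikipedia.org/wiki/List_of_S%26P_500_companies'
tables = pd.read_html(url)
spx500 = tables[0]  # the first table on the page lists the current constituents
# Yahoo Finance writes share classes with a dash (BRK-B), Wikipedia with a dot (BRK.B)
spx500_tickers = spx500['Symbol'].str.replace('.', '-', regex=False).to_list()
spx500_tickers[:5]

# Daily closing prices: one column per ticker, indexed by date
pricing_data = yf.download(spx500_tickers, start='2020-01-01', end='2024-10-08')['Close']
pricing_data.index.rename('date', inplace=True)

pricing_data.tail(5)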

IV. Bring your API keys

We'll be using APIs from OpenAI, Tavily, and FRED, so this is where you set your API keys to authorize those requests.

os.environ["OPENAI_API_KEY"] = ''
os.environ["FRED_API_KEY"] = ''
os.environ["TAVILY_API_KEY"] = ''
os.environ["ANTHROPIC_API_KEY"] = ''  # not used by the snippets shown in this post
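
If you'd rather not paste keys into the notebook, one option is to export them in your shell and fail early when one is missing. This is a minimal sketch, not part of the original Avanzai code; `require_env` and `load_api_keys` are hypothetical helper names introduced here for illustration.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise if it is unset or blank."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Missing environment variable: {name}")
    return value


def load_api_keys(names=("OPENAI_API_KEY", "FRED_API_KEY", "TAVILY_API_KEY")) -> dict:
    """Collect the keys this post relies on into a dict, failing on the first missing one."""
    return {name: require_env(name) for name in names}
```

Calling `load_api_keys()` at the top of the notebook surfaces a missing key immediately instead of halfway through an agent run.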

V. Utility functions for creating an agent using LlamaIndex

Our specialized agents share some common setup, which we can encapsulate in a utility function built on LlamaIndex's OpenAIAgent. create_agent wraps each plain Python function as a FunctionTool, attaches the system prompt, and returns an agent that can call those tools.

def create_agent(prompt, functions_list, default_tool_choice=None):
    """
    Create an agent using the provided prompt and list of functions.
    
    Args:
    - prompt (str): The system prompt for the agent.
    - functions_list (list): A list of functions to be converted into tools for the agent.
    - default_tool_choice (str, optional): The default tool choice for the agent. Default is None.

    Returns:
    - agent: The created OpenAI agent.
    """
    # Map the provided functions to their corresponding tools
    tools = [FunctionTool.from_defaults(fn=fn) for fn in functions_list]

    # Create the agent
    agent = OpenAIAgent.from_tools(
        tools=tools,
        prefix_messages=[ChatMessage(role="system", content=prompt)],
        verbose=True,
        default_tool_choice=default_tool_choice,
        llm=LlamaOpenAI(model="gpt-4o-mini")
    )
    
    return agent

VI. Agent Creation: Performance, News, and Macroeconomic Agents

Now that we have our utility functions, we can create our specialized agents. Each agent - Performance, News, and Macroeconomic - will be designed to handle specific types of financial queries and analysis.

For the sake of brevity, I'll show only the performance agent, but all code will be posted at the end of the blog post (also here).

def get_portfolio_returns(tickers, start_date, end_date):
    """
    Calculate cumulative returns for a portfolio based on tickers from get_tickers().

    Reads prices from the module-level pricing_data DataFrame downloaded above.

    Args:
    - tickers (list): List of tickers to calculate returns for.
    - start_date (str): Start date for the calculation period (format: 'YYYY-MM-DD').
    - end_date (str): End date for the calculation period (format: 'YYYY-MM-DD').

    Returns:
    - pd.DataFrame: DataFrame with cumulative returns for the portfolio tickers.
    """
    # Filter the dataframe for portfolio tickers and date range
    portfolio_prices = pricing_data.loc[start_date:end_date, tickers]

    # Calculate period-over-period returns (the first row is NaN: there is no prior price)
    returns = portfolio_prices.pct_change()

    # Compound the returns into cumulative returns relative to the first price in the window
    cumulative_returns = (1 + returns).cumprod() - 1

    return cumulative_returns
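
To make the compounding step concrete, here is the same arithmetic on a three-price series in plain Python. It is only an illustration of the formula; the prices are made up and the standalone `cumulative_returns` function is a hypothetical stand-in for the pandas version above.

```python
def cumulative_returns(prices):
    """Compound period-over-period returns into cumulative returns."""
    growth = 1.0
    out = []
    for prev, curr in zip(prices, prices[1:]):
        period_return = curr / prev - 1   # same quantity pct_change() computes
        growth *= 1 + period_return       # running product, like cumprod()
        out.append(growth - 1)            # cumulative return since the first price
    return out


prices = [100.0, 110.0, 99.0]             # made-up prices for illustration
result = cumulative_returns(prices)
# result is approximately [0.10, -0.01]: up 10%, then down 10% from there
```

The last value matches last price / first price - 1 (99 / 100 - 1), which is a quick sanity check on any cumulative return series.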

performance_prompt = f"""You are the performance_agent, tasked with calculating returns and cumulative performance for the user's portfolio. Your focus is solely on portfolio performance.
Remember today's date is {current_date}. Use this as reference for the user's query.

Follow these steps:
1. Use get_tickers() to retrieve the portfolio tickers initialized by the user.
2. Analyze the user's query to determine the start date and end date for the performance calculation.
3. Call get_portfolio_returns(tickers, start_date, end_date) with those tickers and dates to calculate returns for the portfolio.

Tools:
- get_tickers(): Retrieves the user's portfolio tickers
- get_portfolio_returns(tickers, start_date, end_date): Calculates returns for the portfolio

Your final output should be a pandas DataFrame containing the cumulative returns for the portfolio over the specified time period.

Ensure all calculations and comparisons are relevant to the user's portfolio and the specified time frame.
"""
performance_agent = create_agent(performance_prompt, [get_portfolio_returns, get_tickers])
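
The prompt and agent above rely on two names this post doesn't show: `current_date` and `get_tickers`. A minimal sketch of what they could look like follows; both definitions are assumptions for illustration (the full code may differ), and they would need to run before `performance_prompt` is built.

```python
from datetime import datetime

# Assumption: today's date as an ISO string, interpolated into the agent prompts
current_date = datetime.now().strftime("%Y-%m-%d")

# Assumption: the portfolio is a module-level list that the user sets before querying
tickers = []


def get_tickers():
    """Return the tickers in the user's portfolio."""
    # Looked up at call time, so reassigning the module-level `tickers` later is picked up
    return list(tickers)
```

Keeping the docstring short and descriptive matters here, because FunctionTool.from_defaults uses it as the tool description the LLM sees.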

VII. Top Level Agent Implementation

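The top-level agent acts as the coordinator for our specialized agents, routing queries and synthesizing responses. Each sub-agent is wrapped as a LlamaIndex QueryEngineTool and indexed by its description; at query time, a retriever picks the tools whose descriptions are most similar to the user's query (top-k similarity) and hands them to the top-level agent. With only three agents and similarity_top_k=3, every tool is retrieved, so the routing guidelines in the system prompt do the real selection; the retriever starts to matter once you add more agents.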

def create_top_level_agent(agents):
    query_engine_tools = []

    # Get the default agent descriptions (a dict mapping agent name to description)
    agent_descriptions = get_agent_descriptions()

    # Add specified agents to the query engine tools
    for agent_name, agent in agents.items():
        if agent:
            query_engine_tools.append(
                QueryEngineTool(
                    query_engine=agent,
                    metadata=ToolMetadata(
                        name=agent_name,
                        description=agent_descriptions.get(agent_name, "Custom agent")
                    )
                )
            )

    tool_mapping = SimpleToolNodeMapping.from_objects(query_engine_tools)
    obj_index = ObjectIndex.from_objects(
        query_engine_tools,
        tool_mapping,
        VectorStoreIndex,
    ) 

    # Instantiate a retriever over the object index
    retriever = obj_index.as_retriever(similarity_top_k=3)

    # Create the top-level agent
    system_prompt = f""" 
    You are an agent orchestrator, tasked with routing user questions to the appropriate sub-agent.
    Remember today's date is {current_date}. Use this as reference for the user's query.
    Use the following guidelines for which agent to use:
    1. Any question about historical performance: use the performance_agent
    2. Any questions about macro economic indicators such as treasury data, GDP, job numbers, etc: use the macro_agent
    3. Any questions about the news or recent events: use the news_agent
    """

    top_agent = OpenAIAgent.from_tools(
        tool_retriever=retriever,
        system_prompt=system_prompt,
        llm=LlamaOpenAI(temperature=0, model="gpt-4o-2024-08-06"),
        verbose=True,
    )

    return top_agent
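
create_top_level_agent calls `get_agent_descriptions()`, which isn't shown in this post. Here is a rough sketch of what it might return, assuming the three agent names used in the system prompt. The description strings are placeholders written for this example, not the original ones, and `risk_agent` is a made-up name used only to show the fallback.

```python
def get_agent_descriptions():
    """Map each sub-agent's name to the description the retriever indexes."""
    return {
        "performance_agent": "Calculates historical returns and cumulative performance for the user's portfolio.",
        "macro_agent": "Answers questions about macroeconomic indicators such as treasury data, GDP, and job numbers.",
        "news_agent": "Finds and summarizes news and recent events.",
    }


descriptions = get_agent_descriptions()
# Names without an entry fall back to a generic label, as in create_top_level_agent
fallback = descriptions.get("risk_agent", "Custom agent")
```

Since the retriever matches the user's query against these strings, it pays to make each description specific about what the agent can and cannot answer.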

VIII. Generating a final response

With all our agents in place, we can now generate comprehensive responses to user queries. Here is an example query and the response it produced:

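# available_agents maps agent names (e.g. "performance_agent") to the agents created above
top_agent = create_top_level_agent(available_agents)

# Portfolio tickers, kept at module level so the agents' tools can read them
# (a `global` statement is not needed at module scope)
tickers = ['AAPL','MSFT','TSLA']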

response = top_agent.query("How does my portfolio perform in the past month and what are recent news about them?")

#print(response)
### Portfolio Performance (Past Month)

Your portfolio, consisting of Apple (AAPL), Microsoft (MSFT), and Tesla (TSLA), has shown positive performance over the past month:

- **AAPL**: Increased by **5.73%**
- **MSFT**: Increased by **3.82%**
- **TSLA**: Increased by **11.36%**

Overall, your portfolio has performed well, with Tesla showing the highest return among the three stocks.

### Recent News About Your Portfolio

1. **AI Investments Showing Returns**:
   - The "Magnificent Seven," including AAPL, MSFT, and TSLA, are starting to see returns on their AI investments.
   - [Read more on Yahoo Finance](https://finance.yahoo.com/video/were-starting-see-return-ai-211908987.html)

2. **Performance of the Magnificent Seven**:
   - Tesla and Apple have been lagging in share performance over the past year, raising questions about their continued inclusion in this elite group.
   - [Read more on Nasdaq](https://www.nasdaq.com/articles/do-these-2-stocks-belong-in-the-magnificent-7)

3. **Contribution to S&P 500 Gains**:
   - The "Magnificent Seven" stocks, including AAPL, MSFT, and TSLA, have significantly contributed to the S&P 500's rise in 2023.
   - [Read more on Nasdaq](https://www.nasdaq.com/articles/magnify-gains-in-2024-with-magnificent-seven-etfs)

4. **Latest Apple News**:
   - General updates and headlines on Apple Inc.
   - [Read more on Yahoo Finance](https://finance.yahoo.com/quote/AAPL/news/)

5. **Microsoft and Apple Stock Updates**:
   - Provides stock quotes and news for Microsoft and Apple.
   - [Read more on Yahoo Finance](https://finance.yahoo.com/quote/MSFT;AAPL/)

### Summary of Insights

Your portfolio has shown positive growth over the past month, particularly with Tesla's strong performance. However, recent news highlights some challenges for Apple and Tesla, despite their significant roles in the tech market's growth. The focus on AI investments may provide future opportunities for recovery and growth. Keep an eye on these developments to make informed decisions about your investments.

I'll be posting more code snippets and perhaps even expanding on this original multi-agent implementation. Next week we'll be sharing our first major overhaul of the user experience in Avanzai, which I'm super pumped for. This initial version laid the foundation for what we're trying to build: a world where agents proactively work on your behalf so you can focus on what matters.

Full code snippet is here! Please subscribe for more content like this.