Wedbush Securities, a leading securities firm, is looking to utilize Large Language Models (LLMs) and AI to automate the process of generating financial summaries from SEC financial reports. This project's primary objective is to process the financial reports and generate an accurate summary of the important financial information contained within the report, helping Wedbush analysts to quickly and efficiently write their financial analysis.
The primary technical challenge of this problem was that each company's quarterly 10-Q filing would typically consist of 60+ pages of data that lacked standard formatting. Furthermore, we had to consider the historical contexts of past filings. Directly inputting this amount of data into an LLM would be practically impossible due to its volume.
The solution involved using a combination of BERT encoding, vector databases and the Retrieval Augmented Generation (RAG) framework. First, the SEC filings are harvested via the SEC EDGAR API. Then, the reports are split into text chunks, which are then embedded into high-dimensional vectors using a custom BERT model. These vectors are then stored in ChromaDB, a vector database.
Next, a list of default targeted queries, which can be modified by the end user, are vectorized and a similiarty search is run against the vector database to identify the most relevant content from the reports.
Finally, these content are provided as inputs into OpenAI's GPT-4 API to generate a summary analysis of the report. This summary is then transformed into a PDF report, offering a quick, streamlined, and insightful view of the company's financial performance.