This project provides you with the foundation to extract data from a financial statement.
The goal of this challenge is to optimize the current extraction logic so that it runs faster at scale. The current extraction takes just over 30 seconds on average for a document of ~30 pages. If we were to give it a larger document (100+ pages), it would still need to run quickly and accurately.
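One common way to reduce wall-clock time as the page count grows is to split the document into page chunks and run the extraction requests concurrently, with a cap on how many are in flight at once (to stay within API rate limits). The sketch below is a minimal illustration under stated assumptions: it assumes the document is already available as an array of page texts and that extraction can be done per chunk. `chunkPages`, `extractConcurrently`, and the `ChunkExtractor` callback are hypothetical names, not part of `src/extraction.ts`.

```typescript
// Hypothetical per-chunk extraction call (e.g. one OpenAI request per chunk).
// This signature is an assumption, not the API of src/extraction.ts.
type ChunkExtractor<T> = (pages: string[], chunkIndex: number) => Promise<T>;

// Split page texts into consecutive chunks of at most `size` pages.
function chunkPages(pages: string[], size: number): string[][] {
  if (size < 1) throw new Error("chunk size must be at least 1");
  const chunks: string[][] = [];
  for (let i = 0; i < pages.length; i += size) {
    chunks.push(pages.slice(i, i + size));
  }
  return chunks;
}

// Run `extract` over every chunk with at most `limit` calls in flight.
// Results are stored by chunk index, so output order matches page order
// even when calls finish out of order.
async function extractConcurrently<T>(
  pages: string[],
  chunkSize: number,
  limit: number,
  extract: ChunkExtractor<T>,
): Promise<T[]> {
  if (limit < 1) throw new Error("limit must be at least 1");
  const chunks = chunkPages(pages, chunkSize);
  const results: T[] = new Array<T>(chunks.length);
  let next = 0;

  // Each worker claims the next unclaimed chunk index until none remain.
  // `next++` runs synchronously, so no two workers claim the same index.
  const worker = async (): Promise<void> => {
    while (next < chunks.length) {
      const index = next++;
      results[index] = await extract(chunks[index], index);
    }
  };

  const workerCount = Math.min(limit, chunks.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}

// Usage with a stand-in extractor (no network calls).
const samplePages = Array.from({ length: 10 }, (_, i) => `page ${i + 1}`);
extractConcurrently(samplePages, 3, 2, async (chunk, i) => `${i}:${chunk.length}`)
  .then((results) => console.log(results));
```

The usage call splits 10 pages into chunks of 3, 3, 3, and 1 pages and logs one result per chunk in page order. Whether per-chunk extraction keeps the output accurate (for example, for tables that span a chunk boundary) depends on the document, so it is worth comparing results against the current single-pass output.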
There are many ways to approach this challenge. We are looking to see how you navigate the challenge and showcase your problem-solving skills.
Once you've completed the challenge, invite Andy (@ondrosh) and Peter (@bo-dun-1) as collaborators to your GitHub fork so we can review it. Please also email us a link to the repository once you're finished!
- Node v22.4.1 (`.nvmrc` included in project)
- Yarn v1.22.x
- Create a new private repository using this one as a template (Click the "Use this template" button on the repository's page)
- Clone your new repository
- Install the dependencies by running `yarn`
- Copy the `.env.template` file to a new file named `.env` in the root of the project. Copy your OpenAI API key into the new `.env` file.
You can run the extraction via the command:
yarn extract
- Do not modify the `src/index.ts` file.
- The current extraction code can be found in `src/extraction.ts`.
- The time and usage are logged each time you run `yarn extract`.
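The total time logged by `yarn extract` shows whether a change helped, but not where the time goes. While experimenting, a small wrapper can accumulate per-stage durations (for example, document loading versus model calls). This is a minimal sketch, not part of the project: `timed` and `timings` are hypothetical names, and since `src/index.ts` must not be modified, any such wrapping would go in `src/extraction.ts`. It uses `Date.now()`, whose millisecond resolution is coarse but adequate for stages that take hundreds of milliseconds or more.

```typescript
// Hypothetical accumulator of elapsed milliseconds per stage label.
const timings: Record<string, number> = {};

// Await `fn`, adding its elapsed wall-clock time to `timings[label]`.
// Time is recorded even if `fn` rejects; the error still propagates.
async function timed<T>(label: string, fn: () => Promise<T>): Promise<T> {
  const start = Date.now();
  try {
    return await fn();
  } finally {
    timings[label] = (timings[label] ?? 0) + (Date.now() - start);
  }
}

// Usage: wrap stages inside the extraction code, then print the totals.
async function demo(): Promise<void> {
  await timed("load", async () => "document text");
  await timed("extract", async () => ({ fields: 3 }));
  console.log(timings);
}

demo();
```

Because durations are added per label, wrapping a stage that runs once per chunk reports its total time across all chunks; for concurrent stages that total can exceed the overall wall-clock time.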
We will reimburse you for the OpenAI usage that was needed to complete this challenge.