This is a Streamlit application that allows users to chat with the content of multiple PDFs. The application reads and processes the text from the PDFs, generates embeddings for the text chunks, and uses a retrieval-based conversation model to answer user questions about the content of the PDFs.
The application uses OpenAI's language model for embeddings and generating responses, the FAISS library for efficient similarity search, and a conversation buffer memory to keep track of the chat history.
You need to install the following dependencies to run the application:
- Python 3.8 or later
streamlitpython-decouplePyPDF2langchainhtmllayouts
You need to set the following environment variable:
OPENAI_KEY- Your OpenAI API key. If this is not set, the application will not run.
- Install the required dependencies.
- Set the
OPENAI_KEYenvironment variable. - Run the application with the command
streamlit run app.py.
The application has a text input where you can ask a question about the content of your PDFs. Below it, the conversation history is displayed.
In the sidebar, there is a file uploader where you can upload your PDFs. After uploading your PDFs, click on the "Process" button. The application will read the PDFs, split the text into chunks, create embeddings for the chunks, and prepare the conversation model.
After processing the PDFs, you can ask questions about the content of the PDFs in the text input.
- The PDFs are read and the text is extracted.
- The extracted text is split into chunks. Each chunk is a separate "document" for the conversation model.
- The chunks are embedded using OpenAI's language model. The embeddings are stored in a vector store using FAISS.
- A conversation chain is created. The conversation chain uses the vector store to retrieve relevant chunks based on a user's question and a language model to generate a response. The conversation history is stored in a buffer.
- When a user asks a question, the conversation chain retrieves the most relevant chunks, generates a response, and updates the conversation history.
- The conversation history is displayed in the main part of the application.
This application is meant to be a demonstration of how one can build a retrieval-based conversation model using OpenAI's language model and other open-source libraries. It is not intended to be used for sensitive information or at scale.
This project is licensed under the terms of the MIT license.