Document Q & A

This is a Streamlit application that allows users to chat with the content of multiple PDFs. The application reads and processes the text from the PDFs, generates embeddings for the text chunks, and uses a retrieval-based conversation model to answer user questions about the content of the PDFs.

The application uses OpenAI models to create embeddings and generate responses, the FAISS library for efficient similarity search, and a conversation buffer memory to keep track of the chat history.

Dependencies

You need to install the following dependencies to run the application:

Python 3.8 or later
streamlit
python-decouple
PyPDF2
langchain
htmllayouts (a local module, htmllayouts.py, that ships with the repository; nothing to install)

Environment Variables

You need to set the following environment variable:

OPENAI_KEY - Your OpenAI API key. If this is not set, the application will not run.
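A minimal sketch of the fail-fast check described above. The dependency list includes python-decouple, whose config("OPENAI_KEY") call reads the same variable; the version below uses only the standard library so it runs anywhere. The helper name load_openai_key is hypothetical and is not taken from app.py.

```python
import os


def load_openai_key(env=None):
    """Return the OpenAI API key, or raise if OPENAI_KEY is missing.

    `env` defaults to the process environment; passing a dict makes the
    function easy to exercise without touching real environment variables.
    (Hypothetical helper, not part of the original app.)
    """
    env = os.environ if env is None else env
    key = env.get("OPENAI_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_KEY is not set; export it before starting the application."
        )
    return key
```

Calling load_openai_key() once at startup surfaces a missing key immediately instead of at the first request to OpenAI.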

Usage

Install the required dependencies, for example with pip install -r requirements.txt (the repository includes a requirements.txt).
Set the OPENAI_KEY environment variable.
Run the application with the command streamlit run app.py.

Interface

The application has a text input where you can ask a question about the content of your PDFs. Below it, the conversation history is displayed.

In the sidebar, there is a file uploader where you can upload your PDFs. After uploading your PDFs, click on the "Process" button. The application will read the PDFs, split the text into chunks, create embeddings for the chunks, and prepare the conversation model.

After processing the PDFs, you can ask questions about the content of the PDFs in the text input.
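The flow above (upload, click "Process", then ask) could be wired up in Streamlit roughly as follows. This is a sketch under assumptions, not the code in app.py: the helper handle_question, the session key "conversation", and the placeholder chain are all hypothetical, and the real processing step is replaced by a stub.

```python
def handle_question(state, question):
    """Return the text to show for `question`, given the session state.

    `state` is any mapping; in the app it would be st.session_state.
    The "conversation" key is an assumed name for the prepared chain.
    """
    chain = state.get("conversation")
    if chain is None:
        return "Please upload your PDFs and click Process first."
    return chain(question)


def main():
    # Imported here so handle_question stays usable without Streamlit installed.
    import streamlit as st

    st.title("Document Q & A")
    question = st.text_input("Ask a question about your PDFs")

    with st.sidebar:
        files = st.file_uploader(
            "Upload your PDFs", type="pdf", accept_multiple_files=True
        )
        if st.button("Process") and files:
            # Stub: the real app would read the PDFs, chunk and embed the
            # text, and store a conversation chain here instead.
            st.session_state["conversation"] = (
                lambda q: f"(an answer to {q!r} would be generated here)"
            )

    if question:
        st.write(handle_question(st.session_state, question))
```

To try it, add a call to main() at the end of the file and start it with streamlit run. Keeping the prepared chain in the session state is what lets it survive Streamlit's rerun of the script on every interaction.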

How It Works

The PDFs are read and the text is extracted.
The extracted text is split into chunks. Each chunk is a separate "document" for the conversation model.
The chunks are embedded using an OpenAI embedding model, and the embeddings are stored in a FAISS vector store.
A conversation chain is created. The conversation chain uses the vector store to retrieve relevant chunks based on a user's question and a language model to generate a response. The conversation history is stored in a buffer.
When a user asks a question, the conversation chain retrieves the most relevant chunks, generates a response, and updates the conversation history.
The conversation history is displayed in the main part of the application.
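The steps above can be sketched end to end with the standard library only. This is an illustration of the idea, not the application's code: word counts stand in for OpenAI embeddings, a linear cosine-similarity scan stands in for the FAISS index, and the "answer" simply quotes the best chunk where the real app would call a language model. All names (split_text, ToyVectorStore, ask) and the chunk sizes are assumptions.

```python
import math
import re
from collections import Counter


def split_text(text, chunk_size=200, overlap=50):
    """Split text into overlapping character chunks."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


def embed(text):
    """Toy embedding: a bag of lowercase word counts (stand-in for OpenAI)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class ToyVectorStore:
    """Stores (chunk, vector) pairs and ranks them by similarity (stand-in for FAISS)."""

    def __init__(self, chunks):
        self.entries = [(chunk, embed(chunk)) for chunk in chunks]

    def search(self, query, k=2):
        q = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[1]), reverse=True)
        return [chunk for chunk, _ in ranked[:k]]


def ask(store, history, question, k=2):
    """Retrieve context, produce a placeholder answer, and record the turn."""
    context = store.search(question, k)
    # Stand-in for the language model call, which would receive the
    # question, the retrieved chunks, and the chat history.
    answer = "Most relevant chunk: " + (context[0] if context else "(none)")
    history.append((question, answer))
    return answer
```

A usage example: build the store with ToyVectorStore(split_text(pdf_text)), keep history = [] for the session, and call ask(store, history, question) for each user question. The history list plays the role of the conversation buffer.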

Note

This application is meant to be a demonstration of how one can build a retrieval-based conversation model using OpenAI's language models together with open-source libraries. It is not intended to be used for sensitive information or at scale.

License

This project is licensed under the terms of the MIT license.

Repository: aasem/DocQA

Folders and files

Latest commit

History

Repository files navigation

Document Q & A

Dependencies

Environment Variables

Usage

Interface

How It Works

Note

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages