Gemini Vision Pro is an image analysis tool powered by Google's Gemini AI 2.0 (the gemini-2.0-flash-exp model). Users upload an image, enter a custom query, and receive a detailed answer based on the image's content. Because the model can also read text within images (OCR), Gemini Vision Pro can describe both the visual content and any written text in an image, all through an intuitive, interactive interface.
- Advanced OCR Capabilities: Extract text from images.
- Multi-format Support: Accepts JPEG (.jpg, .jpeg) and PNG images.
- Real-time Analysis: Results are shown as soon as Gemini responds to your query.
- High Accuracy: Uses Gemini AI 2.0 to produce precise, detailed analysis.
To get started with Gemini Vision Pro, you will need the following:
- Python 3.8+: Ensure you have Python 3.8 or later installed.
- Streamlit: Gemini Vision Pro is built using Streamlit for a fast and interactive interface.
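- dotenv (python-dotenv): Loads the API key and other environment variables from a .env file, so the key is not hard-coded.
- Pillow: Used for image handling and previews.
- Google Generative AI (google-generativeai): Google's Python SDK for the Gemini API, used to send the image and query to the model and return its analysis.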
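If you want to confirm these prerequisites before running the app, a small check script can help. This is a hypothetical helper, not part of the project; it assumes the import names dotenv, PIL, and google.generativeai, which are the modules provided by the python-dotenv, Pillow, and google-generativeai pip packages.

```python
import importlib.util
import sys

# pip package name -> module name it installs (hypothetical check, not part of the app)
REQUIRED_MODULES = {
    "streamlit": "streamlit",
    "python-dotenv": "dotenv",
    "Pillow": "PIL",
    "google-generativeai": "google.generativeai",
}


def module_available(import_name: str) -> bool:
    """Return True if the module can be located on the current Python path."""
    try:
        return importlib.util.find_spec(import_name) is not None
    except (ModuleNotFoundError, ValueError):
        # For dotted names find_spec imports the parent package first,
        # which raises ModuleNotFoundError if the parent is missing.
        return False


def check_prerequisites(min_version=(3, 8)) -> dict:
    """Report whether the Python version and each required package are present."""
    report = {"python": sys.version_info >= min_version}
    for pip_name, import_name in REQUIRED_MODULES.items():
        report[pip_name] = module_available(import_name)
    return report


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(("OK      " if ok else "MISSING ") + name)
```

Any package reported as missing can then be installed with the pip command that follows.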
To install the required dependencies, you can use pip:
pip install streamlit google-generativeai Pillow python-dotenv
- Obtain a Gemini API key, for example from Google AI Studio (keys are tied to a Google Cloud project).
- Create a .env file in the project root and add the following line:
GOOGLE_API_KEY=your_google_api_key_here
To start the app, use the following command:
streamlit run app.py
This will open the app in your default browser.
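The app has to read GOOGLE_API_KEY from .env before it can call Gemini. The README does not show that code, so the following is a minimal sketch under that assumption; load_api_key is a hypothetical helper name. It uses python-dotenv when installed and otherwise falls back to the plain process environment.

```python
import os

try:
    # python-dotenv copies KEY=value pairs from a .env file into os.environ
    from dotenv import load_dotenv
except ImportError:
    load_dotenv = None  # fall back to variables already set in the environment


def load_api_key(var_name: str = "GOOGLE_API_KEY") -> str:
    """Return the Gemini API key from .env or the environment; raise if it is missing."""
    if load_dotenv is not None:
        load_dotenv()  # by default this does not override variables that are already set
    key = os.getenv(var_name, "").strip()
    if not key:
        raise RuntimeError(f"{var_name} is not set; add it to your .env file")
    return key
```

With the key loaded, the SDK would typically be configured once at startup, for example with genai.configure(api_key=load_api_key()), where genai is google.generativeai. Failing early with a clear message is easier to debug than an authentication error on the first analysis request.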
- Upload an Image: Click on the file uploader to upload a JPEG or PNG image.
- Enter a Query: Provide a query in the text input box. For example, you can ask Gemini AI to extract text, analyze the content, or provide detailed insights about the image.
- Analyze: Click the "Analyze with Gemini AI" button to process the image.
- View Results: The analysis results will be displayed below the upload section. You will also have the option to download the analysis results as a .txt file.
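The download step is not shown in the code excerpts. Below is a minimal sketch of how the .txt content could be assembled; build_report, its file-name pattern, and the report layout are hypothetical choices, not the app's actual format.

```python
from datetime import datetime
from typing import Optional, Tuple


def build_report(query: str, analysis: str, when: Optional[datetime] = None) -> Tuple[str, str]:
    """Return (file_name, text) for a plain-text analysis report (hypothetical layout)."""
    if when is None:
        when = datetime.now()
    # Timestamped name so repeated downloads do not overwrite each other
    file_name = f"gemini_analysis_{when:%Y%m%d_%H%M%S}.txt"
    text = (
        "Gemini Vision Pro analysis\n"
        f"Generated: {when:%Y-%m-%d %H:%M:%S}\n\n"
        f"Query:\n{query.strip()}\n\n"
        f"Analysis:\n{analysis.strip()}\n"
    )
    return file_name, text
```

Inside the app, the pair could be passed to Streamlit's download widget, for example st.download_button(label="Download analysis", data=text, file_name=file_name, mime="text/plain"). Keeping the text-building logic out of the UI code makes it easy to test on its own.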
The app configures its page with Streamlit's st.set_page_config function, providing a wide layout, a custom title and icon, and an initially expanded sidebar.
import streamlit as st
# Call before any other Streamlit command in app.py
st.set_page_config(
    page_title="Gemini Vision Pro",
    page_icon="🔮",
    layout="wide",
    initial_sidebar_state="expanded"
)
The sidebar shows a short description of the app and a footer caption.
with st.sidebar:
    st.title("🔮 Gemini Vision Pro")
    st.markdown("---")
    st.subheader("About")
    st.markdown("""
    Transform your images into insights with Gemini Vision Pro - powered by Google's
    advanced gemini-2.0-flash-exp model for lightning-fast image analysis.
    """)
    st.markdown("---")
    st.caption("Powered by Gemini AI 2.0")
The user uploads an image and enters a query about it; clicking the analyze button sends both to Gemini AI and displays the response.
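The button handler in the next snippet calls get_gemini_response(detailed_prompt, image_data, input_prompt), but neither that helper nor the code that builds image_data appears in this README. The following is a minimal sketch of both, assuming the google-generativeai SDK has already been configured with genai.configure and that the model is gemini-2.0-flash-exp, as the sidebar text states. input_image_setup is a hypothetical helper name.

```python
from typing import Any, Dict, List

# Assumed from the sidebar text; change if the app uses a different model
MODEL_NAME = "gemini-2.0-flash-exp"


def input_image_setup(uploaded_file: Any) -> List[Dict[str, Any]]:
    """Convert a Streamlit UploadedFile into an inline image part for the Gemini SDK."""
    if uploaded_file is None:
        raise ValueError("No image uploaded")
    return [{
        "mime_type": uploaded_file.type,   # e.g. "image/png"
        "data": uploaded_file.getvalue(),  # raw image bytes
    }]


def get_gemini_response(detailed_prompt: str, image_data: List[Dict[str, Any]], input_prompt: str) -> str:
    """Send the instructions, the image, and the user's query to Gemini; return the text reply."""
    # Imported here so input_image_setup can be used without the SDK installed
    import google.generativeai as genai

    model = genai.GenerativeModel(MODEL_NAME)
    response = model.generate_content([detailed_prompt, image_data[0], input_prompt])
    return response.text
```

In the handler, image_data would come from input_image_setup(uploaded_file). Passing the raw bytes with their MIME type avoids re-encoding the image; the SDK also accepts a PIL.Image object if the app already opens the upload with Pillow for the preview.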
uploaded_file = st.file_uploader(
    "Upload Image (JPEG/PNG)",
    type=["jpg", "jpeg", "png"],
    help="Drag and drop or click to upload"
)
if st.button("🔍 Analyze with Gemini AI", key="analyze"):
    # Image and query processing (omitted in this excerpt)
    response = get_gemini_response(detailed_prompt, image_data, input_prompt)
    st.success("✨ Analysis Complete!")
    st.markdown("### 📊 Analysis Results")
    st.markdown(response)
Custom CSS, injected through st.markdown with unsafe_allow_html=True, restyles the buttons, text areas, and uploaded-image display.
st.markdown("""
<style>
    .main {
        padding: 2rem;
    }
    .stButton>button {
        width: 100%;
        border-radius: 10px;
        height: 3rem;
        background-color: #FF4B4B;
        color: white;
        font-weight: bold;
    }
    .uploadedImage {
        border-radius: 10px;
        box-shadow: 0 4px 8px rgba(0,0,0,0.1);
    }
    .stTextArea>div>div>textarea {
        border-radius: 10px;
    }
</style>
""", unsafe_allow_html=True)
Feel free to contribute by submitting issues or pull requests. If you have any suggestions or encounter any bugs, please open an issue on the GitHub repository.
This project is licensed under the MIT License.
Created by Amar Sharma using Streamlit and Gemini AI