Hello!
Welcome to Google Developer Groups on Campus
Gurugram University - Build with AI program. These
resources, including this deck, are made for you to
learn and grow in Google’s latest AI technology.
Intro to the
Gemini API
The official Google AI Train the Trainer Community deck!
Nitin Chauhan
GDG Organizer (President)
What is generative AI?
What is
an LLM?
LLMs Explained
[Figure: next-word prediction for the prompt "It's raining cats and ...", showing candidate next words (Dogs, Rain, Drops, Fish, Wind, …) with probabilities (0.02, 0.03, 0.9, 0.01, 0.0, …)]
Roses are red,
Violets are blue,
Sugar is sweet,
LLMs Explained
Roses are red,
Violets are blue,
Sugar is sweet,
LLMs Explained
for(var i = 0; i < 10; i++) {
for(var i = 0; i < 10; i++) {
Modern LLMs are
large.
LLMs Explained
Classic Natural
Language Problems
LLMs Explained
● Entity extraction
● Classification
● Summarization
● Sentiment Analysis
● Translation
● …
LLMs let us
prototype fast.
LLMs Explained
Why are large language models
different?
LLMs are characterized by emergent
abilities, or the ability to perform tasks
that were not present in smaller models.
LLMs' contextual understanding of human
language changes how we interact with
data and intelligent systems.
LLMs can find patterns and connections in
massive, disparate data corpora.
Search
Conversation
Content generation
Multimodality
(Android AICore)
ai.google.dev/gemma
The Gemini Ecosystem
The most advanced AI from Google
Models
● For Developers: Gemini API (in Google AI Studio + ai.google.dev)
● For Consumers: Gemini app and web, Gemini in the Google App, Gemini in Gmail, Docs…
● For Business and Enterprise: Gemini for Google Workspace, Gemini for Google Cloud, Gemini in Vertex AI
This deck is about
Getting started with the
Gemini API
Train the Trainer in AI
Tarun Bhardwaj
Core Team
AI Studio
aistudio.google.com
● Generate API Keys
● Create, test, and save prompts
● Customize models in minutes
● Generate starter code
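Once you have generated an API key in AI Studio, wiring it into the Python SDK is one call. A minimal sketch (the environment-variable name is just a common convention, not something this deck prescribes):

```python
import os
import google.generativeai as genai

# Assumes the AI Studio API key has been exported, e.g. GOOGLE_API_KEY=<your key>.
genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
```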
Gemini API
ai.google.dev
REST API + client libraries for Python, Node, Java, and Swift
Libraries
SDKs
Vertex AI
Enterprise-grade support.
Full MLOps (Examples: Model
evaluation, monitoring, registry)
Vertex AI
Check it out when you're
ready for production
Gemini API and Vertex AI
Both give access to Gemini family models.
Vertex AI
Cloud Production
ai.google.dev/docs/migrate_to_cloud
Platforms
Endpoints
Usage patterns
Train the Trainer in AI
Jatin Kumar
Vice President
We’ll cover
● Basic Prompting
● Code generation
● Search, retrieval and information synthesis
● Image understanding
● Tuning
Usage Patterns
Basic prompting
Train the Trainer in AI
Write a poem about a magic backpack.
In a world of wonder, behold,
A backpack with tales yet untold.
Its fabric shimmers, alive with might,
A portal to realms, day and night.
aistudio.google.com
Generate a short tip about JavaScript. Good tips are short and contain a code
example showing the tip.
**Tip: Use Array.prototype.flat() to flatten an array of arrays.**
```javascript
const array = [[1, 2], [3, 4], [5, 6]];
console.log(array.flat());
// Output: [1, 2, 3, 4, 5, 6]
```
Generate a short tip about JavaScript. Good tips are short and contain a code
example showing the tip.
**Tip: Use Array.prototype.flat() to flatten an array of arrays.**
```javascript
const array = [[1, 2], [3, 4], [5, 6]];
console.log(array.flat());
// Output: [1, 2, 3, 4, 5, 6]
```
Generate a short tip about JavaScript. Good tips are short and contain a code
example showing the tip.
topic: let and const
tip: Prefer using `let` and `const` to initialise variables, instead of `var`, as `var` has
no block scope or protection against mutability.
e.g. const myVar = "EXAMPLE";
topic: arrow syntax
tip:
import google.generativeai as genai

model = genai.GenerativeModel('models/gemini-pro')
resp = model.generate_content(
    'Write the first paragraph of a story about a magic backpack')

>>> print(resp.text)
In a bustling city, amidst the vibrant tapestry of human existence, there existed a peculiar entity named Archie. Archie, however, was no ordinary backpack...
chat = model.start_chat()
response = chat.send_message(
    "Hello, what should I have for dinner?")
print(response.text)
# 'Here are some suggestions...'

response = chat.send_message(
    "How do I cook the first one?")
Generate Content Request
● Contents: Content entries with a role, e.g. Content of role "user" containing multimodal Parts, Content of role "model", or "user" content with a single text Part
● Settings/Configs:
○ Tools: tools specified by the caller
○ Safety Settings: safety settings configured by the caller
○ Generation Config: temperature, Top-P, Top-K, stop sequences, max output tokens, etc.

Generate Content Response
● Candidate(s): the candidate "Content" (note: only one candidate is returned today), a Finish Reason (why the model stopped generating), a Finish Message (set if a finish reason is present), and Safety Ratings (how safe is the response)
● Feedback: feedback on the prompt, including a Block Reason and Safety Ratings
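Putting the request and response shapes together in the Python SDK, a minimal sketch (the prompt, thresholds, and parameter values here are illustrative, not taken from the slides):

```python
import google.generativeai as genai

model = genai.GenerativeModel('models/gemini-pro')

response = model.generate_content(
    'Write a haiku about the ocean.',
    generation_config=genai.types.GenerationConfig(
        temperature=0.7,       # Generation Config fields from the request side
        top_p=0.95,
        top_k=40,
        max_output_tokens=256,
    ),
    safety_settings={'HARASSMENT': 'block_only_high'},  # caller-configured safety setting
)

# Response side: candidate(s) plus prompt feedback.
candidate = response.candidates[0]   # only one candidate is returned today
print(candidate.finish_reason)       # why the model stopped generating
print(candidate.safety_ratings)      # how safe is the response
print(response.prompt_feedback)      # feedback on the prompt, e.g. a block reason
print(response.text)
```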
AI Studio
Endpoints
Prompting
https://ai.google.dev/docs/prompt_best_practices
● Chained prompts - make a plan, then execute it (see the sketch after this list)
● Context - few-shot prompts
● Generation parameters - temperature, safety settings, Top-P, Top-K
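For the chained-prompts pattern, a hedged sketch: ask the model for a plan first, then feed that plan back as context for the execution step (both prompts are invented for illustration):

```python
import google.generativeai as genai

model = genai.GenerativeModel('models/gemini-pro')

# Step 1: make a plan.
plan = model.generate_content(
    'List the steps needed to write a short blog post about the Gemini API. '
    'Return a numbered plan only.').text

# Step 2: execute the plan, feeding it back in as context.
draft = model.generate_content(
    f'Follow this plan to write the blog post:\n\n{plan}').text
print(draft)
```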
Code generation
Train the Trainer in AI
Keshav Chauhan
Vice President
Code Generation
Content generation
Convert human language
requests to machine
requests
Rapid tool building
Generate code based on a
user prompt
Examples
Code generation
● Generate data
● Generate a SQL query
● Simulate Execution
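A sketch of the SQL example through the API: describe the existing schema in the prompt and ask for the query in plain English (the schema and question below are made up for illustration; as the notes say, any simulated results are speculative, not real query output):

```python
import google.generativeai as genai

model = genai.GenerativeModel('models/gemini-pro')

prompt = """You are given this SQLite schema:
CREATE TABLE orders(id INTEGER, customer TEXT, total REAL, created_at TEXT);

Write a SQL query that returns the five customers with the highest total spend.
Return only the SQL."""

response = model.generate_content(prompt)
print(response.text)
```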
Search and Information
Synthesis
Train the Trainer in AI
● Models have knowledge cut-offs
● LLMs are not fact engines
● No exposure to private data
BYO Data
Search & IR
● Instructions + Context + Question all in the prompt
● Easy to implement
○ No extra code, just ask.
Use the prompt's context window
Search & IR
import pathlib
import google.generativeai as genai

model = genai.GenerativeModel('gemini-pro')
document = pathlib.Path('document.txt').read_text()

result = model.generate_content(f"""
Explain how deep-sea life survives.
Please answer based on the following document:
{document}""")
Use the prompt's context window
Learning more
● Limited by the model's context length
○ gemini-1.0-pro: 30K tokens.
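To check whether a stuffed prompt will fit, the SDK can count tokens before you send the request. A small sketch (the file name is illustrative):

```python
import pathlib
import google.generativeai as genai

model = genai.GenerativeModel('gemini-pro')
document = pathlib.Path('document.txt').read_text()

# Count tokens before sending, so we know the prompt fits in the 30K-token window.
token_count = model.count_tokens(document).total_tokens
print(token_count, 'tokens')
```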
Search & IR
Train the Trainer in AI
Image understanding
Mukul Mehta
Sponsorship Lead
Image understanding
Multimodality
● Images are just tokens in the input
● Can be used for instructions, context or query subject
import google.generativeai as genai
# Use the Gemini vision model.
PRO_VISION = 'models/gemini-pro-vision'
model = genai.GenerativeModel(PRO_VISION)
!wget -O instrument.jpg -q https://goo.gle/instrument-img
import PIL.Image
img = PIL.Image.open('instrument.jpg')
# Preview the image
(thumb := img.copy()).thumbnail((200, 200))
thumb
response = model.generate_content(
    ['What instrument is this?',
     img,
     'What kinds of music would use it?'])
print(response.text)
This is a pipe organ. It is a musical instrument that produces sound by
driving pressurized air through pipes. Organs are often used in churches,
concert halls, and other large venues. They can be used to play a wide
variety of music, from classical to contemporary.
● Structured data extraction
● Image conditioning
● RAG
Image understanding
Multimodality
Image conditioning
Multimodality
● Generate text or structured data from images
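A sketch of structured data extraction from an image (the form photo and JSON fields are hypothetical; as the notes point out, you may need few-shot examples to pin the model to your exact schema):

```python
import PIL.Image
import google.generativeai as genai

model = genai.GenerativeModel('models/gemini-pro-vision')
form_img = PIL.Image.open('filled_form.jpg')  # hypothetical photo of a hand-filled paper form

response = model.generate_content([
    'Extract the fields from this form as JSON with keys '
    '"name", "date" and "comments". Return only the JSON.',
    form_img,
])
print(response.text)
```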
AI Studio
Endpoints
Images
Structured Data
Tuning
Train the Trainer in AI
Akshu Grewal
Design Team Head
Tuning
● In AI Studio
Tuning
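Tuning is also available through the API. A rough sketch following the Python SDK's tuning flow (the training data, model id, and hyperparameters below are illustrative, loosely mirroring the increment example described in the notes):

```python
import google.generativeai as genai

# Launch a tuning job from a small set of input/output examples.
operation = genai.create_tuned_model(
    source_model='models/gemini-1.0-pro-001',
    training_data=[
        {'text_input': '1', 'output': '2'},
        {'text_input': 'three', 'output': 'four'},
        # ... roughly 20 such examples in the notes' increment demo
    ],
    id='my-increment-model',  # hypothetical id
    epoch_count=100,
    batch_size=4,
    learning_rate=0.001,
)

# Once the job completes, use the tunedModels/ name anywhere a model name goes.
model = genai.GenerativeModel('tunedModels/my-increment-model')
print(model.generate_content('VII').text)
```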
Responsible AI
Train the Trainer in AI
Nitin Chauhan
GDG Organizer (President)
Responsible AI
Thank you!
BRAVE Futures Forum
Join Us!
GDG Gurugram University
Join Us!


Editor's Notes

  • #4 Hello! This deck is your starting point for talking about Gemini. You can use it as-is, adding your name & details, but we recommend adding your own content too - people love to see examples and hear about how you’ve used Gemini to solve real problems. If you have any questions, please reach out via Discord - https://discord.com/invite/google-dev-community - in the #gde-private channel. To match the style of the code slides, use this tool, with the “Dark” theme and “Roboto Mono”. Happy presenting! - Gemini DevRel
  • #5 Generative AI refers to deep-learning models that can generate high-quality text, images, and other content based on the data they were trained on.
  • #6 An LLM is a neural network that models the distribution of the text on which it’s trained. It can take input like “It’s raining cats and…” and give the next word based on what it has seen in its training data. So a large language model is essentially a sophisticated auto-complete.
  • #7 A model trained on poetry will predict output that’s….
  • #8 poetic
  • #9 Or a model trained on code samples will produce output that’s…
  • #10 cody
  • #11 The current generation of LLMs are trained on large, diverse datasets, representing all kinds of language and contexts from around the web. These models have billions, sometimes even trillions of parameters, so they capture language structures more complex than we’ve seen in previous generations, like your phone’s auto-complete
  • #12 This makes them useful for a number of common, classical NLP tasks that we previously would have trained task-specific models for.
  • #13 On top of all these abilities, LLMs let us prototype AI-driven applications really fast. This is huge, because traditionally, building an ML-powered app would mean collecting potentially thousands of examples for your domain and then training or tuning a model before you can start to evaluate or even explore its suitability. This took a lot of time and resources. But, with LLMs, you can start to explore an AI-powered idea in minutes, and you don’t need ML expertise.
  • #14 Why are large language models different? Three reasons: They have emergent abilities, which means they do not have to be explicitly trained to do things. You can train it on a large corpus of text and it learns to translate, answer questions, code, and so on. Next, LLMs change how we interact with intelligent systems because of their contextual understanding of human language. With an LLM, a SQL query can become a simple question. Need to find information on a website? Type it in a chatbot. Finally, LLMs are the next in our machine learning journey. Many years ago, you’d use a simple regression to find a pattern. Then with neural networks, you were able to unlock hidden patterns in a much larger dataset. Now, with LLMs, we can find patterns and connections in massive, disparate datasets.
  • #15 What is multimodality? Why is this important? Can't we just make do with multiple models working together? A: signal is lost between models, e.g. audio input contains tone and emotion that’s lost in text.
  • #16 Gemini models are built from the ground up for multimodality — reasoning seamlessly across text, images, video, audio, and code.
  • #17 Today there are 3 Gemini model sizes: Ultra, the most capable and largest model, for the more complex tasks; Pro, the best model for scaling across a wide range of tasks; and Nano, the most efficient model, for on-device tasks - available through Android AICore: https://ai.google.dev/tutorials/android_aicore
  • #18 Gemini models are built from the ground up for multimodality — reasoning seamlessly across text, images, video, audio, and code. Note to presenters: Another Gemma-specific deck is coming soon.
  • #22 This is Google AI Studio - it’s a web interface to the Gemini API that you can use to quickly prototype and riff on your ideas in the browser without writing any code. In AI Studio, and with the Gemini API, you can just ask the model what you want it to do, using plain human language, and get results in seconds. Then when you are ready, Google AI Studio will help you move your prompt into your app. Start off by getting the API key. That will give you access to do anything in AI Studio. Think of it as your password for accessing Gemini.
  • #23 Select the model that you want to use.
  • #24 This is Google AI Studio - it’s a web interface to the Gemini API that you can use to quickly prototype and riff on your ideas in the browser without writing any code. In AI Studio, and with the Gemini API, you can just ask the model what you want it to do, using plain human language, and get results in seconds. Then when you are ready, Google AI Studio will help you move your prompt into your app.
  • #25 Use this widget to get the code in the language that you are interested in.
  • #26 Then when you are ready, Google AI Studio will help you move your prompt into your app. In the prompt UI there is a “Get Code” button that will show you code snippets for a number of languages, and they are pre-filled with your prompt and use the exact same settings you used in AI Studio. There’s even an option to load a Colab notebook that’s pre-filled with your prompt and settings.
  • #27 The Gemini API is a REST API, you can call directly or using one of the user friendly client libraries in Python, JS, Swift, Java, Go or Flutter. This presentation uses the Python SDK as it is the most popular language, but everything should be possible using the API directly. Speaker note: Making a language-specific version of this deck would be great :)
  • #28 The content I’m going to cover will be on what we call the “Gemini API”. I am referring to the API that is available through ai.google.dev, but it’s important to be aware that we also have a Gemini API available through Vertex AI in Google Cloud. Vertex AI’s Generative Studio gives the same access to models like Gemini, and we’ve kept the APIs as close as possible to make migration to the Cloud as easy as possible. We expect individual developers to start with Gemini API and move to Vertex when they need access to commercial support and higher rate limits. If you’re already Cloud-friendly or Cloud-native, then you can get started in Vertex straight away.
  • #29 This section gives a high-level overview of the sub-topics covered by the major sections of the rest of this document.
  • #30 This is the outline of the upcoming sections. Note to presenters: There are a series of slides after this one that are marked as “skipped” that give a short version of the same content. Depending on your audience and presentation, you may want to use the short versions instead, or keep the in-depth slides. You probably don’t want both though.
  • #31 The best place to start is with simple text prompts
  • #32 We can start off with a simple creative prompt.
  • #33 Let’s imagine we want to send a tip with our daily JavaScript newsletter. Here we’ve used a basic instruction prompt with some extra guidance.
  • #34 Not bad. One issue with this kind of prompt is that when you always use the same input prompt, you’re going to see repeated outputs very quickly. While you can request some diversity and creativity through the model’s settings (e.g. temperature and number of outputs), it will buy a little runway, but for this example we want unique output forever.
  • #35 This time we have added examples. We call this a few-shot prompt, as we give the model a few examples of what the input/output should look like, and it continues the pattern. Technically this is a “1-shot” prompt, as we only have one example. We leave the prompt “hanging” at the end, so that the model will continue at exactly this point - generating the tip itself.
  • #36 Here’s the basic text interface in Python. As well as .text, you can dig into some other fields in the response too: prompt_feedback, which will tell you about any safety filtering that has happened. citation_metadata, which will tell you about any text that has been identified as duplicated on the web so you can cite your source
  • #37 And here’s a chat interface. Chat has conversational state, etc. If you are making raw API calls, you’ll need to manage that yourself. If you use the SDKs like I have here, it’s taken care of for you. This statelessness has benefits though - it means you can save and resume conversations, and even edit history if you need to.
  • #40 Prompt design has a lot of depth and nuance to it - we have a number of guides up on the website, this one is very comprehensive and could be a talk on its own. e.g. If you need the Gemini API to perform multiple tasks, you might find it works better when you break the prompt up into small, atomic steps that are chained together. Adding context is also helpful - the model might be able to remember some facts, but they tend to work better when you give them all of the information they need to answer the question. And there are some model parameters you can tweak - things like temperature, top-k and top-p. This guide explains all of these and how they affect the model
  • #42 The next iteration on top of text generation is code generation. The concepts are the same, but you can request code or scripts as output. I’ll show a few examples.
  • #43 We can ask the model to generate data for us, here we have requested 3 columns and some sorting. https://aistudio.google.com/app/prompts/1muB-8fPe6K_aUsJzzn3Zp274pt9OImR3
  • #44 We can ask the model to generate data for us, here we have requested 3 columns and some sorting. https://aistudio.google.com/app/prompts/1muB-8fPe6K_aUsJzzn3Zp274pt9OImR3
  • #45 We can ask the model to generate data for us, here we have requested 3 columns and some sorting. https://aistudio.google.com/app/prompts/1muB-8fPe6K_aUsJzzn3Zp274pt9OImR3
  • #46 If we have a more complex, existing, structure, we can also specify it explicitly in the prompt, and request a query in plain English. https://aistudio.google.com/app/prompts/1muB-8fPe6K_aUsJzzn3Zp274pt9OImR3
  • #47 We can ask the model to generate data for us, here we have requested 3 columns and some sorting. https://aistudio.google.com/app/prompts/1muB-8fPe6K_aUsJzzn3Zp274pt9OImR3
  • #48 We can ask the model to simulate the results too. Note that this isn’t backed by a real SQL engine or even real data, so you can expect the output to be somewhat speculative. https://aistudio.google.com/app/prompts/1muB-8fPe6K_aUsJzzn3Zp274pt9OImR3
  • #50 A common question when starting with LLMs is “how do I bring my own data”? Whether that is asking questions about your internal knowledge base or building a chatbot for your website. All models have knowledge cut-off dates - they were trained at a specific point in time, and cannot know about anything that happened after. Because of this they are limited in their ability to reliably produce facts, and ultimately some things you want to ask about may not be public - whether it’s your product catalogue or files on your computer. The way we deal with this is to supply as much relevant information as we can to the model when we make requests.
  • #51 The simplest way to supply contextual information is to just put it in the prompt. Some platforms call this a “stuff prompt”, as we just stuff the information into the prompt.
  • #53 Gemini 1.0 pro is limited to 30k tokens (input), limited to small problems. Gemini 1.5's limit is 1M tokens, handles much bigger problems.
  • #54 Gemini 1.0 pro is limited to 30k tokens, limited to small problems. Gemini 1.5's limit is 1M tokens, handles much bigger problems.
  • #55 We have focused on the textual capabilities of the Gemini API so far, but that has side-stepped one of the most powerful features - image understanding! Let’s start off by demoing some of the capabilities.
  • #58 Garden image (CC-BY-SA 4.0): https://commons.wikimedia.org/wiki/File:Humble_Administrator%27s_Garden_7193_(6399187813).jpg
  • #60 Those examples were fun, let’s try something that uses some reasoning. (note that this image was created for this slide deck, it is not going to be in the training dataset)
  • #62 As you’ve seen, the Gemini model is quite capable of understanding the contents of images. They are a positional part of the model input, so can be interleaved into text prompts as the subject of a question (e.g. what is this) or as context for some larger textual instruction (e.g. “these photos and text messages represent a day of activity, please summarise the events”).
  • #63 Let’s start by setting up the vision model. The API is the same as we used for text interactions - we just need to use a model that has vision capabilities. Here we’re using the Gemini Pro Vision model. If you use the wrong model you’ll get a pretty clear error message back.
  • #64 First test will be basic. Start by loading an image.
  • #65 And we use the same generate_content call. A single turn in the GenerateContent request takes a sequence of input chunks. Each chunk is a single mode, or mime type, so you can interleave text and images in your requests.
  • #66 Also RAG - though we don’t have an example in here
  • #67 Remember the JS tip generator from earlier? When we used the topic as input to the tip, we call that “conditioning”. We can do that with images too. A fun way to express this is by using images as a creative source. This example uses a hand-drawing of a character as the source of inspiration for a TTRPG character sheet. https://goo.gle/sketch-img
  • #68 To demonstrate structured data extraction, we have this paper form that has been filled out by hand. We can take a photo of the form using a smartphone like this
  • #69 And convert it directly into JSON. Note: There’s some nuance in the details - here the model has come up with a schema of its own, however it might not match yours. We have solutions for that too. (Ask the audience how they could solve it - this can be a chance to see if they’re learning anything) E.g. Few-shot prompting, like we saw before
  • #70 Few-shot example.
  • #72 Tuning is available in Google AI Studio, and through the API.
  • #73 In Google AI Studio click “New tuned model” and then either create a structured prompt or import data from Google Sheets.
  • #74 From Google Sheets, you choose which columns to use as input and output, then import the data.
  • #75 Set the tuning configuration, and click “Tune”
  • #76 After a few minutes of training you’ll find the training run in the “files” section. Then you can use that model to create a prompt.
  • #77 Similarly, you set the training data (here ~20 increment examples), and the tuning parameters. Then send the request to launch the job. There are special requests for monitoring the job that are not shown here.
  • #78 Once the job completes you just use the “tunedModels/” name anywhere you could pass a model name. Running inference with this model you can see that it picked up the intended task: increment. It gives correct answers in Roman numerals, French, and Japanese, even though those were not included in the training set.
  • #79 Before we wrap up I want to cover something important to think about while working with generative technologies. These models are powerful but you need to make sure that their output is being used in a way that is safe. Google has published AI principles that they use for their AI products and I encourage you to do the same.
  • #80 The Gemini API comes with built-in safety filters to filter out certain content and these are enabled by default. Developers are able to change these, but we advise you to do so with caution, and to make sure you perform comprehensive evaluations along your critical user journeys before deploying them in any real-world scenario.