Home. Software. Company.
Generative AI in C#
Harnessing Large Language Models
for Enhanced Development
info@zion-net.co.il
Is Generative AI the End for Developers?
About Me
 Alon Fliess:
 CTO of ZioNet
 More than 30 years of hands-on experience
 Microsoft Regional Director & Microsoft Azure MVP
About ZioNet
 A professional software services company
 The home for high-potential novice developers and expert leaders
 We support our developers’ growth and provide them with professional and personal mentoring
 ZioNet management has over 20 years of experience
 We strive to ensure our developers have the best first-job experience!
Introduction to AI
Overview of Different Types of AI
 Algorithm-based AI: Uses rule-based decision-making systems
 Supervised Learning: Learns from a labeled dataset
 Unsupervised Learning: Finds hidden patterns in data
 Reinforcement Learning: Improves via reward-based feedback
 Hybrid AI: Combines different AI methodologies
 Large Language Models: Generate text by learning from a tremendous amount of text
What is a Neural Network?
 Inspired by the human or animal brain
 Far from being the same
Large Language Model Overview (GPT, LLaMA, LaMDA, PaLM)
 Capture the semantics of a language
 Trained on a very large amount of data
 All you need is Attention… and position (context)
 The model predicts the next word using probability, based on the input text
 The next word (token) is predicted using the original input and all the words that were generated before
LLM Processing (GPT)
 Tokenization: The text is split into smaller parts, like words or parts of words.
 Mapping: Each word or part of a word is given a unique number (ID).
 Embedding: The ID for each word is then turned into a list of numbers (a vector) that represents the meaning of the word.
 Context Understanding: The position of each word in the text and the influence of nearby words are considered to understand the context.
 Passing through Layers: All these vectors are passed through a series of layers in a neural network, which helps the model understand complex relationships between words.
 Output: The model gives an output in the form of a prediction for the next word or part of a word, based on the input and learned context.
 Training: The model learns from its mistakes and adjusts its predictions based on actual data.
Tokenization - Explained
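To make tokenization concrete, here is a minimal C# sketch (an illustration, not from the deck) using the open-source SharpToken NuGet package to encode text into the integer token IDs that GPT-style models consume:

    // dotnet add package SharpToken
    using SharpToken;

    // cl100k_base is the encoding used by the GPT-3.5/GPT-4 family
    var encoding = GptEncoding.GetEncoding("cl100k_base");

    var text = "Generative AI in C#";
    var tokenIds = encoding.Encode(text); // text -> list of integer token IDs

    Console.WriteLine($"Token count: {tokenIds.Count}");
    Console.WriteLine(string.Join(", ", tokenIds));

    // Round trip: the IDs decode back to the original text
    Console.WriteLine(encoding.Decode(tokenIds));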
What are the Base Models? (OpenAI, Azure)
 Some predefined models were trained to do specific tasks
 There are variations in:
 The context size (input + output tokens)
 The speed
 The usage price
 The ability to be fine-tuned
Can I Have My Own Fine-Tuned Models?
 Why Fine-Tune?
 Improve the model’s performance on specific tasks
 Adapt the model to new data
 Customize the model's behavior
 When to Fine-Tune?
 When the Pre-Trained Model is not performing well, or the grounding (system) prompt is too large
 When you have a specific task or data
 How to Fine-Tune?
 Use tools like Azure Machine Learning Studio or OpenAI's Python client for fine-tuning
 Specify the Base Model and Dataset
 Monitor Fine-Tuning Job and retrain
 Use Fine-Tuned Model for Predictions
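As a rough sketch of the "how", the snippet below starts a fine-tuning job against OpenAI's REST endpoint with plain HttpClient. The model name and training-file ID are placeholders; the file must be a JSONL dataset uploaded beforehand through the files endpoint:

    // Assumes a .NET 6+ top-level program; OPENAI_API_KEY holds your key
    using System.Net.Http.Headers;
    using System.Text;

    using var http = new HttpClient();
    http.DefaultRequestHeaders.Authorization = new AuthenticationHeaderValue(
        "Bearer", Environment.GetEnvironmentVariable("OPENAI_API_KEY"));

    // "file-abc123" is a placeholder for a previously uploaded JSONL training file
    var body = """{ "model": "gpt-3.5-turbo", "training_file": "file-abc123" }""";

    var response = await http.PostAsync(
        "https://api.openai.com/v1/fine_tuning/jobs",
        new StringContent(body, Encoding.UTF8, "application/json"));

    // The response contains the job ID and status - poll it to monitor the job
    Console.WriteLine(await response.Content.ReadAsStringAsync());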
Learn how to Generate or Manipulate Text
 Classification, such as sentiment
 Generation – Create a text or formatted text (JSON, XML)
 Conversation – Chat to get information, give commands, or generate and manipulate the result
 Transformation:
 Language Translation
 Text to emoji
 Any format to any format
 Summarization – reduce the size of text
 Completion – complete a statement
 Code - Use Codex to generate, complete, or manipulate source code
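As one concrete example of the classification task, here is a hedged sketch using the Azure.AI.OpenAI preview SDK (its surface has shifted between preview versions, so treat the exact calls as indicative) to ask a chat model for a sentiment label:

    using Azure;
    using Azure.AI.OpenAI;

    var client = new OpenAIClient(
        new Uri("https://YOUR-RESOURCE.openai.azure.com/"), // placeholder endpoint
        new AzureKeyCredential(Environment.GetEnvironmentVariable("AZURE_OPENAI_KEY")!));

    var options = new ChatCompletionsOptions
    {
        Messages =
        {
            new ChatMessage(ChatRole.System,
                "Classify the sentiment of the user's text as Positive, Negative, or Neutral. Reply with one word."),
            new ChatMessage(ChatRole.User, "The new build pipeline is fast and reliable!")
        },
        Temperature = 0f // deterministic labels
    };

    // "gpt-35-turbo" is the Azure deployment name (a placeholder)
    var response = await client.GetChatCompletionsAsync("gpt-35-turbo", options);
    Console.WriteLine(response.Value.Choices[0].Message.Content); // e.g. "Positive"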
LLM for Code is like SoC for Hardware
Introduction to Prompt Engineering
 What is the first thing that comes to your mind when I say “<prompt>?”
 Prompt Engineering is the art of crafting inputs to get desired outputs from AI models
 It's a crucial part of using AI models effectively
 The design of the prompt can greatly influence the model's response.
 Examples:
 If you want a list, start your prompt with a numbered list.
 If you want a specific format, provide an example in that format.
 It often involves a lot of trial and error
 Different prompt strategies may work better for different tasks
 Less is more!
Prompt Engineering Recommendations
 Goal: Define what you want from the model
 Instructions: Be clear and explicit
 Examples: Use them for specific formats or styles
 Iterate: Experiment with different prompts
 Guidance: Use system-level and user-level instructions
 Settings: Adjust temperature and max tokens as needed
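A small illustrative example (the wording is mine, not from the deck) of these recommendations combined into grounding and user prompts – a defined goal, explicit instructions, and a format example:

    // System (grounding) prompt: goal + explicit instructions + an output example
    const string SystemPrompt = """
        You are a release-notes assistant.
        Goal: turn commit messages into user-facing release notes.
        Instructions: output a numbered list, one item per change, at most 12 words each.
        Example output:
        1. Added retry logic to the HTTP client.
        2. Fixed a crash when the config file is missing.
        """;

    const string UserPrompt =
        "Commits: 'fix NRE in parser', 'add YAML export', 'bump SDK to 8.0'";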
Playground – ChatGPT and Azure
 To get started, use the ChatGPT or Azure OpenAI playground – Demo
 You can generate your boilerplate code for grounding and examples
 You can play with fine-tuning parameters
LLM Meets C#
aka.ms/semantic-kernel
The Easy Way To Add AI To Your App
[Diagram: Goals-First AI – the Semantic Kernel flow. It all starts with a user’s AI ASK. The kernel gathers skills, memories, and connectors (APIs); the planner prepares the steps; the kernel runs the steps pipeline (1 2 3 …) and executes them; the result is ready, resulting in new productivity.]
Semantic Kernel Main Concepts
 Prompt Functions:
 Define interaction patterns with LLMs
 Native Functions:
 Enable direct code execution by AI
 Plugins:
 Custom-built, modular elements enabling specialized LLM task handling
 Memory:
 Contextual data repository; supports key-value pairs and semantic embeddings
 Connectors:
 Interface with external data and APIs
 Planners:
 Orchestrate tasks, manage complex tasks through intelligent LLM planning
 Agents:
 Autonomous entities executing orchestrated tasks
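A minimal sketch that ties a few of these concepts together – building a kernel, importing a native function as a plugin, and invoking a prompt function that calls it. It assumes the Microsoft.SemanticKernel 1.x package surface (the API changed considerably on the way to 1.0):

    using System.ComponentModel;
    using Microsoft.SemanticKernel;

    var builder = Kernel.CreateBuilder();
    builder.AddOpenAIChatCompletion(
        modelId: "gpt-4",
        apiKey: Environment.GetEnvironmentVariable("OPENAI_API_KEY")!);
    var kernel = builder.Build();

    // Native function registered as a plugin named "clock"
    kernel.ImportPluginFromObject(new ClockPlugin(), "clock");

    // Prompt function: the template invokes the native function inline
    var result = await kernel.InvokePromptAsync(
        "The current UTC time is {{clock.Now}}. Write a one-line status greeting.");
    Console.WriteLine(result);

    // A native function is just C# code the kernel (and the planner) can call
    public class ClockPlugin
    {
        [KernelFunction, Description("Returns the current UTC time")]
        public string Now() => DateTime.UtcNow.ToString("O");
    }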
Demo
Windows Troubleshooting Plugin Capabilities
 Event Log: Query Windows Event Logs
 File System: Read and search files
 Registry: Access and query keys
 Performance: Monitor system metrics
 Network Info: Retrieve connection details
 Process Info: List running processes
 Service Info: Query system services
 WMI Query: Execute WMI commands
 Everything: Advanced file search
 Paging: Supports result pagination
 Tenant/PC ID: Requires GUIDs
 Multi-Function: Multiple queries in one call
 Parameter Flex: Customizable parameters
 Truncation: Handles truncated results
 Max Token: Customizable data size
The Windows Troubleshooting Plugin
Lessons Learned
 The first project iteration: An OpenAI ChatGPT plugin
 Before Semantic Kernel
 Had to bridge the LLM to C# code
 Simple APIs, JSON, reflection, default values
 Needed to reduce the description load due to token limitations
 No planner
 The second project iteration: Use multiple LLMs and Semantic Kernel
 Work in progress
 Requires lots of fine-tuning – debug as you go
 Expensive
 For agents: use an asynchronous model or batch processing
Lessons Learned from Developing a Plugin
 Plugin API Design
• Be Systematic: Stick to one or a few APIs for consistency
• OpenAPI Description: Provide a comprehensive OpenAPI specification
 Plugin Manifest
• Examples: Include examples for each query method
• Instructions: Update examples and instructions if ChatGPT calls with the wrong data schema
 Data Handling
• Paging: Implement paging capabilities
• Truncation: Use HTTP code 206 and a special message to indicate truncated results (see the sketch after this list)
• Important Data: Always return key data, as the Windows Troubleshooting plugin does with Tenant ID and PC ID
 Performance & Limitations
• Trial and Error: Extensive testing is crucial
• Size Limits: Be aware of total size and per JSON element limits
• Resource Management: Be cautious of overusing ChatGPT 4 resources
 Dynamic Content
• For complex plugins, dynamically generate the OpenAPI specification and plugin manifest
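To illustrate the paging-plus-truncation convention, here is a hedged ASP.NET Minimal API sketch (the route, parameters, and size budget are all illustrative):

    using Microsoft.AspNetCore.Builder;
    using Microsoft.AspNetCore.Http;

    var app = WebApplication.Create(args);

    app.MapGet("/events", (int page, int pageSize) =>
    {
        // Stand-in data source; imagine thousands of event records
        var all = Enumerable.Range(1, 10_000).Select(i => $"Event {i}");
        var payload = string.Join("\n", all.Skip(page * pageSize).Take(pageSize));

        const int MaxPayloadChars = 8_000; // crude stand-in for a token budget

        if (payload.Length > MaxPayloadChars)
        {
            // HTTP 206 plus an explicit note tells the model the result was cut short
            return Results.Json(new
            {
                truncated = true,
                message = "Result truncated; request the next page.",
                data = payload[..MaxPayloadChars]
            }, statusCode: StatusCodes.Status206PartialContent);
        }

        return Results.Json(new { truncated = false, data = payload });
    });

    app.Run();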
Lessons Learned from Embedding LLMs into Applications
 Grounding & Serialization
• Provide accurate grounding and a JSON schema describing the result
• Use Semantic Kernel Prompt and Native functions – use string and JSON parameters
 Model Selection
• Use GPT3/3.5/4 for chat functionalities
• Use other models for embedding, image creation and recognition
 Performance & Cost
• GPT4 is more accurate but costly and slower
• Consider fine-tuned models if applicable – high cost for a good result
 Message Handling
• Handle truncated messages with a continuation strategy – less important with 120K-token contexts
• Manage message size by counting tokens and removing history (see the sketch after this list)
• Use summary messages to replace original history if needed
• Use Memory (Embedding/Vectorization)
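A sketch of the token-counting approach to history management; it reuses the SharpToken tokenizer from the earlier example, and the budget value is arbitrary:

    using SharpToken;

    var encoding = GptEncoding.GetEncoding("cl100k_base");

    var messages = new List<string>
    {
        "system: You are a helpful assistant.", // index 0: grounding - always kept
        "user: Tell me about tokenization.",
        "assistant: Tokenization splits text into ...",
        "user: And what about embeddings?"
    };

    int TotalTokens() => messages.Sum(m => encoding.Encode(m).Count);

    // Drop the oldest non-system messages until the conversation fits the budget
    const int MaxTokens = 3_000;
    while (TotalTokens() > MaxTokens && messages.Count > 1)
        messages.RemoveAt(1);

A refinement mentioned above is to replace the removed messages with a single summary message instead of discarding them outright.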
Overcoming the Model Limitations
 Token limitation
 Use summarization
 Use memory (store important facts)
 Use Retrieval Augmented Generation (RAG) + vector and other search (see the sketch after this list)
 Overcome the lack of current information
 Use Bing or other search plugins
 Overcome cost and availability
 Use GPT-3.5 Turbo
 Use open-source models
 Use GPT-4 to prompt-engineer cheaper models (3.5)
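A bare-bones sketch of the RAG idea under stated assumptions: documents are pre-embedded into vectors, EmbedAsync is a hypothetical wrapper around whichever embedding model you use, and retrieval is a plain cosine-similarity scan:

    // Hypothetical embedding call - wrap e.g. text-embedding-ada-002 here
    async Task<float[]> EmbedAsync(string text) =>
        throw new NotImplementedException("call your embedding model/API here");

    double Cosine(float[] a, float[] b)
    {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return dot / (Math.Sqrt(na) * Math.Sqrt(nb));
    }

    // Pre-embedded document fragments (in memory here; a vector DB in practice)
    var chunks = new List<(string Text, float[] Vector)>();
    var question = "How do I reset the service?";

    // Retrieve the nearest chunks and prepend them to the prompt
    var queryVector = await EmbedAsync(question);
    var context = chunks
        .OrderByDescending(c => Cosine(c.Vector, queryVector))
        .Take(3)
        .Select(c => c.Text);

    var prompt = $"Answer using only these facts:\n{string.Join("\n", context)}\n" +
                 $"\nQuestion: {question}";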
The Windows Event Log Plugin
 https://github.com/alonf/WindowsEventLogChatGPTPlugIn.git
 Retrieve specific events from the Windows Event Log using XPath queries (see the sketch below).
 Supports all major log names: Application, Security, Setup, System, and ForwardedEvents.
 Use it to solve problems and get information about your Windows system status.
 The plugin supports paging. It estimates the number of tokens and limits the result.
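For a feel of the kind of query the plugin runs, here is a minimal standalone sketch (not the plugin’s actual code) that reads error-level events from the System log with an XPath filter. It is Windows-only and, on .NET Core/.NET 5+, needs the System.Diagnostics.EventLog NuGet package:

    using System.Diagnostics.Eventing.Reader;

    // XPath filter: error-level events only (Level=2)
    var query = new EventLogQuery("System", PathType.LogName, "*[System[(Level=2)]]");

    using var reader = new EventLogReader(query);
    for (var record = reader.ReadEvent(); record != null; record = reader.ReadEvent())
    {
        using (record)
        {
            Console.WriteLine(
                $"{record.TimeCreated} {record.ProviderName}: {record.FormatDescription()}");
        }
    }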
Examples
Developing a ChatGPT Plugin Using C#
 Use ASP.NET Minimal APIs or controller-based APIs
 Use YamlDotNet to convert the JSON OpenAPI specification to YAML (see the sketch below)
 The OpenAPI specification, the plugin manifest, and the icon file can come from the file system or as an HTTP query

// Serve the OpenAPI spec, manifest, and icon as static files
app.UseStaticFiles(new StaticFileOptions
{
    FileProvider = new PhysicalFileProvider(
        Path.Combine(app.Environment.WebRootPath, "OpenAPI")),
    RequestPath = "/OpenAPI"
});
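The YamlDotNet conversion mentioned above can be done generically: since YAML is a superset of JSON, YamlDotNet can parse the JSON spec directly and re-serialize it as YAML. A hedged sketch (file names are placeholders):

    using YamlDotNet.Serialization;

    var json = File.ReadAllText("openapi.json");

    // YAML is a superset of JSON, so the YAML deserializer reads JSON as-is
    var spec = new DeserializerBuilder().Build()
        .Deserialize<object>(new StringReader(json));

    File.WriteAllText("openapi.yaml",
        new SerializerBuilder().Build().Serialize(spec));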
Developing a ChatGPT Plugin Using C#
 Route HTTP requests to a function
 For a simple plugin, it is just a GET request
 For a complex plugin, use POST with a body and have your own route
 In the Windows Troubleshooting plugin, I use a map of providers and call the specific function using reflection (see the sketch after this list)
 I use validation to make sure the data size and type are correct
 I provide extensive error messages
 Use a correlation ID for local development (Tenant ID, PC ID)
 Use OAuth for released plugins
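A simplified sketch of the provider-map-plus-reflection dispatch described above (the type and method names are illustrative, not the plugin’s real ones):

    using System.Reflection;

    var providers = new Dictionary<string, object>(StringComparer.OrdinalIgnoreCase)
    {
        ["process"] = new ProcessProvider()
    };

    // In the real plugin these come from the POST body
    string target = "process", method = "ListProcesses";
    object[] args = { 5 };

    var provider = providers[target];
    var methodInfo = provider.GetType()
        .GetMethod(method, BindingFlags.Public | BindingFlags.Instance)
        ?? throw new InvalidOperationException($"Unknown method '{method}'");

    // Validate before invoking: parameter count here; types and sizes in practice
    if (methodInfo.GetParameters().Length != args.Length)
        throw new ArgumentException("Wrong number of arguments");

    Console.WriteLine(methodInfo.Invoke(provider, args));

    // Illustrative provider exposing query methods the dispatcher can call by name
    class ProcessProvider
    {
        public string ListProcesses(int top) => $"first {top} processes ...";
    }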
Controlling the Output layer
 Max response: defines the token limit for the model's response
 One token is roughly equivalent to 4 English characters
 Temperature: Controls the randomness of the model's responses.
 Lower values result in more deterministic responses, higher values lead to creativity
 Top P: Another parameter to control randomness
 Lower values make the model choose more likely tokens
 Stop sequence:
 Specifies a sequence at which the model should stop generating a response
 Frequency penalty: Reduces the likelihood of the model repeating the same text by penalizing tokens that have appeared frequently
 Presence penalty: Encourages the model to introduce new topics in a response by penalizing any token that has appeared in the text so far
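In code, these knobs map onto execution-settings properties. A sketch using Semantic Kernel’s OpenAIPromptExecutionSettings (assuming the 1.x OpenAI connector package):

    using Microsoft.SemanticKernel;
    using Microsoft.SemanticKernel.Connectors.OpenAI;

    var settings = new OpenAIPromptExecutionSettings
    {
        MaxTokens = 400,         // cap on the response length
        Temperature = 0.2,       // lower = more deterministic
        TopP = 0.9,              // nucleus-sampling cutoff
        FrequencyPenalty = 0.5,  // discourage verbatim repetition
        PresencePenalty = 0.3,   // encourage new topics
        StopSequences = new[] { "###" } // stop generating at this marker
    };

    // Pass the settings when invoking a prompt, for example:
    // var result = await kernel.InvokePromptAsync(prompt, new KernelArguments(settings));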
Cost, Privacy & Security
 ChatGPT and Azure OpenAI services can be used without donating your data for training
 For the public ChatGPT, you can ask to opt out
 For the API, you can ask to opt in
 GPT-4 is a very high-cost model (becoming cheaper)
 You can use GPT-4 to create prompts and examples for GPT-3.5
 You can host your model on-premises; however:
 The open-source models do not include the latest GPT-3.5 and GPT-4
 You may train your own model – costly
Summary
 AI Types: Explored diverse AI forms
 Neural Networks & LLMs: Discussed their functionality
 Base Models & Fine-Tuning: Highlighted fine-tuning's importance
 Prompt Engineering: Introduced crafting effective prompts
 Semantic Kernel: Your LLM Swiss army tool
 Application Transformation: Extending applications with AI
 Autonomous Agents: The future of AI systems
LLM  The Software System on a Chip
The End
Editor's Notes

  • #6 Old algorithm-based AI, such as Minimax
  • #7 Algorithm-based AI: These are rule-based systems that follow predefined logic to make decisions. Examples include traditional algorithms like Minimax (used in game theory and decision making), A* (used in pathfinding and graph traversal), and others.
    Machine Learning (ML) based AI: This type of AI learns patterns from data and makes predictions or decisions without being explicitly programmed to do so. Machine Learning can be further divided into subtypes:
    - Supervised learning: The model is trained on labeled data (data with known outputs).
    - Unsupervised learning: The model is trained on unlabeled data and identifies patterns within the data.
    - Reinforcement learning: The model learns based on rewards and penalties and iteratively improves its performance.
    Deep Learning based AI: A subset of ML, deep learning utilizes artificial neural networks with multiple layers (hence the term 'deep') to model and understand complex patterns. This is the foundation for models like Convolutional Neural Networks (CNNs) used in image recognition, or Recurrent Neural Networks (RNNs) and their derivatives like LSTMs and GRUs used in sequence prediction tasks.
    Hybrid AI: Hybrid AI systems use a combination of the above methods to solve complex problems. They may combine rule-based algorithms with machine learning models to leverage the strengths of both.
    Large Language Models (LLM): These are a type of AI model based on deep learning, specifically using architectures like Transformers. They are trained on a large corpus of text and are capable of generating human-like text, making them useful in a variety of applications, which we'll be delving into in this presentation.
  • #8 Prompt: Show an example of multilayer neural networks with the weights and functions. The human brain contains around 86 billion neurons, each connected to thousands of others, for a total of about 100 trillion connections
  • #9 Tokenization: The text input is broken down into tokens. These tokens can represent words, parts of words, or even individual characters, depending on the language model.
    Integer IDs: Each token is mapped to an integer ID according to a predefined vocabulary.
    Embedding: Each integer ID is then mapped to a high-dimensional vector. These vectors, or embeddings, represent the meanings of the tokens and are learned during the training phase.
    Positional Encoding: The model takes into account the position of each token in the sequence, adding this information to the embeddings. This allows the model to understand the order of the words in a sentence, which is crucial for understanding language.
    Transformer Layers: The sequence of updated embeddings is passed through a series of transformer layers. These layers can "attend" to different parts of the input sequence when making predictions for each token. The transformer layers essentially help the model to understand the context of each word by taking into account the other words in the sentence.
    Output: The final transformer layer outputs a new sequence of vectors, one for each input token. Each vector is a set of scores, one for each word in the model's vocabulary. The model's prediction for each token is the word with the highest score.
    Training and Adjustment: The whole process is governed by a large number of parameters (weights and biases in the embeddings and transformer layers). These parameters are adjusted during the training phase to minimize the difference between the model's predictions and the actual words in the training data. The model learns to predict each word based on the context provided by the other words in the sentence.
  • #10 Same note as #9, plus: During the initial pass, all input tokens influence all other input tokens in forming their respective context-aware hidden states. During the text generation phase, each newly generated token is influenced by all previous tokens (including both original input tokens and already generated tokens), but it does not influence the hidden states of the tokens that came before it.
  • #11 Prompt: I am an LLM beginner that needs some examples. Please show, using a basic example, the different parts of the Large Language Model execution stages. Use vectors and matrices. Use a few-token-based language. Prompt: Show an example of two words whose embedding vectors are close. Prompt: If I have the sentence "the King loves the Queen", what does the attention algorithm do? Can you show it using vector numbers?
  • #17 Define Your Goal: Understand what you want the model to generate. This could be a specific type of text, a certain format, or a particular style.
    Craft Clear Instructions: Start your prompt with explicit instructions. The more specific you are, the better the model can generate the desired output.
    Provide Examples: If you want a specific format or style, provide an example in your prompt. The model will try to follow the pattern you set.
    Experiment and Iterate: Don't be afraid to tweak your prompts and try different approaches. Prompt engineering often involves a lot of trial and error.
    Use System-Level and User-Level Instructions: These can guide the model's behavior throughout the conversation or for a specific user turn.
    Adjust Temperature and Max Tokens: These settings can influence the output. Higher temperature values make output more random, while lower values make it more deterministic. Max tokens limit the length of the model's response.
  • #19 Semantic Kernel has more GitHub stars on a daily basis, more developers joining our open source Discord community, and regular blog posts are trending as well. You’ve joined Semantic Kernel at the right time!
  • #20 It always starts with an Ask. A user has a goal they want to achieve. We have seen how the Kernel orchestrates the ask to the planner. The Planner finds the right AI skills that can be used to solve that need. Some skills are enhanced with memories and with live data connections. The steps to complete the users ask are executed as part of the plan and the results are returned to the user, resulting in productivity gains and ideally the goal reached.
  • #28 Prompt: What are the main APIs of Dapr Workflow Building Block Prompt: https://github.com/dapr/python-sdk/blob/master/ext/dapr-ext-workflow/dapr/ext/workflow/dapr_workflow_client.py Prompt: https://github.com/alonf/Apple_IIe_Snake/blob/master/snake_asm.txt Prompt: Please describe the code Prompt: Please translate the code to C Prompt: Continue