This post was contributed by Provar’s Chief Strategy Officer, Richard Clark. For more blog posts contributed by Provar’s executive leadership team, be sure to check back!

AI is everywhere: on social media, TV, streams, and in newspapers. It’s hard to avoid. Whether it’s being espoused as a tool of the devil or a super-being, it’s essential to understand some of the new terminology. By no means is this list complete! It’s meant to be fun and digestible, not an all-encompassing reference guide.

I’ve compiled an A-Z list of popular Generative AI terms, like those you’ll meet around ChatGPT, to help everyone grasp their meaning and importance, which is crucial as the AI revolution unfolds.


Artificial Intelligence (AI)

AI is the broad concept of having machines think and act like humans. Generative AI is a specific type of AI.


Bias

Any large language model (LLM) is limited by the data set on which its answers are based. Like humans, it can only repeat what it has learned, though it also introduces subtle variances (see Hallucination). If we trained an LLM on the National Enquirer, we’d get very different results than if we used The New York Times (well, you’d hope so anyway).


ChatGPT

A conversational AI developed by OpenAI to allow users to interact with its LLMs through a chatbot interface. The GPT element stands for Generative Pre-trained Transformer, which is a fancy way of saying it’s an LLM that uses historical data sets (pre-trained) and a unique algorithm for generating human-like responses (using natural language processing). As OpenAI increases the data set on which its LLM is trained, it releases version updates such as 3, 3.5, and 4, which reflect the currency of the information used. Despite the lag in information currency, these models can be augmented through Grounding to increase the accuracy and relevance of results.

Deep Learning

A method used broadly within the domain of Machine Learning for teaching computers to process data in a way inspired by the human brain. It enables AIs to recognize pictures, text, and sounds more accurately, even when only incomplete information is available. Using pre-classified examples, the model learns to recognize patterns and apply that knowledge when it comes up against new content it has never encountered before. The word “deep” refers to the multiple layers of the neural network used.

Error Metrics

Everyone must understand that AIs are fallible, just like us. To measure this, Error Metrics are a mechanism for benchmarking how well an AI is performing. Note that when you increase the Variance in an AI model, you also increase the errors.
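To make this concrete, here is a minimal sketch of three common error metrics, computed on a purely hypothetical set of cat-vs-dog predictions (the labels are invented for illustration):

```python
def error_metrics(actual, predicted, positive="cat"):
    """Return accuracy, precision, and recall for one class."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == positive and p == positive)
    fp = sum(1 for a, p in zip(actual, predicted) if a != positive and p == positive)
    fn = sum(1 for a, p in zip(actual, predicted) if a == positive and p != positive)
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return {
        "accuracy": correct / len(actual),                      # how often it was right overall
        "precision": tp / (tp + fp) if tp + fp else 0.0,        # when it said "cat", was it a cat?
        "recall": tp / (tp + fn) if tp + fn else 0.0,           # of all real cats, how many did it find?
    }

# Made-up benchmark data for a hypothetical classifier
actual    = ["cat", "cat", "dog", "dog", "cat", "dog"]
predicted = ["cat", "dog", "dog", "cat", "cat", "dog"]
print(error_metrics(actual, predicted))
```

Real benchmarking suites track many more metrics than these three, but the idea is the same: compare the AI's answers against known-correct ones and keep score.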

Fine Tuning

To optimize the results from an LLM, the weighting of its parameters (and, specifically, its hyperparameters) is adjusted. If I want an AI to suggest something unique, I increase its Variance; if I want a precise, textbook, consistent response, I lower it. You can also fine-tune responses by feeding a Generative AI’s responses back in and asking it to improve its answer, taking additional Grounding instructions into account.
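One hyperparameter you can tune yourself is temperature, which controls exactly this trade-off between variance and consistency. Below is a small sketch (the logits are invented scores for three candidate words) showing how temperature reshapes the model’s output probabilities:

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Convert raw model scores into probabilities. Higher temperature
    flattens the distribution (more variance, more creative); lower
    temperature sharpens it (more precise and consistent)."""
    scaled = [score / temperature for score in logits]
    peak = max(scaled)                               # subtract max for numeric stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]  # hypothetical scores for three candidate next words
print(softmax_with_temperature(logits, temperature=0.5))  # near-deterministic
print(softmax_with_temperature(logits, temperature=2.0))  # flatter, more creative
```

With a low temperature, the top-scoring word dominates and you get that consistent, textbook answer; with a high temperature, the alternatives get a real chance of being picked.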


Grounding

I wanted to cover GPT here, but Grounding is more critical to understand. Grounding is a mechanism for reducing Hallucinations in AI responses through additional, external data sets.

It’s one of many mechanisms for fine-tuning and training the model to your requirements. Salesforce’s new AI Cloud is exciting, with a built-in capability to augment publicly available LLMs with customer data sources.

For Salesforce customers, that would include using the data in your systems to give narrower and more specific results. What works for my competitor as a next best action in terms of their product sales may be entirely different for me, based on my product’s different capabilities, customer demographics, and historical successes and failures.

You can also Ground a model through human feedback on the quality of responses.
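A toy sketch of the idea: before sending a question to the LLM, fetch relevant text from your own data and prepend it to the prompt. The document store, keyword lookup, and prompt format below are all stand-ins, not any real product’s API:

```python
# Hypothetical company documents a real system would hold in a database
# or vector store; these two entries are invented for illustration.
DOCUMENTS = {
    "returns policy": "Customers may return products within 30 days.",
    "sales history": "Product X sells best to mid-size healthcare firms.",
}

def retrieve(question):
    """Naive keyword retrieval; production systems use vector search."""
    return [text for topic, text in DOCUMENTS.items()
            if any(word in question.lower() for word in topic.split())]

def grounded_prompt(question):
    """Build a prompt that grounds the LLM in the retrieved context."""
    context = "\n".join(retrieve(question))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

print(grounded_prompt("What is our returns policy?"))
```

The LLM never needs to have been trained on your data; it just answers from the context you hand it, which is why Grounding narrows results and cuts down Hallucinations.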


Hallucination

A Hallucination occurs when the output from an AI, a generative AI in particular, arrives at a false result and creates content inconsistent with reality. In my experience, if I ask GPT to create a unit test following best practices, it creates unnecessary teardown functions and annotations that don’t exist. Still, reflection could be a good idea in the future!

While very interesting for creativity purposes and innovation, we generally think of Hallucinations as undesirable outcomes, and they can indicate a problem in the model or insufficient prompting and Grounding to accomplish the task requested.

Prompt chaining is a valuable mechanism for the model to re-evaluate its previous answers, find errors, and fix them. It’s crucial to understand Hallucination and not blindly trust the AI output without validation.


Inference

The process whereby we expose an AI model trained on one set of data to a new scenario or data set it hasn’t seen before, allowing it to apply what it learned from its training data to make predictions or classifications based on its previous knowledge.

Joint Probability

The probability of two or more events occurring simultaneously. In AI, it can be used to determine the dependency between events or variables.
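A quick sketch of the dependency check, using invented observations of whether customers open a marketing email and whether they buy. If the joint probability differs from the product of the individual probabilities, the events are dependent:

```python
from collections import Counter

# Hypothetical observations: (did they open the email?, did they buy?)
events = [("open", "buy"), ("open", "no_buy"), ("open", "buy"),
          ("no_open", "no_buy"), ("no_open", "no_buy"), ("open", "buy")]

counts = Counter(events)
total = len(events)

p_joint = counts[("open", "buy")] / total              # P(open AND buy)
p_open = sum(1 for opened, _ in events if opened == "open") / total
p_buy = sum(1 for _, bought in events if bought == "buy") / total

print(p_joint)           # 3/6 = 0.5
print(p_open * p_buy)    # 4/6 * 3/6 = 1/3; not equal to 0.5, so the events are dependent
```

Had the two numbers matched, opening the email would tell us nothing about buying; because they don’t, the variables carry information about each other.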

K-means, K-nearest, K-fold

You’d need to read a proper machine learning data science paper at this point. Just be aware these terms exist. Next!

Large Language Model (LLM)

Of course, we all know what an LLM is, right? An LLM is an AI trained on vast amounts of text data. That training enables it to generate human-sounding responses in conversation. Some LLMs specialize in writing poetry, generating code, or answering questions, while others, like ChatGPT, attempt to provide answers for a broad set of needs.

Machine Learning

Before generative AI hit the headlines, most AI applications in software relied specifically on Machine Learning (though most vendors still can’t tell you how they use it, hmm…). Despite vendor miscommunication, Machine Learning focuses explicitly on classifying existing data: it underpins the pre-trained aspects of GPT tools, but unlike Generative AI it does not create new content or have the capacity for Hallucination. What Machine Learning can provide is probabilities. If you show it a picture of a dog and it has been trained only on cat pictures, it will probably say (depending on the dog) that the picture is 20% likely to be a cat, where a human can say categorically (most of the time) that it isn’t one. Remember this when using your camera phone to identify edible plants when foraging. Yes, it could still be deadly nightshade you’re about to put into that salad! Do you feel lucky?
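The cat example can be sketched in a few lines. This hypothetical “cat detector” is trained only on cat feature measurements (the feature names and numbers are invented), so anything it sees gets scored only by how cat-like it is:

```python
import math

# Invented training data: (ear pointiness, relative size) for three cats.
cat_examples = [(0.9, 0.2), (0.8, 0.3), (0.85, 0.25)]

# "Training" here is just averaging the examples into a cat prototype.
centroid = tuple(sum(feature) / len(cat_examples) for feature in zip(*cat_examples))

def cat_probability(animal):
    """Crude confidence score: the closer to the cat prototype, the more
    cat-like. It can never say "categorically not a cat"."""
    distance = math.dist(animal, centroid)
    return max(0.0, 1.0 - distance)

print(round(cat_probability((0.88, 0.24)), 2))  # a cat: high score
print(round(cat_probability((0.30, 0.80)), 2))  # a dog: low score, but never zero certainty
```

Notice the model only ever answers “how likely is this to be a cat?”; it has no concept of “dog”, which is exactly the foraging-app problem above.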

Natural Language Processing (NLP)

NLP is the branch of AI dedicated to understanding human language and its actual use, not just perfect grammatical inputs. It includes both text and voice inputs and is used by Generative AI to process requests and generate results. You can ask a Generative AI to make its responses more or less formal, or you can ask it to write for a specific audience or age category.


Overfitting

This can occur when an AI model has too narrow a set of training data and, as a result, becomes too specialized and less reliable at handling Inference. A Machine Learning system trained only on cats is much worse at understanding why a dog isn’t a cat than one trained on both cats and dogs. Don’t forget to include foxes either … I’m sure they’re somewhere in between the two!


Prompting

Prompting, or Prompt Engineering, is an AI technique that provides more detailed instructions to the LLM to guide the output toward your specific requirement. This can include telling the AI what its expertise is: “As a Quality Engineer…”. Relatedly, templating inputs to AIs is a helpful way to structure the repeatability of responses, reduce the verbosity of typed requests, or limit input to specific topics.

Chain of Thought Prompting allows you to tell the AI you will provide input through multiple steps and wish to receive responses in multiple outputs, too. You can also use Prompt Chaining to take the output of one request and feed it back into the AI with a request to use that output to perform further analysis or generation. Example: Create me an Apex trigger to XYZ -> Using this Apex class, create a Unit Test using best practices -> Create me a Salesforce Flow that passes this Apex Unit Test.
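The Apex example above can be sketched as a simple loop, with each prompt carrying the previous output forward. The `llm()` function here is a stub that stands in for a call to a real model; no actual API is used:

```python
def llm(prompt):
    """Stub: a real implementation would call your LLM of choice here."""
    return f"[model response to: {prompt[:45]}...]"

def prompt_chain(steps):
    """Feed each step's output back in as context for the next step."""
    output = ""
    for step in steps:
        output = llm(f"{step}\n\nPrevious output:\n{output}")
    return output

result = prompt_chain([
    "Create me an Apex trigger to XYZ",
    "Using this Apex class, create a Unit Test using best practices",
    "Create me a Salesforce Flow that passes this Apex Unit Test",
])
print(result)
```

The value of chaining is that each step works on a smaller, better-defined task than one giant prompt would, and you can validate the intermediate outputs along the way.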


Quantization

A method for shrinking AI models and speeding them up. It approximates numbers, sacrificing some accuracy, which saves memory and processing power. With Generative AI moving to smaller devices, this method will see increasing use.
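A minimal sketch of the idea behind 8-bit quantization: map floating-point weights onto 256 integer levels, then reconstruct them, trading a small, bounded loss of accuracy for a much smaller storage type. The weight values below are invented:

```python
# Hypothetical model weights (32-bit floats in a real model)
weights = [0.013, -0.244, 0.561, -0.903, 0.377]

scale = max(abs(w) for w in weights) / 127        # size of one int8 step
quantized = [round(w / scale) for w in weights]   # small integers, fit in int8
restored = [q * scale for q in quantized]         # approximate originals

error = max(abs(w - r) for w, r in zip(weights, restored))
print(quantized)           # integers in the range -127..127
print(round(error, 4))     # worst-case rounding error, at most half a step
```

Production schemes are more sophisticated (per-channel scales, calibration data, 4-bit formats), but the memory saving comes from exactly this swap of wide floats for narrow integers.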


Reinforcement Learning

Reinforcement learning is a process for training AI models to improve their results through feedback on the output provided. This can be used for Grounding, but the feedback can also be collected from human input to improve the results of more general Machine Learning. When I show my cat detector a picture of a dog and it says the picture is 30% likely to be a cat, I need to tell it, “No, this is a dog,” for it to learn what dogs are. Eventually, it will tell me the picture is 90% likely to be a dog, not a cat.
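The feedback loop can be sketched as a simple update rule: each human correction nudges the model’s belief toward the right answer. The starting belief and learning rate below are invented numbers:

```python
p_dog = 0.3            # model's initial belief that this kind of picture is a dog
learning_rate = 0.4    # how strongly each correction moves the belief

def apply_feedback(belief, is_dog):
    """Nudge the estimate toward the human-provided answer."""
    target = 1.0 if is_dog else 0.0
    return belief + learning_rate * (target - belief)

for _ in range(4):     # four rounds of "no, this is a dog"
    p_dog = apply_feedback(p_dog, is_dog=True)

print(round(p_dog, 2))  # about 0.91 after four corrections
```

Real reinforcement learning (and RLHF, its human-feedback cousin used to tune LLMs) involves rewards, policies, and far more machinery, but the core intuition is this: feedback moves the model’s behavior toward what people tell it is right.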

Supervised Learning

A paradigm for teaching a model using examples with known answers. It’s used heavily in image recognition, translation, and predictions to improve accuracy before introducing unseen data. See Inference.


Transformer

A Transformer is a deep learning model that is incredibly useful for NLP. Transformers tokenize the input received, apply the model’s understanding of its meaning and the request, and create output unique to that combination of request, understanding, and type of response required. Transformers can process a series of interactions (a conversation) rather than only a single question, which is why, when we both ask ChatGPT the same question, we may get different responses depending on our prior conversations in that session.

Unsupervised Learning

Unsurprisingly, Unsupervised Learning lets AI find hidden patterns in your data without specific guidance on what’s right or wrong. This allows teams using AI to discover new patterns and correlations in data that may not have been known previously. One day, we may know if a butterfly flapping its wings can cause a storm on the other side of the world.


Validation

In Machine Learning terms, Validation checks how well the model performs during and after training. The model is tested against a new, previously unseen data set to validate that it is learning and not just repeating answers to questions it already knows. It’s a valuable mechanism for detecting whether Overfitting is occurring.
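Here is a deliberately bad model that shows why Validation matters: it memorizes its (invented) training data, scores perfectly on it, then falls apart on examples it has never seen:

```python
# Made-up training data: feature -> correct label
train = {"whiskers": "cat", "bark": "dog", "purr": "cat", "fetch": "dog"}
# Made-up validation data the model has never seen
validation = {"meow": "cat", "growl": "dog", "tail": "dog"}

def memorizer(feature):
    """A model that just looks up its training data; unseen input gets a guess."""
    return train.get(feature, "cat")

def accuracy(dataset):
    return sum(memorizer(f) == label for f, label in dataset.items()) / len(dataset)

print(accuracy(train))       # 1.0: looks perfect on data it has memorized
print(accuracy(validation))  # far lower: it never really learned the concept
```

The gap between the two scores is the tell-tale signature of Overfitting; a model that had genuinely learned would score similarly on both sets.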

Weighting/Weight Initialization

Weight Initialization is the process of setting starting hyperparameters before training the model. Effective weighting can impact the convergence, speed, and efficiency of LLMs. Each LLM will likely have different Weight Initializations, which, along with the training data used, can affect its specialization.
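One well-known scheme is Xavier (Glorot) initialization, sketched below: starting weights are drawn with a spread scaled to the layer’s size, so signals flowing through the network neither explode nor vanish. The layer sizes are arbitrary example values:

```python
import random
import statistics

def xavier_init(fan_in, fan_out, n=10_000, seed=42):
    """Draw n starting weights uniformly from [-limit, limit], where the
    limit shrinks as the layer gets wider (Xavier/Glorot uniform rule)."""
    rng = random.Random(seed)
    limit = (6 / (fan_in + fan_out)) ** 0.5
    return [rng.uniform(-limit, limit) for _ in range(n)]

weights = xavier_init(fan_in=256, fan_out=128)
print(round(statistics.mean(weights), 3))      # centered near 0
print(round(statistics.variance(weights), 4))  # close to 2 / (fan_in + fan_out)
```

The point of the scaling is visible in the variance: it works out to roughly 2 / (fan_in + fan_out), so wider layers start with proportionally smaller weights, which helps training converge.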

Explainable AI (XAI)

XAI provides insight into what has influenced a model’s results, which is essential for understanding and trusting them. It can help explain the reasons for bias, for example, or reveal opportunities to make results more accurate. Transparency of results is a significant requirement for the ethical use of AI. When discussing with app vendors how they use AI in their products and where their data comes from, it is unacceptable and unethical for them to say, “I can’t tell you. It’s intellectual property.”

You Only Look Once (YOLO)

YOLO relates to real-time object detection. It’s an algorithm that identifies objects in a single pass through an image or video frame. It’s commonly used in self-driving cars, security cameras, and robotics. It has absolutely nothing to do with reckless behavior or social media memes.

Zone of Proximal Development (ZPD)

Ah, you thought I wouldn’t have a Z, right? ZPD is an educational concept. Like children progressing through school, ZPD is the fancy name for training a model on progressively more complex tasks to improve its learning ability.

Interested in learning more about how Provar uses intelligent capabilities in its solutions? Connect with us today!