What AI Can Actually Do Today
After a year spent deep in the weeds understanding and programming AI for electronic component sourcing, I’ve recently emerged to witness the onslaught of marketing hype about AI - and boy is there a lot of hype.
Here is a description of what is possible, and how you can actually achieve it with a focus on electronic components.
I’ll approach this by describing the most popular types of AI available today in ascending order of what they can really do. Note that I won’t use the terms Generative AI, GenAI, GAI, or Conversational AI, because these describe a use of AI, not the technology itself.
A few key concepts
ChatGPT is not an AI; it is an interface to an AI. It is created by OpenAI and connects to their Large Language Models (LLMs), GPT3.5 and GPT4.
LLMs are just one form of AI, and Generative Pretrained Transformers (GPT) are just one type of LLM.
Interacting with ChatGPT utilizes tokens. A token, in simple terms, is a unit of measure for the volume of information. 1000 tokens are roughly 750 words.
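If you want to see tokenization in action, OpenAI’s tiktoken library will count the tokens in any text. A minimal sketch:

```python
# pip install tiktoken
import tiktoken

# Load the tokenizer used by the GPT-3.5 family of models.
enc = tiktoken.encoding_for_model("gpt-3.5-turbo")

text = "A 10k ohm resistor limits current in the feedback loop."
tokens = enc.encode(text)
print(len(tokens))  # how many tokens this sentence consumes
```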
Context Window is the maximum number of tokens that can be sent to the LLM in a single prompt.
Knowledge Cutoff Date refers to the last date covered by the LLM’s training data.
LLMs do not have memory. They only seem to have memory because the interface (like ChatGPT) keeps track of the inputs and outputs and passes this history to the LLM each time you send a prompt. This is how LLMs seem to chat, meaning they remember the history of your conversation.
A prompt consists of three sets of text, with a minimal example sketched after this list:
System Instructions, which provide high level guidance to the LLM and are included with every prompt.
User Query is the actual question/instruction input by the user.
Context is optional data that supplements the LLM’s training data.
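Here’s a minimal sketch of how those three pieces map onto an actual OpenAI API call using their Python library; the part data is invented for illustration:

```python
# pip install openai
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from your environment

system_instructions = "You are a sourcing assistant for electronic components. Answer concisely."
context = "Part: ABC123 | Stock: 4,500 | Lead time: 12 weeks"  # hypothetical data you supply
user_query = "How many units of ABC123 do we have on hand?"

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": system_instructions},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {user_query}"},
    ],
)
print(response.choices[0].message.content)
```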
To maintain focus I won’t try to explain the universe of other AIs or LLMs and will concentrate on the models from OpenAI. Let’s dive in.
What ChatGPT can do
Anonymous (no account)
This is the free version of ChatGPT that started it all. It connects to GPT3.5 only, with a 16k Context Window and a Knowledge Cutoff Date of September 2021. Here’s what it’s good for:
· Summarization
· Knowledge Retrieval
· Comparing and Contrasting ideas
· Role Play
· Simulation
· Exam Preparation
And it can do a zillion other things. Many fun things. Google “ChatGPT prompts for Procurement” or something similar to discover ideas for work. It really is fantastic, but…
GPT3.5 is terrible if you want answers that are 100% correct, 100% of the time.
Or if your question requires multiple levels of abstraction, as in: figure out question A, then figure out question B, then use those results to answer question C.
It is also laughably bad at identifying potential sources. This is mentioned a lot but I find it basically useless for this purpose. If you're curious try it yourself by asking it for sources you already know and see if it replicates your knowledge.
On the plus side GPT3.5 is fast, at least relative to GPT4.
It can also make positive contributions to reviewing or formulating supplier agreements, supplier reviews, standard terms, and similar common documents.
GPT3.5 is knowledgeable about standards like ISO 9000 and IPC-A-610, and regulations like FAR, DFARS, and ITAR.
To me, GPT3.5 is good for two things: very straightforward knowledge retrieval, and semantic parsing (more on this later).
With a Free Account
Capabilities are the same as above, plus three features.
1 You can add Custom Instructions, which are essentially modifications to ChatGPT’s System Instructions. Very nice if your use case is repetitive, but it can be a pain for general use.
2 It is more capable of acting as a data analyst, meaning it can generate code. You need to toggle this to On in Settings for best performance.
3 It will archive chats so you can go back to them later.
It also adds Memory, but what this actually means is currently a mystery. The assumption is that it is using some algorithm to decide which of your inputs have lasting significance, but nobody knows what that means in practice.
Probably not a good place to explore inner demons, which reminds me…
IMPORTANT WARNING: CHATGPT FREE VERSIONS MAY RETAIN ANYTHING YOU INPUT AND USE THE INFORMATION TO TRAIN FUTURE MODELS. NEVER INPUT ANYTHING PERSONAL OR PROPRIETARY INTO A FREE VERSION OF CHATGPT.
UPDATE May 13, 2024
OpenAI announced a Spring Update focused on the rollout of GPT-4o, their latest model with improved multi-modal capabilities, better language support, and better reasoning.
GPT-4o also extends the knowledge cutoff date to December 2023. It will be rolled out to Enterprise, Team, API, and ChatGPT+ users first.
Then “over the coming weeks” the free version will be remarkably enhanced with these features:
· GPT-4o access
· Create charts
· Chat about photos you provide
· Upload files (big deal, see comment below)
· Use of Custom GPTs
What ChatGPT+ can do ($20/month)
Now it gets interesting.
There are several important upgrades with ChatGPT+:
1 Option to use GPT4.
2 128k Context Window.
3 Can attach documents.
4 Does NOT retain inputs for future training.
And the Knowledge Cutoff Date is Dec 2023, which is super useful if you want to use ChatGPT to help you code because of the rapid pace of software development.
Let’s take these upgrades in order. GPT4 is much more performant than GPT3.5; it can reason much better.
Believe me, it is so much better it’s not worth discussing. Google the research papers if you want, but for any higher order task there is simply no comparison.
Which brings us to Context Window. 128K is a lot of data, far more than you would ever realistically type. You might paste in this much data, but even better…
You can now directly drop in files. This is a very big advance.
When you add files you are providing context as described above. Now you can base your queries on ChatGPT’s knowledge plus the context you have provided. For example, you might drop in two datasheets and ask for a comparison.
You can drop in thousands of rows of Excel data (CSV is typically more compact) and then execute instructions related to the data.
With the ability to add this much Context, the importance of GPT4’s greater performance is amplified; GPT3.5 would be overwhelmed.
NOTE: We will update once the Spring Update is fully rolled out. At the moment it’s looking like the only improvements ChatGPT+ will offer over the free version are faster performance, more queries per day, and data NOT retained for future training.
What you can do with the OpenAI API
And now it gets serious. This is where the action is.
This will make more sense if we start with how you access it.
I mentioned earlier that ChatGPT is just an interface to the LLMs GPT3.5/GPT4, passing a prompt consisting of System Instructions, User Query, and Context. But how does ChatGPT interface to the LLM?
Via an API. And you can use the very same API.
The Back End
The back end handles the data processing. You will need to write custom software, but don’t worry, anyone can do it!
A few steps are necessary to get started:
1 Choose a programming language. NodeJS and Python are common choices; I chose Python because it is the most common for AI applications.
2 Install Python (don’t use Conda for this).
3 Install Visual Studio (or your IDE of choice).
4 Activate GitHub (extremely helpful but not required).
5 Learn how to create virtual environments.
6 Get your OpenAI API keys.
7 Choose a low/no-code interface (I chose Streamlit).
Can you guess why I’m not giving you more detailed instructions?
That’s right, ChatGPT+ can walk you through every step!
It will literally give you the code, though you will need to become accustomed to describing what you want.
To start, tell it your computer and operating system and it will walk you through installing Python (which can be a real pain). There are tons of YouTube walkthroughs on each of these steps as well.
WARNING: BEWARE LANGCHAIN. IF YOU START SEARCHING THESE TOPICS YOU WILL BE ENTICED TOWARD LANGCHAIN. DON’T DO IT!
Many of us who started this AI journey in February/March of 2023 were seduced by the siren call of LangChain, and it almost killed us. Surely LangChain will improve, but if you choose to put your sanity in the hands of LangChain, you have been warned!
Now you are in the domain of creating an AI software application. Your mission is clear:
1 Accept a query.
2 Understand its intent.
3 Add system instructions.
4 Add context.
5 Send to LLM.
6 Manage the return from the LLM.
Query/Intent: The fancy term for this is Semantic Routing. In your first efforts you can skip this, but it quickly becomes important.
I use the GPT3.5 API for this because it is very fast and capable of the task. Use ChatGPT to get specific instructions; the details beyond this become quite boring.
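As a sketch of what this can look like (the route names here are hypothetical; define whatever intents your app supports), a single GPT3.5 call can classify the intent:

```python
from openai import OpenAI

client = OpenAI()

# Hypothetical route names for a component-sourcing app.
ROUTES = ["price_lookup", "datasheet_question", "inventory_check", "general"]

def route_query(query: str) -> str:
    """Classify the User Query into exactly one known route using GPT3.5."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        temperature=0,  # deterministic classification
        messages=[
            {"role": "system",
             "content": "Classify the query as exactly one of: "
                        + ", ".join(ROUTES) + ". Reply with the label only."},
            {"role": "user", "content": query},
        ],
    )
    label = response.choices[0].message.content.strip()
    return label if label in ROUTES else "general"  # fall back on anything unexpected

print(route_query("What's the price for 1,000 pcs of ABC123?"))  # -> price_lookup
```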
System Instructions: Based on the Semantic Routing you will apply System Instructions appropriate to the task. Also too detailed for this article, ask ChatGPT for guidance.
Context: You will provide context data relevant to the User Query. This is a HUGE subject and critical to getting a successful response so we will examine this in more detail.
About Context
Why is context so important?
Because if information is not already in the LLM’s training data and you don’t supply it with the information it needs to answer a query, it will fail, simple as that.
Think of real-time inventory levels as an example. All proprietary data is, by definition, not part of the LLM’s training data, so you must provide it via Context.
Before we dive into managing Context let’s understand cost. The OpenAI API is a paid feature, current costs (as of 5-13-24) are:
$10.00 per million tokens for GPT-4 Turbo
$5.00 per million tokens for GPT-4o
$0.50 per million tokens for GPT-3.5
These prices are for input tokens; the tokens returned (output) cost 3x.
While this is amazingly inexpensive relative to the value, it still adds up if you’re not careful.
If you take full advantage of the 128k context window for GPT-4o, each query will cost $0.64 ($5 / 1 million x 128k).
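The arithmetic is simple enough to sanity-check in a couple of lines:

```python
def input_cost(tokens: int, usd_per_million: float) -> float:
    """Cost of the input tokens for a single prompt."""
    return tokens / 1_000_000 * usd_per_million

print(input_cost(128_000, 5.00))  # full GPT-4o context window -> 0.64
print(input_cost(16_000, 0.50))   # full GPT-3.5 context window -> 0.008
```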
There are three key elements to understanding Context:
1 Loading & Splitting
2 Indexing
3 Retrieval
Loading & Splitting
When it comes to managing Context data the predominant method is a vector database. A vector is a mathematical representation of data.
Before you convert data into vectors you have to do two things: load it and split it.
Let’s use a PDF datasheet as an example. Say you want to be able to ask questions about a datasheet. First you get the URL and download the datasheet.
Then you need a loader that reads the PDF and converts it to a machine-readable text format. This is pretty straightforward.
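A minimal loader sketch using the pypdf library (the URL is a placeholder):

```python
# pip install pypdf requests
from io import BytesIO

import requests
from pypdf import PdfReader

url = "https://example.com/abc123-datasheet.pdf"  # placeholder URL
pdf_bytes = BytesIO(requests.get(url, timeout=30).content)

# Extract machine-readable text, one string per page.
reader = PdfReader(pdf_bytes)
pages = [page.extract_text() for page in reader.pages]
print(f"Loaded {len(pages)} pages of text")
```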
Now, because you need to respect the Context Window, you will split the data into smaller pieces. A header and metadata are added to these smaller pieces to create Documents.
I know, this is confusing: your data and documents (usually text, PDF, Excel, or CSV) are used to create smaller Documents, but that’s how it is.
The strategy you use to split the data into Documents is very important. The most common method is “chunking”, which usually means arbitrarily splitting the data every X number of tokens. You get to pick the value of X, which will make more sense in a moment.
Chunking arbitrarily like this can cause the text to lose meaning, so you can also choose to overlap each chunk by Y tokens. For example, you might split every 200 tokens with a 15-token overlap, meaning the first 15 tokens of each chunk are the last 15 tokens of the prior chunk.
Or you might not chunk at all.
A datasheet, for example, is better split by page than by arbitrary chunk.
Other common strategies are splitting by sentences or even paragraphs, both with the goal of preserving meaning.
Or structured data like spreadsheets or tables might be split by some change in a cell value.
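Here’s a minimal sketch of the token-based chunking with overlap described above, using tiktoken to count tokens:

```python
import tiktoken

enc = tiktoken.encoding_for_model("gpt-3.5-turbo")

def chunk_text(text: str, chunk_size: int = 200, overlap: int = 15) -> list[str]:
    """Split text every `chunk_size` tokens, repeating the last
    `overlap` tokens of each chunk at the start of the next."""
    tokens = enc.encode(text)
    step = chunk_size - overlap
    return [enc.decode(tokens[i:i + chunk_size]) for i in range(0, len(tokens), step)]

chunks = chunk_text("...full datasheet text here...")
```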
Indexing
Now that you have a set of split up Documents you can store them in a vector database.
There are many to choose from: Chroma is open source and widely supported, Elasticsearch is perhaps the most widely used (especially for big data), and Pinecone is quite popular for fully cloud-based deployments.
I use Deeplake because I find it faster with better options for our use case.
The vector database is commonly referred to as an index, and the process of getting data into the index is called indexing. I will use these terms going forward.
The data stored in your index may be dynamic or static. For example, datasheets are static, real-time pricing is dynamic.
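As a sketch, here’s what indexing your split-up Documents looks like in Chroma (the ids and metadata are illustrative):

```python
# pip install chromadb
import chromadb

client = chromadb.Client()  # in-memory; use chromadb.PersistentClient(path=...) to keep the index on disk
collection = client.create_collection("datasheets")

# `chunks` is the list of split-up Document texts from the previous step.
chunks = ["Absolute maximum ratings: ...", "Electrical characteristics: ..."]

collection.add(
    ids=[f"abc123-{i}" for i in range(len(chunks))],  # unique id per Document
    documents=chunks,                                 # Chroma embeds these for you by default
    metadatas=[{"mpn": "ABC123", "chunk": i} for i in range(len(chunks))],
)
```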
If you are using a Semantic Router it will direct what context data is supplied, and if it is not currently in your index or the data is dynamic you will need to fetch the data and then add it to your index.
Now that the necessary Context data is in the index, how do you get it out so you can add it to the prompt?
Retrieval
You use a Retriever.
A Retriever converts the User Query into vectors (using embeddings) and finds Documents that contain vectors most similar to the User Query vectors.
The Retriever then returns K number of the most similar Documents. These Documents are then included in the prompt passed to the LLM as Context.
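Continuing the Chroma sketch from the indexing step (it assumes `collection` already holds your Documents), retrieval is a similarity query, and the result becomes the Context in your prompt:

```python
K = 3  # how many of the most similar Documents to return

results = collection.query(
    query_texts=["What is the maximum operating temperature?"],  # the User Query
    n_results=K,
)
retrieved = results["documents"][0]  # the K most similar Document texts

# Joined together, these become the Context portion of the prompt.
context = "\n\n".join(retrieved)
```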
This process is fraught with pitfalls. In order for the LLM to correctly answer User Queries, the correct Context must be selected by the Retriever.
But you can’t allow too much Context data because:
1 You must respect the Context Window.
2 Sending more Context data can be expensive.
3 More Context slows down response time.
4 Too much Context data can confuse the LLM.
Remember, the Retriever returns K Documents; you choose the value of K, and you set the size of each Document with your splitting strategy.
So you will need to balance the number of Documents against the size of each Document.
There are basic guidelines, but ultimately this is a matter of trial and error to find the optimal balance for your use case. You may also want to vary these values based on guidance from the Semantic Router.
This strategy for handling proprietary data is called Retrieval Augmented Generation, or RAG.
While RAG is the most popular, it is not the only approach to selecting Context. I’ll have to leave it to another article to describe alternatives to RAG.
The importance of data enrichment
While your use case may require only your data, enriching your internal Context data with external data greatly expands what is possible.
External data is easily accessible via APIs.
For electronic components, we can fetch real time price and stock directly from distributor APIs.
If you do this across multiple distributors you will quickly run into data challenges: terminology and conventions vary, and you will have to reconcile these differences.
A simpler method is to use the API from an aggregator like Octopart, which has already reconciled these differences (or at least tried).
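The pattern looks roughly like this; the endpoint, parameters, and response shape below are hypothetical, so check your provider’s actual API documentation:

```python
import requests

API_URL = "https://api.example-aggregator.com/v1/parts"  # hypothetical endpoint
API_KEY = "your-api-key"

def fetch_offers(mpn: str) -> dict:
    """Fetch real-time price and stock offers for a manufacturer part number."""
    response = requests.get(
        API_URL,
        params={"mpn": mpn},
        headers={"Authorization": f"Bearer {API_KEY}"},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()  # price/stock data, ready to add to your index

offers = fetch_offers("ABC123")
```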
For more detailed component data like parametrics, compliance, import/export, and alternates you need to use the API of Silicon Expert, Z2Data, or Accuris (formerly IHS Markit). These can be expensive, and you need to check the terms of use carefully.
Just as above, ChatGPT can guide you every step of the way from connecting to their APIs to parsing the data that comes back.
To review, the basic blocks of the back end are: the user query; a semantic router if necessary; system instructions; indexing of proprietary data and any enrichments from 3rd-party APIs; retrieval; and sending the combined prompt to the LLM. You will get a response back from the LLM, most likely as JSON (there are options), which needs to be parsed a bit and passed to your front end.
The Front End
The front end is the user interface. I am most familiar with Python as the back end, so these front end suggestions are geared to Python.
With Python the simplest front end is Flask combined with HTML. ChatGPT can set this up for you very easily. Now you will have your own application running on just your machine.
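A minimal sketch of that Flask front end; `answer_query` is a stand-in for the back end you built in the previous sections:

```python
# pip install flask
from flask import Flask, request, render_template_string

app = Flask(__name__)

PAGE = """
<form method="post">
  <input name="query" placeholder="Ask about a part..." size="60">
  <button>Send</button>
</form>
<pre>{{ answer }}</pre>
"""

def answer_query(query: str) -> str:
    return "..."  # call your back end: route, retrieve Context, send the prompt to the LLM

@app.route("/", methods=["GET", "POST"])
def index():
    answer = answer_query(request.form["query"]) if request.method == "POST" else ""
    return render_template_string(PAGE, answer=answer)

if __name__ == "__main__":
    app.run(debug=True)  # runs on just your machine
```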
If you want a more versatile and capable interface (you likely will) then there are many no-code options.
Streamlit is very popular, Bubble is also popular, and there are too many others to mention.
You’re better off going with an older more established no-code platform because how to use it will be in ChatGPT’s training data and this is really helpful.
If you intend to make your application accessible via the web be sure to understand that process when you select your front end application. ChatGPT will help, just ask questions like “how to deploy a Bubble application on [your hosting provider]”.
So now you’re ready to chat with an LLM that also knows the proprietary and enriched data you have provided it, what shall you talk about?
Use Cases
#1: Knowledge Retrieval
This is the base application.
At its simplest you might ask for any attribute of a component, like country of origin, RoHS status, or Lifecycle Status.
Or your internal part number, or your inventory level; the list of possibilities is long and depends entirely on what you provided via Context.
A bit more complicated is price lookup at distributors; this must take into consideration a host of factors like quantity, packaging, mins, mults, and more. This is more or less the boundary of what LLMs are capable of right now, so you need to be careful here.
You can also do knowledge retrieval on lists; for our case, let’s use any list with manufacturer part numbers.
It might be a BOM, a buy-action list, or a shortage list.
One interesting case is obsolescence notices: you can use your new app to retrieve the stock levels at every distributor for every newly obsolete part.
A common use of knowledge retrieval on lists is costing a BOM and BOM Health checks.
These are powerful capabilities largely because they allow you to query a set of data using natural language. You can access almost any data point you want in seconds by simply saying what you want.
Amazing.
#2: Data Transformation
As powerful as knowledge retrieval can be, the real time saver for procurement right now is data transformation. Not very sexy sounding, but bear with me.
Applications like the ones we are discussing cannot really “analyze vast amounts of data” or “discover hidden insights”, despite claims of this nature. That is at best hype, and at worst misleading.
The main thing we can do with OpenAI API applications right now is save time.
The best way to save time is by having AI execute repetitive tasks. And one of the most time-consuming, boring, and non-value-added tasks is moving data between software applications.
It turns out AI is fantastic at this.
First let’s consider internal systems.
One survey found typical electronics manufacturing operations use 10 software applications. Examples: ERP (Oracle, SAP), CRM (Salesforce), QMS, MES (Aegis), PLM (Agile, Arena), CAD (Siemens/Mentor, Cadence, Altium), Data Visualization (Tableau, Domo), Customer Portals, Help Desk (Zendesk), and don’t forget Excel.
Let’s imagine just one scenario: a new PCBA is ready for a prototype build, and the initial files come from Altium.
Your OpenAI App can read the Altium BOM and enrich it with component data including prices, stock, and almost any attribute.
This new BOM may have part numbers that are incorrect, out of stock, even obsolete. These items are instantly visible and can be sent to Tableau or a Customer Portal (if you’re an EMS) for resolution.
When ready, the enriched BOM can be transformed into the format necessary to create items and BOMs in the ERP.
This scenario would take about 9 seconds to execute.
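A sketch of that final transformation step with pandas; the column names on both sides are hypothetical, so map them to your actual Altium export and ERP import template:

```python
# pip install pandas
import pandas as pd

bom = pd.read_csv("altium_bom.csv")  # the enriched BOM from the previous steps

# Rename/reshape to the ERP's import template (hypothetical column names).
erp_import = pd.DataFrame({
    "ITEM_NUMBER": bom["Part Number"],
    "MPN": bom["Manufacturer Part Number"],
    "MANUFACTURER": bom["Manufacturer"],
    "QTY_PER_ASSY": bom["Quantity"],
    "REF_DES": bom["Designator"],
})
erp_import.to_csv("erp_item_bom_import.csv", index=False)
```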
All these links between systems can be created in natural language by just about anyone, so no need to fight for IT resources.
Now let’s consider passing data to external services.
Perhaps you need to estimate this new BOM including production pricing and labor costing, and maybe you need to find stock to build the prototype.
If you do a lot of this you might use RFQ software like Luminovo or CalcuQuote. Your AI app can automatically transform the enriched BOM into the formats most acceptable to either platform.
Or perhaps you need to quote distributors yourself. In this case your AI app can determine who is franchised for each line item, and then format a quote request in each distributor’s preferred format for their BOM tool. No more column matching.
The enriched BOM can also be transformed into the data set necessary to create the BOM in tools like Agile, including full links to datasheets, or even the datasheet itself, and any attributes you might want to track in Agile.
#3: Democratization
While not exactly a use case, democratization is a major benefit. This means more people can do more things.
With a well-designed OpenAI API application anyone can access the information most helpful to their individual workflow.
They don’t need a SQL specialist to create a report; they can create it themselves, designing and refining on the fly.
This is more efficient, more useful, and frees up the SQL specialist to focus on high priority tasks where their expertise is better deployed.
But perhaps the benefits of knowledge retrieval and data transformation are not enough, you can imagine more.
What Custom LLMs can do
And now let’s talk about the big boys.
If you do hunger for AI that consumes “vast amounts of data” and “uncovers hidden insights”, this is the AI for you.
This is the AI most of the consulting whitepapers that make such sweeping claims are talking about.
The most basic and accessible form of custom LLM is called ‘fine-tuning’.
Generally, fine-tuning is best used to improve instruction following by providing thousands of question and answer pairs to an already trained LLM.
You will access your fine-tuned LLM via the API, it will have a special name like gpt-3.5-yourname. It will also cost more per token.
A fully custom trained LLM means that instead of using data scraped from the web like GPT3.5/4, you will provide the data to train it.
For very specialized tasks you might train a model with relatively little data and it will work fine.
For larger data sets the benefits are speed and performance.
And if you use an open source LLM it will be free once it’s trained.
This is a very specialized subject requiring very specialized technical expertise, and it is beyond what we can cover here; we include it just to provide some perspective.
What should you do right now?
Do you know anyone who is confused by a computer mouse? Probably not. But there was a time when people had to learn how to interface with a computer via a mouse.
The mouse is just one of many Human Interface Devices (HIDs); others include the trackball, touchpad, and pen, to name just a few.
The newest HID is the most powerful ever developed, it’s called a prompt. And if you don’t learn to interface with computers via a prompt you will soon be the modern equivalent of someone who doesn’t know how to use a mouse.
What to do as an individual – have fun!
The number one thing everyone should be doing right now is using ChatGPT. Or Bard, or Perplexity, anything that uses a prompt.
The number two thing everyone should be doing is getting better at using prompts!
What to do as Management – have faith!
If you believe people are “the company’s greatest asset” then for heaven’s sake spend $20 on them and get them access to ChatGPT+.
If you’re concerned about security then use it for training, or simulations, or contests, anything to get your people comfortable with how to use it and expose them to the vast applications. [Note: once the Spring Update kicks in maybe you won’t even need to spend $20]
As the team becomes more and more comfortable start looking for redundant tasks that can be handled by AI. Moving data between tools or faster lookup of information are two great places to start.
Remember you are not looking at this point to re-imagine all your workflows. You are looking at your existing workflows for repetitive low value-added tasks that can be off loaded to an AI assistant so knowledge workers can focus on high value-added objectives.
Also consider paying attention to companies, individuals, and websites on the leading edge of adopting this technology: Luminovo, CircuitMind, IBS Electronics (including the podcast of their CEO Rob Tavi), and EMSNOW, which frequently publishes video interviews with AI leaders.
This article has skimmed over a lot of ground, if you would like advice on how to implement yourself or want to brainstorm what is possible please reach out to us.