In the search engine era, we have always used “keywords” to look for information. Search engines such as Google, Bing, and DuckDuckGo organize information so that keyword matching is handled by clever algorithms, and we typically review at least the first two or three links in the results. ChatGPT, however, has completely shifted how we search for information: it gives direct answers to users’ questions. In terms of searching, we have moved from “using keywords” to “asking precise questions.” OpenAI, the company behind ChatGPT, also provides Application Programming Interfaces (APIs) that can be used to build ChatGPT-like interfaces on top of your proprietary data. This blog explains how to build a ChatGPT-like assistive search tool using the data you hold.
Why is it important to create a ChatGPT-like system?
Motivated by shifting customer behavior and new technological developments, many organizations across the globe have implemented GenAI-powered assistive search alongside traditional lexical (keyword-based) search. The table below compares the two search paradigms, and a short sketch after the table illustrates the difference between keyword matching and semantic matching.
| Characteristics | Lexical Search | GenAI Assistive Search |
|---|---|---|
| Knowledge discovery | Keywords | Prompts (questions) |
| Context required | No | Yes |
| Response time | Milliseconds | 1–5 seconds |
| Matching algorithm | Keyword matches | Semantic matches |
| Autocomplete keyword | Yes | No |
| Response | Articles that contain the “keyword” | Exact response to the prompt (question) |
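To make the “keyword matches” versus “semantic matches” rows concrete, here is a minimal sketch in Python. It assumes the `openai` Python package (v1.x), an `OPENAI_API_KEY` environment variable, and OpenAI’s `text-embedding-3-small` embedding model; the documents and query are made-up examples, not part of any real knowledge base.

```python
# Minimal sketch: keyword matching vs. semantic matching over a tiny document set.
# Assumes the `openai` Python package (v1.x) and OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

documents = [
    "How to reset your account password",
    "Steps to configure single sign-on for employees",
    "Quarterly travel reimbursement policy",
]
query = "I forgot my login credentials"

# Lexical search: a plain keyword match finds nothing, because the query
# and the relevant article share no common keywords.
query_words = set(query.lower().split())
keyword_hits = [d for d in documents if query_words & set(d.lower().split())]
print("Keyword hits:", keyword_hits)  # -> []

# Semantic search: embed the query and documents, then rank by cosine similarity.
def embed(texts):
    response = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return [item.embedding for item in response.data]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / ((sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5))

doc_vectors = embed(documents)
query_vector = embed([query])[0]
best = max(zip(documents, doc_vectors), key=lambda dv: cosine(query_vector, dv[1]))
print("Best semantic match:", best[0])  # expected: the password-reset article
```

In practice the document embeddings would be computed once and stored in a vector index rather than re-embedded on every query.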
Building your own GenAI assistive search tool has many advantages over using OpenAI’s ChatGPT interface. ChatGPT is built on a Large Language Model (LLM), which requires a large corpus of text data, significant time, and substantial compute resources to train. The model behind ChatGPT is trained on data only up to a fixed cutoff (April 2023 at the time of writing), so it cannot generate responses about events after that date. Access to the more advanced GPT-4 model requires a paid subscription.
ChatGPT is hosted in the US region, so all conversations in the ChatGPT interface are stored on US servers. OpenAI uses these chats to improve its underlying Large Language Models, although users can opt out. There is always a risk of data leakage if any of your employees shares confidential information in the ChatGPT interface. Many organizations around the world have banned the use of ChatGPT within their security perimeters in order to keep their tacit knowledge in-house in a secured knowledge repository. OpenAI is now SOC 2 compliant, and you can execute a Data Processing Agreement with it to protect your privacy.
Given that ChatGPT is open for anyone to use, access to information cannot be limited based on users’ permissions and roles. Moreover, ChatGPT’s behavior cannot be customized: for example, you cannot make it adopt a particular tone or behave in a specific way for users in your organization. ChatGPT also collects user feedback on generated responses, which is used to train its underlying LLM.
ChatGPT does not offer any analytics to its users or to organizations. Yet the types of questions asked, the responses generated, and user feedback help in understanding user behavior, and this provides a wealth of information to organizations. To overcome these limitations, organizations can build their own ChatGPT-like assistive search tools or chatbots using OpenAI APIs.
Take a look at our video: The Impact of GenAI on Search Experience
Benefits of GenAI Assistive search
Organizations can use the Retrieval Augmented Generation (RAG) framework to build their own GenAI search engine or chatbot. RAG retrieves the most relevant passages from your knowledge base and passes them to an LLM as context for generating the answer, which helps overcome the limitations of general-purpose ChatGPT while reaping the benefits of GenAI assistive search.
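As a rough illustration of the RAG flow (retrieve, augment, generate), here is a minimal Python sketch. It again assumes the `openai` package (v1.x) and an `OPENAI_API_KEY`; the `knowledge_base` list, the question, and the `gpt-4o-mini` model choice are placeholder assumptions, and a production system would use a proper vector store and document chunking.

```python
# Minimal RAG sketch: retrieve from a (stand-in) private knowledge base, then generate.
# Assumes the `openai` Python package (v1.x) and OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

# Stand-in for your private knowledge repository.
knowledge_base = [
    "Employees can reset their password from the self-service portal.",
    "Support tickets on the enterprise plan are answered within one business day.",
    "The VPN must be enabled before accessing the internal wiki.",
]

def embed(texts):
    response = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return [item.embedding for item in response.data]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / ((sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5))

def answer(question):
    # Retrieve: pick the knowledge-base passage most similar to the question.
    doc_vectors = embed(knowledge_base)
    question_vector = embed([question])[0]
    context = max(zip(knowledge_base, doc_vectors),
                  key=lambda dv: cosine(question_vector, dv[1]))[0]

    # Augment and generate: the model is instructed to answer only from the context.
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": "Answer only from the provided context. "
                        "If the answer is not in the context, say you do not know."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return response.choices[0].message.content

print(answer("How do I reset my password?"))
```

Because the model answers only from the retrieved context, updating the knowledge base immediately changes what the assistant can say, without retraining any model.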
Private knowledge base
Organizations can point their ChatGPT-like assistant to a private knowledge base or organizational knowledge repository so that it only uses the information present there to generate accurate responses.
Content updates
Once content is updated, your ChatGPT-like assistive search tool can pick it up instantly to produce timely responses.
Access control
Users in the organization can be restricted from accessing certain information; a ChatGPT-like assistive search tool might respond with something like, “I am sorry, you do not have access to that information.” Role-based access control over the knowledge base prevents information leakage and helps protect confidential information, as the sketch below illustrates.
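One way to enforce such restrictions is to filter the knowledge base by the user’s role before retrieval, so restricted passages never reach the prompt. The sketch below is a simplified illustration: the roles, passages, and `allowed_roles` metadata are hypothetical, and the naive word-overlap filter stands in for the semantic ranking shown earlier.

```python
# Minimal sketch of role-based access control applied before retrieval.
# The roles, passages, and allowed_roles metadata below are hypothetical examples.

knowledge_base = [
    {"text": "Salary bands per grade and region.", "allowed_roles": {"hr", "finance"}},
    {"text": "Holiday calendar for the current year.", "allowed_roles": {"employee", "hr", "finance"}},
]

def build_context(question, user_role):
    # Drop restricted passages first, so they can never be injected into the prompt.
    visible = [doc["text"] for doc in knowledge_base if user_role in doc["allowed_roles"]]
    # A real system would rank `visible` by semantic similarity to the question;
    # a naive word overlap stands in for that here.
    question_words = set(question.lower().split())
    relevant = [text for text in visible if question_words & set(text.lower().split())]
    return "\n".join(relevant) or None

for role in ("finance", "employee"):
    context = build_context("salary bands per grade", user_role=role)
    if context is None:
        print(f"{role}: I am sorry, you do not have access to that information.")
    else:
        print(f"{role}: context passed to the LLM -> {context}")
```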
Data security and privacy
Data can be held on a private server within your organization’s security perimeter to protect your confidential knowledge.
Data Analytics
All prompts (questions) entered into a ChatGPT-like assistive search tool or chatbot can be stored in the backend for further processing. Once analyzed, they help identify gaps in knowledge content and improve important knowledge base articles.

How does GenAI search differ from ChatGPT?
GenAI assistive search built on top of your own knowledge base is very different from the general-purpose ChatGPT tool. The following table describes how ChatGPT differs from a GenAI assistive search built on your public or private knowledge repository.
| Characteristics | ChatGPT | Build your own GenAI Assistive Search |
|---|---|---|
| Underlying data | Whole internet | Your private data |
| Access control | Not possible to limit access to information based on user roles | Easy to limit access to information based on user roles |
| Behavior customization | Not possible | Possible |
| Data privacy and security | Data stored on US servers can be used for training to improve the model | Data stored on your own server |
| Analytics | No analytics provided on prompts/questions | Analytics provide insights to address knowledge gaps and improve content quality |
| Customizable | No | Yes |
| Provide feedback on generated responses | Yes | Yes |
| Content updates reflected in generated responses | No | Yes |
A GenAI assistive search tool can be built using APIs from OpenAI, the company that introduced ChatGPT. Apart from OpenAI APIs, organizations can choose to host an open-source LLM on their own private servers and use their proprietary data to fine-tune or augment it. Open-source models such as Meta’s Llama or Mistral can be used, and Hugging Face maintains a catalog of models suitable for a wide variety of use cases. Hosting open-source models in your own cloud infrastructure increases the cost of implementation but gives you more flexibility in terms of privacy.
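For organizations going the self-hosted route, a minimal sketch using the Hugging Face `transformers` library is shown below. The `mistralai/Mistral-7B-Instruct-v0.2` checkpoint is just one example of an openly available instruction-tuned model; running it assumes you have accepted its license on Hugging Face, installed `transformers`, `torch`, and `accelerate`, and have enough GPU memory available.

```python
# Minimal sketch: answering from your own context with a self-hosted open-source LLM.
# Assumes `transformers`, `torch`, and `accelerate` are installed and sufficient GPU
# memory is available; the model name below is an example checkpoint, not a requirement.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="mistralai/Mistral-7B-Instruct-v0.2",
    device_map="auto",  # place the weights on available hardware automatically
)

# The retrieved context would normally come from your knowledge base
# (see the RAG sketch above); it is hard-coded here for illustration.
prompt = (
    "[INST] Answer only from the context below.\n"
    "Context: Employees can reset their password from the self-service portal.\n"
    "Question: How do I reset my password? [/INST]"
)

result = generator(prompt, max_new_tokens=128, do_sample=False)
print(result[0]["generated_text"])
```

The generation step is the only piece that changes compared with the OpenAI-based sketches; retrieval and access control work the same way, and no data ever leaves your infrastructure.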
Also read: LLM Agents Next Big Wave in Knowledge Management
Conclusion
Building your own ChatGPT-like assistive search tool or chatbot can be done relatively easily using the RAG framework and third-party APIs. Organizations that prioritize data security and privacy are encouraged to use open-source LLMs so that their corporate data never leaves their security perimeter while they adopt new technologies faster. Organizations that can afford to take some risk can use ChatGPT APIs to build new tools quickly and enhance the customer experience. Knowledge discovery is shaping how we use information and empowers us to turn newly discovered information into more business value.
An intuitive knowledge base software to easily add your content and integrate it with any application. Give Document360 a try!