Refactoring content for GenAI readiness: Best Practices and Guidelines

Category: AI

Last updated on Jun 14, 2024

Refactoring content is a must, given the proliferation of GenAI tools in the market. Most GenAI vendors have scraped the internet to train their Large Language Models (LLMs). If your knowledge base is public, there is a high chance that GenAI vendors have already used its content. There is also a high chance that your customers are turning to tools such as ChatGPT, Claude, and so on to look for answers to their questions.

If you have implemented a GenAI-powered search engine on top of your knowledge base, it is high time you refactored the content to make it suitable for consumption by GenAI-based agents. This refactoring must take into account the different characteristics of human readers and GenAI-based agents.



| | Human reader | GenAI-based agents |
| Attention span | Short – Minimalism | Long – Extensive content |
| Media artefacts | | Multi-modal models can understand text, images, and audio |
| | Content caters to specific customer persona | |
| Content style | | |

This blog provides practical guidelines for undertaking content audits so that content can be refactored to suit GenAI-based agents. It also covers tips on balancing the characteristics of human readers and GenAI-based agents. Before you undertake the content audit, understand its scope holistically. This helps technical writers, information architects, and other stakeholders focus their efforts. Any public-facing knowledge base should be audited first, given its importance in keeping customers satisfied with your product and services.

Top 5 Guidelines for Refactoring Content for GenAI

Rule 1: Content hierarchy

A sound content hierarchy signals that content is well-researched and well-written, with readability and comprehensibility in mind. The hierarchy matters to GenAI-based agents because it gives them the holistic perspective and, more importantly, shows how the sections of the content are interrelated and how information flows from top to bottom. During the content audit, check whether the content is structured and all information is presented under heading levels H1 to H6 used in order. Technical writers must focus on the parts of the documentation where this semantic rule is not followed. Structuring content according to the content hierarchy helps not only GenAI-based agents but also human readers and the content-scraping bots of search engines. In addition to fixing the content hierarchy, technical writers must check whether the article title and other metadata associated with the article are appropriate and match the article content itself. Information architects can help with restructuring the whole content.

Figure: A page showing content organized in the right hierarchy.
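As a rough illustration of the heading check described in this rule, the sketch below scans an article's HTML and reports skipped heading levels. It assumes articles are available as HTML strings; the names `HeadingCollector` and `audit_headings` are hypothetical, and the three checks (first heading is H1, only one H1, no skipped levels) are one reasonable reading of the rule rather than a fixed standard.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Collects heading levels (1-6) in document order."""

    def __init__(self):
        super().__init__()
        self.levels = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag names, so h1..h6 is enough to match.
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self.levels.append(int(tag[1]))


def audit_headings(html):
    """Return a list of hierarchy problems found in `html` (empty if none)."""
    collector = HeadingCollector()
    collector.feed(html)
    levels = collector.levels
    if not levels:
        return ["no headings found"]
    problems = []
    if levels[0] != 1:
        problems.append(f"first heading is h{levels[0]}, expected h1")
    if levels.count(1) > 1:
        problems.append("more than one h1")
    # Going deeper by more than one level at a time skips a heading level.
    for prev, cur in zip(levels, levels[1:]):
        if cur > prev + 1:
            problems.append(f"h{prev} is followed by h{cur} (skipped level)")
    return problems


sample = "<h1>Setup</h1><h3>Install</h3><h2>Configure</h2>"
print(audit_headings(sample))
```

Running this over every article in the audit scope gives a quick list of pages whose structure needs attention before any rewriting starts.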


Rule 2: Content length

GenAI-based agents such as assistive search and chatbots need more textual data to understand the context better, and this enhances their ability to answer a wider range of questions from human readers. Minimalistic content does not suit the characteristics of GenAI. The content should be revised so that more detail is added. Explaining even simple concepts elaborately helps GenAI understand the semantic structure of your knowledge base content and build domain expertise from it.


For example, Airtable’s getting started guide is very elaborate and takes 18 minutes to read.


The following guidelines can be used for adding more content:

  • Procedural content should be presented as sequential steps, in more detail
  • Tutorial content should be as elaborate as possible
  • User guides should be very precise
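To find articles that are candidates for enrichment, a content audit can start from simple length statistics. The sketch below is only an illustration: the 200 words-per-minute reading speed and the 300-word threshold are assumptions to tune for your own knowledge base, and the function names are hypothetical.

```python
import math
import re

WORDS_PER_MINUTE = 200  # assumed average reading speed
MIN_WORDS = 300         # assumed threshold below which an article counts as thin


def word_count(text):
    """Count word-like tokens in plain text."""
    return len(re.findall(r"\b\w+\b", text))


def reading_minutes(text):
    """Estimate reading time in whole minutes, rounded up."""
    return math.ceil(word_count(text) / WORDS_PER_MINUTE)


def thin_articles(articles, min_words=MIN_WORDS):
    """Return titles of articles shorter than `min_words`, shortest first."""
    counts = {title: word_count(body) for title, body in articles.items()}
    thin = [title for title, n in counts.items() if n < min_words]
    return sorted(thin, key=counts.get)


articles = {
    "Refund policy": "Refunds are issued within 14 days. " * 10,
    "Getting started": "word " * 3600,
}
print(thin_articles(articles))
print(reading_minutes(articles["Getting started"]))
```

Word count alone says nothing about quality, so the resulting list is a starting point for review, not a verdict.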

FAQs (Frequently Asked Questions) should be appended to the end of each article. This helps GenAI-based agents understand the nuances in the article content. Some of the FAQs should cover information that is not readily available in the article itself. For example, if the article is about the refund policy, one of the FAQs could cover scenarios where the refund policy does not apply, since this information might not be present in the article body. Good sources for FAQs include:

  • Customer support team – where frequent support ticket queries can be sourced
  • Customer success team – where frequent customer interactions on specific product-related / service-oriented queries can be obtained
  • Sales team – where frequent questions from new prospects can be gathered

After the content audit, identify a set of knowledge base articles whose content should be enriched, and add a few FAQs to each of them.



Rule 3: Content freshness

To provide accurate and current information, the underlying article content should be updated regularly. Your article metadata must therefore contain the “last modified” / “last updated” date. This helps GenAI-based agents assess the validity of the article content when responding to customers’ questions. During the content audit, prepare a list of articles with their creation and last updated dates to assess the content freshness of your knowledge base. Based on its last updated date, each article can be placed into a group that reflects how fresh its content is. After the content audit, content can be prioritized for update based on time criticality, content quality, and the content’s importance in the knowledge base.
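The grouping step can be sketched as follows. This is a minimal example that assumes each article's last updated date is already available as a `date`; the bucket names and the 90-day and 365-day limits are illustrative assumptions, not recommended values.

```python
from datetime import date

# Assumed age limits in days; both the limits and the labels are illustrative.
BUCKETS = [(90, "fresh"), (365, "aging")]


def freshness(last_updated, today):
    """Label an article by the number of days since its last update."""
    age_days = (today - last_updated).days
    for limit, label in BUCKETS:
        if age_days <= limit:
            return label
    return "stale"


def group_by_freshness(articles, today):
    """Map each freshness label to the list of article titles in that group."""
    groups = {"fresh": [], "aging": [], "stale": []}
    for title, last_updated in articles.items():
        groups[freshness(last_updated, today)].append(title)
    return groups


articles = {
    "Getting started": date(2024, 5, 20),
    "Refund policy": date(2023, 11, 2),
    "Legacy API": date(2021, 3, 15),
}
print(group_by_freshness(articles, today=date(2024, 6, 14)))
```

Passing `today` explicitly keeps the audit reproducible: the same article list always yields the same groups for a given audit date.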

Rule 4: Business terms

A business glossary plays an important role in eliminating ambiguity in the business terms that are used across your knowledge base content. The content audit must include building a catalog of all business terms used in the knowledge base. With this catalog, you can spot terms that are used synonymously across knowledge base articles even though their definitions differ. It is crucial to pass a business glossary to your GenAI-based agent so that it generates accurate responses, and the same glossary gives human readers the utmost clarity.

Figure: An example of a business glossary.
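Once the term catalog exists, part of the consistency check can be automated. The sketch below assumes the catalog is a list of (term, definition, article) tuples, which is a hypothetical format. It only catches the same term defined in more than one way; spotting different terms that are used as synonyms still needs manual review.

```python
from collections import defaultdict


def normalize(term):
    """Treat differences in case and whitespace as the same term."""
    return " ".join(term.lower().split())


def conflicting_terms(entries):
    """Find terms that carry more than one distinct definition.

    `entries` is an iterable of (term, definition, article) tuples.
    Returns {normalized term: sorted list of its distinct definitions}.
    """
    definitions = defaultdict(set)
    for term, definition, _article in entries:
        definitions[normalize(term)].add(definition.strip())
    return {t: sorted(d) for t, d in definitions.items() if len(d) > 1}


catalog = [
    ("Workspace", "A shared area for one team", "Getting started"),
    ("workspace ", "A billing account", "Billing FAQ"),
    ("Project", "A set of related articles", "Getting started"),
    ("Project", "A set of related articles", "User guide"),
]
print(conflicting_terms(catalog))
```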


Rule 5: Metadata

During the content audit, thoroughly review the metadata that is added to media artifacts and other page elements, such as:

  • Alt text added to images
  • <abbr> tag data added to symbols and acronyms
  • Table descriptions and row headers
  • Date, author, and article tags
  • Code snippets marked up in <code> tags

Adding this metadata allows GenAI-based agents to understand the context better and generate accurate responses with more clarity.
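Some of these checks lend themselves to a script. The sketch below covers three of them (image alt text, `title` text on `<abbr>`, and header cells in tables) and assumes the articles are available as HTML; the class and function names are hypothetical, and the remaining items in the list would need similar checks or manual review.

```python
from html.parser import HTMLParser


class MetadataAudit(HTMLParser):
    """Flags images, abbreviations, and tables that lack descriptive metadata."""

    def __init__(self):
        super().__init__()
        self.issues = []
        self._table_has_header = []  # one flag per currently open <table>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Note: an empty alt is valid for purely decorative images;
        # this sketch flags it anyway so a person can confirm.
        if tag == "img" and not (attrs.get("alt") or "").strip():
            self.issues.append(f"img without alt text: {attrs.get('src', '?')}")
        elif tag == "abbr" and not (attrs.get("title") or "").strip():
            self.issues.append("abbr without title")
        elif tag == "table":
            self._table_has_header.append(False)
        elif tag == "th" and self._table_has_header:
            self._table_has_header[-1] = True

    def handle_endtag(self, tag):
        if tag == "table" and self._table_has_header:
            if not self._table_has_header.pop():
                self.issues.append("table without header cells")


def audit_metadata(html):
    """Return the list of metadata issues found in `html`."""
    audit = MetadataAudit()
    audit.feed(html)
    return audit.issues


page = (
    '<img src="a.png"><abbr>KB</abbr>'
    "<table><tr><td>1</td></tr></table>"
    '<img src="b.png" alt="Glossary page">'
)
print(audit_metadata(page))
```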

Best Practices to Evaluate Content for GenAI Readiness

Below are some tips on conducting a content audit to evaluate your content for GenAI readiness:

  • Define the scope of the content audit and success metrics for the audit
  • Involve all stakeholders early including technical writers, information architects, and managers
  • Have a plan for implementing the recommendations from the content audit report
  • Use the GenAI tool of your choice to evaluate whether the rewritten content suits that tool


Conducting a content audit is an essential effort before deploying GenAI-based agents in your knowledge base. The main objective of the content audit is to improve the trustworthiness, reliability, and readability of your content so that GenAI-based agents can generate responses with the utmost clarity. To provide a rich knowledge experience, content needs to be ready for GenAI consumption.
