Search Engine + GraphRAG + LLM Agents = AI-powered Smart Search

Sep 23, 2024

The latest advances in artificial intelligence (AI) that have been presented this past week aren’t trivial. So, where are things heading?

OpenAI recently announced the launch of SearchGPT, an AI-driven search engine that strives to challenge Google’s long-standing dominance.

One notable challenger is MindSearch, an open-source alternative to SearchGPT and Perplexity.ai. In its developers’ comparisons against ChatGPT-Web (ChatGPT with internet access) and Perplexity.ai, which specializes in AI search, MindSearch came out ahead of both star products in depth, breadth, and accuracy.

It can even collect and integrate more than 300 pages of relevant information in less than 3 minutes, a task that would take human experts about 3 hours to complete.

So, let me give you a quick demo of a live chatbot to show you what I mean.

Let me give a simple example: “What is LangChain?” If you look at how MindSearch generates the output, you’ll see that it takes a graph-based approach. When I submit a query, the agent decomposes it into multiple atomic sub-questions, each represented as a node in a directed acyclic graph. The original question about LangChain is the start node, and the end node represents the final answer.

The graph is dynamically constructed as the agent processes the user query and the results from web searches. As new information is retrieved, the agent adds new nodes and edges to the graph. This graph facilitates context management across different agents. By maintaining a clear structure of relationships between sub-questions, the agent uses the graph to aggregate and synthesize information. This helps in combining the retrieved data into a coherent and comprehensive response to the original query.
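To make the idea concrete, here is a minimal sketch of such a graph in plain Python. The class name `QueryGraph`, its methods, and the sub-questions are my own illustrative choices, not MindSearch’s actual code; the point is only to show nodes being added as the search progresses and then visited in dependency order.

```python
from collections import deque


class QueryGraph:
    """Toy DAG of sub-questions (illustrative only, not MindSearch's API)."""

    def __init__(self, root_question):
        self.questions = {"root": root_question}  # node name -> question text
        self.children = {"root": []}              # node name -> child node names

    def add_node(self, name, question, parents):
        # New nodes may only attach to existing nodes, so no cycle can form.
        missing = [p for p in parents if p not in self.questions]
        if missing or name in self.questions:
            raise ValueError(f"cannot add {name!r}: bad parents or duplicate name")
        self.questions[name] = question
        self.children[name] = []
        for parent in parents:
            self.children[parent].append(name)

    def topological_order(self):
        # Kahn's algorithm: a node is emitted only after all of its parents.
        indegree = {name: 0 for name in self.questions}
        for kids in self.children.values():
            for kid in kids:
                indegree[kid] += 1
        queue = deque(name for name, deg in indegree.items() if deg == 0)
        order = []
        while queue:
            node = queue.popleft()
            order.append(node)
            for kid in self.children[node]:
                indegree[kid] -= 1
                if indegree[kid] == 0:
                    queue.append(kid)
        return order


graph = QueryGraph("What is LangChain?")
graph.add_node("definition", "What problem does LangChain solve?", ["root"])
graph.add_node("components", "What are LangChain's main components?", ["root"])
graph.add_node("response", "Combine the findings into a final answer", ["definition", "components"])
order = graph.topological_order()
print(order)  # ['root', 'definition', 'components', 'response']
```

The final `response` node depends on both sub-questions, so it is always processed last, which mirrors how the end node aggregates everything gathered along the way.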

In this step-by-step guide, we will cover what MindSearch is, what makes it unique, how it outperforms ChatGPT-Web and Perplexity.ai, and how to install it locally.

What is MindSearch?

MindSearch is an open-source AI search framework launched by a joint R&D team at the Shanghai Artificial Intelligence Laboratory. It combines large-scale information collection with the ability to organize what it collects.

Using the InternLM2.5 7B chat model, MindSearch can collect useful information from more than 300 web pages in 3 minutes, a task that usually takes a human about 3 hours. It uses a multi-agent framework to simulate human thinking: it plans first and then searches, which improves the accuracy and completeness of the information it returns.

The project has been fully open-sourced, and users can try it and deploy it locally for free.

What makes MindSearch unique?

MindSearch consists of two main components: WebPlanner and WebSearcher. WebPlanner breaks a user’s question down into smaller search tasks (sub-questions) and decides the next step based on the search results. This process is represented as a graph. Meanwhile, WebSearcher performs a hierarchical information search to collect the relevant information for each task.
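The division of labor between the two components can be sketched roughly as follows. Everything here is a stand-in: `plan`, `search`, `answer`, and the toy `PAGES` dictionary are hypothetical names I made up, the planner is a fixed split rather than an LLM call, and the "hierarchical" search is reduced to a coarse scoring pass followed by keeping the best pages.

```python
def plan(question):
    """Stand-in for WebPlanner: split a question into sub-questions.

    A real planner would prompt an LLM; this fixed split is purely illustrative.
    """
    return [f"{question} definition", f"{question} use cases"]


def search(sub_question, pages, top_k=2):
    """Stand-in for WebSearcher: coarse filter first, then keep the best pages."""
    terms = set(sub_question.lower().split())
    # Coarse pass: score each page by how many query terms its title shares.
    scored = [(len(terms & set(title.lower().split())), title, body)
              for title, body in pages.items()]
    # Fine pass: keep only the top_k pages that matched at least one term.
    best = sorted((item for item in scored if item[0] > 0), reverse=True)[:top_k]
    return [body for _, _, body in best]


def answer(question, pages):
    # The planner's output drives the searcher; results are grouped per sub-question.
    return {sub: search(sub, pages) for sub in plan(question)}


PAGES = {  # toy in-memory "web"; titles and bodies are made up
    "LangChain definition": "LangChain is a framework for building LLM applications.",
    "LangChain use cases": "Typical use cases include chatbots and retrieval pipelines.",
    "Cooking pasta": "Boil water, add salt, cook until al dente.",
}

result = answer("LangChain", PAGES)
for sub_question, bodies in result.items():
    print(sub_question, "->", bodies[0])
```

In the real system the planner would also look at what the searcher returned before deciding whether to add more sub-questions; this sketch only shows a single pass.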

MindSearch’s unique feature is that it can decompose complex questions effectively and extract relevant information from a large number of web pages efficiently. Its multi-agent design lets it explore and synthesize information from more than 300 web pages in parallel in less than three minutes, a task of similar cognitive effort that would take a human expert approximately three hours.
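The speed comes from working on many pages at once rather than one after another. Below is a small sketch of that idea using Python’s standard thread pool. The `read_page` function and the example.com URLs are placeholders I invented, and the sleep only mimics network latency; the last line simply works out the speedup implied by the figures quoted above.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def read_page(url):
    # Placeholder for fetching and summarizing one page; the sleep mimics I/O latency.
    time.sleep(0.01)
    return f"summary of {url}"


urls = [f"https://example.com/page/{i}" for i in range(30)]  # hypothetical URLs

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=10) as pool:
    # map() preserves input order, so summaries line up with urls.
    summaries = list(pool.map(read_page, urls))
elapsed = time.perf_counter() - start

# Figures quoted in the article: about 3 hours for a human vs. 3 minutes for MindSearch.
speedup = (3 * 60) / 3
print(f"read {len(summaries)} pages in {elapsed:.2f}s; quoted speedup is {speedup:.0f}x")
```

With ten workers, the thirty simulated pages finish in roughly the time of three sequential reads, which is the same effect MindSearch relies on at a much larger scale.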

How did MindSearch beat ChatGPT and Perplexity.ai?

The answer starts with its name. MindSearch’s core competitiveness lies in its use of a multi-agent framework to simulate human thinking processes.

In actual use, MindSearch has demonstrated impressive performance. It can quickly collect and integrate valuable information from a huge number of web pages. For example, when faced with a complex search task, it can collect and integrate hundreds of pages of relevant content in just a few minutes, while human experts may need hours to complete the same task.

In comparison tests against ChatGPT-Web and Perplexity.ai, MindSearch performed exceptionally well. For example, when faced with a question like “Which shooter is the strongest in the current season of King of Glory?” it conducts an in-depth logical analysis, looking at the characteristics of the current season, the indicators used to measure a shooter’s strength, and so on. It then combines information from all of these sources to give a comprehensive and accurate answer, rather than simply summarizing existing online responses.

Install MindSearch Locally

Step 1: Dependencies Installation

Open your terminal or command prompt. This is where you’ll type all the commands for installation.
Clone the MindSearch repository from GitHub. This downloads all the necessary files to your computer. Enter the following command:

git clone https://github.com/InternLM/MindSearch

Navigate to the MindSearch directory. After cloning, you need to move into the directory that was created:

cd MindSearch

Install required Python packages. The repository includes a file named requirements.txt which lists all the necessary Python packages. Install them using:

pip install -r requirements.txt
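If you want to confirm that the installation worked, a short check like the one below can help. This is my own helper, not something shipped with MindSearch: it reads requirement lines, strips version specifiers, and reports any package that is not installed in the current environment. It only handles simple name-based lines; URL or VCS requirements will be reported as missing.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def missing_packages(requirement_lines):
    """Return requirement names that are not installed in the current environment."""
    missing = []
    for line in requirement_lines:
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options like -r
            continue
        # Keep only the distribution name, e.g. "fastapi>=0.100" -> "fastapi".
        name = re.split(r"[<>=!~\[; ]", line, maxsplit=1)[0]
        if not name:
            continue
        try:
            version(name)
        except PackageNotFoundError:
            missing.append(name)
    return missing


# Typical use from the MindSearch directory:
#     with open("requirements.txt") as fh:
#         print(missing_packages(fh))
```

An empty list means every simple requirement was found.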

Step 2: Set Up the MindSearch API

Understand the command structure. You’re going to run a Python module that starts up the MindSearch API. The command has several options:

--lang en sets the model's language to English. You can change en to cn for Chinese.
--model_format selects the model configuration. Use internlm_server for the InternLM2.5-7b-chat model, which is optimized for Chinese, or gpt4 for GPT-4.
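If you are curious how options like these are usually wired up, here is a hypothetical argparse sketch that mirrors the two flags. It is not MindSearch’s own code: the function name `build_parser`, the help texts, and the default values are arbitrary choices made for illustration.

```python
import argparse


def build_parser():
    # Hypothetical mirror of the two options described above; not MindSearch's own code.
    parser = argparse.ArgumentParser(prog="mindsearch.app")
    parser.add_argument("--lang", choices=["en", "cn"], default="en",
                        help="language to use: en or cn (default is arbitrary here)")
    parser.add_argument("--model_format", default="internlm_server",
                        help="name of a model configuration, e.g. internlm_server or gpt4")
    return parser


# Parse the same arguments you would pass on the command line.
args = build_parser().parse_args(["--lang", "en", "--model_format", "gpt4"])
print(args.lang, args.model_format)  # en gpt4
```

Restricting `--lang` with `choices` means a typo such as `--lang eng` fails immediately with a clear error instead of surfacing later.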

Start the FastAPI server. Use the following command, adjusting parameters as necessary for your setup:

Open-Source Model

python -m mindsearch.app --lang en --model_format internlm_server

GPT-4

If you want to use GPT-4, edit the model file in the project that handles the API model configurations. Either set the OPENAI_API_KEY environment variable or replace 'YOUR OPENAI API KEY' with your actual OpenAI API key.

import os
from lagent.llms import GPTAPI

gpt4 = dict(type=GPTAPI,
            model_type='gpt-4-turbo',
            key=os.environ.get('OPENAI_API_KEY', 'YOUR OPENAI API KEY'))
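Note that the os.environ.get call in this configuration silently falls back to the placeholder string when OPENAI_API_KEY is not set. A small check like the following can catch that before you start the server. It is a sketch of my own, not part of MindSearch, and the function name `resolve_api_key` is made up.

```python
import os

PLACEHOLDER = "YOUR OPENAI API KEY"  # fallback string used in the configuration above


def resolve_api_key(env=os.environ):
    """Return the key the configuration would use, or raise if it is still the placeholder."""
    key = env.get("OPENAI_API_KEY", PLACEHOLDER)
    if not key or key == PLACEHOLDER:
        raise RuntimeError(
            "Set OPENAI_API_KEY before starting the server with --model_format gpt4"
        )
    return key


# Passing a plain dict instead of os.environ makes the behavior easy to try out.
print(resolve_api_key({"OPENAI_API_KEY": "sk-test"}))  # sk-test
```

Keeping the key in an environment variable also means it never has to be written into a file that might end up in version control.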

If you’ve set up GPT-4 in the configuration file, ensure the command to start the server points to the correct model format.

python -m mindsearch.app --lang en --model_format gpt4

Step 3: Set Up the MindSearch Frontend

Running the Gradio Interface

Make sure you’re in the root directory of the MindSearch project, since the script path below is relative to it. Then start the Gradio interface with the following command:

python frontend/mindsearch_gradio.py

Running the Streamlit Application

Use the following command to run the Streamlit application. Streamlit will automatically open the web application in your default web browser.

streamlit run frontend/mindsearch_streamlit.py

Conclusion

MindSearch offers a simple yet powerful solution to the complex task of information retrieval and integration. Its multi-agent framework, combining the cognitive abilities of LLMs with the extensive data access of search engines, sets it apart from existing solutions. By decomposing queries, managing context, and utilizing hierarchical retrieval, MindSearch significantly improves the precision and recall of retrieved web information. With its ability to efficiently process a large number of web pages in a short time, MindSearch empowers users with the timely and accurate information they need to make informed decisions.

Gao Dalie (高達烈)
