Text Summarization with Llama 2: How to Use Llama 2 with LangChain
In this tutorial, I will guide you through using Llama 2 with LangChain for text summarization and named entity recognition in a Google Colab notebook.
What is Llama 2?
Meta, better known to most of us as Facebook, has released a commercial version of Llama 2, its open-source large language model (LLM) that uses artificial intelligence (AI) to generate text and code.
Llama 2 is the successor to the Llama 1 model released earlier this year. Llama 1, however, was “closely guarded” and only available on request.
Before we start! 🦸🏻‍♀️
If you like this topic and you want to support me:
Like and share my articles; that will really help me out 👏
Subscribe to get my latest articles.
Let’s start coding!
Set Up Google Colab: Go to Google Colab (colab.research.google.com) and create a new notebook.
Install Required Libraries: In the first code cell of your Colab notebook, install the necessary libraries using the following code:
!pip install -q transformers einops accelerate langchain bitsandbytes
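Before going further, it is worth confirming that the Colab runtime actually has a GPU attached (Runtime → Change runtime type → GPU), since accelerate and bitsandbytes expect one. This quick check is my addition, not part of the original steps:
import torch

# Should print True and the GPU name (e.g. a Tesla T4) on a GPU runtime
print(torch.cuda.is_available())
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))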
Log in with huggingface-cli
!huggingface-cli login
If you already have a Hugging Face account, you can obtain your access token by going to the settings section on the Hugging Face website. From there, click on the “Access Tokens” tab, and you’ll be able to generate or copy your personal access token.
If you don’t have a Hugging Face account yet, you can easily sign up for one on their website. Once you have an account, you can follow the same steps mentioned above to get your access token and use it to access private models and resources through the Hugging Face API or CLI.
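If you prefer not to paste the token at a terminal prompt, the huggingface_hub library also provides an interactive login widget that works nicely in Colab (an alternative to the CLI command above, not a required extra step):
from huggingface_hub import notebook_login

# Opens a widget where you paste the access token from your Hugging Face settings
notebook_login()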
Install SentencePiece
If you haven’t installed the SentencePiece library, you can install it in your Colab notebook using:
!pip install sentencepiece
We are going to set up a language-generation pipeline using Hugging Face’s transformers library and a specified model. The AutoTokenizer is used to fetch the tokenizer associated with the model.
The pipeline is then configured with parameters such as the text-generation task, model, tokenizer, max_length of the generated text, and a few more.
from langchain import HuggingFacePipeline
from transformers import AutoTokenizer
import transformers
import torch
model = "meta-llama/Llama-2-7b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(model)
pipeline = transformers.pipeline(
    "text-generation",                  # task
    model=model,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,         # load weights in bfloat16 to reduce memory use
    trust_remote_code=True,
    device_map="auto",                  # place the model on available devices automatically
    max_length=1000,                    # cap on prompt + generated tokens
    do_sample=True,                     # sample rather than greedy decoding
    top_k=10,                           # sample from the 10 most likely next tokens
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id
)
We create an instance of the HuggingFacePipeline class, using the previously configured pipeline and setting the model’s ‘temperature’ parameter, which influences the randomness of predictions.
llm = HuggingFacePipeline(pipeline=pipeline, model_kwargs={'temperature': 0})
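At this point the LangChain wrapper can already be called directly. A quick sanity check like the one below (the prompt is my own example, not from the original article) confirms that the model loads and generates text before we build any chains:
# Direct call to the wrapped pipeline; output will vary since sampling is enabled
print(llm("Explain in one sentence what a large language model is."))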
The PromptTemplate class is used to create a template for a language model prompt. This is the instruction that the model will follow when generating text.
In this case, the template asks the model to summarize a text. The text to summarize is placed within triple backquotes (```).
The model is asked to present the summary in bullet points. The {text} inside the template will be replaced by the actual text you want to summarize.
from langchain import PromptTemplate, LLMChain
template = """
Write a concise summary of the following text delimited by
triple backquotes.
Return your response in bullet points which covers the key
points of the text.
```{text}```
BULLET POINT SUMMARY:
"""
prompt = PromptTemplate(template=template, input_variables=["text"])
llm_chain = LLMChain(prompt=prompt, llm=llm)
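To see exactly what the model receives, you can render the template with a sample value before running the chain (an optional inspection step I have added; the sample sentence is arbitrary):
# Fill the {text} placeholder to inspect the final prompt string
print(prompt.format(text="Llama 2 is an open large language model released by Meta."))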
text = """ As part of Meta’s commitment to open science, today we are publicly
releasing LLaMA (Large Language Model Meta AI), a state-of-the-art foundational
large language model designed to help researchers advance their work in this
subfield of AI. Smaller, more performant models such as LLaMA enable others
in the research community who don’t have access to large amounts of
infrastructure to study these models, further democratizing access in this
important, fast-changing field. Training smaller foundation models like LLaMA
is desirable in the large language model space because it requires far less
computing power and resources to test new approaches, validate others’ work,
and explore new use cases. Foundation models train on a large set of unlabeled
data, which makes them ideal for fine-tuning for a variety of tasks.
We are making LLaMA available at several sizes (7B, 13B, 33B, and 65B
parameters) and also sharing a LLaMA model card that details how we built
the model in keeping with our approach to Responsible AI practices.
Over the last year, large language models — natural language processing (NLP)
systems with billions of parameters — have shown new capabilities to generate
creative text, solve mathematical theorems, predict protein structures,
answer reading comprehension questions, and more. They are one of the clearest
cases of the substantial potential benefits AI can offer at scale to billions
of people.
Even with all the recent advancements in large language models, full research
access to them remains limited because of the resources that are required to
train and run such large models. This restricted access has limited researchers’
ability to understand how and why these large language models work,
hindering progress on efforts to improve their robustness and mitigate known
issues, such as bias, toxicity, and the potential for generating misinformation.
Smaller models trained on more tokens — which are pieces of words — are easier
to retrain and fine-tune for specific potential product use cases.
We trained LLaMA 65B and LLaMA 33B on 1.4 trillion tokens. Our smallest model,
LLaMA 7B, is trained on one trillion tokens.
Like other large language models, LLaMA works by taking a sequence of words
as an input and predicts a next word to recursively generate text.
To train our model, we chose text from the 20 languages with the most speakers,
focusing on those with Latin and Cyrillic alphabets.
There is still more research that needs to be done to address the risks of
bias, toxic comments, and hallucinations in large language models.
Like other models, LLaMA shares these challenges. As a foundation model, LLaMA
is designed to be versatile and can be applied to many different use cases,
versus a fine-tuned model that is designed for a specific task.
By sharing the code for LLaMA, other researchers can more easily test new
approaches to limiting or eliminating these problems in large language models.
We also provide in the paper a set of evaluations on benchmarks evaluating
model biases and toxicity to show the model’s limitations and to support
further research in this crucial area.
To maintain integrity and prevent misuse, we are releasing our model under
a noncommercial license focused on research use cases. Access to the model
will be granted on a case-by-case basis to academic researchers; those
affiliated with organizations in government, civil society, and academia;
and industry research laboratories around the world. People interested in
applying for access can find the link to the application in our research paper.
We believe that the entire AI community — academic researchers, civil society,
policymakers, and industry — must work together to develop clear guidelines
around responsible AI in general and responsible large language models
in particular. We look forward to seeing what the community can learn —
and eventually build — using LLaMA.
"""
Let's try it out
print(llm_chain.run(text))
/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py:1270: UserWarning: You have modified the pretrained model configuration to control generation. This is a deprecated strategy to control generation and will be removed soon, in a future version. Please use a generation configuration file (see https://huggingface.co/docs/transformers/main_classes/text_generation)
warnings.warn(
• The article discusses the public release of LLaMA (Large Language Model Meta AI), a state-of-the-art foundational language model designed to advance research in the field.
• Training smaller models like LLaMA requires less computing power and resources,
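Notice that the summary above stops mid-sentence: max_length=1000 in the pipeline caps the prompt plus the generated tokens, and our input text is long. If that happens, one option (an adjustment of mine, not from the original article) is to rebuild the pipeline with max_new_tokens, so the cap applies only to newly generated tokens, and then re-create llm and llm_chain:
# Optional: rebuild the pipeline so the length cap applies only to generated tokens
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",
    max_new_tokens=512,                 # assumed value; raise it if output is still cut off
    do_sample=True,
    top_k=10,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id
)
llm = HuggingFacePipeline(pipeline=pipeline, model_kwargs={'temperature': 0})
llm_chain = LLMChain(prompt=prompt, llm=llm)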
text1 = """
Tesla, Inc. (/ˈtɛslə/ TESS-lə or /ˈtɛzlə/ TEZ-lə[a]) is an American
multinational automotive and clean energy company headquartered in Austin,
Texas. Tesla designs and manufactures electric vehicles (cars and trucks),
stationary battery energy storage devices from home to grid-scale,
solar panels and solar roof tiles, and related products and services.
Tesla is one of the world's most valuable companies and, as of 2023,
was the world's most valuable automaker. In 2022, the company led the
battery electric vehicle market, with 18% share.
Its subsidiary Tesla Energy develops and is a major installer of photovoltaic
systems in the United States. Tesla Energy is one of the largest global
suppliers of battery energy storage systems, with 6.5 gigawatt-hours (GWh)
installed in 2022.
Tesla was incorporated in July 2003 by Martin Eberhard and Marc Tarpenning as
Tesla Motors. The company's name is a tribute to inventor and
electrical engineer Nikola Tesla. In February 2004, via a $6.5 million
investment, Elon Musk became the company's largest shareholder.
He became CEO in 2008. Tesla's announced mission is to help expedite
the move to sustainable transport and energy, obtained through electric
vehicles and solar power.
Tesla began production of its first car model, the Roadster sports car,
in 2008. This was followed by the Model S sedan in 2012, the Model X SUV in
2015, the Model 3 sedan in 2017, the Model Y crossover in 2020, and the
Tesla Semi truck in 2022. The company plans production of the Cybertruck
light-duty pickup truck in 2023.[8]
The Model 3 is the all-time bestselling plug-in electric car worldwide,
and in June 2021 became the first electric car to sell 1 million units
globally.[9]
Tesla's 2022 deliveries were around 1.31 million vehicles, a 40% increase
over the previous year,[10][11] and cumulative sales totaled 4 million cars
as of April 2023.[12] In October 2021,
Tesla's market capitalization temporarily reached $1 trillion, the sixth
company to do so in U.S. history.
Tesla has been the subject of lawsuits, government scrutiny, and
journalistic criticism, stemming from allegations of whistleblower
retaliation, worker rights violations, product defects,
and Musk's many controversial statements.
"""
print(llm_chain.run(text1))
- Tesla is an American multinational automotive and clean energy company
- Headquartered in Austin, Texas
- Designs and manufactures electric vehicles, battery energy storage devices,
solar panels, and solar roof tiles
- One of the world's most valuable companies
- Led the battery electric vehicle market in 2022
- Installed 6.5 gigawatt-hours of battery energy storage systems in 2022
- Founded in 2003 by Martin Eberhard and Marc Tarpenning
- Elon Musk became CEO in 2008
- Mission is to help expedite the move to sustainable transport and energy
- Produces the Model S sedan, Model X SUV, Model 3 sedan, Model Y crossover,
and the Tesla Semi truck
- Plans production of the Cybertruck light-duty pickup truck in 2023
- Cumulative sales totaled 4 million cars as of April 2023
- Market capitalization temporarily reached $1 trillion in 2
Next, we set up a task to detect named entities within a given text using the PromptTemplate and LLMChain classes from the langchain library.
The PromptTemplate lays out instructions for the language model to identify named entities and return results in JSON format, with details like the entity name, its type, and its location within the text.
The placeholder {text} in the template will be replaced with the actual text to analyze.
The LLMChain class then binds the prepared prompt template with the specified language model (LLM). The resulting system aims to receive a text, identify the named entities in it, and return a detailed JSON file outlining each identified entity.
template = """
Detect named entities in following text delimited by triple backquotes.
Return your response in json format with spans of named entities with
fields
"named entity","type","span".
Return all entities
```{text}```
json format file:
"""
prompt = PromptTemplate(template=template, input_variables=["text"])
llm_chain = LLMChain(prompt=prompt, llm=llm)
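The next cell runs the chain on a variable called text2, which the article never shows; judging from the output below (Apple Inc., Steve Jobs, Steve Wozniak, Ronald Wayne), it was a passage about Apple's founding. Here is a short stand-in passage you can use to reproduce a similar result (my placeholder, not the author's original text):
# Placeholder passage; the article's original text2 was not shown
text2 = """
Apple Inc. is an American multinational technology company headquartered in
Cupertino, California. It was founded as Apple Computer Company in April 1976
by Steve Jobs, Steve Wozniak, and Ronald Wayne to develop and sell the
Apple I personal computer.
"""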
print(llm_chain.run(text2))
{
"namedEntity": [
{
"namedEntity": "Apple Inc.",
"type": "company",
"span": [
{"start": 0, "end": 87}
]
},
{
"namedEntity": "Steve Jobs",
"type": "person",
"span": [
{"start": 109, "end": 141}
]
},
{
"namedEntity": "Steve Wozniak",
"type": "person",
"span": [
{"start": 168, "end": 183}
]
},
{
"namedEntity": "Ronald Wayne",
"type": "person",
"span": [
🚀 You are now well-positioned to start experimenting with the massive potential that the combination of Llama 2 and LangChain brings to the industry!
References:
https://huggingface.co/meta-llama/Llama-2-7b-chat-hf
https://ai.meta.com/blog/large-language-model-llama-meta-ai/
https://python.langchain.com/docs/get_started/introduction.html