Deep Learning and AI

Extractive Summarization with BERT

May 29, 2024 • 10 min read


Lots of Words, No Time to Read

In the era of information, digesting articles, excerpts, peer-reviewed journals, and blogs can become too much to handle. Reading every single word is simply not feasible. So why not employ an LLM?

In short, LLMs and chatbots are creative. They can struggle with larger text prompts, carry biases from their training data, or fail to fully comprehend the input. They also have more knobs to turn, since the output shifts with every change to the prompt.

Enter Extractive Summarization, a clustering algorithm that measures sentence and phrase importance from an original body of text and combines them into a summary, preserving structure and meaning without inferring extra details.

Advantages of Extractive Summarization versus Chatbots like ChatGPT

Accuracy & Fidelity

  - Extractive Summarization: Pulls direct sentences or phrases from the original text, maintaining the exact wording and structure. This ensures accuracy and fidelity to the source material, which is important in contexts where precise language is critical.
  - Chatbots like ChatGPT: Can generate summaries that capture the gist, but they rephrase and reinterpret the content, sometimes introducing slight inaccuracies or missing subtle nuances. You can prompt them to extract verbatim, but this can prove unreliable.

Consistency & Predictability

  - Extractive Summarization: Output is more predictable; the code is transparent and well defined. Sentences are selected based on criteria like frequency, position, or significance, so behavior is more consistent across different types of texts.
  - Chatbots like ChatGPT: Summaries generated with AI can vary significantly depending on the prompt. The criteria are prompt-based but still variable, making the summary output inconsistent from prompt to prompt.

Simplicity & Efficiency

  - Extractive Summarization: Simpler and faster because it doesn't generate new text; it only selects important sentences. This makes it more efficient, especially for large-scale summarization tasks.
  - Chatbots like ChatGPT: Involve complex language modeling and text generation, which can be computationally intensive and slower, though they generate creative summarizations that can interpret nuances.

Interpretability & Traceability

  - Extractive Summarization: The output consists of actual sentences from the source text, so it is easy to trace each part of the summary back to its origin and verify the summary against the source.
  - Chatbots like ChatGPT: Output is often paraphrased or restructured, making it harder to trace back to specific parts of the source text. This can be a drawback in applications where traceability is crucial.

Suitability

  - Extractive Summarization: Formal and critical applications like legal, medical, or academic fields, where precise reproduction of content in context is often required.
  - Chatbots like ChatGPT: Informal settings where information is simply digested; capable of producing coherent and fluent summaries that are easy to read.

Lower Risk of Bias

  - Extractive Summarization: Minimizes the risk of introducing bias, since no new interpretations are made.
  - Chatbots like ChatGPT: The process of generating text involves interpretation, which can inadvertently introduce bias.

How Does Extractive Summarization Work?

The model will parse the text, extract features from each sentence, score each sentence for importance, and lastly select the highest-scored sentences and combine them into a paragraph summary. That's it!

  1. Parse Text - The first step is to break the text down into its sentences and/or phrases. These are the components the summarizer will pick from.
  2. Feature Extraction - The algorithm analyzes each sentence to identify characteristics or features that gauge its significance to the overall text. Common features include word frequency, length, position, and keywords.
  3. Scoring - Each sentence is then scored based on these features. Sentences that score higher are deemed to carry more weight or relevance.
  4. Selection & Combination - The final phase involves selecting the highest-scoring sentences and compiling them into a summary. The extracted sentences are aggregated into a single paragraph, and you can decide the summary length.
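To make the four steps concrete, here is a minimal sketch of the pipeline using plain word frequency as the scoring feature (BERT instead scores sentences with learned embeddings, so treat this only as an illustration of the parse → extract → score → select flow; the function name and toy document are our own):

```python
import re
from collections import Counter

def extractive_summary(text, num_sentences=2):
    """Toy extractive summarizer: parse -> features -> score -> select."""
    # 1. Parse: split the text into sentences.
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    # 2. Feature extraction: word frequencies across the whole document.
    freq = Counter(re.findall(r'\w+', text.lower()))
    # 3. Scoring: a sentence's score is the mean frequency of its words.
    def score(sentence):
        tokens = re.findall(r'\w+', sentence.lower())
        return sum(freq[t] for t in tokens) / max(len(tokens), 1)
    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    # 4. Selection: keep the top sentences, restored to original order.
    chosen = sorted(ranked[:num_sentences])
    return " ".join(sentences[i] for i in chosen)

doc = ("GPUs process data in parallel. Parallel GPUs train networks fast. "
       "Cats are nice. GPUs and parallel data pipelines dominate AI.")
print(extractive_summary(doc, num_sentences=2))
```

Note that the off-topic sentence about cats scores lowest and is dropped, while the sentences dense with frequent terms survive verbatim.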

Extractive Summarization with BERT Tutorial

Step 1: Install BERT, Import Summarizer

We will leverage a pre-trained BERT model through the BERT Extractive Summarizer, which has been fine-tuned for extractive summarization tasks. There are other open-source BERT-based models to choose from that are just as easy to install for our demo.

!pip install bert-extractive-summarizer
from summarizer import Summarizer

Step 2: Summarizer

The Summarizer() function imported from the summarizer package is an extractive text summarization tool. It uses the BERT model to analyze and extract key sentences from a larger text. This function aims to retain the most important information, providing a condensed version of the original content. It's commonly used to summarize lengthy documents efficiently.

model = Summarizer()

Step 3: Prompt the Summarizer

Here, we will input any piece of text that we would like to test our model on. To test our extractive summary model, we generated text with our creative chatbot friend ChatGPT 3.5 using the prompt: “Write a 3-paragraph essay about the advancements of GPU accelerated computing.”

text = ("The advancements in GPU-accelerated computing have revolutionized numerous fields by dramatically enhancing computational speed and efficiency. Initially developed to handle the intense graphics demands of video gaming, GPUs (Graphics Processing Units) have evolved far beyond their original purpose. They are now integral to scientific research, artificial intelligence, and big data analytics due to their ability to process massive amounts of data in parallel. This parallel processing capability allows GPUs to perform complex calculations much faster than traditional CPUs (Central Processing Units), which are optimized for sequential task execution. Consequently, tasks that once took days or weeks can now be completed in hours, driving rapid progress in fields like molecular modeling, climate simulation, and financial modeling. In the realm of artificial intelligence, GPU-accelerated computing has been particularly transformative. Machine learning and deep learning, subsets of AI, rely heavily on vast datasets and intricate mathematical models. GPUs excel in these areas due to their parallel processing power, significantly speeding up the training of neural networks. Companies like NVIDIA have developed specialized GPUs and software frameworks such as CUDA and TensorFlow to optimize these processes. These advancements have enabled breakthroughs in natural language processing, computer vision, and autonomous systems, facilitating the development of technologies like self-driving cars, sophisticated recommendation systems, and advanced medical diagnostics. Beyond scientific and AI applications, GPU-accelerated computing is driving innovation in big data analytics and real-time data processing. Industries ranging from finance to healthcare utilize GPUs to analyze large datasets quickly, enabling real-time decision-making and predictive analytics. "
        "Financial institutions employ GPUs for high-frequency trading and risk management, where rapid data processing and analysis are critical. Similarly, in healthcare, GPUs assist in processing and analyzing medical images swiftly, leading to quicker diagnoses and personalized treatment plans. As GPU technology continues to advance, its applications are likely to expand further, heralding a new era of computational capabilities and efficiencies across various sectors.")

The text we will be summarizing:

The advancements in GPU-accelerated computing have revolutionized numerous fields by dramatically enhancing computational speed and efficiency. Initially developed to handle the intense graphics demands of video gaming, GPUs (Graphics Processing Units) have evolved far beyond their original purpose. They are now integral to scientific research, artificial intelligence, and big data analytics due to their ability to process massive amounts of data in parallel. This parallel processing capability allows GPUs to perform complex calculations much faster than traditional CPUs (Central Processing Units), which are optimized for sequential task execution. Consequently, tasks that once took days or weeks can now be completed in hours, driving rapid progress in fields like molecular modeling, climate simulation, and financial modeling.

In the realm of artificial intelligence, GPU-accelerated computing has been particularly transformative. Machine learning and deep learning, subsets of AI, rely heavily on vast datasets and intricate mathematical models. GPUs excel in these areas due to their parallel processing power, significantly speeding up the training of neural networks. Companies like NVIDIA have developed specialized GPUs and software frameworks such as CUDA and TensorFlow to optimize these processes. These advancements have enabled breakthroughs in natural language processing, computer vision, and autonomous systems, facilitating the development of technologies like self-driving cars, sophisticated recommendation systems, and advanced medical diagnostics.

Beyond scientific and AI applications, GPU-accelerated computing is driving innovation in big data analytics and real-time data processing. Industries ranging from finance to healthcare utilize GPUs to analyze large datasets quickly, enabling real-time decision-making and predictive analytics. Financial institutions employ GPUs for high-frequency trading and risk management, where rapid data processing and analysis are critical. Similarly, in healthcare, GPUs assist in processing and analyzing medical images swiftly, leading to quicker diagnoses and personalized treatment plans. As GPU technology continues to advance, its applications are likely to expand further, heralding a new era of computational capabilities and efficiencies across various sectors.

Step 4: Extract Summary

We will run our summarization function by defining the number of sentences for the summary, setting the output to a variable, and then printing that variable. Each sentence from the output can be traced back to the original text.

summary = model(text, num_sentences=4)
print("Extractive Summary Output:")
print(summary)

Extractive Summary Output:

The advancements in GPU-accelerated computing have revolutionized numerous fields by dramatically enhancing computational speed and efficiency. GPUs excel in these areas due to their parallel processing power, significantly speeding up the training of neural networks. Beyond scientific and AI applications, GPU-accelerated computing is driving innovation in big data analytics and real-time data processing. Industries ranging from finance to healthcare utilize GPUs to analyze large datasets quickly, enabling real-time decision-making and predictive analytics.
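Because every sentence in the output is lifted verbatim from the source, traceability can be checked programmatically. A small sketch (the function name is our own, and short stand-in strings replace the `text` and `summary` variables from the steps above):

```python
import re

def trace_summary(summary, source):
    """Map each summary sentence to whether it appears verbatim in the source."""
    sentences = re.split(r'(?<=[.!?])\s+', summary.strip())
    return {s: s in source for s in sentences}

source = "GPUs are fast. They process data in parallel. Demand keeps growing."
summary = "GPUs are fast. They process data in parallel."
print(trace_summary(summary, source))
```

Every value in the returned dictionary should be True for an extractive summary; a paraphrased, chatbot-style summary would fail this check.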

Conclusions

In conclusion, extractive summarizers offer distinct advantages over generative models like ChatGPT when it comes to text summarization. Their ability to maintain accuracy and fidelity to the source material, provide consistent and predictable outputs, and ensure simplicity and efficiency make them highly appealing for many applications.

The interpretability and traceability of extractive summaries are crucial for formal and critical fields such as legal, medical, and academic contexts, unlike our example which is a pretty generalized history. While generative models excel in creating fluent and engaging summaries, the risk of introducing slight inaccuracies and biases makes extractive summarizers a preferred choice for tasks demanding precision and exact reproduction of content. Ultimately, the choice between extractive summarization and generative models should be guided by the specific requirements and constraints of the task at hand.


Tags

bert

generative ai

tutorial

deep learning

ai


