LCM + Mapping Hidden Embeddings = A New Model Architecture
Recently I have been training models, from multimodal systems to text generation, and keeping an eye on the latest training techniques.
Scrolling from Twitter to YouTube, I was shocked to see another major change in the world of AI.
Meta had announced the Large Concept Model (LCM), an open-source, multimodal, next-generation foundation model architecture.
You may be familiar with popular LLMs (Large Language Models) such as ChatGPT. Their core mechanism is to predict the next word (token). But Meta has gone big this time: its new LCM (Large Concept Model) does not look at tokens at all; it directly understands the “meaning” of a sentence!
Say goodbye to “word by word” and hello to “grasping the whole”
Think about how we understand a sentence: we grasp its meaning directly rather than first breaking it down into individual words, right? LCM works the same way. Instead of analyzing text “word by word” like an LLM, it encodes an entire sentence or paragraph into a single high-dimensional vector, which Meta calls a “concept”.
Simply put, LCM is a new type of AI that creates sentences and understands content in a slightly different way than previous models.
The “C” in LCM stands for “concept”: it is a model that thinks in concepts.
By the end of this article, you will understand what LCM is, how it differs from a traditional LLM, what makes it unique, how it works, and how to build the architecture in code.
Before we start! 🦸🏻♀️
If you like this topic and you want to support me:
like my article; that will really help me out.👏
Follow me on my YouTube channel
Subscribe to get my latest articles.
What is LCM?
The Large Concept Model is a new language-model architecture proposed by Meta AI. It aims to imitate the way humans think: starting from macro-level concepts and then gradually filling in the details. LCM treats each sentence as a concept and performs reasoning and generation at the sentence level instead of operating at the token level.
In LCM, a concept usually corresponds to a complete sentence. It is a high-level semantic representation that is independent of any specific language or modality.
LLM vs. LCM
Traditional LLMs and LCMs handle understanding and memory quite differently.
Traditional LLM (such as GPT): Imagine you are reading a book, but your brain can only hold a few words at a time. To understand what comes next, you constantly have to look back at the earlier words. If the book is too long, your memory overflows, and your understanding becomes biased or incomplete. This is the dilemma an LLM faces with long texts: its “attention window” is limited, so it cannot effectively remember and process very long contexts.
LCM (Large Concept Model): In contrast, imagine that your brain can take in an entire sentence or paragraph as one complete “meaning unit”, like giving each sentence or paragraph a label or a summary. As you read, you no longer remember every word, only each meaning unit. Even in a long book, you track relatively few meaning units, so the memory burden drops sharply and your understanding stays coherent and accurate. Roughly speaking, a chapter that an LLM sees as thousands of tokens is only a few hundred concepts to an LCM.
What makes LCM unique
So why is LCM so unique? Because it mirrors the process of human thought.
When we write, we first decide on an overall theme (“What should I write about today?”) and only then work out the specific content, right? LCM is similar: it first grasps the big concept (the theme) and then produces the detailed sentences.
In addition, LCM has the following features:
Supports many languages: LCM can understand about 200 languages through a sentence-embedding tool called SONAR. Because a sentence written in Spanish and one written in English are treated as “concepts” in the same way, you can build multilingual AI without worrying about language barriers (see the sketch after this list).
Leave long texts to LCM: LCM is good at generating long passages. Ordinary models tend to sound strange or repeat themselves as the text grows, but LCM can produce coherent prose while considering the overall structure.
Zero-shot understanding of unseen languages: LCM can potentially handle languages it was never trained on, as long as SONAR supports them, without any additional training.
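To see what language-independent concepts look like in practice, here is a small sketch using Meta's open-source SONAR package (pip install sonar-space). The class and checkpoint names below follow the SONAR repo's README; treat them as assumptions and verify them against the version you install.

# Sketch based on the SONAR README: the same meaning in two languages should
# map to nearby points in the shared concept space. Language codes follow
# the FLORES-200 convention.
import torch.nn.functional as F
from sonar.inference_pipelines.text import TextToEmbeddingModelPipeline

t2vec = TextToEmbeddingModelPipeline(
    encoder="text_sonar_basic_encoder",
    tokenizer="text_sonar_basic_encoder",
)

en = t2vec.predict(["The weather is nice today."], source_lang="eng_Latn")
es = t2vec.predict(["Hoy hace buen tiempo."], source_lang="spa_Latn")
print(F.cosine_similarity(en, es))  # expect a value close to 1.0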
How LCM Works
LCM’s architecture aims to treat language at a conceptual level, moving away from the token-based approach of traditional models. It consists of three main components:
Concept Encoder: This component converts each input sentence into an abstract concept embedding, a higher-level linguistic representation beyond the surface-level text.
Large Concept Model: At the heart of the system, this component processes and understands concepts independently of specific sequences of words or tokens, focusing on the relationships and meaning behind the text.
Concept Decoder: Translates abstract concepts back into human-readable language, ensuring the output is clear, coherent, and meaningful.
By dividing language processing into these different stages, LCM prioritizes the underlying meaning of a text over its surface structure. This results in output that is not only more accurate but also more consistent with the context and intent of the input.
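Putting the encoder and decoder stages together, a round trip through the concept space looks roughly like the sketch below (again based on the SONAR repo's README, so treat the names as assumptions). In a full LCM pipeline, the model would sit in the middle and predict new concept vectors instead of passing them through unchanged.

from sonar.inference_pipelines.text import (
    TextToEmbeddingModelPipeline,
    EmbeddingToTextModelPipeline,
)

t2vec = TextToEmbeddingModelPipeline(
    encoder="text_sonar_basic_encoder", tokenizer="text_sonar_basic_encoder"
)
vec2text = EmbeddingToTextModelPipeline(
    decoder="text_sonar_basic_decoder", tokenizer="text_sonar_basic_encoder"
)

# Text -> concept vectors -> text. An LCM would transform the concepts here.
concepts = t2vec.predict(["LCM reasons over whole sentences."], source_lang="eng_Latn")
reconstructed = vec2text.predict(concepts, target_lang="eng_Latn", max_seq_len=512)
print(reconstructed)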
Let’s start coding
While trying to understand how to implement the LCM approach, I came across Richard's channel. I thank him for making the code open source; all the credit goes to him. Please note that I did not write a single line of the code; I am only here to explain it and give my feedback.
Before we jump into the code, I would like to show the Base-LCM: the core is a standard decoder-only Transformer surrounded by a PreNet and a PostNet.
Basically, he creates a PreNet class to normalize the input embeddings before the main model processes them. It transforms the input SONAR embeddings into the model's hidden dimension. This normalization helps stabilize training and improves the model's performance by scaling the input data appropriately.
import torch
import torch.nn as nn
import torch.nn.functional as F

# Base-LCM Architecture Components

class PreNet(nn.Module):
    """
    Maps input embeddings to the model's hidden dimension after normalization.
    """
    def __init__(self, input_dim, hidden_dim):
        super(PreNet, self).__init__()
        self.linear = nn.Linear(input_dim, hidden_dim)
        self.scaler_mean = 0.0  # Placeholder for robust scaler mean
        self.scaler_std = 1.0   # Placeholder for robust scaler std

    def normalize(self, x):
        return (x - self.scaler_mean) / self.scaler_std

    def forward(self, x):
        x = self.normalize(x)
        x = self.linear(x)
        return x
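The scaler_mean and scaler_std placeholders are meant to be fitted on real data before training. Here is a minimal sketch (my addition, not Richard's code) of how you might estimate them from a batch of training embeddings; a true robust scaler would use the median and interquartile range, so plain mean/std is just the simplest stand-in.

# Fit simple normalization statistics on (stand-in) training embeddings
# and plug them into the PreNet; the PostNet would receive the same values.
train_embeddings = torch.randn(1000, 256)       # stand-in for real SONAR embeddings
mean = train_embeddings.mean().item()
std = max(train_embeddings.std().item(), 1e-6)  # guard against division by zero

prenet = PreNet(input_dim=256, hidden_dim=512)
prenet.scaler_mean, prenet.scaler_std = mean, std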
Then he creates PostNet to denormalize the output embeddings generated by the model. After the TransformerDecoder processes the normalized inputs, the PostNet converts the output back into a form that can be interpreted as SONAR embeddings. This step is crucial for translating the model's internal representations back into something usable, such as subwords or sentences.

class PostNet(nn.Module):
    """
    Maps hidden state outputs back to the embedding space with denormalization.
    """
    def __init__(self, hidden_dim, output_dim):
        super(PostNet, self).__init__()
        self.linear = nn.Linear(hidden_dim, output_dim)
        self.scaler_mean = 0.0  # Placeholder for robust scaler mean
        self.scaler_std = 1.0   # Placeholder for robust scaler std

    def denormalize(self, x):
        return x * self.scaler_std + self.scaler_mean

    def forward(self, x):
        x = self.linear(x)
        x = self.denormalize(x)
        return x

Next, he sets up a TransformerDecoder to transduce a sequence of preceding concepts (sentence embeddings) into a sequence of future concepts. It is the key component of the LCM, enabling it to generate coherent and contextually relevant outputs from the input sequence. The Transformer architecture is known for handling long-range dependencies in data, which makes it well suited to generative tasks.
class TransformerDecoder(nn.Module):
    """
    Standard decoder-only Transformer.
    """
    def __init__(self, hidden_dim, num_heads, num_layers, ff_dim, dropout=0.1):
        super(TransformerDecoder, self).__init__()
        self.layers = nn.ModuleList([
            nn.TransformerDecoderLayer(
                d_model=hidden_dim, nhead=num_heads, dim_feedforward=ff_dim,
                dropout=dropout, batch_first=True,  # inputs are (batch, seq, dim)
            )
            for _ in range(num_layers)
        ])
        self.pos_encoder = nn.Parameter(torch.zeros(1, 512, hidden_dim))  # Learned positional encoding

    def forward(self, x):
        seq_len = x.size(1)
        x = x + self.pos_encoder[:, :seq_len]
        # Causal mask: position t may only attend to positions <= t. Both masks
        # are needed because each layer attends to the same sequence twice
        # (self-attention on the targets and cross-attention on the "memory").
        mask = torch.triu(
            torch.full((seq_len, seq_len), float("-inf"), device=x.device),
            diagonal=1,
        )
        for layer in self.layers:
            x = layer(x, x, tgt_mask=mask, memory_mask=mask)
        return x
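A quick sanity check I would add here (not in the original code): with the causal mask in place, perturbing a future position must not change the outputs at earlier positions.

# Verify causality: change only the last position and confirm that the
# outputs for all earlier positions stay the same.
decoder = TransformerDecoder(hidden_dim=512, num_heads=8, num_layers=2, ff_dim=1024)
decoder.eval()  # disable dropout so the comparison is deterministic
x = torch.randn(1, 10, 512)
x_perturbed = x.clone()
x_perturbed[:, -1] += 1.0
with torch.no_grad():
    out_a = decoder(x)
    out_b = decoder(x_perturbed)
print(torch.allclose(out_a[:, :-1], out_b[:, :-1], atol=1e-5))  # expect True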
Then he creates BaseLCM, the baseline architecture for the Large Concept Model, which combines the PreNet, TransformerDecoder, and PostNet. It is designed for next-concept prediction, where the model predicts the next concept in a sequence from the preceding concepts. The BaseLCM serves as the foundational model.
class BaseLCM(nn.Module):
    """
    Base Large Concept Model (LCM):
    - PreNet: Maps input embeddings to the hidden space.
    - TransformerDecoder: Autoregressively processes the embeddings.
    - PostNet: Maps outputs back to the embedding space.
    """
    def __init__(self, input_dim, hidden_dim, num_heads, num_layers, ff_dim, output_dim):
        super(BaseLCM, self).__init__()
        self.prenet = PreNet(input_dim, hidden_dim)
        self.transformer_decoder = TransformerDecoder(hidden_dim, num_heads, num_layers, ff_dim)
        self.postnet = PostNet(hidden_dim, output_dim)

    def forward(self, x):
        x = self.prenet(x)
        x = self.transformer_decoder(x)
        x = self.postnet(x)
        return x

Finally, he designed the BaseLCM to operate entirely in the SONAR embedding space: both the input and the output are continuous vectors of the same dimensionality (256 in this toy setup; real SONAR embeddings are 1024-dimensional).
This consistency ensures the model can seamlessly integrate with other components of a pipeline that uses SONAR embeddings. The test below confirms that the model processes inputs correctly, preserving the batch size and sequence structure while applying its transformations.
This architecture is designed for tasks where the input and output remain in a shared semantic embedding space, such as sentence encoding or intermediate processing.
# Testing the Base-LCM architecture
def test_base_lcm():
    batch_size = 4
    sequence_length = 10
    input_dim = 256   # SONAR-style embedding dimension (e.g., pre-encoded sentences)
    hidden_dim = 512
    num_heads = 8
    num_layers = 6
    ff_dim = 2048
    output_dim = 256  # Output embedding dimension (same as input)

    # Random input to simulate SONAR embeddings
    input_embeddings = torch.randn(batch_size, sequence_length, input_dim)

    # Initialize and test Base-LCM
    model = BaseLCM(input_dim, hidden_dim, num_heads, num_layers, ff_dim, output_dim)
    output_embeddings = model(input_embeddings)
    print("Input shape:", input_embeddings.shape)
    print("Output shape:", output_embeddings.shape)

if __name__ == "__main__":
    test_base_lcm()
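To round things off, here is a hedged sketch of how such a Base-LCM would be trained. Per Meta's paper, the baseline model is trained as a regression model, minimizing the MSE between the predicted embedding and the actual next sentence embedding; random tensors stand in for real SONAR-encoded sentences here.

# Minimal next-concept training loop (illustrative data, not real sentences).
model = BaseLCM(input_dim=256, hidden_dim=512, num_heads=8,
                num_layers=6, ff_dim=2048, output_dim=256)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

concepts = torch.randn(4, 10, 256)                   # 4 "documents", 10 sentences each
inputs, targets = concepts[:, :-1], concepts[:, 1:]  # predict sentence t+1 from 1..t

for step in range(3):
    optimizer.zero_grad()
    predictions = model(inputs)                      # shape (4, 9, 256)
    loss = F.mse_loss(predictions, targets)
    loss.backward()
    optimizer.step()
    print(f"step {step}: mse = {loss.item():.4f}")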
Guys, he made more examples with different approaches. I don't want to take all his code and explain it here; instead, I encourage you to visit his channel and check it out. We're here to help each other and spread useful content, so feel free to explore it. I'm just here to explain the concept of LCM, share my learning journey, and show you what I've found.
Check out the code: [link]
A bit of technical terminology
Finally, here is a summary of the terminology used in this article:
Large Concept Model (LCM): an AI that understands text as “concepts”
Concept: a high-level semantic unit that is independent of language and modality
SONAR: the sentence-embedding tool LCM uses, covering about 200 languages
Base-LCM: the basic model that directly predicts the next “concept”
Diffusion-based LCM: a variant that generates natural “concepts” by removing noise
Quantized LCM: a variant that groups “concepts” into discrete units for efficient handling
LPCM: a variant that adds planning capabilities to LCM
Remembering these terms will help you understand LCM even better!
Conclusion:
Meta’s LCM paper has undoubtedly opened a new door for the field of NLP. It indicates that we are moving from “understanding words” to a higher level of “understanding meaning”.
Although it is still in the early stages of research, the potential of LCM is huge. It is expected to completely change the way we interact with machines and bring unlimited possibilities for the future development of artificial intelligence.
Let us wait and see how this “concept-driven” NLP revolution will reshape our world!
🧙♂️ I am a Generative AI expert! If you want to collaborate on a project, drop an inquiry here or Book a 1-on-1 Consulting Call With Me.




