Interactive Visual Tutorial for the Transformer Model

1 School of Data Science, Fudan University
2 School of Big Data & Software Engineering, Chongqing University
3 Centre for Interdisciplinary Methodologies, University of Warwick
4 Shanghai Key Laboratory of Data Science

We present TransforLearn, the first interactive visual tutorial designed to help deep learning beginners and non-experts learn about Transformers comprehensively. TransforLearn supports both architecture-driven and task-driven exploration, providing insight into the model at different levels of detail and into how it works. It provides interactive views of each layer's operations and mathematical formulas, helping users understand the data flow of long text sequences. By altering the results of the decoder's current recursive prediction and relating them to abstractions of the downstream task, users can explore the model's processes in depth.

What is Transformer?

First proposed by Google in 2017, the Transformer has become a prominent and widely used model in natural language processing (NLP) thanks to its exceptional performance. Transformers are the core building blocks of the most popular large language models.

Structurally, the Transformer follows an encoder-decoder design. The encoder block features a multi-head self-attention mechanism and a position-wise feed-forward network, while the decoder block additionally includes a multi-head cross-attention mechanism that attends to the encoder output. These elements are linked by a combination of two operations: residual connection and layer normalization.
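
To make the building blocks above concrete, the following is a minimal sketch of one attention sub-layer followed by a residual connection and layer normalization. It is an illustration only, not TransforLearn's implementation. It assumes a single head, no learned projection matrices (queries, keys, and values are all the input itself), and layer normalization without learned scale and shift. The names self_attention, layer_norm, and add_and_norm are hypothetical.

```python
import math

def softmax(row):
    # Subtract the maximum for numerical stability before exponentiating.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x):
    """Single-head scaled dot-product self-attention.

    Assumption: Q = K = V = x (no learned projections), for illustration.
    Returns the attended vectors and the attention weight matrix.
    """
    d = len(x[0])
    outputs, weights = [], []
    for q in x:
        # Score each key against the query, scaled by sqrt(d).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in x]
        w = softmax(scores)
        weights.append(w)
        # Weighted sum of the value vectors.
        outputs.append([sum(w[j] * x[j][i] for j in range(len(x))) for i in range(d)])
    return outputs, weights

def layer_norm(vec, eps=1e-5):
    # Normalize one token vector to zero mean and (near) unit variance.
    mean = sum(vec) / len(vec)
    var = sum((v - mean) ** 2 for v in vec) / len(vec)
    return [(v - mean) / math.sqrt(var + eps) for v in vec]

def add_and_norm(x, sublayer_out):
    # Residual connection (x + sublayer output) followed by layer normalization.
    return [layer_norm([a + b for a, b in zip(xr, sr)])
            for xr, sr in zip(x, sublayer_out)]

# Three toy token vectors of dimension 4 (made-up values).
tokens = [[1.0, 0.0, 1.0, 0.0], [0.0, 2.0, 0.0, 2.0], [1.0, 1.0, 1.0, 1.0]]
attended, weights = self_attention(tokens)
encoded = add_and_norm(tokens, attended)
```

A multi-head layer would run several such attention computations in parallel on different learned projections of the input and concatenate the results. Cross-attention in the decoder has the same form, except that the keys and values come from the encoder output.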

In machine translation tasks, the input text is converted into numerical representations by combining word embeddings with positional encodings.
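
As a rough sketch of this input step, the snippet below adds a positional encoding to each word embedding. It assumes the sinusoidal encoding from the original Transformer paper; the text above does not say which encoding TransforLearn visualizes. The vocabulary, the embedding values, and the function names are hypothetical, and the embedding scaling used in the original model is omitted for brevity.

```python
import math

def positional_encoding(position, d_model):
    """Sinusoidal encoding: sine on even dimensions, cosine on odd ones."""
    pe = []
    for i in range(d_model):
        # Dimensions 2k and 2k+1 share the same frequency.
        angle = position / (10000 ** ((2 * (i // 2)) / d_model))
        pe.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return pe

# Hypothetical toy vocabulary and embedding table (made-up values).
vocab = {"i": 0, "like": 1, "cats": 2}
embedding_table = [
    [0.1, 0.2, 0.3, 0.4],
    [0.5, 0.6, 0.7, 0.8],
    [0.9, 1.0, 1.1, 1.2],
]

def encode_input(words):
    """Look up each word's embedding and add the encoding of its position."""
    d_model = len(embedding_table[0])
    rows = []
    for pos, word in enumerate(words):
        emb = embedding_table[vocab[word]]
        pe = positional_encoding(pos, d_model)
        rows.append([e + p for e, p in zip(emb, pe)])
    return rows

inputs = encode_input(["i", "like", "cats"])
```

Because the encoding depends only on the position, the same word placed at two different positions yields two different input vectors, which is how the model can distinguish word order.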

What is TransforLearn?

We propose TransforLearn for beginners as a tutorial tool for Transformers. TransforLearn uses a visual approach to provide learners with a better learning experience through interactive exploration.

Our major contributions are:

  • TransforLearn, the first interactive visual explanation system as a tutorial for Transformers.
    TransforLearn provides a hierarchical overview of the model architecture. It combines interactive displays of data flow transformations and mathematical formulas. TransforLearn aids users in gaining a comprehensive understanding of the model architecture and its intricate execution processes.
  • Novel interactive exploration approaches to facilitate learning about Transformer models.
    We support architecture-driven exploration guided by the model structure and task-driven exploration based on the iterations of a downstream task. These distinct interactive modes are designed to help beginners comprehend the intricacies of the Transformer.
  • Evaluating the effectiveness of our work.
    A user study confirms that TransforLearn provides users with an immersive learning experience. After using the system, users generally exhibited better performance when completing Transformer-related tasks.


TransforLearn is built on the foundational Transformer model. We visualize the forward propagation process of the trained model, which converts an input text into its translation. To help beginners overcome the hurdle of aligning model input and output with task requirements, we propose two exploration modes: architecture-driven exploration and task-driven exploration.

Architecture-driven Exploration

In the architecture-driven exploration, the system provides an overview of the Transformer architecture and presents the detailed modules. The overview and the detailed modules are related hierarchically: blue tabbed views are the topmost structures of the Transformer, orange tabbed views are the first-level unfolded state, and green tabbed views show the second-level detailed operations. Users can drill down from the architecture overview to the detailed module views through specific interactions.

Task-driven Exploration

In the task-driven exploration, users gain a deeper understanding of the data flow transformation and model structure with the help of an actual downstream task (machine translation in this system). By changing the decoding time step, users can observe how the data flow within each module and the final output change.
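
The idea of a decoding time step can be sketched with a simple autoregressive loop: at each step the decoder sees the tokens produced so far and predicts the next one. This is a minimal illustration that assumes greedy decoding, which the text above does not specify. The function toy_scores is a hypothetical stand-in for a trained decoder, and the target sentence is made up.

```python
def greedy_decode(next_token_scores, bos="<bos>", eos="<eos>", max_steps=10):
    """Generate tokens one step at a time, always taking the best-scoring token.

    Returns the generated tokens and a trace of (step, prefix, chosen token),
    which is the kind of per-step state a user could inspect.
    """
    output = [bos]
    trace = []
    for step in range(max_steps):
        scores = next_token_scores(output)      # scores given the current prefix
        token = max(scores, key=scores.get)     # greedy choice
        trace.append((step, list(output), token))
        output.append(token)
        if token == eos:
            break
    return output[1:], trace

# Hypothetical stand-in for a trained decoder: it always prefers the next
# token of a fixed, made-up target sentence.
TARGET = ["ich", "mag", "katzen", "<eos>"]

def toy_scores(prefix):
    step = len(prefix) - 1
    wanted = TARGET[min(step, len(TARGET) - 1)]
    return {tok: (1.0 if tok == wanted else 0.0) for tok in TARGET}

translation, trace = greedy_decode(toy_scores)
for step, prefix, token in trace:
    print(step, prefix, "->", token)
```

Each trace entry corresponds to one decoding time step. Moving between steps changes the prefix fed to the decoder, and with it the intermediate data flow and the predicted token.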


This work is supported by the National Natural Science Foundation of China (NSFC No. 62202105), the Shanghai Municipal Science and Technology Major Project (2021SHZDZX0103), the General Program (No. 21ZR1403300), the Sailing Program (No. 21YF1402900), and ZJLab.

Visit our website: FDU-VIS