Abstract: Proteins are the basic building blocks of life. Studying the mechanisms of protein expression is essential for understanding the principles of cellular organization and for developing biotechnology. Protein expression, which involves transcription, translation, folding, and post-translational modification, is an intricately regulated process affected by various cellular components and by sequence features of the expressed protein. Building protein expression models from expression data is therefore important for probing the regulatory factors and mechanisms of protein expression. Here we review recent progress in mechanistic models that quantitatively simulate the protein expression process and in artificial intelligence (AI)-based prediction algorithms that analyze its regulatory factors. Chemical reaction network models have been developed to describe mathematically the elementary processes in protein expression and to simulate the influence of cellular components such as RNA polymerase and tRNA. However, experimentally determining the large number of model parameters remains a major challenge. Data-driven AI models mainly aim to study how the protein and DNA sequences of a target protein affect its expression, and then to optimize those sequences to improve expression. Methods that combine mechanistic and AI models have the potential to deepen our understanding of protein expression, providing theoretical and technical support for the efficient production of high-value proteins and for the coordinated regulation of different proteins.
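To make the idea of a chemical reaction network model concrete, the following is a minimal sketch of a two-stage expression model (constitutive transcription followed by translation, each with first-order decay). It is an illustration only, not a model taken from the reviewed literature: the function names, the rate constants `k_tx`, `k_tl`, `d_m`, `d_p`, and their numerical values are all assumptions, and the effects of components such as RNA polymerase and tRNA are lumped into the effective rates.

```python
# Illustrative two-stage expression model; all names and rate values are hypothetical.
#   dm/dt = k_tx - d_m * m        (transcription, mRNA decay)
#   dp/dt = k_tl * m - d_p * p    (translation, protein decay or dilution)

def simulate(k_tx, k_tl, d_m, d_p, t_end, dt=0.01):
    """Integrate the two ODEs with forward Euler, starting from m = p = 0."""
    m, p = 0.0, 0.0
    for _ in range(int(round(t_end / dt))):
        dm = k_tx - d_m * m
        dp = k_tl * m - d_p * p
        m += dt * dm
        p += dt * dp
    return m, p


def steady_state(k_tx, k_tl, d_m, d_p):
    """Closed-form steady state obtained by setting both derivatives to zero."""
    m_ss = k_tx / d_m
    p_ss = k_tl * m_ss / d_p
    return m_ss, p_ss


if __name__ == "__main__":
    # Assumed parameter values, chosen only so the example runs quickly.
    params = dict(k_tx=2.0, k_tl=5.0, d_m=0.2, d_p=0.05)
    print("simulated:", simulate(t_end=400.0, **params))
    print("analytic: ", steady_state(**params))
```

Even this toy model has four rate constants for a single gene; resolving transcription and translation into elementary steps for many genes multiplies the parameter count, which is the parameter-determination challenge noted above.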