Introduction

The SARS-CoV-2 spike (S) glycoprotein is responsible for viral fusion with the host cell, initiating an infection that leads to COVID-19(Walls et al., 2020; Wrapp et al., 2020). S is a homotrimer with a structure subdivided into two topological domains, namely S1 and S2, see Figure 1a, separated by a furin site, which is cleaved in the pre-fusion architecture(Walls et al., 2020; Wrapp et al., 2020). In the Wuhan-Hu-1 strain (WHu-1), and still in most variants of concern (VoCs), host cell fusion is predominantly triggered by S binding to the Angiotensin-Converting Enzyme 2 (ACE2) receptor located on the host cell surface(Jackson et al., 2022; Wrapp et al., 2020). This process is supported by glycan co-receptors, such as heparan sulfate (HS) in the extracellular matrix(Clausen et al., 2020; Kearns et al., 2022) and by monosialylated ganglioside oligosaccharides (GM1os and GM2os) peeking from the surface of the host cells(Nguyen et al., 2021). The interaction with ACE2 requires a dramatic conformational change of the S, known as ‘opening’, where one or more Receptor Binding Domains (RBDs) in the S1 subdomain become exposed. The region of the RBD in direct contact with the ACE2 surface is known as the receptor binding motif (RBM)(Jackson et al., 2022; Lan et al., 2020; Yi et al., 2020). Ultimately, S binding to ACE2 causes shedding of the S1 subdomain and the transition to a post-fusion conformation, which exposes the fusion peptide near the host cell surface, leading to viral entry(Dodero-Rojas et al., 2021; Jackson et al., 2022).

Figure 1. Panel a) Atomistic model of the SARS-CoV-2 (WHu-1) S glycoprotein trimer embedded in a lipid bilayer as reported in ref(Casalino et al., 2020). In the conformation shown, the S bears the RBD of chain A in an open conformation, highlighted with a solvent accessible surface rendering. The topological S1 and S2 subdomains are indicated on the left-hand side. Glycans are represented with sticks in white, the protein is represented with cartoon rendering with different shades of cyan to highlight the chains. Panel b) Close-up of the open RBD (WHu-1) in an ACE2-bound conformation (PDB 6M0J), with regions colour-coded as described in the legend. Key residues for anchoring the FA2G2 (GlyTouCan-ID G00998NI) N343 glycan across the beta sheet core, namely S371, S373 and S375, are also highlighted in the Symbol Nomenclature for Glycans (SNFG) diagram on the bottom-right, with links to the monosaccharides corresponding to primary contacts. Key residues of the hydrophobic patch (orange) found to be inverted in the recently isolated FLip XBB.1.5 variant are also indicated. Panel c) Heat map indicating the interaction frequency (%), classified in terms of hydrogen bonding and van der Waals contacts, between the N343 glycan and the RBD residues 365 to 375 for each VoC, over the cumulative conventional MD (cMD) and enhanced GaMD sampling. Panel d) Side view of the RBD with the antigenic Region 1 (green), Region 2 (or RBM in yellow), and Region 3 (orange) highlighted. Key residues Y351 and L452 at the intersection between Region 1 and the Receptor Binding Motif (RBM) are indicated, together with the predicted site for the GM1 co-receptor binding. Rendering with VMD (https://www.ks.uiuc.edu/Research/vmd/).

To exert its functions, S sticks out from the viral envelope where it is exposed to recognition. To evade the host immune system, enveloped viruses hijack the host cell’s glycosylation machinery to cover S with a dense coat of host carbohydrates, known as a glycan shield(Casalino et al., 2020; Chawla et al., 2022; Grant et al., 2020; Turoñová et al., 2020; Watanabe et al., 2020b, 2020a, 2019). In SARS-CoV-2 the glycan shield effectively screens over 60% of the S protein surface(Casalino et al., 2020), leaving the RBD, when open, and regions of the N-terminal domain (NTD) vulnerable to immune recognition(Bangaru et al., 2022; Carabelli et al., 2023; Chawla et al., 2022; Chen et al., 2022; Harvey et al., 2021; Piccoli et al., 2020). The RBD is targeted by approximately 90% of serum neutralising antibodies(Piccoli et al., 2020) and is thus a highly effective model not only to screen antibody specificity(Du et al., 2020; Lan et al., 2020; Lin et al., 2022) and interactions with host cell co-receptors(Clausen et al., 2020; Mycroft-West et al., 2020; Nguyen et al., 2021), but also as a protein scaffold for COVID-19 vaccines(Dickey et al., 2022; Kleanthous et al., 2021; Montgomerie et al., 2023; Ochoa-Azze et al., 2022; Tai et al., 2020; Valdes-Balbin et al., 2021; Yang et al., 2022).

As a direct consequence the RBD is under great evolutionary pressure. Mutations of the RBD leading to immune escape are particularly concerning(Cao et al., 2023; Starr et al., 2021), especially when such changes enhance the binding affinity for ACE2 or give access to alternative entry routes(Baggen et al., 2023; Cervantes et al., 2023). The identification of mutational hotspots(Cao et al., 2023) and the effects of mutations in and around the RBM have been and are under a great deal of scrutiny(Barton et al., 2021; Bloom et al., 2023; Cao et al., 2023; Dadonaite et al., 2022; Greaney et al., 2021; Starr et al., 2022a, 2022b, 2021). Yet, less attention is devoted to mutations in the glycan shield, which have been shown to lead to dramatic changes in infectivity(Harbison et al., 2022; Kang et al., 2021; Zhang et al., 2022) and in immune escape(Newby et al., 2022; Pegg et al., 2023). Successful changes in the glycan shield are evolutionarily difficult to achieve, since the nature and pattern of glycosylation of the S is crucial not only to the efficiency of viral entry and evasion, but also to facilitate folding and to preserve the structural integrity of the functional fold. Therefore, identifying potential evolutionary hotspots in the S shield is a complex matter, yet of crucial importance to immune surveillance.

Potential changes of the shield, e.g. loss, shift or gain of new glycosylation sites, can likely occur only where these do not negatively impact the integrity of the underlying, functional protein architecture. In this work we present and discuss the case of N343, a key glycosylation site on the RBD. Results of extensive sampling from molecular dynamics (MD) simulations exceeding 45 μs, show how the loss of N-glycosylation at N343 affects the structure, dynamics and co-receptor binding of the RBD and how these effects are modulated by mutations in the underlying protein, going from the WHu-1 strain through the VoCs designated as alpha (B.1.1.7), beta (B.1.351), delta (B.1.617.2) and omicron (BA.1). In addition, we provide important insight into the structure and dynamics of the omicron BA.2.86 RBD. This variant, designated as variant under monitoring (VUM) and commonly referred to as ‘pirola’, carries a newly gained N-glycosylation site at N354, which represents the first change in the RBD shielding since the ancestral strain.

The SARS-CoV-2 S RBD (aa 327-540) from the WHu-1 (China, 2019) to the EG.5.1 (China, 2023) shows two highly conserved N-glycosylation sites(Harbison et al., 2022), one at N331 and the other at N343(Watanabe et al., 2020a). While glycosylation at N331 is located on a highly flexible region linking the RBD to the NTD, the N343 glycan covers a large portion of the RBD(Casalino et al., 2020; Harbison et al., 2022), stretching across the protein surface and forming a bridge connecting the two helical regions that frame the beta sheet core, see Figure 1b. In this work we show that removal of the N343 glycan induces a conformational change which in WHu-1, alpha and beta allosterically controls the structure and dynamics of the RBM, see Figure 1c. In delta and omicron these effects are significantly dampened by mutations that strengthen the RBD architecture. Further to this molecular insight, we show that enzymatic removal of the N343 glycan affects binding of monosialylated ganglioside co-receptors(Nguyen et al., 2021) in the WHu-1 RBD, but not in delta. We also observe that the affinity of the RBD for GM1os and GM2os changes significantly across the VoCs, with beta and omicron exhibiting the weakest binding.

Ultimately, the molecular insight we provide in this work adds to the ever-growing evidence supporting the role of glycosylation in protein folding and structural stability. This information is not only central to structural biology, but also critical to the design of novel COVID-19 vaccines that may or may not carry glycans(Huang et al., 2022), as well as instrumental to our understanding of the evolutionary mechanisms regulating the shield.

Results

In this section we start with a brief overview of the architecture of the RBD, then explain how the RBD structure is modulated by interactions with ACE2 and why the N343 glycan is integral to its stability. We then describe how and why the loss of N343 glycosylation affects the RBD structure and its binding affinity for GM1os and GM2os to different degrees in the VoCs.

SARS-CoV-2 S RBD structure and antigenicity

The SARS-CoV-2 S RBD encompasses both structured and intrinsically disordered regions. The structured region is supported by a largely hydrophobic beta sheet core, framed by two flanking, partially helical loops (aa 335-345 and aa 365-375), linked by a bridging N-glycan at N343, see Figure 1b. The aa 335-345 loop carries the N343 glycosylation site and is part of an important antigenic region targeted by Class 2 and 3 antibodies(Bangaru et al., 2022; Barnes et al., 2020; Carabelli et al., 2023; Chen et al., 2022). In the bridging conformation, the N343 glycan pentasaccharide extends across the RBD beta sheet to reach the aa 365-375 loop, forming highly populated hydrogen bonding and dispersion interactions with the backbone and with the sidechains of residues 365 to 375, see Figures 1b,c and S.2. The bridging N343 glycan shields the hydrophobic beta sheet core of the RBD from the surrounding water, preventing energetically unfavourable contacts. Due to its amphipathic nature, the N343 glycan forms dispersion interactions with the hydrophobic residues of the beta sheet through its core GlcNAc-β(1-4)GlcNAc, while engaging in hydrogen bonds with the surrounding water and with the aa 365-375 helical loop. Notably, the key anchoring residues S371, S373 and S375 within this loop are all mutated to hydrophobic residues in all omicron variants (BA.1-2, BA.4-5, BQ.1.1, EG.5.1, XBB.1.5).
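As an illustration of how an interaction frequency such as the one reported in Figure 1c can be obtained from a trajectory, the following is a minimal sketch on synthetic coordinates. It counts a frame as "in contact" when any glycan atom lies within a distance cutoff of any atom of the residue. The function name, the 4.0 Å cutoff and the toy data are assumptions made for this example only; the actual criteria used in this work to classify hydrogen bonds and van der Waals contacts are not reproduced here.

```python
import numpy as np


def contact_frequency(glycan_xyz, residue_xyz, cutoff=4.0):
    """Percentage of frames with at least one glycan-residue atom pair within `cutoff` (in Å).

    glycan_xyz:  array of shape (n_frames, n_glycan_atoms, 3)
    residue_xyz: array of shape (n_frames, n_residue_atoms, 3)
    The cutoff value is an illustrative assumption, not a value taken from the paper.
    """
    # Pairwise distances per frame, shape (n_frames, n_glycan_atoms, n_residue_atoms)
    diff = glycan_xyz[:, :, None, :] - residue_xyz[:, None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    # A frame is in contact if its shortest pair distance is below the cutoff
    in_contact = dist.min(axis=(1, 2)) < cutoff
    return 100.0 * in_contact.mean()


# Synthetic 4-frame example: two glycan atoms at the origin, one residue atom
# placed 3.0, 3.5, 6.0 and 10.0 Å away along x in successive frames.
glycan = np.zeros((4, 2, 3))
residue = np.zeros((4, 1, 3))
residue[:, 0, 0] = [3.0, 3.5, 6.0, 10.0]

freq = contact_frequency(glycan, residue)
print(f"contact frequency: {freq:.1f}%")  # two of the four frames are within 4.0 Å
```

Repeating the calculation for each residue between 365 and 375 and for each variant would give one row of a heat map per system.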

The receptor binding motif (RBM) encompasses aa 439-506 and counts all the RBD residues in direct contact with ACE2(Lan et al., 2020). The RBM is heavily targeted by both Class 1 and 2 antibodies(Bangaru et al., 2022; Barnes et al., 2020; Carabelli et al., 2023; Chen et al., 2022) and is under high evolutionary pressure, with all VoCs carrying mutations in this region. As shown by earlier MD simulation studies(Casalino et al., 2020; Harbison et al., 2022; Sztain et al., 2021; Williams et al., 2022), the RBM in unbound S is largely unstructured and dynamic, an insight also supported by the low resolution cryo-EM maps of this region(Gobeil et al., 2022; Walls et al., 2020; Wrapp et al., 2020). The RBM’s inherent flexibility is likely an important feature in the opening and closing mechanism of the RBD, where the N343 glycans from adjacent RBDs engage with the protein in the closed conformation and gate RBD opening(Sztain et al., 2021). The only relatively structured region of the RBM is what we define from here on as the hydrophilic patch, see Figure 1b, a hairpin stabilised by a network of interlocking salt-bridges and polar residues, namely R454, R457, K458, K462, E465, D467, S469, and E471, that faces the interior of the S when the RBD is closed.

When in complex with ACE2 or with antibodies, the RBM adopts a structured fold, also shared by the SARS-CoV-1 S RBD(Li et al., 2005). In this conformation only the terminal hairpin of the RBM (aa 476 to 486) retains a high degree of flexibility, as shown in this work and by others(Williams et al., 2022). The RBM bound fold is stabilised by a hydrophobic patch supported by the stacking of the aromatic and aliphatic residues L455, F456, Y473, A475, see Figure 1b, which are part of the protein interface with ACE2. Notably, all residues in the hydrophobic and hydrophilic patches are highly conserved across the VoCs, possibly due to their critical function in inducing and/or stabilising the RBD into its ACE2-bound conformation. As an interesting observation, the loss of stacking in the hydrophobic patch due to the recent F456L mutation in the EG.5.1 variant (China, 2023) is recovered by the L455F mutation in the, appropriately named, FLip variant.

Based on evidence from screenings(Bangaru et al., 2022; Carabelli et al., 2023; Chen et al., 2022), we subdivided the RBD into three different antigenic regions known to be targeted by different classes of antibodies, see Figure 1d. Region 1 stretches from aa 337-353, which includes the N343 glycosylation site, and counts residues targeted by class 2 and 3 antibodies(Bangaru et al., 2022; Carabelli et al., 2023; Harvey et al., 2021). The aa sequence in Region 1 has been highly conserved so far, allowing specific antibodies to retain their neutralisation activity across all VoCs, such as S309(Piccoli et al., 2020; Pinto et al., 2020), whose binding mode also directly involves the N343 glycan(Liu et al., 2021). A notable and dramatic exception to this high degree of conservation in Region 1 is given by the BA.2.86 variant (Denmark, 2023), known as ‘pirola’, where the K356T mutation introduces a new N-glycosylation sequon at N354. Region 2 coincides with the RBM, which, in addition to binding ACE2 and neutralising antibodies, used to bind the N370 glycan from adjacent RBDs(Allen et al., 2023; Harbison et al., 2022; Watanabe et al., 2020b). N370 glycosylation is lost in SARS-CoV-2 with the RBM binding cleft available to bind glycan co-receptors, such as glycosaminoglycans(Clausen et al., 2020; Kearns et al., 2022), blood group antigens(Nguyen et al., 2021; Wu et al., 2023), monosialylated gangliosides(Nguyen et al., 2021), among others. Region 3 is a short, relatively structured loop stretching between aa 411-426, located on the opposite side of the RBD relative to Region 1, see Figure 1d.
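The gain of a sequon through K356T can be illustrated with a short sketch that scans a sequence for the canonical N-glycosylation motif N-X-S/T (X ≠ P). The fragment below is assumed to correspond to aa 331-359 of the WHu-1 S sequence and should be checked against a reference sequence before any real use; the function names are hypothetical. Only the K356T substitution is applied, and other BA.2.86 mutations in this stretch are deliberately ignored.

```python
import re

# Assumed WHu-1 S fragment covering aa 331-359 (verify against a reference sequence).
RBD_FRAGMENT = "NITNLCPFGEVFNATRFASVYAWNRKRIS"
FRAGMENT_START = 331

# N-X-S/T with X != P; the lookahead keeps overlapping motifs detectable.
SEQUON = re.compile(r"N(?=[^P][ST])")


def sequon_sites(seq, start=FRAGMENT_START):
    """Return the residue numbers of Asn residues that sit in an N-X-S/T sequon."""
    return [m.start() + start for m in SEQUON.finditer(seq)]


def apply_mutation(seq, mutation, start=FRAGMENT_START):
    """Apply a point mutation written as <wild type><position><new>, e.g. 'K356T'."""
    wild_type, position, new = mutation[0], int(mutation[1:-1]), mutation[-1]
    index = position - start
    if seq[index] != wild_type:
        raise ValueError(f"expected {wild_type} at {position}, found {seq[index]}")
    return seq[:index] + new + seq[index + 1:]


wt_sites = sequon_sites(RBD_FRAGMENT)
ba286_sites = sequon_sites(apply_mutation(RBD_FRAGMENT, "K356T"))
print("WHu-1 fragment:", wt_sites)
print("with K356T:    ", ba286_sites)
```

A sequon only marks a potential site; whether it is occupied, and by which glycoforms, has to be established experimentally.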

Effect of the loss of N343 glycosylation on the structure of the WHu-1, alpha and beta RBDs

Results obtained for the WHu-1 strain and for the alpha (B.1.1.7) and beta (B.1.351) VoCs are discussed together due to their sequence and structure similarity, with alpha counting only one mutation (N501Y) and beta three mutations (K417N, E484K and N501Y) relative to the WHu-1 RBD. Extensive sampling through conventional MD, i.e. 4 μs for the alpha and beta VoC and 8 μs with an additional 4 μs of Gaussian accelerated MD (GaMD) for the WHu-1 RBD, see Table S.1, shows that the loss of N343 glycosylation induces a dramatic conformational change in the RBD, where one or both helical loops flanking the hydrophobic beta sheet core pull towards each other, see Figure 2. This conformational change can occur very rapidly upon removal of the N-glycan or after a longer delay due to the complexity of the conformational energy landscape. The data used for the analysis corresponds to systems that have reached structural stability, i.e. equilibrium; we discarded the timeframes corresponding to conformational transitions.

Figure 2. Panel a) Kernel Density Estimates (KDE) plot of the backbone RMSD values calculated relative to frame 1 (t = 0) of the trajectory for Region 1 (green) aa 337-353, Region 2 (yellow) aa 439-506, and Region 3 (orange) aa 411-426 of the glycosylated (left plot) and non-glycosylated (right plot) WHu-1 RBDs. Duration of the MD sampling is indicated on the top-right corner of each plot with the conformational equilibration time subtracted as the corresponding data were not included in the analysis. Representative structures from the MD trajectories of the WHu-1 RBD glycosylated (cyan) and non-glycosylated (blue) at N343 are shown on the right-hand side of the panel. The N343 glycan (GlyTouCan-ID G00998NI) is rendered with sticks in white, the hydrophobic residues underneath the N343 glycan are highlighted with VDW spheres, while the protein structure is represented with cartoons. Panel b) KDE plot of the backbone RMSD values (see details in panel a) above) calculated for the alpha (B.1.1.7) RBD glycosylated (left) and non-glycosylated (right) at N343. Representative structures from the MD simulation of the alpha RBDs are shown on the right-hand side of the panel, with the N343 glycosylated RBD shown with pink cartoons and the non-glycosylated alpha RBD in purple cartoons. Panel c) KDE plots of the backbone RMSD values calculated for the beta (B.1.351) RBD glycosylated (left) and non-glycosylated (right) at N343. Representative structures from the MD simulation of the beta RBDs are shown on the right-hand side of the panel, with the N343 glycosylated RBD shown with orange cartoons and the non-glycosylated beta RBD in red cartoons. Panel d) Binding affinities (1/Kd, ×10³ M⁻¹) for interactions between different RBDs (including intact and endoF3-treated WHu-1 RBD and alpha and beta RBD) and the GM1os (GlyTouCan-ID G46613JI) and GM2os (GlyTouCan-ID G61168WC) oligosaccharides. HEK293a samples are from ref(Nguyen et al., 2021) and are shown here as reference. HEK293b samples all carry FLAG and His tags and are shown for WHu-1 (glycosylated and treated with endoF3), alpha and beta sequences. Further details in Supplementary Material. Panel e) Predicted complex between the WHu-1 RBD and GM1os, with GM1os represented with sticks in SNFG colours, the protein represented with cartoons (cyan) and the N343 glycan with sticks (white). Residues directly involved in the GM1os binding or proximal are labelled and highlighted with sticks. All N343 glycosylated RBDs also carry a FA2G2 N-glycan (GlyTouCan-ID G00998NI) at N331, which is not shown for clarity. Rendering done with VMD (https://www.ks.uiuc.edu/Research/vmd/), KDE analysis with seaborn (https://seaborn.pydata.org/) and bar plot with MS Excel.

To explore the effects of the loss of N343 glycosylation in the WHu-1 RBD, we started the MD simulations from different conformations. In one set of conventional sampling MD trajectories (MD1) and in the GaMD simulations the starting structure corresponds to an open RBD from an MD equilibrated S ectodomain obtained in earlier work(Harbison et al., 2022). In this system the RBM is unfolded and retains the maximum degree of flexibility. MD2 was started from a conformation corresponding to the ACE2-bound structure(Lan et al., 2020). Results obtained from MD2 and GaMD are entirely consistent with results from MD1, and thus are included as Supplementary Material in Figure S.2. The GaMD simulation shows a lower degree of contact of the N343 glycan with the aa 365-375 stretch of the opposite loop, see Figure 1c, because most of the contacts are with residues further downstream from position 365. Nevertheless, the N343 glycan remains engaged in a bridging conformation throughout the simulation. As shown by the distributions of the RMSD values, represented through Kernel Density Estimates (KDE) in Figure 2, the structure of Region 1 in the WHu-1 RBD is stable. In the glycosylated RBD the stability of Region 1 is largely due to the contribution of the bridging N343 FA2G2 glycan, which forms hydrogen bonds with the residues in loop aa 365-375 throughout the simulations, see Figures 1c and S.2. Conversely, the conformation of the RBM (Region 2) is very flexible in both glycosylated and non-glycosylated forms. Loss of N343 glycosylation triggers a conformational change in Region 1 of the WHu-1 RBD, shown by a broader KDE peak in Figure 2a. This conformational change ultimately triggers the complete detachment of the hydrophilic loop from Region 1, see Figure 1c, through rupture of the non-covalent interaction network between Y351 (Region 1) and S469 or T470 (Region 2) via hydrogen bonding, and between Y351 and L452 (Region 2) via CH-π stacking. Structural changes in Region 3 upon loss of glycosylation at N343 appear to be negligible.
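The per-region RMSD analysis described above can be sketched as follows. This is a minimal example on a synthetic trajectory, not the analysis pipeline used in this work: it assumes one coordinate per residue as a stand-in for the backbone atoms, superimposes each region onto its own conformation in the first frame (Kabsch fit), and discards a user-chosen number of initial frames as equilibration. The function names and the choice of fitting each region on itself are assumptions for illustration.

```python
import numpy as np

# Antigenic regions as defined in the text (inclusive residue ranges).
REGIONS = {"Region 1": (337, 353), "Region 2": (439, 506), "Region 3": (411, 426)}


def kabsch_rmsd(mobile, reference):
    """RMSD between two (n_atoms, 3) coordinate sets after optimal superposition."""
    p = mobile - mobile.mean(axis=0)
    q = reference - reference.mean(axis=0)
    u, _, vt = np.linalg.svd(p.T @ q)
    # Guard against an improper rotation (reflection)
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = p @ rot.T - q
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


def region_rmsd(traj, resids, first, last, skip_frames=0):
    """RMSD of residues first..last relative to frame 0, skipping equilibration frames."""
    mask = (resids >= first) & (resids <= last)
    reference = traj[0, mask]
    return np.array([kabsch_rmsd(frame[mask], reference) for frame in traj[skip_frames:]])


# Synthetic trajectory: the whole RBD (aa 327-540) moves as a rigid body,
# while Region 2 additionally receives random displacements.
rng = np.random.default_rng(0)
resids = np.arange(327, 541)
start = rng.normal(scale=10.0, size=(resids.size, 3))
flexible = (resids >= 439) & (resids <= 506)

frames = [start]
for angle in (0.3, 0.6, 0.9, 1.2):
    c, s = np.cos(angle), np.sin(angle)
    rot_z = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    frame = start @ rot_z.T + rng.normal(size=3)
    frame[flexible] += rng.normal(scale=1.0, size=(int(flexible.sum()), 3))
    frames.append(frame)
traj = np.stack(frames)

rmsd = {name: region_rmsd(traj, resids, lo, hi, skip_frames=1)
        for name, (lo, hi) in REGIONS.items()}
for name, values in rmsd.items():
    print(name, np.round(values, 3))
```

The resulting arrays of RMSD values are the kind of input that could be passed to a density estimator such as `seaborn.kdeplot` to obtain plots analogous to those in Figure 2.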

The starting structure used for the simulations of the alpha RBD derives from the ACE2-bound conformation of the WHu-1 RBD (PDB 6M0J) modified with the N501Y mutation. The reconstructed glycan at N343 interacts with the aa 365-375 loop throughout the entire trajectory, but it adopts a stable conformation only after 830 ns, which is where we started collecting the data shown in Figure 2b. Again, we see that the loss of glycosylation at N343 causes a swift conformational change that brings the aa 335-345 and aa 365-375 loops closer together, see Figure 2b. This conformational change involves primarily Region 1 and, just like in the previous case, it ultimately determines the detachment of the hydrophilic patch from Y351 in Region 1. As also shown by the KDE plot in Figure 2b, a small conformational change in Region 3, which involves a partial disruption and refolding of a helical turn, can be observed during the trajectory of the N343 glycosylated alpha RBD. As in the previous case, the structure of Region 3 appears to be unaffected by N343 glycosylation, at least within the sampling accumulated in this work.

In the beta RBD (starting structure from PDB 7LYN) the reconstructed N343 glycan adopts a bridging conformation quite rapidly and retains this conformation throughout the trajectory with only minor deviations. The corresponding RMSD values KDE distributions for Regions 1 to 3, see Figure 2c, reflect this structural stability. The stability of the RBM (Region 2) is supported by interactions between Y351 (Region 1) and the hydrophilic loop, as noted earlier. Loss of glycosylation at N343 causes a rapid tightening of the RBD core helical loops towards each other, which again in this case ultimately causes the detachment of the hydrophilic loop from Y351 in Region 1 towards the end of the MD trajectory, i.e. after 1.9 μs of sampling.

Effect of the loss of N343 glycosylation on the binding affinity of GM1os/2os for the WHu-1, alpha and beta RBDs

In earlier work we presented a model of the complex between GM1 and the WHu-1 RBD(Garozzo et al., 2022) to understand the role of monosialylated gangliosides as co-receptors in SARS-CoV-2 infection(Nguyen et al., 2021). The predicted binding site we validated through extensive MD sampling is located at the junction between Region 1 and Region 2 of the WHu-1 RBD and it involves all the residues that stabilise the region, namely Y351, L452, S469 and T470, see Figures 1c and 2e. As part of our investigation of glycan co-receptors binding to the SARS-CoV-2 RBD, we used a direct ESI-MS assay to determine the impact of the loss of N343 glycosylation on GM1os and GM2os binding. Here, we used endoF3 treatment to trim down the fucosylated biantennary and triantennary complex N-glycans into core nonfucosylated or fucosylated GlcNAc (Gn or GnF, respectively). LC-MS analysis suggests that N-glycans on N343 but not N331 of WHu-1 RBD were trimmed down (Figure S.6). From the zero-charge mass spectra of endoF3-treated WT RBD (Figure S.5), we performed glycan assignment (Table S.8) and found that 31% of detected glycoforms contained Gn/GnF at N343 while the remainder was the intact form. Affinity data in Figure 2d show that the enzymatic removal of the N343 glycan from the WHu-1 RBD causes a complete loss of GM1os/2os binding, which is consistent with both the involvement of the junction between Regions 1 and 2 in the binding and its allosteric control of the RBM dynamics. Furthermore, while binding of GM1os and GM2os to the alpha RBD appears to be slightly decreased relative to WHu-1, binding to the beta RBD is dramatically reduced. We can reconcile this finding with the mutation E484K in beta, which changes the key interaction between E484 and GM1os, see Figure 2, and with changes in structure and dynamics of the RBM terminal hairpin induced by mutations(Williams et al., 2022), which have also been suggested to affect the S opening kinetics(Y. Wang et al., 2021).
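Since the affinities in Figure 2d are reported as 1/Kd in units of 10³ M⁻¹, the following minimal sketch shows the conversion from a dissociation constant to that scale. The Kd value used is purely hypothetical and is not a measurement from this work; the function names are likewise illustrative.

```python
def association_constant(kd_molar):
    """Return Ka = 1/Kd in M^-1 for a dissociation constant given in M."""
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return 1.0 / kd_molar


def in_figure_units(ka_per_molar):
    """Express an association constant in units of 10^3 M^-1, as used on the bar plots."""
    return ka_per_molar / 1e3


# Hypothetical example: a Kd of 2.5 mM corresponds to Ka = 400 M^-1,
# i.e. roughly 0.4 on a (1/Kd, x10^3 M^-1) axis.
kd_example = 2.5e-3
ka_example = association_constant(kd_example)
print(f"Ka = {ka_example:.0f} M^-1 = {in_figure_units(ka_example):.2f} x 10^3 M^-1")
```

On this scale a weaker binder (larger Kd) gives a shorter bar, so a loss of binding appears as a value approaching zero.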

Effects of N343 glycosylation on the structure of the delta RBD

The delta (B.1.617.2) RBD carries two mutations, namely L452R and T478K, relative to the WHu-1 strain. The open RBD in the cryo-EM structure PDB 7V7Q was used as starting conformation for the MD simulations of both the glycosylated and the non-glycosylated delta RBDs. To understand how the mutations in delta affect the RBD structure and modulate the response to the loss of glycosylation at N343, we ran two uncorrelated conventional MD simulations (2 μs) and one GaMD simulation (2 μs) for both the glycosylated and non-glycosylated systems, for a total (cumulative) sampling of 12 μs. Results are shown in Figure 3. In the glycosylated delta RBD the N343 glycan is observed to be much more dynamic than in the WHu-1, alpha and beta RBDs, engaging in contacts with different regions of the RBD in addition to the loop aa 365-375. In response to these fluctuations the conformation of the RBD remains stable with only minor deviations from the average structure of Regions 1 and 3. All trajectories, and in particular the results of the GaMD simulation in Figure 3c, show that the RBM (Region 2) of delta is highly dynamic. This flexibility appears to involve specifically the terminal hairpin of the RBM (aa 476 to 486), which includes the T478K mutation, while the rest of the RBM is tightly anchored due to the L452R mutation. More specifically, the R452 of delta can establish a new hydrogen bond with Y351, in addition to S469 and T470, reinforcing the junction between Regions 1 and 2. Furthermore, in the delta RBM the role of L452 in CH-π stacking to Y351 is taken by the proximal L492, through a twist of the beta sheet, see Figure 3e. These interactions also contribute to reinforcing the R454 orientation, tightening the link with the RBM hydrophilic patch.

Figure 3. Panel a) KDE plot of the backbone RMSD values calculated relative to frame 1 (t = 0) of the MD1 trajectory for Region 1 (green) aa 337-353, Region 2 (yellow) aa 439-506, and Region 3 (orange) aa 411-426 of the N343 glycosylated delta (B.1.617.2) RBD. The MD1 simulation was started from the open RBD conformation from the cryo-EM structure PDB 7V7Q. Based on the conformation of the N-glycan reconstructed at N343, the first 100 ns of the MD1 production trajectory were considered part of the conformational equilibration and not included in the data analysis. Panel b) KDE plot of the backbone RMSD values calculated relative to frame 1 (t = 0) of the MD2 trajectory for Regions 1-3 (see details above) of the N343 glycosylated delta (B.1.617.2) RBD. The MD2 simulation was started from the open RBD conformation from the cryo-EM structure PDB 7V7Q with different velocities relative to MD1. The first 350 ns of the MD2 production trajectory were considered part of the conformational equilibration and not included in the data analysis. Panel c) KDE plot of the backbone RMSD values calculated relative to frame 1 (t = 0) of the GaMD trajectory for Regions 1-3 of the N343 glycosylated delta (B.1.617.2) RBD. The first 400 ns of the GaMD production trajectory were considered part of the conformational equilibration and not included in the data analysis. Panel d) Graphical representation of the delta RBD with the protein structure (lime cartoon) from a representative snapshot from MD1. The N343 FA2G2 glycan (GlyTouCan-ID G00998NI) is represented in different colours, corresponding to the different MD trajectories, as described in the legend, with snapshots taken at intervals of 100 ns. Residues in the hydrophobic core of the delta RBD are represented with VDW spheres partially visible under the N-glycans overlay. Panel e) Insert showing the junction between Regions 1 and 2 from the left-hand side of the RBD in panel d). The residues involved in the network solidifying the junction are highlighted with sticks and labelled. Panel f) Affinities (1/Kd, ×10³ M⁻¹) for interactions between GM1os (GlyTouCan-ID G46613JI) and GM2os (GlyTouCan-ID G61168WC) oligosaccharides and the intact and endoF3-treated delta RBD and omicron RBD. Rendering done with VMD (https://www.ks.uiuc.edu/Research/vmd/), KDE analysis with seaborn (https://seaborn.pydata.org/) and bar plot with MS Excel.

The effect of the loss of glycosylation at N343 on the delta RBD was assessed by running two uncorrelated MD simulations, one by conventional sampling (MD1 of 3 μs) and the other through enhanced sampling (GaMD of 2 μs). As a consequence of the L452R mutation shown in Figure 3, the tightening of the helical loops aa 335-345 and aa 365-375 over the hydrophobic core of the RBD occurring upon loss of glycosylation at N343 does not affect the structure and dynamics of the junction between Regions 1 and 2, see Figure S.3. Results of the conventional MD simulation show that the tightening of the loops is mainly achieved by a larger displacement of the aa 365-375 loop rather than of Region 1, while the GaMD results show tightening of both loops, see Figure S.3. In all simulations the structure of the junction between Regions 1 and 2 remains undisturbed, with no detachment of the hydrophilic patch within the sampling we collected.

Effect of the loss of N343 glycosylation on the binding affinity of GM1os/2os for the delta RBD

To examine the effect of N343 glycosylation on glycan binding of the delta RBD, we used the direct ESI-MS assay to quantify the binding affinities between the endoF3-treated delta RBD and GM1os and GM2os. From the zero-charge mass spectra of the endoF3-treated RBD, see Figure S.5, we performed glycan assignment, see Table S.9, and found that both N331 and N343 glycans were trimmed down to Gn/GnF. Direct ESI-MS data in Figure 3f show no loss of GM1os/2os binding in the delta RBD upon loss of N343 glycosylation, which further supports the involvement of the junction between Regions 1 and 2 in the recognition of sialylated glycans.

Effects of N343 glycosylation on the RBD structure in the omicron BA.1 SARS-CoV-2

The omicron BA.1 RBD carries 15 mutations relative to the WHu-1 strain, namely S371L, S373P, S375F, K417N, N440K, G446S, S477N, T478K, E484A, Q493R, G496S, Q498R, N501Y, Y505H, and T547K. The S371L, S373P, and S375F mutations, retained in all omicron VoCs including the most recently circulating XBB.1.5, EG.5.1 and BA.2.86, remove all hydroxyl sidechains that we have seen being involved in hydrogen bonding interactions with the N343 glycan in the WHu-1, alpha, beta and delta RBDs, see Figures 1c and S.3. We investigated the effects of the loss of glycosylation at N343 on the structure of the BA.1 RBD through two sets of uncorrelated conventional MD simulations (MD1 and MD2) and one set of GaMD, with a total cumulative simulation time of 12 μs. Starting structures correspond to the open RBD in PDB 7QO7 (MD1) and in PDB 7WVN (MD2 and GaMD), where the N343 glycan was reconstructed in different conformations, depending on the spatial orientation of the N343 sidechain. The results of the MD1 and GaMD simulations show that, despite the S371L, S373P, and S375F mutations, the N343 glycan still forms stable contacts with the aa 365-375 loop, see Figures 1c and S.3, and these interactions contribute to the stability of the RBD structure, see Figure 4. In the starting structure we used for MD2 the N343 glycan was built with the core pentasaccharide pointing away from the RBD hydrophobic core. Consequently, the N-glycan adopts different transient conformations during the MD2 trajectory, which terminate with an interaction with the hydrophobic interior of the RBD and with the N331 glycan, see Figure S.6. In all simulations the loss of glycosylation at N343 causes a tightening of the aa 335-345 and aa 365-375 loops, which in omicron is stabilised by more efficient packing of the aa 365-375 loop within the hydrophobic core, driven by the embedding of the L371 and F375 sidechains. The non-glycosylated RBD adopts a stable conformation where we do not see a detachment of the hydrophilic patch. The stabilising effect of the aa 365-375 loop mutations in omicron could not be tested by means of affinity for GM1os/2os as omicron (BA.1) binds those epitopes only weakly, see Figure 3f. Based on the binding site we predicted by MD simulations in earlier work(Garozzo et al., 2022), see Figures 1d and 2e, and as observed for beta, the loss of E484 due to the E484A mutation in omicron may negate GM1os/2os binding.
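The statement that the aa 365-375 loop mutations remove all hydroxyl sidechains can be checked programmatically from the mutation list given above. The sketch below parses the mutation strings and flags substitutions that replace a hydroxyl-bearing residue (Ser, Thr or Tyr) with one lacking a hydroxyl group within a residue range. The helper names are hypothetical and the classification is a simple illustration, not part of the analysis performed in this work.

```python
# BA.1 mutations relative to WHu-1, as listed in the text.
BA1_MUTATIONS = [
    "S371L", "S373P", "S375F", "K417N", "N440K", "G446S", "S477N", "T478K",
    "E484A", "Q493R", "G496S", "Q498R", "N501Y", "Y505H", "T547K",
]

# One-letter codes of residues whose sidechain carries a hydroxyl group.
HYDROXYL = {"S", "T", "Y"}


def parse_mutation(mutation):
    """Split a string such as 'S371L' into (wild type, position, new residue)."""
    return mutation[0], int(mutation[1:-1]), mutation[-1]


def hydroxyl_lost(mutations, first, last):
    """Mutations within first..last that replace a hydroxyl sidechain with a non-hydroxyl one."""
    lost = []
    for mutation in mutations:
        wild_type, position, new = parse_mutation(mutation)
        if first <= position <= last and wild_type in HYDROXYL and new not in HYDROXYL:
            lost.append(mutation)
    return lost


loop_hydroxyl_lost = hydroxyl_lost(BA1_MUTATIONS, 365, 375)
e484_mutations = [m for m in BA1_MUTATIONS if parse_mutation(m)[1] == 484]
print("hydroxyl groups lost in aa 365-375:", loop_hydroxyl_lost)
print("mutations at position 484:", e484_mutations)
```

The same list can be filtered by any residue range, for example aa 439-506 to collect the mutations that fall within the RBM.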

Panel a) KDE plot of the backbone RMSD values calculated relative to frame 1 (t = 0) of the GaMD trajectory for Region 1 (green) aa 337-353, Region 2 (yellow) aa 439-506, and Region 3 (orange) aa 411-426 of the glycosylated omicron (BA.1) RBD. Panel b) KDE plot of the backbone RMSD values calculated relative to frame 1 (t = 0) of the GaMD trajectory (see details above) of the non-glycosylated omicron (BA.1) RBD. Panel c) Graphical representation of the glycosylated (protein in yellow cartoons and N343-FA2G2 in white sticks, N331 omitted for clarity) and non-glycosylated (protein in cyan cartoons) forms of the omicron (BA.1) RBD. Structures correspond to the last frame of the GaMD trajectories, see details in the legend. Panel d) KDE plot of the backbone RMSD values calculated relative to frame 1 (t = 0) of the MD trajectory of the omicron BA.2.86 RBD glycosylated with FA2G2 N-glycans at N343, N354 and N331 (not shown). Panel e) Graphical representation of the omicron BA.2.86 RBD (protein in violet cartoons and N-glycans in violet sticks) structurally aligned to the glycosylated omicron (BA.1) RBD (protein in yellow cartoons) for reference. The N343 and N354 glycans are intertwined throughout the trajectory. Panel f) Same graphical representation of the omicron BA.2.86 and BA.1 RBDs with the N-glycans not shown. The purple arrow points to the displacement of the loop in response to the presence of the N354 glycan in BA.2.86. Rendering with VMD (https://www.ks.uiuc.edu/Research/vmd/) and KDE analysis with seaborn (https://seaborn.pydata.org/).
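
For readers unfamiliar with the kernel density estimates (KDE) shown in the panels above, the following standard-library sketch illustrates what a one-dimensional Gaussian KDE of per-frame RMSD values computes. It is not the seaborn call used to produce the figure; the RMSD values are hypothetical, and the bandwidth choice (Scott's rule of thumb) is an assumption made for this example.

```python
import math
import statistics


def gaussian_kde(samples, bandwidth=None):
    """Return a function estimating the density of `samples` with Gaussian kernels."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    if bandwidth is None:
        # Assumed default: Scott's rule of thumb for one-dimensional data.
        bandwidth = statistics.stdev(samples) * n ** (-1 / 5)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # Sum of one Gaussian kernel centred on each sample.
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


# Hypothetical backbone RMSD values (in Å), one per trajectory frame.
rmsd_values = [0.8, 0.9, 1.0, 1.1, 1.0, 0.9, 1.2, 2.4, 2.5, 2.6]
kde = gaussian_kde(rmsd_values)

# Evaluate on a regular grid and integrate numerically; the area should be close to 1.
step = 0.01
grid = [i * step for i in range(-300, 701)]
area = sum(kde(x) for x in grid) * step
print(f"area under the KDE curve: {area:.4f}")
```

In such a plot, a single narrow peak indicates that a region stays close to its starting conformation, whereas additional peaks at larger RMSD values indicate that the region visits distinct conformations.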

The stability of the RBD structure is further enhanced by the presence of an additional glycosylation site at N354, which appeared in the recently detected omicron BA.2.86 ‘pirola’ variant. As shown in Figure 4d-f, the N-glycans at N343 and N354 are tightly intertwined throughout the trajectory stabilising Region 1, also shielding the area very effectively. The presence of an additional N-glycosylation site at N354 also changes the conformation of the loop that hosts the site relative to the BA.1 starting structure we used as a template to run the MD simulation, see Figure 4f. To note, based on earlier glycoproteomics analysis(Newby et al., 2022; Watanabe et al., 2020a) and on the exposure to the solvent of the reconstructed glycan structure at N354, we chose to occupy all glycosylation sites with FA2G2 N-glycans.

Discussion

Quantifying the role of glycosylation in protein folding and structural stability is a complex task due to the dynamic nature of the glycan structures(Fadda, 2022; Woods, 2018) and to the micro- and macro-heterogeneity in their protein functionalization(Čaval et al., 2021; Riley et al., 2019; Struwe and Robinson, 2019; Thaysen-Andersen and Packer, 2012; Zacchi and Schulz, 2016) that hinder characterization. Yet, the fact that protein folding occurs within a context where glycosylation types and occupancy can change on the fly suggests that not all glycosylation sites are essential for the protein to achieve and retain a native fold and that those sites may be displaced without consequences to function. In this work we investigated the structural role of the N-glycosylation at N343 in the SARS-CoV-2 S RBD, one of the most highly conserved sites in the viral phylogeny(Harbison et al., 2022). Extensive MD simulations in this and in earlier work by us and others(Casalino et al., 2020; Grant et al., 2020; Harbison et al., 2022; Sikora et al., 2021) show that the RBD core is effectively shielded by this glycan. Furthermore, the N343 glycan has been shown to be mechanistically involved in the opening and closing of the S(Sztain et al., 2021), making this glycosylation site functionally essential towards viral infection.

In this work we performed over 45 μs of cumulative MD sampling with both conventional and enhanced schemes to show that the N343 glycan also plays a fundamental structural role in the WHu-1 SARS-CoV-2 and that this role has changed in the variants circulating thus far. While we cannot gauge how fundamental N343 glycosylation is to RBD folding, we see that the amphipathic nature of the complex N-glycan(Watanabe et al., 2020a) at N343 enhances the stability of the RBD architecture, bridging between the two partially helical loops that frame a highly hydrophobic beta sheet core. To note, we determined the same bridging structures also for oligomannose-type N-glycans at N343 in earlier work(Harbison et al., 2022). In all variants we observe that the removal of the glycan at N343 triggers a tightening of the loops in a response likely aimed at limiting access of water into the hydrophobic core. In WHu-1, alpha and beta RBDs this event allosterically controls the dynamics of the RBM, ultimately causing the detachment of the hydrophilic patch and misfolding from the ACE2-recognized conformation. These results are in agreement with the drastic reduction of viral infectivity observed upon deletion of both N331 and N343 glycosylation in the WHu-1 strain (Li et al., 2020), where loss of structure may add to the loss of function through gating(Sztain et al., 2021) or vice versa.

As a functional assay to support this molecular insight, we determined how the binding affinity of the RBD for the oligosaccharides of the monosialylated gangliosides GM1os and GM2os is modulated by N343 glycosylation. These were shown in earlier work by us and others to function as co-receptors in WHu-1 infection(Nguyen et al., 2021). We predicted through extensive MD sampling that GM1os and GM2os bind the RBD at a site corresponding precisely to the location occupied by the 6-arm of an ancestral N-glycan at N370(Garozzo et al., 2022; Harbison et al., 2022). Note, the N370 site is still occupied in zoonotic sarbecoviruses(Allen et al., 2023). The GM1os binding site, see Figure 2e, is located precisely at the junction between Regions 1 and 2, which is disrupted by the loss of N343 glycosylation in WHu-1. Accordingly, we find that enzymatic removal of the N343 glycan abolishes GM1os and GM2os binding in the WHu-1 RBD, see Figure 2d. While we expect a similar loss of binding in alpha, within the context of a lower affinity relative to the WHu-1 RBD, we find that the beta RBD does not bind GM1os and GM2os, regardless of its glycosylation state. Based on the structure of the GM1-RBD complex we identified, see Figure 2e, where E484 represents a key contact to the oligosaccharides, the E484K mutation in beta may be key to the loss of binding, together with changes in the RBM kinetics linked to this mutation and to variations within the same region(Y. Wang et al., 2021; Williams et al., 2022).

In the delta variant we observed that the L452R mutation is responsible for an increased structural stability of the RBD, reinforcing the non-covalent interactions network between Region 1 and the RBM. Indeed, the tightening of the loops occurring upon loss of N343 glycosylation does not trigger a misfolding of the RBM, see Figure 3. Accordingly, we observe that the delta RBD shows no significant change in binding affinity for GM1os and GM2os after trimming of the N331 and N343 glycans, relative to the fully glycosylated form, see Figure 3f.

In all omicron variants, including all the currently circulating VoCs and VUMs, the aa 365-375 loop that the N343 glycan hooks on carries similar mutations, with the highly conserved S371, S373 and S375 all mutated to hydrophobic residues, see Figure S.1. Our MD results on the BA.1 and BA.2.86 RBDs show that hydrophobic residues at positions 371, 373 and 375 can pack within the RBD core, while leading to a loop structure that can support the N343 glycan branches through interactions with the backbone, see Figure 4. We have shown for all variants that the contacts between the N343 glycan and the aa 365-375 stretch of the opposite loop are fairly equally distributed between hydrophilic (hydrogen bonding) and hydrophobic (dispersion or van der Waals) type interactions, see Figure 1c. Therefore, it is expected that the loss of anchoring hydrogen bonding residues can be compensated through other interactions. Within this context, the removal of the N343 glycan does still cause a tightening of the loops, yet through a different mechanism relative to the other variants that ultimately does not appear to affect the RBM dynamics. As in beta, for omicron there is negligible binding of the N343 glycosylated RBD to GM1os and GM2os, likely due to the E484A mutation, which would deny a key interaction within the predicted binding site, see Figure 2e.

Taken together, our results show that since the WHu-1, alpha and beta strains, the RBD has evolved to make the N-glycosylation site at N343 structurally dispensable. Within this framework, provided that an N-glycosylation site in the immediate vicinity of N343 is necessary for folding and for function, a shift of the site within the sequence can potentially occur. Such a modification may negatively affect recognition and binding by neutralising antibodies(Liu et al., 2021; Piccoli et al., 2020; Pinto et al., 2020) and thus promote evasion. We have also shown for the BA.2.86 that the new glycosylation site at N354 can effectively contribute to the stability of Region 1, while significantly increasing shielding.

Moreover, we show that specific VoCs lost affinity for monosialylated ganglioside oligosaccharides with a trend in agreement with a binding site located at the junction between Region 1 and the RBM, which is part of the N370 glycan binding cleft on the RBD(Harbison et al., 2022). This conclusion is further supported by how binding affinities for GM1os and GM2os change upon the loss of N343 glycosylation, in agreement with the MD results. Further to this, as mutations we identified dampened binding to monosialylated ganglioside oligosaccharides, it is also possible that further mutations may switch the affinity back on or determine a shift of preference of the RBD towards other glycans that can still be recognised within the N370 cleft. Further work is ongoing in this area.

Finally, the results from this work point to the importance of understanding the impact of N-glycosylation in protein structure and stability, with immediate consequences to COVID-19 vaccine design. Indeed, earlier work shows SARS-CoV-2 S-based protein vaccines with increased efficacy due to the removal of N-glycans(Huang et al., 2022), and of RBD-based vaccines in use and under development(Cohen et al., 2022; Más-Bermejo et al., 2022; Valdes-Balbin et al., 2021) that may be designed with and without N-glycans. The design of such constructs may benefit from understanding which N-glycosylation sites are structurally essential and which are dispensable.

Material and Methods

Computational methods

All simulations were performed using additive, all-atom force fields, namely the AMBER 14SB parameter set(Maier et al., 2015) to represent protein atoms and counterions (200 mM of NaCl), GLYCAM06j-1(Kirschner et al., 2008) to represent glycans, and TIP3P for water molecules(Jorgensen et al., 1983). All production trajectories from conventional (deterministic) MD simulations were run for a minimum of 2 μs to ensure convergence. In some cases, we extended the simulations up to 3 μs to assess the stability of specific conformational transitions, where deemed necessary. All Gaussian accelerated MD (GaMD)(Miao et al., 2015; J. Wang et al., 2021) production trajectories were run for 2 μs. All simulations of the N343 glycosylated and non-glycosylated RBDs were started from identical 3D structures. The glycans at N331 and N343 were rebuilt as FA2G2 (GlyTouCan-ID G00998NI) based on glycoproteomics data(Newby et al., 2022; Watanabe et al., 2020a) with 3D structures from our GlycoShape database(Ives et al., 2023) (https://glycoshape.org). Further information on the RBD structures and PDB IDs for all variants, together with details on the MD systems set-up, equilibration protocols and total sampling time allocations are available as Supplementary Material. Sequences for all VoC and VUM RBDs (aa 327-540) were obtained from https://viralzone.expasy.org/9556.

Proteins and glycans

Expression and purification of recombinant WHu-1, Alpha, Beta, Delta and Omicron RBD (EG319RVQP…VN541F, UniProt number P0DTC2) with C-terminal FLAG (SGDYKDDDDKG) and His tags (HHHHHHG) used in the current study were described elsewhere(Akache et al., 2021; Colwill et al., 2022). Mutations of SARS-CoV-2 RBD VoCs are shown in Figure S.1. Proteins were purified using standard immobilised metal-ion affinity chromatography (IMAC), followed by size-exclusion chromatography on Superdex-75 to remove dimers as described(Forest-Nault et al., 2022). To obtain endo F3-treated WHu-1 and Delta RBD, 100 μg of each RBD was treated with endo F3 (purchased from New England Biolabs) in 1x Glycobuffer (50 mM sodium acetate, pH 4.5) at 37 °C overnight. Each protein was dialyzed and concentrated against 100 mM ammonium acetate (pH 7.4) using an Amicon 0.5-mL microconcentrator (EMD Millipore) with a 10-kDa MW cutoff and stored at -80 °C until used. The concentrations of protein stock solutions were estimated by UV absorption (280 nm). The oligosaccharides of GM1 and GM2, Galβ1-3GalNAcβ1-4(Neu5Acα2-3)Galβ1-4Glc (MW 998.34 Da, GM1os) and GalNAcβ1-4(Neu5Acα2-3)Galβ1-4Glc (MW 836.29 Da, GM2os), respectively, were purchased from Elicityl SA (Crolles, France). 1 mM stock solutions of each glycan were prepared by dissolving a known mass of glycan in ultrafiltered Milli-Q water. All stock solutions were stored at -20 °C until needed.
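
As a worked example of the stock preparation described above, the following sketch computes the mass of glycan needed for a given volume of 1 mM stock from the molecular weights quoted in the text. The 1 mL volume is a hypothetical value chosen for illustration, and the function name is ours.

```python
def glycan_mass_mg(concentration_mM, volume_mL, mw_g_per_mol):
    """Mass (mg) of solute needed for a solution of the given concentration and volume."""
    # mM x mL = micromoles; micromoles x g/mol = micrograms; divide by 1000 for mg.
    micromoles = concentration_mM * volume_mL
    return micromoles * mw_g_per_mol / 1000.0


# Molecular weights as quoted in the text; the 1 mL volume is hypothetical.
gm1os_mg = glycan_mass_mg(1.0, 1.0, 998.34)
gm2os_mg = glycan_mass_mg(1.0, 1.0, 836.29)
print(f"GM1os: {gm1os_mg:.3f} mg per mL, GM2os: {gm2os_mg:.3f} mg per mL")
```

Under these assumptions, roughly 1.0 mg of GM1os or 0.84 mg of GM2os is required per mL of 1 mM stock.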

ESI-MS affinity measurements

Affinities (Kd) of glycan ligands for RBD were measured by the direct ESI-MS binding assay. The ESI-MS affinity measurements were performed in positive ion mode on a Q Exactive Orbitrap mass spectrometer (Thermo Fisher Scientific). The capillary temperature was 150 °C, and the S-lens RF level was 100; an automatic gain control target of 5 × 10⁵ and maximum injection time of 100 ms were used. The resolving power was 17,500. The instrument was equipped with a modified nanoflow ESI (nanoESI) source. NanoESI tips with an outer diameter (o.d.) of ∼5 μm were pulled from borosilicate glass (1.2 mm o.d., 0.69 mm i.d., 10 cm length, Sutter Instruments, CA) with a P-97 micropipette puller (Sutter Instruments). A platinum wire was inserted into the nanoESI tip, making contact with the sample solution, and a voltage of 0.8 kV was applied. Each sample solution contained a given RBD (5 μM) and GM1os or GM2os (at three different concentrations ranging from 10 to 150 μM) in ammonium acetate (100 mM, pH 7.4). Data acquisition and pre-processing were performed using the Xcalibur software (version 4.1); ion abundances were extracted using the in-house software SWARM(Kitov et al., 2019). A brief description of the data analysis procedures used in this work is given in the Supporting Information.
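
To illustrate how a dissociation constant can be obtained from a direct ESI-MS measurement, the sketch below applies the mass-balance relations for a single binding site to the abundance ratio of ligand-bound to free protein ions. It assumes 1:1 binding and equal response factors for free and bound protein, and it is not a reproduction of the data analysis procedure given in the Supporting Information; the abundance ratio used in the example is hypothetical.

```python
def kd_from_ratio(ratio, protein_total, ligand_total):
    """Kd for 1:1 binding from the bound-to-free protein abundance ratio.

    ratio          -- Ab(PL)/Ab(P), summed over all charge states
    protein_total  -- initial protein concentration [P]0
    ligand_total   -- initial ligand concentration [L]0 (same units as [P]0)
    """
    if ratio <= 0:
        raise ValueError("no complex detected; Kd cannot be estimated")
    # Protein mass balance: [PL] = [P]0 * R / (1 + R)
    bound = protein_total * ratio / (1.0 + ratio)
    # Ligand mass balance: [L] = [L]0 - [PL]
    free_ligand = ligand_total - bound
    if free_ligand <= 0:
        raise ValueError("bound ligand exceeds total ligand; check inputs")
    # Kd = [P][L]/[PL] = [L]/R
    return free_ligand / ratio


# RBD at 5 uM as in the text; ligand at 50 uM lies within the stated range.
# The abundance ratio of 0.25 is a hypothetical value.
kd_uM = kd_from_ratio(ratio=0.25, protein_total=5.0, ligand_total=50.0)
print(f"Kd = {kd_uM:.0f} uM")
```

With these inputs the bound protein concentration is 1 μM, the free ligand concentration is 49 μM, and the resulting Kd is 196 μM. Repeating the calculation at each of the ligand concentrations used gives independent estimates that can be averaged.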

Protease digestion

20 μg of a given purified protein (intact and endoF3-treated WT RBDs) were dissolved in 100 μL of 8 M urea in 100 mM Tris-HCl (pH 8.0) containing 3 mM EDTA and incubated at room temperature for 1 h. The denatured protein was then reduced with 5 μL of 500 mM dithiothreitol (DTT; Sigma-Aldrich) at room temperature for 1 h, followed by alkylation with 12 μL of 500 mM iodoacetamide (Sigma-Aldrich) at room temperature for 20 min in the dark. The reaction was quenched by adding 5 μL of 250 mM DTT, and the solution buffer was exchanged using a 10-kDa Amicon Ultra centrifugal filter. The samples were loaded onto the filter and centrifuged at 14,000 × g for 15 min. The glycoprotein solution was subsequently digested with trypsin/chymotrypsin (substrate/enzyme (wt/wt) = 50) in 50 mM ammonium bicarbonate (pH 8.0) for 18 h at 37 °C. The reaction was quenched by heat inactivation at 100 °C for 10 min. The lyophilized sample was stored at -20 °C until LC–MS analysis.
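
The quantities implied by the protocol above can be checked with a short calculation. The sketch below assumes that volumes are additive and that the full 20 μg of protein is recovered after buffer exchange; both are simplifications made for this example, and the function name is ours.

```python
def final_concentration_mM(stock_mM, added_uL, existing_uL):
    """Concentration of a reagent after adding it to an existing volume (additive volumes assumed)."""
    return stock_mM * added_uL / (existing_uL + added_uL)


# Reduction: 5 uL of 500 mM DTT added to 100 uL of denatured protein.
dtt_mM = final_concentration_mM(500.0, 5.0, 100.0)

# Alkylation: 12 uL of 500 mM iodoacetamide added to the resulting 105 uL.
iaa_mM = final_concentration_mM(500.0, 12.0, 105.0)

# Digestion: substrate/enzyme (wt/wt) = 50 for 20 ug of protein.
enzyme_ug = 20.0 / 50.0

print(f"DTT: {dtt_mM:.2f} mM, iodoacetamide: {iaa_mM:.2f} mM, enzyme: {enzyme_ug:.1f} ug")
```

Under these assumptions the reduction step corresponds to about 24 mM DTT, the alkylation step to about 51 mM iodoacetamide, and the digestion to 0.4 μg of protease.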

Peptide analysis by Reverse-Phase Liquid Chromatography (RPLC)-MS/MS

The digested samples were separated by RPLC-MS/MS on a Vanquish UHPLC system (Thermo Fisher Scientific) coupled with an ESI-MS detector (Thermo Q Exactive Orbitrap). Peptide separation was achieved using a Waters Acquity UPLC Peptide BEH C18 column (1.7 μm, 2.1 mm × 150 mm; Waters). The eluents were 0.1% formic acid in water (solvent A) and 0.1% formic acid in acetonitrile (solvent B). The separation was performed at 60 °C. The following gradient was used for MS detection: t = 0 min, 95% solvent A (0.2 mL min⁻¹); t = 45 min, 40% solvent A (0.2 mL min⁻¹); t = 55 min, 5% solvent A (0.2 mL min⁻¹); t = 55.1 min, 95% solvent A (0.2 mL min⁻¹). During LC–MS analysis, the following parameters were used: sheath gas flow rate of 10 arbitrary units (AU), capillary temperature of 250 °C and spray voltage of 1.5 kV. The mass spectra were acquired in positive mode with an m/z range of 200–3,000 at a resolution of 70,000. The automatic gain control target was set at 1 × 10⁶, and a maximum injection time of 100 ms was used. HCD mass spectra were acquired in the data-dependent mode for the five most abundant ions with a resolution of 17,500. Automatic gain control target, maximum injection time and isolation window were set at 2 × 10⁵, 200 ms and 2.0 m/z, respectively. HCD-normalized collision energy was 25%. The data were recorded by Xcalibur (Thermo, version 4.1) and analyzed using Thermo BioPharma Finder software.
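
The gradient programme above can be expressed as a table of time points and interpolated to obtain the mobile phase composition at any time. The sketch below assumes that the pump ramps linearly between consecutive set points, which is not stated explicitly in the text; the function names are ours.

```python
# (time in min, % solvent A) set points as listed in the text.
GRADIENT = [(0.0, 95.0), (45.0, 40.0), (55.0, 5.0), (55.1, 95.0)]


def percent_a(t_min, program=GRADIENT):
    """Percentage of solvent A at time t_min, assuming linear ramps between set points."""
    if t_min < program[0][0] or t_min > program[-1][0]:
        raise ValueError("time outside the gradient programme")
    for (t0, a0), (t1, a1) in zip(program, program[1:]):
        if t0 <= t_min <= t1:
            fraction = (t_min - t0) / (t1 - t0)
            return a0 + fraction * (a1 - a0)
    raise ValueError("gradient programme is not sorted by time")


def percent_b(t_min, program=GRADIENT):
    """Percentage of solvent B, the remainder of the binary mixture."""
    return 100.0 - percent_a(t_min, program)


for t in (0.0, 22.5, 45.0, 50.0, 55.0):
    print(f"t = {t:5.1f} min: {percent_a(t):5.1f}% A, {percent_b(t):5.1f}% B")
```

Under the linear-ramp assumption, solvent B rises from 5% to 60% over the first 45 min and then to 95% at 55 min, before the column is returned to the starting conditions.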

The peptide sequences (EG319RVQP…VN541FS with C-terminal FLAG (SGDYKDDDDKG) and His tags (HHHHHHG), UniProt number P0DTC2) were then identified using the theoretical digest feature of the software. Carbamidomethylation and carboxymethylation at cysteine residues were set as fixed modifications. Common mammalian N- and O-glycans were set as variable modifications. A precursor mass tolerance of 5 ppm was set. For quantification, the abundance of each N-glycan at each N-glycosylation site (N331 and N343) is the sum of MS areas under the peak curve divided by the corresponding charge states. Next, for each N-glycosylation site, the relative abundance of each N-glycan is calculated as its abundance over the total abundance of all N-glycans detected.
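
The quantification scheme described above can be sketched as follows: for each glycan at a site, peak areas are divided by their charge states and summed, and the result is normalised by the total over all glycans detected at that site. The peak areas, charge states and the glycan labels other than FA2G2 are hypothetical values chosen for illustration, and the function names are ours.

```python
def glycan_abundance(peaks):
    """Sum of peak areas, each divided by the charge state of the ion it belongs to."""
    return sum(area / charge for area, charge in peaks)


def relative_abundances(site_peaks):
    """Relative abundance of each glycan at one site, as a fraction of the site total."""
    abundances = {
        glycan: glycan_abundance(peaks) for glycan, peaks in site_peaks.items()
    }
    total = sum(abundances.values())
    if total == 0:
        raise ValueError("no glycan signal detected at this site")
    return {glycan: value / total for glycan, value in abundances.items()}


# Hypothetical (area, charge) pairs for glycopeptides carrying the N343 site.
n343_peaks = {
    "FA2G2": [(6.0e6, 2), (3.0e6, 3)],
    "FA2G2S1": [(4.0e6, 2)],
    "M5": [(6.0e6, 3)],
}

n343_relative = relative_abundances(n343_peaks)
for glycan, fraction in n343_relative.items():
    print(f"{glycan}: {100.0 * fraction:.1f}%")
```

With these example values the charge-normalised abundances are 4 × 10⁶, 2 × 10⁶ and 2 × 10⁶, giving relative abundances of 50%, 25% and 25%.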

Acknowledgements

Science Foundation Ireland (SFI) Frontiers for the Future Programme is gratefully acknowledged for financial support of CMI postdoctoral training (20/FFP-P/8809). The opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of Science Foundation Ireland. CMI and EF gratefully acknowledge ORACLE for Research for the generous allocation of computational and data storage resources. CAF acknowledges the Irish Research Council (IRC) for funding through the Government of Ireland Postgraduate Scholarship Programme (GOIPG/201912212). CMI, CAF, AMH and EF acknowledge the Irish Centre for High-End Computing (ICHEC) for generous allocation of computational resources. A large part of the computational work described here was run on the HPC cluster kay at ICHEC, soon to be decommissioned. We would like to take this opportunity to thank kay for her invaluable service to the Irish scientific computing community, together with all the staff at ICHEC that took great care of her during the past 5 years. JSK and LN acknowledge the Natural Sciences and Engineering Research Council of Canada, the Canada Foundation for Innovation and the Alberta Innovation and Advanced Education Research Capacity Program for funding. We are grateful to the members of the NRC-HHT Mammalian Cell Expression Section for their contribution to the cloning, expression and purification of the various recombinant proteins used in this study and to the Pandemic Response Challenge Program of the National Research Council of Canada for its financial support.

Data Availability

All MD trajectories are available for download at https://doi.org/10.5281/zenodo.10441732