Chapter 3

Unknown Knowns, Known Unknowns, and Unforeseen Consequences: Using Free Energy Shifts To Predict Mutant Phenotypes

Published as …

A version of this chapter originally appeared as Chure, G., Razo-Mejia, M., Belliveau, N.M., Kaczmarek, Z.A., Einav, T., Barnes, S.L., Lewis, M., and Phillips, R. (2019). Predictive shifts in free energy couple mutations to their phenotypic consequences. Proceedings of the National Academy of Sciences 116(37). DOI: https://doi.org/10.1073/pnas.1907869116. G.C., M.R.M., N.M.B., Z.A.K., and S.L.B. designed the experiments and collected and analyzed data. G.C. developed the theoretical treatment of free energy shifts. G.C., M.R.M., N.M.B., Z.A.K., T.E., S.L.B., and R.P. designed the research project. G.C. and R.P. wrote the paper. M.L. provided guidance and advice.

Abstract

Mutation is a critical mechanism by which evolution explores the functional landscape of proteins. Despite our ability to experimentally inflict mutations at will, it remains difficult to link sequence-level perturbations to systems-level responses. Here, we present a framework centered on measuring changes in the free energy of the system to link individual mutations in an allosteric transcriptional repressor to the parameters which govern its response. We find that the energetic effects of the mutations can be categorized into several classes which have characteristic curves as a function of the inducer concentration. We experimentally test these diagnostic predictions using the well-characterized LacI repressor of Escherichia coli, probing several mutations in the DNA binding and inducer binding domains. We find that the change in gene expression due to a point mutation can be captured by modifying only the model parameters that describe the respective domain of the wild-type protein. These parameters appear to be insulated, with mutations in the DNA binding domain altering only the DNA affinity and those in the inducer binding domain altering only the allosteric parameters. Changing these subsets of parameters tunes the free energy of the system in a way that is concordant with theoretical expectations. Finally, we show that the induction profiles and resulting free energies associated with pairwise double mutants can be predicted with quantitative accuracy given knowledge of the single mutants, providing an avenue for identifying and quantifying epistatic interactions.

Introduction

Thermodynamic treatments of transcriptional regulation have been fruitful in their ability to generate quantitative predictions of gene expression as a function of a minimal set of physically meaningful parameters (Ackers and Johnson 1982; Buchler, Gerland, and Hwa 2003; Vilar and Leibler 2003; Garcia and Phillips 2011; Daber, Sharp, and Lewis 2009; Brewster et al. 2014; Weinert et al. 2014; Rydenfelt et al. 2014; Razo-Mejia et al. 2014, 2018; Bintu, Buchler, Garcia, Gerland, Hwa, Kondev, and Phillips 2005; Bintu, Buchler, Garcia, Gerland, Hwa, Kondev, Kuhlman, et al. 2005; Kuhlman et al. 2007). These models quantitatively describe numerous properties of input-output functions, such as the leakiness, saturation, dynamic range, steepness of response, and the EC50, the concentration of inducer at which the response is half maximal. The mathematical forms of these phenotypic properties are couched in terms of a minimal set of experimentally accessible variables, such as the inducer concentration, transcription factor copy number, and the DNA sequence of the binding site (see Chapter 2 and Razo-Mejia et al. (2018)). While the amino acid sequence of the transcription factor is another controllable variable, it is seldom treated in quantitative terms because mutations with subtle changes in chemistry frequently yield unpredictable physiological consequences. In this work, we examine how a series of mutations in either the DNA binding or inducer binding domains of a transcriptional repressor influence the values of the biophysical parameters which govern its regulatory behavior.

     We build upon the results presented in Chapter 2 of this thesis and present a theoretical framework for understanding how mutations in the amino acid sequence of the repressor affect different parameters and alter the free energy of the system. We find that the parameters capturing the allosteric nature of the repressor, the repressor copy number, and the DNA binding specificity contribute independently to the free energy of the system with different degrees of sensitivity. Furthermore, changes restricted to one of these three groups of parameters result in characteristic changes in the free energy relative to the wild-type repressor, providing falsifiable predictions of how different classes of mutations should behave.

     Next, we test these predictions experimentally using the well-characterized transcriptional repressor of the lac operon, LacI, in E. coli regulating expression of a fluorescent reporter. We introduce a series of point mutations in either the inducer binding or DNA binding domain. We then measure the full induction profile of each mutant, determine the minimal set of parameters that are affected by the mutation, and predict how each mutation tunes the free energy at different inducer concentrations, repressor copy numbers, and DNA binding strengths. We find in general that mutations in the DNA binding domain only influence DNA binding strength, and that mutations within the inducer binding domain affect only the parameters which dictate the allosteric response. The degree to which these parameters are insulated is notable, as the very nature of allostery suggests that all parameters are intimately connected, thus enabling binding events at one domain to be “sensed” by another.

     With knowledge of how a collection of DNA binding and inducer binding single mutants behave, we predict the induction profiles and the free energy changes of pairwise double mutants with quantitative accuracy. We find that the energetic effects of each individual mutation are additive, indicating that epistatic interactions are absent between the mutations examined here. Our model provides a means for identifying and quantifying the extent of epistatic interactions in a more complex set of mutations, and can shed light on how the protein sequence and general regulatory architecture coevolve.

Theoretical Model

This work considers the inducible simple repression regulatory motif depicted in Fig. 1 (A) from a thermodynamic perspective which has been thoroughly dissected and tested experimentally (Garcia and Phillips 2011; Brewster et al. 2014; Razo-Mejia et al. 2018) and is described in depth in Chapter 2. The result of this extensive theory-experiment dialogue is a succinct input-output function schematized in Fig. 1 (B) that computes the fold-change in gene expression relative to an unregulated promoter. This function is of the form
$$ \text{fold-change} = \left(1 + {R_A \over N_{NS}}e^{-\beta\Delta\varepsilon_{RA}}\right)^{-1}, \qquad(1)$$
where RA is the number of active repressors per cell, NNS is the number of non-specific binding sites for the repressor, ΔεRA is the binding energy of the repressor to its specific binding site relative to the non-specific background, and β is defined as ${1 \over k_B T}$ where kB is the Boltzmann constant and T is the temperature. While this theory requires knowledge of the number of active repressors, we often only know the total number R which is the sum total of active and inactive repressors. We can define a prefactor pact(c) which captures the allosteric nature of the repressor and encodes the probability that a repressor is in the active (repressive) state rather than the inactive state for a given inducer concentration c, namely,
$$ p_\text{act}(c) = {\left(1 + {c \over K_A}\right)^n \over \left(1 + {c \over K_A}\right)^n + e^{-\beta\Delta\varepsilon_{AI}}\left(1 + {c \over K_I}\right)^n}. \qquad(2)$$

Here, KA and KI are the dissociation constants of the inducer to the active and inactive repressor, ΔεAI is the energetic difference between the repressor active and inactive states, and n is the number of allosteric binding sites per repressor molecule (n = 2 for LacI). With this in hand, we can define RA in Eq. 1 as RA = pact(c)R.
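
     Eq. 1 and Eq. 2 can be evaluated directly. The following is a minimal sketch rather than the thesis code: energies are expressed in units of kBT (so β = 1), and the wild-type values used for KA, KI, ΔεAI, and NNS are illustrative assumptions that are not stated in this chapter. The function names are hypothetical.

```python
import math

# Illustrative wild-type LacI values (assumptions, not stated in this chapter).
K_A = 139.0    # inducer dissociation constant of the active state (µM)
K_I = 0.53     # inducer dissociation constant of the inactive state (µM)
EP_AI = 4.5    # allosteric energy difference (k_BT)
N_NS = 4.6e6   # number of non-specific binding sites
N_SITES = 2    # inducer binding sites per repressor (n = 2 for LacI)


def p_act(c, k_a=K_A, k_i=K_I, ep_ai=EP_AI, n=N_SITES):
    """Probability that a repressor is active at inducer concentration c (Eq. 2)."""
    active = (1 + c / k_a) ** n
    inactive = math.exp(-ep_ai) * (1 + c / k_i) ** n
    return active / (active + inactive)


def fold_change(c, R, ep_ra, **allosteric):
    """Fold-change in gene expression (Eq. 1) with R_A = p_act(c) * R."""
    r_active = p_act(c, **allosteric) * R
    return 1.0 / (1.0 + (r_active / N_NS) * math.exp(-ep_ra))


if __name__ == "__main__":
    # R = 260 repressors per cell and the O2 binding energy of -13.9 k_BT.
    for c in (0.0, 1.0, 50.0, 5000.0):
        print(f"c = {c:7.1f} µM   fold-change = {fold_change(c, 260, -13.9):.3f}")
```

     With these assumed values, the fold-change rises from a small leakiness at c = 0 toward a saturation value below 1 at high inducer concentration, which is the qualitative shape of an induction profile.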

Figure 1: A predictive framework for phenotypic and energetic dissection of the simple repression motif. (A) The inducible simple repression architecture. When in the active state, the repressor (red) binds the cognate operator sequence of the DNA (orange box) with high specificity, preventing transcription by occluding binding of the RNA polymerase to the promoter (blue rectangle). Upon addition of an inducer molecule, the inactive state (purple) becomes energetically preferable, and the repressor no longer binds the operator sequence with appreciable specificity. Once unbound from the operator, binding of the RNA polymerase (blue) is no longer blocked, and transcription can occur. (B) The simple repression input–output function for an allosteric repressor with two inducer binding sites. The key parameters are identified in speech bubbles. (C) The fold change in gene expression collapses as a function of the free energy. Panel (C, left) shows measurements of the fold change in gene expression as a function of inducer concentration from Razo-Mejia et al. (2018). Points and errors correspond to the mean and SEM of at least 10 biological replicates. The thin lines represent the line of best fit given the model shown in (B). This model can be rewritten as a Fermi function with an energetic parameter F, which is the energetic difference between the repressor bound and unbound states of the promoter, schematized in (C, middle). The points in (C, bottom) correspond to the data shown in (C, left) collapsed onto a master curve defined by their calculated free energy F. The solid black line is the master curve defined by the Fermi function shown in (C, middle). The Python code (ch3_fig1.py) used to generate this figure can be found on the thesis GitHub repository.

     A key feature of Eq. 1 and Eq. 2 is that the diverse phenomenology of the gene expression induction profile can be collapsed onto a single master curve by rewriting the input-output function in terms of the free energy F, also called the Bohr parameter (Phillips 2015),
$$ \text{fold-change} = \frac{1}{1 + e^{-\beta F}}, \qquad(3)$$
where
$$ F = -k_BT \log p_\text{act}(c) - k_BT\log\left({R \over N_{NS}}\right) + \Delta\varepsilon_{RA}. \qquad(4)$$
Hence, if different combinations of parameters yield the same free energy, they will give rise to the same fold-change in gene expression, enabling us to collapse multiple regulatory scenarios onto a single curve. This can be seen in Fig. 1 (C) where eighteen unique inducer titration profiles of a LacI simple repression architecture collected and analyzed in Razo-Mejia et al. (2018) collapse onto a single master curve. The tight distribution about this curve reveals that the fold-change across a variety of genetically distinct individuals can be adequately described by a small number of parameters. Beyond predicting the induction profiles of different strains, the method of data collapse inspired by Eq. 3 and Eq. 4 can be used as a tool to identify mechanistic changes in the regulatory architecture (Swem et al. 2008). Similar data collapse approaches have been used previously in such a manner and have proved vital for distinguishing between changes in parameter values and changes in the fundamental behavior of the system (Swem et al. 2008; Keymer et al. 2006).
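
     The collapse described by Eq. 3 and Eq. 4 can be checked numerically. The sketch below assumes illustrative wild-type allosteric parameters and a value for NNS (neither is given in this chapter), works in units of kBT, and uses two hypothetical strains chosen so that an e-fold increase in repressor copy number exactly offsets a 1 kBT weaker DNA binding energy.

```python
import math

N_NS = 4.6e6  # assumed number of non-specific binding sites


def p_act(c, k_a=139.0, k_i=0.53, ep_ai=4.5, n=2):
    """Eq. 2; the default parameter values are illustrative assumptions."""
    active = (1 + c / k_a) ** n
    inactive = math.exp(-ep_ai) * (1 + c / k_i) ** n
    return active / (active + inactive)


def bohr_parameter(c, R, ep_ra):
    """Free energy F in units of k_BT (Eq. 4)."""
    return -math.log(p_act(c)) - math.log(R / N_NS) + ep_ra


def fold_change_from_F(F):
    """Fermi function (Eq. 3), with F in units of k_BT."""
    return 1.0 / (1.0 + math.exp(-F))


def fold_change_direct(c, R, ep_ra):
    """Eq. 1 with R_A = p_act(c) * R, used here only as a cross-check."""
    return 1.0 / (1.0 + p_act(c) * (R / N_NS) * math.exp(-ep_ra))


# Hypothetical strains with different parameters but the same free energy.
strain_1 = dict(R=260, ep_ra=-13.9)
strain_2 = dict(R=260 * math.e, ep_ra=-12.9)

c = 50.0  # inducer concentration (µM)
F_1 = bohr_parameter(c, **strain_1)
F_2 = bohr_parameter(c, **strain_2)
print(f"F_1 = {F_1:.4f} k_BT, F_2 = {F_2:.4f} k_BT")
print(f"fold-change via Eq. 3: {fold_change_from_F(F_1):.4f}")
print(f"fold-change via Eq. 1: {fold_change_direct(c, **strain_1):.4f}")
```

     Both strains land on the same point of the master curve, and the Fermi form reproduces the fold-change computed from Eq. 1.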

     Assuming that a given mutation does not result in a non-functional protein, it is reasonable to say that any or all of the parameters in Eq. 1 can be affected by the mutation, changing the observed induction profile and therefore the free energy. To examine how the free energy of a mutant F(mut) differs from that of the wild-type F(wt), we define ΔF = F(mut) − F(wt), which has the form
$$ \begin{aligned} \Delta F = -k_BT\log\left({p_\text{act}^\mathrm{(mut)}(c) \over p_\text{act}^\mathrm{(wt)}(c)}\right) &- k_BT \log\left({R^\mathrm{(mut)}\over R^\mathrm{(wt)}}\right)\\ &+ (\Delta\varepsilon_{RA}^\mathrm{(mut)}- \Delta\varepsilon_{RA}^\mathrm{(wt)}). \end{aligned} \qquad(5)$$
ΔF describes how a mutation translates a point across the master curve shown in Fig. 1 (C). As we will show in the coming paragraphs (illustrated in Fig. 2), this formulation coarse grains the myriad parameters shown in Eq. 1 and Eq. 2 into three distinct quantities, each with different sensitivities to parametric changes. By examining how a mutation changes the ΔF as a function of the inducer concentration, one can draw conclusions as to which parameters have been modified based solely on the shape of the curve. To help the reader understand how various perturbations to the parameters tune the free energy, we have hosted an interactive figure on the dedicated website for the publication which makes exploration of parameter space a simpler task.

     The first term in Eq. 5 is the log ratio of the probability of a mutant repressor being active relative to the wild type at a given inducer concentration c. This quantity defines how changes to any of the allosteric parameters – such as inducer binding constants KA and KI or active/inactive state energetic difference ΔεAI – alter the free energy F, which can be interpreted as the free energy difference between the repressor bound and unbound states of the promoter. Fig. 2 (A) illustrates how perturbations to the inducer binding constants KA and KI alone alter the induction profiles and free energy as a function of the inducer concentration. In the limit where c = 0, the values of KA and KI do not factor into the calculation of pact(c) given by Eq. 2 meaning that ΔεAI is the lone parameter setting the residual activity of the repressor. Thus, if only KA and KI are altered by a mutation, then ΔF should be 0 kBT when c = 0, illustrated by the overlapping red, purple, and grey curves in the right-hand plot of Fig. 2 (A). However, if ΔεAI is influenced by the mutation (either alone or in conjunction with KA and KI), the leakiness will change, resulting in a non-zero ΔF when c = 0. This is illustrated in Fig. 2 (B) where ΔεAI is the only parameter affected by the mutation.
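
     The c = 0 diagnostic can be made concrete with a short calculation. In this sketch the three terms of Eq. 5 are returned separately, in units of kBT. The wild-type allosteric parameters and both mutants are hypothetical values chosen for illustration only.

```python
import math


def p_act(c, k_a, k_i, ep_ai, n=2):
    """Probability of the active state at inducer concentration c (Eq. 2)."""
    active = (1 + c / k_a) ** n
    inactive = math.exp(-ep_ai) * (1 + c / k_i) ** n
    return active / (active + inactive)


def delta_F_terms(c, wt, mut):
    """Allosteric, copy-number, and DNA-binding terms of Eq. 5 (k_BT)."""
    allosteric = -math.log(
        p_act(c, mut["k_a"], mut["k_i"], mut["ep_ai"])
        / p_act(c, wt["k_a"], wt["k_i"], wt["ep_ai"])
    )
    copy_number = -math.log(mut["R"] / wt["R"])
    dna_binding = mut["ep_ra"] - wt["ep_ra"]
    return allosteric, copy_number, dna_binding


# Assumed wild-type values; both mutants are hypothetical.
wt = dict(k_a=139.0, k_i=0.53, ep_ai=4.5, R=260, ep_ra=-13.9)
binding_mutant = dict(wt, k_a=300.0, k_i=5.0)   # only K_A and K_I change
allosteric_mutant = dict(wt, ep_ai=1.0)         # only the allosteric energy changes

for name, mut in [("K_A, K_I only", binding_mutant), ("ep_AI only", allosteric_mutant)]:
    for c in (0.0, 10.0, 1000.0):
        total = sum(delta_F_terms(c, wt, mut))
        print(f"{name:14s} c = {c:7.1f} µM   delta F = {total:+.3f} k_BT")
```

     The mutant with altered KA and KI has ΔF = 0 at c = 0, whereas the mutant with an altered ΔεAI shows a non-zero shift even in the absence of inducer.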

     It is important to note that for a mutation which perturbs only the inducer binding constants, the dependence of ΔF on the inducer concentration can be non-monotonic. While the precise values of KA and KI control the sensitivity of the repressor to inducer concentration, it is the ratio KA/KI that defines whether this non-monotonic behavior is observed. This can be seen more clearly when we consider the limit of saturating inducer concentration,
$$ \lim\limits_{c \rightarrow \infty} \log\left({p_\text{act}^\mathrm{(mut)}\over p_\text{act}^\mathrm{(wt)}}\right) \approx \log\left[{1 + e^{-\beta\Delta\varepsilon_{AI}^\mathrm{(wt)}} \left({K_A^\mathrm{(wt)}\over K_I^\mathrm{(wt)}}\right)^n \over 1 + e^{-\beta\Delta\varepsilon_{AI}^\mathrm{(wt)}} \left({K_A^\mathrm{(mut)}\over K_I^\mathrm{(mut)}}\right)^n}\right], \qquad(6)$$
which illustrates that ΔF returns to zero at saturating inducer concentration when the ratio KA/KI is the same for both the mutant and wild-type repressors, so long as ΔεAI is unperturbed. Non-monotonicity can only be achieved by changing KA and KI and therefore serves as a diagnostic for classifying mutational effects reliant solely on measuring the change in free energy. A rigorous proof of this non-monotonic behavior given changing KA and KI can be found in supplemental Chapter 7.
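
     A numerical scan illustrates the non-monotonic behavior. The sketch below compares a finite-concentration evaluation of the first term of Eq. 5 with the saturating limit of Eq. 6, in units of kBT. The wild-type values and both mutants are hypothetical: one mutant scales KA and KI by the same factor (preserving their ratio), the other changes KI alone.

```python
import math


def p_act(c, k_a, k_i, ep_ai, n=2):
    """Probability of the active state at inducer concentration c (Eq. 2)."""
    active = (1 + c / k_a) ** n
    inactive = math.exp(-ep_ai) * (1 + c / k_i) ** n
    return active / (active + inactive)


def delta_F(c, wt, mut):
    """Allosteric contribution to delta F (first term of Eq. 5), in k_BT."""
    return -math.log(p_act(c, **mut) / p_act(c, **wt))


def delta_F_saturation(wt, mut, n=2):
    """c -> infinity limit from Eq. 6, assuming ep_ai is unchanged by the mutation."""
    weight = math.exp(-wt["ep_ai"])
    numerator = 1 + weight * (wt["k_a"] / wt["k_i"]) ** n
    denominator = 1 + weight * (mut["k_a"] / mut["k_i"]) ** n
    return -math.log(numerator / denominator)


# Assumed wild-type values; both mutants are hypothetical.
wt = dict(k_a=139.0, k_i=0.53, ep_ai=4.5)
same_ratio = dict(k_a=1390.0, k_i=5.3, ep_ai=4.5)   # K_A/K_I preserved
new_ratio = dict(k_a=139.0, k_i=5.3, ep_ai=4.5)     # K_A/K_I reduced tenfold

for c in (0.0, 0.1, 1.0, 5.0, 10.0, 100.0, 1e3, 1e4, 1e6):
    print(f"c = {c:9.1f} µM   delta F (same ratio) = {delta_F(c, wt, same_ratio):+.3f} k_BT")

print(f"saturation limit, same ratio: {delta_F_saturation(wt, same_ratio):+.3f} k_BT")
print(f"saturation limit, new ratio:  {delta_F_saturation(wt, new_ratio):+.3f} k_BT")
```

     For the ratio-preserving mutant, ΔF starts at zero, dips at intermediate concentrations, and returns to zero at saturation. For the mutant with a different ratio, the saturating limit is non-zero.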

     The second term in Eq. 5 captures how changes in the repressor copy number contribute to changes in free energy. It is important to note that this contribution to the free energy change depends on the total number of repressors in the cell, not just those in the active state. This emphasizes that changes in the expression of the repressor are energetically divorced from changes to the allosteric nature of the repressor. As a consequence, the change in free energy is constant for all inducer concentrations, as is schematized in Fig. 2 (C). Because the magnitude of the free energy shift is proportional to the logarithm of the relative change in repressor copy number, a mutation which increases expression from 1 to 10 repressors per cell is more impactful from an energetic standpoint (kBT log(10) ≈ 2.3 kBT) than an increase from 90 to 100 (kBT log(100/90) ≈ 0.1 kBT). Appreciable changes in the free energy only arise when variations in the repressor copy number are larger than or comparable to an order of magnitude. Changes of this magnitude are certainly possible from a single point mutation, as it has been shown that even synonymous substitutions can drastically change translation efficiency (Frumkin et al. 2018).

     The third and final term in Eq. 5 is the difference in the DNA binding energy between the mutant and wild-type repressors. All else being equal, if the mutated state binds more tightly to the DNA than the wild type (ΔεRA(wt) > ΔεRA(mut)), the net change in the free energy is negative, indicating that the repressor bound states become more energetically favorable due to the mutation. Much like in the case of changing repressor copy number, this quantity is independent of inducer concentration and is therefore also constant (Fig. 2 (D)). However, the magnitude of the change in free energy is linear with DNA binding affinity while it is logarithmic with respect to changes in the repressor copy number. Thus, to change the free energy by 1 kBT, the repressor copy number must change by a factor of e ≈ 2.7 whereas the DNA binding energy must change by 1 kBT.
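
     The different sensitivities of the second and third terms of Eq. 5 follow from simple arithmetic. The sketch below works in units of kBT; the function names are hypothetical and introduced only for illustration.

```python
import math


def dF_copy_number(r_mut, r_wt):
    """Second term of Eq. 5: logarithmic in the relative copy-number change."""
    return -math.log(r_mut / r_wt)


def dF_binding(ep_mut, ep_wt):
    """Third term of Eq. 5: linear in the DNA binding energy."""
    return ep_mut - ep_wt


# Copy-number changes quoted in the text.
print(f"1 -> 10 repressors:   {dF_copy_number(10, 1):+.2f} k_BT")
print(f"90 -> 100 repressors: {dF_copy_number(100, 90):+.2f} k_BT")

# Fold change in copy number whose magnitude corresponds to a 1 k_BT shift.
factor_for_one_kT = math.exp(1.0)
print(f"copy-number factor for a 1 k_BT shift: {factor_for_one_kT:.2f}")

# The same shift requires a 1 k_BT change in DNA binding energy.
print(f"binding energy -13.9 -> -14.9 k_BT: {dF_binding(-14.9, -13.9):+.2f} k_BT")
```

     The sign of the copy-number term is negative when the mutant expresses more repressor than the wild type, reflecting stronger repression.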

     The unique behavior of each quantity in Eq. 5 and its sensitivity with respect to the parameters makes ΔF useful as a diagnostic tool to classify mutations. Given a set of fold-change measurements, a simple rearrangement of Eq. 3 permits the direct calculation of the free energy, assuming that the underlying physics of the regulatory architecture has not changed. Thus, it becomes possible to experimentally test the general assertions made in Fig. 2.
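
     The rearrangement of Eq. 3 mentioned above amounts to inverting the Fermi function. The sketch below gives a point estimate only, in units of kBT; it is not the Bayesian inference used in this chapter, and the fold-change values in the example are made up for illustration.

```python
import math


def free_energy_from_fold_change(fold_change):
    """Invert Eq. 3: F = -log(1 / fold_change - 1), in units of k_BT."""
    if not 0.0 < fold_change < 1.0:
        # F diverges at fold-change values of 0 and 1.
        raise ValueError("fold-change must lie strictly between 0 and 1")
    return -math.log(1.0 / fold_change - 1.0)


def delta_F_from_measurements(fc_mut, fc_wt):
    """Empirical delta F = F(mut) - F(wt) at matched conditions."""
    return free_energy_from_fold_change(fc_mut) - free_energy_from_fold_change(fc_wt)


# Hypothetical fold-change values measured at the same inducer concentration.
fc_wt, fc_mut = 0.20, 0.60
print(f"F(wt)   = {free_energy_from_fold_change(fc_wt):+.3f} k_BT")
print(f"F(mut)  = {free_energy_from_fold_change(fc_mut):+.3f} k_BT")
print(f"delta F = {delta_F_from_measurements(fc_mut, fc_wt):+.3f} k_BT")
```

     Because the logarithm diverges as the fold-change approaches 0 or 1, small measurement errors near those limits translate into large errors in the estimated free energy.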

Figure 2: Parametric changes due to mutations and the corresponding free-energy changes for (A) perturbations to KA and KI, (B) changes to the allosteric energy difference ΔεAI, (C) changes to repressor copy number, and (D) changes in DNA binding affinity. The first column schematizes the changed parameters and the second column reflects which quantity in Eq. 5 is affected. The third column shows representative induction profiles from mutants which have smaller (purple) and larger (red) values for the parameters than the wild type (gray). The fourth and fifth columns illustrate how the free energy is changed as a result. Purple and red arrows indicate the direction in which the points are translated about the master curve. Three concentrations (points labeled 1, 2, and 3) are shown to illustrate how each point is moved in free-energy space. An interactive version of this figure can be found on the paper website (https://www.rpgroup.caltech.edu/mwc_mutants).

Results

DNA Binding Domain Mutants

With this arsenal of analytic diagnostics, we can begin to explore the mutational space of the repressor and map these mutations to the biophysical parameters they control. As one of the most thoroughly studied transcription factors, LacI has been subjected to numerous crystallographic and mutational studies (Lewis et al. 1996; Daber, Sharp, and Lewis 2009; Daber, Sochor, and Lewis 2011). One such work generated a set of point mutations in the LacI repressor and examined the diversity of the phenotypic response to different allosteric effectors (Daber, Sochor, and Lewis 2011). However, several experimental variables were unknown, precluding precise calculation of ΔF as presented in the previous section. In Daber, Sochor, and Lewis (2011), the repressor variants and the fluorescent reporter were expressed from separate plasmids. As the copy numbers of these plasmids fluctuate in the population, both the population average repressor copy number and the number of regulated promoters were unknown. Both of these quantities have been shown previously to significantly alter the measured gene expression, and calculation of ΔF is dependent on knowledge of their values. While the approach presented in Daber, Sochor, and Lewis (2011) considers the Lac repressor as an MWC molecule, the copy numbers of the repressor and the reporter gene were swept into an effective parameter R/KDNA, hindering our ability to distinguish between changes in repressor copy number and changes in DNA binding energy. To test our hypothesis of free energy differences resulting from various parameter perturbations, we used the data set in Daber, Sochor, and Lewis (2011) as a guide and chose a subset of the mutations to quantitatively dissect. To control copy number variation, the mutant repressors and the reporter gene were integrated into the E. coli chromosome where the copy numbers are known and tightly controlled (Razo-Mejia et al. 2018; Garcia and Phillips 2011). Furthermore, the mutations were paired with ribosomal binding sites where the level of translation of the wild-type repressor had been directly measured previously (Garcia and Phillips 2011).

     We made three amino acid substitutions (Y17I, Q18A, and Q18M) that are critical for the DNA-repressor interaction. These mutations were introduced into the lacI sequence used in Garcia and Phillips (2011) with four different ribosomal binding site sequences that were shown (via quantitative Western blotting) to tune the wild-type repressor copy number across three orders of magnitude. These mutant constructs were integrated into the E. coli chromosome harboring a Yellow Fluorescent Protein (YFP) reporter. The YFP promoter included the native O2 LacI operator sequence which the wild-type LacI repressor binds with high specificity (ΔεRA = −13.9 kBT). The fold-change in gene expression for each mutant across twelve concentrations of IPTG was measured via flow cytometry. As we mutated only a single amino acid with the minimum number of base pair changes to the codons from the wild-type sequence, we find it unlikely that the repressor copy number was drastically altered from those reported in Garcia and Phillips (2011) for the wild-type sequence paired with the same ribosomal binding site sequence. In characterizing the effects of these DNA binding mutations, we take the repressor copy number to be unchanged. Any error introduced by this assumption should be manifest as a larger than predicted systematic shift in the free energy change when the repressor copy number is varied.

Figure 3: Induction profiles and free-energy differences of DNA binding domain mutations. Each column corresponds to the highlighted mutant at the top of the figure. Each strain was paired with the native O2 operator sequence. Open points correspond to the strain for each mutant from which the DNA binding energy was estimated. (A) Induction profiles of each mutant at four different repressor copy numbers as a function of the inducer concentration. Shaded regions demarcate the 95% credible region of the induction profile generated by the estimated DNA binding energy. (B) Data collapse of all points for each mutant shown in A using only the DNA binding energy estimated from a single repressor copy number. Points correspond to the average fold change in gene expression of 6–10 biological replicates. Error bars are SEM. Where error bars are not visible, the relative error in measurement is smaller than the size of the marker. (C) The change in the free energy resulting from each mutation as a function of the inducer concentration. Points correspond to the median of the marginal posterior distribution for the free energy. Error bars represent the upper and lower bounds of the 95% credible region. Points in A at the detection limits of the flow cytometer (near fold-change values of 0 and 1) were neglected for calculation of the ΔF. The IPTG concentration is shown on a symmetric log scale with linear scaling ranging from 0 to $10^{-2}$ μM and log scaling elsewhere. The shaded red lines in C correspond to the 95% credible region of our predictions for ΔF based solely on estimation of ΔεRA from the strain with R = 260 repressors per cell. The Python code (ch3_fig3.py) used to generate this figure can be found on the thesis GitHub repository.

     A naïve hypothesis for the effect of a mutation in the DNA binding domain is that only the DNA binding energy is affected. This hypothesis appears to contradict the core principle of allostery in that ligand binding in one domain influences binding in another, suggesting that changing one parameter modifies them all. The characteristic curves summarized in Fig. 2 give a means to discriminate between these two hypotheses by examining the change in the free energy. Using a single induction profile (white-faced points in Fig. 3), we estimated the DNA binding energy using Bayesian inferential methods, the details of which are thoroughly discussed in the Materials & Methods as well as in the supplemental Chapter 7. The shaded red region for each mutant in Fig. 3 represents the 95% credible region of this fit whereas all other shaded regions are 95% credible regions of the predictions for other repressor copy numbers. We find that redetermining only the DNA binding energy accurately captures the majority of the induction profiles, indicating that other parameters are unaffected. One exception is for the lowest repressor copy numbers (R = 60 and R = 124 per cell) of mutant Q18A at low concentrations of IPTG. However, we note that this disagreement is comparable to that observed for the wild-type repressor binding to the weakest operator in Razo-Mejia et al. (2018), illustrating that our model is imperfect in characterizing weakly repressing architectures. Including other parameters in the fit (such as ΔεAI) does not significantly improve the accuracy of the predictions. Furthermore, the magnitude of this disagreement also depends on the choice of the fitting strain (see supplemental Chapter 7).

     Mutations Y17I and Q18A both weaken the affinity of the repressor to the DNA relative to the wild-type strain, with binding energies of $-9.9^{+0.1}_{-0.1}\,k_BT$ and $-11.0^{+0.1}_{-0.1}\,k_BT$, respectively. Here we report the median of the inferred posterior probability distribution with the superscripts and subscripts corresponding to the upper and lower bounds of the 95% credible region. These binding energies are comparable to that of the wild-type repressor affinity to the native LacI operator sequence O3, with a DNA binding energy of −9.7 kBT. The mutation Q18M increases the strength of the DNA-repressor interaction relative to the wild-type repressor with a binding energy of $-15.43^{+0.07}_{-0.06}\,k_BT$, comparable to the affinity of the wild-type repressor to the native O1 operator sequence (−15.3 kBT). It is notable that a single amino acid substitution of the repressor is capable of changing the strength of the DNA binding interaction well beyond that of many single base-pair mutations in the operator sequence (Barnes et al. 2019).

     Using the new DNA binding energies, we can collapse all measurements of fold-change as a function of the free energy as shown in Fig. 3 (B). This allows us to test the diagnostic power of the decomposition of the free energy described in Fig. 2. To compute the ΔF for each mutation, we inferred the observed mean free energy of the mutant strain for each inducer concentration and repressor copy number (see Materials & Methods as well as the supplemental Chapter 7 for a detailed explanation of the inference). We note that in the limit of extremely low or high fold-change, the inference of the free energy is either over- or under-estimated, respectively, introducing a systematic error. Thus, points which are close to these limits are omitted in the calculation of ΔF. We direct the reader to the supplemental Chapter 7 for a detailed discussion of this systematic error. With a measure of F(mut) for each mutant at each repressor copy number, we compute the difference in free energy relative to the wild-type strain with the same repressor copy number and operator sequence, restricting all variability in ΔF solely to changes in ΔεRA.

     The change in free energy for each mutant is shown in Fig. 3 (C). It can be seen that the ΔF for each mutant is constant as a function of the inducer concentration and is concordant with the prediction generated from fitting ΔεRA to a single repressor copy number (red lines in Fig. 3 (C)). This is in line with the predictions outlined in Fig. 2 (C) and (D), indicating that the allosteric parameters are “insulated,” meaning they are not affected by the DNA binding domain mutations. As the ΔF for all repressor copy numbers collapses onto the prediction, we can say that the expression of the repressor itself is the same or comparable with that of the wild type. If the repressor copy number were perturbed in addition to ΔεRA, one would expect a shift away from the prediction that scales logarithmically with the change in repressor copy number. However, as the ΔF is approximately the same for each repressor copy number, it can be surmised that the mutation does not significantly change the expression or folding efficiency of the repressor itself. These results allow us to state that the DNA binding energy ΔεRA is the only parameter modified by the DNA mutants examined.
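
     If only ΔεRA changes, the predicted ΔF reduces to the third term of Eq. 5 and can be computed from the median binding energies reported above. The sketch below uses those medians and the wild-type O2 energy of −13.9 kBT; it ignores the credible regions, and the variable names are hypothetical.

```python
# Median DNA binding energies reported in the text (k_BT), O2 operator.
EP_RA_WT = -13.9
EP_RA_MUT = {"Y17I": -9.9, "Q18A": -11.0, "Q18M": -15.43}


def predicted_delta_F(ep_mut, ep_wt=EP_RA_WT):
    """Third term of Eq. 5, assuming p_act and R are unchanged by the mutation."""
    return ep_mut - ep_wt


predictions = {name: predicted_delta_F(ep) for name, ep in EP_RA_MUT.items()}

for name, dF in predictions.items():
    effect = "weaker" if dF > 0 else "stronger"
    print(f"{name}: predicted delta F = {dF:+.2f} k_BT ({effect} DNA binding)")
```

     Under this assumption the prediction is a horizontal line in ΔF for every inducer concentration and repressor copy number, which is the behavior against which the measurements are compared.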

Inducer Binding Domain Mutants

Much as in the case of the DNA binding mutants, we cannot safely assume a priori that a given mutation in the inducer binding domain affects only the inducer binding constants KA and KI. While it is easy to associate the inducer binding constants with the inducer binding domain, the critical parameter in our allosteric model ΔεAI is harder to restrict to a single spatial region of the protein. As KA, KI, and ΔεAI are all parameters dictating the allosteric response, we consider two hypotheses in which inducer binding mutations alter either all three parameters or only KA and KI.

     We made four point mutations within the inducer binding domain of LacI (F161T, Q291V, Q291R, and Q291K) that have been shown previously to alter binding to multiple allosteric effectors (Daber, Sharp, and Lewis 2009). In contrast to the DNA binding domain mutants, we paired the inducer binding domain mutations with the three native LacI operator sequences (which have various affinities for the repressor) and a single ribosomal binding site sequence. This ribosomal binding site sequence, as reported in Garcia and Phillips (2011), expresses the wild-type LacI repressor to an average copy number of approximately 260 per cell. As the free energy differences resulting from point mutations in the DNA binding domain can be described solely by changes to ΔεRA, we continue under the assumption that the inducer binding domain mutations do not significantly alter the repressor copy number.

Figure 4: Induction profiles and free-energy differences of inducer binding domain mutants. Open points represent the strain to which the parameters were fit, namely the O2 operator sequence. Each column corresponds to the mutant highlighted at the top of the figure. All strains have R = 260 per cell. (A) The fold change in gene expression as a function of the inducer concentration for three operator sequences of varying strength. Dashed lines correspond to the curve of best fit resulting from fitting KA and KI alone. Shaded curves correspond to the 95% credible region of the induction profile determined from fitting KA, KI, and ΔεAI. Points correspond to the mean measurement of 6–12 biological replicates. Error bars are the SEM. (B) Points in A collapsed as a function of the free energy calculated from redetermining KA, KI, and ΔεAI. (C) Change in free energy resulting from each mutation as a function of the inducer concentration. Points correspond to the median of the posterior distribution for the free energy. Error bars represent the upper and lower bounds of the 95% credible region. Shaded curves are the predictions. IPTG concentration is shown on a symmetric log scaling axis with the linear region spanning from 0 to $10^{-2}$ μM and log scaling elsewhere. The Python code (ch3_fig4.py) used to generate this figure can be found on the thesis GitHub repository.

     The induction profiles for these four mutants are shown in Fig. 4 (A). Of the mutations chosen, Q291R and Q291K appear to have the most significant impact, with Q291R abolishing the characteristic sigmoidal titration curve entirely. It is notable that both Q291R and Q291K have elevated expression in the absence of inducer compared to the other two mutants paired with the same operator sequence. Panel (A) in Fig. 2 illustrates that if only KA and KI were being affected by the mutations, the fold-change should be identical for all mutants in the absence of inducer. This discrepancy in the observed leakiness immediately suggests that more than KA and KI are affected for Q291K and Q291R.

     Using a single induction profile for each mutant (shown in Fig. 4 as white-faced circles), we inferred the parameter combinations for both hypotheses and drew predictions for the induction profiles with other operator sequences. We find that the simplest hypothesis (in which only KA and KI are altered) does not permit accurate prediction of most induction profiles. These curves, shown as dashed lines in Fig. 4 (A), fail spectacularly in the case of Q291R and Q291K, and undershoot the observed profiles for F161T and Q291V, especially when paired with the weak operator sequence O3. The change in the leakiness for Q291R and Q291K is particularly evident, as the expression at c = 0 should be identical to that of the wild-type repressor under this hypothesis. Altering only KA and KI is likewise insufficient to accurately predict the induction profiles for F161T and Q291V, although the disagreement is less severe than for Q291K and Q291R. It is most evident for the weakest operator, O3 (green lines in Fig. 4 (A)), though we have discussed previously that the induction profiles for weak operators are difficult to accurately describe and can result in comparable disagreement for the wild-type repressor (Razo-Mejia et al. 2018).

     Including ΔεAI as a perturbed parameter in addition to KA and KI improves the predicted profiles for all four mutants. By fitting these three parameters to a single strain, we are able to accurately predict the induction profiles of other operators as seen by the shaded lines in Fig. 4 (A). With these modified parameters, all experimental measurements collapse as a function of their free energy as prescribed by Eq. 3 (Fig. 4 (B)). All four mutations significantly diminish the binding affinity of both states of the repressor to the inducer, as seen by the estimated parameter values reported in Table 3.1. As evident in the data alone, Q291R abrogates inducibility outright (KA ≈ KI). For Q291K, the active state of the repressor can no longer bind inducer whereas the inactive state binds with weak affinity. The remaining two mutants, Q291V and F161T, both show diminished binding affinity of the inducer to both the active and inactive states of the repressor relative to the wild-type.

Table 3.1: Inferred values of KA, KI, and ΔεAI for inducer binding mutants

| Mutant | KA | KI | ΔεAI [kBT] | Reference |
|---|---|---|---|---|
| WT | $139^{+29}_{-22}$ μM | $0.53^{+0.04}_{-0.04}$ μM | 4.5 | Razo-Mejia et al. (2018) |
| F161T | $165^{+90}_{-65}$ μM | $3^{+6}_{-3}$ μM | $1^{+5}_{-2}$ | This study |
| Q291V | $650^{+450}_{-250}$ μM | $8^{+8}_{-8}$ μM | $3^{+6}_{-3}$ | This study |
| Q291K | > 1 mM | $310^{+70}_{-60}$ μM | $-3.11^{+0.07}_{-0.07}$ | This study |
| Q291R | $9^{+20}_{-9}$ μM | $8^{+20}_{-8}$ μM | $-2.35^{+0.01}_{-0.09}$ | This study |

     Given the collection of fold-change measurements, we computed the ΔF relative to the wild-type strain with the same operator and repressor copy number. This leaves differences in pact(c) as the sole contributor to the free energy difference, assuming our hypothesis that KA, KI, and ΔεAI are the only perturbed parameters is correct. The change in free energy can be seen in Fig. 4 (C). For all mutants, the free energy difference inferred from the observed fold-change measurements falls within error of the predictions generated under the hypothesis that KA, KI, and ΔεAI are all affected by the mutation (shaded curves in Fig. 4 (C)). The profile of the free energy change exhibits some of the rich phenomenology illustrated in Fig. 2 (A) and (B). Q291K, F161T, and Q291V exhibit a non-monotonic dependence on the inducer concentration, a feature that can only appear when KA and KI are altered. The non-zero ΔF at c = 0 for Q291R and Q291K coupled with an inducer concentration dependence is a telling sign that ΔεAI must be significantly modified. This shift in ΔF is positive in all cases, indicating that ΔεAI must have decreased, and that the inactive state has become more energetically favorable for these mutants than for the wild-type protein. Indeed the estimates for ΔεAI (Table 3.1) reveal both mutations Q291R and Q291K make the inactive state more favorable than the active state. Thus, for these two mutations, only  ≈ 10% of the repressors are active in the absence of inducer, whereas the basal active fraction is  ≈ 99% for the wild-type repressor (Razo-Mejia et al. 2018).
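The basal active fractions quoted above can be checked with a short calculation. The sketch below assumes the two-state MWC form used in Razo-Mejia et al. (2018), in which the probability of the active state at c = 0 reduces to 1 / (1 + exp(−ΔεAI)) with ΔεAI in units of kBT. The function name is hypothetical, and the inputs are the central estimates from Table 3.1.

```python
import numpy as np

def basal_active_fraction(ep_ai):
    """Fraction of repressors in the active state at c = 0.

    Assumes p_act(c = 0) = 1 / (1 + exp(-ep_ai)), with ep_ai = Δε_AI in k_BT.
    """
    return 1.0 / (1.0 + np.exp(-ep_ai))

# Central estimates of Δε_AI (k_BT) taken from Table 3.1.
estimates = {"WT": 4.5, "Q291K": -3.11, "Q291R": -2.35}
for mutant, ep_ai in estimates.items():
    print(f"{mutant}: basal active fraction = {basal_active_fraction(ep_ai):.3f}")
```

Under this assumption the wild-type value comes out close to 0.99, while both Q291K and Q291R fall below 0.1, in line with the rounded figures given in the text.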

     We note that the parameter values reported here disagree with those reported in Daber, Sochor, and Lewis (2011). This disagreement stems from different assumptions regarding the residual activity of the repressor in the absence of inducer and the parametric degeneracy of the MWC model without a concrete independent measure of ΔεAI. A detailed discussion of the difference in parameter values between our previous work (Garcia and Phillips 2011; Razo-Mejia et al. 2018), that of Daber, Sochor, and Lewis (2011), and those of other seminal works can be found in the supplemental Chapter 7.

     Taken together, these parametric changes diminish the response of the regulatory architecture as a whole to changing inducer concentrations. They furthermore reveal that the parameters which govern the allosteric response are interdependent and no single parameter is insulated from the others. However, as only the allosteric parameters are changed, one can say that the allosteric parameters as a whole are insulated from the other components which define the regulatory response, such as repressor copy number and DNA binding affinity.

Predicting Effects of Pairwise Double Mutations

Given full knowledge of each mutation individually, we can draw predictions of the behavior of the pairwise double mutants with no free parameters based on the simplest null hypothesis of no epistasis. The formalism of ΔF defined by Eq. 5 explicitly states that the contributions to the free energy of the system from the difference in DNA binding energy and from the allosteric parameters are strictly additive. Thus, deviations from the predicted change in free energy would suggest epistatic interactions between the two mutations.

      To test this additive model, we constructed nine double mutant strains, each having a unique inducer binding (F161T, Q291V, Q291K) and DNA binding mutation (Y17I, Q18A, Q18M). To make predictions with an appropriate representation of the uncertainty, we computed a large array of induction profiles given random draws from the posterior distribution for the DNA binding energy (determined from the single DNA binding mutants) as well as from the joint posterior for the allosteric parameters (determined from the single inducer binding mutants). These predictions, shown in Fig. 5 (A) and (B) as shaded blue curves, capture all experimental measurements of the fold-change (Fig. 5 (A)) and the inferred difference in free energy (Fig. 5 (B)). The latter indicates that there are no epistatic interactions between the mutations queried in this work, though if there were, systematic deviations from these predictions would shed light on how the epistasis is manifested.
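The sampling procedure described above can be sketched as follows. This is an illustration under stated assumptions, not the analysis code from the repository. It assumes the simple-repression fold-change of Razo-Mejia et al. (2018), with two inducer binding sites per repressor and N_NS = 4.6 × 10⁶ nonspecific sites. All function names are hypothetical, and the "posterior draws" are synthetic placeholders rather than real inference output.

```python
import numpy as np

N_NS = 4.6e6   # assumed number of nonspecific binding sites
N_SITES = 2    # assumed number of inducer binding sites per repressor

def p_act(c, ka, ki, ep_ai):
    """Probability of the active repressor state (MWC model); c, ka, ki in µM."""
    active = (1 + c / ka) ** N_SITES
    inactive = np.exp(-ep_ai) * (1 + c / ki) ** N_SITES
    return active / (active + inactive)

def fold_change(c, R, ep_ra, ka, ki, ep_ai):
    """Simple-repression fold-change with energies in k_BT."""
    return 1 / (1 + p_act(c, ka, ki, ep_ai) * (R / N_NS) * np.exp(-ep_ra))

def predict_double_mutant(c, R, ep_ra_draws, allo_draws, n_draws=10_000, seed=0):
    """95% credible region of the fold-change under the no-epistasis hypothesis.

    ep_ra_draws: 1D posterior samples of Δε_RA from the DNA binding mutant.
    allo_draws:  (n, 3) joint posterior samples (KA, KI, Δε_AI) from the
                 inducer binding mutant; rows are kept intact so that
                 correlations between the allosteric parameters are preserved.
    """
    rng = np.random.default_rng(seed)
    i = rng.integers(len(ep_ra_draws), size=n_draws)  # DNA binding draws
    j = rng.integers(len(allo_draws), size=n_draws)   # independent allosteric draws
    ep_ra = ep_ra_draws[i][:, None]
    ka, ki, ep_ai = (allo_draws[j, k][:, None] for k in range(3))
    fc = fold_change(c[None, :], R, ep_ra, ka, ki, ep_ai)
    return np.percentile(fc, [2.5, 97.5], axis=0)

# Synthetic placeholder "posteriors" for illustration only.
rng = np.random.default_rng(1)
ep_ra_draws = rng.normal(-10.0, 0.1, size=2000)
allo_draws = np.column_stack([
    np.exp(rng.normal(np.log(200.0), 0.2, size=2000)),  # KA (µM)
    np.exp(rng.normal(np.log(5.0), 0.2, size=2000)),    # KI (µM)
    rng.normal(2.0, 0.5, size=2000),                    # Δε_AI (k_BT)
])
c = np.array([0.0, 1.0, 10.0, 100.0, 1000.0, 5000.0])
lower, upper = predict_double_mutant(c, 260, ep_ra_draws, allo_draws)
```

Drawing the DNA binding energy and the allosteric parameters with independent indices is what encodes the null hypothesis: the two domains are treated as insulated, so any systematic departure of the data from the resulting band would point to epistasis.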

     The precise agreement between the predictions and measurements for Q291K paired with either Q18A or Q18M is striking as Q291K drastically changed ΔεAI in addition to KA and KI. Our ability to predict the induction profile and free energy change underscores the extent to which the DNA binding energy and the allosteric parameters are insulated from one another. Despite this insulation, the repressor still functions as an allosteric molecule, emphasizing that the mutations we have inserted do not alter the pathway of communication between the two domains of the protein. As the double mutant Y17I-Q291K exhibits fold-change of approximately 1 across all IPTG concentrations (Fig. 5 (A)), these mutations in tandem make repression so weak that it is beyond the limits which are detectable by our experiments. As a consequence, we are unable to estimate ΔF nor experimentally verify the corresponding prediction (grey box in Fig. 5 (B)). However, as the predicted fold-change in gene expression is also approximately 1 for all c, we believe that the prediction shown for ΔF is likely accurate. One would be able to infer the ΔF to confirm these predictions using a more sensitive method for measuring the fold-change, such as single-cell microscopy or colorimetric assays.

Figure 5: Induction and free-energy profiles of DNA binding and inducer binding double mutants. (A) Fold change in gene expression for each double mutant as a function of IPTG. Points and errors correspond to the mean and standard error of 6–10 biological replicates. Where not visible, error bars are smaller than the corresponding marker. Shaded regions correspond to the 95% credible region of the prediction given knowledge of the single mutants. These were generated by drawing 10⁴ samples from the ΔεRA posterior distribution of the single DNA binding domain mutants and the joint probability distribution of KA, KI, and ΔεAI from the single inducer binding domain mutants. (B) The difference in free energy of each double mutant as a function of the reference free energy. Points and errors correspond to the median and bounds of the 95% credible region of the posterior distribution for the inferred ΔF. Shaded regions are the predicted change in free energy, generated in the same manner as the shaded lines in (A). All measurements were taken from a strain with 260 repressors per cell paired with a reporter with the native O2 LacI operator sequence. In all plots, the IPTG concentration is shown on a symmetric log axis with linear scaling between 0 and 10⁻² μM and log scaling elsewhere. The Python code (ch3_fig5.py) used to generate this figure can be found on the thesis GitHub repository.

Discussion

Allosteric regulation is often couched as “biological action at a distance.” Despite extensive knowledge of protein structure and function, it remains difficult to translate the coordinates of the atomic constituents of a protein to the precise parameter values which define the functional response, making each mutant its own intellectual adventure. Bioinformatic approaches to understanding the sequence-structure relationship have permitted us to examine how the residues of allosteric proteins evolve, revealing conserved regions which hint at their function. Co-evolving residues reveal sectors of conserved interactions which traverse the protein and act as the allosteric communication channel between domains (Süel et al. 2003; McLaughlin Jr et al. 2012; Reynolds, McLaughlin, and Ranganathan 2011). Elucidating these sectors has advanced our understanding of how distinct domains “talk” to one another and has permitted direct engineering of allosteric responses into non-allosteric enzymes (Poelwijk et al. 2011; Raman, White, and Ranganathan 2016). Even so, we are left without a quantitative understanding of how these admittedly complex networks set the energetic difference between active and inactive states or how a given mutation influences binding affinity. In this context, a biophysical model in which the various parameters are intimately connected to the molecular details can be of use and can lead to quantitative predictions of the interplay between amino-acid identity and system-level response.

     By considering how each parameter contributes to the observed change in free energy, we are able to tease out different classes of parameter perturbations which result in stereotyped responses to changing inducer concentration. These characteristic changes to the free energy can be used as a diagnostic tool to classify mutational effects. For example, we show in Fig. 2 that modulating the inducer binding constants KA and KI results in non-monotonic free energy changes that are dependent on the inducer concentration, a feature observed in the inducer binding mutants examined in this work. Simply looking at the inferred ΔF as a function of inducer concentration, which requires no fitting of the biophysical parameters, indicates that KA and KI must be modified considering those are the only parameters which can generate such a response.

     Another key observation is that a perturbation to only KA and KI requires that the ΔF = 0 kBT at c = 0. Deviations from this condition imply that more than the inducer binding constants must have changed. If this shift in ΔF off of 0 kBT at c = 0 is not constant across all inducer concentrations, we can surmise that the energy difference between the allosteric states ΔεAI must also be modified. We again see this effect for all of our inducer mutants. By examining the inferred ΔF, we can immediately say that in addition to KA and KI, ΔεAI must decrease relative to the wild-type value as ΔF > 0 at c = 0. When the allosteric parameters are fit to the induction profiles, we indeed see that this is the case, with all four mutations decreasing the energy gap between the active and inactive states. Two of these mutations, Q291R and Q291K, make the inactive state of the repressor more stable than the active state, which is not the case for the wild-type repressor (Razo-Mejia et al. 2018).
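The diagnostic logic of the two preceding paragraphs can be made concrete with a small numerical sketch. It assumes the MWC expression for the active probability from Razo-Mejia et al. (2018) with two inducer binding sites, and it assumes that when the operator and repressor copy number are unchanged the free energy shift reduces to ΔF = −log[p_act,mut(c) / p_act,wt(c)] in units of kBT. The two mutant parameter sets are hypothetical and chosen only to illustrate the two classes of behavior.

```python
import numpy as np

def p_act(c, ka, ki, ep_ai, n=2):
    """Probability of the active repressor state (MWC model); c, ka, ki in µM."""
    active = (1 + c / ka) ** n
    inactive = np.exp(-ep_ai) * (1 + c / ki) ** n
    return active / (active + inactive)

def delta_F_allosteric(c, mut, wt):
    """ΔF (k_BT) when only the allosteric parameters (KA, KI, Δε_AI) differ."""
    return -np.log(p_act(c, *mut) / p_act(c, *wt))

wt = (139.0, 0.53, 4.5)       # wild-type KA (µM), KI (µM), Δε_AI (k_BT)
only_k = (650.0, 8.0, 4.5)    # hypothetical: KA and KI changed, Δε_AI unchanged
with_ep = (650.0, 8.0, 1.0)   # hypothetical: Δε_AI reduced as well

c = np.array([0.0, 1.0, 10.0, 100.0, 1000.0, 1e4])  # inducer concentration (µM)
dF_only_k = delta_F_allosteric(c, only_k, wt)
dF_with_ep = delta_F_allosteric(c, with_ep, wt)

print("KA, KI only:    ", np.round(dF_only_k, 2))
print("KA, KI, Δε_AI:  ", np.round(dF_with_ep, 2))
```

With these placeholder values the first profile starts at exactly zero and passes through an interior extremum, whereas the second is already positive at c = 0. These are the two signatures used in the text to decide which parameters a mutation has perturbed.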

      Our formulation of ΔF indicates that shifts away from 0 kBT that are independent of the inducer concentration can only arise from changes to the repressor copy number and/or DNA binding specificity, indicating that the allosteric parameters are untouched. We see that for three mutations in the DNA binding domain, ΔF is the same irrespective of the inducer concentration. Measurements of ΔF for these mutants with repressor copy numbers across three orders of magnitude yield approximately the same value, revealing that ΔεRA is the sole parameter altered via the mutations.

     We note that the conclusions stated above can be qualitatively drawn without resorting to fitting various parameters and measuring the goodness-of-fit. Rather, the distinct behavior of ΔF is sufficient to determine which parameters are changing. Here, these conclusions are quantitatively confirmed by fitting these parameters to the induction profile, which results in accurate predictions of the fold-change and ΔF for nearly every strain across different mutations, repressor copy numbers, and operator sequences, all at different inducer concentrations. With a collection of evidence as to what parameters are changing for single mutations, we put our model to the test and drew predictions of how double mutants would behave both in terms of the titration curve and free energy profile.

     A hypothesis that arises from our formulation of ΔF is that a simple summation of the energetic contribution of each mutation should be sufficient to predict the double mutants (as long as they are in separate domains). We find that such a calculation permits precise and accurate predictions of the double mutant phenotypes, indicating that there are no epistatic interactions between the mutations examined in this work. With an expectation of what the free energy differences should be, epistatic interactions could be understood by looking at how the measurements deviate from the prediction. For example, if epistatic interactions exist which appear as a systematic shift from the predicted ΔF independent of inducer concentration, one could conclude that DNA binding energy is not equal to that of the single mutation in the DNA binding domain alone. Similarly, systematic shifts that are dependent on the inducer concentration (i.e. not constant) indicate that the allosteric parameters must be influenced. If the expected difference in free energy is equal to 0 kBT when c = 0, one could surmise that the modified parameter must not be ΔεAI nor ΔεRA as these would both result in a shift in leakiness, indicating that KA and KI are further modified.

     Ultimately, we present this work as a proof-of-principle for using biophysical models to investigate how mutations influence the response of allosteric systems. We emphasize that such a treatment allows one to boil down the complex phenotypic responses of these systems to a single-parameter description which is easily interpretable as a free energy. The general utility of this approach is illustrated in Fig. 6 where gene expression data from previous work along with all of the measurements presented in this work collapse onto the master curve defined by Eq. 3. While our model coarse grains many of the intricate details of transcriptional regulation into two states (one in which the repressor is bound to the promoter and one where it is not), it is sufficient to describe a swath of regulatory scenarios. As discussed in the supplemental Chapter 7, any architecture in which the transcription-factor bound and transcriptionally active states of the promoter can be separated into two distinct coarse-grained states can be subjected to such an analysis.

     Given enough parametric knowledge of the system, it becomes possible to examine how modifications to the parameters move the physiological response along this reduced one-dimensional parameter space. This approach offers a glimpse at how mutational effects can be described in terms of energy rather than Hill coefficients and arbitrary prefactors. While we have explored a very small region of sequence space in this work, coupling of this approach with high-throughput sequencing-based methods to query a library of mutations within the protein will shed light on the phenotypic landscape centered at the wild-type sequence. Furthermore, pairing libraries of protein and operator sequence mutants will provide insight as to how the protein and regulatory sequence coevolve, a topic rich with opportunity for a dialogue between theory and experiment.

Figure 6: Data collapse of the simple repression regulatory architecture. All data are means of biological replicates. Where present, error bars correspond to the standard error of the mean of five to fifteen biological replicates. Red triangles indicate data from Garcia and Phillips (2011) obtained by colorimetric assays. Blue squares are data from Brewster et al. (2014) acquired from video microscopy. Green circles are data from Razo-Mejia et al. (2018) obtained via flow cytometry. All other symbols correspond to the work presented here. An interactive version of this figure can be found on the website associated with the publication of this chapter where the different data sets can be viewed in more detail. The Python code (ch3_fig6.py) used to generate this figure can be found on the thesis GitHub repository.

Materials & Methods

Bacterial Strains and DNA Constructs

All wild-type strains from which the mutants were derived were generated in previous work from the Phillips group (Garcia and Phillips 2011; Razo-Mejia et al. 2018). Briefly, mutations were first introduced into the lacI gene of our pZS3*1-lacI plasmid (Garcia and Phillips 2011) using a combination of overhang PCR and Gibson assembly as well as QuikChange mutagenesis (Agilent Technologies). The oligonucleotide sequences used to generate each mutant as well as the method are provided in the supplemental Chapter 7.

     For mutants generated through overhang PCR and Gibson assembly, oligonucleotide primers were purchased containing an overhang with the desired mutation and used to amplify the entire plasmid. Using the homology of the primer overhang, Gibson assembly was performed to circularize the DNA prior to electroporation into MG1655 E. coli cells. Integration of LacI mutants was performed with λ Red recombineering as described in Sharan et al. (2009) and Garcia and Phillips (2011).

      The mutants studied in this work were chosen from data reported in Daber, Sochor, and Lewis (2011). In selecting mutations, we looked for mutants which suggested moderate to strong deviations from the behavior of the wild-type repressor. We note that the variant of LacI used in this work has three additional amino acids (Met-Val-Asn) at the N-terminus relative to the canonical LacI sequence reported in Farabaugh (1978). To remain consistent with the field, we have identified the mutations with respect to their positions in the canonical sequence and those in Daber, Sochor, and Lewis (2011). However, their positions in the raw data files correspond to those of our LacI variant and are noted in the README files associated with the data.

Flow Cytometry

All fold-change measurements were performed on a MACSQuant flow cytometer as described in Razo-Mejia et al. (2018). Briefly, saturated overnight cultures 500 μL in volume were grown in deep-well 96-well plates covered with a breathable nylon cover (Lab Pak - Nitex Nylon, Sefar America, Cat. No. 241205). After approximately 12 to 15 hr, the cultures reached saturation and were diluted 1000-fold into a second 2 mL 96-deep-well plate where each well contained 500 μL of M9 minimal media supplemented with 0.5% w/v glucose (anhydrous D-Glucose, Macron Chemicals) and the appropriate concentration of IPTG (Isopropyl β-D-1-thiogalactopyranoside, Dioxane Free, Research Products International). These were sealed with a breathable cover and were allowed to grow for approximately 8 hours until the OD600nm ≈ 0.3. Cells were then diluted ten-fold into a round-bottom 96-well plate (Corning Cat. No. 3365) containing 90 μL of M9 minimal media supplemented with 0.5% w/v glucose along with the corresponding IPTG concentrations.

     The flow cytometer was calibrated prior to use with MACSQuant Calibration Beads (Cat. No. 130-093-607). During measurement, the cultures were held at approximately 4 °C by placing the 96-well plate on a MACSQuant ice block. All fluorescence measurements were made using a 488 nm excitation wavelength with a 525/50 nm emission filter. The photomultiplier tube voltage settings for the instrument are the same as those used in Razo-Mejia et al. (2018), and are listed in supplemental Chapter 6.

     The data were processed using an automatic unsupervised gating procedure based on the forward and side-scattering values, where we fit a two-dimensional Gaussian function to the log10 forward-scattering (FSC) and the log10 side-scattering (SSC) data. Here we assume that the region with highest density of points in these two channels corresponds to single-cell measurements and consider data points that fall within 40% of the highest density region of the two-dimensional Gaussian function. We direct the reader to Razo-Mejia et al. (2018) and supplemental Chapter 6 for further detail and comparison of flow cytometry with single-cell microscopy.
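One way to realize such a gate is sketched below. It is an interpretation rather than the repository's implementation: "within 40% of the highest density region" is assumed to mean the contour of the fitted bivariate Gaussian that encloses 40% of the probability mass. The function name and the synthetic scatter data are hypothetical. The threshold uses the fact that the squared Mahalanobis distance of a bivariate normal follows a chi-square distribution with two degrees of freedom, whose quantile function is −2 ln(1 − α).

```python
import numpy as np

def gaussian_gate(fsc, ssc, alpha=0.40):
    """Boolean mask of events inside the alpha-probability contour of a
    two-dimensional Gaussian fit to log10(FSC) and log10(SSC)."""
    x = np.log10(np.column_stack([fsc, ssc]))
    mu = x.mean(axis=0)              # fitted mean
    cov = np.cov(x, rowvar=False)    # fitted covariance
    diff = x - mu
    # Squared Mahalanobis distance of every event from the fitted mean.
    d2 = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)
    # Chi-square (2 dof) quantile in closed form: CDF(d2) = 1 - exp(-d2 / 2).
    threshold = -2.0 * np.log(1.0 - alpha)
    return d2 <= threshold

# Synthetic scatter values for illustration only.
rng = np.random.default_rng(0)
fsc = 10 ** rng.normal(4.0, 0.2, size=20_000)
ssc = 10 ** (0.5 * np.log10(fsc) + rng.normal(2.0, 0.1, size=20_000))
mask = gaussian_gate(fsc, ssc)
print(f"fraction of events retained: {mask.mean():.2f}")
```

Because the gate is defined by a probability contour rather than fixed scatter bounds, it adapts to day-to-day shifts in the scatter distribution without manual intervention.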

Bayesian Parameter Estimation

We used a Bayesian definition of probability in the statistical analysis of all mutants in this work. In supplemental Chapter 7, we derive in detail the statistical models used for the various parameters as well as multiple diagnostic tests. Here, we give a generic description of our approach. To be succinct in notation, we consider a generic parameter θ which represents ΔεRA, KA, KI, and/or ΔεAI depending on the specific LacI mutant.

     As prescribed by Bayes’ theorem, we are interested in the posterior probability distribution
g(θ | y) ∝ f(y | θ)g(θ),   (7)
where we use g and f to represent probability densities over parameters and data, respectively, and y to represent a set of fold-change measurements. The likelihood of observing our dataset y given a value of θ is captured by f(y | θ). All prior information we have about the possible values of θ is described by g(θ).

     In all inferential models used in this work, we assumed that all experimental measurements at a given inducer concentration were normally distributed about a mean value μ dictated by Eq. 1 with a variance σ²,
$$ f(y\,\vert\, \theta) = {1 \over (2\pi\sigma^2)^{N/2}}\prod\limits_i^N \exp\left[-{(y_i - \mu(\theta))^2 \over 2\sigma^2}\right], \qquad(8)$$
where N is the number of measurements in the data set y.

     This choice of likelihood is justified as each individual measurement at a given inducer concentration is a biological replicate and independent of all other experiments. By using a Gaussian likelihood, we introduce another parameter σ. As σ must be positive, we define as a prior distribution a half-normal distribution with a standard deviation ϕ,
$$ g(\sigma) = { {1 \over \phi}\sqrt{2 \over \pi} }\exp\left[-{x^2 \over 2\phi^2}\right]\,;\, x \geq 0, \qquad(9)$$
where x is a given range of values for σ. A standard deviation of ϕ = 0.1 was chosen given our knowledge of the scale of our measurement error from other experiments. As the absolute measurement of fold-change is restricted between 0 and 1, and given our knowledge of the sensitivity of the experiment, it is reasonable to assume that the error will be closer to 0 than to 1. Further justification of this choice of prior through simulation-based methods is given in the supplemental Chapter 7. The prior distribution for θ is dependent on the parameter and its associated physical and physiological restrictions. Detailed discussion of our chosen prior distributions for each model can also be found in the supplemental Chapter 7.
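For readers who prefer code to notation, the pieces of this posterior can be written out directly. The following is a minimal NumPy sketch of the Gaussian log-likelihood and the standard half-normal log-prior on σ; it is not the Stan model used for the actual inference. The function names are hypothetical, `mu` stands for the fold-change predicted by Eq. 1 for a given θ, and the prior on θ is left as a user-supplied log-density because it differs between models.

```python
import numpy as np

def log_likelihood(y, mu, sigma):
    """Gaussian log-likelihood of N independent measurements y about mean mu."""
    y = np.asarray(y, dtype=float)
    n = y.size
    return (-0.5 * n * np.log(2 * np.pi * sigma**2)
            - np.sum((y - mu) ** 2) / (2 * sigma**2))

def log_prior_sigma(sigma, phi=0.1):
    """Half-normal log-prior on sigma with scale phi (support sigma >= 0)."""
    if sigma < 0:
        return -np.inf
    return 0.5 * np.log(2 / np.pi) - np.log(phi) - sigma**2 / (2 * phi**2)

def log_posterior(y, mu, sigma, log_prior_theta=0.0):
    """Unnormalized log-posterior; log_prior_theta is log g(θ) evaluated by the caller."""
    if sigma <= 0:
        return -np.inf
    return log_prior_theta + log_prior_sigma(sigma) + log_likelihood(y, mu, sigma)

# Hypothetical replicate fold-change measurements at one inducer concentration.
y_obs = np.array([0.21, 0.19, 0.24, 0.20])
print(log_posterior(y_obs, mu=0.2, sigma=0.02))
```

Working in log space avoids numerical underflow when many replicates are multiplied together, which is also how Hamiltonian Monte Carlo samplers evaluate the target density.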

     All statistical modeling and parameter inference was performed using Markov chain Monte Carlo (MCMC). Specifically, Hamiltonian Monte Carlo sampling was used as is implemented in the Stan probabilistic programming language (Carpenter et al. 2017). All statistical models are saved as .stan files and can be accessed at the GitHub repository associated with this work (DOI: 10.5281/zenodo.2721798) or can be downloaded directly from the website associated with the publication this chapter is based on.

Inference of Free Energy From Fold-Change Data

While the fold-change in gene expression is restricted to be between 0 and 1, experimental noise can generate fold-change measurements beyond these bounds. To determine the free energy for a given set of fold-change measurements (for one unique strain at a single inducer concentration), we modeled the observed fold-change measurements as being drawn from a normal distribution with a mean μ and standard deviation σ. Using Bayes’ theorem, we can write the posterior distribution as
$$ g(\mu, \sigma\,\vert y) \propto g(\mu)g(\sigma){1\over(2\pi\sigma^2)^{N/2}}\prod\limits_i^N \exp\left[{-(y_i - \mu)^2 \over 2\sigma^2}\right] \qquad(10)$$
where y is a collection of fold-change measurements. The prior distribution for μ was chosen to be uniform between 0 and 1 while the prior on σ was chosen to be half normal, as written in Eq. 9. The posterior distribution was sampled independently for each set of fold-change measurements using MCMC. The .stan model for this inference is available on the paper website.

     For each MCMC sample of μ, the free energy was calculated as
$$ F = -\log\left(\mu^{-1} - 1\right), \qquad(11)$$
which is simply the rearrangement of Eq. 3. Using simulated data, we determined that when μ < σ or (1 − μ) < σ, the mean fold-change in gene expression was over or underestimated for the lower and upper limit, respectively. This means that there are maximum and minimum levels of fold-changes that can be detected using flow cytometry which are set by the distribution of fold-change measurements resulting from various sources of day-to-day variation. This results in a systematic error in the calculation of the free energy, making proper inference beyond these limits difficult. This bounds the range in which we can confidently infer this quantity with flow cytometry. We hypothesize that more sensitive methods, such as single cell microscopy, colorimetric assays, or direct counting of mRNA transcripts via Fluorescence In Situ Hybridization (FISH) would improve the measurement of ΔF. We further discuss details of this limitation in the supplemental Chapter 7.
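The transformation from sampled mean fold-change to free energy, together with the detection limits just described, can be sketched as follows. The function names are hypothetical, the MCMC samples are synthetic placeholders, and applying the μ < σ and (1 − μ) < σ criteria to the posterior medians is an assumption made here for illustration.

```python
import numpy as np

def free_energy_from_mu(mu_samples):
    """Free energy (k_BT) for each sample of the mean fold-change, F = -log(1/mu - 1)."""
    mu = np.asarray(mu_samples, dtype=float)
    return -np.log(1.0 / mu - 1.0)

def summarize_free_energy(mu_samples, sigma_samples):
    """Median and 95% credible bounds of F, plus a flag for the detection limits."""
    F = free_energy_from_mu(mu_samples)
    lower, median, upper = np.percentile(F, [2.5, 50, 97.5])
    mu_med, sigma_med = np.median(mu_samples), np.median(sigma_samples)
    # Assumed check: inference is trusted only when mu is at least sigma away
    # from both 0 and 1.
    within_limits = bool((mu_med > sigma_med) and ((1 - mu_med) > sigma_med))
    return {"median": median, "lower": lower, "upper": upper,
            "within_limits": within_limits}

# Synthetic placeholder samples standing in for MCMC output.
rng = np.random.default_rng(0)
mu_samples = np.clip(rng.normal(0.30, 0.02, size=4000), 1e-6, 1 - 1e-6)
sigma_samples = np.abs(rng.normal(0.05, 0.01, size=4000))
print(summarize_free_energy(mu_samples, sigma_samples))
```

Since F diverges as μ approaches 0 or 1, small errors in the inferred mean near either bound translate into large errors in the free energy, which is why the limits matter.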

Data and Code Availability

All data was collected, stored, and preserved using the Git version control software. Code for data processing, analysis, and figure generation is available on the [GitHub repository](https://www.github.com/rpgroup-pboc/mwc_mutants) or can be accessed via the paper website. Raw flow cytometry data is stored on the CaltechDATA data repository and can be accessed via DOI 10.22002/D1.1241.

References

Ackers, Gary K, and Alexander D Johnson. 1982. “Quantitative Model for Gene Regulation by λ Phage Repressor.” Proceedings of the National Academy of Sciences 79: 1129–33.

Barnes, Stephanie L., Nathan M. Belliveau, William T. Ireland, Justin B. Kinney, and Rob Phillips. 2019. “Mapping DNA Sequence to Transcription Factor Binding Energy in Vivo.” Edited by Gary D. Stormo. PLOS Computational Biology 15 (2). https://doi.org/10.1371/journal.pcbi.1006226.

Bintu, Lacramioara, Nicolas E Buchler, Hernan G Garcia, Ulrich Gerland, Terence Hwa, Jané Kondev, Thomas Kuhlman, and Rob Phillips. 2005. “Transcriptional Regulation by the Numbers: Applications.” Current Opinion in Genetics & Development 15 (2): 125–35. https://doi.org/10.1016/j.gde.2005.02.006.

Bintu, Lacramioara, Nicolas E Buchler, Hernan G Garcia, Ulrich Gerland, Terence Hwa, Jané Kondev, and Rob Phillips. 2005. “Transcriptional Regulation by the Numbers: Models.” Current Opinion in Genetics & Development 15 (2): 116–24. https://doi.org/10.1016/j.gde.2005.02.007.

Brewster, Robert C., Franz M. Weinert, Hernan G. Garcia, Dan Song, Mattias Rydenfelt, and Rob Phillips. 2014. “The Transcription Factor Titration Effect Dictates the Level of Gene Expression.” Cell 156 (6): 1312–23. https://doi.org/10.1016/j.cell.2014.02.022.

Buchler, N. E., U. Gerland, and T. Hwa. 2003. “On Schemes of Combinatorial Transcription Logic.” Proceedings of the National Academy of Sciences 100 (9): 5136–41. https://doi.org/10.1073/pnas.0930314100.

Carpenter, Bob, Andrew Gelman, Matthew D. Hoffman, Daniel Lee, Ben Goodrich, Michael Betancourt, Marcus Brubaker, Jiqiang Guo, Peter Li, and Allen Riddell. 2017. “Stan: A Probabilistic Programming Language.” Journal of Statistical Software 76 (1): 1–32. https://doi.org/10.18637/jss.v076.i01.

Daber, Robert, Kim Sharp, and Mitchell Lewis. 2009. “One Is Not Enough.” Journal of Molecular Biology 392 (5): 1133–44. https://doi.org/10.1016/j.jmb.2009.07.050.

Daber, Robert, Matthew A. Sochor, and Mitchell Lewis. 2011. “Thermodynamic Analysis of Mutant Lac Repressors.” Journal of Molecular Biology, The Operon Model and its Impact on Modern Molecular Biology, 409 (1): 76–87. https://doi.org/10.1016/j.jmb.2011.03.057.

Farabaugh, Phillip J. 1978. “Sequence of the lacI Gene.” Nature 274 (24): 5.

Frumkin, Idan, Marc J. Lajoie, Christopher J. Gregg, Gil Hornung, George M. Church, and Yitzhak Pilpel. 2018. “Codon Usage of Highly Expressed Genes Affects Proteome-Wide Translation Efficiency.” Proceedings of the National Academy of Sciences 115 (21): E4940–E4949. https://doi.org/10.1073/pnas.1719375115.

Garcia, H. G., and R. Phillips. 2011. “Quantitative Dissection of the Simple Repression Input-Output Function.” Proceedings of the National Academy of Sciences 108 (29): 12173–8. https://doi.org/10.1073/pnas.1015616108.

Keymer, Juan E., Robert G. Endres, Monica Skoge, Yigal Meir, and Ned S. Wingreen. 2006. “Chemosensing in Escherichia Coli: Two Regimes of Two-State Receptors.” Proceedings of the National Academy of Sciences 103 (6): 1786–91. https://doi.org/10.1073/pnas.0507438103.

Kuhlman, T., Z. Zhang, M. H. Saier, and T. Hwa. 2007. “Combinatorial Transcriptional Control of the Lactose Operon of Escherichia Coli.” Proceedings of the National Academy of Sciences 104 (14): 6043–8. https://doi.org/10.1073/pnas.0606717104.

Lewis, Mitchell, Geoffrey Chang, Nancy C. Horton, Michele A. Kercher, Helen C. Pace, Maria A. Schumacher, Richard G. Brennan, and Ponzy Lu. 1996. “Crystal Structure of the Lactose Operon Repressor and Its Complexes with DNA and Inducer.” Science 271 (5253): 1247–54.

McLaughlin Jr, Richard N., Frank J. Poelwijk, Arjun Raman, Walraj S. Gosal, and Rama Ranganathan. 2012. “The Spatial Architecture of Protein Function and Adaptation.” Nature 491 (7422): 138–42. https://doi.org/10.1038/nature11500.

Phillips, Rob. 2015. “Napoleon Is in Equilibrium.” Annual Review of Condensed Matter Physics 6 (1): 85–111. https://doi.org/10.1146/annurev-conmatphys-031214-014558.

Poelwijk, Frank J., Philip D. Heyning, Marjon G. J. de Vos, Daniel J. Kiviet, and Sander J. Tans. 2011. “Optimality and Evolution of Transcriptionally Regulated Gene Expression.” BMC Systems Biology 5: 128. https://doi.org/10.1186/1752-0509-5-128.

Raman, Arjun S., K. Ian White, and Rama Ranganathan. 2016. “Origins of Allostery and Evolvability in Proteins: A Case Study.” Cell 166 (2): 468–80. https://doi.org/10.1016/j.cell.2016.05.047.

Razo-Mejia, Manuel, Stephanie L. Barnes, Nathan M. Belliveau, Griffin Chure, Tal Einav, Mitchell Lewis, and Rob Phillips. 2018. “Tuning Transcriptional Regulation Through Signaling: A Predictive Theory of Allosteric Induction.” Cell Systems 6 (4): 456–469.e10. https://doi.org/10.1016/j.cels.2018.02.004.

Razo-Mejia, M., J. Q. Boedicker, D. Jones, A. DeLuna, J. B. Kinney, and R. Phillips. 2014. “Comparison of the Theoretical and Real-World Evolutionary Potential of a Genetic Circuit.” Physical Biology 11 (2): 026005. https://doi.org/10.1088/1478-3975/11/2/026005.

Reynolds, Kimberly A., Richard N. McLaughlin, and Rama Ranganathan. 2011. “Hot Spots for Allosteric Regulation on Protein Surfaces.” Cell 147 (7): 1564–75. https://doi.org/10.1016/j.cell.2011.10.049.

Rydenfelt, Mattias, Robert Sidney Cox, Hernan Garcia, and Rob Phillips. 2014. “Statistical Mechanical Model of Coupled Transcription from Multiple Promoters Due to Transcription Factor Titration.” Physical Review E 89 (1): 012702. https://doi.org/10.1103/PhysRevE.89.012702.

Sharan, S. K., L. C. Thomason, S. G. Kuznetsov, and D. L. Court. 2009. “Recombineering: A Homologous Recombination-Based Method of Genetic Engineering.” Nature Protocols 4: 206–23. https://doi.org/10.1038/nprot.2008.227.

Süel, Gürol M., Steve W. Lockless, Mark A. Wall, and Rama Ranganathan. 2003. “Evolutionarily Conserved Networks of Residues Mediate Allosteric Communication in Proteins.” Nature Structural Biology 10 (1): 59–69. https://doi.org/10.1038/nsb881.

Swem, Lee R., Danielle L. Swem, Ned S. Wingreen, and Bonnie L. Bassler. 2008. “Deducing Receptor Signaling Parameters from in vivo Analysis: LuxN/AI-1 Quorum Sensing in Vibrio harveyi.” Cell 134 (3): 461–73. https://doi.org/10.1016/j.cell.2008.06.023.

Vilar, José M. G., and Stanislas Leibler. 2003. “DNA Looping and Physical Constraints on Transcription Regulation.” Journal of Molecular Biology 331 (5): 981–89. https://doi.org/10.1016/S0022-2836(03)00764-2.

Weinert, Franz M., Robert C. Brewster, Mattias Rydenfelt, Rob Phillips, and Willem K. Kegel. 2014. “Scaling of Gene Expression with Transcription-Factor Fugacity.” Physical Review Letters 113 (25): 258101. https://doi.org/10.1103/PhysRevLett.113.258101.