Chapter VI

Published as …

A version of this chapter originally appeared as Razo-Mejia, M., Barnes, S.L., Belliveau, N.M., Chure, G., Einav, T.*, Lewis, M., and Phillips, R. (2018). Tuning transcriptional regulation through signaling: A predictive theory of allosteric induction. Cell Systems 6, 456-469.e10. DOI:https://doi.org/10.1016/j.cels.2018.02.004. M.R.M, S.L.B, N.M.B, G.C., and T.E. contributed equally to this work from the theoretical underpinnings to the experimental design and execution. M.R.M, S.L.B, N.M.B, G.C, T.E., and R.P. wrote the paper. M.L. provided extensive guidance and advice.

Inferring Allosteric Parameters from Previous Data

The fold-change function derived in Chapter 2 features three unknown parameters K_A, K_I, and Δε_AI. In this section, we explore different conceptual approaches to determining these parameters. We first discuss how the induction titration profile of the simple repression constructs used in this paper are not sufficient to determine all three MWC parameters simultaneously, since multiple degenerate sets of parameters can produce the same fold-change response. We then utilize an additional data set from Brewster et al. (2014) to determine the parameter Δε_AI = 4.5 k_BT, after which the remaining parameters K_A and K_I can be extracted from any induction profile with no further degeneracy.

Degenerate Parameter Values

In this section, we discuss how multiple sets of parameters may yield identical fold-change profiles. More precisely, we show that if we try to fit the data into the fold-change and extract the three unknown parameters (K_A, K_I, and Δε_AI), then multiple degenerate parameter sets would yield equally good fits. In other words, this data set alone is insufficient to uniquely determine the actual physical parameter values of the system. This problem persists even when fitting multiple data sets simultaneously as illustrated later in this chapter.

In Fig. 1 we fit the R = 260 data by fixing Δε_AI to the value shown on the x-axis and determine the parameters K_A and K_I given this constraint. We use the fold-change function, but with βΔε_RA modified to the form βΔε̃_RA to account for the underlying assumptions used when fitting previous data (as is defined in the following section).

The best-fit curves for several different values of Δε_AI are shown in Fig. 1 (B). Note that these fold-change curves are nearly overlapping, demonstrating that different sets of parameters can yield nearly equivalent responses. Without more data, the relationships between the parameter values represent the maximum information about the parameter values that can be extracted from the data. Additional experiments which independently measure any of these unknown parameters could resolve this degeneracy. For example, NMR measurements could be used to directly measure the fraction (1 + e^{− βΔε_AI})^− 1 of active repressors in the absence of IPTG (Gardino et al. 2003; Boulton and Melacini 2016).

Figure 1: **Multiple sets of parameters yield identical fold-change responses.** (A) The data for the O2 strain (Δε_RA = − 13.9 k_BT) with R = 260 Fig. 2.5(C) was fit using Eq. 2.5 with n = 2. The allosteric energy difference Δε_AI is forced to take on the value shown on the x−axis, while K_A and K_I are fit freely. (B) The resulting best-fit functions for several values of Δε_AI yield nearly identical fold-change responses. The Python code (`ch6_figS1.py`) used to generate this figure can be found on the thesis GitHub repository.

Computing Δε_AI

As shown in the previous section, the fold-change response of a single strain is not sufficient to determine the three MWC parameters (K_A, K_I, and Δε_AI), since degenerate sets of parameters yield nearly identical fold-change responses. To circumvent this degeneracy, we now turn to some previous data from the lac system in order to determine the value of Δε_AI for the induction of the Lac repressor. Specifically, we consider two previous sets of work from (i) Garcia et al. (2011 a) and (ii) Brewster et al. (2014), both of which measured fold-change with the same simple repression system in the absence of inducer (c = 0), but at various repressor copy numbers R. The original analysis for both data sets assumed that in the absence of inducer, all of the Lac repressors were in the active state. As a result, the effective binding energies they extracted were a convolution of the DNA binding energy Δε_RA and the allosteric energy difference Δε_AI between the Lac repressor’s active and inactive states. We refer to this convoluted energy value as Δε̃_RA. We first disentangle the relationship between these parameters in Garcia and Phillips and then use this relationship to extract the value of Δε_AI from Brewster et al. (2014).

Garcia and Phillips determined the total repressor copy numbers R of different strains using quantitative Western blots. Then they measured the fold-change at these repressor copy numbers for simple repression constructs carrying the O1, O2, O3, and Oid lac operators integrated into the chromosome. These data were then fit to the following thermodynamic model to determine the repressor-DNA binding energies Δε̃_RA for each operator,
$$ \text{fold-change}(c=0) = \left( 1+\frac{R}{N_{NS}}e^{-\beta \Delta\tilde{\varepsilon}_{RA}} \right)^{-1}. \qquad(1)$$
Note that this functional form does not exactly match our fold-change in the limit c = 0,
$$ \text{fold-change}(c=0) = \left( 1+\frac{1}{1+e^{-\beta \Delta \varepsilon_{AI}}}\frac{R}{N_{NS}}e^{-\beta \Delta\varepsilon_{RA}} \right)^{-1}, \qquad(2)$$
since it is missing the factor $\frac{1}{1+e^{-\beta \Delta\varepsilon_{AI}}}$ which specifies what fraction of repressors are in the active state in the absence of inducer,
$$ \frac{1}{1+e^{-\beta \Delta\varepsilon_{AI}}} = p_A(0). \qquad(3)$$
In other words, Garcia et al. (2011 a) assumed that in the absence of inducer, all repressors were active. In terms of our notation, the convoluted energy values Δε̃_RA extracted by Garcia and Phillips (namely, Δε̃_RA = − 15.3 k_BT for O1 and Δε̃_RA = − 17.0 k_BT for Oid) represent
$$ \beta \Delta\tilde{\varepsilon}_{RA} = \beta \Delta\varepsilon_{RA} - \log \left( \frac{1}{1 + e^{-\beta \Delta \varepsilon_{AI}}} \right). \qquad(4)$$
Note that if e^{− βΔε_AI} ≪ 1, then nearly all of the repressors are active in the absence of inducer so that Δε̃_RA ≈ Δε_RA. In simple repression systems where we definitively know the value of Δε_RA and R, we can use Eq. 4 to determine the value of Δε_AI by comparing with experimentally determined fold-change values. However, the binding energy values that we use from Garcia et al. (2011 a) are effective parameters Δε̃_RA. In this case, we are faced with an undetermined system in which we have more variables than equations, and we are thus unable to determine the value of Δε_AI. In order to obtain this parameter, we must turn to a more complex regulatory scenario which provides additional constraints that allow us to fit for Δε_AI.

A variation on simple repression in which multiple copies of the promoter are available for repressor binding (for instance, when the simple repression construct is on plasmid) can be used to circumvent the problems that arise when using Δε̃_RA. This is because the behavior of the system is distinctly different when the number of active repressors p_A(0)R is less than or greater than the number of available promoters N. Repression data for plasmids with known copy number N allows us to perform a fit for the value of Δε_AI.

To obtain an expression for a system with multiple promoters N, we follow Weinert et al. (2014), writing the fold-change in terms of the the grand canonical ensemble as
$$ \text{fold-change} = \frac{1}{1 + \lambda_r e^{-\beta \Delta \varepsilon_{RA}}}, \qquad(5)$$
where λ_r = e^βμ is the fugacity and μ is the chemical potential of the repressor. The fugacity will enable us to easily enumerate the possible states available to the repressor.

To determine the value of λ_r, we first consider that the total number of repressors in the system, R_tot, is fixed and given by
R_tot = R_S + R_NS, (6)

where R_S represents the number of repressors specifically bound to the promoter and R_NS represents the number of repressors nonspecifically bound throughout the genome. The value of R_S is given by
$$ R_S = N \frac{\lambda_r e^{-\beta \Delta \varepsilon_{RA}}}{1 + \lambda_r e^{-\beta \Delta \varepsilon_{RA}}}, \qquad(7)$$

where N is the number of available promoters in the cell. Note that in counting N, we do not distinguish between promoters that are on plasmid or chromosomaly integrated provided that they both have the same repressor-operator binding energy (Weinert et al. 2014). The value of R_NS is similarly given by
$$ R_{NS} = N_{NS} \frac{\lambda_r}{1 + \lambda_r}, \qquad(8)$$
where N_NS is the number of non-specific sites in the cell (recall that we use N_NS = 4.6 × 10⁶ for E. coli). Substitution yields the form
$$ p_A(0) R_{\text{tot}} = \frac{1}{1 + e^{-\beta \Delta \varepsilon_{AI}}} \left( N \frac{\lambda_r e^{-\beta \Delta \varepsilon_{RA}}}{1 + \lambda_r e^{-\beta \Delta \varepsilon_{RA}}} + N_{NS} \frac{\lambda_r}{1 + \lambda_r} \right), \qquad(9)$$
where we recall from Eq. 4 that $\beta \Delta \varepsilon_{RA} = \beta \Delta \tilde\varepsilon_{RA} + \log{\left(\frac{1}{1 + e^{-\beta \Delta \varepsilon_{AI}}}\right)}.$ Numerically solving for λ_r yields a fold-change function in which the only unknown parameter is Δε_AI.

With these calculations in hand, we can now determine the value of the Δε_AI parameter. Fig. 2 shows how different values of Δε_AI lead to significantly different fold-change response curves. Thus, analyzing the specific fold-change response of any strain with a known plasmid copy number N will fix Δε_AI. Interestingly, the inflection point occurs near p_A(0)R_tot = N (as shown by the triangles in Fig. 2), so that merely knowing where the fold-change response transitions from concave down to concave up is sufficient to obtain a rough value for Δε_AI. We note, however, that for Δε_AI ≥ 5 k_BT, increasing Δε_AI further does not affect the fold-change because essentially every repressor will be in the active state in this regime. Thus, if the Δε_AI is in this regime, we can only bound it from below.

We now analyze experimental induction data for different strains with known plasmid copy numbers to determine Δε_AI. Fig. 2 (B) shows experimental measurements of fold-change for two O1 promoters with N = 64 and N = 52 copy numbers and one Oid promoter with N = 10 from Brewster et al. (2014). By fitting these data to Eq. 5, we extracted the parameter value Δε_AI = 4.5 k_BT. Substituting this value into Eq. 3 shows that 99% of the repressors are in the active state in the absence of inducer and Δε̃_RA ≈ Δε_RA, so that all of the previous energies and calculations made by Garcia et al. (2011 a) and Brewster et al. (2014) were accurate.

Figure 2: **Fold-change in gene expression for multiple identical promoters.** (A) In the presence of N = 10 identical promoters, the fold-change depends strongly on the allosteric energy difference Δε_AI between the Lac repressor’s active and inactive states. The vertical dotted lines represent the number of repressors at which R_A = N for each value of Δε_AI. (B) Using fold–change measurements from Brewster et al. (2014) for the operators and gene copy numbers shown, we can determine the most likely value Δε_AI = 4.5 k_BT for LacI. The Python code (`ch6_figS2.py`) used to generate this figure can be found on the thesis GitHub repository.

Induction of Simple Repression with Multiple Promoters or Competitor Sites

We made the choice to perform all of our experiments using strains in which a single copy of our simple repression construct had been integrated into the chromosome. This stands in contrast with the methods used by a number of other studies (Oehler et al. 1994; Setty et al. 2003; Daber, Sharp, and Lewis 2009; Daber, Sochor, and Lewis 2011; Vilar and Saiz 2013; Shis et al. 2014; Sochor 2014), in which reporter constructs are placed on plasmid, meaning that the number of constructs in the cell is not precisely known. It is also common to express repressor on plasmid to boost its copy number, which results in an uncertain value for repressor copy number. Here, we show that our treatment of the MWC model has broad predictive power beyond the single-promoter scenario we explore experimentally, and indeed can account for systems in which multiple promoters compete for the repressor of interest. Additionally, we demonstrate the importance of having precise control over these parameters, as they can have a significant effect on the induction profile.

Chemical Potential Formulation to Calculate Fold-Change

In this section, we discuss a simple repression construct which we generalize in two ways from the scenario discussed in Chapter 2. First, we will allow the repressor to bind to N_S identical specific promoters whose fold-change we are interested in measuring, with each promoter containing a single repressor binding site (N_S = 1 in Chapter 2). Second, we consider N_C identical competitor sites which do not regulate the promoter of interest, but whose binding energies are substantially stronger than non-specific binding (N_C = 0 in Chapter 2). As in Chapter 2, we assume that the rest of the genome contains N_NS non-specific binding sites for the repressor. Using the formalism described in the previous section, we can write the fold-change in the grand canonical ensemble as
$$ \text{fold-change} = \frac{1}{1 + \lambda_r e^{-\beta \Delta \varepsilon_{RA}}}, \qquad(10)$$
where λ_r is the fugacity of the repressor and Δε_RA represents the energy difference between the repressor’s binding affinity to the specific operator of interest relative to the repressor’s non-specific binding affinity to the rest of the genome.

We now expand our definition of the total number of repressors in the system, R_tot, so that it is given by
R_tot = R_S + R_NS + R_C, (11)
where R_S, R_NS, and R_C represent the number of repressors bound to the specific promoter, a non-specific binding site, or to a competitor binding site, respectively. The value of R_S is given by
$$ R_S = N_S \frac{\lambda_r e^{-\beta \Delta \varepsilon_{RA}}}{1 + \lambda_r e^{-\beta \Delta \varepsilon_{RA}}}, \qquad(12)$$
where N_S is the number of specific binding sites in the cell. The value of R_NS is similarly given by
$$ R_{NS} = N_{NS} \frac{\lambda_r}{1 + \lambda_r}, \qquad(13)$$
where N_NS is the number of non-specific sites in the cell (recall that we use N_NS = 4.6 × 10⁶ for E. coli), and R_C is given by
$$ R_C = N_C \frac{\lambda_r e^{-\beta \Delta \varepsilon_C}}{1 + \lambda_r e^{-\beta \Delta \varepsilon_C}}, \qquad(14)$$
where N_C is the number of competitor sites in the cell and Δε_C is the binding energy of the repressor to the competitor site relative to its non-specific binding energy to the rest of the genome.

To account for the induction of the repressor, we replace the total number of repressors R_tot in Eq. 11 by the number of active repressors in the cell, p_act(c)R_tot. Here, p_act denotes the probability that the repressor is in the active state (Eq. 2.4),
$$ p_\text{act}(c)=\frac{\left(1+\frac{c}{K_A}\right)^n}{\left(1+\frac{c}{K_A}\right)^n+e^{-\beta \Delta \varepsilon_{AI} }\left(1+\frac{c}{K_I}\right)^n}. \qquad(15)$$
Substituting Eq. 12 into the modified Eq. 11 yields
$$ p_\text{active}(c) R_{\text{tot}} = N_S \frac{\lambda_r e^{-\beta \Delta \varepsilon_{RA}}}{1 + \lambda_r e^{-\beta \Delta \varepsilon_{RA}}} + N_{NS} \frac{\lambda_r}{1 + \lambda_r} + N_C \frac{\lambda_r e^{-\beta \Delta \varepsilon_C}}{1 + \lambda_r e^{-\beta \Delta \varepsilon_C}}. \qquad(16)$$
For systems where the number of binding sites N_S, N_NS, and N_C are known, together with the binding affinities Δε_RA and Δε_C, we can solve numerically for λ_r and substitute to obtain a fold-change at any concentration of inducer c. In the following sections, we will theoretically explore the induction curves given by Eq. 16 for a number of different combinations of simple repression binding sites, thereby predicting how the system would behave if additional specific or competitor binding sites were introduced.

Variable Repressor Copy Number (R) with Multiple Specific Binding Sites (N_S > 1)

In Chapter 2, we consider the induction profiles of strains with varying R but a single, specific binding site N_S = 1. Here, we predict the induction profiles for similar strains in which R is varied, but N_S > 1, as shown in Fig. 3. The top row shows induction profiles in which N_S = 10 and the bottom row shows profiles in which N_S = 100, assuming three different choices for the specific operator binding sites given by the O1, O2, and O3 operators. These values of N_S were chosen to mimic the common scenario in which a promoter construct is placed on either a low or high copy number plasmid. A few features stand out in these profiles. First, as the magnitude of N_S surpasses the number of repressors R, the leakiness begins to increase significantly, since there are no longer enough repressors to regulate all copies of the promoter of interest. Second, in the cases where Δε_RA = − 15.3 k_BT for the O1 operator or Δε_RA = − 13.9 k_BT for the O2 operator, the profiles where N_S = 100 are notably sharper than the profiles where N_S = 10, and it is possible to achieve dynamic ranges approaching 1. Finally, it is interesting to note that the profiles for the O3 operator where Δε_RA = − 9.7 k_BT are nearly indifferent to the value of N_S.

Figure 3: **Induction with variable R and multiple specific binding sites.** Induction profiles are shown for strains with variable R and Δε_RA = − 15.3, − 13.9, or − 9.7 k_BT. The number of specific sites, N_S, is held constant at 10 as R and Δε_RA are varied. N_S is held constant at 100 as R and Δε_RA are varied. These situations mimic the common scenario in which a promoter construct is placed on either a low or high copy number plasmid. The Python code (`ch6_figS3.py`) used to generate this figure can be found on the thesis GitHub repository.

Variable Number of Specific Binding Sites N_S with Fixed Repressor Copy Number (R)

The second set of scenarios we consider is the case in which the repressor copy number R = 260 is held constant while the number of specific promoters N_S is varied (see Fig. 4). Again, we see that leakiness is increased significantly when N_S > R, though all profiles for Δε_RA = − 9.7 k_BT exhibit high leakiness, making the effect less dramatic for this operator. Additionally, we find again that adjusting the number of specific sites can produce induction profiles with maximal dynamic ranges. In particular, the O1 and O2 profiles with Δε_RA = − 15.3 and − 13.9 k_BT, respectively, have dynamic ranges approaching 1 for N_S = 50 and 100.

Figure 4: **Induction with variable specific sites and fixed R.** Induction profiles are shown for strains with R = 260 and Δε_RA = − 15.3 k_BT, Δε_RA = − 13.9 k_BT, or Δε_RA = − 9.7 k_BT. The number of specific sites N_S is varied from 1 to 500. The Python code (`ch6_figS4.py`) used to generate this figure can be found on the thesis GitHub repository.

Competitor Binding Sites

An intriguing scenario is presented by the possibility of competitor sites elsewhere in the genome. This serves as a model for situations in which a promoter of interest is regulated by a transcription factor that has multiple targets. This is highly relevant, as the majority of transcription factors in E. coli have at least two known binding sites, with approximately 50 transcription factors having more than ten known binding sites (Rydenfelt et al. 2014; Schmidt et al. 2016). If the number of competitor sites and their average binding energy is known, however, they can be accounted for in the model. Here, we predict the induction profiles for strains in which R = 260 and N_S = 1, but there is a variable number of competitor sites N_C with a strong binding energy Δε_C = − 17.0 k_BT. In the presence of such a strong competitor when N_C > R, the leakiness is greatly increased, as many repressors are siphoned into the pool of competitor sites. This is most dramatic for the case where Δε_RA = − 9.7 k_BT, in which it appears that no repression occurs at all when N_C = 500. Interestingly, when N_C < R, the effects of the competitor are not especially notable.

Figure 5: **Induction with variable competitor sites, a single specific site, and fixed R.** Induction profiles are shown for strains with R = 260, N_s = 1, and Δε_RA = − 15.3 k_BT for the O1 operator, Δε_RA = − 13.9 k_BT for the O2 operator, or Δε_RA = − 9.7 k_BT for the O3 operator. The number of specific sites, N_C, is varied from 1 to 500. This mimics the common scenario in which a transcription factor has multiple binding sites in the genome. The Python code (`ch6_figS5.py`) used to generate this figure can be found on the thesis GitHub repository.

Properties of the Induction Response

As discussed in the main body of the paper, our treatment of the MWC model allows us to predict key properties of induction responses. Here, we consider the leakiness, saturation, and dynamic range (diagrammed in Fig. 2.1) by numerically solving Eq. 16 in the absence of inducer, c = 0, and in the presence of saturating inducer c → ∞. Using Eq. 15, the former case is given by
$$ R_\text{tot} \frac{1}{1 + e^{-\beta \Delta \varepsilon_{AI}}} = N_S \frac{\lambda_r e^{-\beta \Delta \varepsilon_{RA}}}{1 + \lambda_r e^{-\beta \Delta \varepsilon_{RA}}} + N_{NS} \frac{\lambda_r}{1 + \lambda_r} + N_C \frac{\lambda_r e^{-\beta \Delta \varepsilon_C}}{1 + \lambda_r e^{-\beta \Delta \varepsilon_C}}. \qquad(17)$$
Similarly, the limit of saturating inducer is found by determining λ_r from the form
$$ R_{\text{tot}} \frac{1}{1 + e^{-\beta \Delta \varepsilon_{AI}} \left(\frac{K_A}{K_I} \right)^2} = N_S \frac{\lambda_r e^{-\beta \Delta \varepsilon_{RA}}}{1 + \lambda_r e^{-\beta \Delta \varepsilon_{RA}}} + N_{NS} \frac{\lambda_r}{1 + \lambda_r} + N_C \frac{\lambda_r e^{-\beta \Delta \varepsilon_C}}{1 + \lambda_r e^{-\beta \Delta \varepsilon_C}}. \qquad(18)$$

In Fig. 6, we show how the leakiness, saturation, and dynamic range vary with R and Δε_RA in systems with N_S = 10 or N_S = 100. An inflection point occurs where N_S = R, with leakiness and dynamic range behaving differently when R < N_S than when R > N_S. This transition is more dramatic for N_S = 100 than for N_S = 10. Interestingly, the saturation values consistently approach 1, indicating that full induction is easier to achieve when multiple specific sites are present. Moreover, dynamic range values for O1 and O2 strains with Δε_RA = − 15.3 and − 13.9 k_BT approach 1 when R > N_S, although when N_S = 10, there is a slight downward dip owing to saturation values of less than 1 at high repressor copy numbers.

Figure 6: **Phenotypic properties of induction with multiple specific binding sites.** The leakiness, saturation, and dynamic range are shown for systems with number of specific binding sites N_S = 10 or N_S = 100. The vertical white line indicates the point at which N_S = R. The Python code (`ch6_figS6.py`) used to generate this figure can be found on the thesis GitHub repository.

In Fig. 7, we similarly show how the leakiness, saturation, and dynamic range vary with R and Δε_RA in systems with N_S = 1 and multiple competitor sites N_C = 10 or N_C = 100. Each of the competitor sites has a binding energy of Δε_C = − 17.0 k_BT. The phenotypic profiles are very similar to those for multiple specific sites with sharper transitions at R = N_C due to the greater binding strength of the competitor site. This indicates that introducing competitors has much the same effect on the induction phenotypes as introducing additional specific sites, as in either case the influence of the repressors is dampened when there are insufficient repressors to interact with all of the specific binding sites.

Figure 7: **Phenotypic properties of induction with a single specific site and multiple competitor sites.** The leakiness, saturation, and dynamic range are shown for systems with a single specific binding site N_S = 1 and a number of competitor sites N_C = 10 or N_C = 100. All competitor sites have a binding energy of Δε_C = − 17.0 k_BT. The vertical white line indicates the point at which N_C = R. The Python code (`ch6_figS7.py`) used to generate this figure can be found on the thesis GitHub repository.

This section gives a quantitative analysis of the nuances imposed on induction response in the case of systems involving multiple gene copies as are found in the vast majority of studies on induction. In these cases, the intrinsic parameters of the MWC model get entangled with the parameters describing the gene copy number.

Flow Cytometry

In this section, we provide information regarding the equipment used to make experimental measurements of the fold-change in gene expression in the interests of transparency and reproducibility. We also provide a summary of our unsupervised method of gating the flow cytometry measurements for consistency between experimental runs.

Equipment

Due to past experience using the Miltenyi Biotec MACSQuant flow cytometer during the Physiology summer course at the Marine Biological Laboratory, we used the same flow cytometer for the formal measurements in this work graciously provided by the Pamela Björkman lab at Caltech. All measurements were made using an excitation wavelength of 488 nm with an emission filter set of 525/50 nm. This excitation wavelength provides approximately 40% of the maximum YFP absorbance, and this was found to be sufficient for the purposes of these experiments. A useful feature of modern flow cytometry is the high-sensitivity signal detection through the use of photomultiplier tubes (PMT) whose response can be tuned by adjusting the voltage. Thus, the voltage for the forward-scatter (FSC), side-scatter (SSC), and gene expression measurements were tuned manually to maximize the dynamic range between autofluorescence signal and maximal expression without losing the details of the population distribution. Once these voltages were determined, they were used for all subsequent measurements. Extremely low signal producing particles were discarded before data storage by setting a basal voltage threshold, thus removing the majority of spurious events. The various instrument settings for data collection are given in Table 6.1.

Instrument settings for data collection using the Miltenyi Biotec MACSQuant flow cytometer.
Laser	Channel	Sensor Voltage
488 nm	Forward-Scatter (FSC)	423 V
488 nm	Side-Scatter (SSC)	537 V
488 nm	Intensity (B1 Filter, 525/50 nm)	790 V
488 nm	Trigger (debris threshold)	24.5V

Unsupervised Gating

Flow cytometry data will frequently include a number of spurious events or other undesirable data points such as cell doublets and debris. The process of restricting the collected data set to those data determined to be “real” is commonly referred to as gating. These gates are typically drawn manually and restrict the data set to those points which display a high degree of linear correlation between their forward-scatter (FSC) and side-scatter (SSC). The development of unbiased and unsupervised methods of drawing these gates is an active area of research (Aghaeepour et al. 2013).

For this study, we used an automatic unsupervised gating procedure to filter the flow cytometry data based on the front and side-scattering values returned by the MACSQuant flow cytometer. We assume that the region with highest density of points in these two channels corresponds to single-cell measurements. Everything extending outside of this region was discarded in order to exclude sources of error such as cell clustering, particulates, or other spurious events.

In order to define the gated region, we fit a two-dimensional Gaussian function to the log₁₀ forward-scattering (FSC) and the log₁₀ side-scattering (SSC) data. We then kept a fraction α ∈ [0, 1] of the data by defining an elliptical region given by
(x−μ)^TΣ^− 1(x−μ) ≤ χ_α²(p), (19)

where x is the 2 × 1 vector containing the log (FSC) and log (SSC), μ is the 2 × 1 vector representing the mean values of log (FSC) and log (SSC) as obtained from fitting a two-dimensional Gaussian to the data, and Σ is the 2 × 2 covariance matrix also obtained from the Gaussian fit. χ_α²(p) is the quantile function for probability p of the chi-squared distribution with two degrees of freedom. Fig. 8 shows an example of different gating contours that would arise from different values of α in Eq. 19. In this work, we chose α = 0.4 which we deemed was a sufficient constraint to minimize the noise in the data. The specific code where this gating is implemented can be found in GitHub repository.

Figure 8: **Representative unsupervised gating contours of flow-cytometry data.** Points indicate individual flow cytometry measurements of forward scatter and side scatter. Colored contours indicate arbitrary gating contours ranging from 100% (α = 1) to 5% (α = 0.05). All measurements shown in Chapters 2 and 3 in this work were made by computing the mean fluorescence from the 40^th percentile (α = 0.4). The Python code (`ch6_figS8.py`) used to generate this figure can be found on the thesis GitHub repository.

Comparison of Flow Cytometry with Other Methods

Previous work from the Phillips’ lab experimentally determined fold-change for similar simple repression constructs using a variety of different measurement methods (Garcia et al. 2011 a). Garcia and Phillips used the same background strains as the ones used in this work, but gene expression was measured with Miller assays based on colorimetric enzymatic reactions with the LacZ protein (Garcia et al. 2011 a). The experiments in Brewster et al. (2014) (as well as in Chapter 4 of this dissertation) used a LacI dimer with the tetramerization region replaced with an mCherry tag, where the fold-change was measured as the ratio of the gene expression rate rather than a single snapshot of the gene output.

Fig. 9 shows the comparison of these methods along with the flow cytometry method used in this work. The consistency of these three readouts validates the quantitative use of flow cytometry and unsupervised gating to determine the fold-change in gene expression. However, one important caveat revealed by this figure is that the sensitivity of flow cytometer measurements is not sufficient to accurately determine the fold-change for the high repressor copy number strains in O1 without induction. Instead, a method with a large dynamic range such as the Miller assay is needed to accurately resolve the fold-change at such low levels of expression.

Figure 9: **Comparison of experimental methods to determine the fold-change.** The fold-change in gene expression for equivalent simple-repression constructs has been determined using three independent methods: flow cytometry (Chapter 2), colorimetric Miller assays (Garcia et al. (2011 a)), and video microscopy (Brewster et al. (2014)). All three methods give consistent results, although flow cytometry meeasurements lose accuracy for fold-change less than 0.01. Note that the repressor-DNA binding energies Δε_RA used for the theoretical predictions were determined in Garcia et al. (2011 a). The Python code (`ch6_figS9.py`) used to generate this figure can be found on the thesis GitHub repository.

Single-Cell Microscopy

In this section, we detail the procedures and results from single-cell microscopy verification of our flow cytometry measurements. Our previous measurements of fold-change in gene expression have been measured using bulk-scale Miller assays (Garcia et al. 2011 a) or through single-cell microscopy (Brewster et al. 2014). In this work, flow cytometry was an attractive method due to the ability to screen through many different strains at different concentrations of inducer in a short amount of time. To verify our results from flow cytometry, we examined two bacterial strains with different repressor-DNA binding energies (Δε_RA) of − 13.9 k_BT and − 15.3 k_BT with R = 260 repressors per cell using fluorescence microscopy, and estimated the values of the parameters K_A and K_I for direct comparison between the two methods. For a detailed explanation of the Python code implementation of the processing steps described below, please see this paper’s GitHub repository.

Strains and Growth Conditions

Cells were grown in an identical manner to those used for measurement via flow cytometry (see Materials & Methods of Chapter 2). Briefly, cells were grown to saturation in rich media broth (LB) with 100 μg ⋅ mL^− 1 spectinomycin in a deep-well 96-well plate at 37^∘C. These cultures were then diluted 1000-fold into 500 μL of M9 minimal medium supplemented with 0.5% glucose and the appropriate concentration of the inducer IPTG. Strains were allowed to grow at 37^∘C with vigorous aeration for approximately 8 hours. Prior to mounting for microscopy, the cultures were diluted 10-fold into M9 glucose minimal medium in the absence of IPTG. Each construct was measured using the same range of inducer concentration values as was performed in the flow cytometry measurements (between 100 nM and 5 mM IPTG). Each condition was measured in triplicate in microscopy whereas approximately ten measurements were made using flow cytometry.

Imaging Procedure

During the last hour of cell growth, an agarose mounting substrate was prepared containing the appropriate concentration of the IPTG inducer. This mounting substrate was composed of M9 minimal medium supplemented with 0.5% glucose and 2% agarose (Life Technologies UltraPure Agarose, Cat. No. 16500100). This solution was heated in a microwave until molten followed by addition of the IPTG to the appropriate final concentration. This solution was then thoroughly mixed and a 500 μL aliquot was sandwiched between two glass coverslips and was allowed to solidify.

Once solid, the agarose substrates were cut into approximately 10 mm × 10 mm squares. An aliquot of one to two microliters of the diluted cell suspension was then added to each pad. For each concentration of inducer, a sample of the autofluorescence control, the ΔlacI constitutive expression control, and the experimental strain was prepared yielding a total of thirty-six agarose mounts per experiment. These samples were then mounted onto two glass-bottom dishes (Ted Pella Wilco Dish, Cat. No. 14027-20) and sealed with parafilm.

All imaging was performed on a Nikon Ti-Eclipse inverted fluorescent microscope outfitted with a custom-built laser illumination system and operated by the open-source MicroManager control software (Edelstein et al. 2014). The YFP fluorescence was imaged using a CrystaLaser 514 nm excitation laser coupled with a laser-optimized (Semrock Cat. No. LF514-C-000) emission filter.

For each sample, between fifteen and twenty positions were imaged allowing for measurement of several hundred cells. At each position, a phase contrast image, an mCherry image, and a YFP image were collected in that order with exposures on a time scale of ten to twenty milliseconds. For each channel, the same exposure time was used across all samples in a given experiment. All images were collected and stored in ome.tiff format. All microscopy images are available on the CaltechDATA online repository under DOI: 10.22002/D1.229.

Image Processing

Correcting Uneven Illumination

The excitation laser has a two-dimensional Gaussian profile. To minimize non-uniform illumination of a single field of view, the excitation beam was expanded to illuminate an area larger than that of the camera sensor. While this allowed for an entire field of view to be illuminated, there was still approximately a 10% difference in illumination across both dimensions. This nonuniformity was corrected for in post-processing by capturing twenty images of a homogeneously fluorescent plastic slide (Autofluorescent Plastic Slides, Chroma Cat. No. 920001) and averaging to generate a map of illumination intensity at any pixel I_YFP. To correct for shot noise in the camera (Andor iXon+ 897 EMCCD), twenty images were captured in the absence of illumination using the exposure time used for the experimental data. Averaging over these images produced a map of background noise at any pixel I_dark. To perform the correction, each fluorescent image in the experimental acquisition was renormalized with respect to these average maps as
$$ I_\text{flat} = \frac{I - I_\text{dark}}{I_\text{YFP} - I_\text{dark}}\langle I_\text{YFP} - I_\text{dark} \rangle, \qquad(20)$$
where I_flat is the renormalized image and I is the original fluorescence image.

Cell Segmentation

Each bacterial strain constitutively expressed an mCherry fluorophore from a low copy-number plasmid. This served as a volume marker of cell mass allowing us to segment individual cells through edge detection in fluorescence. We used the Marr-Hildreth edge detector which identifies edges by taking the second derivative of a lightly Gaussian blurred image. Edges are identified as those regions which cross from highly negative to highly positive values or vice-versa within a specified neighborhood. Bacterial cells were defined as regions within an intact and closed identified edge. All segmented objects were then labeled and passed through a series of filtering steps.

To ensure that primarily single cells were segmented, we imposed area and eccentricity bounds. We assumed that single cells projected into two dimensions are roughly 2 μm long and 1 μm wide, so that cells are likely to have an area between 0.5 μm² and 6 μm. To determine the eccentricity bounds, we assumed that a single cell can be approximated by an ellipse with semi-major (a) and semi-minor (b) axis lengths of 0.5 μm and 0.25 μm, respectively. The eccentricity of this hypothetical cell can be computed as
$$ \text{eccentricity} = \sqrt{1 - \left(\frac{b}{a}\right)^2}, \qquad(21)$$
yielding a value of approximately 0.8. Any objects with an eccentricity below this value were not considered to be single cells. After imposing both an area and eccentricity filter, the remaining objects were considered cells of interest and the mean fluorescence intensity of each cell was extracted.

Calculation of Fold-Change and Empirical Comparison

Cells exhibited background fluorescence even in the absence of an expressed fluorophore. We corrected for this autofluorescence contribution to the fold-change calculation by subtracting the mean YFP fluorescence of cells expressing only the mCherry volume marker from each experimental measurement. The fold-change in gene expression was therefore calculated as
$$ \text{fold-change} = \frac{\langle I_{R > 0} \rangle - \langle I_\text{auto} \rangle}{\langle I_{R = 0} \rangle - \langle I_\text{auto} \rangle}, \qquad(22)$$
where ⟨I_R > 0⟩ is the mean fluorescence intensity of cells expressing LacI repressors, ⟨I_auto⟩ is the mean intensity of cells expressing only the mCherry volume marker, and ⟨I_R = 0⟩ is the mean fluorescence intensity of cells in the absence of LacI.

The agreement in the fold-change in gene expression between single-cell microscopy and flow cytometry can be seen in Fig. 10 (A) where the two methods have been plotted against each other. At this level, we see near perfect agreement between the methods when examining the mean level of gene expression. However, there is a distinct difference in higher moments of the gene expression distributions. Empirical cumulative distributions for a maximally-induced (5000 μM IPTG, R = 160, Δε_RA = − 13.9 k_BT) sample are shown as purple and orange lines in Fig. 10 (B), respectively. To make the different methods directly comparable, the expressions distributions were normalized to range between 0 and 1, and then centered about the mean of the distribution. While the means agree between the methods, it is immediately obvious that the width of the distributions are different with microscopy yielding distributions with a higher variance. To compare the distributions more quantitatively, we computed the central moment values for the variance, skewness, and kurtosis of the distributions (Fig. 10 (C)). This quantitative comparison reveals that the value of the moments can differ by close to an order of magnitude between the methods with flow cytometry systematically lower than the same distribution measured via microscopy. These results show that in terms of measuring the mean level of gene expression, the two methods can be used interchangeably. However, if one is interested in the higher moments of the distribution, the choice of method does matter.

Figure 10: **Empirical comparison of flow cytometry and single-cell microscopy.** (A) The observed fold-change in gene expression for the IPTG titration of a strain with R = 260 and Δε_RA = − 13.9 k_BT using both microscopy (x-axis) and flow cytometry (y-axis). Points and errors represent the mean and standard error of 3 (microscopy) or 10 (flow cytometry) biological replicates. Black line indicates perfect agreement. (B) Empirical cumulative distributions of expression intensity for the strain used in (A) maximally induced with 5000 μM IPTG. Purple and orange lines correspond to measurements with microscopy and flow cytometry, respectively. Fluorescence was normalized between 0 and 1 and centered about the observed mean. (C) Central moments of the distributions shown in (B) for microscopy and flow cytometry. Each point represents a single biological replicate. The Python code (`ch6_figS10.py`) used to generate this figure can be found on the thesis GitHub repository.

Fold-Change Sensitivity Analysis

In Chapter 2,we found that the width of the credible regions varied widely depending on the repressor copy number R and repressor operator binding energy Δε_RA. More precisely, the credible regions were much narrower for low repressor copy numbers R and weak binding energy Δε_RA. In this section, we explain how this behavior comes about. We focus our attention on the maximum fold-change in the presence of saturating inducer. While it is straightforward to consider the width of the credible regions at any other inducer concentration, the predicted fold-change curves presented in Chapter 2 show that the credible regions are widest at saturation.

The width of the credible regions corresponds to how sensitive the fold-change is to the fit values of the dissociation constants K_A and K_I. To be quantitative, we define
Δfold-change_{K_A} ≡ fold-change(K_A, K_I^fit) − fold-change(K_A^fit, K_I^fit), (23)
the difference between the fold-change at a particular K_A value relative to the best-fit dissociation constant K_A^fit = 139 μM. For simplicity, we keep the inactive state dissociation constant fixed at its best-fit value K_I^fit = 0.53 μM. A larger difference Δfold-change_{K_A} implies a wider credible region. Similarly, we define the analogous quantity
Δfold-change_{K_I} = fold-change(K_A^fit, K_I) − fold-change(K_A^fit, K_I^fit) (24)
to measure the sensitivity of the fold-change to K_I at a fixed K_A^fit. Fig. 11 shows both of these quantities in the limit c → ∞ for different repressor-DNA binding energies Δε_RA and repressor copy numbers R.

To understand how the width of the credible region scales with Δε_RA and R, we can Taylor-expand the difference in fold-change to first order, Δfold-change_{K_A} ≈ (∂fold-change/∂K_A)(K_A−K_A^fit), where the partial derivative has the form
$$ \begin{aligned} \frac{\partial \text{fold-change}}{\partial K_A} &= \frac{e^{-\beta \Delta \varepsilon_{AI}} \frac{n}{K_I}\left(\frac{K_A}{K_I}\right)^{n-1}}{\left( 1 + e^{-\beta \Delta \varepsilon_{AI}} \left(\frac{K_A}{K_I}\right)^n \right)^2}\frac{R}{N_{NS}}e^{-\beta \Delta\varepsilon_{RA}} \times \\ & \left( 1+\frac{1}{1+e^{-\beta \Delta \varepsilon_{AI} } \left(\frac{K_A}{K_I}\right)^n }\frac{R}{N_{NS}}e^{-\beta \Delta\varepsilon_{RA}} \right)^{-2}. \end{aligned} \qquad(25)$$
Similarly, the Taylor expansion Δfold-change_{K_I} ≈ (∂fold-change/∂K_I)(K_I−K_I^fit) features the partial derivative
$$ \begin{aligned} \frac{\partial \text{fold-change}}{\partial K_I} &= -\frac{e^{-\beta \Delta \varepsilon_{AI}} \frac{n}{K_I}\left(\frac{K_A}{K_I}\right)^{n}}{\left( 1 + e^{-\beta \Delta \varepsilon_{AI}} \left(\frac{K_A}{K_I}\right)^n \right)^2}\frac{R}{N_{NS}}e^{-\beta \Delta\varepsilon_{RA}} \times \\ & \left( 1+\frac{1}{1+e^{-\beta \Delta \varepsilon_{AI} } \left(\frac{K_A}{K_I}\right)^n }\frac{R}{N_{NS}}e^{-\beta \Delta\varepsilon_{RA}} \right)^{-2}. \end{aligned} \qquad(26)$$
From Eq. 25 and Eq. 26, we find that both Δfold-change_{K_A} and Δfold-change_{K_I} increase in magnitude with R and decrease in magnitude with Δε_RA. Accordingly, we expect that the O3 strains (with the least negative Δε_RA) and the strains with the smallest repressor copy number will lead to partial derivatives with smaller magnitude and hence to tighter credible regions. Indeed, this prediction is carried out in Fig. 11.

Lastly, we note that Eq. 25 and Eq. 26 enable us to quantify the scaling relationship between the width of the credible region and the two quantities R and Δε_RA. For example, for the O3 strains, where the fold-change at saturating inducer concentration is ≈ 1, the right-most term in both equations which equals the fold-change squared is roughly 1. Therefore, we find that both $\frac{\partial \text{fold-change}}{\partial K_A}$ and ∂fold-change/∂K_I scale linearly with R and e^{− βΔε_RA}. Thus the width of the R = 22 strain will be roughly 1/1000 as large as that of the R = 1740 strain; similarly, the width of the O3 curves will be roughly 1/1000 the width of the O1 curves.

Figure 11: **Determining how sensitive the fold-change values are to the fit values of the dissociation constants.** The difference Δfold-change_{K_A} in fold change when the dissociation constant K_A is slightly offset from its best-fit value K_A = 139_− 22^+ 29μM. Fold-change is computed in the limit of saturating inducer concentration (c → ∞, see ) where the credible regions are widest. The O3 strain (Δε_RA = − 9.7 k_BT) is about 1/1000 as sensitive as the O1 operator to perturbations in the parameter values, and hence its credible region is roughly 1/1000 as wide. All curves were made using R = 260. (B) The same analysis as shown in (A), but plotting the sensitivity of fold-change to the K_I parameter relative to the best-fit value K_I = 0.53_− 0.04^+ 0.04μM. Note that only the magnitude, and not the sign, of this difference describes the sensitivity of each parameter. Hence, the O3 strain is again less sensitive than the O1 and O2 strains. (C) The same analysis as shown in (A), but showing how the fold-change sensitivity for different repressor copy numbers. The strains with lower repressor copy number are less sensitive to changes in the dissociation constants, and hence their corresponding curves have tighter credible regions. All curves were made using Δε_RA = − 13.9 k_BT. (D) The same analysis as shown in (C), the sensitivity of fold-change with respect to K_I is again smallest (in magnitude) for the low repressor copy number strains. The Python code (`ch6_figS11.py`) used to generate this figure can be found on the thesis GitHub repository.

Global Fit of All Parameters

In Chapter 2, we used the repressor copy numbers R and repressor-DNA binding energies Δε_RA as reported by Garcia et al. (2011 a). However, any error in these previous measurements of R and Δε_RA will necessarily propagate into our own fold-change predictions. In this section, we take an alternative approach to fitting the physical parameters of the system to that used in the main text. First, rather than fitting only a single strain, we fit the entire data set in along with microscopy data for the synthetic operator Oid. In addition, we also simultaneously fit the parameters R and Δε_RA using the prior information given by the previous measurements. By using the entire data set and fitting all of the parameters, we obtain the best possible characterization of the statistical mechanical parameters of the system given our current state of knowledge.

To fit all of the parameters simultaneously, we follow a similar approach to the one detailed in the Materials & Methods of Chapter 2. Briefly, we perform a Bayesian parameter estimation of the dissociation constants K_A and K_I, the six different repressor copy numbers R corresponding to the six lacI ribosomal binding sites used in our work, and the four different binding energies Δε_RA characterizing the four distinct operators used to make the experimental strains. As in the main text, we fit the logarithms $\tilde{k}_A = -\log \frac{K_A}{1\,\text{M}}$ and $\tilde{k}_I = -\log \frac{K_I}{1\,\text{M}}$ of the dissociation constants which grants better numerical stability.

We assume that deviations of the experimental fold-change from the theoretical predictions are normally distributed with mean zero and standard deviation σ. We begin by writing Bayes’ theorem,
$$ g(\tilde{k}_A, \tilde{k}_I, \mathbf{\textbf{\textit{R}}}, \mathbf{\Delta\boldsymbol{\varepsilon}_{\textbf{\textit{RA}}}}, \sigma \mid D) = \frac{f(D \mid \tilde{k}_A, \tilde{k}_I, \mathbf{\textbf{\textit{R}}}, \mathbf{\Delta\boldsymbol{\varepsilon}_{\textbf{\textit{RA}}}}, \sigma) g(\tilde{k}_A, \tilde{k}_I, \mathbf{\textbf{\textit{R}}}, \mathbf{\Delta\boldsymbol{\varepsilon}_{\textbf{\textit{RA}}}}, \sigma)}{f(D)}, \qquad(27)$$
where $\textbf{\textit{R}}$ is an array containing the six different repressor copy numbers to be fit, $\mathbf{\Delta\boldsymbol{\varepsilon}_{\textbf{\textit{RA}}}}$ is an array containing the four binding energies to be fit, and D is the experimental fold-change data. The term $P(\tilde{k}_A, \tilde{k}_I, \mathbf{\textbf{\textit{R}}}, \mathbf{\Delta\boldsymbol{\varepsilon}_{\textbf{\textit{RA}}}}, \sigma \mid D)$ gives the probability distributions of all of the parameters given the data. The prefixes g and f denote probability densities of parameters and data, respectively. The term $f(D \mid \tilde{k}_A, \tilde{k}_I, \mathbf{\textbf{\textit{R}}}, \mathbf{\Delta\boldsymbol{\varepsilon}_{\textbf{\textit{RA}}}}, \sigma)$ represents the likelihood of having observed our experimental data given some value for each parameter. $g(\tilde{k}_A, \tilde{k}_I, \mathbf{\textbf{\textit{R}}}, \mathbf{\Delta\boldsymbol{\varepsilon}_{\textbf{\textit{RA}}}}, \sigma)$ contains all the prior information on the values of these parameters. Lastly, f(D) serves as a normalization constant and is neglected.

Given n independent measurements of the fold-change, the first term in Eq. 27 can be written as
$$ \begin{aligned} f(D \mid \tilde{k}_A, \tilde{k}_I, \mathbf{\textbf{\textit{R}}}, \mathbf{\Delta\boldsymbol{\varepsilon}_{\textbf{\textit{RA}}}}, \sigma) = \frac{1}{(2\pi\sigma^2)^{\frac{n}{2}}}\prod\limits_{i=1}^n \exp \left[-\frac{(\text{fc}^{(i)}_{\exp} - \text{fc}(\tilde{k}_A, \tilde{k}_I, R^{(i)}, \Delta\varepsilon_{RA}^{(i)}, c^{(i)}))^2}{2\sigma^2}\right], \end{aligned} \qquad(28)$$
where fc_exp⁽ⁱ⁾ is the i^th experimental fold-change and fc( ⋅ ⋅ ⋅ ) is the theoretical prediction. Note that the standard deviation σ of this distribution is not known and hence needs to be included as a parameter to be fit.

The second term in Eq. 27 represents the prior information of the parameter values. We assume that all parameters are independent of each other, such that $g(\tilde{k}_A, \tilde{k}_I, \mathbf{\textbf{\textit{R}}}, \mathbf{\Delta\boldsymbol{\varepsilon}_{\textbf{\textit{RA}}}}, \sigma)$ = g(k̃_A) ⋅ P(k̃_I) ⋅ ∏_iP(R⁽ⁱ⁾) ⋅ ∏_jg(Δε_RA^(j)) ⋅ g(σ), where the superscript (i) indicates the repressor copy number of index i and the superscript (j) denotes the binding energy of index j. As above, we note that a prior must also be included for the unknown parameter σ.

Because we know nothing about the values of k̃_A, k̃_I, and σ before performing the experiment, we assign maximally uninformative priors to each of these parameters. More specifically, we assign uniform priors to k̃_A and k̃_I and a Jeffreys prior to σ, indicating that K_A, K_I, and σ are scale parameters (Sivia and Skilling 2006). We do, however, have prior information for the repressor copy numbers and the repressor-DNA binding energies from Garcia et al. (2011 a). This prior knowledge is included within our model using an informative prior for these two parameters, which we assume to be Gaussian. Hence each of the R⁽ⁱ⁾ repressor copy numbers to be fit satisfies
$$ g(R^{(i)}) = \frac{1}{\sqrt{2\pi\sigma_{R_i}^2}} \exp \left(-\frac{(R^{(i)} - \bar{R}^{(i)})^2}{2 \sigma_{R_i}^2} \right), \qquad(29)$$
where R̄⁽ⁱ⁾ is the mean repressor copy number and σ_{R_i} is the variability associated with this parameter as reported in Garcia et al. (2011 a). Note that we use the given value of σ_{R_i} from previous measurements rather than leaving this as a free parameter.

Similarly, the binding energies Δε_RA^(j) are also assumed to have a Gaussian informative prior of the same form. We write it as
$$ g(\Delta\varepsilon_{RA}^{(j)}) = \frac{1}{\sqrt{2\pi\sigma_{\varepsilon_j}^2}} \exp \left(- \frac{(\Delta\varepsilon_{RA}^{(j)} - \Delta\bar{\varepsilon}_{RA}^{(j)})^2}{2 \sigma_{\varepsilon_j}^2} \right), \qquad(30)$$
where Δε̄_RA^(j) is the binding energy and σ_{ε_j} is the variability associated with that parameter around the mean value as reported in Garcia et al. (2011 a).

The σ_{R_i} and σ_{ε_j} parameters will constrain the range of values for R⁽ⁱ⁾ and Δε_RA^(j) found from the fitting. For example, if for some i the standard deviation σ_{R_i} is very small, it implies a strong confidence in the previously reported value. Mathematically, the exponential in Eq. 29 will ensure that the best-fit R⁽ⁱ⁾ lies within a few standard deviations of R̄⁽ⁱ⁾. Since we are interested in exploring which values could give the best fit, the errors are taken to be wide enough to allow the parameter estimation to freely explore parameter space in the vicinity of the best estimates. Putting all these terms together, we use Markov chain Monte Carlo to sample the posterior distribution $P(\tilde{k}_A, \tilde{k}_I, \mathbf{\textbf{\textit{R}}}, \mathbf{\Delta \boldsymbol{\varepsilon}_{\textbf{\textit{RA}}}}, \sigma \mid D)$, enabling us to determine both the most likely value for each physical parameter as well as its associated credible region.

Fig. 12 shows the result of this global fit. When compared with the results of Chapter 2, we can see that fitting for the binding energies and the repressor copy numbers improves the agreement between the theory and the data. Table 6.2 summarizes the values of the parameters as obtained with this MCMC parameter inference. We note that even though we allowed the repressor copy numbers and repressor-DNA binding energies to vary, the resulting fit values were very close to the previously reported values. The fit values of the repressor copy numbers were all within one standard deviation of the previous reported values provided in Garcia et al. (2011 a). And although some of the repressor-DNA binding energies differed by a few standard deviations from the reported values, the differences were always less than 1 k_BT, which represents a small change in the biological scales we are considering. The biggest discrepancy between our fit values and the previous measurements arose for the synthetic Oid operator, which we discuss in more detail in the following section of this chapter.

Global parameter estimates and comparison to previously reported values.
Parameter	Reported Values (Garcia et al. 2011 a)	Global Fit
K_A	-	205_− 12^+ 11 μM
K_I	-	0.73_− 0.04^0.04 μM
R₂₂	22 ± 4 per cell	20_− 1^+ 1 per cell
R₆₀	60 ± 20 per cell	74_− 3^+ 4 per cell
R₁₂₄	124 ± 30 per cell	130_− 6^+ 6 per cell
R₂₆₀	260 ± 40 per cell	257_− 11^+ 9 per cell
R₁₂₂₀	1220 ± 160 per cell	1191_− 55^+ 32 per cell
R₁₇₄₀	1740 ± 340 per cell	1599_− 87^+ 75 per cell
O1 Δε_RA	− 15.3 ± 0.2 k_BT	− 15.22_− 0.1^+ 0.1 k_BT
O2 Δε_RA	− 13.9 ± 0.2 k_BT	− 13.06_− 0.1^+ 0.1 k_BT
O3 Δε_RA	− 9.7 ± 0.1 k_BT	− 9.4_− 0.1^+ 0.1 k_BT
Oid Δε_RA	− 17.0 ± 0.2 k_BT	− 17.7_− 0.1^+ 0.2 k_BT

Figure 12: **Global fit of dissociation constants, repressor copy numbers, and binding energies.** Theoretical prediction resulting from simultaneous estimation of the dissociation constants K_A and K_I, the six repressor copy numbers R, and the four repressor-DNA binding energies Δε_RA using the entire dataset. Points and errors represent the mean and standard error of ~10 biological replicates for O1, O2, O3, and 3 biological replicates for Oid. The Python code (`ch6_fig12.py`) used to generate this figure can be found on the thesis GitHub repository.

Fig. 13 shows the same key properties as Fig. 2.7 , but uses the parameters obtained from this global fitting approach. We note that even by increasing the number of degrees of freedom in our fit, the result does not change substantially. or the O3 operator data, again, agreement between the predicted [EC₅₀] and the effective Hill coefficient remains poor due to the theory being unable to capture the steepness of the response curves.

Figure 13: **Key properties of induction profiles as predicted with a global fit using all data.** Data for (A) leakiness, (B) saturation, and (C) dynamic range are computed directly from measured fold-change. Points and errors correspond to the mean and standard error of 10 - 11 biological replicates. Points in (D) and (E) for the [EC₅₀] and the effective Hill coefficient, respectively, represent the estimated value using parameter estimates of K_A and K_I for that particular strain. Errors represent the width of the 95% credible region. In all plots, curves represent the theoretical predictions given the parameter estimates conditioned on all data sets. The Python code (`ch6_figS13.py`) used to generate this figure can be found on the thesis GitHub repository.

Applicability of Theory to the Oid Operator Sequence

In addition to the native operator sequences (O1, O2, and O3) considered in the main text, we were also interested in testing our model predictions against the synthetic Oid operator. In contrast to the other operators, Oid is one base pair shorter in length (20 bp), is fully symmetric, and is known to provide stronger repression than the native operator sequences considered so far. While the theory should be similarly applicable, measuring the lower fold-changes associated with this YFP construct was expected to be near the sensitivity limit for our flow cytometer, due to the especially strong binding energy of Oid (Δε_RA = − 17.0 k_BT) (Garcia et al. 2011 a). Accordingly, fluorescence data for Oid were obtained using microscopy, which is more sensitive than flow cytometry.

We follow the approach of the main text and make fold-change predictions based on the parameter estimates from our strain with R = 260 and an O2 operator. These predictions are shown in Fig. 14, where we also plot data taken in triplicate for strains containing R = 22, 60, and 124, obtained by single-cell microscopy. We find that the data are systematically below the theoretical predictions. We also considered our global fitting approach (see previous section) to see whether we might find better agreement with the observed data. Interestingly, we find that the majority of the parameters remain largely unchanged, but our estimate for the Oid binding energy Δε_RA is shifted to − 17.7 k_BT instead of the value − 17.0 k_BT found in Garcia et al. (2011 a). In Fig. 14, we again plot the Oid fold-change data but with theoretical predictions, using the new estimate for the Oid binding energy from our global fit, and find substantially better agreement.

Figure 14: **Predictions of fold-change for strains with an Oid binding sequence versus experimental measurements with different repressor copy numbers.** Experimental data is plotted against the parameter-free predictions that are based on our fit to the O2 strain with R = 260. Here we use the previously measured binding energy Δε_RA = − 17.0 k_BT (Garcia et al. 2011 a). The same experimental data is plotted against the best-fit parameters using the complete O1, O2, O3, and Oid data sets to infer K_A, K_I, repressor copy numbers, and the binding energies of all operators. Here the major difference in the inferred parameters is a shift in the binding energy for Oid from Δε_RA = − 17.0 k_BT to Δε_RA = − 17.7 k_BT, which now shows agreement between the theoretical predictions and experimental data. Shaded regions from the theoretical curves denote the 95% credible region. The Python code (`ch6_figS14.py`) used to generate this figure can be found on the thesis GitHub repository.

Comparison of Parameter Estimation and Fold-Change Predictions across Strains

The inferred parameter values for K_A and K_I in Chapter 2 were determined by fitting to induction fold-change measurements from a single strain (R = 260, Δε_RA = − 13.9 k_BT, n = 2, and Δε_AI = 4.5 k_BT). After determining these parameters, we were able to predict the fold-change of the remaining strains without any additional fitting. However, the theory should be independent of the specific strain used to estimate K_A and K_I; using any alternative strain to fit K_A and K_I should yield similar predictions. For the sake of completeness, here we discuss the values for K_A and K_I that are obtained by fitting to each of the induction data sets individually. These fit parameters are shown in Fig. 2.6 (D) of Chapter 2, where we find close agreement between strains, but with some deviation and poorer inferences observed with the O3 operator strains. Overall, we find that regardless of which strain is chosen to determine the unknown parameters, the predictions laid out by the theory closely match the experimental measurements. Here, we present a comparison of the strain specific predictions and measured fold-change data for each of the three operators considered.

We follow the approach taken in Chapter 2 and infer values for K_A and K_I by fitting to each combination of binding energy Δε_RA and repressor copy number R. We then use these fitted parameters to predict the induction curves of all other strains. In Fig. 15, we plot these fold-change predictions along with experimental data for each of our strains that contains an O1 operator. To make sense of this plot, we consider the first row as an example. In the first row, K_A and K_I were estimated using data from the strain containing R = 22 and an O1 operator (top leftmost plot). The remaining plots in this row show the predicted fold-change using these values for K_A and K_I. In each row, we then infer K_A and K_I using data from a strain containing a different repressor copy number (R = 60 in the second row, R = 124 in the third row, and so on). In Fig. 16 and Fig. 17, we similarly apply this inference to our strains with O2 and O3 operators, respectively. We note that the overwhelming majority of predictions closely match the experimental data.The notable exception is that using the R = 22 strain provides poor predictions for the strains with large copy numbers (especially R = 1220 and R = 1740), though it should be noted that predictions made from the R = 22 strain have considerably broader credible regions. This loss in predictive power is due to the poorer estimates of K_A and K_I for the R = 22 strain as shown in Fig. 2.6 (D).

Figure 15: **O1 strain fold-change predictions based on strain-specific parameter estimation of K_A and K_I.** Fold-change in expression is plotted as a function of IPTG concentration for all strains containing an O1 operator. The Python code (`ch6_figS15-S17.py`) used to generate this figure can be found on the thesis GitHub repository.

Figure 16: **O2 strain fold-change predictions based on strain-specific parameter estimation of K_A and K_I.** Fold-change in expression is plotted as a function of IPTG concentration for all strains containing an O2 operator. The Python code (`ch6_figS15-S17.py`) used to generate this figure can be found on the thesis GitHub repository.

Figure 17: **O3 strain fold-change predictions based on strain-specific parameter estimation of K_A and K_I.** Fold-change in expression is plotted as a function of IPTG concentration for all strains containing an O3 operator. The Python code (`ch6_figS15-S17.py`) used to generate this figure can be found on the thesis GitHub repository.

Properties of Induction Titration Curves

In this section, we expand on the phenotypic properties of the induction response that were explored in Chapter 2. We begin by expanding on our discussion of dynamic range and then show the analytic form of the [EC₅₀] for simple repression.

As stated in the main text, the dynamic range is defined as the difference between the maximum and minimum system response, or equivalently, as the difference between the saturation and leakiness of the system. The dynamic range is therefore given by
$$ \begin{aligned} \text{dynamic range} = &\left( 1+\frac{1}{1+e^{-\beta \Delta \varepsilon_{AI} } \left(\frac{K_A}{K_I}\right)^n }\frac{R}{N_{NS}}e^{-\beta \Delta\varepsilon_{RA}} \right)^{-1} -\\ & \left( 1+\frac{1}{1+e^{-\beta \Delta \varepsilon_{AI} }}\frac{R}{N_{NS}}e^{-\beta \Delta\varepsilon_{RA}} \right)^{-1}. \end{aligned} \qquad(31)$$

The dynamic range, along with saturation and leakiness were plotted with our experimental data in Fig. 2.7(A-C) as a function of repressor copy number. Fig. 18 shows how these properties are expected to vary as a function of the repressor-operator binding energy. Note that the resulting curves for all three properties have the same shape as in Fig. 2.7 (A-C), since the dependence of the fold-change upon the repressor copy number and repressor-operator binding energy are both contained in a single multiplicative term, Re^{− βΔε_RA}. Hence, increasing R on a logarithmic scale is equivalent to decreasing Δε_RA on a linear scale.

An interesting aspect of the dynamic range is that it exhibits a peak as a function of either the repressor copy number (or equivalently of the repressor-operator binding energy). Differentiating the dynamic range (Eq. 31) and setting it equal to zero, we find that this peak occurs at
$$ \frac{R^*}{N_{NS}} = e^{-\beta (\Delta\varepsilon_{AI} - \Delta\varepsilon_{RA})} \sqrt{e^{\Delta\varepsilon_{AI}} + 1} \sqrt{e^{\Delta\varepsilon_{AI}} + \left( \frac{K_A}{K_I} \right)^n}. \qquad(32)$$
The magnitude of the peak is given by
$$ \text{max dynamic range} = \frac{\left( \sqrt{e^{\Delta\varepsilon_{AI}} + 1} - \sqrt{e^{\Delta\varepsilon_{AI}} + \left( \frac{K_A}{K_I} \right)^n} \right)^2}{\left( \frac{K_A}{K_I} \right)^n - 1}, \qquad(33)$$
which is independent of the repressor-operator binding energy Δε_RA or R, and will only cause a shift in the location of the peak, but not its magnitude.

Figure 18: **Dependence of leakiness, saturation, and dynamic range on the operator binding energy and repressor copy number.** Increasing repressor copy number or decreasing the repressor-operator binding energy suppresses gene expression and decreases both the (A) leakiness and (B) saturation. (C) The dynamic range retains its shape, but shifts right as the repressor copy number increases. The peak in the dynamic range can be understood by considering the two extremes for Δε_RA: for small repressor-operator binding energies, the leakiness is small, but the saturation increases with Δε_RA, thereby decreasing the dynamic range. Repressor copy number does not affect the maximum dynamic range. The Python code (`ch6_figS18-S19.py`) used to generate this figure can be found on the thesis GitHub repository.

We now consider the two remaining properties, the [EC₅₀] and effective Hill coefficient, which determine the horizontal properties of a system - that is, they determine the range of inducer concentration in which the system’s response goes from its minimum to maximum values. The [EC₅₀] denotes the inducer concentration required to generate fold-change halfway between its minimum and maximum value and was defined implicitly in Eq. 2.9. For the simple repression system, the [EC₅₀] is given by
$$ \frac{[EC_{50}]}{K_A} = \frac{\frac{K_A}{K_I} - 1}{\frac{K_A}{K_I} - \left( \frac{\left( 1 + \frac{R}{N_{NS}} e^{-\beta \Delta\varepsilon_{RA}} \right) + \left( \frac{K_A}{K_I} \right)^n \left( 2 e^{-\beta \Delta \varepsilon_{AI}} + \left( 1 + \frac{R}{N_{NS}} e^{-\beta \Delta\varepsilon_{RA}} \right) \right) }{ 2 \left( 1 + \frac{R}{N_{NS}} e^{-\beta \Delta\varepsilon_{RA}} \right) + e^{-\beta \Delta \varepsilon_{AI}} + \left( \frac{K_A}{K_I} \right)^n e^{-\beta \Delta \varepsilon_{AI}} } \right)^{\frac{1}{n}}} - 1. \qquad(34)$$
Using this expression, we can then find the effective Hill coefficient h, which equals twice the log-log slope of the normalized fold-change evaluated at c = [EC₅₀]. In Fig. 2.7 (D-E), we show how these two properties vary with repressor copy number, and in Fig. 19, we demonstrate how they depend on the repressor-operator binding energy. Both the [EC₅₀] and h vary significantly with repressor copy number for sufficiently strong operator binding energies. Interestingly, for weak operator binding energies on the order of the O3 operator, it is predicted that the effective Hill coefficient should not vary with repressor copy number. In addition, the maximum possible Hill coefficient is roughly 1.75, which stresses the point that the effective Hill coefficient should not be interpreted as the number of inducer binding sites, which is exactly 2.

Figure 19: **[EC₅₀] and effective Hill coefficient depend strongly on repressor copy number and operator binding energy.** (A) [EC₅₀] values from very small and tightly clustered to relatively large and expanded for stronger operator binding energies. (B) The effective Hill coefficient generally decreases with increasing repressor copy number, indicating a flatter normalized response. The maximum possible Hill coefficient is roughly 1.75 for all repressor-operator binding energies. The Python code (`ch6_figS18-S19.py`) used to generate this figure can be found on the thesis GitHub repository.

Applications to Other Regulatory Architectures

In this section, we discuss how the theoretical framework presented in this work is sufficiently general to include a variety of regulatory architectures outside of simple repression by LacI. We begin by noting that the exact same formula for fold-change given in Chapter 2 can also describe corepression. We then demonstrate how our model can be generalized to include other architectures, such as a coactivator binding to an activator to promote gene expression. In each case, we briefly describe the system and describe its corresponding theoretical description. For further details, we invite the interested reader to read Bintu et al. (2005) and Marzen, Garcia, and Phillips (2013).

Corepression

Consider a regulatory architecture where binding of a transcriptional repressor occludes the binding of RNAP to the DNA. A corepressor molecule binds to the repressor and shifts its allosteric equilibrium towards the active state in which it binds more tightly to the DNA, thereby decreasing gene expression (in contrast, an inducer shifts the allosteric equilibrium towards the inactive state where the repressor binds more weakly to the DNA). As in the main text, we can enumerate the states and statistical weights of the promoter and the allosteric states of the repressor. We note that these states and weights exactly match those in Fig. 2.2 and yield the same fold-change equation,
$$ \text{fold-change} \approx \left(1 + {\left(1 + {c \over K_A}\right)^n \over \left(1 + {c \over K_A}\right)^n + e^{\beta\Delta\varepsilon_{AI} }\left(1 + {c \over K_I}\right)^n} {R \over N_{NS} }e^{-\beta\Delta\varepsilon_{RA} }\right)^{-1}, \qquad(35)$$
where c now represents the concentration of the corepressor molecule. Mathematically, the difference between these two architectures can be seen in the relative sizes of the dissociation constants K_A and K_I between the inducer and repressor in the active and inactive states, respectively. The corepressor is defined by K_A < K_I, since the corepressor favors binding to the repressor’s active state; an inducer must satisfy K_I < K_A, as was found in Chapter 2. Much as was performed in Chapter 2, we can make some predictions regarding the response of a corepressor. In Fig. 20 (A), we show how varying the repressor copy number R and the repressor-DNA binding energy Δε_RA influence the response. We draw the reader’s attention to the decrease in fold-change as the concentration of effector is increased.

Activation

We now turn to the case of activation. While this architecture was not studied in this work, we wish to demonstrate how the framework presented here can be extended to include transcription factors other than repressors. To that end, we consider a transcriptional activator which binds to DNA and aids in the binding of RNAP through energetic interaction term ε_AP. Note that in this architecture, binding of the activator does not occlude binding of the polymerase. Binding of a coactivator molecule shifts its allosteric equilibrium towards the active state (K_A < K_I), where the activator is more likely to be bound to the DNA and promote expression. Enumerating all of the states and statistical weights of this architecture and making the approximation that the promoter is weak generates a fold-change equation of the form
$$ \text{fold-change} = {1 + \frac{\left(1 + \frac{c}{K_A}\right)^n}{\left( 1 + \frac{c}{K_A}\right)^n + e^{\beta\Delta\varepsilon_{AI} }\left(1 + \frac{c}{K_I}\right)^n}\frac{A}{ N_{NS} }e^{-\beta\Delta\varepsilon_{AA} }e^{-\beta\varepsilon_{AP} } \over 1 + {\left(1 + {c \over K_A}\right)^n \over \left(1 + {c \over K_A}\right)^n + e^{\beta\Delta\varepsilon_{AI} }\left(1 + {c \over K_I}\right)^n}{A \over N_{NS} }e^{-\beta\Delta\varepsilon_{AA} } }, \qquad(36)$$
where A is the total number of activators per cell, c is the concentration of a coactivator molecule, Δε_AA is the binding energy of the activator to the DNA in the active allosteric state, and ε_AP is the interaction energy between the activator and the RNAP. Unlike in the cases of induction and corepression, the fold-change formula for activation includes terms from when the RNAP is bound by itself on the DNA as well as when both RNAP and the activator are simultaneously bound to the DNA. Fig. 20 (B) explores predictions of the fold-change in gene expression by manipulating the activator copy number, DNA binding energy, and the polymerase-activator interaction energy. Note that with this activation scheme, the fold-change must necessarily be greater than one. An interesting feature of these predictions is the observation that even small changes in the interaction energy ( < 0.5 k_BT) can result in dramatic increase in fold-change.

As in the case of induction, the approach towards inducible activation is straightforward to generalize. For example, the relative values of K_I and K_A can be switched such that K_I < K_A, in which the secondary molecule drives the activator to assume the inactive state, represents induction of an activator. While these cases might be viewed as separate biological phenomena, mathematically they can all be described by the same underlying formalism.

Figure 20: **Representative fold-change predictions for allosteric corepression and activation.** (A) Contrary to the case of induction described in the main text, addition of a corepressor decreases fold-change in gene expression. The left and right panels demonstrate how varying the values of the repressor copy number R and repressor-DNA binding energy Δε_RA, respectively, change the predicted response profiles. (B) In the case of inducible activation, binding of an effector molecule to an activator transcription factor increases the fold-change in gene expression. Note that for activation, the fold-change is greater than 1. The let and center panels show how changing the activator copy number A and activator-DNA binding energy Δε_AA alter response, respectively. The right panel shows how varying the polymerase-activator interaction energy ε_AP alters the fold-change. Relatively small perturbations to this energetic parameter drastically changes the level of activation and plays a major role in dictating the dynamic range of the system. The Python code (`ch6_figS20.py`) used to generate this figure can be found on the thesis GitHub repository.

E. coli Primer and Strain List

Here, we provide additional details about the genotypes of the strains used, as well as the primer sequences used to generate them. E. coli strains were derived from K12 MG1655. For those containing R = 22, we used strain HG104 which additionally has the lacYZA operon deleted (positions 360,483 to 365,579), but still contains the native lacI locus. All other strains used strain HG105, where both the lacYZA and lacI operons have both been deleted (positions 360,483 to 366,637).

All 25x+11-yfp expression constructs were integrated at the galK locus (between positions 1,504,078 and 1,505,112) while the 3*1x-lacI constructs were integrated at the ybcN locus (between positions 1,287,628 and 1,288,047). Integration was performed with λ Red recombineering (Sharan et al. 2009) as described in Garcia et al. (2011 a) using the primers listed in Table 6.3. We follow the notation of Lutz and Bujard (Lutz and Bujard 1997) for the nomenclature of the different constructs used. Specifically, the first number refers to the antibiotic resistance cassette that is present for selection (2 = kanamycin, 3 = chloramphenicol, and 4 = spectinomycin) and the second number refers to the promoter used to drive expression of either YFP or LacI (1 = P_{LtetO − 1}, and 5 = lacUV5). Note that in 25x+11-yfp, x refers to the LacI operator used, which is centered at +11 (or alternatively, begins at the transcription start site). For the different LacI constructs, 3*1x-lacI, x refers to the different ribosomal binding site modifications that provide different repressor copy numbers and follows from Garcia et al. (2011 a). The asterisk refers to the presence of FLP recombinase sites flanking the chloramphenicol resistance gene that can be used to lose this resistance. However, we maintained the resistance gene in our constructs. A summary of the final genotypes of each strain is listed in Table 6.4. In addition, each strain also contained the plasmid pZS4*1-mCherry and provided constitutive expression of the mCherry fluorescent protein. This pZS plasmid is a low copy (SC101 origin of replication) where, like with 3*1x-lacI, mCherry is driven by a P_{LtetO − 1} promoter.

Primers used in this work.
Primer	Sequence (5’ → 3’)	Notes
pZSForwSeq2	TTCCCAACC TTACCAGAGG GC	Forward sequencing primer for 3*1x-lacI
251 F	CCTTTCGTCT TCACCTCGA	Forward sequencing primer for 25x+11-YFP
YFP1	ACTAGCAACAC CAGAACAGCCC	Reverse sequencing primer for 3*1x-lacI and 25x+11-YFP
HG 6.1 (galK)	gtttgcgcgc agtcagcgat atccattttc gcgaatccg gagtgtaag aaACTAGCAAC ACCAGAACA GCC	Reverse primer for 25x+11-YFP integration in to the galK locus (lowercase).
HG 6.3 (galK)	ttcatattgt tcagcgacag cttgctgtac ggcaggcac cagctcttc cgGGCTAATGC ACCCAGTAA GG	Forward integration primer for 25x+11-YFP with homology to the galK locus (lowercase.
HG11.1 (ybcN)	acctctgcgg aggggaagcg tgaacctctc acaagacgg catcaaatt acACTAGCAAC ACCAGAACA GCC	Reverse integration primer for 31x-lacI with homology to the ybcN* locus (lowercase).
HG11.3 (ybcN)	ctgtagatgtg tccgttcatg acacgaataa gcggtgtag ccattacgc cGGCTAATGCA CCCAGTAAG G	Forward integration primer for 31x-lacI with homology to the ybcN* locus (lowercase).
ybcN-control-upstream-1	AGCGTTTGA CCTCTGCGGA	Sequencing primer to verify integration
ybcN-control-downstream-1	GCTCAGGTT TACGCTTAC GACG	Sequencing primer to verify integration

*E. coli* strains used in this work.
Strain	Genotype
O1, R = 0	HG105::galK<>25O1+11-YFP
O1, R = 22	HG104::galK<>25O1+11-YFP
O1, R = 60	HG105::galK<>25O1+11-YFP, ybcN<>3*1RBS1147-lacI
O1, R = 124	HG105::galK<>25O1+11-YFP, ybcN<>3*1RBS446-lacI
O1, R = 260	HG105::galK<>25O1+11-YFP, ybcN<>3*1RBS1027-lacI
O1, R = 1220	HG105::galK<>25O1+11-YFP, ybcN<>3*1RBS1-lacI
O1, R = 1740	HG105::galK<>25O1+11-YFP, ybcN<>3*1RBS1L-lacI
O2, R = 0	HG105::galK<>25O2+11-YFP
O2, R = 22	HG104::galK<>25O2+11-YFP
O2, R = 60	HG105::galK<>25O2+11-YFP, ybcN<>3*1RBS1147-lacI
O2, R = 124	HG105::galK<>25O2+11-YFP, ybcN<>3*1RBS446-lacI
O2, R = 260	HG105::galK<>25O2+11-YFP, ybcN<>3*1RBS1027-lacI
O2, R = 1220	HG105::galK<>25O2+11-YFP, ybcN<>3*1RBS1-lacI
O2, R = 1740	HG105::galK<>25O2+11-YFP, ybcN<>3*1RBS1L-lacI
O3, R = 0	HG105::galK<>25O3+11-YFP
O3, R = 22	HG104::galK<>25O3+11-YFP
O3, R = 60	HG105::galK<>25O3+11-YFP, ybcN<>3*1RBS1147-lacI
O3, R = 124	HG105::galK<>25O3+11-YFP, ybcN<>3*1RBS446-lacI
O3, R = 260	HG105::galK<>25O3+11-YFP, ybcN<>3*1RBS1027-lacI
O3, R = 1220	HG105::galK<>25O3+11-YFP, ybcN<>3*1RBS1-lacI
O3, R = 1740	HG105::galK<>25O3+11-YFP, ybcN<>3*1RBS1L-lacI
Oid, R = 22	HG104::galK<>25Oid+11-YFP
Oid, R = 60	HG105::galK<>25Oid+11-YFP, ybcN<>3*1RBS1147-lacI
Oid, R = 124	HG105::galK<>25Oid+11-YFP, ybcN<>3*1RBS446-lacI

References

Aghaeepour, Nima, Greg Finak, Holger Hoos, Tim R. Mosmann, Ryan Brinkman, Raphael Gottardo, and Richard H. Scheuermann. 2013. “Critical Assessment of Automated Flow Cytometry Data Analysis Techniques.” Nature Methods 10 (3): 228–38. https://doi.org/10.1038/nmeth.2365.

Bintu, Lacramioara, Nicolas E Buchler, Hernan G Garcia, Ulrich Gerland, Terence Hwa, Jané Kondev, and Rob Phillips. 2005. “Transcriptional Regulation by the Numbers: Models.” Current Opinion in Genetics & Development 15 (2): 116–24. https://doi.org/10.1016/j.gde.2005.02.007.

Boulton, Stephen, and Giuseppe Melacini. 2016. “Advances in NMR Methods to Map Allosteric Sites: From Models to Translation.” Chemical Reviews 116 (11): 6267–6304. https://doi.org/10.1021/acs.chemrev.5b00718.

Brewster, Robert C., Franz M. Weinert, Hernan G. Garcia, Dan Song, Mattias Rydenfelt, and Rob Phillips. 2014. “The Transcription Factor Titration Effect Dictates the Level of Gene Expression.” Cell 156 (6): 1312–23. https://doi.org/10.1016/j.cell.2014.02.022.

Daber, Robert, Kim Sharp, and Mitchell Lewis. 2009. “One Is Not Enough.” Journal of Molecular Biology 392 (5): 1133–44. https://doi.org/10.1016/j.jmb.2009.07.050.

Daber, Robert, Matthew A. Sochor, and Mitchell Lewis. 2011. “Thermodynamic Analysis of Mutant Lac Repressors.” Journal of Molecular Biology, The Operon Model and its Impact on Modern Molecular Biology, 409 (1): 76–87. https://doi.org/10.1016/j.jmb.2011.03.057.

Edelstein, Arthur D, Mark A Tsuchida, Nenad Amodaj, Henry Pinkard, Ronald D Vale, and Nico Stuurman. 2014. “Advanced Methods of Microscope Control Using μManager Software.” Journal of Biological Methods 1 (2): 10. https://doi.org/10.14440/jbm.2014.36.

Garcia, Hernan G., Heun Jin Lee, James Q. Boedicker, and Rob Phillips. 2011a. “Comparison and Calibration of Different Reporters for Quantitative Analysis of Gene Expression.” Biophysical Journal 101 (3): 535–44. https://doi.org/10.1016/j.bpj.2011.06.026.

Garcia, H. G., and R. Phillips. 2011b. “Quantitative Dissection of the Simple Repression Input-Output Function.” Proceedings of the National Academy of Sciences 108 (29): 12173–8. https://doi.org/10.1073/pnas.1015616108.

Gardino, Alexandra K., Brian F. Volkman, Ho S. Cho, Seok-Yong Lee, David E. Wemmer, and Dorothee Kern. 2003. “The NMR Solution Structure of BeF₃–Activated Spo0F Reveals the Conformational Switch in a Phsophorelay System.” Journal of Molecular Biology 331 (1): 245–54. https://doi.org/10.1016/S0022-2836(03)00733-2.

Lutz, Rolf, and Hermann Bujard. 1997. “Independent and Tight Regulation of Transcriptional Units in Escherichia Coli via the LacR/O, the TetR/O and AraC/I1-I2 Regulatory Elements.” Nucleic Acids Research 25 (6): 1203–10. https://doi.org/10.1093/nar/25.6.1203.

Marzen, Sarah, Hernan G. Garcia, and Rob Phillips. 2013. “Statistical Mechanics of MonodWymanChangeux (MWC) Models.” Journal of Molecular Biology, Allosteric Interactions and Biological Regulation (Part I), 425 (9): 1433–60. https://doi.org/10.1016/j.jmb.2013.03.013.

Oehler, S., M. Amouyal, P. Kolkhof, B. von Wilcken-Bergmann, and B. Müller-Hill. 1994. “Quality and Position of the Three Lac Operators of E. Coli Define Efficiency of Repression.” The EMBO Journal 13 (14): 3348–55.

Rydenfelt, Mattias, Hernan G. Garcia, Robert Sidney Cox, and Rob Phillips. 2014. “The Influence of Promoter Architectures and Regulatory Motifs on Gene Expression in Escherichia Coli.” Edited by Jordi Garcia-Ojalvo. PLOS ONE 9 (12): e114347. https://doi.org/10.1371/journal.pone.0114347.

Schmidt, Alexander, Karl Kochanowski, Silke Vedelaar, Erik Ahrné, Benjamin Volkmer, Luciano Callipo, Kèvin Knoops, Manuel Bauer, Ruedi Aebersold, and Matthias Heinemann. 2016. “The Quantitative and Condition-Dependent Escherichia Coli Proteome.” Nature Biotechnology 34 (1): 104–10. https://doi.org/10.1038/nbt.3418.

Setty, Y., A. E. Mayo, M. G. Surette, and U. Alon. 2003. “Detailed Map of a Cis-Regulatory Input Function.” Proceedings of the National Academy of Sciences 100 (13): 7702–7. https://doi.org/10.1073/pnas.1230759100.

Sharan, S. K., L. C. Thomason, S. G. Kuznetsov, and D. L. Court. 2009. “Recombineering: A Homologous Recombination-Based Method of Genetic Engineering.” Nature Protocols 4: 206–23. https://doi.org/nprot.2008.227\\\%0020[pii]\\\%002010.1038/nprot.2008.227.

Shis, David L., Faiza Hussain, Sarah Meinhardt, Liskin Swint-Kruse, and Matthew R. Bennett. 2014. “Modular, Multi-Input Transcriptional Logic Gating with Orthogonal LacI/GalR Family Chimeras.” ACS Synthetic Biology 3 (9): 645–51. https://doi.org/10.1021/sb500262f.

Sivia, Devinderjit, and John Skilling. 2006. Data Analysis: A Bayesian Tutorial. OUP Oxford.

Sochor, Matthew Almond. 2014. “In Vitro Transcription Accurately Predicts Lac Repressor Phenotype in Vivo in Escherichia Coli.” PeerJ 2 (July): e498. https://doi.org/10.7717/peerj.498.

Vilar, Jose M. G., and Leonor Saiz. 2013. “Reliable Prediction of Complex Phenotypes from a Modular Design in Free Energy Space: An Extensive Exploration of the Lac Operon.” ACS Synthetic Biology 2 (10): 576–86. https://doi.org/10.1021/sb400013w.

Weinert, Franz M., Robert C. Brewster, Mattias Rydenfelt, Rob Phillips, and Willem K. Kegel. 2014. “Scaling of Gene Expression with Transcription-Factor Fugacity.” Physical Review Letters 113 (25). https://doi.org/10.1103/PhysRevLett.113.258101.