Theory and implementation - Localization of molecular orbitals and its benefits

3.5 Localization of molecular orbitals and its benefits

4.1.1 Theory and implementation

Since the Mrcc suite of quantum chemical programs [240] had not included an effective CC2 module previously, the first step was to develop an efficient code from scratch. Our implementation of the CC2 equations utilizing the DF approximation is based on the algorithm of H¨attig and co-workers [31]. Since our intention was to decrease the cost of the CC2 method partly by reducing the MO space, the important difference is that our implementation uses MO integrals as opposed to the atomic orbital-based algorithm of Ref. 31. This required the slight modification of the working equations and the algorithm, which are presented in Table 4.1 together with the scaling of the various steps. The rate-determining step of the procedure is still the construction of the (aiˆ|bj) integral list (step 6) and the contraction of the doubles amplitudes (step 7), the operation count of which scales as n²_occn²_virtn_aux.

Table 4.1: Working equations and their scaling for an iteration in the ground-state CC2 algorithm

Step Scaling

1 Construct intermediates ˆX_ai^Q and ˆX_ij^Q Xˆ_ai^Q=P

bJ_ab^QT_i^b n_occn²_virtn_aux

Xˆ_ij^Q=P

aJ_ai^QT_j^a n²_occn_virtn_aux

2 Compute the ˆJ_ji^Q and ˆJ_ai^Q lists and the contribution of one term in ˆFai to Ωai

Jˆ_ij^Q = ˆX_ij^Q+J_ij^Q n²_occn_aux Ω_ai ← −P

QjXˆ_aj^QJˆ_ji^Q n²_occn_virtn_aux

Jˆ_ai^Q = ˆX_ai^Q+J_ai^Q−P

jJˆ_ij^QT_j^a 2noccnvirtnaux+n²_occnvirtnaux

3 Construct intermediateX^Q X^Q= 2P

aiJ_ai^QT_i^a n_occn_virtn_aux

4 Compute further contributions of ˆFai to Ωai

Ω_ai ←P

Qj(P

kXˆ_jk^QT_k^a) ˆJ_ji^Q 2n²_occn_virtn_aux Ω_ai ←P

QJˆ_ai^QX^Q+T_i^a(ε_a−ε_i) n_occn_virtn_aux+n_occn_virt 5 Calculate ˆFjb

Fˆ_jb =P

QJˆ_bj^QX^Q−P

QiXˆ_ji^QJ_bi^Q n_occn_virtn_aux+n²_occn_virtn_aux 6 Construct the (aiˆ|bj) list

(aiˆ|bj) = P

QJˆ_ai^QJˆ_bj^Q n²_occn²_virtnaux

7 Compute and contract ˆT_ij^ab, calculate the contributions of ˆF_jb to Ω_ai Tˆ_ij^ab = [2(aiˆ|bj)−(ajˆ|bi)]/D^ab_ij n²_occn²_virt Yˆ_ai^Q =P

bjTˆ_ij^abJ_bj^Q n²_occn²_virtnaux

Ω_ai ←P

bjTˆ_ij^abFˆ_jb n²_occn²_virt

8 Calculate further contributions of ˆT_ij^ab to Ω_ai Ωai ←P

Qb(J_ba^Q −P

jJ_bj^QT_j^a) ˆY_bi^Q =P

QbJˆ_ba^QYˆ_bi^Q 2noccn²_virtnaux+n²_virtnaux

Ω_ai ← −P

QjJˆ_ji^QYˆ_aj^Q n²_occn_virtn_aux

9 Calculate CC2 correlation energy

∆ECC2 =P

Qai( ˆJ_ai^Q+X^QT_i^a−P

jXˆ_ij^QT_j^a) ˆY_ai^Q 4noccnvirtnaux+n²_occnvirtnaux

Since the structure of the excited state equations is very similar to that of the cor-responding ground-state terms, only the number of intermediates is higher, the algorithm implemented for the ground state calculations is also applicable mutatis mutandisto the evaluation of theσvector. For the reduced-cost state-specific excited state calculations we implemented a modified version of the algorithm proposed by H¨attig and co-workers [31].

In the first step a multi-state DF-CIS calculation is run, and the corresponding CIS ex-citation energies and vectors will be used as initial guess in the subsequent calculations.

Then, for each excited state, the eigenvector is preoptimized by a modified Davidson algo-rithm using the root-following technique [241] to avoid convergence to a wrong solution.

Finally, the non-linear problem is solved by a direct inversion in the iterative subspace (DIIS) algorithm, which is also applied to solve the ground-state CC2 equations. Our experience showed that in particular cases the above algorithm finds the wrong root, that is, we arrive at the same CC2 solution starting from different CIS eigenvectors. We could overcome this problem by projecting out the single excitation part of the CC2 eigenvectors converged previously from the initial guess CIS vectors.

In order to speed up the calculations, we conservatively reduced the size of the bases using the combination of the NO and NAF techniques. For the construction of the NOs for ground-state CC2 calculations the MP2 density matrix is a plausible choice.

For excited states the selection of an appropriate wave function Φ in Eqs. (3.68) and (3.69) is less trivial. Here the CIS(D) approach seems to be a good choice since it is an excited-state generalization of the MP2 method and accounts for about the same amount of correlation for the excited state as the MP2 method does for the ground state.

Moreover, the calculation of CIS(D) density matrices is relatively cheap, the computation time required is similar to that of a CC2 iteration. Of course, we should keep in mind that the ground-state CC2 amplitudes are also necessary to solve the equations for the excited states, and it is highly desirable to solve both equations in the same MO basis. For this reason we construct a state-specific MO basis for each excited state in which both the ground- and excited-state equations are solved. Consequently, a single, “state-averaged”

(hole) density matrix is required for each excited state for the construction of the NOs.

For this purpose we simply take the average of the MP2 (D^MP2) and CIS(D) (D^CIS(D)) densities, i.e., we diagonalize the D = (D^MP2 + D^CIS(D))/2 density matrices, and the similar expression holds for the hole densities.

The required blocks of the one-particle MP2 (hole) density matrices can be calcu-lated as

D^MP2_ij =X

kab

(2t^ab_ikt^ab_jk−t^ab_ikt^ba_jk) (4.1) D^MP2_ab =X

ijc

(2t^ca_ijt^cb_ij −t^ca_ijt^bc_ij). (4.2) The algorithm for the construction of MP2 densities is presented in Table 4.2. The bot-tleneck is the assembly of the (ai|bj) integral list, which scales asn²_occn²_virtnaux. There are two other fifth-power-scaling-operations in the algorithm. The evaluation of the virtual-virtual block of the density matrix scaling as n²_occn³_virt is also relatively expensive, while the computation time needed for the calculation of the occupied-occupied block of the

Table 4.2: Working equations and their scaling for the evaluation of MP2 density matrices

Step Scaling

1 Assemble (ai|bj) list (ai|bj) =P

QJ_ai^QJ_bj^Q n²_occn²_virtn_aux 2 Compute t^ab_ij

t^ab_ij = (ai|bj)/D^ab_ij n²_occn²_virt 3 Compute intermediate X_ij^ab

X_ij^ab = 2t^ab_ij −t^ba_ij n²_occn²_virt 4 Compute density matrices

D^MP2_ij =P

kabt^ab_ikX_jk^ab n³_occn²_virt D^MP2_ab =P

ijct^ca_ijX_ij^cb n²_occn³_virt

hole density matrix is in general considerably shorter. We note that the MP2 densities must be constructed only once, independently of the number of excited states.

The CIS(D) density matrix can be split up as the sum of the CIS density matrix and the contribution of the perturbative (D) correction, D^CIS(D) =D^CIS+D^(D). The CIS density matrices read as

D^CIS_ij =X

c^a_ic^a_j, (4.3)

D_ab^CIS=X

c^a_ic^b_i, (4.4)

wherec^a_i is a CIS coefficient, while we defineD^(D)analogously to its MP2 counterpart as D^(D)_ij =X

kab

(2c^ab_ikc^ab_jk −c^ab_ikc^ba_jk), (4.5) D_ab^(D)=X

ijc

(2c^ca_ijc^cb_ij −c^ca_ijc^bc_ij) (4.6) with c^ab_ij as a CIS(D) doubles coefficient. The latter were given by Head-Gordon and co-workers [157] in a spin-orbital basis. The corresponding expression for spatial orbitals can easily be derived by spin-integration and exploiting the relationships of doubles am-plitudes [242], and reads as

c^ab_ij = P

c[(ac|bj)c^c_i + (ai|bc)c^c_j]−P

k[(kj|ai)c^b_k+ (ki|bj)c^a_k]

D_ij^ab+ω_CIS , (4.7)

where ω_CIS stands for the CIS excitation energy. The calculation of CIS(D) densities follows the route shown in Table 4.3. As it can be seen, similar to the evaluation of the MP2 densities, the rate-determining operation is the construction of a four-index

Table 4.3: Working equations and their scaling for the evaluation of CIS(D) density matrices

Step Scaling

1 Construct CIS contribution to density matrices D^CIS(D)_ij ←P

ac^a_ic^a_j n²_occn_virt D^CIS(D)_ab ←P

ic^a_ic^b_i n_occn²_virt 2 Compute intermediate Y_ai^Q

Y_ai^Q ←P

cJ_ac^Qc^c_i n_occn²_virtn_aux Y_ai^Q ← −P

jJ_ij^Qc^a_j n²_occn_virtn_aux 3 Compute intermediate V_ij^ab

V_ij^ab =P

QY_ai^QJ_bj^Q n²_occn²_virtn_aux 4 Calculate c^ab_ij coefficients

c^ab_ij = (V_ij^ab+V_ji^ba)/(D^ab_ij +ωCIS) n²_occn²_virt 5 Compute intermediate X_ij^ab

X_ij^ab = 2c^ab_ij −c^ba_ij n²_occn²_virt 6 Compute density matrices

D^CIS(D)_ij ←P

kabc^ab_ikX_jk^ab n³_occn²_virt D^CIS(D)_ab ←P

ijcc^ca_ijX_ij^cb n²_occn³_virt

intermediate as a product of three-index quantities, which scales asn²_occn²_virtn_aux. However, in addition to the inexpensive CIS contribution there are two further forth-power-scaling terms (step 2) in contrast to the MP2 density matrices, where the occupied-occupied and virtual-virtual blocks of the three-center integral list are contracted with CIS coefficients.

Since the CIS(D) coefficients, and thus the density matrices are state-dependent, these quantities have to be evaluated for each excited state.

Our numerical experience showed that further approximations can be introduced at the construction of the D^MP2 and D^CIS(D) matrices. As it will be demonstrated in the following subsection, the calculated excitation energies hardly change if we neglect the exchange-like contributions to the density matrices, that is, instead of Eqs. (4.1), (4.2), (4.5), and (4.6) we define the density matrices as

D^MP2_ij =X

kab

2t^ab_ikt^ab_jk (4.8)

D_ab^MP2=X

ijc

2t^ca_ijt^cb_ij (4.9)

D^(D)_ij =X

kab

2c^ab_ikc^ab_jk (4.10)

D^(D)_ab =X

ijc

2c^ca_ijc^cb_ij. (4.11)

The advantage of these equations is that the evaluation of intermediates X_ij^ab (see Tables 4.2 and 4.3) can be avoided, and the calculation of the density matrices can be reduced to the multiplication of a matrix with its transpose, which operation can be efficiently performed. Furthermore, a simple and fast algorithm can be designed for the evaluation of the above equations if the t^ab_ij or c^ab_ij list does not fit into the memory. In this case, at the construction of the MP2 densities (see Table 4.2) one of the virtual indices of the doubles amplitudes is split up into blocks, hereafter denoted byA, so that the amplitudes with the remaining three indices can be kept in the memory. The outmost loop in the algorithm runs over A, and in the inner loops the doubles amplitudes with one virtual index in the given index range are calculated, and their contribution to the density matrices is computed. The evaluation of the CIS(D) density matrices deserves somewhat more attention (see Table 4.3) since the size of the J_ab^Q list can also easily exceed the size of the available memory. In this case the list is blocked according to its auxiliary index so that the J_ab^Q integrals with Q in a block can be held in the memory, and the first contribution to intermediate Y_ai^Q (step 2) is evaluated within a loop performed over the blocks. Analogously to the MP2 algorithm one of the virtual indices of the four-index intermediates is also restricted, and the subsequent steps (steps 3 to 6) are carried out in a loop running over the corresponding A blocks. A further difficulty is that the symmetrization of intermediateV_ij^abis necessary to obtain thec^ab_ij coefficients (step 4), and it requires the recalculation of particularV_ij^ab matrix elements if the out-of-core algorithm is executed.

After the diagonalization of the (hole) density matrices and the selection of the NOs of sufficiently high occupation number, the truncated NO basis is canonicalized so that we will be able to employ the equations derived for canonical MOs. The elements of the latter basis will be denoted by ˜a, ˜i, . . . , and ˜J will stand for the J list the MO indices of which are transformed to the canonicalized NO basis.

The number of the variables could be further reduced utilizing the NAF approach.

However, an important question is at what stage the NAF basis should be constructed and what Jshould be used for that purpose. In principle, we can change for a NAF basis right at the beginning of the calculation, prior to the evaluation of the MP2 and CIS(D) density matrices. The advantage of this strategy would be that the expenses of the latter steps would also be reduced. However, as it is discussed in Ref. 233, the NAF basis is an auxiliary basis that is optimal for the given MO basis, and hence, it is expedient to construct the NAFs from the J matrix whose MO indices are already transformed to

the final MO basis, in our case from matrix ˜J. The second question is what block of J˜ is to be used for the determination of the NAFs. Of course, the NAFs constructed from the entire ˜J will be the exact right SVs of the matrix. However, we can also build approximate NAF bases from particular blocks of ˜J, for instance, we can compute matrix W only from the ˜J_ai^Q or ˜J_pi^Q integrals. In this way we can reduce the time spent on the construction of the NAF basis, but probably more NAFs have to be retained to obtain results of certain accuracy as the NAFs computed, e.g., from the ˜J_ai^Q list will be lower-quality fitting functions for the (ab|ij)-type integrals. These aspects will be analyzed in the following subsection.

To conclude this subsection we overview our general algorithm for the present reduced-cost CC2 approach, which is as follows.

1. Solve HF equations

2. Solve CIS equations for all the excited states using a multi-state algorithm 3. Loop over excited states

(a) Compute state-averaged one-particle density matrices (Tables4.2 and 4.3) (b) Transform the MO indices ofJ to the pseudo-canonical NO basis

(c) Evaluate matrix W from ˜J [Eq. (3.66)], select NAFs to be retained (d) Transform the auxiliary function index of ˜J to the NAF basis [Eq. (3.67)]

(e) Save integral list J to disk End loop

4. Loop over excited states

(a) Retrieve integral list J from disk

(b) Solve the ground- and excited-state CC2 equations (Table4.1) End loop

Note that the CC2 equations are solved in a different loop than the one in which the densities, the NAF basis, and the transformed integrals are constructed. The rationale behind this is that, in the first loop, it enables the design of an algorithm where the originalJ list is read from disk or recalculated only once for the integral transformation.

The transformed J integral list is saved to the disk and retrieved in the second loop.

However, the size of the J lists are significantly, by more than an order of magnitude smaller than that of matrix J, thus the reading and processing of the J list does not cause any complication at the CC2 calculations.

In document 2019 D´avidMester EﬃcientExcited-StateQuantumChemicalApproachesforLargeMolecules (Pldal 30-37)