Test–Retest Reproducibility of In Vivo Magnetization Transfer Ratio and Saturation Index in Mice at 9.4 Tesla

Magnetization transfer saturation (MTsat) imaging was developed to reduce T1 dependence and improve specificity to myelin, compared to the widely used MT ratio (MTR) approach, while maintaining a feasible scan time. As MTsat imaging is an emerging technique, the reproducibility of MTsat compared to MTR must be evaluated.

MT contrast is generated by applying an off-resonance radiofrequency pre-pulse (MT pulse) to selectively saturate the spectrally broad macromolecular proton pool. This saturation then transfers to the free water proton pool via MT, resulting in a decrease in the observed free water signal. The magnitude of the MT effect can be characterized by the MTR:

$$\mathrm{MTR} = \frac{\mathrm{PDw} - \mathrm{MTw}}{\mathrm{PDw}} \qquad (1)$$

where PDw is the signal without an MT pulse applied, which is proton density weighted (PDw), and MTw is the signal with the MT pulse applied, which is MT weighted (MTw). Although MTR has been shown to correlate well with histological myelin content, 1,4 it is also sensitive to the choice of sequence parameters, flip angle inhomogeneities, and longitudinal relaxation time (T1). 5 T1 correlates strongly with myelin content, but is also sensitive to axon size 6 and iron content, 7 which limits the specificity of MTR as a measure of myelin. Quantitative MT (qMT) has been used in many recent works to quantify myelin, [8][9][10] as it reduces the confounding effects of scan parameters and quantifies specific tissue characteristics, such as the macromolecular pool size. 3 However, qMT relies on complex modeling of the MR signal dependence on myelin, and requires more measurements and thus longer acquisition times compared to contrast-weighted MT protocols. 3 Magnetization transfer saturation (MTsat) imaging was developed to improve on MTR, by decoupling MTR from T1 effects, while maintaining a feasible scan time. 5 The shorter scan time compared to qMT enables longitudinal in vivo imaging and allows the addition of other imaging techniques required to characterize microstructure. A scalar map of MTsat can be acquired using two reference scans of proton density and T1 weighting (PDw and T1w, respectively), and one MTw scan. MTsat, being more independent of system parameters and T1 weighting, and less susceptible to inhomogeneities of the receiver coil and the transmitted RF field, provides greater specificity and contrast compared to MTR. 
5,11 MTsat shows higher white matter contrast in the brain than MTR, 5 and has been shown to correlate more with disability metrics than MTR in patients with multiple sclerosis. 12 Hagiwara et al reported that MTsat may be more suited to measure myelin in the white matter, compared to the ratio of T1-weighted to T2-weighted images, which has also been proposed as a measure of myelin. 13 There is strong interest in applying MT to preclinical rodent neuroimaging studies at ultra-high field strengths (≥7 T), demonstrated by MTR [14][15][16][17] and qMT studies. 10,[18][19][20] The feasibility of MTsat imaging in mice at 9.4 T has been shown previously 21 and MTsat has been explored in a feline model of demyelination at 3 T. 22 While most MTsat studies have been performed at 3 T, recently, Olsson et al reported an optimized whole-brain MTsat protocol at 7 T, 23 which highlights the increasing interest in this method. Although previous in vivo MTsat studies have shown high reproducibility in humans at 3 T, 24,25 the comparability of MTR and MTsat reproducibility has not been fully evaluated. This also leaves open the question of MTsat reproducibility in a preclinical setting, at an ultra-high field strength. As MTsat provides a time-efficient alternative to fully quantitative techniques but with increased specificity and contrast compared to MTR, investigation of MTsat in preclinical rodent imaging will likely be of interest to other research groups. The aim of this work was to assess test-retest reproducibility of in vivo MTR and MTsat in mice at 9.4 T and provide estimates of required sample sizes, which is essential in planning preclinical neuroimaging studies involving models of disease/injury.

Subjects
All animal procedures were approved by the University of Western Ontario Animal Use Subcommittee and were consistent with guidelines established by the Canadian Council on Animal Care. Twelve adult C57Bl/6 mice (six males and six females) were scanned twice 5 days apart. The sample size was chosen to reflect similar sample sizes used in other pre-clinical imaging studies. [26][27][28] Before scanning, anesthesia was induced by placing the animals in an induction chamber with 4% isoflurane and an oxygen flow rate of 1.5 L/min. Following induction, isoflurane was maintained during the imaging session at 1.8% with an oxygen flow rate of 1.5 L/min through a custom-built nose cone. The mouse head was fixed in place using ear bars and a bite bar to prevent head motion. These mice were also part of a different longitudinal study with three additional imaging sessions following the test and retest scans, at the end of which they were euthanized for histology. The mice were anesthetized with ketamine/xylazine 1,2 and then underwent trans-cardiac perfusion with ice-cold saline, followed by 4% paraformaldehyde in phosphate-buffered saline.
In Vivo MRI
In vivo MRI experiments were conducted on a hybrid system: Agilent 9.4 Tesla, 31-cm bore magnet (Agilent, Palo Alto, CA), equipped with a 60-mm gradient coil set of 1 T/m strength (slew rate = 4100 T/m/s) (Agilent) and Bruker Avance MRI III console with software package of Paravision-7 (Bruker BioSpin Corp, Billerica, MA). A single channel transceiver surface coil (20 × 25 mm), built in-house, was fixed in place directly above the mouse head to maximize signal-to-noise ratio (SNR). A boost in SNR in the cortex when using this surface coil, compared to a commercially available 40-mm millipede (MP40) volume coil (Agilent), has been reported previously. 29 The MT protocol required 50 minutes total scan time and comprised three FLASH-3D (fast low angle shot) scans, and one RF transmit field (B1) map scan to correct for local variations in flip angle. An MT-weighted scan, and reference T1-weighted and PD-weighted scans (MTw, T1w, and PDw, respectively) were acquired by appropriate choice of the repetition time (TR) and the flip angle (α): TR/α = 8.5 ms/20° for the T1w scan and 25 ms/9° for the PDw and the MTw scans. MT-weighting was achieved by applying an off-resonance Gaussian-shaped RF pulse (12-ms duration, 385° nominal flip angle, 3.5 kHz frequency offset from water resonance, 5 μT RF peak amplitude) prior to the excitation. Other acquisition parameters were: TE = 2.75 ms; resolution = 150 × 150 × 400 μm³; field of view (FOV) = 19.2 × 14.4 × 12 mm³; read-out bandwidth = 125 kHz; 12 averages. The B1 map was acquired at a lower resolution of 600 × 600 × 400 μm³ and the following scan parameters: TE = 4 ms; α = 60°; short TR = 20 ms; long TR = 100 ms; 2 averages. Anatomical images were also acquired for each subject within each session using a 2D T2-weighted TurboRARE pulse sequence (150-μm in-plane resolution; 500-μm slice thickness; TE/TR = 40/5000 ms; 16 averages; total acquisition time = 22 minutes).
Image Processing
MTR and MTsat maps were generated using in-house MATLAB (ver. 2020b, Mathworks, Natick, MA) code. Gaussian filtering (full-width-half-maximum = 3 voxels) was first applied to the original images (MTw, PDw, and T1w images, and B1 maps) to reduce noise, while retaining image contrast. The standard MTR maps were calculated using Eq. 1. MTw, PDw, and T1w images were used to calculate MTsat maps, following the original method proposed by Helms et al, 5 and outlined by Hagiwara et al. 13 The following parameter estimates are influenced by local flip angle errors and are hence labeled by the subscript "app." The apparent longitudinal relaxation rate, $R_{1\mathrm{app}}$, was calculated as follows:

$$R_{1\mathrm{app}} = \frac{1}{2}\,\frac{S_{T1}\,\alpha_{T1}/TR_{T1} - S_{PD}\,\alpha_{PD}/TR_{PD}}{S_{PD}/\alpha_{PD} - S_{T1}/\alpha_{T1}} \qquad (2)$$

where $S_{T1}$ and $S_{PD}$ denote signal intensities of T1w and PDw images, respectively; $TR_{T1}$ and $TR_{PD}$ denote TR of T1w and PDw images, respectively; and $\alpha_{T1}$ and $\alpha_{PD}$ denote excitation flip angles of T1w and PDw images, respectively. The apparent signal amplitude, $A_{\mathrm{app}}$, was calculated as follows:

$$A_{\mathrm{app}} = S_{PD}\,S_{T1}\,\frac{TR_{PD}\,\alpha_{T1}/\alpha_{PD} - TR_{T1}\,\alpha_{PD}/\alpha_{T1}}{S_{T1}\,TR_{PD}\,\alpha_{T1} - S_{PD}\,TR_{T1}\,\alpha_{PD}} \qquad (3)$$

Using $R_{1\mathrm{app}}$ and $A_{\mathrm{app}}$, the apparent MT saturation, $\mathrm{MTsat}_{\mathrm{app}}$, was calculated as follows:

$$\mathrm{MTsat}_{\mathrm{app}} = \left(\frac{A_{\mathrm{app}}\,\alpha_{MT}}{S_{MT}} - 1\right) R_{1\mathrm{app}}\,TR_{MT} - \frac{\alpha_{MT}^{2}}{2} \qquad (4)$$

where $S_{MT}$, $TR_{MT}$, and $\alpha_{MT}$ denote signal intensity, TR, and excitation flip angle of the MTw image, respectively. $\mathrm{MTsat}_{\mathrm{app}}$ is inherently robust against differences in relaxation rates and inhomogeneities of RF transmit and receive field compared with conventional MTR imaging. 5,11 Furthermore, B1 maps were used to correct for small residual higher order dependencies of the MT saturation on the local RF transmit field to further improve spatial uniformity, as suggested by Weiskopf et al 24:

$$\mathrm{MTsat} = \frac{\mathrm{MTsat}_{\mathrm{app}}\,(1 - 0.4)}{1 - 0.4\,RF_{\mathrm{local}}} \qquad (5)$$

where $RF_{\mathrm{local}}$ is the relative flip angle α compared to the nominal flip angle. Brain masks were produced using the skull stripping tool from BrainSuite (ver. 19b, http://brainsuite.org/quickstart/cse/). 30 Image registration was performed using affine and symmetric diffeomorphic transforms with ANTs software (https://github.com/ANTsX/ANTs). 
31 Region-of-interest (ROI) masks were acquired from the labeled Allen Mouse Brain Atlas. 32 One T2-weighted scan was performed for each subject at each timepoint. As registration to an atlas is time-consuming, a T2-weighted scan from only one subject at the test timepoint was chosen (the "chosen T2") to be registered to the atlas. All other T2-weighted images from other subjects, at both timepoints, were registered to the "chosen T2." MTR parameter maps were registered to the corresponding anatomical images (from the same subject at the same timepoint). For ROI-based analysis, the inverse transformations resulting from the preceding registration steps (MTR ➝ corresponding T2 ➝ chosen T2 ➝ atlas) were then used to bring the labeled atlas to the corresponding MT space for each subject at each timepoint. The inverse transformations, computed by ANTs for each registration step, are used to perform the opposite operation (such as deforming an image in the atlas space and producing an output in the chosen T2 space), and include inverse deformation fields and inverse affine transforms. Binary masks for each ROI were generated by thresholding the labeled atlas. Each mask was eroded by one voxel, except for the corpus callosum masks, to minimize partial volume errors within a given ROI. The binary masks were visually inspected to ensure good registration quality.
Furthermore, to perform whole brain voxel-wise analysis of all subjects across both timepoints, the data were registered to a common template. MTR maps were first registered to one MTR map (the "chosen MTR"). All MTsat maps were then registered to the chosen MT space using a single transform: MTR ➝ chosen MTR.
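The MTR and MTsat calculations described in this section can be illustrated with a minimal per-voxel sketch. This is not the authors' in-house MATLAB code; the function names `mtr` and `mtsat` are hypothetical, and the sketch assumes flip angles in radians, TR in seconds, and the small-angle signal expressions of Helms et al together with the B1 correction of Weiskopf et al.

```python
import math


def mtr(pdw, mtw):
    """MT ratio from the PD-weighted and MT-weighted signals."""
    return (pdw - mtw) / pdw


def mtsat(s_pd, s_t1, s_mt, tr_pd, tr_t1, tr_mt, a_pd, a_t1, a_mt, rf_local=1.0):
    """Return (R1app, Aapp, B1-corrected MTsat) for one voxel.

    Assumptions of this sketch: flip angles in radians, TR in seconds,
    rf_local = actual / nominal flip angle (1.0 means no B1 error).
    """
    # Apparent longitudinal relaxation rate from the PDw and T1w signals.
    r1_app = 0.5 * (s_t1 * a_t1 / tr_t1 - s_pd * a_pd / tr_pd) / (
        s_pd / a_pd - s_t1 / a_t1
    )
    # Apparent signal amplitude from the same two reference scans.
    a_app = (
        s_pd * s_t1
        * (tr_pd * a_t1 / a_pd - tr_t1 * a_pd / a_t1)
        / (s_t1 * tr_pd * a_t1 - s_pd * tr_t1 * a_pd)
    )
    # Apparent MT saturation from the MTw signal.
    mtsat_app = (a_app * a_mt / s_mt - 1.0) * r1_app * tr_mt - a_mt ** 2 / 2.0
    # Semi-empirical correction for residual B1 dependence.
    mtsat_corr = mtsat_app * (1.0 - 0.4) / (1.0 - 0.4 * rf_local)
    return r1_app, a_app, mtsat_corr


if __name__ == "__main__":
    # Made-up signal values, used only to show the call signature.
    r1, amp, sat = mtsat(
        s_pd=500.0, s_t1=65.0, s_mt=360.0,
        tr_pd=0.025, tr_t1=0.0085, tr_mt=0.025,
        a_pd=math.radians(9), a_t1=math.radians(20), a_mt=math.radians(9),
    )
    print(r1, amp, sat)
```

Because the functions use only elementwise arithmetic, the same expressions could be applied to whole image arrays with an array library, provided voxels with near-zero denominators are masked first.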

Data Availability
The test-retest dataset and in-house code to compute MTR and MTsat are available online: https://osf.io/5nwae/.

Data Analysis
ROI-BASED AND VOXEL-WISE ANALYSIS. ROI analysis was performed using two approaches: analysis of unregistered data and analysis of data registered to a common template. For the second approach, all MTR and MTsat maps were registered to a "chosen" MTR space, as described above.
The ROI analysis focused on five different tissue regions: corpus callosum (CC), internal capsule (IC), hippocampus (HC), cortex (CX), and thalamus (TH). For both ROI-based approaches, Bland-Altman (BA) and CV analyses were performed using the mean MTR and MTsat values from each ROI. Voxel-wise CV analysis was also performed with the registered data. For the test scans of the registered data, the normalized contrast was averaged across all subjects, and an unpaired two-tailed t test was performed between MTR and MTsat contrast. For both registered and unregistered data, paired two-tailed t tests were performed to test for significant differences between ROI-based test and retest mean measurements. As there were multiple ROIs, the Bonferroni-Dunn method was used to correct for multiple comparisons.
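The Bland-Altman analysis of ROI means can be sketched as follows. This is an illustrative example rather than the code used in the study: the function name `bland_altman` and the input values are hypothetical, the difference is taken as test minus retest, and the limits of agreement assume the sample (n − 1) standard deviation.

```python
from statistics import mean, stdev


def bland_altman(test, retest):
    """Bias and 95% limits of agreement for paired test/retest ROI means."""
    diffs = [t - r for t, r in zip(test, retest)]        # y-axis values
    means = [(t + r) / 2.0 for t, r in zip(test, retest)]  # x-axis values
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation (assumption of this sketch)
    return {
        "means": means,
        "diffs": diffs,
        "bias": bias,
        "lower": bias - 1.96 * sd,
        "upper": bias + 1.96 * sd,
    }


if __name__ == "__main__":
    # Hypothetical mean MTR values for one ROI in four subjects.
    result = bland_altman([0.50, 0.52, 0.48, 0.51], [0.49, 0.53, 0.47, 0.51])
    print(result["bias"], result["lower"], result["upper"])
```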
Measurement reproducibility was explored for both ROI-based analysis and whole brain voxel-wise analysis. To mitigate partial volume errors from cerebrospinal fluid (CSF) in ROI-based analysis, voxels with MTR < 0.1 were omitted in both test and retest images. In voxel-wise analysis, voxels with MTR < 0.1, as measured on the test images, were omitted. BA analysis was performed for the ROI-based analyses to identify any biases between test and retest measurements. For both analysis techniques, the scan-rescan reproducibility was characterized using the coefficient of variation (CV). The CV reflects both the reproducibility and variability of these metrics, and provides insight into necessary sample sizes and minimum detectable effect size. CVs were calculated between subjects (bsCV) and within subjects (wsCV) to quantify the between-subject and within-subject reproducibility, respectively. The between-subject CV was calculated separately for the test and retest timepoints as the standard deviation divided by the mean value across subjects 1-12. These two CV values were then averaged for the mean between-subject CV. The within-subject CV was calculated separately for each subject as the standard deviation divided by the mean of the test and retest scans. The 12 within-subject CVs were then averaged to determine the mean within-subject CV. For both registered and unregistered data, one-way analysis of variance (ANOVA) was performed to test for significant differences between ROI-based CVs, and unpaired two-tailed t tests were performed to test for significant differences between ROI-based MTR and MTsat CVs (using the Bonferroni-Dunn method to correct for multiple comparisons).

Figure 3: Bland-Altman plots depicting biases between test and retest scans for mean MTR and MTsat values (from the ROI-based analysis). Unregistered data (left column) and data registered to a common template (right column) are shown. The solid black lines represent the mean bias, and the dotted black lines represent the ±1.96 standard deviation lines. The average of the test and retest mean values is plotted along the x-axis and the difference between the test and retest mean values is plotted along the y-axis. ROIs in the legend are abbreviated as follows: CC, corpus callosum; CX, cortex; IC, internal capsule; HC, hippocampus; TH, thalamus.
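The between-subject and within-subject CV definitions given above can be written as a short sketch. The function names and example values are hypothetical, and the sketch assumes the sample (n − 1) standard deviation, which the text does not specify.

```python
from statistics import mean, stdev


def between_subject_cv(test, retest):
    """Mean bsCV (%): SD / mean across subjects, averaged over both timepoints."""
    cv_test = stdev(test) / mean(test)
    cv_retest = stdev(retest) / mean(retest)
    return 100.0 * (cv_test + cv_retest) / 2.0


def within_subject_cv(test, retest):
    """Mean wsCV (%): per-subject SD / mean of the two scans, averaged over subjects."""
    per_subject = [stdev([t, r]) / mean([t, r]) for t, r in zip(test, retest)]
    return 100.0 * mean(per_subject)


if __name__ == "__main__":
    # Hypothetical mean MTR values for one ROI in four subjects.
    test = [0.50, 0.52, 0.48, 0.51]
    retest = [0.49, 0.53, 0.47, 0.51]
    print(between_subject_cv(test, retest), within_subject_cv(test, retest))
```

With only two scans per subject, each per-subject standard deviation rests on two data points, so individual within-subject CVs are noisy and only their average is informative.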
Sample size calculations were performed based on CVs from the ROI analysis of registered data. Minimum sample sizes required to detect defined biological effects (statistically significant changes of 6%, 8%, 10%, 12%, and 14%), using both between and within-subject approaches, were determined at a 5% significance level (α = 0.05) and power of 80% (1 − β = 0.80). The defined statistically significant changes were centered around 10%, as most MT studies report changes in MTR between 15% and 30%, 16,33 while some studies report more subtle changes between 5% and 10%. 15,34 This is explained in greater detail in Section 3. Thus, changes smaller than 10% were considered "small" changes and changes larger than 10% were considered "large" changes.
Following the procedure presented in van Belle, 35 the between-subject CVs were used to determine the sample size required per group to detect a defined biological effect between subjects in each ROI. Assuming paired t tests, the standard deviations of the differences between test-retest mean values across subjects were used to determine the sample size required to detect a defined biological effect within subjects in each ROI, using an online sample size calculator (UCSF Clinical & Translational Science Institute, San Francisco, CA, https://sample-size.net/sample-size-study-paired-ttest/). BA plots, CV calculations, and sample sizes required (using a between-subjects approach) were generated using MATLAB (ver. 2020b, Mathworks, Natick, MA). All tests of statistical significance were performed using GraphPad Prism 9 (San Diego, CA). Results were considered statistically significant at P ≤ 0.05.

Figure 4: Mean between-subject and within-subject coefficients of variation (CV) for MTR and MTsat in each ROI. Reproducibility metrics for unregistered data (left column) and data registered to a common template (right column) are shown. Values for the between-subject CV condition represent the mean ± standard deviation over subjects (averaged over the test and retest timepoints). Values for the within-subject CV condition represent the mean ± standard deviation between test and retest (averaged over the 12 subjects). ROIs are abbreviated as follows: CC, corpus callosum; CX, cortex; IC, internal capsule; HC, hippocampus; TH, thalamus.
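As a rough illustration of how such sample sizes scale with CV and effect size, the sketch below uses the standard normal-approximation formulas for two-group and paired comparisons. It is an assumption of this example, not the exact procedure of van Belle or of the online calculator cited above, and t-distribution-based calculators generally return slightly larger values. The function names and input values are hypothetical.

```python
import math
from statistics import NormalDist


def _z_sum(alpha, power):
    """z(1 - alpha/2) + z(power) for a two-tailed test."""
    nd = NormalDist()
    return nd.inv_cdf(1.0 - alpha / 2.0) + nd.inv_cdf(power)


def n_between(cv, change, alpha=0.05, power=0.80):
    """Subjects per group to detect a fractional change, given a between-subject CV.

    cv and change are fractions (0.08 means 8%).
    """
    return math.ceil(2.0 * (_z_sum(alpha, power) * cv / change) ** 2)


def n_within(sd_diff, mean_value, change, alpha=0.05, power=0.80):
    """Subjects needed for a paired design, given the SD of test-retest differences."""
    delta = change * mean_value  # absolute effect size to detect
    return math.ceil((_z_sum(alpha, power) * sd_diff / delta) ** 2)


if __name__ == "__main__":
    # Hypothetical inputs: 8% bsCV, and an SD of differences of 0.02 around a mean of 0.5.
    for change in (0.06, 0.08, 0.10, 0.12, 0.14):
        print(change, n_between(0.08, change), n_within(0.02, 0.5, change))
```

At α = 0.05 and 80% power, the factor 2(z₁₋α/₂ + z₁₋β)² is approximately 15.7, which is the origin of the constant 16 in van Belle's rule of thumb for two-group comparisons.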

Parameter Maps
Representative parameter maps are shown in Fig. 1. MTsat revealed slightly greater contrast than MTR between gray matter and white matter, which was noticeable when comparing the corpus callosum and internal capsule (white matter regions) to the surrounding gray matter. The normalized contrast between gray matter and white matter regions, averaged over all subjects, in MTsat (0.376) was significantly higher (P < 0.0001) than in MTR (0.226).

ROI-Based Analysis
Violin plots, as shown in Fig. 2, depict the distribution of the mean values for each metric within each ROI for the 12 subjects for both registered and unregistered datasets. BA plots, as shown in Fig. 3, revealed negligible biases, with mean biases of 0.009 and 0 for MTR and MTsat, respectively. Although the difference was not significant, MTR exhibited lower between- and within-subject CVs (2.9%-8%) compared to MTsat (4.5%-10%), as shown in Fig. 4.

Voxel-Wise Analysis
The voxel-wise CV maps showed very high CVs in the cerebrospinal fluid (CSF), due to the low values of MTR (<0.1) and MTsat (<0.004) in the CSF (Fig. 5). In the CSF, mean bsCV/wsCV were 44.5%/36.5% and 53.2%/39.8% for MTR and MTsat, respectively, while whole brain mean bsCV/wsCV (with a mask applied to omit CSF voxels) were significantly lower at 20.5%/11.8% and 26.2%/16.5% for MTR and MTsat, respectively. Throughout the whole brain, between- and within-subject CVs showed good reproducibility (CV < 20%) with 67% of voxels and 87% of voxels falling within this range for MTR bsCV and wsCV, respectively. For MTsat, 54% and 80% of voxels were within this range for bsCV and wsCV, respectively. The wsCVs were significantly lower than the bsCVs for both MTR (P < 0.0001) and MTsat (P < 0.0001). For both wsCVs and bsCVs, voxel-wise MTR and MTsat CVs had significantly different variances (F test). As observed in the ROI-based CVs, MTR exhibited lower bsCVs (P < 0.0001) and wsCVs (P < 0.0001) (with peaks at 7% and 6%, respectively) compared to MTsat (with peaks at 15% and 12%, respectively), as shown in whole brain histograms (Fig. 6).

Figure 7: Sample size estimation using a between-subjects (a) and within-subjects (b) approach on data registered to a common template. Note that the sample size range varies between plots and sample sizes exceeding the range are not shown. ROIs are abbreviated as follows: CC, corpus callosum; CX, cortex; IC, internal capsule; HC, hippocampus; TH, thalamus.

Sample Sizes and Minimum Detectable Effect
BETWEEN SUBJECTS. To detect a minimum change of 8% in all ROIs, MTR required a sample size of 15 (Fig. 7a). In comparison, MTsat required a sample size of 25 to detect an 8% change in all ROIs. The CC and CX required smaller sample sizes, with MTR requiring 12 subjects to detect a 6% change, and MTsat requiring 15 subjects to detect an 8% change.
WITHIN SUBJECTS. As shown in Fig. 7b, in the CC and CX, small changes (6%) could be detected in MTR with six subjects per group, while MTsat could detect larger changes (8%-12%) with 12 subjects per group. For MTR, small changes (6%) could be detected in the other ROIs (IC, HC, TH) with a feasible sample size of 15. MTsat could detect larger changes (8% and greater) in all ROIs with 20 subjects per group.

Discussion
This study explored the reproducibility of MTR and MTsat at 9.4 T, and will provide insight into experiment design and sample size estimation for future in vivo MTsat imaging studies. No biases were found between repeat measurements with ROI-based analysis. MTR and MTsat were shown to be reproducible in both the mean ROI analysis and the whole brain voxel-wise analysis, with MTsat CVs being slightly higher than MTR CVs (which was not significant in ROI analysis, but significant in voxel-wise analysis). Overall, within-subject CVs were lower than between-subject CVs for both ROI-based (not significant) and voxel-wise (significant) analysis, indicating less variability within subjects on a test-retest basis.
ROI-Based Reproducibility
ROI-based reproducibility was investigated using an unregistered dataset and a dataset registered to a common template, as both unregistered and registered analysis techniques have been used in neuroimaging studies, and the difference between using either analysis technique remains sparsely explored. Recently, Klingenberg et al reported that registration significantly increased the accuracy of a convolutional neural network (CNN) to detect Alzheimer's disease, compared to no registration. 36 In our study, violin plots, BA plots, and ROI-based CV analysis revealed the same trends for both registered and unregistered ROI-based analysis approaches, which indicated that either method can be used for MT analysis. However, we recommend using the registered analysis approach, as there is only one set of ROI masks to edit, making the analysis process more time efficient. The unregistered analysis approach will also introduce inter- and intra-rater variability, due to the large number of ROI masks being edited.
The MTR ROI CVs observed in this work are consistent with MTR CVs in human studies done by Welsch et al 37 and Hannila et al. 38 MTsat CVs reported here are comparable to MTsat CVs in human studies at 3 T. 24,25 Overall, MTsat exhibits slightly higher CVs than MTR, which may arise from noise propagation through the equations used to calculate MTsat, as described by Olsson et al. 23 A noticeable increase in MTsat CVs compared to MTR CVs, in the HC and CX, may be due to low MTsat values in these regions.

Voxel-Wise Reproducibility
Voxel-wise CV trends were comparable to ROI-based CV trends. Voxel-wise CV maps revealed a more noticeable increase in CVs in the superior-inferior direction of the brain in MTR, compared to MTsat. This can be related to the inherent compensation of flip angle inhomogeneities in MTsat. 5

Sample Size and Minimum Detectable Effect
The CC consistently exhibited the smallest required sample sizes, which can be related to the lower variability of myelin content in the CC, compared to the gray matter ROIs. 39 Interestingly, the CC and IC (the white matter regions) required similar sample sizes to detect the same changes in MTsat (using both between- and within-subject approaches), but not in MTR, which required larger sample sizes to detect changes in the IC. This may stem from the better contrast seen between the IC and gray matter in MTsat, compared to MTR, which arises from MTsat being less susceptible to inhomogeneities of the transmitted field and more independent of T1-weighting. 5,21 Most MT studies report changes in MTR between 15% and 30%, with some studies reporting more subtle changes between 5% and 10%. In a cuprizone demyelination model in mice, MTR decreased by 15% and 30% at 4 weeks and 6 weeks of cuprizone administration, respectively. 33 In an ischemic injury model in mice, MTR decreased by 30% in the corpus callosum of injured mice compared to controls. 16 In a closed head traumatic brain injury model in mice, MTR in the corpus callosum decreased by 10% from baseline at 1-day post-injury. 15 A post-mortem study revealed a 10% decrease in MTR between normal-appearing white matter and multiple sclerosis lesions. 1 In a recent multiple sclerosis study, MTR was able to differentiate between patients with and without cognitive impairment, showing a 7% decrease in patients with cognitive impairment. 34 MTR can detect changes on the order of 15%-30% (such as the changes found in the cuprizone demyelination model) with small sample sizes (N = 6). With disease and injury models resulting in less drastic changes to myelin content, our findings suggest that MTR and MTsat can detect smaller changes with feasible preclinical sample sizes. 
Thiessen et al showed that when there is an 80% reduction in myelinated axon density, MTR only decreases by ~30% (because it is thought that inflammation has a competing effect on MTR). 33 So, a 2-fold difference in myelination will result in at least a 15% change in MTR. However, as MTsat provides greater specificity to myelin, a 2-fold difference in myelination should translate to a larger percent change in MTsat.

Limitations
Although a volume coil is more appropriate for structural imaging as it provides stable SNR throughout the brain, this study used a transceiver surface coil. The voxel-wise CV maps showed that between-subject and within-subject CVs were slightly higher toward the inferior region of the brain. However, the increase in CV was subtle and as shown in ROI-based analysis, the CVs of ROIs located in inferior regions of the brain (such as the IC) were comparable to the ROIs closer to the surface coil. Moreover, MTsat maps were comparable to MTsat maps acquired by Boretius et al in the mouse brain at 9.4 T. 21 This shows the feasibility of acquiring MTR and MTsat data using a surface coil, which may be useful in studies in which MT imaging is combined with other methods that require a surface coil or in inherently low SNR methods that would benefit from a surface coil, such as diffusion MRI. Recent preclinical investigations have included a combination of MT imaging and diffusion MRI. 15,20 Moreover, the findings in this study will complement a recent test-retest reproducibility study in advanced diffusion MRI techniques in mice at 9.4 T. 29 Although the sample size was chosen to reflect similar sample sizes used in other pre-clinical imaging studies, [26][27][28] the small number of subjects is another limitation in this work. Nevertheless, we believe that these results are valuable and useful for the MT imaging community. In the statistical analyses, it should be noted that for the within-subject calculation of CV, the standard deviation was determined from only two data points (the test and retest conditions). As a result, the standard deviation may not accurately represent the spread of data within the population, leading to an unknown bias in the resulting within-subject CV.

Conclusion
We demonstrated that MTR and MTsat were reproducible in both ROI-based analysis, which includes both registered and unregistered analysis techniques, and voxel-wise analysis. Importantly, MTsat exhibited comparable reproducibility to MTR, and could detect small changes (<10%) with sample sizes of 15-20, while providing better contrast and maintaining a feasible scan time.