Optical Flow for Intermediate Frame Interpolation of Multispectral Geostationary Satellite Data
Applications of satellite data in areas such as weather tracking and modeling, ecosystem monitoring, wildfire detection, and land-cover change are heavily dependent on the trade-offs among the spatial, spectral, and temporal resolutions of the observations. For instance, geostationary weather tracking satellites are designed to take hemispherical snapshots many times throughout the day, but sensor hardware limits data collection. In this work we tackle this limitation by developing a method for temporal upsampling of multispectral satellite imagery using optical flow video interpolation deep convolutional neural networks. The presented model extends Super SloMo (SSM) from single optical flow estimates to multichannel estimates, where flows are computed per wavelength band. We apply this technique on up to 8 multispectral bands of the GOES-R/Advanced Baseline Imager mesoscale dataset to temporally enhance full-disk hemispheric snapshots from 15 minutes to 1 minute. Through extensive experimentation, we show that SSM greatly outperforms the linear interpolation baseline and that multichannel optical flows improve performance on GOES/ABI. Furthermore, we discuss challenges and open questions related to temporal interpolation of multispectral geostationary satellite imagery.
Every second, satellites around the earth generate valuable data to monitor weather, land cover, oceans, infrastructure, and human activity. These satellites capture reflectance intensities at designated spectral wavelengths and at given spatial and temporal resolutions. Properties of the sensors, including wavelengths and resolutions, are optimized for particular applications. Most commonly, satellites are built to capture the visible wavelengths, which are essentially RGB images. Other sensors capture a larger range of wavelengths, such as microwave, infrared, and thermal bands, providing information for many applications such as storm tracking and wildfire detection. However, sensing more wavelengths is technologically more complex and hence imposes further constraints on temporal and spatial resolution. Similarly, a higher temporal frequency requires specific orbital dynamics, which in turn affect the spatial resolution due to the satellite's distance from earth.
NASA and other agencies have developed satellites with varying spectral, spatial, and temporal resolutions, each for specific applications. For instance, the most current satellite in NASA's Landsat Continuity Mission, Landsat-8, captures the entire earth every 16 days with 11 spectral bands. This Landsat-8 data is used to monitor crop yields, deforestation [13, 32], droughts, and changing land cover. The Moderate Resolution Imaging Spectroradiometer (MODIS) is another NASA instrument that monitors Earth every day at a best spatial resolution of 250m and in 36 different spectral bands. The higher temporal resolution of MODIS allows for different applications such as wildfire and vegetation monitoring. NASA and NOAA's most recent set of geostationary satellites in the GOES-R program, GOES-16/17, improve the frequency further to 15-minute hemispheric coverage (full disk), 5-minute Continental United States (CONUS) coverage, and up to 30-second flexible mesoscale coverage of 16 spectral bands at 0.5-2km spatial resolution (depicted in Figure 2). This improved frequency allows for near real-time detection and observation of wildfires, hurricane tracking, transportation safety, flood risks, and others.
Improved resolution of data from each of these satellites can benefit nearly all earth science applications and has been widely studied on a variety of satellite-based and atmospheric modeling datasets. Satellite imagery processing techniques such as unmixing, pan-sharpening, and data fusion leverage the multitude of available datasets for resolution enhancement. For example, the Spatial and Temporal Adaptive Reflectance Fusion (STARFM) algorithm joins high spatial resolution 16-day Landsat and the spatially coarser MODIS daily revisits to generate a 30 meter daily surface reflectance dataset [16, 51]. Similarly, pan-sharpening uses varying resolutions of spectral bands within a single satellite to produce artificial high-spatial and high-spectral datasets [27, 43]. Furthermore, in climate modeling, downscaling approaches aim to improve the temporal and spatial resolution using statistical and dynamical approaches to aid localized adaptation and mitigation to climate change [47, 48, 22]. Recent advances in computer vision and video processing provide opportunities to drastically improve the current state-of-the-art techniques in various resolution enhancement approaches. While satellite datasets contain more spectral bands than typically exist in images, the vast amount of data and pattern similarity can be leveraged in deep learning frameworks. In particular, as we will explore in this work, video intermediate frame interpolation models can be further developed to improve the temporal resolution of certain types of multispectral satellite imagery. In the remainder of this work, we will extend Super SloMo (SSM) to the multivariate case and show that video frame interpolation can produce improved spatial coverage in geostationary satellite datasets.
We present the first approach to intermediate frame interpolation of geostationary satellite imagery to enable expanded spatial and temporal coverage, improving our ability to monitor and analyze environmental changes.
A multivariate intermediate frame interpolation technique, inspired from video processing, is further developed and applied to NASA and NOAA’s current generation of geostationary satellites.
With extensive experimentation we show that our proposed approach clearly outperforms a linear interpolation baseline and further show that information gained by incorporating more spectral bands improves overall performance.
Visual analysis of optical flow vectors shows close resemblance to physical processes typically modeled with dynamical systems.
II Related Work
In this section we begin by reviewing previous work in the areas of data fusion and resolution enhancement as applied generally to remote sensing satellite imagery as well as some recent successes of deep learning in the area. Secondly, we provide a brief review of video intermediate frame interpolation techniques.
II-A Data Fusion and Interpolation in Earth Sciences
Earth science datasets are complex and often require extensive preprocessing and domain knowledge before they become useful for large-scale applications or monitoring. Such datasets may contain frequent missing values due to sensor limitations, low quality pixel intensities, incomplete global coverage, and contamination from atmospheric processes related to clouds and aerosols. Techniques to handle these challenges have been developed and are widely applied across the remote sensing community.
Liebmann et al. presented the first linearly interpolated datasets, filling in missing and erroneous longwave radiation to improve global coverage. In more recent datasets, interpolation techniques are widely applied to unprocessed level 1 data to construct easier-to-use level 2, 3, and 4 products, often named assimilated datasets. These higher level products may not contain raw observations but simplify the analysis process for non-domain experts. For example, Doelling et al. employed temporal interpolation informed by physical and solar variables from geostationary satellites to temporally enhance the Clouds and the Earth's Radiant Energy System (CERES) product. Similarly, interpolation techniques are employed to generate cloud masks with multi-day sampling for improved land-cover change studies.
In recent years, signal processing and data-driven approaches to learning from satellite imagery have become an active area of research. Data fusion is one area of particular interest to the remote sensing community, where two or more datasets are fused to generate an enhanced product. The Spatial and Temporal Adaptive Reflectance Fusion (STARFM) algorithm is one example, which uses Landsat and MODIS to produce a daily 30-meter reflectance product by using a spectral-wise weighting model [16, 51]. Scharlemann et al. presented a temporal Fourier processing approach to MODIS data for more accurate analysis in ecological and epidemiological studies. Nearest neighbor analog multiscale patch-decomposition data-driven models are used as state-of-the-art interpolation techniques for developing global sea surface temperature (SST) datasets [14, 15]. Neural networks are beginning to aid in reconstructing spatio-temporal SST using Kalman filters to represent 2D geophysical dynamics.
Advances in deep learning and convolutional neural networks have only begun to penetrate interpolation processes for data assimilation of satellite-based datasets. Of the few examples of deep learning applied to data fusion, recurrent and convolutional neural networks have been shown to produce effective results on assimilating multiple satellite images [7, 45]. In climate modeling, super-resolution convolutional neural networks have greatly improved the accuracy and scalability of statistical downscaling by learning from high-resolution observations joined with topographical information. In other areas of remote sensing, deep learning has shown a remarkable ability to detect and classify phenomena. DeepSat showed that tuned, normalized deep belief networks were able to outperform traditional techniques for image classification. Convolutional neural networks have been shown to effectively classify land use in remotely sensed images, from urban areas to crop types. Joining high-resolution nighttime and daytime satellite imagery, convolutional neural networks are able to estimate household consumption and assets to measure world-wide poverty rates. In this work, we will extend the applicability of deep learning to interpolate and generate higher temporal resolution satellite datasets with inspiration from video processing.
II-B Video Intermediate Frame Interpolation
Video interpolation techniques have shown high skill at producing slow motion footage by generating intermediate frames in spatially and temporally coherent sequences [30, 33, 24]. These approaches are designed to handle occlusion while still generating sharp and accurate intermediate frames. Typically, video interpolation techniques focus on single frame interpolation, meaning that a single frame is estimated between two consecutive frames [30, 33]. However, when interpolating satellite imagery, time-dependent and multi-frame estimation are required.
For multi-frame interpolation, classical approaches estimate optical flows between consecutive frames and warp the frames dependent on time. Computing optical flows is an expensive task and can be approximated using neural network architectures in both supervised and unsupervised learning frameworks [12, 20, 36]. Handling occlusion between frames is another important task of video interpolation, which optical flows cannot handle alone. Furthermore, optical flows have proven to be a valuable tool in understanding movement in satellite imagery for tasks such as estimating global displacement and mapping lava flows. In satellite imagery, these principles allow us to estimate time-dependent intermediate frames via optical flows while capturing occlusions such as clouds covering land cover.
Jiang et al. presented Super SloMo, which combines both optical flow and occlusion models for time-dependent estimation between consecutive frames. Super SloMo consists of two U-Net CNNs. The first U-Net takes as input two consecutive frames and predicts forward and backward optical flows between them. These optical flows are used to warp the original input images depending on a user-defined time between 0 and 1. The second U-Net takes in the original and warped frames as well as the forward and backward flows to predict visibility maps and flow errors. The updated flows and visibility maps are then used to estimate the interpolated frame. As this approach depends on time, the model can make spatially and temporally coherent predictions at any time between 0 and 1. In their experiments, Jiang et al. show that 240-fps video clips can be estimated from 30-fps inputs. In this work, we will apply their architecture with an extension to multivariate optical flows for interpolating geostationary satellite data.
III GOES-R Satellite Dataset
Geostationary satellites are synchronized in orbit with earth's spin to hover over a single location. From this position, the sensor, measuring radiation as often as possible, can frequently capture data over a continuous and large region. This feature makes geostationary satellites ideal for capturing environmental dynamics. The GOES-R series satellites, namely GOES-16/17, operated by NASA and NOAA, provide scientists with unprecedented temporal frequency, enabling real-time environmental monitoring using the Advanced Baseline Imager (ABI) [42, 41]. GOES-16 covers the eastern side of North America while GOES-17 covers the western side. GOES-16 collects 16 bands of data with band 2 at 500m, bands 1, 3, and 5 at 1km, and the remainder at 2km. Three data products are derived from the GOES-R series: 1. Full-disk covering the western hemisphere every 15 minutes, 2. Continental US every 5 minutes, and 3. Mesoscale user-directed 1000km by 1000km subregion every 30 seconds (often 2 subregions every 60 seconds). ABI's 16 spectral bands include two visible channels, four near-infrared channels, and ten infrared channels, enabling a suite of applications. Compared to the previous generation, the ABI gives three times more spectral information, four times better spatial resolution, and five times faster coverage. Similarly, the Japan Meteorological Agency operates Himawari-8 with 15 identical bands and identical spatial and temporal resolutions.
These geostationary satellites are particularly useful for tracking weather, monitoring high-intensity events, estimating rainfall rates, detecting fires, and many other tasks in near real-time. Mesoscale mode gives forecasters the ability to "point" the satellite at a user-specified subregion for near constant monitoring of events such as wildfires and hurricanes. For example, GOES-16 provided emergency response units with tools for decision making during the 2018 California wildfires. However, this high frequency data also provides valuable information on environmental dynamics and for retrospective analysis. Furthermore, mesoscale data can be used to inform techniques to produce higher temporal resolution CONUS and full-disk coverage. In this work, we develop a model to improve the temporal resolutions of CONUS and full-disk by learning an optical flow model to interpolate between consecutive frames. With this, we are able to generate 1-minute full-disk artificially enhanced data.
As discussed above, the fundamentals of optical flow are closely aligned with typical dynamical models used to model earth science processes. In this section, we will describe an adaptation of an optical flow approach for time-dependent interpolation of intermediate video frames, Super SloMo (SSM), to multispectral satellite imagery. Our adaptation removes the assumption of high cross-correlation between spectral bands, which, as can be seen in Figure 3, is not satisfied in geostationary satellite imagery. Furthermore, we discuss a stochastic approach to train our networks more efficiently on large-scale datasets and how Bayesian optimization can be applied to optimize hyper-parameters.
IV-A Intermediate Frame Interpolation
SSM estimation of intermediate frames considers the case of interpolating between two images with RGB channels (3 spectral bands). In the case of satellite based datasets, the number of spectral bands can reach the hundreds. With a minor change to the optical flow and interpolation networks, we extend their framework to model the flows per band.
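Following the notation of Jiang et al., let $I_0, I_1 \in \mathbb{R}^{H \times W \times C}$ be two consecutive frames, with $H$ as image height, $W$ as image width, and $C$ the number of spectral bands. The goal is then to construct an intermediate frame $\hat{I}_t^c$ at time $t \in (0, 1)$, where $c$ is a particular channel, as a linear combination of warped $I_0^c$ and $I_1^c$, as defined by: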
where $F_{t \to 0}^c$ and $F_{t \to 1}^c$ are the optical flows from $I_t$ to $I_0$ and $I_t$ to $I_1$, respectively, for channel $c$. $g(\cdot, \cdot)$ is defined as the backward warping function, implemented with bilinear interpolation, and $\alpha_0$ represents a scalar weight coefficient to enforce temporal consistency and allow for occlusion reasoning. In the case of high temporal resolution satellite imagery, the interpolation is effectively estimating the state of atmospheric variables (clouds, water vapor, etc.) over a static land surface. If a given pixel in $I_0$ captures land surface but the same pixel in $I_1$ sees a cloud, the occlusion principle is used to estimate at what time the cloud covers the pixel. As applied in Super SloMo, visibility maps, $V_{t \leftarrow 0}^c$ and $V_{t \leftarrow 1}^c$, can be modeled to capture occluded pixels. Equation 1 can then be redefined as:
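where $Z = (1 - t) V_{t \leftarrow 0}^c + t V_{t \leftarrow 1}^c$ is a normalization factor. Intermediate optical flows, $\hat{F}_{t \to 0}^c$ and $\hat{F}_{t \to 1}^c$, are approximated using the forward and backward flows between $I_0$ and $I_1$, and formulated as: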
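The three relations referred to in this passage can be sketched as follows, under the assumption that they mirror the Super SloMo formulation applied independently to each channel $c$. The symbols $\alpha_0$ (scalar blending weight), $V_{t \leftarrow 0}^c$ and $V_{t \leftarrow 1}^c$ (visibility maps), and $Z$ (normalization factor) follow Jiang et al. and are assumed here rather than taken from this text.

```latex
% Eq. 1: blend of the two warped input frames for channel c
\hat{I}_t^c = \alpha_0 \odot g\left(I_0^c, F_{t \to 0}^c\right)
            + (1 - \alpha_0) \odot g\left(I_1^c, F_{t \to 1}^c\right)

% Eq. 2: visibility-weighted form with normalization factor Z
\hat{I}_t^c = \frac{1}{Z} \odot \Big( (1 - t)\, V_{t \leftarrow 0}^c \odot g\left(I_0^c, F_{t \to 0}^c\right)
            + t\, V_{t \leftarrow 1}^c \odot g\left(I_1^c, F_{t \to 1}^c\right) \Big),
\qquad Z = (1 - t)\, V_{t \leftarrow 0}^c + t\, V_{t \leftarrow 1}^c

% Eq. 3: intermediate flows approximated from the bidirectional flows
\hat{F}_{t \to 0}^c = -(1 - t)\, t\, F_{0 \to 1}^c + t^2\, F_{1 \to 0}^c
\hat{F}_{t \to 1}^c = (1 - t)^2\, F_{0 \to 1}^c - t\,(1 - t)\, F_{1 \to 0}^c
```

For purely linear motion, where $F_{1 \to 0}^c = -F_{0 \to 1}^c$, Eq. 3 reduces to $\hat{F}_{t \to 0}^c = -t F_{0 \to 1}^c$ and $\hat{F}_{t \to 1}^c = (1 - t) F_{0 \to 1}^c$, which is a quick sanity check on the signs.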
Two convolutional neural networks (CNNs), a flow network and an interpolation network, are learned in an end-to-end manner to estimate the optical flows and visibility maps. Both networks are defined with identical architectures but varying input and output dimensions, as presented in Figure 4. Our flow network approximates the forward and backward optical flows, $F_{0 \to 1}$ and $F_{1 \to 0}$, between $I_0$ and $I_1$ and is defined as follows:
Next, for each channel $c$ and a time $t \in (0, 1)$ we compute intermediate image estimates, $g(I_0^c, \hat{F}_{t \to 0}^c)$ and $g(I_1^c, \hat{F}_{t \to 1}^c)$. As noted by Jiang et al., these intermediate approximations perform poorly in occluded and non-smooth regions, both common in satellite imagery, which can be mitigated by incorporating an interpolation network. This model is defined as follows:
where $\Delta F_{t \to 0}^c$ and $\Delta F_{t \to 1}^c$ are optical flow residuals used to better approximate non-smooth regions. Hence, we then have:
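Assuming the residuals are applied additively, as in Super SloMo, the refined intermediate flows can be written as below. The symbols $\hat{F}$ (approximated flow) and $\Delta F$ (predicted residual) are notation assumed for this sketch.

```latex
F_{t \to 0}^c = \hat{F}_{t \to 0}^c + \Delta F_{t \to 0}^c,
\qquad
F_{t \to 1}^c = \hat{F}_{t \to 1}^c + \Delta F_{t \to 1}^c
```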
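Lastly, plugging the optical flows $F_{t \to 0}^c$ and $F_{t \to 1}^c$ and visibility maps $V_{t \leftarrow 0}^c$ and $V_{t \leftarrow 1}^c$ into Equation 2 constructs an estimate of $\hat{I}_t^c$. Applying this to each channel synthesizes an intermediate multispectral prediction $\hat{I}_t$.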
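To make the per-channel synthesis step concrete, the following is a minimal NumPy sketch of backward warping with bilinear interpolation, the intermediate flow approximation, and the visibility-weighted blend. It is not the trained model: the flows are supplied by the caller instead of a U-Net, the visibility maps are replaced by constant ones, and all function names (`backward_warp`, `intermediate_flows`, `blend_channel`, `interpolate_frame`) are hypothetical.

```python
import numpy as np


def backward_warp(image, flow):
    """Sample `image` (H, W) at pixel positions displaced by `flow` (H, W, 2).

    flow[..., 0] is the x (column) displacement and flow[..., 1] the y (row)
    displacement in pixels. Sampling is bilinear; positions outside the image
    are clamped to the border (an assumption of this sketch).
    """
    h, w = image.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    x = np.clip(xs + flow[..., 0], 0, w - 1)
    y = np.clip(ys + flow[..., 1], 0, h - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    wx = x - x0
    wy = y - y0
    top = (1 - wx) * image[y0, x0] + wx * image[y0, x1]
    bottom = (1 - wx) * image[y1, x0] + wx * image[y1, x1]
    return (1 - wy) * top + wy * bottom


def intermediate_flows(f01, f10, t):
    """Approximate flows from time t to frames 0 and 1 (Super SloMo form)."""
    f_t0 = -(1 - t) * t * f01 + t ** 2 * f10
    f_t1 = (1 - t) ** 2 * f01 - t * (1 - t) * f10
    return f_t0, f_t1


def blend_channel(i0, i1, f_t0, f_t1, v0, v1, t, eps=1e-8):
    """Visibility-weighted blend of the two warped frames for one channel."""
    w0 = (1 - t) * v0
    w1 = t * v1
    z = w0 + w1 + eps  # normalization factor
    return (w0 * backward_warp(i0, f_t0) + w1 * backward_warp(i1, f_t1)) / z


def interpolate_frame(i0, i1, f01, f10, t):
    """Interpolate (H, W, C) frames using per-channel flows of shape (H, W, C, 2)."""
    channels = []
    for c in range(i0.shape[-1]):
        f_t0, f_t1 = intermediate_flows(f01[:, :, c], f10[:, :, c], t)
        visible = np.ones(i0.shape[:2])  # stand-in for learned visibility maps
        channels.append(
            blend_channel(i0[..., c], i1[..., c], f_t0, f_t1, visible, visible, t)
        )
    return np.stack(channels, axis=-1)
```

With zero flows and uniform visibility, `interpolate_frame` reduces to per-pixel linear interpolation between the two inputs, which is the baseline the learned flows are meant to improve on.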
As all variables in the architecture are differentiable, the model can be learned in an end-to-end manner. Given two input frames $I_0$ and $I_1$ with $N$ intermediate frames $\{I_{t_i}\}_{i=1}^{N}$ and corresponding predictions $\{\hat{I}_{t_i}\}_{i=1}^{N}$, a loss function can be defined as a weighted combination of reconstruction, warping, and smoothness losses such that:
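The weighted combination described here can be sketched as a single expression. The weight symbols $\lambda_r$, $\lambda_w$, $\lambda_s$ and loss symbols $\ell_r$, $\ell_w$, $\ell_s$ (reconstruction, warping, smoothness) are assumed notation borrowed from Super SloMo.

```latex
\ell = \lambda_r\, \ell_r + \lambda_w\, \ell_w + \lambda_s\, \ell_s
```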
We note that Super SloMo includes a fourth, perceptual term based on image classes, which are not available for this satellite dataset. Similarly, we employ loss functions for each loss term unless noted otherwise.
The reconstruction loss is defined as the Euclidean distance between observed and predicted intermediate frames:
A warping loss is used to optimize estimated optical flows between input and intermediate frames for a channel $c$:
such that the per-channel warping losses are aggregated over the multispectral channels.
A smoothness loss is applied to the forward and backward flows between $I_0$ and $I_1$ to satisfy the smoothness assumption of optical flows in the first network such that:
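As an illustration of two of these terms, the sketch below computes a Euclidean reconstruction loss and a finite-difference smoothness penalty in NumPy. The exact norms and normalizations used in the experiments are not recoverable from the text, so the choices here (unnormalized Euclidean distance, mean absolute flow gradient as in Super SloMo) are assumptions, and the function names are hypothetical.

```python
import numpy as np


def reconstruction_loss(pred, target):
    """Euclidean distance between a predicted and an observed frame."""
    return float(np.linalg.norm((pred - target).ravel()))


def smoothness_loss(flow):
    """Mean absolute spatial gradient of a flow field of shape (H, W, 2).

    Neighbouring pixels are encouraged to move together; a constant flow
    field has zero penalty.
    """
    dx = np.abs(flow[:, 1:] - flow[:, :-1]).mean()
    dy = np.abs(flow[1:, :] - flow[:-1, :]).mean()
    return float(dx + dy)
```

In a training loop these values would be computed on framework tensors so that gradients can flow back to both networks; NumPy is used here only to show the arithmetic.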
In practice, this training setup requires optimization over multiple hyper-parameters, including the loss weights $\lambda_r$, $\lambda_w$, and $\lambda_s$, and a learning rate. Constrained Bayesian optimization using Monte Carlo simulations was applied over the hyper-parameter space (keeping $\lambda_r$ constant) by using the open-source Ax library. The optimization was run for 20 total trials, each with 1 epoch through the training dataset. The weights, $\lambda_w$ and $\lambda_s$, shown in Table I were found through Bayesian optimization.
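Table II: Mean RMSE on the held-out test set by experiment and method.
|Bands||3 Band Models: Linear||3 Band Models: SSM univariate flow||3 Band Models: SSM multivariate flow||8 Band Models: Linear||8 Band Models: SSM univariate flow||8 Band Models: SSM multivariate flow|
|3 Band Mean||0.0283||0.0218||0.0224||0.0282||0.0221||0.0218|
|8 Band Mean||–||–||–||0.0180||0.0138||0.0136|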
As discussed in Section III, the GOES-16 geostationary satellite produces unique mesoscale data on a minute by minute basis. In this section, we show how high temporal resolution mesoscale data can be used to train effective interpolation models of multispectral imagery. We begin by describing data processing and training data generation followed by experimental results on held out test data. The NEX platform on NASA’s Pleiades super-computing system was used for all data processing and model training. Each model was trained using 4 Nvidia V100 GPUs.
The GOES-16 mesoscale mode takes one snapshot every 30 seconds of selected areas, typically of interesting weather events. Most often, these snapshots are selected as adjacent tiles for greater spatial coverage, producing one-minute data. This one-minute data is used for training and testing our approach. Training data was sampled every 5 days from March 16, 2017 (the first day of data availability) to December 31, 2018, between the hours of 12pm and 12am (GMT). Each day consists of 14GB of data, for a total of 129 days and 1.8TB of training data. Processing each snapshot involves the following steps:
1. Each band is mapped to 2km using bilinear interpolation to a size of 500 by 500 (1000km by 1000km).
2. Pixels are normalized between 0 and 1 using scaling factors found in Table 4 of the "GOES-R Advanced Baseline Imager (ABI) Algorithm Theoretical Basis Document For Cloud and Moisture Imagery Product".
3. Bands are stacked into a multi-dimensional array with corresponding geographic coordinates.
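The normalization and stacking steps above can be sketched as follows, assuming each band has already been resampled to the common 500 by 500 grid. The scaling ranges in `BAND_RANGES` are placeholders for illustration only (the real factors come from the ATBD table), clipping to [0, 1] is an assumption, and the function names are hypothetical.

```python
import numpy as np

# Placeholder (min, max) ranges per band number; NOT the ATBD values.
BAND_RANGES = {1: (0.0, 1.3), 2: (0.0, 1.3), 3: (0.0, 1.3)}


def normalize_band(values, vmin, vmax):
    """Linearly rescale a band to [0, 1] and clip values outside the range."""
    scaled = (values - vmin) / (vmax - vmin)
    return np.clip(scaled, 0.0, 1.0)


def stack_bands(bands):
    """Stack a {band_number: (H, W) array} mapping into an (H, W, C) array.

    Bands are ordered by band number so the channel axis is deterministic.
    """
    ordered = sorted(bands)
    return np.stack(
        [normalize_band(bands[b], *BAND_RANGES[b]) for b in ordered], axis=-1
    )
```

Geographic coordinates are not tracked in this sketch; a labeled-array library would be one way to carry them alongside the stacked data.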
Training examples are then generated by selecting 134 by 134 sub-images, which are randomly cropped to 128 by 128 during training. Temporally, 12 consecutive time-steps are extracted per example, of which 9 are randomly selected during training. Furthermore, each image is randomly rotated and flipped during training to reduce model artifacts generated by U-Net. This produces, depending on missing values, on average 1,000 images per day for training. In total, this process produces approximately 200,000 training examples for a heterogeneous dataset containing all seasons and a variety of weather events.
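A minimal sketch of this augmentation is given below. It assumes that the 9 time-steps form a random contiguous window of the 12 (as in Super SloMo), that rotations are multiples of 90 degrees, and that the same spatial transform is applied to every frame of an example so motion stays consistent. The function name `augment_example` is hypothetical.

```python
import numpy as np


def augment_example(frames, rng, crop=128, n_steps=9):
    """Randomly crop, rotate, and flip a (T, H, W, C) training example.

    The same window, crop, rotation, and flip are applied to all frames.
    """
    t, h, w, _ = frames.shape
    start = int(rng.integers(0, t - n_steps + 1))  # temporal window start
    top = int(rng.integers(0, h - crop + 1))
    left = int(rng.integers(0, w - crop + 1))
    out = frames[start:start + n_steps, top:top + crop, left:left + crop]
    # Rotate by a random multiple of 90 degrees in the spatial plane.
    out = np.rot90(out, k=int(rng.integers(0, 4)), axes=(1, 2))
    if rng.random() < 0.5:
        out = np.flip(out, axis=2)  # horizontal flip
    return np.ascontiguousarray(out)
```

For example, `augment_example(example, np.random.default_rng(0))` on a `(12, 134, 134, 8)` array returns a `(9, 128, 128, 8)` array.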
Similarly, test data was sampled every 5 days of 2019 between the hours of 12pm (GMT) and 12am (GMT), and hence held out from the training dataset. This totalled 30 days and 420GB of test data. Furthermore, we select a major tropical cyclone, Hurricane Irma, approaching Florida on September 8, 2017. This day is not included in the training set. For testing, the data processing steps enumerated above are applied.
Experiments are executed on 3 and 8 multivariate channels. For each experiment, two models are trained, 1. with a single optical flow and 2. with channel-wise optical flows, to better understand the applicability of learning multiple optical flows. Linear interpolation is selected as a baseline method. Root mean squared error (RMSE) is used for comparison where interpolated pixels are compared to the ground truth 1-minute mesoscale data. Unless otherwise noted, RMSE values are computed by averaging RMSEs of each test example.
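The baseline and the evaluation metric are simple enough to state in code. The sketch below shows per-pixel linear interpolation between two frames and the per-example RMSE averaged over a test set; the function names are hypothetical, and the assumption that a 15-minute gap yields intermediate times $t_i = i/15$ is illustrative.

```python
import numpy as np


def linear_interpolate(i0, i1, t):
    """Per-pixel linear blend of two frames at time t in [0, 1]."""
    return (1.0 - t) * i0 + t * i1


def rmse(pred, truth):
    """Root mean squared error over all pixels and bands of one example."""
    return float(np.sqrt(np.mean((pred - truth) ** 2)))


def mean_rmse(preds, truths):
    """Average of per-example RMSEs, matching how results are aggregated."""
    return float(np.mean([rmse(p, t) for p, t in zip(preds, truths)]))


# Intermediate times for 14 one-minute frames between inputs 15 minutes apart.
timesteps = np.arange(1, 15) / 15.0
```

Averaging per-example RMSEs is not the same as taking one RMSE over all pixels pooled together, so the aggregation order matters when comparing against other reported numbers.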
To begin, we look at the overall RMSEs per experiment, method, and spectral band in Table II. We note that for each experiment, optical flow interpolation with SSM greatly outperforms simple linear interpolation. It can be seen that in the 3 band experiment a univariate optical flow outperforms the multivariate model. This is consistent with Figure 3, where bands 1 and 2 are highly correlated, and with the relatively high smoothness loss coefficient in Table I found with Bayesian optimization. In the 8 band experiment, multivariate flows improve performance for each band. Furthermore, the first 3 bands of the 8 band experiment have the same performance as in the 3 band model. Overall, and as expected, this suggests multivariate flows are well suited to data with a high number of spectral bands.
Temporally, the error relative to the distance between images is of interest when using the interpolated data. In Figure 6 we test this by looking at RMSEs over 14 evenly spaced timesteps, $t$, between 0 and 1. As expected, the error is largest directly between two input images, at $t = 0.5$, for each method and experiment. Linear interpolation error increases more quickly than that of the proposed optical flow approaches. In the 8 band experiment, the multivariate model's error is below the univariate model's throughout the temporal domain, while the opposite is true in the 3 band experiment.
The optical flows learned from these experiments resemble high-level dynamics found in physical (dynamical) models. In Figure 7 we show a quiver plot where the brightness is flow intensity and the arrow direction is average pooled over every 10 pixels. Hurricane Irma is in the southeast, where the flows are pointing in a circular motion, a characteristic of tropical cyclones. The northern side of the tropical cyclone shows intense water vapor movement. Similarly, Hurricane Katia is seen on the coast of Mexico. Furthermore, the jet stream moving east is clear, particularly the clouds moving to the northeast. On the west coast the jet stream is moving north into Canada. In the center of the country, where water vapor is limited, we see little movement.
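The pooling used for such a quiver plot can be sketched as below: the flow field is averaged over non-overlapping 10 by 10 pixel blocks, and the magnitude of the pooled vectors gives the intensity. This is an assumed reading of "average pooled over every 10 pixels", and `pool_flow` is a hypothetical helper.

```python
import numpy as np


def pool_flow(flow, block=10):
    """Average-pool an (H, W, 2) flow field over block x block windows.

    Returns the pooled vectors (H // block, W // block, 2) and their
    magnitudes. Rows and columns that do not fill a whole block are dropped.
    """
    h, w, _ = flow.shape
    h2, w2 = h // block, w // block
    trimmed = flow[:h2 * block, :w2 * block]
    pooled = trimmed.reshape(h2, block, w2, block, 2).mean(axis=(1, 3))
    magnitude = np.hypot(pooled[..., 0], pooled[..., 1])
    return pooled, magnitude
```

The pooled x and y components can then be passed to a plotting routine such as matplotlib's `quiver`, with the magnitude used as the color or brightness value.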
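Table III: Mean RMSE for the Hurricane Irma case study by experiment and method.
|Bands||3 Band Models: Linear||3 Band Models: SSM univariate flow||3 Band Models: SSM multivariate flow||8 Band Models: Linear||8 Band Models: SSM univariate flow||8 Band Models: SSM multivariate flow|
|3 Band Mean||0.0305||0.0236||0.0232||0.0305||0.0233||0.0235|
|8 Band Mean||–||–||–||0.0184||0.0140||0.0140|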
V-B1 Case Study: Hurricane Irma
Here we focus on a particular event to test and visualize flows at a finer-grained resolution. Figure 7(a) depicts Hurricane Irma on September 8, 2017, as it approaches Florida. Figure 7(b) shows the difference after 15 minutes. Data from 12pm (GMT) to 12am (GMT) are selected for analysis, as presented in Table III. We find that, similar to above, the optical flow methods outperform linear interpolation, whose RMSE is roughly 30% higher. However, multivariate flows do not improve on the univariate flow in the 8 band experiment. Visually, in Figure 5 we show an example of temporal interpolation with fast moving clouds on the hurricane's edge. Plots (h), (i), and (j) depict the RMSE values per pixel over the first 3 bands. We see that the brightness, indicating high error, is more widespread in cloudy pixels for linear interpolation, whereas optical flow has less error in the same regions.
In Figure 7(c) we visualize the water vapor optical flow vectors, where the brightness is flow intensity and the arrows correspond to direction. These flows come from the 8 band multivariate model. The eye of the cyclone is evident where intensity is low and arrows are pointing in circles. Water vapor is moving at high intensity around the eye, particularly to the north.
The application of video interpolation techniques and optical flow to generating higher temporal coverage of geostationary data is promising. In this work we show that the Super SloMo architecture with optical flow is well suited for this task and can be generalized to handle multispectral satellite imagery. We present a large-scale analysis and test on evenly spaced days in 2019 to show that the model is able to capture optical flows between two images well and apply these to predict intermediate frames. Furthermore, predicting optical flows per spectral band generally improves performance as the number of spectral bands increases.
This capability will allow the NASA Earth Exchange (NEX) to provide one-minute temporal coverage of the continental United States at a 2km spatial resolution with 16 spectral bands using the GOES-R series of satellites. Furthermore, this approach can also be scaled to improve the temporal resolution of the 15-minute full-disk observations. These data products will allow for improved analysis of atmospheric and physical processes such as hurricane dynamics and wildfire modeling.
The visualizations presented in Figures 7 and 7(c) open questions regarding the use of optical flow in satellite imagery. While not validated, they show that general atmospheric patterns seem to be captured sufficiently. At the higher resolution, we can see where the most intense movement is occurring within Hurricane Irma. Further research will focus on understanding these flow vectors in relation to physical processes as well as testing underlying assumptions of optical flow in satellite imagery.
This work was supported by the NASA Earth Exchange (NEX) and the Advanced Information Systems Technology, award number AIST-16-0137.
-  Ax. https://github.com/facebook/Ax. Accessed: 2019-06-20.
-  Goes-16: A game-changer for fighting deadly wildfires. https://www.goes-r.gov/featureStories/goes16Wildfires.html. Accessed: 2019-01-30.
-  Goes-r advanced baseline imager (abi) algorithm theoretical basis document for cloud and moisture imagery product (cmip). https://www.star.nesdis.noaa.gov/goesr/docs/ATBD/Imagery.pdf. Accessed: 2019-06-24.
-  S. Baker, D. Scharstein, J. Lewis, S. Roth, M. J. Black, and R. Szeliski. A database and evaluation methodology for optical flow. International Journal of Computer Vision, 92(1):1–31, 2011.
-  J. L. Barron, D. J. Fleet, and S. S. Beauchemin. Performance of optical flow techniques. International journal of computer vision, 12(1):43–77, 1994.
-  S. Basu, S. Ganguly, S. Mukhopadhyay, R. DiBiano, M. Karki, and R. Nemani. Deepsat: a learning framework for satellite imagery. In Proceedings of the 23rd SIGSPATIAL International Conference on Advances in Geographic Information Systems, page 37. ACM, 2015.
-  P. Benedetti, D. Ienco, R. Gaetano, K. Ose, R. G. Pensa, and S. Dupuy. fusion: A deep learning architecture for multiscale multimodal multitemporal satellite data fusion. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 11(12):4939–4949, 2018.
-  K. Bessho, K. Date, M. Hayashi, A. Ikeda, T. Imai, H. Inoue, Y. Kumagai, T. Miyakawa, H. Murata, T. Ohno, et al. An introduction to himawari-8/9âjapanâs new-generation geostationary meteorological satellites. Journal of the Meteorological Society of Japan. Ser. II, 94(2):151–183, 2016.
-  M. Castelluccio, G. Poggi, C. Sansone, and L. Verdoliva. Land use classification in remote sensing images by convolutional neural networks. arXiv preprint arXiv:1508.00092, 2015.
-  I. Cohen and I. Herlin. Optical flow and phase portrait methods for environmental satellite image sequences. In European Conference on Computer Vision, pages 141–150. Springer, 1996.
-  D. R. Doelling, N. G. Loeb, D. F. Keyes, M. L. Nordeen, D. Morstad, C. Nguyen, B. A. Wielicki, D. F. Young, and M. Sun. Geostationary enhanced temporal interpolation for ceres flux products. Journal of Atmospheric and Oceanic Technology, 30(6):1072–1090, 2013.
-  A. Dosovitskiy, P. Fischer, E. Ilg, P. Hausser, C. Hazirbas, V. Golkov, P. Van Der Smagt, D. Cremers, and T. Brox. Flownet: Learning optical flow with convolutional networks. In Proceedings of the IEEE International Conference on Computer Vision, pages 2758–2766, 2015.
-  G. Duveiller, P. Defourny, B. Desclée, and P. Mayaux. Deforestation in central africa: Estimates at regional, national and landscape levels by advanced processing of systematically-distributed landsat extracts. Remote Sensing of Environment, 112(5):1969–1981, 2008.
-  R. Fablet, P. H. Viet, and R. Lguensat. Data-driven models for the spatio-temporal interpolation of satellite-derived sst fields. IEEE Transactions on Computational Imaging, 3(4):647–657, 2017.
-  R. Fablet, P. H. Viet, R. H. Lguensat, P.-H. Horrein, B. Chapron, et al. Spatio-temporal interpolation of cloudy sst fields using conditional analog data assimilation. Remote Sensing, 10(2):310, 2018.
-  F. Gao, J. Masek, M. Schwaller, and F. Hall. On the blending of the landsat and modis surface reflectance: Predicting daily landsat surface reflectance. IEEE Transactions on Geoscience and Remote sensing, 44(8):2207–2218, 2006.
-  Z. Gao, W. Gao, and N.-B. Chang. Integrating temperature vegetation dryness index (tvdi) and regional water stress index (rwsi) for drought assessment with the aid of landsat tm/etm+ images. International Journal of Applied Earth Observation and Geoinformation, 13(3):495–503, 2011.
-  H. Ghassemian. A review of remote sensing image fusion methods. Information Fusion, 32:75–89, 2016.
-  D. L. Hall and J. Llinas. An introduction to multisensor data fusion. Proceedings of the IEEE, 85(1):6–23, 1997.
-  E. Ilg, N. Mayer, T. Saikia, M. Keuper, A. Dosovitskiy, and T. Brox. FlowNet 2.0: Evolution of optical flow estimation with deep networks. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), volume 2, page 6, 2017.
-  J. Inglada, M. Arias, B. Tardy, O. Hagolle, S. Valero, D. Morin, G. Dedieu, G. Sepulcre, S. Bontemps, P. Defourny, et al. Assessment of an operational system for crop type map production using high temporal and spatial resolution satellite optical imagery. Remote Sensing, 7(9):12356–12379, 2015.
-  M. J. Themeßl, A. Gobiet, and A. Leuprecht. Empirical-statistical downscaling and error correction of daily precipitation from regional climate models. International Journal of Climatology, 31(10):1530–1544, 2011.
-  N. Jean, M. Burke, M. Xie, W. M. Davis, D. B. Lobell, and S. Ermon. Combining satellite imagery and machine learning to predict poverty. Science, 353(6301):790–794, 2016.
-  H. Jiang, D. Sun, V. Jampani, M.-H. Yang, E. Learned-Miller, and J. Kautz. Super SloMo: High quality estimation of multiple intermediate frames for video interpolation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 9000–9008, 2018.
-  C. Justice, L. Giglio, S. Korontzi, J. Owens, J. Morisette, D. Roy, J. Descloitres, S. Alleaume, F. Petitcolin, and Y. Kaufman. The MODIS fire products. Remote Sensing of Environment, 83(1-2):244–262, 2002.
-  N. Kussul, M. Lavreniuk, S. Skakun, and A. Shelestov. Deep learning classification of land cover and crop types using remote sensing data. IEEE Geoscience and Remote Sensing Letters, 14(5):778–782, 2017.
-  C. A. Laben and B. V. Brower. Process for enhancing the spatial resolution of multispectral imagery using pan-sharpening, Jan. 4 2000. US Patent 6,011,875.
-  B. Letham, B. Karrer, G. Ottoni, E. Bakshy, et al. Constrained Bayesian optimization with noisy experiments. Bayesian Analysis, 14(2):495–519, 2019.
-  B. Liebmann and C. A. Smith. Description of a complete (interpolated) outgoing longwave radiation dataset. Bulletin of the American Meteorological Society, 77(6):1275–1277, 1996.
-  Z. Liu, R. A. Yeh, X. Tang, Y. Liu, and A. Agarwala. Video frame synthesis using deep voxel flow. In Proceedings of the IEEE International Conference on Computer Vision, pages 4463–4471, 2017.
-  Z. Lu, R. Rykhus, T. Masterlark, and K. G. Dean. Mapping recent lava flows at Westdahl Volcano, Alaska, using radar and optical satellite imagery. Remote Sensing of Environment, 91(3-4):345–353, 2004.
-  B. A. Margono, S. Turubanova, I. Zhuravleva, P. Potapov, A. Tyukavina, A. Baccini, S. Goetz, and M. C. Hansen. Mapping and monitoring deforestation and forest degradation in Sumatra (Indonesia) using Landsat time series data sets from 1990 to 2010. Environmental Research Letters, 7(3):034010, 2012.
-  S. Niklaus, L. Mai, and F. Liu. Video frame interpolation via adaptive convolution. In IEEE Conference on Computer Vision and Pattern Recognition, volume 1, page 3, 2017.
-  S. Ouala, R. Fablet, C. Herzet, B. Chapron, A. Pascual, F. Collard, and L. Gaultier. Neural network based Kalman filters for the spatio-temporal interpolation of satellite-derived sea surface temperature. Remote Sensing, 10(12):1864, 2018.
-  T. S. Pagano and R. M. Durham. Moderate resolution imaging spectroradiometer (MODIS). In Sensor Systems for the Early Earth Observing System Platforms, volume 1939, pages 2–18. International Society for Optics and Photonics, 1993.
-  V. Patraucean, A. Handa, and R. Cipolla. Spatio-temporal video autoencoder with differentiable memory. arXiv preprint arXiv:1511.06309, 2015.
-  M. Rodell, P. Houser, U. Jambor, J. Gottschalck, K. Mitchell, C.-J. Meng, K. Arsenault, B. Cosgrove, J. Radakovich, M. Bosilovich, et al. The global land data assimilation system. Bulletin of the American Meteorological Society, 85(3):381–394, 2004.
-  O. Ronneberger, P. Fischer, and T. Brox. U-Net: Convolutional networks for biomedical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 234–241. Springer, 2015.
-  D. P. Roy, M. Wulder, T. R. Loveland, C. Woodcock, R. Allen, M. Anderson, D. Helder, J. Irons, D. Johnson, R. Kennedy, et al. Landsat-8: Science and product vision for terrestrial global change research. Remote Sensing of Environment, 145:154–172, 2014.
-  J. P. Scharlemann, D. Benz, S. I. Hay, B. V. Purse, A. J. Tatem, G. W. Wint, and D. J. Rogers. Global data for ecology and epidemiology: a novel algorithm for temporal Fourier processing MODIS data. PLoS ONE, 3(1):e1408, 2008.
-  T. J. Schmit, P. Griffith, M. M. Gunshor, J. M. Daniels, S. J. Goodman, and W. J. Lebair. A closer look at the ABI on the GOES-R series. Bulletin of the American Meteorological Society, 98(4):681–698, 2017.
-  T. J. Schmit, M. M. Gunshor, W. P. Menzel, J. J. Gurka, J. Li, and A. S. Bachmeier. Introducing the next-generation advanced baseline imager on GOES-R. Bulletin of the American Meteorological Society, 86(8):1079–1096, 2005.
-  V. P. Shah, N. H. Younan, and R. L. King. An efficient pan-sharpening method via a combined adaptive PCA approach and contourlets. IEEE Transactions on Geoscience and Remote Sensing, 46(5):1323–1335, 2008.
-  S. Skakun, N. Kussul, A. Y. Shelestov, M. Lavreniuk, and O. Kussul. Efficiency assessment of multitemporal C-band RADARSAT-2 intensity and Landsat-8 surface reflectance satellite imagery for crop classification in Ukraine. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 9(8):3712–3719, 2016.
-  H. Song, Q. Liu, G. Wang, R. Hang, and B. Huang. Spatiotemporal satellite image fusion using deep convolutional neural networks. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 11(3):821–829, 2018.
-  T. Vandal, E. Kodra, S. Ganguly, A. Michaelis, R. Nemani, and A. R. Ganguly. DeepSD: Generating high resolution climate change projections through single image super-resolution. In 23rd ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2017.
-  R. L. Wilby, S. P. Charles, E. Zorita, B. Timbal, P. Whetton, and L. O. Mearns. Guidelines for use of climate scenarios developed from statistical downscaling methods. Analysis, 27(August):1–27, 2004.
-  D. S. Wilks. Multisite downscaling of daily precipitation with a stochastic weather generator. Climate Research, 11(2):125–136, 1999.
-  F. Yuan, K. E. Sawaya, B. C. Loeffelholz, and M. E. Bauer. Land cover classification and change analysis of the Twin Cities (Minnesota) metropolitan area by multitemporal Landsat remote sensing. Remote Sensing of Environment, 98(2-3):317–328, 2005.
-  X. Zhang, M. A. Friedl, C. B. Schaaf, A. H. Strahler, J. C. Hodges, F. Gao, B. C. Reed, and A. Huete. Monitoring vegetation phenology using MODIS. Remote Sensing of Environment, 84(3):471–475, 2003.
-  X. Zhu, J. Chen, F. Gao, X. Chen, and J. G. Masek. An enhanced spatial and temporal adaptive reflectance fusion model for complex heterogeneous regions. Remote Sensing of Environment, 114(11):2610–2623, 2010.