Monolithic scintillation crystals can be used for PET detectors, offering good spatial resolution and depth-of-interaction decoding capabilities. Gamma arrival time estimation, however, often suffers from the spread of scintillation light, which increases the influence of dark counts and other statistical fluctuations in analog SiPMs. Digitizing the SiPM waveforms allows an accurate baseline correction, minimizing the effects of dark counts. The usual methodology of averaging the first few timestamps obtained from leading edge discrimination, however, still discards much of the potentially useful information contained in the SiPM waveforms. We propose the use of a 3D convolutional neural network (CNN) to predict the gamma arrival time in the scintillation crystal, using a 3 ns time window of the array of detector waveforms as input. Specifically, we investigate a $50 \times 50 \times 16$ mm$^3$ LYSO crystal coupled to an $8 \times 8$ readout array of SiPMs. The required data are obtained from Monte Carlo simulations in GATE, in which we further simulate the SiPM signals as a sum of bi-exponential functions centered around the optical photon detection times. Our simulation includes the effects of limited photon detection efficiency, dark counts, optical crosstalk, photon transit time jitter and electronic noise. The neural network achieves a coincidence time resolution of 141 ps full width at half maximum (FWHM), a 26% improvement over the conventional methodology of averaging the first few timestamps obtained by leading edge discrimination (177 ps FWHM). In addition, the time resolution of the CNN remains uniform over the crystal, whereas the conventional methodology deteriorates strongly for gamma interactions close to the SiPM surface.
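To make the conventional baseline concrete, the following is a minimal sketch of the signal model and timestamp averaging described above. It is not the simulation code used in this work. The function names, the rise and decay constants, the threshold, the sampling step and the number of averaged timestamps are all hypothetical values chosen for illustration, and each single-photon pulse is assumed to start at the photon detection time.

```python
import math


def biexp_pulse(t, t0, tau_rise=0.1, tau_fall=10.0):
    """Bi-exponential single-photon response at time t (ns).

    Assumption: the pulse starts at the detection time t0; the time
    constants (ns) are illustrative, not taken from the paper.
    """
    if t < t0:
        return 0.0
    dt = t - t0
    return math.exp(-dt / tau_fall) - math.exp(-dt / tau_rise)


def sipm_waveform(photon_times, times, baseline=0.0):
    """Sum one bi-exponential pulse per detected photon on a sampling grid."""
    return [baseline + sum(biexp_pulse(t, t0) for t0 in photon_times)
            for t in times]


def leading_edge_time(times, wave, threshold, n_baseline=10):
    """Baseline-corrected leading edge discrimination.

    The baseline is estimated as the mean of the first n_baseline samples.
    Returns the linearly interpolated threshold crossing time, or None if
    the waveform never crosses the threshold.
    """
    base = sum(wave[:n_baseline]) / n_baseline
    for i in range(1, len(wave)):
        v0 = wave[i - 1] - base
        v1 = wave[i] - base
        if v0 < threshold <= v1:
            frac = (threshold - v0) / (v1 - v0)
            return times[i - 1] + frac * (times[i] - times[i - 1])
    return None


def average_first_timestamps(timestamps, n_first=4):
    """Average the n_first earliest valid channel timestamps."""
    valid = sorted(t for t in timestamps if t is not None)
    if not valid:
        return None
    first = valid[:n_first]
    return sum(first) / len(first)


# Usage: three hypothetical channels sampled every 0.05 ns over 10 ns.
times = [i * 0.05 for i in range(200)]
channels = [[2.0, 2.3, 2.4], [2.6, 3.1], []]  # photon detection times (ns)
stamps = [leading_edge_time(times, sipm_waveform(p, times, baseline=0.5), 0.2)
          for p in channels]
estimate = average_first_timestamps(stamps, n_first=2)
```

The averaging step keeps one number per channel and ignores the rest of each waveform. That discarded pulse-shape information is what the proposed CNN has access to when it receives the digitized waveforms directly.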