In long-haul optical communication systems, compensating nonlinear effects through digital signal processing (DSP) is difficult because of the intractable interactions among Kerr nonlinearity, chromatic dispersion (CD) and amplified spontaneous emission (ASE) noise from inline amplifiers. Optimizing standard digital back propagation (DBP) as a deep neural network (DNN) with interleaved linear and nonlinear operations for fiber nonlinearity compensation has been shown to improve transmission performance in idealized simulation environments. Here, we extend this concept to practical single-channel and polarization-division-multiplexed wavelength-division-multiplexed experiments. We show improved performance compared with state-of-the-art DSP algorithms. In addition, the optimized DNN-based DBP parameters exhibit a mathematical structure that guides us to further analyze the noise statistics of fiber nonlinearity compensation. This machine-learning-inspired analysis reveals that ASE noise and incomplete CD compensation of the Kerr nonlinear term produce extra distortions that accumulate along the DBP stages. Therefore, the best DSP should balance suppressing these distortions against inverting the fiber propagation effects, and this trade-off shifts across DBP stages in a quantifiable manner. Instead of the common ‘black-box’ approach to intractable problems, our work shows how machine learning can be a complementary tool to human analytical thinking and help advance theoretical understanding in disciplines such as optics.
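To make "interleaved linear and nonlinear operations" concrete, the following is a minimal NumPy sketch of textbook split-step DBP. It is not the authors' DNN-based implementation. It assumes a lossless, noise-free, single-polarization fiber (no attenuation, no ASE), uses one common sign convention for the nonlinear Schrödinger equation, and introduces a per-stage nonlinear scaling `xi` as a hypothetical stand-in for trainable DBP parameters. The helper names `forward_ssfm` and `dbp` are illustrative only.

```python
import numpy as np


def _freq_grid(n, fs):
    # Angular frequencies (rad/s) in NumPy's FFT bin order
    return 2 * np.pi * np.fft.fftfreq(n, d=1.0 / fs)


def forward_ssfm(tx, fs, length_m, n_steps, beta2, gamma):
    """Toy fiber model (assumption: lossless, no ASE, single polarization).

    Each step applies the Kerr phase rotation, then chromatic dispersion.
    """
    dz = length_m / n_steps
    w = _freq_grid(tx.size, fs)
    cd = np.exp(1j * beta2 / 2 * w**2 * dz)  # forward CD operator for one step
    a = tx.astype(complex)
    for _ in range(n_steps):
        a = a * np.exp(1j * gamma * np.abs(a) ** 2 * dz)  # nonlinear step
        a = np.fft.ifft(np.fft.fft(a) * cd)               # linear step
    return a


def dbp(rx, fs, length_m, n_steps, beta2, gamma, xi=None):
    """Split-step DBP: per stage, undo CD (linear), then undo Kerr phase (nonlinear).

    xi: per-stage scaling of the nonlinear phase (hypothetical trainable
        parameters); all ones gives textbook DBP, all zeros gives CD-only.
    """
    dz = length_m / n_steps
    w = _freq_grid(rx.size, fs)
    cd_inv = np.exp(-1j * beta2 / 2 * w**2 * dz)  # inverse CD for one stage
    xi = np.ones(n_steps) if xi is None else np.asarray(xi, dtype=float)
    a = rx.astype(complex)
    for k in range(n_steps):
        a = np.fft.ifft(np.fft.fft(a) * cd_inv)                    # linear stage
        a = a * np.exp(-1j * xi[k] * gamma * np.abs(a) ** 2 * dz)  # nonlinear stage
    return a


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fs, n = 64e9, 1024
    # QPSK-like samples with about 1 mW average power
    tx = (rng.choice([-1, 1], n) + 1j * rng.choice([-1, 1], n)) * np.sqrt(1e-3 / 2)
    beta2, gamma, length = -21.7e-27, 1.3e-3, 800e3  # s^2/m, 1/(W m), m (typical SSMF)
    rx = forward_ssfm(tx, fs, length, 100, beta2, gamma)
    full = dbp(rx, fs, length, 100, beta2, gamma)
    cd_only = dbp(rx, fs, length, 100, beta2, gamma, xi=np.zeros(100))
    print("max error, full DBP:", np.max(np.abs(full - tx)))
    print("max error, CD only: ", np.max(np.abs(cd_only - tx)))
```

Because each DBP stage undoes the last forward step in reverse order, `dbp` with `xi` set to ones inverts this toy channel up to floating-point error, while CD-only compensation leaves the Kerr distortion in place. The stage-dependent trade-off described above arises only once ASE noise and an imperfect channel model are present, which this noise-free sketch deliberately omits.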