Even if it's making my brain hurt at times!
Let me try to simplify... It's all about "matching dots." You have a scene with a certain number of dots in it (details), which are projected as dots (points of light) by the lens, which are collected by dots (pixels) on the sensor... and finally output as dots (screen pixels/printer dots).
When "the dots" are the same size at every stage you have maximum detail. If at any stage there are fewer dots (i.e. the dots are larger), then that stage will be the limiting factor in terms of detail.
* If you take a picture of a white wall w/o texture, there will be no detail because it doesn't exist...
* A lens projects larger dots at smaller apertures**. You need to use a lens that is sharp at a wide aperture and set to that aperture in order to resolve fine details/fit on small pixels...
E.g. at f/11 a (non-existent) perfect lens projects dots of ~15 micron diameter (yellow/green wavelength, ~0.55 micron). About 16M of those dots can be resolved by a FF sensor, ~7M by an APS sensor, and ~4M by a M4/3 sensor. But at f/2 the projected Airy disks are (theoretically) only ~2.7 microns in diameter, and ~470M could be resolved by a 470MP FF sensor (~1.4 micron pixels).
* the sensor always has its resolution (e.g. 24MP). The only question is how much detail it is actually resolving as projected by the scene/lens (and also affected by AA filters, Bayer array/demosaicing, etc).
* the display resolution will (potentially) limit the viewable detail, e.g. a 100PPI display, an inkjet print with "dot bleed," and the human eye's limit of ~ 12MP*** (for critical sharpness).
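The f/11 and f/2 figures above can be roughly reproduced with a few lines of arithmetic. This is only a sketch under stated assumptions: the usual Airy disk formula (diameter = 2.44 x wavelength x f-number), 0.55 micron light, 2 pixels per Airy disk, and nominal sensor dimensions that are not given in the post (APS-C in particular varies by maker). The function and variable names are made up for illustration.

```python
# Sketch: diffraction-limited "dot" size and how many such dots a sensor can hold.
# Assumptions (not from the post): nominal sensor sizes in mm, 0.55 micron light,
# and Airy disk diameter = 2.44 * wavelength * f-number.

WAVELENGTH_UM = 0.55  # yellow/green light, in microns

# Nominal sensor dimensions in mm (typical values, assumed here).
SENSORS_MM = {
    "FF": (36.0, 24.0),
    "APS-C": (23.6, 15.7),
    "M4/3": (17.3, 13.0),
}


def airy_disk_diameter_um(f_number, wavelength_um=WAVELENGTH_UM):
    """Diameter of the Airy disk (out to the first dark ring) in microns."""
    return 2.44 * wavelength_um * f_number


def resolvable_megapixels(sensor_mm, f_number, pixels_per_disk=2.0):
    """Megapixels that fit when each Airy disk spans `pixels_per_disk` pixels."""
    pitch_um = airy_disk_diameter_um(f_number) / pixels_per_disk
    width_px = sensor_mm[0] * 1000.0 / pitch_um
    height_px = sensor_mm[1] * 1000.0 / pitch_um
    return width_px * height_px / 1e6


if __name__ == "__main__":
    for f_number in (11, 2):
        disk = airy_disk_diameter_um(f_number)
        print(f"f/{f_number}: Airy disk ~{disk:.1f} microns")
        for name, size in SENSORS_MM.items():
            print(f"  {name}: ~{resolvable_megapixels(size, f_number):.0f} MP")
```

With unrounded inputs the f/2 full-frame result comes out nearer 480MP than 470MP; the difference is just rounding of the disk diameter.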
Technical clarifications/unsimplified:
**a less than perfect lens (i.e. all of them) will project larger dots at maximum/larger apertures due to optical errors. The "sweet spot" is the point of stopping down to correct for optical errors without adding diffraction (i.e. the smallest dots the lens is capable of). Typically ~ 2 stops from wide open.
**The yellow/green wavelength is the most important for digital; since dot size scales with wavelength, you get (roughly) 2x as many blue dots and 1/2 as many red dots.
**The maximum resolution noted is based on 2 pixels per airy disk which is optimal for a Bayer array w/ AA filter.
*** for undetectable dots the requirement is ~39MP for a 45° diagonal 3:2 print; if you consider scanning a 120° FOV in all directions then the MP required to be *undetectable* by the human eye at any distance is ~580... something no digital system is capable of (at the moment/AFAIK).
**** More/smaller pixels gather less light resulting in more (relative) signal noise which can obscure detail/resolution.
***** The Bayer filtering restricts light/color per pixel. More (i.e. smaller) pixels are better in order to compensate.
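The "smaller pixels, more relative noise" point can be illustrated with a very simplified model. This sketch assumes photon shot noise is the only noise source, so the signal-to-noise ratio is the square root of the photons collected, and that photons collected scale with pixel area at the same exposure. Real sensors also have read noise, different quantum efficiencies, and so on, none of which is modeled here; the function name is hypothetical.

```python
import math


def relative_shot_noise_snr(pitch_a_um, pitch_b_um):
    """Per-pixel SNR of pixel A relative to pixel B, shot noise only.

    Photons collected scale with pixel area (pitch squared), and
    shot-noise SNR is the square root of the photon count.
    """
    area_ratio = (pitch_a_um / pitch_b_um) ** 2
    return math.sqrt(area_ratio)


if __name__ == "__main__":
    # Example pitches: ~4.9 micron (36MP FF) vs ~1.4 micron (the 470MP FF case).
    ratio = relative_shot_noise_snr(4.9, 1.4)
    print(f"per-pixel SNR advantage of the larger pixel: ~{ratio:.1f}x")
```

Under this model the per-pixel advantage is simply the ratio of the pixel pitches, which is why very small pixels need very good light to show the detail they could otherwise resolve.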
To put this in a "workable example":
Most scenes contain more detail than we can record. In order to record 36MP on a D800/e you need a lens that is w/o optical errors at/by f/5.6 and used at/wider than f/5.6 (it's actually closer to f/7.1, but I don't have a specific/technical reference for that). And you need to be at/near minimum ISO with good light. This will provide maximum resolution/detail (2 pixels per Airy disk) with minimum signal/photon noise and maximum color information. And you need to use a fast shutter speed (or a tripod) to prevent any blurring of the image across the tiny pixels.
If any of that is not happening, then you are not getting the full 36MP/DR/Color/Tonality the camera is potentially capable of. If all of that is happening and the image is then printed in a 3:2 aspect the pixels/dots will be virtually undetectable, but the print would need to be made at ~ 700ppi (actual) on very high quality paper.
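The numbers in this example can be sanity-checked with a short sketch. Assumptions that are not in the post: a nominal 36 x 24 mm sensor with exactly 36MP of square pixels, 0.55 micron light, the 2.44 x wavelength x f-number Airy disk formula, and the D800's published maximum image size of 7360 x 4912 pixels. The function names are made up for illustration.

```python
import math

WAVELENGTH_UM = 0.55  # assumed yellow/green wavelength, in microns


def pixel_pitch_um(width_mm, height_mm, megapixels):
    """Approximate pixel pitch in microns, assuming square pixels fill the sensor."""
    area_um2 = (width_mm * 1000.0) * (height_mm * 1000.0)
    return math.sqrt(area_um2 / (megapixels * 1e6))


def f_number_for_pixels_per_disk(pitch_um, pixels_per_disk=2.0,
                                 wavelength_um=WAVELENGTH_UM):
    """f-number at which the Airy disk diameter spans `pixels_per_disk` pixels."""
    return pixels_per_disk * pitch_um / (2.44 * wavelength_um)


def print_size_inches(width_px, height_px, ppi):
    """Print dimensions in inches when every image pixel gets one printed pixel."""
    return width_px / ppi, height_px / ppi


if __name__ == "__main__":
    pitch = pixel_pitch_um(36.0, 24.0, 36.0)
    print(f"pixel pitch ~{pitch:.2f} microns")
    print(f"2 pixels per Airy disk at ~f/{f_number_for_pixels_per_disk(pitch):.1f}")

    width_in, height_in = print_size_inches(7360, 4912, 700)
    print(f"700ppi print: ~{width_in:.1f} x {height_in:.1f} inches")
```

Under these assumptions the pitch is about 4.9 microns and the 2-pixels-per-disk point falls near f/7.3, which is in line with the "closer to f/7.1" remark (f/7.1 being the nearest standard 1/3-stop setting). A 700ppi print of the full image works out to roughly 10.5 x 7 inches.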