Looking at the code, they used existing Python packages to read and parse MS Office formats. That's not what I expected: since the repo is in Microsoft's org on GitHub, I expected them to use Microsoft's "official" libraries for parsing these formats, through the Component Object Model (COM).
They used Mammoth for docx (Word) [1][2],
python-pptx for pptx (PowerPoint) [3][4],
and pandas for xlsx (Excel) [5].
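
For anyone curious what that looks like in practice, here is a rough sketch of dispatching on file extension to those three packages. This is my own illustration, not the repo's actual code: the function names, the CONVERTERS table, and the choice of HTML or plain text as output are all assumptions. The third-party imports are deferred into each function, so the dispatch logic loads even if a package is missing, and pandas needs openpyxl installed to read xlsx.

```python
from pathlib import Path


def docx_to_html(path):
    """Convert a Word document to HTML with Mammoth (hypothetical helper)."""
    import mammoth  # third-party: pip install mammoth

    with open(path, "rb") as docx_file:
        return mammoth.convert_to_html(docx_file).value


def pptx_to_text(path):
    """Collect the text of every text-bearing shape with python-pptx (hypothetical helper)."""
    from pptx import Presentation  # third-party: pip install python-pptx

    lines = []
    for slide in Presentation(path).slides:
        for shape in slide.shapes:
            if shape.has_text_frame:
                lines.append(shape.text_frame.text)
    return "\n".join(lines)


def xlsx_to_html(path):
    """Render each worksheet as an HTML table with pandas (hypothetical helper)."""
    import pandas as pd  # third-party: pip install pandas openpyxl

    # sheet_name=None loads every sheet into a dict of DataFrames.
    sheets = pd.read_excel(path, sheet_name=None)
    return "\n".join(
        f"<h2>{name}</h2>\n{frame.to_html(index=False)}"
        for name, frame in sheets.items()
    )


# Assumed mapping for illustration; the real project may organize this differently.
CONVERTERS = {
    ".docx": docx_to_html,
    ".pptx": pptx_to_text,
    ".xlsx": xlsx_to_html,
}


def convert(path):
    """Pick a converter by file extension and run it."""
    suffix = Path(path).suffix.lower()
    try:
        converter = CONVERTERS[suffix]
    except KeyError:
        raise ValueError(f"unsupported file type: {suffix!r}") from None
    return converter(path)
```

Calling convert("report.docx") would return an HTML string. None of this touches COM, so it runs on any OS without Office installed, which is presumably the appeal of going with pure-Python packages.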