7zip can EXTRACT my WORD DOCUMENT file !!!

Status
Not open for further replies.

raksrules

Oracle
This happened today when I wanted to zip a Word document. I have 7-Zip installed on my machine, so I just right-clicked on the Word doc and went to the 7-Zip menu, which opened a submenu with several items.
Now one would expect that the first items would be
"Add to xyz.7z"
"Add to archive" and so on

But I was surprised to find the first ones were

"Open archive"
"Extract files" and so on :S

So I did an "Open archive" on the Word file and it showed a handful of files, like this:

[1]CompObj
[5]DocumentSummaryInformation
[5]SummaryInformation
1Table
Data
WordDocument

I also 'extracted' this Word doc and got all of these files. I know there may be a logical explanation for this, but it was the first time I had seen it :ashamed:
 
Yeah, I had heard about it, and the technique can be used to extract text from corrupted files (I think the text is stored in one of the XML files that get extracted). Nice find though.
 
hammerhead said:
Yeah, I had heard about it, and the technique can be used to extract text from corrupted files (I think the text is stored in one of the XML files that get extracted). Nice find though.
That is true. I worked for a Microsoft Office process.
 
If you are talking about docx (the Office 2007 format), it's basically a ZIP file with XML files inside.

If it's doc you are talking about, that's surprising. I didn't know that earlier versions were also archives.
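For anyone who wants to check the "docx is a ZIP with XML inside" point without 7-Zip, here is a minimal Python sketch. It only uses the standard zipfile module. The helper names (list_container_members, make_sample_container) are made up for this post, and the sample it builds is a tiny stand-in, not a valid Word document.

```python
import io
import zipfile


def list_container_members(data: bytes) -> list[str]:
    """Return the member names if data is a ZIP container, else an empty list."""
    buffer = io.BytesIO(data)
    if not zipfile.is_zipfile(buffer):
        return []
    with zipfile.ZipFile(buffer) as archive:
        return archive.namelist()


def make_sample_container() -> bytes:
    """Build a tiny in-memory ZIP that mimics the layout of a docx (illustrative only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("[Content_Types].xml", "<Types/>")
        archive.writestr("word/document.xml", "<document>hello</document>")
    return buffer.getvalue()


if __name__ == "__main__":
    for name in list_container_members(make_sample_container()):
        print(name)
```

To try it on a real file, read the bytes with open(path, "rb").read() and pass them to list_container_members. An old-style .doc should come back as an empty list, since it is not a ZIP.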
 
frosty is correct, and I bet the OP is talking about Word .docx files only; he didn't mention anything about the file being in the old format...

edit: basically many people in this thread have been living under a rock or something :D
 
omg omg omg omg omg omg this is earth-shattering news.... really! I thought Word doc files were a binary format.
 
truly, it is a mystery.... special prizes await the first person who posts conclusive info to get to the bottom of this...
 
It's not a mystery... all legacy Microsoft Office documents (.doc, .xls, .ppt) are compound files, and the data is stored in various containers (streams) within the file, under names such as WordDocument, Data, 1Table, etc. Office 2003 documents even have a file allocation table within themselves, much like a FAT file system.

Office 2007 does the same kind of thing, but it now uses XML for the parts within the file and packages them in a ZIP container (usually Deflate-compressed).

This is mainly done to categorize and organize the data for easy storage.

Hope this helps...
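To make the compound file idea a bit more concrete, here is a rough Python sketch that reads a few fields from the 512-byte header at the start of such a file. The offsets follow the published Compound File Binary layout as I understand it (signature at 0, versions and sector shifts from offset 24, FAT sector count at 44, first directory sector at 48). The function names and the synthetic header are assumptions for illustration, not taken from any real tool.

```python
import struct

# Signature found at the start of every compound file (old .doc, .xls, .ppt).
CFB_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")


def parse_cfb_header(header: bytes) -> dict:
    """Pull a few fields out of a compound file header (little-endian)."""
    if len(header) < 76 or header[:8] != CFB_SIGNATURE:
        raise ValueError("not a compound file header")
    # Offsets 24..33: minor version, major version, byte order, sector shifts.
    minor, major, byte_order, sector_shift, mini_shift = struct.unpack_from("<5H", header, 24)
    # Offsets 44..51: number of FAT sectors, first directory sector.
    fat_sectors, first_dir_sector = struct.unpack_from("<2I", header, 44)
    return {
        "major_version": major,
        "sector_size": 1 << sector_shift,
        "mini_sector_size": 1 << mini_shift,
        "fat_sectors": fat_sectors,
        "first_directory_sector": first_dir_sector,
    }


def make_sample_header() -> bytes:
    """Synthetic header for demonstration only; not a complete document."""
    head = CFB_SIGNATURE + bytes(16)                      # signature + empty CLSID
    head += struct.pack("<5H", 0x3E, 3, 0xFFFE, 9, 6)     # versions, byte order, shifts
    head += bytes(6)                                      # reserved
    head += struct.pack("<3I", 0, 1, 2)                   # dir sectors, FAT sectors, first dir sector
    return head.ljust(512, b"\x00")


if __name__ == "__main__":
    print(parse_cfb_header(make_sample_header()))
```

On a real .doc you would pass in the first 512 bytes of the file. The streams that 7-Zip lists (WordDocument, 1Table and so on) are found by following the directory sector and the allocation table from there, which this sketch does not attempt.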
 
^^ Thanks for the details man

It's just that no one (or only a few) knew about this funda before, and I accidentally stumbled upon it. :)
 
IIRC MS has some sort of automatic file type detection DLL, so if the OP's file is really a docx just renamed to doc, that might be why this worked for him... but it doesn't look like normal docx contents anyway. Perhaps an updated doc binary format only available with newer service packs?

What SP levels are your MS Office installs (@raks and @hammer), and what versions of 7-Zip do you have installed?
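One quick way to settle the "is it really a docx renamed to doc" question is to look at the first few bytes of the file instead of the extension. A small Python sketch follows. The two signatures are the well-known ones for compound files and ZIP local file headers; the function names are just made up here.

```python
# First bytes of a compound file (old binary .doc) and of a ZIP local file header (.docx).
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")
ZIP_MAGIC = b"PK\x03\x04"


def sniff_container(header: bytes) -> str:
    """Classify a file by its leading bytes, ignoring the extension."""
    if header.startswith(OLE2_MAGIC):
        return "compound-file"
    if header.startswith(ZIP_MAGIC):
        return "zip"
    return "unknown"


def sniff_file(path: str) -> str:
    """Read just enough of the file at path to classify it."""
    with open(path, "rb") as handle:
        return sniff_container(handle.read(8))
```

If sniff_file on the OP's document returns "compound-file", it is a genuine old-format doc and not a renamed docx, which would match the stream names listed in the first post.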
 
^^ 7-Zip version is 4.65

MS Office patch level, no idea :( These things get updated / installed automatically when I connect my machine to my company's network

PS: This is my office laptop not personal
 
Guys, as someone pointed out earlier...
Those objects are stored in a container. If you try opening the files inside the container, you will find binary data only.

You can try it with any version of 7-Zip and a Word 2003 doc; it doesn't matter how old the file is.
 