What you are looking for is not easy. I would suggest looking into books on pattern matching and pattern recognition. Popular methods are based on fuzzy systems. Alternatively, you may want to search ACM or IEEE for papers on similar problems.
If you want to compare in a simpler fashion, you first need to establish the parameters on which you will decide whether two files are similar or not, for example word count, the frequency of words, their occurrence in the files, etc.
Based on these 'descriptors' you will then need some thresholds to classify whether the files match or not. For example, two files may be considered similar if the difference in their word counts is not more than 10 and the frequencies of some selected words are the same. You may need good trees to solve the problem efficiently.
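To make the descriptor idea concrete, here is a rough Python sketch of the example rule above (word-count difference of at most 10, and equal frequencies for a few selected words). The function names, the tokenizing regex and the default threshold are my own assumptions for illustration, not a standard method, so tune them to your data.

```python
import re
from collections import Counter


def describe(text):
    """Return (word count, word frequencies) for a piece of text."""
    # Assumption: a "word" is a lowercase run of letters, digits or apostrophes.
    words = re.findall(r"[a-z0-9']+", text.lower())
    return len(words), Counter(words)


def looks_similar(text_a, text_b, selected_words, max_count_diff=10):
    """Apply the two example thresholds to a pair of texts."""
    count_a, freq_a = describe(text_a)
    count_b, freq_b = describe(text_b)

    # Descriptor 1: the word counts must not differ by more than the threshold.
    if abs(count_a - count_b) > max_count_diff:
        return False

    # Descriptor 2: each selected word (given in lowercase) must occur
    # equally often in both texts. A Counter returns 0 for missing words.
    return all(freq_a[word] == freq_b[word] for word in selected_words)
```

To use it on files, read each file into a string first and pass the two strings in. Requiring exactly equal frequencies is quite strict; allowing a small difference per word would be a natural relaxation.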
Also look into edit distance. See the Wikipedia articles on Edit distance and Levenshtein distance.
Two files are similar if the edit distance needed to convert one into the other is less than a certain threshold. That is also the simplest approach I could think of.
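A minimal sketch of the edit distance check in Python, using the usual dynamic-programming form of Levenshtein distance. The names edit_distance and texts_similar and the default threshold of 20 are just placeholders I picked; choose a threshold that makes sense for your files.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum number of single-character
    insertions, deletions and substitutions that turn a into b."""
    # Keep the shorter string in b so the stored rows stay small.
    if len(a) < len(b):
        a, b = b, a

    # previous[j] holds the distance between the processed prefix of a
    # and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,                      # delete char_a
                current[j - 1] + 1,                   # insert char_b
                previous[j - 1] + substitution_cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def texts_similar(text_a, text_b, threshold=20):
    """Treat two texts as similar if their edit distance is below the threshold."""
    return edit_distance(text_a, text_b) < threshold
```

Keep in mind that the running time grows with the product of the two lengths, so on large files this gets slow. Comparing word by word or line by line instead of character by character (pass lists of words instead of strings) is one way to cut that down.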
btw out of curiosity, why do you need to make such a program?