Recently a client of mine needed to search for keywords across a number of PDF documents remotely over a network. Google offers several products that can accomplish this, with some caveats.
Google Desktop does this very efficiently, but it can only be used on the computer where it is installed. The Google Search Appliance products can also do it, but they are cost-prohibitive in my opinion.
My solution for using Google Desktop remotely over a network combines Apache Web Server, configured as a reverse proxy, with the PHP scripting language. I used PHP, but in theory you could use a program like sed, or another scripting language like Perl or Python, to do the same thing.
Here it is in a few simple steps:
1. Install Google Desktop (disable the sidebar, since it is unnecessary)
2. Install Apache for Windows (configure it as a reverse proxy server)
3. Install PHP for Windows (with the command line interface option enabled)
4. Restart the computer
5. Configure Apache to use the mod_ext_filter module
6. Configure mod_ext_filter to run all content being returned to the browser through a script that changes the search results (e.g., changes the links to reference the PDF documents directly on the server)
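The Apache side of these steps might look roughly like the fragment below. It is a sketch, not my exact configuration: it assumes Google Desktop serves its search pages on 127.0.0.1:4664, and the PHP and script paths are hypothetical placeholders you would replace with your own.

```apache
# Modules this sketch relies on (standard Apache 2.x module names).
LoadModule proxy_module modules/mod_proxy.so
LoadModule proxy_http_module modules/mod_proxy_http.so
LoadModule ext_filter_module modules/mod_ext_filter.so
LoadModule headers_module modules/mod_headers.so

# Act as a reverse proxy only, never as an open forward proxy.
ProxyRequests Off

# Assumption: Google Desktop serves its search pages on 127.0.0.1:4664.
ProxyPass        / http://127.0.0.1:4664/
ProxyPassReverse / http://127.0.0.1:4664/

# Ask the backend for uncompressed pages so the filter sees plain HTML.
RequestHeader unset Accept-Encoding

# Hypothetical paths: pipe every HTML response through the rewrite script.
ExtFilterDefine rewrite-links mode=output intype=text/html \
    cmd="C:/php/php.exe C:/filters/rewrite_links.php"
SetOutputFilter rewrite-links
```

The filter command can be any program that reads the page on standard input and writes the modified page to standard output, which is why sed, Perl, or Python would work as well as PHP.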
That’s basically it. The tricky part is step 6, the last one in the list above. It is necessary because the search results link to the PDF documents as local files, so without it, clicking a link would open the document on the server computer and not on the networked workstation.
The script in step 6 can also do things like replace the Google Desktop logo or add more information to the search results than Google Desktop provides.
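To make the link rewriting in step 6 more concrete, here is a minimal sketch of such a filter. I used PHP, but this version is in Python to show that the language does not matter. Everything specific in it is an assumption for illustration: the `DOC_ROOT` folder, the `PUBLIC_BASE` URL, the `--filter` flag, and above all the idea that each result link carries the local file path in a `url` query parameter. Check the HTML your own installation produces before relying on any of it.

```python
import re
import sys
from urllib.parse import parse_qs, quote, urlsplit

# Hypothetical values: the indexed folder and the URL Apache serves it from.
DOC_ROOT = "C:\\Docs\\"
PUBLIC_BASE = "http://searchbox.example/docs/"

HREF_RE = re.compile(r'href="([^"]*)"', re.IGNORECASE)


def public_url(local_path):
    """Map a local Windows path under DOC_ROOT to a URL, or return None."""
    if not local_path.lower().startswith(DOC_ROOT.lower()):
        return None
    relative = local_path[len(DOC_ROOT):].replace("\\", "/")
    return PUBLIC_BASE + quote(relative)


def rewrite_links(html):
    """Replace result links that point at local files with server URLs."""
    def replace(match):
        href = match.group(1).replace("&amp;", "&")
        # Assumption: the local path travels in a "url" query parameter.
        paths = parse_qs(urlsplit(href).query).get("url", [])
        target = public_url(paths[0]) if paths else None
        if target is None:
            return match.group(0)  # leave every other link untouched
        return 'href="%s"' % target
    return HREF_RE.sub(replace, html)


def main():
    # mod_ext_filter pipes the response body in on stdin and reads stdout.
    data = sys.stdin.buffer.read().decode("utf-8", errors="replace")
    sys.stdout.buffer.write(rewrite_links(data).encode("utf-8"))


# Only act as a filter when started with --filter, so the file can also
# be imported and tested without touching stdin.
if __name__ == "__main__" and sys.argv[1:] == ["--filter"]:
    main()
```

A filter like this would be started by mod_ext_filter with the `--filter` argument. Links that do not match the assumed pattern pass through unchanged, so the rest of the Google Desktop page keeps working. Swapping the logo or adding extra information to each result would be further substitutions inside `rewrite_links`.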
If anyone reading is interested in implementing a solution like this, I can provide consulting.