I'm a big fan of search. The ability to use the internet to cull information on virtually any topic with just a few clicks has made me more efficient and better informed. And "information" can come in the form of pictures, documents, videos, news feeds -- whatever you need.
So you might think that when my company's application team told me they wanted to initiate an enterprise search project, I would have jumped on board. Not quite. For security and legal reasons, enterprise search can lead to real problems if not deployed with excruciating care and strict governance.
If security concerns aren't addressed, this is what you can expect: The IT team does some research, makes a choice, deploys the infrastructure and begins pointing it to data repositories. Before you know it, someone conducts a search with a term like "M&A" and turns up a sensitive document naming a company that's being considered for acquisition, or a search for the word "salary" reveals an employee salary list that was saved in an inappropriate directory. In other words, people will be able to find all manner of documents that they shouldn't have access to. It's a flagrant violation of what is probably my most important security philosophy: the rule of least privilege.
The rule of least privilege, which I have discussed here many times before, holds that information should be accessible only by those who have a need to know it. When you apply this rule to enterprise search, it means that searches should turn up only those document names, associated metadata and, most important, content that the searcher is allowed to see.
When it comes to controlling access and exposure to searchable data, you can rely on the techniques referred to as early binding and late binding, or you can adopt a hybrid model. With early binding, the decision about who can access a document is made when the document is added to the search index, and the access rules are stored with it. With late binding, the decision is made when a query is submitted, by checking each result against the current permissions. Early binding is much more complex to set up and maintain but offers better performance. My recommendation, though, is a hybrid approach, which offers the best of both worlds. Of course, you will have to consider the pros and cons and weigh them against your own organization's needs.
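To make the difference concrete, here is a minimal Python sketch of the three models. Everything in it is hypothetical: the `Doc` class, the `LIVE_ACL` table standing in for the source repository's current permissions, the group names and the sample documents. The hybrid function shows only one plausible way to combine the two checks (filter on the indexed rules, then re-check the survivors against live permissions); real products may do it differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    doc_id: int
    text: str
    indexed_groups: frozenset  # ACL snapshot stored at index time (early binding)


# Hypothetical "live" permissions, as the source repository reports them right now.
LIVE_ACL = {
    1: {"finance"},
    2: {"all-staff"},
    3: {"hr"},  # tightened after indexing; the index below is stale for doc 3
}

INDEX = [
    Doc(1, "M&A target shortlist", frozenset({"finance"})),
    Doc(2, "Salary survey results for all staff", frozenset({"all-staff"})),
    Doc(3, "Employee salary list", frozenset({"hr", "all-staff"})),
]


def _matches(doc, term):
    return term.lower() in doc.text.lower()


def search_early(term, user_groups):
    """Early binding: trust the ACL captured when the document was indexed."""
    return [d.doc_id for d in INDEX
            if _matches(d, term) and d.indexed_groups & user_groups]


def search_late(term, user_groups):
    """Late binding: check every hit against live permissions at query time."""
    return [d.doc_id for d in INDEX
            if _matches(d, term) and LIVE_ACL.get(d.doc_id, set()) & user_groups]


def search_hybrid(term, user_groups):
    """Hybrid (one possible design): cheap indexed filter, then a live re-check."""
    candidates = search_early(term, user_groups)
    return [doc_id for doc_id in candidates
            if LIVE_ACL.get(doc_id, set()) & user_groups]
```

In this toy data, an ordinary staff member searching for "salary" would still see document 3 under pure early binding, because the index holds an outdated rule. The late and hybrid versions catch the change, at the cost of a permission lookup per result.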
The fact that your enterprise search results will be provided via a URL can cause another problem. You need to make sure that such URLs can't be manipulated to provide access to other documents or data. For example, a URL such as www.company-intranet.com/go?viewdoc=210 might be open to manipulation by simply changing the "210" to another number.
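One way to guard against that kind of tampering is to re-check authorization on the server every time a document is requested, no matter how the requester obtained the URL. The sketch below assumes a hypothetical `DOC_ACL` table and a `viewdoc` query parameter like the one in the example URL; the user names and the `view_document` function are invented for illustration.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical mapping of document ID to the users allowed to read it.
DOC_ACL = {
    210: {"alice"},
    211: {"bob"},
}


class AccessDenied(Exception):
    """Raised for both unknown and unauthorized documents."""


def view_document(url, user):
    """Return a document only if `user` is authorized for the requested ID."""
    query = parse_qs(urlparse(url).query)
    raw_id = query.get("viewdoc", [""])[0]
    try:
        doc_id = int(raw_id)
    except ValueError:
        raise AccessDenied("document not available")
    # Same error whether the document is missing or forbidden, so the
    # response does not reveal which IDs exist.
    if user not in DOC_ACL.get(doc_id, set()):
        raise AccessDenied("document not available")
    return f"contents of document {doc_id}"
```

With this check in place, changing "210" to "211" in the address bar gets the user an error rather than someone else's document.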
My next concern is about access to the administrative and back-end infrastructure of the search technology, as well as any third-party or bundled data analytics tools and any back-end disk storage. Access to those resources should be limited based on the rule of least privilege. All of that infrastructure must also comply with our configuration management and baseline security configurations.
I also want to make sure that the use of enterprise search is restricted to authenticated domain members. We don't want vendors or guests doing searches for data that they shouldn't see.
Another potential problem is that some search engines use caching to serve up frequently accessed data. I'll need to be sure that any caching technology conforms to our data retention policies and that there aren't any e-discovery issues.
Finally, the search infrastructure will need constant oversight to ensure that no document libraries are added without having accessibility rules assigned to them and that employees don't save documents in existing libraries that allow wider access than the document deserves.
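Part of that oversight could be automated with a periodic audit. The following Python sketch is only an illustration under assumed data: the library names, the `acl` and `docs` fields, the notion of "broad" groups and the list of sensitive terms are all hypothetical, and matching on document titles is a crude stand-in for real content classification.

```python
# Hypothetical inventory of indexed document libraries.
LIBRARIES = {
    "finance-deals": {"acl": {"finance"}, "docs": ["M&A target shortlist"]},
    "hr-scratch": {"acl": set(), "docs": ["Onboarding checklist"]},
    "company-shared": {"acl": {"all-staff"},
                       "docs": ["Cafeteria menu", "Employee salary list"]},
}

BROAD_GROUPS = {"all-staff"}        # groups considered "wide access"
SENSITIVE_TERMS = ("m&a", "salary")  # illustrative watch list only


def audit(libraries, broad_groups, sensitive_terms):
    """Flag libraries with no access rules and sensitive titles in broad libraries."""
    findings = []
    for name, lib in sorted(libraries.items()):
        acl = lib.get("acl") or set()
        if not acl:
            findings.append((name, None, "no access rules assigned"))
            continue
        if acl & broad_groups:
            for title in lib["docs"]:
                hits = [t for t in sensitive_terms if t in title.lower()]
                if hits:
                    findings.append(
                        (name, title,
                         f"sensitive term {hits[0]!r} in widely readable library"))
    return findings


if __name__ == "__main__":
    for finding in audit(LIBRARIES, BROAD_GROUPS, SENSITIVE_TERMS):
        print(finding)
```

A report like this does not replace human review, but it gives the people responsible for the index a short list of places to look first.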
Enterprise search is like much else in the enterprise: very powerful and extremely useful, but risky and in need of constant attention.
Mathias Thurman is the pseudonym of a real IT security manager. He can be contacted at firstname.lastname@example.org.