Over the last decade, Western societies have begun to generate vast quantities of data. Examples include:
1) Sales statistics
2) Medical records
3) Web traffic data
4) Media usage data (scrobbling, play counts, etc.)
5) User-generated data (Facebook pages, text messages, emails, etc.)
6) Online gaming data (Second Life, online board game archives, etc.)
We have also begun to mine this data for useful patterns.
I see two major problems with the way in which we are doing this.
A) The privacy of this data is often undermined by the careless manner in which it is stored. Whilst the data has great economic value (mostly to companies and governments, but also to the consumer/citizen), it can also be used for new and worrying kinds of abuse.
B) We depend too much on people to analyze this data. In my opinion, the main limiting factor in gaining economic advantage from this data is our ability to process it.
I predict two trends over the next decade with regard to data mining.
Firstly, there will be significant abuses of people's personal data, resulting in a backlash against the mass storage of such data. This will cause more data to be stored in a distributed or properly encrypted form. Computation on this data may be performed in a distributed manner too, with the programs that perform the computations more likely to be under the control of the users.
Secondly, simple artificial intelligence agents (not much more intelligent than insects) will proliferate as the cost of the hardware needed to run them approaches zero. These agents will be called upon to dramatically reduce the burden of decision making for users.
Various developments have been hampered in the past by the unwillingness of people to make large numbers of trivial choices. Some examples include:
1) Highly efficient payment structures (for text, music and video) which require large numbers of individual payment choices.
2) Consumer boycotts that take account of the complexities of the supply chain and company financing.
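To make the first example concrete, here is a minimal sketch of the kind of very simple agent I have in mind: one that makes small payment choices on the user's behalf and only asks when a choice falls outside the rules it has been given. Everything in it is an assumption made for illustration (the PaymentAgent name, the two thresholds, the use of whole cents); it is not a description of any real payment system.

```python
from dataclasses import dataclass


@dataclass
class PaymentAgent:
    """Hypothetical agent that approves small payments for its user."""
    max_price_cents: int      # largest single price approved without asking
    daily_budget_cents: int   # most the agent may spend in one day
    spent_cents: int = 0      # running total for the current day

    def decide(self, price_cents: int) -> str:
        """Return 'pay' if the rules allow it, otherwise 'ask' the user."""
        within_price = price_cents <= self.max_price_cents
        within_budget = self.spent_cents + price_cents <= self.daily_budget_cents
        if within_price and within_budget:
            self.spent_cents += price_cents
            return "pay"
        return "ask"


# Illustrative values only: approve anything up to 5 cents, 10 cents per day.
agent = PaymentAgent(max_price_cents=5, daily_budget_cents=10)
decisions = [agent.decide(price) for price in [2, 5, 50, 4]]
print(decisions)
```

In this sketch the user makes two decisions once (the thresholds) rather than one decision per article, song or video, and is only interrupted for the unusual cases.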
So I think that we'll deal with the flood of information by finding better ways to control the movement of that information and by finding better ways to utilize the value in that information.