As many are aware, twenty-first-century corporations are facing a crisis. Many have been accurately and comprehensively storing data for years, yet they are ill-equipped to do anything with this massive collection. Big data is so comprehensive that it has become unwieldy, and the average corporation is unable to retrieve or organize it in any useful way.
The fundamental difference between structured and unstructured data, as you might expect, is that structured data is organized in a highly mechanized and manageable way. Structured data is ready for seamless integration into a database or a well-structured file format such as XML. Unstructured data, by contrast, is raw and unorganized, and digging through it can be cumbersome and costly. Email is a good example: a message is indexed by date, time, sender, recipient, and subject, but its body remains unstructured. Other examples of unstructured data include books, documents, medical records, and social media posts.
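To make the email example concrete, here is a minimal sketch using Python's standard `email` module. The sample message, the `split_email` helper, and the field names are all hypothetical and chosen purely for illustration. The headers drop neatly into a record that could become a database row, while the body comes back as a single block of free text.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime

# A made-up message, used only for illustration.
RAW_EMAIL = """\
From: alice@example.com
To: bob@example.com
Subject: Q3 budget
Date: Mon, 02 Sep 2024 09:15:00 +0000

Hi Bob, the Q3 numbers look fine. Also, are we still on for lunch Friday?
"""


def split_email(raw):
    """Separate the structured headers of an email from its unstructured body."""
    msg = message_from_string(raw)

    # Structured part: a fixed set of simple, well-defined fields.
    structured = {
        "sender": msg["From"],
        "recipient": msg["To"],
        "subject": msg["Subject"],
        "sent_at": parsedate_to_datetime(msg["Date"]),
    }

    # Unstructured part: free text with no schema. This message is not
    # multipart, so get_payload() returns the body as a plain string.
    body = msg.get_payload()
    return structured, body


structured, body = split_email(RAW_EMAIL)
print(structured["subject"], structured["sent_at"].date())
print(body)
```

Note that even this short body wanders from the budget to lunch plans, so no single field in the structured record could capture what it says.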
In an ideal world, a company's goal would be to turn all of the big data it has amassed into structured data. However, the cost and time this would require are prohibitive. Email is almost unstructurable, for example, because humans almost never stick to a single subject even in narrowly focused messages. Multiplied across years of accumulated messages and documents, the challenge of unstructured data is one of volume.
This is not to say that structured data is entirely unproblematic. By its very nature, structured data needs to remain relatively simple and uncomplicated. A data point can only be called structured if it is simple, categorized, and entirely finite, which might suggest to readers that unstructured data is by definition the more interesting data, and therefore the data more worth archiving.
Knowledge and training can help overcome the obstacle of unstructured data, but these are also costly, and users continually demand more intuitive tools. Thankfully, several techniques now make it possible to identify patterns within unstructured data. Tools such as Oracle Endeca Information Discovery are breaking down the walls between structured and unstructured data so that both can be analyzed in parallel, maximizing the analytical power of an organization.
The crisis of big data can be conquered, and more effective archival solutions are beginning to emerge.