Over on Coding Horror, Jeff Atwood wrote a post which claimed that “All Programming is Web Programming”. Jeff made some good points about how the web gives programmers the ability to reach an audience of a previously unimaginable size. Definitely agree. He also mentioned that, for better or for worse, JavaScript is becoming the most important language in the world of software development. I agree with this as well, though I would add that at this point the significance of JavaScript is limited to user-facing applications.
It was unfortunate that the post was written in such a polarizing manner and that the comment thread quickly devolved into a shouting match, because there is another important point to be made here.
Something else I want to put out there to developers is that the evolution of programming should be focused less on desktop vs. web vs. embedded or on choice of language/platform, and more on the fact that lowering the barriers to entry for new programmers is a positive and not a negative.
To someone writing device drivers or kernel patches the idea of writing a JavaScript function to manipulate the DOM may seem “uninteresting”, but the fact is that more and more people are getting started with programming this way. All they need is a text editor and a web browser and they are on their way. This is a very good thing.
Programming is about automation, and automation is about improving efficiency. The more people we can involve in this process the better, because in the end the web provides not only the largest number of potential users but also the largest number of potential programmers. The exciting thing is we are just getting started.
Martin Willcox from Teradata wrote a couple of blog posts outlining the reasons why he feels the phrase “unstructured data” is marketing jargon and that “nontraditional data” is more appropriate.
Let me start by saying that the examples Martin uses in the first post would be technically accurate if we were all disk manufacturers. Whether binary (audio, video) or text (email, HTML), it’s true that all of these file types use a structured format when being processed by a computer. That being said, we are not all disk manufacturers.
As a data architect I’ve always felt the true spirit of the phrase “unstructured data” corresponds to the modeling and analysis of the data. If you have a collection of objects in an email, an image, or a web page… then these things are unstructured. They tell you nothing without the context of a structured model.
If this were simply a preference in terminology then I wouldn’t think too much of it. But when a relational database vendor claims that “nontraditional” (unstructured) data is easily converted to “traditional” data by running fact/entity extraction routines and loading a table, it makes me stop and question the true intent of the original message. It’s not as simple as pushing a button, and an RDBMS is most often not your best option. This isn’t something which should be glossed over.
The problem is that when using a relational database schema, the relationships, attributes, and quantities must be defined before running any extraction routines. That’s OK when running against a fixed set of data looking for a known set of attributes/measures, but when you are mining millions of images or billions of web pages, all of the edges don’t start to show up until you actually start to extract and analyze the data. In this situation a relational database actually makes it harder to consume unstructured data due to the high cost associated with schema changes.
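To make the schema point a little more concrete, here is a rough sketch in Python. Everything in it is an assumption for illustration: the `extracted` records stand in for the output of some entity extraction pass, and the `page` table with its `url` and `title` columns is a hypothetical schema chosen before any extraction ran. It is not how any particular vendor's tooling works, just a small picture of attributes showing up that the table never planned for.

```python
import json
import sqlite3

# Hypothetical output of an entity extraction pass over three web pages.
# The attribute names are made up for illustration.
extracted = [
    {"url": "a.example", "title": "Home"},
    {"url": "b.example", "title": "Shop", "price": 9.99},
    {"url": "c.example", "title": "Blog", "author": "pat", "tags": ["data"]},
]


def load_fixed_schema(records):
    """Load records into a table whose columns were fixed up front.

    Returns (rows loaded, attributes that had no column to go into).
    """
    conn = sqlite3.connect(":memory:")
    # The schema has to exist before the first record is seen.
    conn.execute("CREATE TABLE page (url TEXT, title TEXT)")
    known = {"url", "title"}
    unplanned = set()
    for rec in records:
        # Anything outside the known columns is dropped unless the
        # schema is changed first.
        unplanned |= set(rec) - known
        conn.execute(
            "INSERT INTO page (url, title) VALUES (?, ?)",
            (rec["url"], rec["title"]),
        )
    count = conn.execute("SELECT COUNT(*) FROM page").fetchone()[0]
    conn.close()
    return count, sorted(unplanned)


def load_schemaless(records):
    """Keep each record whole and discover its attributes afterwards."""
    store = [json.dumps(rec) for rec in records]
    discovered = set()
    for doc in store:
        discovered |= set(json.loads(doc))
    return len(store), sorted(discovered)


if __name__ == "__main__":
    print(load_fixed_schema(extracted))
    print(load_schemaless(extracted))
```

With three records the fixed table already misses three attributes. Each one would mean a schema change and a reload before it could be queried, and that cost is what grows as the data set gets larger and more diverse.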
To me the term unstructured makes sense… it’s simply the inverse of structured. Data without a model, if you will. And remember, the larger and more diverse the data set, the less you will know about its characteristics ahead of time.