<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Connecting Things - Ross Bates &#187; Databases</title>
	<atom:link href="http://www.rossbates.com/category/databases/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.rossbates.com</link>
	<description></description>
	<lastBuildDate>Thu, 17 Dec 2009 21:05:32 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.6</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>MySQL Upgrade Issue</title>
		<link>http://www.rossbates.com/2009/10/mysql-upgrade-issue/</link>
		<comments>http://www.rossbates.com/2009/10/mysql-upgrade-issue/#comments</comments>
		<pubDate>Tue, 13 Oct 2009 17:20:59 +0000</pubDate>
		<dc:creator>Ross</dc:creator>
				<category><![CDATA[Databases]]></category>
		<category><![CDATA[Misc]]></category>
		<category><![CDATA[mysql]]></category>

		<guid isPermaLink="false">http://www.rossbates.com/?p=300</guid>
		<description><![CDATA[I just spent more time than I should have troubleshooting why the upgrade of MySQL from 5.0 to 5.1 on a Debian box resulted in a MySQL instance that wouldn&#8217;t start. Not a lot out there on this so hopefully this will save someone a bit of time in the future.
When upgrading from 5.0 to [...]]]></description>
			<content:encoded><![CDATA[<p>I just spent more time than I should have troubleshooting why the upgrade of MySQL from 5.0 to 5.1 on a Debian box resulted in a MySQL instance that wouldn&#8217;t start. Not a lot out there on this so hopefully this will save someone a bit of time in the future.</p>
<p>When upgrading from 5.0 to 5.1 using apt, everything will install normally. Then, when the MySQL service tries to restart, you&#8217;ll see an init.d error and an error that looks something like this:</p>
<p style="padding-left: 30px;"><em><strong>Errors were encountered while processing:<br />
 mysql-server-5.1<br />
 mysql-server</strong></em></p>
<p>Not a lot to go on here, but as it turns out there is a deprecated entry in the my.cnf file called <strong>skip-bdb</strong>. Comment this line out and you should be good to go.</p>
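<p>If you would rather script the change, here is a minimal sketch in Python, assuming the option sits on its own line in my.cnf. The function name <code>comment_out_option</code> and the sample config text are hypothetical, made up for illustration. Try it on a copy of your config before touching the real file.</p>

```python
def comment_out_option(config_text, option="skip-bdb"):
    """Return config_text with every active `option` line commented out."""
    fixed_lines = []
    for line in config_text.splitlines():
        # Match the bare option only; lines already starting with '#' are left alone.
        if line.strip() == option:
            fixed_lines.append("#" + line)
        else:
            fixed_lines.append(line)
    return "\n".join(fixed_lines) + "\n"


# Hypothetical excerpt of a my.cnf file, for illustration only.
sample = "[mysqld]\nuser = mysql\nskip-bdb\n"
print(comment_out_option(sample))
```

<p>The function only works on text, so reading and writing the actual file (and restarting MySQL afterwards) is left to you.</p>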
]]></content:encoded>
			<wfw:commentRss>http://www.rossbates.com/2009/10/mysql-upgrade-issue/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Structuring the Unstructured</title>
		<link>http://www.rossbates.com/2009/08/structuring-the-unstructured/</link>
		<comments>http://www.rossbates.com/2009/08/structuring-the-unstructured/#comments</comments>
		<pubDate>Thu, 06 Aug 2009 15:28:22 +0000</pubDate>
		<dc:creator>Ross</dc:creator>
				<category><![CDATA[Analytics]]></category>
		<category><![CDATA[Databases]]></category>

		<guid isPermaLink="false">http://www.rossbates.com/?p=281</guid>
		<description><![CDATA[Martin Willcox from Teradata wrote a couple of blog posts outlining the reasons why he feels the phrase &#8220;unstructured data&#8221; is marketing jargon and that &#8220;nontraditional data&#8221; is more appropriate.
Let me start by saying that the examples Martin uses in the first post are technically accurate if we were all disk manufacturers. Whether bitmap (audio, [...]]]></description>
			<content:encoded><![CDATA[<p>Martin Willcox from Teradata wrote a couple of <a href="http://www.teradata.com/t/blogs/emea/its_data_Jim_but_not_as_we_know_it/">blog</a> <a href="http://www.teradata.com/t/blogs/emea/Its-data-jim-but-not-as-we-know-it-Part2/">posts</a> outlining the reasons why he feels the phrase &#8220;unstructured data&#8221; is marketing jargon and that &#8220;nontraditional data&#8221; is more appropriate.</p>
<p>Let me start by saying that the examples Martin uses in the first post are technically accurate if we were all disk manufacturers. Whether bitmap (audio, video) or text (email, html), it&#8217;s true all of these file types use a structured format when being processed by a computer. That being said, we are not all disk manufacturers.</p>
<p>As a data architect I&#8217;ve always felt the true spirit of the phrase &#8220;unstructured data&#8221; corresponds to the modeling and analysis of the data. If you have a collection of objects in an email, an image, or a web page&#8230; then these things are unstructured. They tell you nothing without the context of a structured model.</p>
<p>If this were simply a preference in terminology, then I wouldn&#8217;t think too much of it. But when a relational database vendor claims that &#8220;nontraditional&#8221; (unstructured) data is easily converted to &#8220;traditional&#8221; data by running fact/entity extraction routines and loading a table, it makes me stop and question the true intent of the original message. It&#8217;s not as simple as pushing a button, and an RDBMS is most often not your best option. This isn&#8217;t something which should be glossed over.</p>
<p>The problem is that when using a relational database schema, the relationships, attributes, and quantities must be defined before running any extraction routines. That&#8217;s OK when running against a fixed set of data looking for a known set of attributes/measures &#8211; but when you are mining millions of images or billions of web pages, many of the edges don&#8217;t show up until you actually start to extract and analyze the data. In this situation a relational database actually makes it harder to consume unstructured data due to the high cost associated with schema changes.</p>
<p>To me the term unstructured makes sense&#8230; it&#8217;s simply the inverse of structured. Data without a model, if you will. And remember, the larger and more diverse the data set, the less you will know about its characteristics ahead of time.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.rossbates.com/2009/08/structuring-the-unstructured/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Getting Into Amazon EC2</title>
		<link>http://www.rossbates.com/2009/07/getting-into-amazon-ec2/</link>
		<comments>http://www.rossbates.com/2009/07/getting-into-amazon-ec2/#comments</comments>
		<pubDate>Mon, 20 Jul 2009 03:45:47 +0000</pubDate>
		<dc:creator>Ross</dc:creator>
				<category><![CDATA[Cloud Computing]]></category>
		<category><![CDATA[Databases]]></category>
		<category><![CDATA[aws]]></category>

		<guid isPermaLink="false">http://www.rossbates.com/?p=262</guid>
		<description><![CDATA[I spent some time this weekend diving deeper into Amazon&#8217;s EC2 and all of the associated services. I&#8217;ve read about EC2, discussed it with colleagues, I pretty much thought I knew what it was all about&#8230;.. virtual hosting right? Yeah, I was wrong. It was going through the process of setting up an instance and [...]]]></description>
			<content:encoded><![CDATA[<p>I spent some time this weekend diving deeper into Amazon&#8217;s <a href="http://aws.amazon.com/ec2/">EC2</a> and all of the associated services. I&#8217;ve read about EC2 and discussed it with colleagues, and I pretty much thought I knew what it was all about&#8230; virtual hosting, right? Yeah, I was wrong. It was going through the complete process of setting up an instance and configuring all the network and storage services that changed my perspective. EC2 is really, really cool.</p>
<p>What is really rocking my world is the whole concept of throw-away servers. The idea that a discrete process can spin up a new server that gets built at run time, does some work, then disappears is amazing. I see this as turning the whole concept of linear scale on its head. You don&#8217;t scale an app, you scale individual threads. Powerful stuff, especially when dealing with data mining and event processing.</p>
<p>Much more coming soon&#8230;..</p>
]]></content:encoded>
			<wfw:commentRss>http://www.rossbates.com/2009/07/getting-into-amazon-ec2/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>ORM, RDF, and Jon Postel</title>
		<link>http://www.rossbates.com/2009/07/orm-rdf-and-postel/</link>
		<comments>http://www.rossbates.com/2009/07/orm-rdf-and-postel/#comments</comments>
		<pubDate>Mon, 06 Jul 2009 19:57:45 +0000</pubDate>
		<dc:creator>Ross</dc:creator>
				<category><![CDATA[Collaboration]]></category>
		<category><![CDATA[Databases]]></category>
		<category><![CDATA[Linked Data]]></category>

		<guid isPermaLink="false">http://www.rossbates.com/?p=249</guid>
		<description><![CDATA[Had a great conversation with @cks earlier about the dichotomy between ORM and RDF/OWL when modeling enterprise data. His position was that with a pure ORM model you are more likely to have consistent data throughout your applications because the rules &#38; constraints have been laid out before the user ever touches the keyboard. Where he [...]]]></description>
			<content:encoded><![CDATA[<p>Had a great conversation with <a href="http://twitter.com/cks">@cks</a> earlier about the dichotomy between ORM and RDF/OWL when modeling enterprise data. His position was that with a pure ORM model you are more likely to have consistent data throughout your applications because the rules &amp; constraints have been laid out before the user ever touches the keyboard. Where he felt RDF/OWL was at odds with the ORM model is that by giving people the ability to create relationships that trigger additional inferences, you must trust them to understand the implications of their actions.</p>
<p>A simple example&#8230;.</p>
<pre>    :trees :madeOf :wood</pre>
<pre>    :trees :need :water</pre>
<p>Now someone comes along and creates this:</p>
<pre>    :table :madeOf :wood</pre>
<p>At this point it&#8217;s incumbent upon all dependent applications to understand that &#8220;:table&#8221; does not in fact need water. Again, the key is that every application must be aware of and apply the same rules, or data integrity suffers. It&#8217;s not that traditional ORM and RDF/OWL can&#8217;t coexist; in some companies it may be an integrated process. Where @cks was concerned is that the inferences inherent to RDF/OWL introduce issues with consistency and integration because it&#8217;s so easy for new rules to simply pop up.</p>
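<p>The failure mode can be sketched in a few lines of Python. This is a toy, not real RDF/OWL tooling: the triples are plain tuples, and the rule that anything made of wood needs water is a hypothetical, over-eager inference made up here to show how one new statement changes what a downstream application concludes.</p>

```python
def infer_needs_water(triples):
    """Naive rule: anything made of wood is assumed to need water."""
    inferred = set(triples)
    for subject, predicate, obj in triples:
        if predicate == ":madeOf" and obj == ":wood":
            inferred.add((subject, ":need", ":water"))
    return inferred


facts = {
    (":trees", ":madeOf", ":wood"),
    (":trees", ":need", ":water"),
}
# Someone adds a perfectly reasonable statement about tables...
facts.add((":table", ":madeOf", ":wood"))

# ...and the naive rule now claims that tables need water.
print((":table", ":need", ":water") in infer_needs_water(facts))  # True
```

<p>Nobody entered bad data here. The surprise comes entirely from a rule that one application applies and another may not.</p>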
<p>I agree with everything up to this point, but where I would argue we need to be headed with enterprise apps is a hybrid model that blends the consistency &amp; predictability of ORM, with the freedom of RDF/OWL.</p>
<p>First let&#8217;s quickly take a moment to talk about freedom. When I hear a programmer use the phrase &#8220;never trust the user&#8221; I scratch my head. Sure, you should sanitize application input for the sake of security, but let&#8217;s be realistic about it: business users do not intentionally inject crap into the system. They use software to get things done. Humans will make mistakes, but so do the software applications that were written by&#8230; well, humans.</p>
<p><strong><em>The user is the most important component of software development. If that sounds obvious, then why don&#8217;t we trust them more? </em></strong>My hope is that developers begin putting more trust in the user with a focus on creating software that learns from the users instead of limiting them.</p>
<p>So back to the hybrid model. I picture RDF/OWL as the essential meta layer above the ORM. By abstracting it with interfaces that are usable by non-techies, it becomes an engine for collecting knowledge about the relationships and attributes of the business across all dimensions. We shouldn&#8217;t be concerned with modeling absolute and irrefutable truths, because tomorrow there will be an exception. That&#8217;s the problem with strict models in the enterprise: there will always be exceptions. Next, the ORM layer follows on as an application-specific module where you can extract pieces of the meta layer to digest, analyze, and make use of the data at a domain level.</p>
<p>It&#8217;s about putting a higher priority on the collection of information than on enforcing rules on the information. The principle reminds me of the great quote from <a href="http://en.wikipedia.org/wiki/Jon_Postel">Jon Postel</a>:</p>
<p style="padding-left: 30px;"><em>&#8220;be conservative in what you do, be liberal in what you accept from others&#8221;</em></p>
<p>Postel is of course referring to the <a href="http://tools.ietf.org/html/rfc793">Transmission Control Protocol</a>, a language that computers use to speak to each other over the internet. To me, however, these words have a more universal meaning in the world of software development, which I&#8217;d categorize like this:</p>
<ul>
<li><em>Listen more</em></li>
<li><em>Talk less</em></li>
<li><em>Prepare for exceptions</em></li>
<li><em>Trust until you are given a reason not to</em></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.rossbates.com/2009/07/orm-rdf-and-postel/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Data Migration for CouchDB</title>
		<link>http://www.rossbates.com/2009/07/data-migration-for-couchdb/</link>
		<comments>http://www.rossbates.com/2009/07/data-migration-for-couchdb/#comments</comments>
		<pubDate>Thu, 02 Jul 2009 03:30:54 +0000</pubDate>
		<dc:creator>Ross</dc:creator>
				<category><![CDATA[Databases]]></category>
		<category><![CDATA[couchdb]]></category>
		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://www.rossbates.com/blog/?p=170</guid>
		<description><![CDATA[Something that's currently missing from CouchDB is a way to import/export documents. This feature will likely make it into the core functionality of CouchDB in due time, but say you need a way to get your data out of CouchDB... like today. Well here's how you can do it.]]></description>
			<content:encoded><![CDATA[<p>Something that&#8217;s currently missing from <a href="http://couchdb.apache.org/">CouchDB</a> is a way to import/export documents. This feature may be added to CouchDB one day, but say you need a way to get your data out of CouchDB&#8230; like right now. Well here&#8217;s how you can do it.</p>
<p>Before getting started, one quick side note about dealing with CouchDB data files. When you create a new database, a corresponding {db}.couch file is created that is your actual &#8220;database&#8221;. It&#8217;s usually in /var/lib/couchdb, but if not, check DbRootDir in your /etc/couchdb/couch.ini for the location (update: for 0.9.0 it&#8217;s now database_dir in /etc/couchdb/default.ini).</p>
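<p>If you want to look the setting up from a script, here is a minimal sketch using Python 3&#8217;s standard configparser module. The sample ini text is a hypothetical excerpt made up for illustration, and the <code>[couchdb]</code> section name is an assumption about how your default.ini is laid out, so check it against your own file.</p>

```python
import configparser

# Hypothetical excerpt of /etc/couchdb/default.ini, for illustration only.
SAMPLE_INI = """
[couchdb]
database_dir = /var/lib/couchdb
view_index_dir = /var/lib/couchdb
"""


def find_database_dir(ini_text):
    """Return the database_dir setting from CouchDB-style ini text."""
    # Interpolation is off so a literal '%' in a value is not treated specially.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(ini_text)
    return parser.get("couchdb", "database_dir")


print(find_database_dir(SAMPLE_INI))  # /var/lib/couchdb
```

<p>To run it against the real file, read the file contents into a string and pass that in.</p>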
<p>Under normal circumstances you can take hot backups of these files at any time using rsync, cp, etc&#8230; it&#8217;s simply a file. The thing that got me stuck was when CouchDB went from 0.8.0 to 0.9.0 and the internal file format changed. The result was that the data needed to be moved programmatically across databases using raw JSON.</p>
<p>If you search the CouchDB mailing lists for how to get your data migrated you&#8217;ll likely come across references to the <a href="http://code.google.com/p/couchdb-python/">couchdb-python</a> utilities. Dig more and you&#8217;ll see references to the tools/dump.py and tools/load.py scripts. That&#8217;s about where the trail ended for me, but after some hacking around I&#8217;ve successfully moved data from 0.8.0 to 0.9.0. As an added bonus I was able to get my hands dirty with the couchdb-python library which has been fantastic so far.</p>
<p>One more side note, this time about couchdb-python. If you are new to CouchDB I would still recommend starting with Futon, Views, and the REST API before you move to a client library (Python or other). It will help you conceptualize how CouchDB is way more than a massive hash table or fancy object store.</p>
<p>So to the task at hand&#8230;. Assuming you have Python 2.4 or later you&#8217;ll need to install 3 things.</p>
<p style="padding-left: 30px;"><a href="http://code.google.com/p/httplib2/">httplib2</a> &#8211; This is a Python HTTP lib, I was able to install it via apt-get on Debian.  There are packages <a href="http://code.google.com/p/httplib2/wiki/Install">available</a> for other distros.</p>
<p style="padding-left: 30px;"><a href="http://pypi.python.org/pypi/simplejson">simplejson</a> &#8211; Python egg for JSON manipulation.</p>
<p style="padding-left: 30px;"><a href="http://pypi.python.org/pypi/CouchDB">couchdb-python</a> &#8211; Python egg for CouchDB.</p>
<p>I was able to install the egg files using Python&#8217;s easy_install.</p>
<p>The next step is to grab tools/dump.py and tools/load.py from the CouchDB egg file. To do this you need to unzip the CouchDB .egg that&#8217;s in site-packages and extract the files to a directory of your choice. This seems like a strange method, but it works. Someone let me know if I&#8217;m missing an easier way.</p>
<p>To begin the database dump run dump.py and pass the full URL to the database you are exporting. Make sure to redirect output in order to capture the JSON.</p>
<p style="padding-left: 30px;">./dump.py http://source-couchdb:5984/msg_db &gt; msg_db.json</p>
<p>Once your export completes copy the .json file and the load.py to the same directory and run the following command to import the file to your target database.</p>
<p style="padding-left: 30px;">./load.py --input=msg_db.json http://target-couchdb:5984/msg_db</p>
<p>Make sure you create the target database before you run the script or it will fail. You&#8217;ll know everything is working if you see a series of statements that looks like this:</p>
<p style="padding-left: 30px;">Loading document &#8216;bda90174c1a41bad2289bfc5829008ce&#8217;<br />
Loading document &#8216;e45d7c2850610a01658234eeddde1fde&#8217;<br />
Loading document &#8216;e856071c791cd677eafbce85bb1509de&#8217;</p>
<p>After it completes, you can fire up Futon and you&#8217;ll see all your precious data has been loaded into your new instance of CouchDB. Victory!</p>
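<p>For the curious, the &#8220;raw JSON&#8221; idea can also be sketched without the helper scripts. This is not how dump.py and load.py work internally. It is a minimal Python sketch that assumes you already have your documents as Python dicts and want a request body for the target database&#8217;s _bulk_docs endpoint. The function name <code>build_bulk_payload</code> and the sample document are hypothetical, and attachments are ignored entirely.</p>

```python
import json


def build_bulk_payload(docs):
    """Build a JSON body for POSTing to a target database's _bulk_docs."""
    cleaned = []
    for doc in docs:
        # Drop the source revision so the target treats each doc as new.
        cleaned.append({k: v for k, v in doc.items() if k != "_rev"})
    return json.dumps({"docs": cleaned}, sort_keys=True)


# Hypothetical documents, shaped like what a source database returns.
source_docs = [
    {"_id": "bda90174c1a41bad2289bfc5829008ce", "_rev": "1234", "msg": "hello"},
]
print(build_bulk_payload(source_docs))
```

<p>The _id is kept so documents land under the same identifiers in the new instance. Actually sending the request is left out on purpose, since that part depends on your setup.</p>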
]]></content:encoded>
			<wfw:commentRss>http://www.rossbates.com/2009/07/data-migration-for-couchdb/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
	</channel>
</rss>
