Man, these guys are smart. http://www.slideshare.net/kevinweil/nosql-at-twitter-nosql-eu-2010
Talks about Twitter's many different approaches to storage (Hadoop, Cassandra, FlockDB) and the tools used to query that data to answer questions (Pig).
Watched this presentation about how SourceForge chose MongoDB for their customer-facing webapp. You know, the one you visit to download Azureus and all those other open source apps.
SourceForge chose Mongo because it offered them high read performance. Write performance was poor, but their app didn't need it. They learnt that even though you can put everything in one document, you don't need to retrieve it all at once, and that pulling whole documents easily used up all the network bandwidth between the web and Mongo servers. Additionally, they found they didn't need memcache between their persistence layer and the webservers, as Mongo was fast enough to serve the data as it was.
Much like Neo4J, which I blogged about recently, there is also CouchDB from Apache. CouchDB is a document database rather than a graph database like Neo4J, but what exactly that means I'm not sure yet; their purposes seem to overlap a fair bit. Both aim to remove the need to map objects to a relational schema, and both claim to support mapping OO code directly to a DB entity. Additionally, both have REST APIs.
At this early stage, I would say that the graph DB (Neo4J) provides data about the relationships between nodes, whereas the document DB (CouchDB) is a collection of documents, each with an ID and named fields. OK, that is not much different from Neo4J, other than that in CouchDB the relationship type, direction and properties aren't specified, and each record must have a unique ID.
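To make that distinction concrete, here is a toy in-memory sketch of the document side. It is not CouchDB's API: the `DocumentStore` class and the sample data are hypothetical, and only the `_id` field name mirrors CouchDB's convention. It models the two points above: every record needs a unique ID and holds named fields, and any "link" to another record is just a field value whose type and direction the store knows nothing about.

```python
class DocumentStore:
    """Toy document store (hypothetical, not CouchDB's API): each document needs a unique _id."""

    def __init__(self):
        self._docs = {}

    def put(self, doc):
        doc_id = doc["_id"]
        if doc_id in self._docs:
            # Every record must have a unique ID, so reject duplicates.
            raise KeyError(f"duplicate id: {doc_id}")
        self._docs[doc_id] = dict(doc)  # store a copy of the named fields

    def get(self, doc_id):
        return dict(self._docs[doc_id])


store = DocumentStore()
store.put({"_id": "alice", "type": "person", "name": "Alice"})
# The reference to "alice" is just a field value: the store has no notion
# of the relationship's type, direction or properties.
store.put({"_id": "post-1", "type": "post", "title": "Hello", "author": "alice"})

post = store.get("post-1")
author = store.get(post["author"])  # following the "link" is up to the application
print(author["name"])  # Alice
```

In a graph database the `author` link would instead be a first-class relationship with its own type, direction and properties.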
Unlike Neo4J, CouchDB is licensed under the Apache License v2, so it will probably be easier for a closed-source project to adopt.
More info for CouchDB is available in their Intro and Overview pages on their site.
The recent Berlin Buzzwords conference got mentions for interesting CouchDB and Neo4J talks. Many Twitter peers who attended tweeted that the presentations were interesting, though the slides aren't up yet; the intent is for slides and videos to be posted shortly. The conference also inspired an InfoQ article suggesting that Couch could be used as a personal database, a role for which SQLite previously got top honours.
If this is true, then CouchDB sounds like a suitable small database for development use that can scale up quite easily to other uses. The most important thing for me right now is that there is a Grails plugin that lets me try it out. 🙂
A great article on the NoSQL movement, focusing on graph databases and the Neo4J implementation, appeared on InfoQ.
Graph databases store nodes (aka vertices), which have properties (aka attributes) and are linked to other nodes. The relationship (aka edge) between two nodes also has properties, as well as a direction and a type.
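The model just described can be sketched as a few Python classes. This is a toy in-memory illustration, not Neo4J's API; the names `Node`, `Relationship`, `Graph`, `relate` and `outgoing`, and the sample data, are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A vertex carrying arbitrary properties (attributes)."""
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    """A directed, typed edge that carries its own properties."""
    start: Node
    end: Node
    rel_type: str
    properties: dict = field(default_factory=dict)


class Graph:
    def __init__(self):
        self.nodes = []
        self.relationships = []

    def add_node(self, **properties):
        node = Node(properties)
        self.nodes.append(node)
        return node

    def relate(self, start, end, rel_type, **properties):
        rel = Relationship(start, end, rel_type, properties)
        self.relationships.append(rel)
        return rel

    def outgoing(self, node, rel_type):
        """Nodes reachable from `node` along edges of `rel_type`, respecting direction."""
        return [r.end for r in self.relationships
                if r.start is node and r.rel_type == rel_type]


g = Graph()
alice = g.add_node(name="Alice")
bob = g.add_node(name="Bob")
g.relate(alice, bob, "KNOWS", since=2008)  # the edge itself has a property

print([n.properties["name"] for n in g.outgoing(alice, "KNOWS")])  # ['Bob']
print(g.outgoing(bob, "KNOWS"))  # [] because the edge points from Alice to Bob
```

Keeping the type, direction and properties on the `Relationship` object, rather than on either node, is what separates this model from the document sketch earlier: the link is data in its own right.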
The article demonstrates that, syntactically, these databases look easy to manipulate. In Ruby (and presumably Grails) they map directly to classes, and metaclasses and operator overloading allow the domain objects to be manipulated quickly in a neatly expressive way. For Groovy/Grails/Griffon, there is info on the Neo4j wiki.
Because OO models are graphs, the need for an ORM layer is removed: the data maps directly to the classes we are trying to store. Additionally, upgrading the data in a class is less painful, since adding a new column is easier. More benefits are listed on the Neo4J website.
Neo4J is free for open source projects, and there is an unsupported, free, single-instance Basic edition for startups; everyone else pays. There are 3 commercial editions, Basic, Advanced and Enterprise, priced at $588, $5,988 and $23,998 per annum (billed as a monthly subscription). The free Basic edition is a good move, as it targets small companies who might otherwise choose a free alternative such as MySQL, Postgres or even Oracle Express Edition. For the Basic and Advanced editions, you could argue that the development time saved would more than cover the fees.
The Basic edition lacks monitoring, which in my opinion should be included, especially as you only get 2 support incidents per year with that package. The Advanced edition adds monitoring, a management console, hot backup and failover to a 2nd running node. It differs from the high-end edition only in lacking high availability features and in having longer support response times. If you were using it for a production enterprise system, you'd probably want the 24/7 support. I think this pricing is comparable to a support subscription from any of the larger RDBMS vendors, with no upfront capex, though keep in mind that at the enterprise level you'd be paying for multiple running nodes, so costs may add up quickly. There is always room for negotiation in this space, though.
This great article on CodeProject talks about the different sorts of joins in SQL. Sounds pretty easy, right? But there are some I don't remember learning in school: the Left and Right Excluding joins, and the Full Excluding join. Before that, I'd been doing SELECT … MINUS SELECT.
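Here is a small runnable sketch of the excluding joins using Python's built-in `sqlite3` module. The tables `a` and `b` and their rows are made up for illustration. The right excluding join is written as a LEFT JOIN with the tables swapped, and the full excluding join as a UNION of the two one-sided queries, because older SQLite versions lack RIGHT and FULL OUTER JOIN. On databases that support FULL OUTER JOIN, the same result can typically be written as one FULL OUTER JOIN filtered on either key being NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE b (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO a VALUES (1, 'apple'), (2, 'banana'), (3, 'cherry');
    INSERT INTO b VALUES (2, 'banana'), (3, 'cherry'), (4, 'date');
""")

# Left excluding join: rows in a that have no matching row in b.
left_excluding = conn.execute("""
    SELECT a.id FROM a
    LEFT JOIN b ON a.id = b.id
    WHERE b.id IS NULL
""").fetchall()

# Right excluding join: rows in b with no match in a (LEFT JOIN with tables swapped).
right_excluding = conn.execute("""
    SELECT b.id FROM b
    LEFT JOIN a ON b.id = a.id
    WHERE a.id IS NULL
""").fetchall()

# Full excluding join: unmatched rows from either side.
full_excluding = conn.execute("""
    SELECT a.id FROM a LEFT JOIN b ON a.id = b.id WHERE b.id IS NULL
    UNION
    SELECT b.id FROM b LEFT JOIN a ON b.id = a.id WHERE a.id IS NULL
    ORDER BY 1
""").fetchall()

# The set-difference approach; SQLite spells Oracle's MINUS as EXCEPT.
minus_style = conn.execute("SELECT id FROM a EXCEPT SELECT id FROM b").fetchall()

print(left_excluding)   # [(1,)]
print(right_excluding)  # [(4,)]
print(full_excluding)   # [(1,), (4,)]
print(minus_style)      # [(1,)]
```

One difference worth noting: MINUS/EXCEPT compares whole selected rows, whereas the excluding joins match on the join key only, so the join form lets you return other columns from the unmatched rows.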
For the uninitiated (like me), Microsoft SQL Server has T-SQL (think PL/SQL for Oracle). Of course, being an MS product, it has some VB scripting integration, which is cool. Part of this integration gives you components called SQLMail and SQLAgentMail. These plug into the MAPI mail system on the SQL Server machine and send emails through it.
The downside is that in shared hosting environments you may not have the luxury of setting up a mail profile on the machine, so MAPI may be unavailable. There are a few alternatives though, like writing your own DLL to send mail, or some other means.
Either way, it's good to know that your scheduled reports can send emails out when they finish.
The links below cover a variety of ways to achieve this.