MongoDB

Watched this presentation about how SourceForge chose MongoDB for their customer-facing web app. You know, the site you visit to download Azureus and all those other open source apps ;)

SourceForge chose Mongo because it offered high read performance. Write performance was weaker, but their app didn't need it. They learnt that even though you can put everything in one document, you don't need to retrieve all of it at once, and that pulling whole documents could easily use up all the network bandwidth between the web servers and the Mongo server. They also found they didn't need memcache between their persistence layer and the web servers, because Mongo was fast enough to serve the data directly.

August 22, 2010

Net Neutrality

Recently a colleague sent me a link to an online petition claiming that Net Neutrality is under threat from a recent deal between Verizon and Google. http://www.commoncause.org/site/pp.asp?c=dkLNK1MQIwG&b=1234951

The petition claims the deal signals a prioritisation of internet traffic for the rich: fast lanes for large corporations who can pay for them, as well as the ability for ISPs to block traffic and prevent legitimate uses of the internet.

I found this ‘cause’ to be more fear-mongering than fact. Whilst some points were correct, the practicalities of what was being proposed seemed lost in fear, uncertainty and doubt. So, switching into soapbox mode, here was my reply:

There are some valid concerns here, but I think there is a bit of FUD going on too. They are saying ISPs have these tools to control traffic and shouldn’t be allowed to use them, but I think those same tools are necessary to run a working network for all users.

What got my back up in particular was the bit where they claim that a corporation can buy a fast lane to its websites and supposedly lock smaller sites into slower speeds. I wonder how, technically speaking, they could do this feasibly and economically. It’s always been possible for an ISP to route traffic whichever way it chooses, prioritise some data over other data and shape the speed of traffic, but I can’t see how a router could handle slowing ALL traffic except traffic going to certain addresses in order to create this artificial highway. I don’t think network admins would want the headache of shaping all their users en masse.

Given my previous working life was all about building products to shape users who had gone over their download limits, as well as products for prioritising data (Quality of Service), I understand this a fair bit. Unfortunately I see it being blown out of proportion. To implement this in practice, the load on a router has to be considered: the more shaping and prioritisation you do, the hotter a router runs. And because you are applying these features to thousands of users, rather than a small handful who are paying for particular features, you increase the chances of bugs and other weird behaviour. For example, more calls to Cisco to have them diagnose your equipment because you are using a feature in a way they didn’t envisage.
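For readers who haven’t seen shaping from the inside, one common way to hold a user to a fixed rate is a token bucket. The sketch below is a generic illustration under simplifying assumptions (one shaped user, sizes in bytes, time passed in by the caller so it is deterministic), not a description of any particular product or router. What it shows is that a shaper keeps state and does arithmetic on every packet of every shaped user, which is where the extra router load comes from.

```java
// A minimal token-bucket shaper: a generic sketch, not any vendor's implementation.
// Time is passed in explicitly (nanoseconds) so the behaviour is deterministic and testable.
class TokenBucket {
    private final double ratePerSecond; // sustained rate the user is shaped to, e.g. bytes per second
    private final double capacity;      // largest burst allowed, in the same units
    private double tokens;              // credit currently available
    private long lastRefillNanos;

    TokenBucket(double ratePerSecond, double capacity, long nowNanos) {
        this.ratePerSecond = ratePerSecond;
        this.capacity = capacity;
        this.tokens = capacity;          // start with a full bucket
        this.lastRefillNanos = nowNanos;
    }

    /** Returns true if a packet of this size may be sent now; false means queue or drop it. */
    boolean tryConsume(double packetSize, long nowNanos) {
        refill(nowNanos);
        if (tokens >= packetSize) {
            tokens -= packetSize;
            return true;
        }
        return false;
    }

    // Credit accrues at ratePerSecond but never exceeds the burst capacity.
    private void refill(long nowNanos) {
        double elapsedSeconds = (nowNanos - lastRefillNanos) / 1e9;
        tokens = Math.min(capacity, tokens + elapsedSeconds * ratePerSecond);
        lastRefillNanos = nowNanos;
    }

    public static void main(String[] args) {
        // Hypothetical numbers: an over-quota user shaped to 8,000 bytes/s with a 1,500-byte burst.
        TokenBucket shaper = new TokenBucket(8_000, 1_500, 0L);
        System.out.println(shaper.tryConsume(1_500, 0L));           // true: uses the burst credit
        System.out.println(shaper.tryConsume(1_500, 0L));           // false: bucket is empty
        System.out.println(shaper.tryConsume(1_500, 200_000_000L)); // true: 0.2 s refilled the bucket
    }
}
```

A router doing this for thousands of users needs one of these per user (or per class of traffic), which is why shaping everyone at once is a very different proposition from shaping a handful of over-quota accounts.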

Another practical question is what effect Google (primarily a content provider) and Verizon (one bandwidth provider in a market of many) can have together on the speed of internet surfing for those outside their networks… pretty much zip, nada, zilch from a technical perspective. Maybe Google can use its weight to make other ISPs pay a Google tax for content leaving its network, just as the big 4 ISPs of Australia (Telstra, Optus, AAPT, Primus) do to the smaller ISPs, but all that does is affect the cost of providing the service. Although the industry is in a period of consolidation, we still have plenty of ISPs in Australia, which means such network access costs can’t be prohibitive; otherwise we’d only have 4 internet providers. Ingenuity comes in: people change providers, companies build other networks to connect countries together, and they build a market where, if an ISP or an end user of an ISP doesn’t like the service they are getting, they can switch to another. How can these laws distinguish between an ISP deliberately shaping your traffic and the existing efficiencies used to deliver you a cheap internet service that would otherwise be more expensive if you had your own 24/7 line direct to the ISP? (They probably could, but the law would have to be worded carefully and require a level of auditing of ISPs never seen before, to ensure that any lack of service was due to an infrastructure or product limitation rather than an ISP governance decision.)

In my opinion, what they are fighting against has been happening in the industry since the days of dial-up. Ever been with a popular all-you-can-eat dial-up ISP that was oversubscribed? If you were a big kahuna user you’d get shaped without knowing it, your phone line would get disconnected and you’d have to dial back in again. It’s unfair when an ISP oversells its services so that the paying customer’s experience is pretty weak, and it stops decent ISPs from offering more value-add services for free because they are too busy cutting costs to compete. In an ideal world Net Neutrality could come into play here, forcing the industry to guarantee bandwidth and making ISPs accountable to external auditors to ensure they have enough bandwidth for their subscribers, but it’s not a simple equation: your internet has been oversubscribed from day one. Even your DSL or cable link gets shared with your neighbours on the same exchange; you’ve been contending for bandwidth with your neighbourhood and probably didn’t know it. The business model is that not all users are using the network 100% of the time. Unusually busy days, like when Obama was inaugurated, do happen, but usually there is enough bandwidth to go around.
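The “not everyone is online at once” business model is statistical multiplexing. A back-of-the-envelope way to see why it usually works, and why busy days hurt, is a binomial model: assume n subscribers who are each independently busy with probability p, sharing a link with room for k of them at full speed. The independence assumption is a big simplification (an inauguration makes everyone busy at the same time), and all numbers in the example are hypothetical.

```java
// Rough statistical-multiplexing model: n subscribers, each independently busy with probability p.
// Returns P(more than k are busy at once), i.e. the chance the shared link is over-committed
// if it only has room for k simultaneous full-speed users. Real traffic is burstier and correlated.
class Oversubscription {
    static double probabilityMoreThan(int k, int n, double p) {
        if (k < 0) return 1.0;                  // "more than -1" is always true
        if (p <= 0.0 || k >= n) return 0.0;
        if (p >= 1.0) return 1.0;               // everyone is busy and k < n
        double term = Math.pow(1.0 - p, n);     // P(exactly 0 busy)
        double cumulative = term;
        for (int i = 0; i < k; i++) {
            // Binomial recurrence: P(i + 1) = P(i) * (n - i) / (i + 1) * p / (1 - p)
            term *= (double) (n - i) / (i + 1) * p / (1.0 - p);
            cumulative += term;
        }
        return Math.max(0.0, 1.0 - cumulative); // clamp tiny negative rounding error
    }

    public static void main(String[] args) {
        // Hypothetical: 100 subscribers sharing capacity for 20 full-speed users, each busy 10% of the time.
        System.out.printf("normal day: P(congested) = %.6f%n", probabilityMoreThan(20, 100, 0.10));
        // The same link on an unusually busy day, when each user is busy 30% of the time.
        System.out.printf("busy day:   P(congested) = %.6f%n", probabilityMoreThan(20, 100, 0.30));
    }
}
```

With these made-up figures the link is almost never over-committed on a normal day but almost always is on the busy one, which is the trade-off every ISP’s contention ratio is betting on.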

There is some ISP filtering that is good, for example blocking Windows port 445 so that common viruses can’t get into PCs and spread, which would otherwise create even more network traffic and headaches for users and admins alike. And as much as users hate getting shaped when they exceed their monthly quota, it’s probably a good thing they aren’t slowing the network down for everyone else. I have seen flooded links and network engineers sweating when a particular part of Australia goes overboard. The thing I remember about internet protocols is that they generally don’t cope well when a link gets saturated. A link performs well at about 70 to 80% capacity, but beyond that performance takes a huge nosedive. Having some control so everyone gets their fair share is a good thing, although always unpopular… e.g. who likes water restrictions?
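The nosedive past 70 to 80% can be illustrated with the textbook M/M/1 queueing model. It assumes Poisson packet arrivals and exponential service times, which real internet traffic doesn’t follow exactly, so treat it as showing the shape of the curve rather than predicting any real link. In that model the mean time in the system is T = 1/(μ - λ) = (1/μ) × 1/(1 - ρ), where ρ = λ/μ is the utilisation, so average delay is 2x the bare transmission time at 50% busy, 5x at 80%, 10x at 90% and 20x at 95%.

```java
// M/M/1 queue sketch: Poisson arrivals, exponential service times, a single link.
// Mean time a packet spends queued plus transmitted is T = 1 / (mu - lambda) = (1 / mu) * 1 / (1 - rho),
// where rho = lambda / mu is the link utilisation. The factor 1 / (1 - rho) is what blows up.
class LinkUtilisation {
    static double delayMultiplier(double rho) {
        if (rho < 0.0 || rho >= 1.0) {
            throw new IllegalArgumentException("utilisation must be in [0, 1): " + rho);
        }
        return 1.0 / (1.0 - rho);
    }

    public static void main(String[] args) {
        double[] utilisations = {0.5, 0.7, 0.8, 0.9, 0.95, 0.99};
        for (double rho : utilisations) {
            System.out.printf("%3.0f%% busy -> delay x%.1f%n", rho * 100, delayMultiplier(rho));
        }
    }
}
```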

Everything else they say about blocking services, such as dropping VoIP packets so people use old-school telephones instead of cheaper internet calls, is a perfectly valid concern. It’s a restraint of trade. Can you imagine the outcry if a business (or group of businesses) were denied access to amenities such as water or electricity that they used to produce an income, just because the utility company was unregulated or just evil?

So, I agree with the concept of net neutrality, but I don’t want to see it blown out of proportion, removing the network administration safeguards that keep users free of issues, or removing the restrictions that stop some users clogging up bandwidth for users with more legitimate internet needs.

August 22, 2010

Reading Other People’s (and Your Own) Old Code

Old code is hard to read. Even your own. It’s not uncommon, six months down the track, to experience a bit of “what was I thinking when I wrote that?”. It’s one of the first things drummed into me by senior devs at my old work. The best way to avoid this is to make your code as easy to understand as possible, and I believe that to do that you need to keep your code modules (classes and methods) small so that they are easy to test.

But when I work with old code bases full of long methods and classes with lots of dependencies that are difficult to test, and with shrinking deadlines to do anything about them, the messages about good practices fall by the wayside. It’s refreshing to see a blog post like Beautiful Code: A Sense of Direction talk about the core tenets of keeping things simple, small cohesive classes and refactorings to reduce LOC, in addition to a nice example at the start that I read as: even your own sh… um, crap stinks.

BTW, the article referenced a site I had come across before but hadn’t bookmarked, which appears to be a great resource on Design Patterns (and Antipatterns) and UML: SourceMaking.com

August 22, 2010

Embedding Tomcat 7 in your App

So what if you ship a plain old Apache server just to run some aging PHP? If you already have a server app running all the time, then according to this post it looks like you can embed Tomcat 7 in it, just like you can embed Jetty.

August 22, 2010

Strings and intern

Strings, yawn, boring. They are kind of the bread and butter of our development, but let’s face it, we can take them for granted sometimes.

So, some neat tips from this blog post about Strings in Java. The most useful one is that string literals in your code are stored in the class file’s constant pool and interned by the JVM, so every occurrence of the same literal refers to one shared String object in the string pool. The compiler is also smart enough to join constant strings together, a feature called constant folding: e.g. "a" + "aa" in code becomes the single literal "aaa", and any variable assigned either form references the same pooled instance. This makes equals() on such strings faster, since String#equals() checks reference equality first, before comparing lengths and then characters one by one.
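A small sketch of the folding described above. The class and method names are made up for illustration, and the == comparisons are only there to show that the folded strings are the very same object as the "aaa" literal; in real code you would still compare strings with equals(). A static final String initialised with a literal also counts as a compile-time constant, so concatenations with it are folded too.

```java
// Compile-time constant folding of string concatenation: constant expressions are
// computed by javac and interned exactly like literals.
class ConstantFolding {
    static final String SUFFIX = "aa"; // final + constant initialiser = a compile-time constant

    static String folded() {
        return "a" + "aa";              // javac emits the single literal "aaa"
    }

    static String foldedViaConstant() {
        return "a" + SUFFIX;            // still a constant expression, so also folded
    }

    public static void main(String[] args) {
        String literal = "aaa";
        System.out.println(literal == folded());            // true: same pooled instance
        System.out.println(literal == foldedViaConstant()); // true
    }
}
```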

The catch is that if you want the faster equals, the strings have to be known at compile time. If you have something like

String s1 = "a";
String s2 = "aa";
String s3 = s1 + s2;

then s1 and s2 refer to pooled literals, so comparing either of them with the same literal takes the fast reference-equality path, but s3 won’t, since it is concatenated at runtime into a brand new String object.

Thankfully, a simple call to String’s intern() method returns the pooled copy of a string, adding it to the pool first if it isn’t already there:

s3 = s3.intern();
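To see the difference for yourself, the sketch below mirrors the s1/s2/s3 example; the helper method is hypothetical and only exists to make it obvious that the concatenation happens at runtime. Bear in mind that intern() itself does a lookup in a JVM-wide table, so it is generally only worth it for strings that will be compared or stored many times.

```java
// Runtime concatenation produces a new String; intern() swaps it for the pooled instance.
class InternDemo {
    static String concatenateAtRuntime(String s1, String s2) {
        return s1 + s2; // operands aren't compile-time constants, so this runs at runtime
    }

    public static void main(String[] args) {
        String s1 = "a";
        String s2 = "aa";
        String s3 = concatenateAtRuntime(s1, s2);
        System.out.println(s3 == "aaa");      // false: a distinct String object
        System.out.println(s3.equals("aaa")); // true, via the character-by-character comparison
        s3 = s3.intern();
        System.out.println(s3 == "aaa");      // true: now the pooled instance
    }
}
```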

The rest of the blog post talks about common mistakes with String equality and about understanding immutability. Scarily, there was a potential memory leak with the substring() method, which keeps a reference to the original string’s internal value array when returning the substring.
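The usual workaround at the time was to copy the substring explicitly. In the JDK 6 era String, substring() returned a String that shared the parent’s char array (using offset and count fields), so a short substring could keep a huge array reachable; new String(...) gave the copy its own right-sized array. From Java 7 update 6 onwards substring() copies the characters itself, so there the extra copy is merely redundant. The record contents below are hypothetical.

```java
// In JDK 6-era String, substring() reused the parent's char[] (via offset/count fields),
// so keeping a short substring kept the whole parent array reachable.
// Copying with new String(...) gives the substring its own right-sized array.
class SubstringCopy {
    static String detachedSubstring(String source, int begin, int end) {
        return new String(source.substring(begin, end));
    }

    public static void main(String[] args) {
        StringBuilder payload = new StringBuilder("ID42:");
        for (int i = 0; i < 1000000; i++) {
            payload.append('x');                     // a hypothetical large record
        }
        String record = payload.toString();
        String id = detachedSubstring(record, 0, 4); // keep only "ID42"
        record = null; // the large array can now be garbage collected, even on old JDKs
        System.out.println(id);
    }
}
```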

More importantly, it talks about different options for comparing strings in different locales, where several Unicode character combinations can represent the same printed character, or where different characters are considered equal in a locale even though they are not byte-for-byte equal. All things I was never aware of until reading this post.
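The two JDK tools for this are java.text.Normalizer, for strings that are canonically equivalent (the same printed character encoded as different code point sequences), and java.text.Collator, for locale-sensitive comparison where some differences should be ignored. A minimal sketch; it assumes English collation rules at PRIMARY strength, where accent and case differences are ignored, and other locales and strengths behave differently.

```java
import java.text.Collator;
import java.text.Normalizer;
import java.util.Locale;

class LocaleCompare {
    static final String PRECOMPOSED = "\u00e9"; // é as a single code point
    static final String DECOMPOSED = "e\u0301"; // e followed by a combining acute accent

    // Canonical equivalence: normalise both sides to NFC, then use a plain equals().
    static boolean canonicallyEqual(String a, String b) {
        return Normalizer.normalize(a, Normalizer.Form.NFC)
                .equals(Normalizer.normalize(b, Normalizer.Form.NFC));
    }

    // Locale-aware equality: PRIMARY strength ignores accent (secondary) and case (tertiary) differences.
    static boolean equalIgnoringAccents(String a, String b, Locale locale) {
        Collator collator = Collator.getInstance(locale);
        collator.setStrength(Collator.PRIMARY);
        collator.setDecomposition(Collator.CANONICAL_DECOMPOSITION);
        return collator.compare(a, b) == 0;
    }

    public static void main(String[] args) {
        System.out.println(PRECOMPOSED.equals(DECOMPOSED));            // false: different chars
        System.out.println(canonicallyEqual(PRECOMPOSED, DECOMPOSED)); // true
        System.out.println(equalIgnoringAccents("resume", "r\u00e9sum\u00e9", Locale.ENGLISH));
    }
}
```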

August 15, 2010

Facebook as a business platform

I recently attended the ThoughtWorks Quarterly Briefing, which was about their Tech Radar: a what’s hot in upcoming and existing software dev technologies. One thing that came up as a surprise was the idea of Facebook as a business platform.

My first thought: Facebook for business? Am I doing it wrong? But it’s actually a pretty neat idea. The concept is simple: if so many people are using FB as a platform for games (Farmville) and other apps, then why not business apps? It can be used as a powerful marketing tool or a business information platform. It is not much different to the way businesses have caught onto SMS to send info around, so if the infrastructure is there, maintained by Facebook, why not use it?

It’s funny when you hear an idea and then see it replicated elsewhere; moments of synchronicity perhaps? In my email this morning there was an article about Delta Air Lines allowing users to purchase tickets on FB.

August 1, 2010

B12

Short of B12? From: http://www.vegsoc.org/info/b12.html

Human faeces can contain significant B12. A study has shown that a group of Iranian vegans obtained adequate B12 from unwashed vegetables which had been fertilised with human manure. Faecal contamination of vegetables and other plant foods can make a significant contribution to dietary needs, particularly in areas where hygiene standards may be low. This may be responsible for the lack of anaemia due to B12 deficiency in vegan communities in developing countries.

OK, but seriously, if you need some B12, good sources are:

Liver (more B12 by a mile, but it’s going to taste awful): > 40 µg

If you want to stay veggie:

Mushrooms
Fortified B12 foods
Vitamin Supplements

B12 content in food is measured in micrograms (µg). The recommended daily intake is between 2 and 3 µg.

Intrinsic Factor

To absorb B12, your body produces something called Intrinsic Factor. It sounds more like a metric than a biological entity, but not having enough Intrinsic Factor means you can’t get B12 into your bloodstream. If that is the case, you may have to get your B12 via injections.

Analogous B12 and Blood Tests

Tofu, tempeh and spirulina all appear to have B12, but not quite. They contain what’s known as analogous B12: compounds that show up as B12 in a blood test but don’t do the same job. In fact, their presence can affect the uptake of B12.

Dietary Supplement Fact Sheet: Vitamin B12

Vegetarian Network Victoria – Nutrition: Vitamin B12 and Vegan Diets

August 1, 2010

Legacy Java Systems

I’ve had my fair share of working with legacy code. I don’t think legacy code is a bad thing because its existence means it has addressed the majority of the needs of the business successfully enough to stay around. However, as time proceeds, the business needs to change in order not to become a historic entity and so legacy apps need to be updated. This is usually where the frustration comes in. You have to work with older development practices and styles you may not be used to. You also may have to source old libraries or application containers in order to get things to build and deploy.

However, I would like to think you can make a green field out of anywhere. A lot of other people may disagree, but the legacy system does have the advantage of addressing business needs that your own ‘from scratch’ greenfield app may not (provided those needs are still relevant). There was a recent article on InfoQ, Eight Quick Ways to Improve Java Legacy Systems, that covered the following areas:

Tip #1: Use a Profiler
Tip #2: Monitor Database Usage
Tip #3: Automate Your Build and Deployment
Tip #4: Automate Your Operations and Use JMX
Tip #5: Wrap in a Warm Blanket of Unit Tests
Tip #6: Kill Dead Code
Tip #7: Adopt a ‘compliance to building codes’ Approach
Tip #8: Upgrade Your JRE

I’ve used a number of these techniques, but there were some I hadn’t heard of before, such as jdbcWrapper, which simply wraps your JDBC driver with a logger so you can see which part of your code executes which SQL (Tip 2).
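The idea behind a JDBC-wrapping logger can be sketched with a JDK dynamic proxy. This is a hypothetical helper illustrating the technique, not jdbcWrapper’s actual API or implementation: it wraps a java.sql.Connection and logs the SQL handed to prepareStatement()/prepareCall() before delegating to the real driver. It deliberately keeps the scope small; SQL passed to a plain Statement’s execute methods would need the Statement wrapped as well.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.sql.Connection;
import java.util.function.Consumer;

// Hypothetical helper, not jdbcWrapper's API: wraps a Connection in a dynamic proxy that
// logs the SQL passed to prepareStatement()/prepareCall() before delegating to the real driver.
class SqlLoggingConnection {
    static Connection wrap(Connection target, Consumer<String> log) {
        InvocationHandler handler = (proxy, method, args) -> {
            if (method.getName().startsWith("prepare") && args != null
                    && args.length > 0 && args[0] instanceof String) {
                // Logging a stack trace here (not done in this sketch) would show which code issued the SQL.
                log.accept(method.getName() + ": " + args[0]);
            }
            try {
                return method.invoke(target, args);
            } catch (InvocationTargetException e) {
                throw e.getCause(); // rethrow the driver's own exception (e.g. SQLException) unchanged
            }
        };
        return (Connection) Proxy.newProxyInstance(
                SqlLoggingConnection.class.getClassLoader(),
                new Class<?>[] {Connection.class},
                handler);
    }
}
```

In a legacy app you would apply wrap() in the one place connections are handed out, such as a connection factory or DataSource, e.g. passing System.out::println or a logger method as the Consumer.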

Using JMX to trigger cache cleanups, as described in Tip 4, is a great idea. Another useful thing I’ve seen is embedding a JavaScript or Groovy scripting console within your app so that live maintenance can be performed on production environments.
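A minimal sketch of the Tip 4 idea using only the JDK’s standard MBean support: a cache-control bean is registered on the platform MBean server, so an operator can read its Size attribute and call clearCache on a running app from a JMX client such as jconsole. The class names, the ObjectName and the plain Map standing in for the app’s cache are all hypothetical. Standard MBeans follow a naming convention: the management interface must be public and named after the implementing class plus “MBean”.

```java
import java.lang.management.ManagementFactory;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

class CacheJmx {
    // Standard MBean convention: a public interface named <ImplementationClass>MBean.
    public interface CacheControlMBean {
        int getSize();      // exposed as the read-only attribute "Size"
        void clearCache();  // exposed as the operation "clearCache"
    }

    public static class CacheControl implements CacheControlMBean {
        private final Map<String, Object> cache; // hypothetical stand-in for the app's cache

        public CacheControl(Map<String, Object> cache) {
            this.cache = cache;
        }

        @Override public int getSize() { return cache.size(); }

        @Override public void clearCache() { cache.clear(); }
    }

    private static final MBeanServer SERVER = ManagementFactory.getPlatformMBeanServer();

    // Registers the bean so JMX clients (jconsole, VisualVM, scripts) can reach it.
    static ObjectName register(Map<String, Object> cache, String objectName) {
        try {
            ObjectName name = new ObjectName(objectName);
            SERVER.registerMBean(new CacheControl(cache), name);
            return name;
        } catch (JMException e) {
            throw new IllegalStateException("Could not register " + objectName, e);
        }
    }

    // What a JMX client does remotely, done in-process here for the demo.
    static Object readAttribute(ObjectName name, String attribute) {
        try {
            return SERVER.getAttribute(name, attribute);
        } catch (JMException e) {
            throw new IllegalStateException(e);
        }
    }

    static Object invokeOperation(ObjectName name, String operation) {
        try {
            return SERVER.invoke(name, operation, new Object[0], new String[0]);
        } catch (JMException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> cache = new ConcurrentHashMap<>();
        cache.put("customer:42", "cached row");
        ObjectName name = register(cache, "com.example.legacy:type=CacheControl");
        System.out.println("Size before: " + readAttribute(name, "Size"));
        invokeOperation(name, "clearCache");
        System.out.println("Size after: " + readAttribute(name, "Size"));
    }
}
```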

Tip 5 touches on breaking dependencies, and tangled dependencies are something I see a lot of in the code base I work on. I’d like to learn more about how to do this.
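Dependency breaking is easier to picture with an example. The sketch below uses one widely described technique, often called “parameterize constructor”: a hard-wired dependency (here the system clock) becomes a constructor parameter so a test can substitute a fake, while a no-arg constructor keeps existing callers working unchanged. The InvoiceAger and TimeSource names are hypothetical, not taken from the InfoQ article.

```java
// Hypothetical legacy class made testable with a "parameterize constructor" seam:
// the hard-wired clock becomes an injectable dependency, while existing callers keep
// using the old no-arg constructor unchanged.
class DependencyBreaking {
    interface TimeSource {
        long nowMillis();
    }

    static class InvoiceAger {
        private final TimeSource time;

        InvoiceAger() {                   // legacy constructor: behaviour unchanged
            this(System::currentTimeMillis);
        }

        InvoiceAger(TimeSource time) {    // new seam: tests pass in a fake clock
            this.time = time;
        }

        boolean isOverdue(long dueMillis) {
            // Previously this called System.currentTimeMillis() directly and was hard to test.
            return time.nowMillis() > dueMillis;
        }
    }

    public static void main(String[] args) {
        InvoiceAger fixedClock = new InvoiceAger(() -> 1_000L);
        System.out.println(fixedClock.isOverdue(999L));   // true
        System.out.println(fixedClock.isOverdue(1_000L)); // false
    }
}
```

The legacy call sites never change, which is what makes this kind of seam cheap to introduce before the tests exist.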

Tip 6 points out that the Emma code coverage tool can also be used to find production code that is never run (in addition to code that is untested).

Tip 7 really rang true for me. It’s easy to bring in every toy under the sun, but without a common plan, everyone brings in their own frameworks to solve the same problem, which makes the app even more confusing to work with. I like ‘toys’ (frameworks, tools, etc.) but I know I can get carried away ;)

July 27, 2010

UUIDs in Java

UUIDs are useful for creating unique reference numbers for things. At 128 bits they are long enough that, in practice, collisions are vanishingly unlikely (though not strictly impossible).

The first place to read about UUIDs is RFC 4122. The Wikipedia entry gives a more concise version.

So why generate UUIDs? One reason could be to generate a unique code to identify a particular PC (MAC address?). Another is to identify points in time (a time-based UUID can also include the MAC address of the machine, aka node, that created it).
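In Java, java.util.UUID covers the common cases. A minimal sketch (the asset name is hypothetical): randomUUID() gives a random version 4 UUID and nameUUIDFromBytes() a name-based version 3 UUID, so the same name always maps to the same UUID. Note that the class cannot generate the time-and-MAC-address version 1 UUIDs mentioned above; its timestamp() and node() accessors only work on version 1 UUIDs you already have (they throw UnsupportedOperationException otherwise), so generating those needs a third-party library or your own code.

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;

class UuidDemo {
    // Version 4: random. The usual choice for "just give me a unique id".
    static UUID random() {
        return UUID.randomUUID();
    }

    // Version 3: an MD5-based UUID derived from a name, so the same name always gives the same UUID.
    static UUID fromName(String name) {
        return UUID.nameUUIDFromBytes(name.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        UUID id = random();
        System.out.println(id + " (version " + id.version() + ")");
        UUID named = fromName("pc-asset-0042"); // hypothetical asset name
        System.out.println(named + " (version " + named.version() + ")");
        // Round-trips through the standard 36-character text form.
        System.out.println(UUID.fromString(id.toString()).equals(id));
    }
}
```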

Further refs:

Java 6 API UUID entry
Blog that touches on UUID support in Java

July 27, 2010

CamelCaseString to Space Separated String

groovy> print 'CamelCaseString'.replaceAll(/([A-Z][a-z]*)/, '$1 ').trim()
Camel Case String

There is probably a better regex approach that does not add a space after the final word, and thus removes the need for a trim, but this is quick and easily embeddable anywhere, for example in a Jasper Reports Groovy expression ;)
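One such approach, sketched in Java since Groovy strings use the same java.util.regex engine: match the zero-width gap between a lower-case letter and the upper-case letter that follows it, and replace that gap with a space. Nothing is added after the last word, so no trim() is needed. As written it only handles ASCII letters and leaves acronym runs together (e.g. "parseXMLFile" becomes "parse XMLFile").

```java
class CamelCaseSplitter {
    // Zero-width match between a lower-case letter and the following upper-case letter,
    // so a space is inserted only between words and no trim() is needed.
    static String splitCamelCase(String s) {
        return s.replaceAll("(?<=[a-z])(?=[A-Z])", " ");
    }

    public static void main(String[] args) {
        System.out.println(splitCamelCase("CamelCaseString")); // Camel Case String
    }
}
```

The same pattern drops straight into Groovy: 'CamelCaseString'.replaceAll(/(?<=[a-z])(?=[A-Z])/, ' ').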
