Massive Datasets in JavaOne

I’m both proud and excited to bring the presentation Querying Massive Data Sets with Google BigQuery & Java 8 to the JavaOne audience. I’ve already spoken in Melbourne about Google BigQuery, and it has matured nicely over the last year, growing in feature set, and we’ve been able to take advantage of it in more novel ways.

This talk presents a state of the union for where BQ is as a Big Data tool, comparing it to other tools such as Hadoop and traditional RDBMSs. Most interesting is how to take advantage of BigQuery using the provided Java API, then combining it with the Java 8 Streams API to pull in loads of data at scale.
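To give a flavour of the Streams side, here is a minimal sketch that only assumes the query results have already been pulled into memory. It does not touch the BigQuery Java API at all: the Row class, its fields and the sample figures are hypothetical stand-ins made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class RowStreamSketch {

    // Hypothetical stand-in for one row of a query result; NOT a BigQuery API type.
    static final class Row {
        final String country;
        final long pageViews;

        Row(String country, long pageViews) {
            this.country = country;
            this.pageViews = pageViews;
        }
    }

    // Group the rows by country and sum the page views for each one.
    // A TreeMap keeps the keys sorted so the result prints in a stable order.
    static Map<String, Long> viewsByCountry(List<Row> rows) {
        return rows.stream()
                .collect(Collectors.groupingBy(
                        (Row r) -> r.country,
                        TreeMap::new,
                        Collectors.summingLong((Row r) -> r.pageViews)));
    }

    public static void main(String[] args) {
        // Made-up sample rows, standing in for whatever a query returned.
        List<Row> rows = Arrays.asList(
                new Row("AU", 120),
                new Row("US", 300),
                new Row("AU", 80));
        System.out.println(viewsByCountry(rows)); // prints {AU=200, US=300}
    }
}
```

The same pipeline shape should work whether the rows come from a list, a file or a paged API response, as long as they can be exposed as a Stream.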

What you may not know about Gradle

Last week I was at the Gradle Summit in Santa Clara. I learnt a few things about Gradle that are worth mentioning, since they expanded what I knew Gradle to be and are just handy to know about.

  1. Support for Docker, Vagrant & Cargo

    Gradle has a Docker plugin that lets builds define their own Docker VMs. This keeps the development environment the same as production, which means less risk when deploying to a new environment because things are more ‘same’. Vagrant and Cargo support does the same for VirtualBox VMs and Arquillian-based JEE containers. A developer checks out the build and has the production (equivalent) environment ready to go, all orchestrated by the Gradle build. Ben Muschko’s talk, Provisioning Virtualized Infrastructure with Gradle, has more.

  2. Not just for JVM projects anymore

    Gradle has only had native support for months, but it’s been adopted quite quickly and is used by companies to build native apps. These companies have little to do with building Groovy or Java apps. Prezi use Gradle to build their JavaScript modules and manage their dependencies, and also to build their native mobile apps, which incorporate some of those JS modules. It’s enlightening to hear the Prezi devs state they have no JVM knowledge but have taken Gradle in new directions with new modules.

  3. Appreciation of the number of Enterprises dealing with Multi Module builds

    There are some large companies using Gradle. LinkedIn, Google, Netflix and Prezi were present at the conference and gave talks. You quickly get an appreciation for the number of developers using the tool, and the challenges they all have in incorporating dependencies from other teams. Gradle is in a valuable position to provide tooling that helps these organisations work amongst themselves and get their builds building more often and more consistently.

  4. Appreciation that working on multiple modules at once that depend on each other gets tricky

    When working on module A, which depends on module B, Gradle will download its dependencies and recompile A. When working on both modules B and A, do you want to bring in B from the artifact repository or the local copy you are currently working on? There is also the chance that the existing version of the dependency will be loaded from the repository instead of the one you are currently building. You want to make sure that doesn’t happen. The team at Prezi created Pride, a way to manage multi-project builds and specify a fenced-off area that separates the dependencies you care about building yourself from those in the dependent modules. Their blog post, Simpler Management of Modular App Development, goes into more detail about the scenarios you want to care about and their approach to managing this.
    Apart from that, I gathered a lot of anecdotal evidence about large enterprises with multiple teams choosing to avoid updating a dependency built by another team for fear of it breaking the build or generating bugs. The problem manifests itself as bugs that have already been fixed in the newer dependency but that persist because not all the products the company releases have picked it up. There doesn’t seem to be a consistent way to manage this, but in those larger organisations it does fall to the build master to report on the quality of the dependencies used in each project, make sure that they are up to date, and work with the project leads to get them updated accordingly. The challenges of integration within the enterprise came up enough times to make them worth mentioning here.

  5. The number of attendees a Gradle Summit draws

    I was quite impressed with the number of attendees. I don’t have an official number (I could guess a few hundred to a thousand?), but the fact that a conference can be built solely around a build tool, and have enough sessions to run 3 separate tracks over two days, is impressive. Sure, Silicon Valley has some of the largest enterprises (and thus developers) using the tool, but people also travelled from other parts of the US, and internationally from Europe and even Australia, to talk Gradle. The result is a conference the size of YOW Melbourne, with an equal breadth of attendees. Gradle is positioning itself as *the* polyglot build tool, so the breadth of topics can only grow from here.

Apart from the experience of presenting, Gradle Summit opened my eyes to a bigger picture of what Gradle is about, far beyond the JVM realm where I thought it lived.

Weeks turn into months, months turn into years…

It’s been almost two years since my last blog post. A lot has happened in that time, so I wanted to do a status update. No, I’m not dead yet!

The thing that has taken the most time has been starting the Melbourne Java and JVM users group, affectionately known as MelbJVM.

The group has grown from 14 people at our first meeting in March 2012 to over 500 members today, with regular sponsors and venues. We’ve had both international and local speakers, partnered events with other tech communities, and hosted workshops. There is now an organising committee to help drive things as well. I have to say that doing something for the community through MelbJVM has been the most rewarding thing I’ve ever done. If you’re reading this and have any inkling to help a community, I’d encourage you to do it. You won’t look back!

2013 was a great year personally as well

This happened
Wedding

That happened
[photo]

and then happened again the next day
[photo]

These trips happened
[photos]

as well as a massive conference trip in September

These gigs happened:

  • Puscifer
  • A Perfect Circle / Kyuss / Slayer at Soundwave
  • Tool on my birthday weekend (refer to photo above)
  • Black Sabbath in Melbourne
  • Deftones / Tool / Black Sabbath at Ozzfest Japan
  • Karnivool

I also had the pleasure of being Best Man at this event

[photo]

Other than that, long time friends have had babies or gotten married or engaged and we couldn’t be happier for them.

Here’s to a prosperous 2014 everyone!
(2013 will go down for me as a great year and one that sets the bar high)

3 cool technologies I discovered over summer.

Today I finish up my ‘holidays’ but I did want to write a quick post about the cool stuff I’d been looking at.

Firstly, the LMAX Disruptor pattern. I didn’t think you could fall for a pattern until I came across this. Disruptor is a pattern that lets you run through a queue (well, a ring in this case) of requests. Each request has multiple handlers (the small bits of code that do the work), and because each handler is small, it is easy to unit test too. The Disruptor coordinates the order in which these handlers execute. Handlers can run ahead and let the slower handlers catch up later. It’s certainly worth checking out. It seems to be a newer style in Java frameworks, and it reminds me of the asynchronous stuff that happens in, say, Node.js.
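To make the ring idea concrete, here is a toy, single-threaded sketch of requests sitting in a fixed-size ring while each handler keeps its own position. It is not the LMAX library’s API: RingSketch, publish and drain are names I made up, and the real Disruptor runs handlers on separate threads with far more careful coordination.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public final class RingSketch {
    private final String[] slots;               // the ring: a fixed-size array that gets reused
    private final Consumer<String>[] handlers;  // the small bits of code that do the work
    private final long[] cursor;                // next sequence number each handler will read
    private long next = 0;                      // next sequence number the producer will write

    @SafeVarargs
    RingSketch(int size, Consumer<String>... handlers) {
        this.slots = new String[size];
        this.handlers = handlers;
        this.cursor = new long[handlers.length];
    }

    // Write a request into the ring. Returns false when the ring is full, i.e. when
    // writing would overwrite a slot the slowest handler has not read yet.
    boolean publish(String request) {
        long slowest = next;
        for (long c : cursor) {
            slowest = Math.min(slowest, c);
        }
        if (next - slowest >= slots.length) {
            return false;
        }
        slots[(int) (next % slots.length)] = request;
        next++;
        return true;
    }

    // Let handler h work through at most max pending requests; returns how many it handled.
    int drain(int h, int max) {
        int done = 0;
        while (done < max && cursor[h] < next) {
            handlers[h].accept(slots[(int) (cursor[h] % slots.length)]);
            cursor[h]++;
            done++;
        }
        return done;
    }

    public static void main(String[] args) {
        List<String> journal = new ArrayList<>();
        List<String> replies = new ArrayList<>();
        RingSketch ring = new RingSketch(4, journal::add, replies::add);
        for (String r : new String[] {"a", "b", "c", "d"}) {
            ring.publish(r);
        }
        ring.drain(0, 10); // the fast handler runs ahead through all four requests
        ring.drain(1, 2);  // the slow handler has only caught up on two so far
        System.out.println(journal + " " + replies); // prints [a, b, c, d] [a, b]
    }
}
```

Because each handler is just a Consumer, it can be unit tested on its own without the ring, which is the part of the pattern I found most appealing.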

The next thing I looked at was Clojure. This language has been around for a while, but I didn’t get to appreciate it until someone gave a presentation at a conference in Crete last year. Clojure is like Lisp: lots of parentheses, but the beauty is that the code becomes simple to read (once you get used to it), and the control flow is easier to understand. A method call looks like this:
(operation argument argument)
A nested method call, where the result of one call is an argument to another, looks like this:
(operation2 (operation1 arg arg) arg)

The final thing was a bit of fun. Whilst cleaning up a cupboard full of magazine coverdisks, I came across the final issue of CU Amiga magazine. It was quite interesting to see the passion of the Amiga users who wrote in to say they were sorry to see another bastion fall. Some were realistic about the Amiga’s future but, god love ’em, some kept hoping things would change and a comeback would take place sometime soon. In a way, the ‘comeback’ has occurred (for me at least). There is no new hardware, but there is a new operating system called AROS, based on the Kickstart/Workbench of old. The distribution runs in a VM, but can be installed standalone on a PC. There is also discussion of Mac, iOS and Android distros. If you want to quickly get your Amiga fix and reminisce about the old Workbench days, do a search for ‘Icaros’ and download a web-capable distro that boots up in seconds. Some of the UI behaviour is outdated (for example, clicking a window gives it focus but doesn’t bring it to the front), but the great thing about Amiga nuts was, and is, that there was always a plugin to install that gave you the functionality of other OSes. It’s worth a play if you have some free time, and it will leave you wondering what OS X and Windows are doing with all your CPU cycles.

Rick Wagner’s Blog: How to find which .jar a class is in (easily)

Holy heck, I needed this earlier today

Rick Wagner’s Blog: How to find which .jar a class is in (easily).

Makes mention of JBoss’ tattletale utility.

The comments also mention the Java Class Finder plugin for Eclipse (personally, I used CTRL+SHIFT+T today, which did the same job).

There is also the LibraryFinder plugin for IntelliJ and the classjarsearch command-line tool, which searches a directory of JARs for a class.
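If none of those tools are to hand, the idea behind searching a directory of JARs is simple enough to sketch with the JDK alone. This is my own rough version and not how tattletale or classjarsearch are implemented; JarClassFinder and its method names are made up for illustration, and it only finds classes stored under their normal package path inside the JAR.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.JarFile;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class JarClassFinder {

    // com.example.Foo is stored inside a JAR as the entry com/example/Foo.class
    static String entryNameFor(String className) {
        return className.replace('.', '/') + ".class";
    }

    // Walk dir recursively and return every .jar file that contains the given class.
    static List<Path> findJarsContaining(Path dir, String className) throws IOException {
        String entry = entryNameFor(className);

        List<Path> jars;
        try (Stream<Path> paths = Files.walk(dir)) {
            jars = paths.filter(p -> p.toString().endsWith(".jar"))
                        .sorted()
                        .collect(Collectors.toList());
        }

        List<Path> hits = new ArrayList<>();
        for (Path jar : jars) {
            // An unreadable or corrupt JAR throws an IOException and stops the search.
            try (JarFile jarFile = new JarFile(jar.toFile())) {
                if (jarFile.getJarEntry(entry) != null) {
                    hits.add(jar);
                }
            }
        }
        return hits;
    }

    // Usage: java JarClassFinder <directory> <fully.qualified.ClassName>
    public static void main(String[] args) throws IOException {
        for (Path jar : findJarsContaining(Paths.get(args[0]), args[1])) {
            System.out.println(jar);
        }
    }
}
```

Pointing it at an app server’s lib directory with a fully qualified class name should list every JAR that ships that class, which is also a quick way to spot duplicates on the classpath.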

Glassfish V3.0.1 and Update Tool

When installing plugins for Glassfish at work, I would receive a Premature EOF error just after accepting the license terms for the plugin. This would occur regardless of the method used: the updatetool CLI, the GUI or the web.
Strangely it worked at home, but my colleagues and I couldn’t get through at work. I suspect some network issue on our end, but I didn’t need to delve too deep as I found a workaround.

The simple thing to do is to install the latest version of the pkg command. I used pkg-2.3.2 from the updatecenter site (http://wikis.sun.com/display/IpsBestPractices/Downloads).

After extracting the archive, I ran this at a command prompt (in its bin directory)
C:\DevEnv\pkg-2.3.2\bin>pkg install updatetool

The download was strangely slow and timed out about three times before I could install updatetool. Thankfully, running the same command again resumes from where it left off.

C:\DevEnv\pkg-2.3.2\bin>pkg install updatetool
DOWNLOAD                PKGS       FILES    XFER (MB)
wxpython2.8-minimal      1/2     915/929      7.5/7.9

Errors were encountered while attempting to retrieve package or file data for
the requested operation.
Details follow:

1: Framework error: code: 18 reason: transfer closed with 2028 bytes remaining to read
URL: ‘http://pkg.sun.com/layered/collection/dev’. (happened 4 times)
2: Framework error: code: 18 reason: transfer closed with 1901 bytes remaining to read
URL: ‘http://pkg.sun.com/layered/collection/dev’. (happened 4 times)

C:\DevEnv\pkg-2.3.2\bin>pkg install updatetool
DOWNLOAD                PKGS       FILES    XFER (MB)
Completed                2/2     929/929      7.9/7.9

PHASE                        ACTIONS
Install Phase              1092/1092
PHASE                          ITEMS
Reading Existing Index           8/8
Indexing Packages                2/2

You can then simply run updatetool from the command prompt to kick off the GUI app. Downloading plugins this way was also slow and also timed out, but it found my local Glassfish install and installed the plugins I needed.

Deploying grails 1.3.x war to glassfish

This was a recent bugbear, which may be a regression from the 1.2.x line.

When you deploy to Glassfish you get an error message:

An error has occurred
There is no installed container capable of handling this application com.sun.enterprise.deploy.shared.FileArchive@1e46947

The fix is to include this in BuildConfig.groovy:

grails.project.war.osgi.headers=false

Maven users can also set the property when building the Grails WAR artifact:

mvn clean grails:war -Dgrails.project.war.osgi.headers=false

via Grails – grails 1.2.2 and glassfish.

Net Neutrality

Recently a colleague sent me a link to an online petition regarding the concept of Net Neutrality being under threat by a recent deal between Verizon and Google.  http://www.commoncause.org/site/pp.asp?c=dkLNK1MQIwG&b=1234951

The petition claims the deal signals a prioritisation of internet traffic for the rich: fast lanes for large corporations who can pay for them, as well as the ability for ISPs to block traffic and prevent legitimate uses of the internet.

I found this ‘cause’ to be more fear mongering than fact. Whilst some points were correct, the practicalities of what was being proposed were apparently lost in fear, uncertainty and doubt. So, switching into soapbox mode, here was my reply:

There are some valid concerns here, but I think there is a bit of FUD going on. They are trying to say ISPs have these tools to control traffic and shouldn’t be allowed to use them, but I think those same tools are necessary to have a working network for all users.

What got my back up in particular was the bit where they claim that a corporation can buy a fast lane to their websites and supposedly lock smaller sites into slower speeds. I wonder how, technically speaking, they could do this feasibly and economically? I mean, it’s always been possible for an ISP to route traffic whichever way it chooses, prioritise some data over others and shape the speed of traffic too, but I can’t see how a router could handle slowing ALL traffic except that going to a certain address to create this artificial highway. I don’t think network admins would want the headaches of shaping all their users en masse.

Given my previous working life was all about building products to shape users who had gone over their download limits, as well as products to do with prioritisation of data (Quality of Service), I understand this a fair bit. Unfortunately I see it being taken a bit out of hand. To implement this practically, the load on a router has to be considered: the more shaping and prioritisation you do, the hotter a router runs. And because you are applying features to thousands of users, not just a small handful who are paying for particular features, you increase the chances of bugs and other weird stuff occurring, e.g. more calls to Cisco to have them diagnose your equipment because you are using a feature in a way that they didn’t envisage.

Another practicality is how much Google (primarily a content provider) and Verizon (one bandwidth provider in a market of many) can do together to influence internet speeds for those outside their networks: pretty much zip, nada, zilch from a technical perspective. Maybe Google can use its weight to make other ISPs pay a Google tax for content leaving its network, just as the big 4 ISPs of Australia (Telstra, Optus, AAPT, Primus) do to the littler ISPs, but all that does is affect the cost of providing the service. Although the industry is in a period of consolidation, the fact that we still have plenty of ISPs in Australia means such network access costs can’t be prohibitive; otherwise we’d only have 4 internet providers in Australia. Ingenuity comes in: people change providers, companies build other networks to connect countries together, and they build a market whereby an ISP, or an end user of an ISP, who doesn’t like the service they are getting can switch to another. How can these laws distinguish an ISP making an effort to shape your traffic from the existing efficiencies used to deliver you a cheap internet service that would otherwise be more expensive if you had your own 24/7 line direct to the ISP? (They probably could, but they would have to be worded carefully and require auditing of ISPs unlike anything seen before, in order to ensure that the lack of service you were getting was because of an infrastructure or product limitation rather than an ISP governance one.)

In my opinion, the thing they are trying to fight has been happening in the industry since the days of dialup. Ever been with a popular all-you-can-eat dialup ISP that was oversubscribed? If you were a big kahuna user you’d get shaped without knowing it, and your phone line would get disconnected and you’d have to dial back in again. It’s unfair that an ISP oversells its services such that the paying customer’s experience is pretty weak, and it stops decent ISPs from offering more value-add services for free because they are too busy trying to cut costs to compete with these guys. In an ideal world Net Neutrality can come into play here, forcing the industry to guarantee bandwidth and making ISPs accountable to external auditors to ensure that they have enough bandwidth for subscribers, but it’s not a simple equation; your internet has been oversubscribed from day one. Even your DSL or cable link gets shared with your neighbours using the same exchange, so you’ve been contending for bandwidth with your neighbourhood and probably didn’t know it. The business model is that not all users are using the network 100% of the time. Unusually busy days of large internet traffic, like when Obama gets inaugurated, do happen, but usually there is enough bandwidth to go around to support them.

There is some ISP filtering that is good, for example blocking Windows port 445 so that common viruses don’t get into PCs and spread, which prevents even more network traffic and headaches for users and admins alike. And as much as users hate getting shaped when they exceed their monthly bandwidth, it’s probably a good thing they aren’t slowing it down for everyone else. I have seen flooded links and network engineers sweating when a particular part of Australia goes overboard. The thing I remember about internet protocols is that they are generally not very good when they get saturated. A link is good at about 70-80% capacity, but any more than that and performance takes a huge nosedive. Having some control so everyone gets their fair share is a good thing, although always unpopular… e.g. who likes water restrictions?

Everything else they say about blocking services, such as the example about dropping VOIP packets so people use old school telephones instead of cheaper internet calls, is a perfectly valid concern. It’s a restriction of trade. Could you imagine the outcry if a business (or group of businesses) were denied access to the amenities, such as water and electricity, that it used for producing an income, just because the utility company was unregulated or just evil?

So, I agree with the concept of net neutrality, but I don’t want to see it taken out of hand, removing the network administration safeties that keep users free of issues, or removing restrictions so that some users end up clogging up bandwidth for users with more legitimate internet concerns.

Patellar Tendinitis

A contact in the health and fitness industry pointed me in the direction of this condition after I explained I was having on-and-off pain under my kneecap.

http://www.athleticadvisor.com/Injuries/LE/Knee/patellar_tendinitis.htm

http://en.wikipedia.org/wiki/Patellar_reflex

Need a map?  http://www.innerbody.com/image/skel12.html

http://orthopedics.about.com/cs/patelladisorders/a/kneecapdisloc.htm (not this bad, thank goodness)

Usual disclaimer about not following self-diagnosis and getting a 2nd opinion from a medical professional applies.