Address Search OS OpenNames with PostGIS, SQLAlchemy and Python – PART 2

Part 1 of this post outlined how to configure a PostGIS database to allow us to run Full Text searches against the OS OpenNames dataset.

In Part 2 we look at writing a simple Python 3 CLI app that shows how easy it is to integrate this powerful functionality into your apps and APIs. Other than Python, the only dependency we need is the SQLAlchemy ORM, which lets our app communicate with Postgres.


Installing SQLAlchemy

SQLAlchemy can be installed using pip (pip install sqlalchemy). To talk to Postgres it also needs the psycopg2 driver, which you may struggle to install on a Mac without Postgres present, which is frustrating (however, solutions can be found on Stack Overflow).

A simple address search CLI
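Here's a minimal sketch of what address_search.py can look like, assuming hypothetical table and column names (open_names, text and textsearchable – adjust these to whatever you created in Part 1) and placeholder connection details. The line numbers in the notes below line up roughly with this listing.

import sys

from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.dialects.postgresql import TSVECTOR
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker

# Standard SQLAlchemy boilerplate: engine, session and declarative base
engine = create_engine('postgresql://user:password@localhost:5432/opennames')  # swap in your own connection details
Session = sessionmaker(bind=engine)
session = Session()
Base = declarative_base()


class OpenName(Base):
    # Table/column names are assumed – use whatever you created in Part 1
    __tablename__ = 'open_names'
    ogc_fid = Column(Integer, primary_key=True)
    text = Column(String)
    textsearchable = Column(TSVECTOR)


def search(search_term):
    # Join each word the user supplied with the OR operator (|)
    tsquery = ' | '.join(search_term.split())
    matches = session.query(OpenName).filter(OpenName.textsearchable.match(tsquery, postgresql_regconfig='english'))
    for match in matches:
        print(match.text)


if __name__ == '__main__':
    search(sys.argv[1])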

Let me draw your attention to…

Hopefully this script is fairly easy to follow, but there are a few lines to draw your attention to:

  • Line 4 – Note we have to tell SQLAlchemy we’re using the Postgres dialect so it understands TSVECTOR.
  • Lines 8–12 – This is simply SQLAlchemy boilerplate that sets up our connection and session for the app. You’ll need to swap out the connection details for your own.
  • Lines 17–20 – I’ve chosen to map only 3 columns; you’ll probably want to map more.
  • Line 25 – This is very important: here we join every word the user has supplied with the OR operator, meaning we return addresses that match any of the words. You could extend this to let the user specify an exact-match operator and change this to an & search.
  • Line 26 – Finally, note we ask SQLAlchemy to match our search, and importantly we must supply the postgresql_regconfig param to say we’re searching in English. This is vital or you won’t get the matches you expect.

Running our app

We can run our app from the command line simply by entering the following command:

python address_search.py 'forth street'

And we see our app print out all matching addresses that contain either Forth or Street 🙂

Ends

Hopefully you can see how easy it would be to take the above code and integrate it into your own apps and APIs. I hope you’ve found these tutorials useful. Happy text searching.

5 lessons from 3 years at a start-up

Some thoughts in no particular order after 3 years at a start-up

Have a plan – sounds obvious, but a weakness of agile is that it can give rise to the illusion that there’s a plan when, in reality, planning is emergent as the iterations and stories float by. Emergent planning means the team can drift or become distracted, and it’s hard to turn down non-core projects because you can’t point to a strategy or a delivery plan. Plans can be flexible, tested in the MVP style, and changed when they’re proved not to be working – but there’s no excuse not to have one.

Then ensure everyone is signed up to the plan. Even in a small team it’s easy for factions and agendas to emerge; getting everyone pulling in the same direction is non-trivial.

Sales and marketing are waaay more important than devs admit/realise – Make time to support sales and marketing efforts. Devs love to scoff at salespeople with their suits, lines of BS and vague promises. But the hard fact is that very few successful products have gained market share on technical superiority alone, and the chances are your team is not producing one of them. You need to think long and hard about your sales and marketing approach.

Only today did I read in the Sunday Times that the publishers of Grand Theft Auto hired Max Clifford to create a media shit-storm around the moral failings of the game, resulting, of course, in millions of additional sales.

Avoid non-core projects at all costs – Pressure for sales may mean you’re tempted to take on side projects, or do free work in exchange for some kind of marketing exposure. DON’T!!  DON’T EVEN THINK ABOUT IT!!

My experience was that this was a huge distraction, a money-pit, a time waster and just a generally bad idea that should be pushed back against at all costs. If you’re tempted and think you can manage it – trust me, it will still be a distraction. If you’re still tempted, time-box the work hard and ensure all stakeholders understand that there’s a maximum amount of time you can afford.

Don’t white-label and abstract features until at least 2 customers ask for them – This is basically a rewording of YAGNI. It’s tempting to assume all customers will want feature X or Y, but until you have hard evidence that multiple customers want the same feature, avoid wasting time abstracting it. This sounds simple but is very difficult to police and make hard-and-fast decisions about without getting devs’ backs up – kanban boards etc. can help here by demonstrating to the team how these tasks add time and cost to the project.

Invest in your team – This doesn’t just mean salaries, it means listening to your employees. If you notice the team doing a lot of overtime, do something about it. Encourage R&D, make sure they have some “slack” time, pay for them to attend conferences, encourage them to blog, take them out for dinner. Encourage experimentation with new technologies. Allow flexi-time and homeworking.

Things like this make a job enjoyable, and mean your team aren’t scouring the job ads.

So in conclusion, as usual, the golden rule is that there are no golden rules. No doubt success can be achieved by ignoring all of the above, but these are the lessons that stuck out to me over the last few years.

See also:

The SDK business is dead – It’s a commodity market now.

The Mythical Version 1.0

As a breed, we hackers are perfectionists. Tinkering away at that algorithm, worrying about the size of that switch statement, wondering about abstracting away some detail. But always, always with the aim of improving our code base.

Many of our number are also a bunch of nit-picking, passive-aggressive, show-boating arseholes.  Although these traits are kind of endearing once you realise that optimus1337, who is currently comparing you to Hitler, is probably 19, his Mum thinks he’s a wonderful lad, and he helps his Gran with her shopping at the weekends.

However, there is an unfortunate consequence of these two character traits. They can make it very intimidating to put your opinion out there or share some code with your peers. We’ll hoard code, or practice at home, but not want to put something out there because it’s not perfect, or we won’t contribute to a project for fear that we’ll be shouted down, or that what we produce won’t meet some sort of arbitrary ultra-geek standard.

This attitude can be seen in the insanely conservative version numbers we give any code that we are brave enough to put out into the wide world, e.g. MyProject – v0.0.001. For example, I’m a massive fan of the NAnt project and have been using it to build my solutions for the last 4 years. In that time the project has gone from version 0.86 Beta 1 to the recently released 0.91. In the entire 4 years I’ve been using it, it’s been as solid as a rock, and I haven’t had one issue with it, ever!

There’s no such thing as done

All developers implicitly understand that no project is ever finished, no piece of code is ever completely bug free, and there’s nothing that couldn’t be refactored. Which makes a “done” project as rare as the legendary unicorn.

A few years back, when projects started versioning themselves after the year/month they were released, e.g. Ubuntu 12.04, Office 2010 etc., I was very cynical, thinking it was just a marketing ploy to make us download/purchase the latest version.

However, I’ve lately realised that this versioning scheme has the benefit of indicating that this software is just that year’s version, or that month’s version.  It doesn’t say this software has reached mythical v1 status, it just says this is the stuff we think is good enough to release now.  The marketing aspect is just a fringe benefit 🙂

Conclusion

So don’t worry about joining the melting pot – jump right in. Release version 12.3.09 of that idea you’ve been working on. You can still conform to semantic versioning, and tell the likes of optimus1337 “Dude, relax. The code’s not done, it’s just the stuff I wanted to share, and BTW that’s not how you spell Goebbels ;-)”

Update – Auto Packaging using CSPack and Azure SDK 1.6

This post is related to two of my previous posts:

Azure 1.5 ate my diagnostics

I had diagnostics working quite happily until SDK 1.5 came out. Then all of a sudden data was no longer being transferred to Azure storage. Even more mysteriously, diagnostics would happily transfer data to Azure storage when being emulated locally, but not when running on the Azure cloud (in other words, a nightmare problem).

I didn’t get around to investigating why until this week. I saw that several people had the same problem, and assumed that it was because I wasn’t configuring the diagnostics correctly in the OnStart method.

Finally I saw this forum thread. The thread described that if you upload your solution from Visual Studio, diagnostics work correctly, but not when it is deployed from the build process. I tried it for myself, and yep, diagnostics would magically work when the solution was deployed from Visual Studio. This finally clued me into the fact that the problem had nothing to do with the code, but everything to do with packaging. Which leads us to this update on auto packaging your Azure solution.

Configuring Your Azure Continuous Integration process with CSPack and SDK 1.6

My previous post on using CSPack to automatically build your deployment packages is largely still correct, but as of (I assume) SDK 1.5 there’s a new EntryPoint property.

So you need to specify the name of the DLL that is the entry point to your solution – in my case HuzuSocial.App.dll. My AzureProperties.txt file now looks like this:

TargetFrameWorkVersion=v4.0
EntryPoint=HuzuSocial.App.dll

Now configured correctly, Diagnostics works as expected from our Continuous Integration process.

Windows Azure Diagnostics with SDK 1.6 for WebRoles

There appears to be a lot of conflicting and confused advice about configuring Diagnostics on Windows Azure.  The situation is not at all helped by Microsoft’s own site which, to paraphrase Morecambe and Wise, has all the right pieces of information, just not necessarily in the right order.

It doesn’t help that what used to work with earlier versions of the Azure SDK no longer works with later versions. So here I outline:

  • The steps to get Diagnostics outputting correctly to Windows Azure Storage with SDK 1.6 for WebRoles (although I’d imagine it’s largely the same for WorkerRoles)
  • Azure 1.5 ate my diagnostics – Another post where I update my Auto Packaging post to be compatible with SDK 1.6

Setting up Windows Azure Diagnostics for your WebRole with SDK 1.6

1. Configure Web.Config – required if you are using Trace statements

I use Log4Net for my general logging/tracing needs and don’t use Trace statements, so the example shown in step 3 below does not require you to complete this step.

However, if you are using Trace statements, e.g.:

System.Diagnostics.Trace.TraceError("Error has occurred");

You’ll need to configure Web.config as described here:

<system.diagnostics>
    <trace>
        <listeners>
            <add type="Microsoft.WindowsAzure.Diagnostics.DiagnosticMonitorTraceListener,
                Microsoft.WindowsAzure.Diagnostics,
                Version=1.0.0.0,
                Culture=neutral,
                PublicKeyToken=31bf3856ad364e35"
                name="AzureDiagnostics">
                <filter type="" />
            </add>
        </listeners>
    </trace>
</system.diagnostics>

2. Initialise Diagnostics

As outlined here, you’ll need to ensure you add the Import element for the Diagnostics module in your ServiceDefinition.csdef file.  Here’s what mine looks like:

<?xml version="1.0" encoding="utf-8"?>
<ServiceDefinition name="HuzuSocial.Azure" xmlns="http://schemas.microsoft.com/ServiceHosting/2008/10/ServiceDefinition">
    <WebRole name="HuzuSocial.App" vmsize="Small" >
        <Sites>
            <Site name="Web">
                <Bindings>
                    <Binding name="Endpoint1" endpointName="Endpoint1" />
                </Bindings>
            </Site>
        </Sites>
        <Endpoints>
            <InputEndpoint name="Endpoint1" protocol="http" port="80" />
        </Endpoints>
        <Imports>
            <Import moduleName="Diagnostics" />
        </Imports>
    </WebRole>
</ServiceDefinition>

Secondly, you’ll need to add your Azure storage account details to your ServiceConfiguration.cscfg. Mine looks like this (obviously replace with your own account name and key):

<?xml version="1.0" encoding="utf-8"?>
<ServiceConfiguration serviceName="HuzuSocial.Azure" xmlns="http://schemas.microsoft.com/ServiceHosting/2008/10/ServiceConfiguration" osFamily="1" osVersion="*">
        <Role name="HuzuSocial.App">
        <Instances count="2" />
        <ConfigurationSettings>
            <Setting name="Microsoft.WindowsAzure.Plugins.Diagnostics.ConnectionString" value="DefaultEndpointsProtocol=https;AccountName=[youraccountnamehere];AccountKey=[youraccountkeyhere]" />
        </ConfigurationSettings>
        <Certificates>
        </Certificates>
    </Role>
</ServiceConfiguration>

3. Override the OnStart method in WebRole.cs

In the root of your web project you should have a WebRole class. You’ll need to override the OnStart method to correctly initialise the diagnostics. There’s loads of different sample code out there, some of it highly dubious. This is my configuration, and it works well for me (I lifted it from a post out there somewhere; unfortunately I forgot to bookmark it and can no longer find it, so thank you, whoever you are).

public override bool OnStart()
{
    DiagnosticMonitorConfiguration diagConfig = DiagnosticMonitor.GetDefaultInitialConfiguration();

    var perfCounters = new List<string>
    {
        @"\Processor(_Total)\% Processor Time",
        @"\Memory\Available Mbytes",
        @"\TCPv4\Connections Established",
        @"\ASP.NET Applications(__Total__)\Requests/Sec",
        @"\Network Interface(*)\Bytes Received/sec",
        @"\Network Interface(*)\Bytes Sent/sec"
    };

    // Add perf counters to configuration
    foreach (var counter in perfCounters)
    {
        var counterConfig = new PerformanceCounterConfiguration
                            {
                                CounterSpecifier = counter,
                                SampleRate = TimeSpan.FromSeconds(5)
                            };

        diagConfig.PerformanceCounters.DataSources.Add(counterConfig);
    }

    diagConfig.PerformanceCounters.ScheduledTransferPeriod = TimeSpan.FromMinutes(1.0);

    //Windows Event Logs
    diagConfig.WindowsEventLog.DataSources.Add("System!*");
    diagConfig.WindowsEventLog.DataSources.Add("Application!*");
    diagConfig.WindowsEventLog.ScheduledTransferPeriod = TimeSpan.FromMinutes(1.0);
    diagConfig.WindowsEventLog.ScheduledTransferLogLevelFilter = LogLevel.Warning;

    //Azure Trace Logs
    diagConfig.Logs.ScheduledTransferPeriod = TimeSpan.FromMinutes(1.0);
    diagConfig.Logs.ScheduledTransferLogLevelFilter = LogLevel.Warning;

    //Crash Dumps
    CrashDumps.EnableCollection(true);

    //IIS Logs
    diagConfig.Directories.ScheduledTransferPeriod = TimeSpan.FromMinutes(1.0);

    DiagnosticMonitor.Start("Microsoft.WindowsAzure.Plugins.Diagnostics.ConnectionString", diagConfig);

    return base.OnStart();
}

4. That’s it

When deployed to Azure, your diagnostics should be successfully transferred to Azure Storage. To analyse them in any meaningful way, I’d recommend Cerebrata’s Diagnostics Manager, which gives you a nice dashboard. See below.