Address Search OS OpenNames with PostGIS, SQLAlchemy and Python – PART 2

Part 1 of this post outlined how to configure a PostGIS database to allow us to run Full Text searches against the OS OpenNames dataset.

In Part 2 we look at writing a simple Python 3 CLI app that will show you how easy it is to integrate this powerful functionality into your apps and APIs.  Other than Python, the only dependency we need is the SQLAlchemy ORM, which lets our app communicate with Postgres.


Installing SQLAlchemy

SQLAlchemy can be installed using pip.  It depends on psycopg2, which you may struggle to install on a Mac without Postgres present, which is frustrating (solutions can be found on Stack Overflow).
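The install itself is a one-liner (assuming pip is available and psycopg2 will build on your machine):

pip install sqlalchemy psycopg2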

A simple address search CLI
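A minimal sketch of such a script follows (the connection string, column names and the ogc_fid primary key are assumptions based on the Part 1 OGR2OGR load; the line numbers in the notes below refer to this sketch):

import sys

from sqlalchemy import create_engine, Column, Integer, String
from sqlalchemy.dialects.postgresql import TSVECTOR
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker

# boilerplate: connection and session setup
engine = create_engine(
    'postgresql://iain:password@localhost:5432/Real-World')
Session = sessionmaker(bind=engine)
session = Session()

Base = declarative_base()

class OpenName(Base):
    __tablename__ = 'open_names'
    ogc_fid = Column(Integer, primary_key=True)
    text = Column(String)
    textsearchable = Column(TSVECTOR)


def search(search_term):
    # append the OR operator to every word the user supplied
    query = ' | '.join(search_term.split())
    matches = session.query(OpenName).filter(OpenName.textsearchable.match(query, postgresql_regconfig='english'))
    for address in matches:
        print(address.text)


if __name__ == '__main__':
    search(sys.argv[1])

On SQLAlchemy 1.x the match call renders as textsearchable @@ to_tsquery('english', …), which is exactly the query we built by hand in Part 1.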

Let me draw your attention to…

Hopefully this script is fairly easy to follow, but there are a few lines to draw your attention to:

  • Line 4 – Note we have to tell SQLAlchemy we’re using the Postgres dialect so it understands TSVECTOR
  • Lines 8-12 – Simply SQLAlchemy boilerplate that sets up our connection and session for the app.  You’ll need to swap out the connection details for your own
  • Lines 17-20 – I’ve chosen to map only 3 columns; you’ll probably want to map more
  • Line 25 – Very important: here we append the OR operator to every word the user has supplied, meaning we return addresses that match any of the search terms.  You could extend this to let the user specify an exact-match search and change this to an & query
  • Line 26 – Finally, note we ask SQLAlchemy to match our search, and importantly we must supply the postgresql_regconfig param to say we’re searching in English.  This is vital or you won’t get the matches you expect

Running our app

We can run our app from the command line simply by entering the following command:

python address_search.py 'forth street'

And we see our app print out all matching addresses that contain either Forth or Street 🙂

Ends

Hopefully you can see how easy it would be to take the above code and integrate it into your apps and APIs.  I hope you’ve found these tutorials useful.  Happy text searching.


Address Search OS OpenNames with PostGIS, SQLAlchemy and Python – PART 1

In this two part post we’ll look at implementing an address search using the Ordnance Survey Open Names dataset.  We’ll use the power of Postgres with the PostGIS extension, leveraging its built-in Full Text Search, and use Python and the SQLAlchemy ORM to create a simple CLI.


Part 1 – Data Load and DB Config

Address Data

The UK is very badly served for free address data.  The best we have is the Ordnance Survey OpenNames dataset.  It will work as a postcode lookup or a street finder (at a push), but the dataset would require a lot of additional processing to be a useful address search.  OS really want you to purchase AddressBase.

That said, OpenNames will suffice for this example and it should be easy to extend the example to a fuller dataset if you’re lucky enough to have one.

Loading Data to PostGIS

You can download OpenNames as either CSV or GML.  I’d recommend GML as it’s simpler to load into PostGIS using OGR2OGR.

Once you unzip the archive you’ll see that the files are referenced according to the British National Grid, so you can load as much or as little as you want.

We’ll load NS68 which contains addresses in my home town of Stirling, as follows (swap out the values for your db):

ogr2ogr -f PostgreSQL PG:"host=localhost dbname=Real-World port=5432 user=iain password=password" NS68.gml -progress -nln open_names --config PG_USE_COPY YES 

You should now have a new table called open_names containing the addressing info.

Note if you want to load more gml files just use the -append flag:

ogr2ogr -f PostgreSQL PG:"host=localhost dbname=Real-World port=5432 user=iain password=password" NS88.gml -append -progress -nln open_names --config PG_USE_COPY YES 

Setting up Full Text Search

We now have our open_names table, but no text search column.  So we can add a textsearchable column which must be of type TSVECTOR as follows:

ALTER TABLE open_names ADD COLUMN textsearchable TSVECTOR;

We can populate the column using the built-in function TO_TSVECTOR, which tokenises the words based on the supplied config, in our case english (multiple configs are supported).

UPDATE open_names SET textsearchable = TO_TSVECTOR('english', text || ' ' || localid);

If you look at the data in your new column you’ll see that it now contains text tokens representing the address data.
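For example, a row might now look something like this (illustrative values; the localid token will vary):

SELECT text, textsearchable FROM open_names LIMIT 1;
-- text:           Forth View
-- textsearchable: 'forth':1 'view':2 'osgb4000000012345678':3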

Increase accuracy by concatenating multiple columns

Note that we’re concatenating 2 columns together in this update statement – text and localid.  In our case the reason for doing this is that the postcode in the localid column is stored without a space, meaning our search will return a result if the user enters a postcode without a space.

However, it should be clear that if we had better address data we could concatenate multiple columns, meaning that if a user searched for “1 Main St, Stirling, FK3 4GG” we would be able to return an accurate match.  A sketch of what that might look like follows.
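Against a richer (entirely hypothetical) addresses table, it might look like:

UPDATE addresses SET textsearchable =
    TO_TSVECTOR('english', CONCAT_WS(' ', house_number, street, town, postcode));

CONCAT_WS skips NULL columns, which a plain || concatenation would not.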

Add an Index for faster searching

Now that we have data set up we can add an index to our new column which will ensure searches are fast:

CREATE INDEX textsearch_idx ON open_names USING GIN (textsearchable);

Let’s do some searches

Now let’s query our new column to see if we can find some matches using the TO_TSQUERY function:

SELECT COUNT(1) FROM open_names WHERE textsearchable @@ TO_TSQUERY('english', 'avenue')

Here we find we have 41 streets in the Stirling area containing the word avenue.  You’ll note that I don’t need to worry about lowercase, uppercase or where the word might appear in the string.  Full text search takes care of that for me 🙂

The @@ operator returns true when the query on the right matches the tsvector column on the left.

Using AND and OR for better matches

A very powerful feature of Postgres’ Full Text Search is the ability to find matches containing all or some of the words in the query, using the AND (&) operator or the OR (|) operator, as these examples show:

select * from open_names where textsearchable @@ to_tsquery('english', 'forth & view');

Here we return only one result, Forth View, which contains both Forth and View.  If we change this to an OR search:

select * from open_names where textsearchable @@ to_tsquery('english', 'forth | view')

We get 7 results, including Forth View, Bruce View and Forth Place.

Again it should be easy to see how powerful text searches could be built for complex text documents.

A final note on Triggers

While our address data is fairly static, if you had a table where users were regularly editing address data, or any other columns you wanted to run a full text search on, you should consider adding a trigger to keep the TSVECTOR column up to date, as outlined here.

So for our example the trigger would look like:

CREATE TRIGGER tsvectorupdate BEFORE INSERT OR UPDATE
ON open_names FOR EACH ROW EXECUTE PROCEDURE
tsvector_update_trigger(textsearchable, 'pg_catalog.english', localid, text);

Up Next

Hopefully Part 1 has demonstrated how it is very easy to set up powerful text searching in Postgres.  In Part 2 we’ll look at how we can use Python and SQLAlchemy to allow you to integrate this functionality into your apps and APIs.

 

Restoring a Postgres database to AWS RDS using Docker

In this post I look at using Docker to restore a Postgres dump file to a Postgres database running in the cloud on AWS RDS.

Keep it clean

One of the big selling points of docker, for me, is that I can have lots of apps and utils running in nice containers on my dev laptop, without having to install them locally.  This ensures my laptop stays nice and responsive and I don’t clutter/break my laptop with lots of weird dependencies and running processes that I’m then too scared to delete.

Postgres is a good example – I don’t want to install it locally, but I do need access to the command line tools like psql and pg_restore, to be able to work with my databases effectively.

One way of accessing these tools would be to ssh onto the AWS cloud instances, but there’s a bunch of reasons, most pertinently security (not to mention the faff), why you’d want to avoid that every time you want to run some SQL.  So let’s look at how we use Docker to ease the pain instead.

Start Me Up

With Docker installed you can build this simple Dockerfile to create a local Postgres container.  The User and Password env vars aren’t strictly required; however, if you want to actually connect to the containerised DB, they’re pretty handy.
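A minimal sketch of such a Dockerfile (the 9.5 tag is an assumption; pick whichever version matches your RDS instance):

FROM postgres:9.5

# picked up by the official postgres image when the db initialises
ENV POSTGRES_USER=iain
ENV POSTGRES_PASSWORD=password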

You can build, run and connect to the container as follows (assumes you are on a Mac):
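A sketch of those commands, matching the notes below (the image and container names are my own placeholders):

mkdir data-load
docker build -t my-postgres .
docker run --name my-postgres-container \
    -v $(pwd)/data-load:/data-loader \
    -d my-postgres
docker exec -it <containerId> bash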

Note line 4 where I map the data-load dir I created at line 1 to a new directory called data-loader inside my container.  This means that when I copy the Postgres dump file into my local data-load directory, it will be available to the postgres tools available in the container.

Line 6 allows me to connect to the container; swap the id for your locally running container’s ID.

Restoring your database with pg_restore

I’ll assume you already have a Postgres database set up within the AWS cloud.  Now that we have connected to our container, we can use pg_restore to restore our dump file into AWS (note this command will prompt you for the admin password).
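Something along these lines (the host, user, database and dump file names are placeholders for your own):

pg_restore --verbose --password --host=<your-instance>.rds.amazonaws.com \
    --port=5432 --username=admin --dbname=mydb /data-loader/mydump.dump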

A note on schemas

If you’re doing a partial restore, you may want to restore your dumpfile to a separate schema.  Unfortunately there appears to be no way to do this from the command line.  What you have to do is to rename the public schema, create a new public schema and restore into that, then reverse the process.

This StackOverflow answer outlines the process.
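In rough outline the SQL dance looks like this (schema names are placeholders; see the linked answer for the full detail):

ALTER SCHEMA public RENAME TO public_orig;
CREATE SCHEMA public;
-- run pg_restore, which lands in the fresh public schema
ALTER SCHEMA public RENAME TO restored;
ALTER SCHEMA public_orig RENAME TO public;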

Restore Complete

You should now have a complete restore of your dumpfile in the cloud.  Please add comments if anything is unclear.

Avoiding pitfalls running Mongo 3.2 in Docker on OSX

Quick post after smashing my head off the desk for 5 hours trying to understand YTF I couldn’t get Mongo 3.2 running in a Docker container on my Mac.  Hopefully this post spares others the elastoplasts 🙂


Pitfall #1 – Mongo cannot write data to the host OSX machine

There are three factors that cause this issue:

  1. A lot of mongo/docker docs tell you to mount a local volume
  2. By default docker-machine uses the virtualbox driver to create a local VM
  3. Mongo uses mmap to turbo-charge access to files on disk

These facts led me (by way of this issue) to an ancient known issue on VirtualBox that basically means docker ain’t going to be able to read/write files with mmap.

So it’s pointless trying to tell mongo, via docker, to mount a local data volume, as the above bug means mongo isn’t going to be able to access it.  (careful reading of the Docker Hub mongo docs may allow you to divine this, but it’s not at all obvious)

Solution – Create a data volume in the container rather than the host.

As always, once you understand the problem, the solution is simple.  We can tell docker to create a data volume in the container rather than the host, as follows:
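Reconstructed from the switches described below, the command looks like this (the mongo:3.2 tag matches the version under discussion):

docker run --name my-local-mongo -v mongo-data:/data/db -p 27017:27017 -d mongo:3.2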

In my (still) early days with docker I’d come across these cli commands, which contained absolutely no description of WTF all the switches are doing, so a quick breakdown:

  • --name my-local-mongo – This tells docker to give the container the name “my-local-mongo” rather than generating a name like tuftybeaver
  • -v mongo-data:/data/db – This is the key switch; here we tell docker to create a new data volume called mongo-data and mount it as /data/db, which is the default location mongo wants to write data to
  • -p 27017:27017 – Here we tell docker to bind the container port 27017 to port 27017 on the host VM, allowing us to access mongo locally
  • -d – This switch tells docker to run the process detached, so we’re not running interactively

If all has gone well docker should be running your mongo container successfully.  A quick docker ps will show your running container.

docker logs -f my-local-mongo – will allow you to follow the log files

Pitfall #2 – I can’t connect to mongo from localhost

This is the classic docker schoolboy error that everyone makes.  The first time you do this you’ll be patting yourself on the back as you type localhost:27017 into your connection string, and then wonder why the hell mongo isn’t found.

This is because everything is actually running on your local docker virtual machine.  To find the IP of the VM, enter docker-machine ls

You should see the URL listed something like this: tcp://192.168.99.100:2376.  You need to snarf this IP for your connection string.  The easiest way to sanity check that it’s all working is to bash it into your browser with mongo’s port, eg for this example:

http://192.168.99.100:27017/

The mongo database should helpfully respond with:

It looks like you are trying to access MongoDB over HTTP on the native driver port.

Demonstrating that your mongo db is running and responding to idiots trying to speak to it over HTTP 🙂

Pitfall #3 – No collections are displayed in RoboMongo

This was a real head scratcher.  I was pretty sure everything was working, and yet RoboMongo wasn’t showing a damn thing.  This was a classic case of me, Sherlock Holmes style, spending 2 hours eliminating the impossible, to find that whatever remains, however improbable, must be the truth.

So in a turn up for the books, there was no bug in my code!!  It turns out by default Mongo 3.2 uses the WiredTiger storage engine, and RoboMongo can’t parse data stored in this format.

So instead if you’re using Mongo 3.2 you should use MongoChef to connect to your db.  MongoChef will show you all your lovely collections, living within your containerised Mongo.

Hope the above helps, thanks 🙂

 

Achieving Consensus – It’s so funny, how we don’t talk anymore

Walk into any office and there can usually be found a surplus of opinion and a dearth of consensus and agreement. While the crowd may be wise, as individuals we all know we’re right and damn any fool who disagrees with us.

A quick perusal of the comments section of any, even moderately controversial post, and you’ll see an echo chamber of hundreds of people carefully explaining why each other’s opinion is completely and hopelessly wrong.

The rarest phrase on the internet, is – “That’s a compelling argument, you’ve completely changed my mind”


In fact, the comments sections of most blogs are now so poisonous that the creators of the content, and the majority of its readers, don’t even bother to “Read below the line”.  Worse, people with interesting and unusual opinions are intimidated into not broadcasting them in the first place.

I’d like to say we can all agree that this is a terrible state of affairs, but as I’ve explained, that’s merely an invitation to be pointed at 100 other blog posts explaining why I’m wrong.

So all we can say is that opinions are cheap, and getting consensus is the hard part.  For a contemporary example – the recent Paris climate change talks are both simultaneously The World’s Greatest Diplomatic Achievement, and A Squalid Retrenchment.

Picking battles and getting permission

I wanted to write this post not because I have the answer to generating constructive discussion on the internet (I wish), but because I think teams and individuals (including myself) struggle on the projects we work on every day – not for want of ability, but for want of consensus and clarity on what we want to achieve.

Because we all instinctively understand the difficulty and energy required to get a group of people to agree on just about anything – the path of least resistance is to avoid the discussion in the first place. Indeed by having the discussion you expose yourself to the risk that your argument will not carry the day, and you’ll be forced down a road you’d prefer not to be on. So it’s easy to understand why tough decisions and “honest” conversations are avoided and “difficult” people worked around.

The old cliché “Pick your battles” or my personal favourite “Better to ask forgiveness than get permission” are undoubted truisms, but they come with a cost.  By avoiding the battles and not getting the permission, in other words not getting consensus, you can find yourself isolated when you hit the inevitable bumps in the road, as you’ve neatly provided those excluded from the decision with a convenient scapegoat.  eg – “Well if he’d asked me, of course I’d have told him that was a terrible idea.”

Or as Sun-Tzu put it more pithily:

“Victorious warriors win first and then go to war, while defeated warriors go to war first and then seek to win” 

Are you thinking what I’m thinking?

I’m certainly not advocating that every decision needs to be run past the team, but decisions that will impact more than one or two people, or that will have a significant impact on what is delivered and when it will be delivered, should be discussed as a group.

Keeping the team and the business involved in the decision making process will cost you time and you’ll inevitably have to slaughter some of your personal sacred cows, to reach an approach everyone can agree on.  However, the prize is worth the pain.

That prize is a clearly articulated plan that both the team and the business believe in and, much more importantly, are invested in.  Because you’ve arrived at the plan as a group, the group should want it to succeed, as they all have some skin in the game.

The drive and pressure to “Get stuff done” and the small windows many agile methodologies allow for meetings and planning between sprints means it’s easy to skip getting the team bought into the plan and approach. This is a mistake.

Rather than thinking of getting consensus as an exercise in cost/pain/stress, think of it as an extremely valuable deliverable and a vital part of leading a team (unfortunately the cost/pain/stress are still there 😉)

In the end it’s just all about talking, and remember to heed Cliff’s warning…

Casual Games – Designing for Success

Over the last 12 months our Studio has put out 3 games for Mobile Devices.  We’ve learned a number of hard, sometimes bruising, lessons along the way.  Today I want to talk about what makes Mobile/Casual games different, and some points you should keep in mind when designing games for mobile platforms.


Fun, Fun, Fun

If you haven’t dipped your toe in the water of Mobile Gaming be prepared to drown in a sea of acronyms – CTR, CPA, CPI, DAU, MAU, IAP, LTV.  These things are important, and it behooves you to understand the space, but remember the cardinal rule:

If your game sucks – it won’t sell.  

Everything comes back to making fun games that players enjoy, ALL else is a distant second.

Casual Gamer != Hard Core Gamer

Obvious right?  Casual Gamers play in short 2-10 minute bursts, they like social experiences, many of them are women, yadda yadda.  However, often, or more likely usually, the people who *MAKE* casual games *ARE* hardcore gamers, *ARE* mostly male, and *HATE* social experiences themselves 😉


I’m being slightly facetious; the point is the Casual gamer often has a very different world view about games than the self-same people making those games.  They don’t have fond memories of the screee of a ZX Spectrum loading Chuckie Egg, or of pumping 50 pences into a Street Fighter II cabinet in a warm arcade with a sticky carpet.  Many of them weren’t even born when SFII was released!

So you need to get inside your customer’s head.  Casual Gamers don’t have huge ethical objections to the F2P business model, they don’t care Candy Crush is a Bejewelled rip, they expect games to be free (even if you’ve sunk 12 months of your life into your game), they will buy IAPs if well integrated, and they expect regular/very regular updates, which brings me onto…

Updates – Pile ’em High, Sell ’em Cheap

Casual gamers YEARN for updates, they cleave to them, they covet them.  Regular updates keep your players coming back for more, improve your conversion rate and so on.  So the tricky requirement from a game design pov is to balance what makes your game awesome, with making it quick and cheap to update.

If you look at the undisputed champions of Casual Games – Supercell – both Hay Day and Clash of Clans are great games, but they can be very cheaply and easily updated.  When Supercell add a new building to Hay Day the fans go batshit and all start grinding so they can have a Boathouse.

So the point is don’t create an epic space battle simulator that needs 6 months of effort to add a new level, because there’s every chance your game will die while you build it.

Important from a production pov – if you think it’s easy to optimise your Studio to release monthly updates of your games, you’ve clearly never built software or managed a typical dev cycle.

Feedback Loop – Listen to Me Them

In this world of Social Media and Cloud Services it’s stupidly easy to set up a twitter account, email account, or Facebook Page to allow your players to send in their suggestions, or more likely complaints (usually about the cost of everything).  Do this, listen to your players and practice Complaint Driven Development.  Your players will love you, they will get the features they want and they generally know more about your game than you do.

and finally

Hopefully these tips are useful, but remember the mobile space is a massively exciting field and a massively fast moving one.  To quote William Goldman – Nobody knows anything.  So if someone tells you a “fact” (even me) about not doing something, but you believe it will work – try it – because for your audience it might just be the difference.  Except for $69.99 IAPs; don’t try those, they are just a plain bad bad BAD idea, but for anything else…

F2P – Whales, Addicts and Dirty Secrets

On the 26th of September the OFT published its interim report into Children’s Online Games [pdf].  The consultation paper is well worth reading, and contains a number of commonsense proposals that the industry would be crazy not to adopt.  That this event went largely unremarked by the gaming press was, to my mind, a big wasted opportunity for the industry to engage with the general public’s concerns regarding games for kids, and monetization in games in general.


Casino Capitalism

I’ve been working in the mobile games space for around a year now, and like everyone else I’ve been wrestling with the problem of monetizing games and users.  Like the majority of studios we want to make fun games that players enjoy and turn a healthy profit into the bargain.  One avenue we’ve explored, and continue to explore, is the F2P Model with its siren call of easy(ish) user acquisition and potentially vast profits (see King, Supercell et al).

Like all business trends and fads F2P has attracted a vast array of acronyms, “monetization consultants”, evangelists, preachers and cranks.  It is extremely difficult to separate the wheat from the chaff, but the principle of F2P is simple enough to understand.

Allow all players to play for free, and attempt to convert a small proportion into paying users by offering them additional content or “value”, typically in the form of coin packs that can be used to pay for non-consumables (eg avatars, vehicles) or consumables (eg powerups, spells etc).

So if I can persuade 1 million people to play the game every month, and further persuade 2% (a typical conversion rate) to spend £1, I can make £20,000 per month.  This simple sum should demonstrate the kind of numbers you need to turn a really good profit.  You want to either have millions of players, or persuade your Paying Users to spend A LOT more than £1, or even better – both.

It is here both ethics and casino capitalism raise their head.  The accusation that is most often flung at F2P games is that they are no more than glorified fruit machines, and it has to be said where there’s smoke there’s fire.


F2P Dirty Secrets

One thing that does the F2P industry no favours is the bracketing of its paying players using the casino terminology of Minnows, Dolphins and Whales – effectively admitting, or at least implying, that the players are punters and that these are not games of skill and chance but nothing more than elaborately designed roulette wheels.  So my first plea would be to drop this terminology from our vocabulary.

Secondly, as the OFT points out, many of these games are not, in fact, free.  Many contain gates and levels that are effectively impassable without spending money.  So it’s an outright lie to describe such a game as “free”.  Sounds dodgy, doesn’t it?

Worst of all, and in my view, the most dangerous and likely to get the industry into big trouble with regulators, is the use of game mechanics to encourage so-called whales to spend vast amounts of money in game.  The cuddly term whale makes us think of the Texan Oil Baron, but the reality is more likely to be a player who may have become addicted to the game and is spending more than they want, made easier by the inclusion of indefensible items like £69.99 coin packs. [see here] [see here] [see here]

I don’t often hear people saying they felt they got “value” from paying money in Candy Crush; the common refrain is “yeah it forced me into it”, “can’t believe I wasted a fiver” etc.  These are straws in the wind.


Duty of care

The F2P industry is a product of the App Stores that spawned it.  The free price point created a race to the bottom, which, coupled with a lack of alternative ways of monetizing, meant that the F2P business model was inevitable, and that the sharks were happy to exploit the whales, or more correctly, the addicts.

Despite what the evangelists say, I struggle to believe any player really wants to spend more than £20 per month on a casual game and still feels good about the purchase next morning.  Given that a £10 per month subscription to Spotify or Netflix offers a huge range of content, a £30 gem pack looks like nothing other than a huge rip off.

For me the future must be in offering players tiered subscription services, allowing players the ability to easily manage their monthly spend, and game makers to turn an ethical profit.  In the meantime ethical game companies should make it easy for players to manage their spending and implement a monthly ceiling that players can’t breach, maybe around £50.  They should remind themselves that they have a duty of care to their players, and that the “whale” spending £200 per month is possibly deeply unhappy.

If things continue as they are, I fear the F2P sharks will poison the waters for everyone else, leading to one almighty backlash and much tougher regulation (deservedly so).  F2P games will find themselves regulated up to the hilt and categorized in the same bracket as bingo and poker games.

Which is why the industry must not turn a blind eye to the public’s concern, and must embrace and implement the guidance contained in the OFT report.  If not, the perception will become that, no matter how carefully curated, the app store is nothing more than a ghetto for games that provide cheap thrills while emptying the wallet.