Super-fast fuzzy search & super slow cooking.

Long-time fishing buddy Paul put me on to Rafi’s Spicebox.

OK – that fish in the picture is far too big to be one of Paul’s, but the hat is just about silly enough to be right.

The spices from Rafi’s are all pre-mixed and ready for adding other ingredients of your choosing, and each pack is labelled with instructions. I’ve made a couple already, and the Malaysian Beef Rendang at the weekend was very good indeed.

They arrive as mix sachets and need a kilo of veg or meat, or both, and are best cooked in a slow cooker for around eight hours. You need to either dish it up to a family or freeze what’s left over.

On fast fuzzy searching, I refer you to the left hand clover leaf labelled Identity Resolution. This is where we tie up data from an incoming document (a scanned invoice maybe) to a record on a host computer (a supplier from the supplier table, or an order – it’s all agnostic and driven by parameters).

If the information to be matched has come from an unstructured document, by scanning and OCR, probably, we need to be able to check the text in the document against the host tables in order to resolve the identity of the document.

This presents a whole host of problems: the addresses may not match word for word in the same order, some words may be lost or mangled in the OCR, and others may just be spelled incorrectly. Depending on the quality of the incoming document, whole batches might never match with equality searching and will need approximate matching as a matter of course.

This is where Fuzzy Matching comes into the equation. It is not a new technology, but the downside is that, because it does so much processor-intensive work, it can be quite slow, even against small target data sets, and the only way to get them matched is with a server process that might take several minutes to process a batch. So, you just sit and wait.
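For a feel of what approximate matching involves, here is a minimal sketch in Python. It is not the Clover Node code; it leans on the standard library's `difflib.SequenceMatcher`, and the word-sorting step, the function names and the 0.8 threshold are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def normalise(text: str) -> str:
    """Lower-case, drop punctuation and sort the words so word order is ignored."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in text.lower())
    return " ".join(sorted(cleaned.split()))


def similarity(a: str, b: str) -> float:
    """Return a similarity ratio between 0 and 1 for two normalised strings."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def best_match(query, candidates, threshold=0.8):
    """Return (candidate, score) for the closest candidate, or None if nothing
    reaches the threshold (assumed value, tune it for your own data)."""
    if not candidates:
        return None
    scored = [(candidate, similarity(query, candidate)) for candidate in candidates]
    best = max(scored, key=lambda pair: pair[1])
    return best if best[1] >= threshold else None


# Hypothetical host table and a mangled OCR read of one of its addresses.
host_addresses = ["12 High Street, Manchester", "7 Station Road, Leeds"]
match = best_match("12 Hihg Street, Manchestr", host_addresses)
```

Every query is compared against every candidate here, which is exactly the processor-intensive part: the cost grows with the size of the host table, so a naive loop like this is fine for a demonstration and far too slow for a million addresses.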

I have been focused on real time fuzzy matching with one million addresses to match against. I have chosen 1M addresses as the test set because it exceeds the largest of our customers’ source tables many times over, and I thought that if I could get to a 10 second match (i.e. 1 second per hundred-thousand addresses) then we would be looking at very quick server matching or even real time identity resolution for our customers.

In testing, on a quick PC, I achieved a match against 1M in under 2 seconds. The impact on the CPU was a quick spike and it sipped RAM, even though I ran the code in one thousand threads.

This puts real-time Identity Resolution and Matching firmly on the table for Softology’s Clover Node. Mind you, I had to fall back on skills I learned thirty-odd years ago to cut the code in x86 assembly.

That was when Information Technology Co. MD Paul (my fishing buddy) of Antar and I raced to produce as much IBM 370 Assembly code in as short a time as possible: a baptism by fire.

Now, we’re very happy to fish in a highly non-competitive way, although my fish are much bigger than his.

And the music? I only nailed that fast, fuzzy search last Monday, so I’ve been feeling pretty triumphant. Consequently, I am listening to King King – raunchy, guitar-driven blues and rock-infused ballads. Love it!


Neural nets, nodes & nourishment – my first development diary entry

I programmed my first Neural Nets in the mid-nineties. They were intended to solve a difficult OCR problem that existing software could not handle.

After much experimentation, I produced a net that recognised the data (a series of digits that were needed to connect the image to a database) correctly, 97% of the time. The other 3%, the system reported as being unreadable. So, from a control point of view, it was 100% accurate.
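The idea of reporting a doubtful read as unreadable, rather than guessing, can be sketched as follows. This is only an illustration of the principle, not the original system: the per-digit scores, the function names and the 0.9 confidence cut-off are assumptions.

```python
def read_digit(scores, min_confidence=0.9):
    """scores holds one number per digit 0-9 (assumed to be the net's outputs).
    Return the winning digit, or None if the net is not confident enough."""
    best = max(range(len(scores)), key=lambda digit: scores[digit])
    return best if scores[best] >= min_confidence else None


def read_reference(per_digit_scores, min_confidence=0.9):
    """Return the whole reference as a string, or None (unreadable) if any
    single digit falls below the confidence cut-off."""
    digits = [read_digit(scores, min_confidence) for scores in per_digit_scores]
    if any(digit is None for digit in digits):
        return None
    return "".join(str(digit) for digit in digits)
```

A rejected reference goes to a person for keying, so a cut-off like this trades a few extra manual reads for fewer wrong links to the database.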

Alex (No. 1 Son) was around 12 years old and fascinated by what I was doing. After explaining how it worked, I suggested that he construct and experiment with an example. I thought that would be the last I would hear, but a few hours later he was showing me a small, trainable, perceptron-based neural network on his Peacock 486DX, which he had programmed in QBasic.
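For anyone who has not met one, a trainable perceptron is only a handful of lines. The sketch below is in Python rather than QBasic and is not Alex's program; it uses the classic perceptron learning rule, and the AND-gate training data, the learning rate and the epoch count are assumptions picked to keep the example small.

```python
def predict(weights, bias, inputs):
    """Fire (return 1) when the weighted sum of the inputs plus bias is positive."""
    activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if activation > 0 else 0


def train_perceptron(samples, epochs=20, learning_rate=1.0):
    """Nudge the weights towards the target every time the output is wrong."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - predict(weights, bias, inputs)
            for i, x in enumerate(inputs):
                weights[i] += learning_rate * error * x
            bias += learning_rate * error
    return weights, bias


# Hypothetical training set: the logical AND of two inputs.
AND_GATE = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(AND_GATE)
```

A single perceptron can only learn patterns that a straight line can separate, so AND works but XOR does not, which is why practical nets stack many of these units in layers.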

Doubtless, that was what piqued his interest in all things to do with the dark art of machine intelligence. I say dark because so much of the programming is undertaken when the sun has set, although, in Manchester, the sun is not out so often.

Programming is much more than a job for me; it is a consuming passion. Likewise food, which is more of a consumable passion, and music, which has been a constant companion. In his car-seat Alex used to bop to Mozart (Mo-tart), Haydn, Beethoven and Bros, and the girls favoured singing along to Van Halen and Rainbow.

Over the last few years the soundtrack to my days and nights at the keyboard has been solo piano music, piano jazz trios and plenty of guitar rock, blues and jazz. These three pastimes, more than anything else, thread their way through our family life, and, in many ways, bind us together and feed our strength.

This blog was originally intended for customers to track the development of Softology’s new project, Clover Nodes, which Alex and I are designing and programming together with my Softology co-director Martin Purdy. Writing solely about tech is pretty dull, and reading about it is duller, or so I’m told. Therefore, I decided I would wrap it up with writings about food and music, and keep it more of a periodic diary.

Whilst writing, I have a chorizo and bean soup simmering away, and I am listening to some pretty good funk-fuelled jazz – Aziza. Nice!
The soup is made from a sliced spicy chorizo, gently fried in onions and garlic with paprika and chilli to taste. Sometimes, I chuck in cumin and/or fennel seeds.

To that I add a couple of chopped up potatoes (adjusted for volume) in similar proportion to butter beans and cannellini beans (or other beans). That is topped up with chicken stock and a couple of tablespoons of tomato puree, and left to simmer for around 30-45 minutes.

I suspect the soup would be pretty good if made with some chicken pieces fried up with the spices. I’ll have to try that soon.

In the nineties, I concluded that neural nets were very much a solution looking for a problem. There has been a massive increase in processing power since then, especially with high-performance parallel processing video cards. My graphics card has 3,584 CUDA cores. This is a game changer and, once again, questions such as:

  • will computers become more intelligent than their creators
  • will robots take all our jobs, and
  • will computers ever become conscious in some way

have bubbled up into public forums.

These were the questions being asked during the eighties, during the time of the Alvey Programme.
They are fascinating questions regarding science, philosophy, psychology, society and technology that deserve serious debate, but, right now, my soup is ready; its scent has permeated my office and I’m starving. So … laters!