Curate Blog

Desk of the Developer: To AI or not to AI

Written by Dale Willis | Sep 20, 2018 11:05:19 PM

Powering Curate is innovative tech created and made better every day by our Chief Technical Officer, Dale Willis, and the rest of our development team. We’ll periodically stop by the Desk of the Developer to show you what the team is up to, and how their changes are helping our customers become (and stay) the local experts in their market.

Sci-fi flicks aside, artificial intelligence (AI) doesn’t exclusively equal “robot”. It is everywhere around you, from Siri on your iPhone to the searches you make on Google. AI is designed to help you automate mundane and routine tasks, so you can focus more of your time and energy on “on-the-fly” responsibilities, such as writing an email to a real estate developer or calling an architect in your network: things only you would know how to react to in the moment.

For the first installment of our Desk of the Developer series, CTO Dale Willis took some time to explain exactly how we’re using AI, the foundation of our software, at Curate and how it makes our job (and, ultimately, yours!) a whole lot easier.

<h1> From the Desk </h1>

I think that AI is one of the single most important topics to discuss today. It has the power to revolutionize many aspects of our daily life.

On a weekly basis, Curate scans all publicly available meeting minutes from municipalities within a state, looking for insights for our customers. Every new document we discover needs to be searched for relevant content. Doing this by hand would be a daunting task: we see nearly 100,000 new documents each week in the Midwest alone.

With that many new documents every week, we need a way to narrow down what we search for. In most cases for us, that means discussions about upcoming construction projects. To show the power of AI, I’ll discuss how this task would be done without and with AI.

<b><u> Without AI </u></b> we would come up with a list of words that could be relevant to our customers, perhaps “new building”, “renovation”, and “building addition”. Now, we could tell our tech to search through all of our new documents and only focus on the documents that contain those keywords. However, there are a few reasons why this “brute force search” breaks down very quickly.

For instance, what if a document used different wording but, in the context of the discussion, meant the same thing? Without AI, the only solution would be to build and maintain a massive list of keywords.

As you can imagine, this process would be highly inefficient. It would be like telling a police dog to look for a specific clothing item of a missing person rather than their scent. If you told the dog to look only for the missing person’s jacket, he would completely overlook their glove lying on the ground. But if the dog were searching by scent, he would surely know that the glove belonged to the missing person just as much as the jacket.
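To make the keyword approach described above concrete, here is a minimal Python sketch. It is not Curate's actual code: the function name `matches_keywords` and the sample sentences are made up for illustration, and only the three keywords come from the example above.

```python
# Keyword list from the example above; the sample sentences are hypothetical.
KEYWORDS = ["new building", "renovation", "building addition"]


def matches_keywords(text, keywords=KEYWORDS):
    """Return True if the text contains any keyword as an exact phrase."""
    lowered = text.lower()
    return any(keyword in lowered for keyword in keywords)


documents = [
    "The new building permit was filed.",
    "The board approved construction of a second gymnasium.",
]

for doc in documents:
    # The second sentence is about construction, but it uses none of the
    # listed phrases, so an exact keyword search skips it.
    print(matches_keywords(doc), "-", doc)
```

The second sentence is the glove on the ground: clearly relevant, but invisible to a search that only knows the jacket.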

<b><u> With AI </u></b> we can use something called a “word-embedding space” to make our searching more efficient. This is an exciting new area of AI that converts words into numbers in a really smart way, so we can look for keywords and their “relatives”: keywords that have a similar meaning in context but aren’t worded the same. Word embeddings assign numbers to words so that words that tend to be related in English get numbers that are close together, such as the words in the diagram below.



For instance, the sentences “The new building permit was filed” and “The building addition permit was filed” are both very simple sentences, which show that “new building” and “building addition” can be interchangeable in certain cases. Based on this example, the word-embedding numbers for “new building” and “building addition” would be really close together, maybe 1.3 and 1.35. The numbers themselves are not important; they could just as easily be 354.6 and 354.7. The important thing is that the closer the numbers are, the more related the words are.

Now, if we wanted to search for words similar to “new building”, we could search for words whose numbers are close to the number for “new building”. Say, everything between 1.25 and 1.35 would match our search. As long as our word-embedding space is accurate, all we need to do is convert each word in the document into its corresponding number and then find all the documents that contain a number close to what we are interested in.
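The search described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not Curate's implementation: the values for “new building” and “building addition” come from the example above, every other phrase and number is invented, and the helper names `phrases_in_range` and `document_matches` are hypothetical. Real word embeddings also use many numbers per word rather than one; the single number here follows the simplification used in this post.

```python
# Toy one-number "embeddings". Only the first two values come from the
# example in the text; the rest are made up for illustration.
EMBEDDINGS = {
    "new building": 1.30,
    "building addition": 1.35,
    "new construction": 1.28,
    "budget hearing": 4.10,
    "parking ordinance": 7.80,
}


def phrases_in_range(low, high, embeddings=EMBEDDINGS):
    """Return the phrases whose number falls between low and high."""
    return sorted(p for p, value in embeddings.items() if low <= value <= high)


def document_matches(text, related_phrases):
    """Return True if the text mentions any of the related phrases."""
    lowered = text.lower()
    return any(phrase in lowered for phrase in related_phrases)


# Everything between 1.25 and 1.35 counts as "close to" new building.
related = phrases_in_range(1.25, 1.35)
print(related)

print(document_matches("The building addition permit was filed.", related))
print(document_matches("The parking ordinance was amended.", related))
```

Unlike a hand-maintained keyword list, the list of related phrases falls out of the numbers themselves, so nobody has to think of “building addition” in advance.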

This is very similar to how humans would perform this task: If I gave you a document and said, “Please find all sentences discussing new construction,” undoubtedly, even if the words “new construction” didn’t show up in the sentence, you would find sentences that are relevant because they are related to the topic of “new construction”.

But we don’t have time to sit down and do this for thousands of documents each week ourselves, which is why we rely on word-embedding AI. It uses numbers to pick out the most relevant information for us, a process that is faster, more accurate, and more consistent than humanly possible.

Hopefully, this sheds a little light on exactly how AI is used today, and how we’re using it at Curate.

Cheers,
Dale