Can a neural net be used to find established companies that exhibit innovative behaviour?

Foresight Si can be used to acquire detailed measurements from many thousands of companies at once. We decided to test the feasibility of using these data to train a neural net to detect commercially relevant attributes. As a test case, we chose to discriminate between companies on the basis of their innovative potential, as derived from our data. Academic research undertaken in Germany has already demonstrated that a neural net can be trained to process the digital traces of companies and use these data to find “innovative” companies. That earlier research used training data derived from a sample of companies drawn from a national survey of business innovation. Our test used an alternative source of training data, because the UK does not have survey data of the scale and quality available in Germany. Consequently, we must be careful about drawing conclusions from this study, which was undertaken as a proof of concept within an internal feasibility study.

To provide the system with learning data we had to look for alternative sources, as the granular data on company innovation available in Germany do not exist in the UK. We needed a publicly available dataset of companies with a verifiable indicator of a (relatively) high level of innovative behaviour, which could then be associated with our measures of a company’s web presence. For our initial research we settled upon the public record of companies that have been awarded funding by Innovate UK, the UK Government’s funding arm for supporting business innovation. Unfortunately, we could not access a database of companies that had applied for funding but were unsuccessful; this would have helped with training the model. However, we were able to take the list of successful companies and quickly identify more than 7,000 of them. These were matched with their URLs using our system, and the associated web sites were then retrieved and the digital materials processed.

Using a conventional split training method we were able to create a model capable of achieving a hit rate in excess of 90%. In other words, working from materials retrieved from web sites alone, the model correctly identified a company that had been successful in a bid for innovation funding better than 9 times out of 10. We now had a neural net capable of identifying, to some degree, a company’s innovative potential simply from the web materials that the company maintains. This result is important because it adds to the many converging lines of evidence that point to the significance and utility of data derived from the Internet presence of businesses.

This simple experiment opened up a number of possibilities. Our first step was to extend the study by using the model to examine a very large sample of businesses in a specific area of activity, to see what it could tell us about innovation in that sector. We looked at a very large sample of Welsh food companies. Applying the neural net to the data from the entire set gave us a score, or probability, for every business: essentially the likelihood that it would be deemed innovative on the basis of the data we had retrieved. The results are still being examined, but they already look very interesting. These data can only serve as a proxy indicator for the level of innovative behaviour in our sample, and more work needs to be done. Nevertheless, it is interesting to reflect on those businesses in our sample of Welsh food businesses that were identified as “innovative”.
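
For readers who want to see the mechanics, the split training and scoring steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the model we built: a single logistic unit stands in for the neural net, the two numeric features and the synthetic data are invented, and the comparison group of companies without a recorded award is hypothetical. The "hit rate" is read here as the share of funded companies in the held-out set that the model flags, which is one plausible interpretation of the figure quoted above.

```python
import math
import random


def make_synthetic_rows(n_per_class=100, seed=1):
    """Invented data: (features, label) pairs, label 1 = funded company."""
    rng = random.Random(seed)
    rows = []
    for label, centre in ((1, 1.0), (0, -1.0)):
        for _ in range(n_per_class):
            features = [rng.gauss(centre, 0.3), rng.gauss(centre, 0.3)]
            rows.append((features, label))
    return rows


def split_rows(rows, test_fraction=0.25, seed=0):
    """Shuffle, then hold back a fraction of rows for testing."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


def sigmoid(z):
    # Branching keeps math.exp from overflowing for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(weights, bias, features):
    """Score between 0 and 1: estimated probability of the positive label."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, features)))


def train(rows, epochs=50, learning_rate=0.1):
    """Fit a single logistic unit by stochastic gradient descent."""
    weights = [0.0] * len(rows[0][0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in rows:
            error = predict_proba(weights, bias, features) - label
            weights = [w - learning_rate * error * x
                       for w, x in zip(weights, features)]
            bias -= learning_rate * error
    return weights, bias


def hit_rate(weights, bias, rows, threshold=0.5):
    """Share of labelled-positive rows scored at or above the threshold."""
    positives = [features for features, label in rows if label == 1]
    if not positives:
        raise ValueError("no positive examples to evaluate")
    hits = sum(predict_proba(weights, bias, f) >= threshold for f in positives)
    return hits / len(positives)


rows = make_synthetic_rows()
train_rows, test_rows = split_rows(rows)
weights, bias = train(train_rows)
rate = hit_rate(weights, bias, test_rows)

# Score an unlabelled (hypothetical) sector sample, one probability per company.
sector_sample = [[0.9, 1.2], [-1.1, -0.8], [0.1, -0.2]]
scores = [predict_proba(weights, bias, f) for f in sector_sample]
print(f"hit rate on held-out set: {rate:.2f}")
print("sector scores:", [round(s, 3) for s in scores])
```

The held-out set plays the role of the test half of the split: the model never sees those companies during training, so the hit rate measured on them is a fairer guide to how it might behave on a new sample such as a sector survey. The synthetic classes here are deliberately easy to separate, so the figure this sketch prints says nothing about real performance.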