Image Recognition in Manufacturing 

Image recognition systems can lead to significant improvements when monitoring manufacturing processes or identifying product quality issues.

Images created on the production line are usually highly standardized, because they are captured under constant conditions. It is also very common that a large number of images annotated by human experts is already available.

When enough annotated and highly standardized images are available, the image recognition system is much easier to set up. Under these circumstances a camera module can be considered as "just" another sensor, albeit one with a very wide (information-intensive) data stream. This data stream is in itself highly complex and has to be condensed by an ML or AI model before it can be used to trigger any actions.

Image Recognition with Traditional Machine Learning 

When the images are highly standardized, it can be sufficient to use a traditional ML model to condense the data stream down to easier-to-interpret signals.

Standardization can mean the following:

  • The number of observed items is kept constant (for example at one).
  • The position and orientation of the items are fixed.
  • The cameras, camera types and configurations are unified.
  • The lighting is kept constant by using artificial lights.

In any scenario, the relationship between the individual pixels and the outcome (the signal to act upon) is highly complex.

The model (ML or AI) uses the pixels, interprets them as signals and learns, during model training, how these signals relate to a target outcome that is known for the training examples. A "traditional" artificial neural network, for example, takes (slightly simplified) all signals as its inputs, aggregates them and transforms them in a non-linear fashion to generate an output signal that corresponds with the observed targets.
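
As a rough illustration of this idea (not taken from any specific project), the sketch below trains a small feed-forward network on flattened pixel values. The image size, the random placeholder data and the binary OK / not-OK target are assumptions.

```python
# Minimal sketch: a traditional feed-forward network on flattened pixel values.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
images = rng.random((200, 32, 32))       # placeholder for standardized line images
labels = rng.integers(0, 2, size=200)    # placeholder target, e.g. OK / not OK

X = images.reshape(len(images), -1)      # every pixel becomes one input signal
X_train, X_test, y_train, y_test = train_test_split(X, labels, random_state=0)

model = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0)
model.fit(X_train, y_train)              # learns the pixel-to-outcome relationship
print("test accuracy:", model.score(X_test, y_test))
```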

In simple scenarios this can be rather easy to understand: for example, when baking a cake, it is possible to estimate its readiness based on the sum of all brown color hues. Not a simple conclusion, but not a highly complex one either.
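
To make the cake example concrete, a "brownness" signal could be computed roughly as sketched here; the HSV thresholds, the file name and the decision threshold are purely assumed values for illustration.

```python
# Illustrative sketch only: estimate readiness from the share of brown-ish pixels.
import numpy as np
from PIL import Image

def brownness(path: str) -> float:
    hsv = np.asarray(Image.open(path).convert("HSV"), dtype=float)
    h, s, v = hsv[..., 0], hsv[..., 1], hsv[..., 2]
    # Rough "brown" band (hypothetical thresholds): orange-ish hue,
    # some saturation, medium-dark value.
    brown = (h < 35) & (s > 80) & (v > 40) & (v < 180)
    return float(brown.mean())           # fraction of brown-ish pixels, 0.0 to 1.0

# Hypothetical decision rule: treat the cake as done above some threshold.
# if brownness("cake.jpg") > 0.4: ...
```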

Complex scenarios 

Traditional artificial neural networks and other ML model types can only get you so far. In more complex scenarios they become unreliable or require enormous amounts of training data.

What is meant by "complex scenarios"? This can of course cover many different variations, but let's look at an illustrative example:

Simple: an item in a fixed position and with a precise orientation

Complex: an item lies wherever it fell (for example on a conveyor belt)

The first case could still be handled by an ML model, but the second one only under the condition that we expand the training data and train with drastically more examples. We basically multiply the amount of data by the number of possible positions and rotation angles, which can quickly become infeasible. (The alternative of modifying and "straightening out" the images is possible, but in itself highly complex as well.)
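
The sketch below shows why the amount of data explodes with this brute-force expansion; the angle step and the pixel shifts are arbitrary assumptions.

```python
# Sketch of the brute-force option: multiply each annotated image by many
# synthetic rotations and positions.
from PIL import Image, ImageChops

def expand_training_image(path: str, angle_step: int = 15, shifts=(-20, 0, 20)):
    base = Image.open(path)
    variants = []
    for angle in range(0, 360, angle_step):          # 24 rotations
        rotated = base.rotate(angle)
        for dx in shifts:                            # 3 horizontal offsets
            for dy in shifts:                        # 3 vertical offsets
                variants.append(ImageChops.offset(rotated, dx, dy))
    return variants                                  # 24 * 3 * 3 = 216 images from one

# Each annotated example turns into 216 training images; with finer angle and
# position steps the amount explodes, which is why this quickly becomes infeasible.
```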

In this second case, AI models are the way to go. They are much closer to the way humans recognize and process images. For humans, interpreting images with varying orientation is usually trivial: a cat from the left and a cat from the right are (apart from aspects of superstition) equally interpreted as cats.

Modern AI-based approaches 

Such AI models for image recognition process the image data through a number of different, specifically designed layers; these networks are called convolutional neural networks (CNNs), after the convolution layers at their core. The individual layers are not all similar in structure (as they are in traditional ML models), but are instead created for specific tasks, such as edge detection or the aggregation of detected entities. Because the network consists of very many layers, it must be trained with specialized (deep learning) training algorithms.
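
As an illustration only (the article does not prescribe a specific architecture), a very small CNN in PyTorch could be stacked from such task-specific layers; the layer sizes, the 64x64 grayscale input and the two output classes are assumptions.

```python
# Minimal CNN sketch with assumed architecture and input size.
import torch
from torch import nn

cnn = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1),   # learns local patterns such as edges
    nn.ReLU(),
    nn.MaxPool2d(2),                              # aggregates / condenses detected features
    nn.Conv2d(16, 32, kernel_size=3, padding=1),  # combines edges into larger structures
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(32 * 16 * 16, 2),                   # final decision layer, e.g. OK / defect
)

dummy = torch.randn(1, 1, 64, 64)                 # one fake 64x64 grayscale image
print(cnn(dummy).shape)                           # torch.Size([1, 2])
```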

This design makes it possible to process images in a way that comes much closer to actual understanding, similar to how the human brain processes images.

Off-the-Shelf models 

Training such models (deep, complex neural networks) is highly resource-intensive (in time, processing and energy), even with modern optimized algorithms, and requires large amounts of training examples.

Fortunately, it is possible to take a "shortcut". In the community you can find pre-trained, generalized models that can be adapted to your own use cases. This is a very smart and efficient approach, and interestingly very close to principles of learning known for millennia: learning something by adding it to existing knowledge is much easier than starting from scratch. As newborns we take years to learn to see and understand our surroundings, but as grown-ups a new "impression" often takes us just a moment to process and internalize.

This is how we can use off-the-shelf models: we take a generalized model built to process visual impressions (based on previous training with many examples), with all the general concepts of image processing already in place, and we only need to add our domain- or problem-specific images.

This usually means cutting off the outer layers of the network, replacing them with layers specific to our scenario, and then training only these new layers while leaving the inner, pre-trained part of the network intact. This approach makes it possible to reach good results with only a limited number of our own examples and limited processing power.
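
A minimal sketch of this approach, assuming a pre-trained ResNet-18 from torchvision and a two-class scenario, could look like this:

```python
# Transfer-learning sketch: reuse a pre-trained backbone, replace only the output layer.
import torch
from torch import nn
from torchvision import models

# Pre-trained, generalized "off-the-shelf" backbone.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

for param in model.parameters():                  # freeze the inner, pre-trained part
    param.requires_grad = False

model.fc = nn.Linear(model.fc.in_features, 2)     # new scenario-specific output layer

# Only the new layer is trained; the rest of the network stays intact.
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
```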

 

Did we get your attention? Are there image recognition scenarios your organization could benefit from? Please get in touch!

Image recognition is a very broad field that can be tackled with different tools and approaches. The deployment of models – how to bring them into production – can also differ considerably depending on the scenario. StatSoft will gladly assist you in any image recognition project.