Learning Image Classification on edge devices (Android)
A few weeks ago, I wrote a tutorial exploring Image Classification, one of the most popular Machine Learning applications, deployed on a tiny device, the ESP32-CAM. It was an example of a TinyML application.
When we talk about TinyML, what immediately comes to mind are compressed machine learning models running on embedded devices and consuming very little power. The defining characteristic of such applications is that they run AI (or Machine Learning) at the Edge. But power is not always a concern, so we can also find edge machine learning applications running on more complex devices such as the Raspberry Pi (see my tutorial Exploring AI at the Edge) or even smartphones. In short, TinyML can be considered a subset of EdgeML applications. The figure below illustrates this statement:
This project will explore an EdgeML application: classifying images on an Android device.
Developing Android (AI) Apps
Nowadays, developing Android apps in Java or Kotlin with Android Studio is not complicated, but you need to work through some tutorials to become proficient. If you need to develop truly professional applications, Laurence Moroney teaches an excellent course, available for free on Coursera: Device-based Models with TensorFlow Lite.
But if you are not a developer, do not have the time, or only need a more straightforward app that can be quickly deployed, the MIT App Inventor should be your choice.
MIT App Inventor is an intuitive, visual programming environment that allows everyone – even children – to build fully functional apps for Android phones, iPhones, and Android/iOS tablets.
Only basic AI Applications are available with MIT App Inventor, such as Image and Sound classification, Pose Estimation, etc.
To start, you can optionally follow this tutorial, available on the MIT App Inventor site, which goes step by step through creating a general Image Classification App that runs on your Android device. In that project, the MobileNet model was pre-trained on the ImageNet dataset, whose 999 classes can be checked here. I left the project code (.aia) and the executable (.apk) of my version of this App on my GitHub.
But what we will explore in this tutorial is how to use our own images to train a machine learning model to be deployed on an edge device, in this case, an Android tablet.
Fruits versus Veggies – Image Classification
As in every machine learning project, we should start by training a model and then proceed with inference on the edge device (Android). For training, we need to find some data (in fact, tons of data!).
With TinyML, we should limit the classification to three or four categories due to hardware limitations (mainly memory). With EdgeML, we usually do not need to be so concerned, because the devices generally have MBs of memory available instead of the few KB found on embedded ones. But to allow a comparison with the last TinyML project, we will only differentiate apples from bananas and potatoes, same as before.
We will use the Kaggle dataset that includes images from those categories:
This dataset contains images of the following food items:
- Fruits – banana, apple, pear, grapes, orange, kiwi, watermelon, pomegranate, pineapple, mango.
- Vegetables – cucumber, carrot, capsicum, onion, potato, lemon, tomato, radish, beetroot, cabbage, lettuce, spinach, soybean, cauliflower, bell pepper, chili pepper, turnip, corn, sweetcorn, sweet potato, paprika, jalepeño, ginger, garlic, peas, eggplant.
Each category is split into train (100 images), test (10 images), and validation (10 images) sets.
- Download the dataset from the Kaggle website to your computer.
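Since we only need three of the categories, it can help to copy a small subset of the download into its own folder before training. The sketch below is one possible way to do that in Python. It assumes the dataset was unzipped into a train/<class name>/ folder layout with class folders named apple, banana, and potato; the function name collect_subset and all folder names are hypothetical, so adjust them to what you actually have on disk.

```python
import shutil
from pathlib import Path

# Assumption: the three classes of this project, spelled like the dataset folders.
CLASSES = ("apple", "banana", "potato")
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def collect_subset(dataset_dir, output_dir, classes=CLASSES, per_class=50, split="train"):
    """Copy up to `per_class` images of each class into output_dir/<class>/.

    Assumes the layout dataset_dir/<split>/<class>/<image files>.
    Returns a dict mapping each class name to the number of images copied.
    """
    copied = {}
    for name in classes:
        source = Path(dataset_dir) / split / name
        target = Path(output_dir) / name
        target.mkdir(parents=True, exist_ok=True)
        images = []
        if source.is_dir():
            # Sort so that the same images are selected on every run.
            images = sorted(
                p for p in source.iterdir() if p.suffix.lower() in IMAGE_SUFFIXES
            )
        selected = images[:per_class]
        for image in selected:
            shutil.copy2(image, target / image.name)
        copied[name] = len(selected)
    return copied
```

Calling, for example, collect_subset("my_download_folder", "subset_for_training") would leave one folder per class, each with at most 50 images, ready to be used in the next step.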
Using PIC – Personal Image Classifier for Training
The Personal Image Classifier (PIC) is an educational machine learning tool built by Danny Tang in the MIT App Inventor lab that allows users to customize some hyperparameters during training.
Go to this link to open the PIC in your browser (Chrome works well).
The PIC Front End Training tool can be divided into three main tasks:
- Training: Users specify classification labels and upload or record images for each label.
- Testing: After training the custom model, PIC provides an interface for webcam inputs and classifies the test images, showing the user the classification confidences.
- Export: After testing a custom PIC model, users can export their models for use in the MIT App Inventor PIC Extension. This allows App Inventor applications to provide inputs to a pre-trained PIC model and perform actions based on the model’s output classifications.
We should use the (+) button to specify classification labels and upload or record images for each label. Try to have at least 50 images for each class. I tested with fewer than that, but the more images, the better.
Note that, besides the three categories defined previously, I added a fourth one, “Background”, that contains images of my desk (that is, with no fruits or vegetables).
As mentioned before, classifying images is one of the most common uses of Deep Learning, but a lot of data is needed to accomplish this task. We have only a few dozen images for each category. Is this number enough? Not at all! We would need thousands of images to “teach” our model to differentiate an apple from a banana. But we can solve this issue by re-training a model that was previously trained on thousands of images. We call this technique “Transfer Learning” (TL).
With TL, we can fine-tune a pre-trained image classification model on our data, reaching a good performance even with relatively small image datasets (our case). PIC uses TL over a MobileNet model pre-trained with the ImageNet dataset.
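To make the idea more concrete, here is a toy sketch of transfer learning in plain Python. It is not PIC's code and does not use MobileNet: the function frozen_features is a made-up stand-in for the pre-trained base model, the "images" are just short lists of pixel values, and the class names are invented. What it does show is the core of the technique: the base is left untouched, and only a small classification head is trained on a handful of examples.

```python
import math

CLASS_NAMES = ("dark", "bright", "high-contrast")  # invented toy classes

# Two tiny "images" (lists of pixel values) per class: (image, class index).
TRAINING_SAMPLES = [
    ([0.1, 0.1, 0.2, 0.1], 0), ([0.0, 0.1, 0.1, 0.2], 0),
    ([0.9, 0.8, 0.9, 1.0], 1), ([0.8, 0.9, 0.9, 0.8], 1),
    ([0.0, 1.0, 0.0, 1.0], 2), ([1.0, 0.1, 0.9, 0.0], 2),
]


def frozen_features(image):
    """Hypothetical stand-in for the pre-trained base; it is never trained here."""
    mean = sum(image) / len(image)
    spread = max(image) - min(image)
    return [mean, spread, 1.0]  # the trailing 1.0 acts as a bias input


def softmax(scores):
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def head_probabilities(weights, features):
    return softmax([sum(w * v for w, v in zip(row, features)) for row in weights])


def train_head(samples, num_classes, epochs=300, learning_rate=0.5):
    """Train only the classification head on top of the frozen features."""
    data = [(frozen_features(image), label) for image, label in samples]
    size = len(data[0][0])
    weights = [[0.0] * size for _ in range(num_classes)]
    for _ in range(epochs):
        for x, label in data:
            probs = head_probabilities(weights, x)
            for c in range(num_classes):
                # Gradient of the cross-entropy loss with a softmax output.
                error = probs[c] - (1.0 if c == label else 0.0)
                for i in range(size):
                    weights[c][i] -= learning_rate * error * x[i]
    return weights


def predict(weights, image):
    """Return (class index, probability) of the most probable class."""
    probs = head_probabilities(weights, frozen_features(image))
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs[best]


weights = train_head(TRAINING_SAMPLES, num_classes=len(CLASS_NAMES))
index, probability = predict(weights, [0.1, 0.9, 1.0, 0.0])
print(CLASS_NAMES[index], round(probability, 2))
```

Because the base already produces useful features, six examples are enough for the head in this toy case. The same reasoning is why a few dozen images per class can work in our project, while training a full network from scratch would need far more data.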
After populating each label with image examples, it is possible to customize some hyperparameters before training. For that, use the [Custom] button. When you are ready, use the [Train Model] button.
Optionally, it is possible to train the model using an older version of PIC. With this previous version, you have more control over the model design and hyperparameter selection.
After training the custom model, PIC provides an interface for webcam inputs and classifies the test images, showing the user classification confidences.
After testing some images, PIC also reports the test error metrics:
After testing a custom PIC model, you can export the trained model (model.mdl), downloading it to your computer. For that, use the button [Export Model].
The created file (model.mdl) will be used with the MIT App Inventor PIC Extension, allowing App Inventor applications to provide inputs to a pre-trained PIC model and perform actions based on the model’s output classifications.
The PIC Extension
App Inventor apps are built using components. Components let the apps use the built-in features of the mobile device (like Camera or LocationSensor) or services on the Web (like Twitter or FusionTables). There have been many requests to include additional features in App Inventor.
App Inventor Extensions let anyone create Extension Components. Extension components can be used in building projects, just like other components. The difference is that extension components can be distributed on the Web and loaded into App Inventor dynamically: they do not have to be built into the App Inventor system, and they can be imported into projects as needed. With extensions, the range of App Inventor apps can be virtually unlimited.
Creating extension components requires familiarity with the App Inventor source code (located on GitHub) and programming the extension in Java. Extension components are packaged as .aix files. Once you create an extension component, anyone can use it in their App Inventor projects.
Some extensions can be found at the MIT App Inventor Extensions repo.
For this project, we should use the personalImageClassifier.aix extension file.
The goal of the PIC extension is to allow us to build an App Inventor application that can take in a pre-trained model and make predictions using images from the device camera.
The main tasks of the extension are:
- First, allow users to load in a pre-trained, exported PIC model (model.mdl).
- Provide a function to allow users to toggle between front and back cameras
- Provide an interface for users to classify images from the camera.
- Provide an interface for users to record audio clips.
- Display model predictions and execute user-defined application functionality based on model results.
- Generate error codes
The image below shows the main extension blocks available:
Creating the Image Classification App
In your browser, go to https://appinventor.mit.edu/, use the [Create Apps] button and, on your account page, start a new project named “fruits_vs_veggies”, or import the final project (fruits_vs_veggies.aia) from my GitHub.
Import the personalImageClassifier.aix extension into your project. You can do it by pasting the URL as shown below or by uploading the file from your computer. Next, you must upload your trained model to the new component.
Start dragging the non-visible components to your App:
- PersonalImageClassifier1: Extension
- TextToSpeech1: Media
The Layout components are up to you. You can inspect my project as a reference.
Our App will have components that will be updated after inference:
- StatusLabel: where the inference label result will be displayed (start with “ ”)
- StatusProb: where the probability of the Top 1 result will be displayed (start with “ ”)
- WebViewer1: where the camera image will be displayed
- ErrorCode: label where any error code will be displayed (start with “Waiting…”)
- ClassifyButton: command to capture the image and send it to the classifier
- ToggleButton: command to toggle the camera (back/front)
- SpeakButton: command to toggle text-to-speech, so that the inference result is also presented by voice.
Below is how the App should look:
Note: It is very important that the WebViewer1 component be connected to the extension, through one of the properties of PersonalImageClassifier1.
Start the Blocks design by initializing the global variables
and by catching any possible error and displaying it on the ErrorCode label.
Next, initialize the classifier, enabling the ClassifyButton and displaying “Ready” on the ErrorCode label:
At this point, three command options are available to the user:
1. Toggle the camera
2. Toggle text-to-speech (the button switches ON -> OFF or OFF -> ON)
3. Perform inference
When the ClassifyButton is pressed, the current frame from the camera is captured and sent to the classifier to be classified by the model (inference). The result is a list of 2 items (stored in the global variable label), where index 1 holds the “Top 1” most probable class and index 2 holds its probability.
We should get the class (index 1) and the probability (index 2) and display them on their corresponding labels (StatusLabel and StatusProb). If the global variable speak is ON, the result is also passed to TextToSpeech1 to be converted into audio.
Here you can see an overall view of all the code:
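For readers who prefer text to blocks, the result handling just described can be sketched in a few lines of Python. This is only an illustration of the flow, not code generated by App Inventor: the function names handle_result and toggle and the two-decimal formatting of the probability are assumptions, and the dictionary keys simply reuse the component names from the Designer.

```python
def handle_result(result, speak):
    """Split the classifier result and prepare the label and speech updates.

    `result` is the two-item list [class, probability]. App Inventor lists
    are 1-based, so its "index 1" and "index 2" are result[0] and result[1].
    """
    label = result[0]               # "Top 1" class name
    probability = float(result[1])  # its probability
    return {
        "StatusLabel": label,
        # Assumption: show the probability with two decimals.
        "StatusProb": f"{probability:.2f}",
        # Text handed to TextToSpeech1, or None when speech is toggled OFF.
        "TextToSpeech1": label if speak else None,
    }


def toggle(flag):
    """Flip an ON/OFF flag, as the SpeakButton does with the speak variable."""
    return not flag


speak = toggle(False)  # the user switches speech ON
print(handle_result(["banana", 0.9731], speak))
```

Keeping the speech decision next to the label updates mirrors the blocks: a single event handler updates the screen and, only when the flag is ON, also sends the class name to be spoken.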
Deploying the App on Android devices
Once your project is finished, it is time to deploy it on an Android device and perform real tests. For that, go to the Build option on the top menu and select Android App (.apk):
You will be prompted with 2 options:
- Download the .apk file
- Use the QR Code to install the App directly to your device.
The easiest way is to use the QR code option. For that, first (if you have not done it before), go to the Play Store and download the App Inventor Companion to your device:
If your Android device is currently being used as the AI Companion (for debugging/tests during development), stop that session first.
With the MIT AI2 Companion app running, use the scan QR code option to scan the QR code generated before (when building the App).
Grant the usual permissions for the .apk installation, and that’s it! After that, your Android device will be ready to classify images!
Here are some examples of inference in the real world:
That’s it! Your App is running and performing inference at the Edge! The machine learning model is working without any connection to a big server or the Web!
Having a smart device running inference at the Edge is very important. For example, we can verify whether a plant has a disease, as in this project, developed with a Bean dataset created by the AIR Lab at Makerere University, Uganda.
In the case of the original project “Detecting Diseases in the Bean plants”, the App was developed by the Google team using Android Studio. Still, by downloading the dataset to your machine, you could repeat the same steps used to train the fruits vs. veggies App and create a Bean Disease Detector App.
The MIT App Inventor is a quick way to deploy Machine Learning models at the Edge. On my GitHub repository, you will find the latest version of the code: APP_Inventor-ML_Projects.
Saludos from the south of the world!
See you at my next project!