Case Study: SaferNet with AI
Contents
1.1 Introduction
1.2 Problem Statement
1.3 Real-world/Business Objectives and Constraints
2.0 Mapping the real-world problem to an ML problem
2.1 Performance Metric
3.0 DataSet
3.1 Preprocessing
4.0 Model Building
4.1 Custom CNN
4.2 Transfer Learning
4.3 MobileNet
4.4 Custom MobileNet
4.5 Model Interpretation
5.0 Deployment
5.1 Extension
5.2 Extension Demo
6.0 Future Work
Bibliography
1.1 Introduction
Today anyone can upload almost anything to social networks, without considering that children, or people who simply do not want to see such content, may be watching. To address this, we need a solution that filters content in real time and with high accuracy. We build a Chrome extension that detects NSFW (not safe for work) images in real time and replaces them with SFW (safe for work) images.
The government has taken a step in this direction by blocking some websites, but what about social media sites and other sites? They openly show NSFW content. We can avoid that content using artificial intelligence.
We can use some advanced computer vision techniques to filter out that content. We cannot access social media servers and block content for everyone, but we can add filters on the user side. To add a filter between a website and a user, we can simply use a browser extension, so we can filter content with very low latency, and it is also easy to deploy by using JavaScript.
There are several browsers on the market, but Chrome alone holds 63.3% of the market share. To reach as many users as possible, we decided to build a Chrome browser extension.
Google has very detailed and concise documentation for building Chrome extensions.
1.2 Problem Statement
This is a two-class classification problem: we need to classify images as NSFW or SFW.
1.3 Real-world/Business Objectives and Constraints
Low latency is required: we can't make users wait for predictions.
The model should be as small as possible: Chrome already uses more RAM than other browsers, so the smaller the model, the less RAM Chrome uses.
Interpretability is partially important: we want to check how and why the model works.
2.0 Mapping the real-world problem to an ML problem
We pose this as a binary classification problem: each image is classified as SFW or NSFW.
2.1 Performance Metric
The F1-score combines the precision and recall of a classifier into a single metric by taking their harmonic mean. It is primarily used to compare the performance of two classifiers.
Metric(s) we use:
F1 Score
Confusion Matrix
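As a small illustration of these two metrics, here is a minimal sketch in plain Python. It assumes the label 1 means NSFW and 0 means SFW, which is an encoding chosen for this example and not necessarily the one used in the project. The function names are also illustrative.

```python
def confusion_matrix(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary labels, assuming 1 = NSFW and 0 = SFW."""
    tn = fp = fn = tp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 1 and pred == 0:
            fn += 1
        else:
            tn += 1
    return tn, fp, fn, tp


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive (NSFW) class."""
    _, fp, fn, tp = confusion_matrix(y_true, y_pred)
    if tp == 0:
        # No true positives: precision and recall are 0 or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels, made up for the example.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1]
print(confusion_matrix(y_true, y_pred))  # (2, 1, 1, 2)
print(f1_score(y_true, y_pred))
```

In this toy case precision and recall are both 2/3, so the F1 score is also 2/3. The confusion matrix shows where the errors come from: one SFW image flagged as NSFW and one NSFW image missed.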
3.0 DataSet
To collect NSFW (not safe for work) images, we would need to scrape adult websites and blogs, but those sites are blocked in India. Instead, we take the data from NudeNet, a similar project for detecting NSFW images, and use it to train our model.
SFW data can be any image: a tree or a building, a human or an animal. We therefore need a diverse collection of images that are also free of copyright restrictions. For this we use Unsplash, which provides free stock images. Its Lite dataset contains ~25,000 random photos and can be used for both commercial and non-commercial purposes.
SFW Data Sample
NSFW Data Sample
Images are blurred because of adult content.
After assembling all the data, we get a 1.2 GB dataset, which we upload to Kaggle for easy access through the API. We have ~20k NSFW and ~20k SFW images.
3.1 Preprocessing
For preprocessing, we only normalize the images. Because we have a lot of data, we do not apply any augmentation.
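The text does not say which normalization is applied, so the sketch below assumes the most common choice: scaling 8-bit pixel values from the range 0 to 255 down to the range 0 to 1. The function name and the tiny nested-list image are made up for the example.

```python
def normalize_image(pixels):
    """Scale 8-bit pixel values (0-255) to floats in [0.0, 1.0].

    `pixels` is a nested list shaped [height][width][channels].
    Dividing by 255 is an assumed normalization, not confirmed by the text.
    """
    return [[[channel / 255.0 for channel in pixel] for pixel in row] for row in pixels]


# A 2x2 RGB image written out by hand.
image = [
    [[0, 128, 255], [255, 255, 255]],
    [[64, 64, 64], [0, 0, 0]],
]
normalized = normalize_image(image)
print(normalized[0][0])
```

With a NumPy array or a TensorFlow tensor, the same step is typically a single division of the whole array by 255.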
4.0 Model Building
In this case study, the main objectives for our model are:
- Low latency
- Very small size
- Good accuracy
To achieve this, we need to experiment with different CNN architectures and select the best of them.
4.1 Custom CNN
As a starting point and baseline, we use a custom CNN with 2 convolution layers and 2 dense layers.
Even though this is only a 6-layer model, it has 48 million parameters and a size of 557.23 MB.
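One common reason a shallow CNN ends up with tens of millions of parameters is the first dense layer after flattening. The sketch below counts parameters for a purely hypothetical 2-convolution, 2-dense network. All layer sizes are assumptions chosen for illustration and are not the architecture used in this case study.

```python
def conv2d_params(kernel_size, in_channels, out_channels):
    """Weights plus one bias per output channel."""
    return kernel_size * kernel_size * in_channels * out_channels + out_channels


def dense_params(in_features, out_features):
    """Weights plus one bias per output unit."""
    return in_features * out_features + out_features


# Hypothetical layer sizes, chosen only for illustration.
conv1 = conv2d_params(3, 3, 32)       # 3x3 conv, RGB input -> 32 channels
conv2 = conv2d_params(3, 32, 64)      # 3x3 conv, 32 -> 64 channels
flattened = 56 * 56 * 64              # assumed feature map entering the dense layer
dense1 = dense_params(flattened, 128)
dense2 = dense_params(128, 1)         # single sigmoid output for SFW vs NSFW

total = conv1 + conv2 + dense1 + dense2
print(f"total parameters: {total:,}")
print(f"share held by the first dense layer: {dense1 / total:.2%}")
```

In this made-up configuration, almost all parameters sit in the first dense layer, which is why reducing the feature map before the classifier head matters so much for model size.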
4.2 Transfer Learning
After the custom model, we experiment with transfer learning. We use ResNet50, which has 36 million parameters, and EfficientNet, which has 12 million parameters. Both give us good accuracy; their sizes are 240 MB and 87 MB respectively.
4.3 MobileNet
We observe that, although these models give good accuracy, they are still very large. After reading a few research papers, we found the MobileNet paper, whose architecture is designed for use in mobile applications.
MobileNet uses depthwise separable convolutions. They significantly reduce the number of parameters compared with a network of the same depth that uses regular convolutions. This results in lightweight deep neural networks.
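To make the saving concrete, the following sketch counts the weights of a single layer both ways. The formulas are the standard ones with biases ignored. The layer shape (3x3 kernel, 64 input channels, 128 output channels) is an arbitrary example and is not taken from the project.

```python
def standard_conv_params(kernel_size, in_channels, out_channels):
    """A regular convolution learns one k x k filter per (input, output) channel pair."""
    return kernel_size * kernel_size * in_channels * out_channels


def separable_conv_params(kernel_size, in_channels, out_channels):
    """Depthwise step (one k x k filter per input channel) plus 1x1 pointwise step."""
    depthwise = kernel_size * kernel_size * in_channels
    pointwise = in_channels * out_channels
    return depthwise + pointwise


standard = standard_conv_params(3, 64, 128)
separable = separable_conv_params(3, 64, 128)
print(standard)   # 73728
print(separable)  # 8768
print(f"reduction factor: {standard / separable:.1f}x")
```

For this example layer the depthwise separable version needs roughly 8 times fewer weights, and the gap widens as the number of output channels grows.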
We can see that the best model in terms of speed, size, and accuracy is MobileNet. We can write MobileNet from scratch and reduce its size further by reducing the number of layers. The original model is trained on the ImageNet dataset, which has 1000 classes, and distinguishing 1000 classes requires more features from the image. Because we only have two classes, our model can perform very well with fewer features and fewer layers, and the reduced computation also makes the model faster.
4.4 Custom MobileNet
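from tensorflow.keras.layers import BatchNormalization, Conv2D, DepthwiseConv2D, ReLU

def expansion_block(x, t, filter):
    # 1x1 convolution that expands the channel count by the factor t
    total_filter = t * filter
    x = Conv2D(total_filter, 1, padding='same', use_bias=False)(x)
    x = ReLU()(x)
    x = BatchNormalization()(x)
    return x

def depth_block(x, stride):
    # 3x3 depthwise convolution: one filter per input channel
    x = DepthwiseConv2D(kernel_size=3, strides=stride, padding='same', use_bias=False)(x)
    x = ReLU()(x)
    x = BatchNormalization()(x)
    return x

def projection_block(x, out_channels):
    # 1x1 convolution that projects down to out_channels, with no activation
    x = Conv2D(filters=out_channels, kernel_size=1, padding='same')(x)
    x = BatchNormalization()(x)
    return x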
We do not use ReLU6 because we found that it is not implemented in TensorFlow.js, where our model must be deployed, so we simply use the standard ReLU. We use the functions above to build the bottleneck block.
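from tensorflow.keras.layers import add

def Bottleneck(x, t, filters, out_channels, stride, block_id):
    # block_id is kept in the signature but unused: the helper blocks do not take it
    y = expansion_block(x, t, filters)
    y = depth_block(y, stride)
    y = projection_block(y, out_channels)
    # Residual connection only when spatial size (stride 1) and channel count both match
    if stride == 1 and y.shape[-1] == x.shape[-1]:
        y = add([x, y])
    return y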
The original model has 17 bottleneck layers; we use only 4 bottlenecks.
We keep only two layers of bottleneck to reduce the number of parameters and the size of the model. This gives us 18k parameters and a model size of 1.01 MB.
4.5 Model Interpretation
LIME, or Local Interpretable Model-Agnostic Explanations, is an algorithm that can explain the predictions of any classifier or regressor in a faithful way, by approximating it locally with an interpretable model. It modifies a single data sample by tweaking the feature values and observes the resulting impact on the output.
The yellow region is the part of the image that led the model to assign that class.
Our model performs very well, and latency is reduced from 42 ms to 37.5 ms.
5.0 Deployment
Our model is in TensorFlow format, which runs in Python, but we need to run it in the browser, and browsers support JavaScript. We therefore convert the model to TensorFlow.js format.
After conversion, the model size is 226 KB.
5.1 Extension
I'm not a hardcore JavaScript developer, so I'm not familiar with advanced concepts such as multithreading in JavaScript. To run my model in the browser:
- I simply run a for loop and predict each image.
- If the image is SFW, do nothing.
- Otherwise, replace the image with a Lorem Picsum image (a random image from the internet).
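The real extension runs in JavaScript, but the control flow of the steps above can be sketched in Python, the language used for the other snippets here. Everything in this block is illustrative: `filter_images`, the `predict_nsfw_probability` callback, the 0.5 threshold, and the stand-in model are assumptions, not the extension's actual code. The placeholder uses Lorem Picsum's width/height URL form.

```python
def filter_images(image_urls, predict_nsfw_probability, threshold=0.5,
                  placeholder_url="https://picsum.photos/200/300"):
    """Return the URLs to display, replacing images predicted as NSFW.

    predict_nsfw_probability: hypothetical callable mapping a URL to a
    probability in [0, 1] that the image is NSFW.
    threshold: assumed decision boundary, not taken from the project.
    """
    filtered = []
    for url in image_urls:  # simple sequential loop, one prediction per image
        if predict_nsfw_probability(url) >= threshold:
            filtered.append(placeholder_url)  # NSFW: swap in a random safe image
        else:
            filtered.append(url)              # SFW: leave the image untouched
    return filtered


def fake_model(url):
    """Stand-in for the real classifier, used only to make the sketch runnable."""
    return 0.9 if "flagged" in url else 0.1


page_images = ["https://example.com/cat.jpg", "https://example.com/flagged.jpg"]
print(filter_images(page_images, fake_model))
```

Because the loop handles one image at a time, total latency grows with the number of images on the page, which is what the batch prediction idea under Future Work is meant to address.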
5.2 Extension Demo
Before plugin install
When we install SaferNet with AI extension
Effect of extension
Our extension replaces all NSFW images with SFW images.
6.0 Future Work
- Change image source using threading
- Add URL change detection
- Try batch prediction
Bibliography
Dataset : https://github.com/notAI-tech/NudeNet
Research paper : https://arxiv.org/abs/1801.04381
Deployment : https://www.tensorflow.org/js
Documentation : https://developer.chrome.com/docs/
Guidance : https://www.appliedaicourse.com/
For the full implementation of this case study, refer to this GitHub repo.