What a good artist understands is that nothing comes from nowhere. All creative work builds on what came before. Nothing is completely original. - Austin Kleon, "Steal Like an Artist"
It can be difficult to come up with projects that feel authentic and completely original. Sometimes it feels downright impossible to be wholly unique. And maybe that's because it is. That's secretly a good thing. Let me tell you why.
The fifth project in the General Assembly Data Science Immersive was a group project built around the notion of social good. We needed to come up with something that would be a positive influence on the world. The four of us explored a few different ideas based on our individual social justice passions and originally settled on a project examining the correlation between dams and malaria in India. Mosquitoes breed in still water, so the idea seemed to have a lot of merit, until we did some research into our dataset and found little correlation between malaria and dams, at least in the data that we had.
Data science is an iterative process, they say, and here we were heading back to the drawing board.
Claire Hester, one of the Data Science Instructional Associates at General Assembly at the time, presented her capstone project during class one day and we were incredibly impressed. It involved training a convolutional neural network to detect whether someone is wearing a mask correctly, incorrectly, or not at all. She demonstrated it with a live demo of the model working in real time on her webcam, using imutils and OpenCV. Claire was honest about the shortcomings of her model, including that it didn't work particularly well at classifying incorrectly worn masks. Additionally, her dataset was small, at just 853 images.
As students, we were enamored. We knew we wanted to do something like it, and we knew that it would be a great opportunity to gain some practice working with image data and computer vision, so we decided to see if we could do something similar, but make a better model.
We grappled with the notion of not being wholly original in our data science project, but then we stopped and thought about every project idea we had. We Googled each and every one of them and found that a number of data science projects already existed for every single one of our ideas. There was no escaping the truth: we were not going to come up with something that nobody had ever explored.
Austin Kleon, in his book "Steal Like an Artist", addresses this concept. He encourages us to envision our influences as a family tree of sorts and to focus on building our own branch of it. There's no escaping the influence of what you consume, so it's better to discuss your sources of inspiration and let others be inspired too. Johnny Cash sang "Hurt" to much acclaim when he released it in 2002. His aged, gravelly voice lent depth and despair to the lyrics. And plenty of people have no idea that it was originally a Nine Inch Nails song on the album "The Downward Spiral". We take from our inspirations and produce from there, hoping to inspire others down the line.
We decided that, in order to build on what Claire had created, we would need a lot more images. We found a dataset on GitHub with nearly 70,000 unique face images, as well as copies of those faces with blue surgical masks photoshopped on, worn either correctly or incorrectly. This put our image dataset at around 200,000 in total. Given computational constraints, we used only 9,000 images (3,000 of each class) in our modeling. We built our model with two convolutional layers of 16 filters each, each followed by a pooling layer. With this much data, our models reached precision and recall scores of 0.99 on all three classes (correctly worn, incorrectly worn, or unmasked) in under 10 epochs. We managed to build an incredibly accurate model overall, and we were all excited by these results.
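For anyone who hasn't worked with per-class scores before, here is a minimal sketch in plain Python of how precision and recall can be computed for each of the three classes, treating each one as "this class versus everything else". This is only an illustration and not the project's actual evaluation code: the function name `per_class_precision_recall`, the short class labels, and the toy predictions are all hypothetical.

```python
from collections import Counter

# Class names loosely follow the three classes in the post;
# the short labels themselves are an assumption for illustration.
CLASSES = ("correct", "incorrect", "unmasked")


def per_class_precision_recall(y_true, y_pred, classes=CLASSES):
    """Return {class: (precision, recall)}, scoring each class one-vs-rest."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    true_pos = Counter()   # predicted class c and the image really was c
    predicted = Counter()  # how many times each class was predicted
    actual = Counter()     # how many images really belong to each class

    for truth, guess in zip(y_true, y_pred):
        actual[truth] += 1
        predicted[guess] += 1
        if truth == guess:
            true_pos[truth] += 1

    scores = {}
    for c in classes:
        # Precision: of everything labeled c by the model, how much was right?
        precision = true_pos[c] / predicted[c] if predicted[c] else 0.0
        # Recall: of everything that really was c, how much did the model find?
        recall = true_pos[c] / actual[c] if actual[c] else 0.0
        scores[c] = (precision, recall)
    return scores


if __name__ == "__main__":
    # Made-up labels for six images, purely to exercise the function.
    y_true = ["correct", "correct", "incorrect", "incorrect", "unmasked", "unmasked"]
    y_pred = ["correct", "incorrect", "incorrect", "incorrect", "unmasked", "unmasked"]
    for name, (p, r) in per_class_precision_recall(y_true, y_pred).items():
        print(f"{name}: precision={p:.2f} recall={r:.2f}")
```

Reporting both numbers per class matters for a problem like this one, because a model can look strong overall while still struggling on a single class, such as incorrectly worn masks.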
There's a misconception in art that building upon the work of others is somehow disingenuous, and I want to clear the air about that. I want you to know where our ideas come from, because this project is an iteration of an idea that others have already explored. There's a fallacy in thinking that people are self-made. We all come from one another and build upon each other. We have shared our results on GitHub and hope that others build upon what we have made too; this way we all build for the common good, for that is the social good.
I invite anyone and everyone to use what you find useful from all of my projects too. I hope to see others improving upon my work and creating even better versions of anything I produce. It's exciting to see what people can do with your material. There's no need to waste time in believing that you need to start from scratch; please re-use our wheel and get rolling.