DeepStack
DeepStack is a self-hosted, free and open source AI server that provides object detection and face recognition, among other functions.
It is highly optimized and runs on a plethora of devices and platforms.
The following is quoted from DeepStack's documentation:
DeepStack is an AI server that empowers every developer in the world to easily build state-of-the-art AI systems both on premise and in the cloud. The promises of Artificial Intelligence are huge but becoming a machine learning engineer is hard. DeepStack is device and language agnostic. You can run it on Windows, Mac OS, Linux, Raspberry PI and use it with any programming language.
DeepStack's source code is available on GitHub at https://github.com/johnolafenwa/DeepStack
DeepStack is developed and maintained by DeepQuest AI.
Configuration
Configuration example
deepstack:
  host: deepstack
  port: 5000
  object_detector:
    cameras:
      camera_one:
        scan_on_motion_only: false
        fps: 1
        labels:
          - label: person
            confidence: 0.8
  face_recognition:
    save_unknown_faces: false
    cameras:
      camera_one:
        labels:
          - person
Object detector
An object detector scans an image to identify multiple objects and their positions.
Object detectors can be taxing on the system, so it is wise to combine them with a motion detector.
Labels
Labels are used to tell Viseron what objects to look for and keep recordings of. The available labels depend on what detection model you are using.
The max/min width/height settings are used to filter out any unreasonably large/small objects to reduce false positives.
Objects can also be filtered out with the use of an optional mask.
person, bicycle, car, motorcycle, airplane, bus, train, truck, boat, traffic light, fire hydrant, stop_sign, parking meter, bench, bird, cat, dog, horse, sheep, cow, elephant, bear, zebra, giraffe, backpack, umbrella, handbag, tie, suitcase, frisbee, skis, snowboard, sports ball, kite, baseball bat, baseball glove, skateboard, surfboard, tennis racket, bottle, wine glass, cup, fork, knife, spoon, bowl, banana, apple, sandwich, orange, broccoli, carrot, hot dog, pizza, donut, cake, chair, couch, potted plant, bed, dining table, toilet, tv, laptop, mouse, remote, keyboard, cell phone, microwave, oven, toaster, sink, refrigerator, book, clock, vase, scissors, teddy bear, hair dryer, toothbrush.
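As a rough illustration of how the label, confidence, and size filters work together, the sketch below filters a list of detections. The detection dictionary layout (label, confidence, x_min, y_min, x_max, y_max) and the LabelFilter fields are assumptions made for this example and are not Viseron's internal API. Sizes are expressed here as fractions of the frame.

```python
from dataclasses import dataclass


@dataclass
class LabelFilter:
    """Hypothetical filter for one label; sizes are fractions of the frame (0 to 1)."""
    label: str
    confidence: float = 0.8
    width_min: float = 0.0
    width_max: float = 1.0
    height_min: float = 0.0
    height_max: float = 1.0


def filter_detections(detections, filters, frame_width, frame_height):
    """Keep detections whose label is configured and that pass every filter."""
    by_label = {f.label: f for f in filters}
    kept = []
    for det in detections:
        flt = by_label.get(det["label"])
        if flt is None:
            continue  # label is not configured, so it is ignored
        if det["confidence"] < flt.confidence:
            continue  # not confident enough
        rel_w = (det["x_max"] - det["x_min"]) / frame_width
        rel_h = (det["y_max"] - det["y_min"]) / frame_height
        if not flt.width_min <= rel_w <= flt.width_max:
            continue  # unreasonably narrow or wide
        if not flt.height_min <= rel_h <= flt.height_max:
            continue  # unreasonably short or tall
        kept.append(det)
    return kept
```

With a filter such as LabelFilter("person", confidence=0.8, width_max=0.5), a person that fills the whole frame would be dropped even at high confidence, which is the kind of false positive the size limits are meant to catch.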
Zones
Zones are used to define areas in the camera's field of view where you want to look for certain objects (labels).
Say you have a camera facing the sidewalk and have labels set up to record the label person.
This would cause Viseron to start recording people who are walking past the camera on the sidewalk. Not ideal.
To remedy this you define a zone which covers only the area that you are actually interested in, excluding the sidewalk.
Mask
Masks are used to exclude certain areas in the image from object detection. If a detected object has its lower portion inside of the mask it will be discarded.
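The discard rule can be sketched as a point-in-polygon test. The example below uses a standard ray-casting check and treats the bottom-centre point of the bounding box as the "lower portion" of the object. That choice of point, and the function names, are assumptions for illustration and may differ from what Viseron actually does.

```python
def point_in_polygon(x, y, polygon):
    """Ray casting: return True if (x, y) lies inside the polygon.

    polygon is a list of (x, y) tuples in drawing order.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def is_masked(box, mask_polygon):
    """box is (x_min, y_min, x_max, y_max); tests its bottom-centre point."""
    x_min, _, x_max, y_max = box
    return point_in_polygon((x_min + x_max) / 2, y_max, mask_polygon)
```

An object whose feet stand inside the masked polygon would be discarded under this sketch, while an object that only overlaps the mask with its upper part would be kept.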
The coordinates form a polygon around the masked area.
To easily generate coordinates you can use a tool like image-map.net.
Just upload an image from your camera and start drawing the area.
Then click Show me the code! and adapt it to the config format.
For example, coords="522,11,729,275,333,603,171,97" should be turned into this:
coordinates:
  - x: 522
    y: 11
  - x: 729
    y: 275
  - x: 333
    y: 603
  - x: 171
    y: 97
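If you have many points, the conversion can be scripted. The following is a small helper sketch (the function names are made up for this example) that turns the comma-separated coords string into x/y pairs and prints them in the config layout.

```python
def coords_to_points(coords):
    """Split "x1,y1,x2,y2,..." into a list of {"x": ..., "y": ...} dicts."""
    values = [int(v) for v in coords.split(",")]
    if len(values) % 2 != 0:
        raise ValueError("expected an even number of values (x,y pairs)")
    return [{"x": x, "y": y} for x, y in zip(values[0::2], values[1::2])]


def to_yaml(points, indent=""):
    """Render the points as a YAML 'coordinates' list."""
    lines = [f"{indent}coordinates:"]
    for point in points:
        lines.append(f"{indent}  - x: {point['x']}")
        lines.append(f"{indent}    y: {point['y']}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_yaml(coords_to_points("522,11,729,275,333,603,171,97")))
```

The indent argument lets you match the nesting level of the place in your config where the coordinates are pasted.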
Face recognition
Face recognition runs as a post processor when a specific object is detected.
Labels
Labels are used to tell Viseron when to run a post processor.
Any label configured under the object_detector for your camera can be added to the post processor's labels section.
Only objects that are tracked by an object_detector can be sent to a post_processor.
The object also has to pass all of its filters (confidence, height, width etc).
Train
On startup, images are read from face_recognition_path and a model is trained to recognize these faces.
The folder structure of the faces folder is very strict. Here is an example of the default one:
/config
├── face_recognition
|   └── faces
|       ├── person1
|       |   ├── image_of_person1_1.jpg
|       |   ├── image_of_person1_2.png
|       |   └── image_of_person1_3.jpg
|       └── person2
|           ├── image_of_person2_1.jpeg
|           └── image_of_person2_2.jpg
You need to follow this folder structure, otherwise training will not be possible.
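Before starting Viseron it can be handy to check that the faces folder matches this layout. The script below is a minimal sketch, not part of Viseron: the scan_faces function and the accepted file extensions are assumptions based on the example tree above.

```python
from pathlib import Path

# Assumption: only the extensions that appear in the example tree are accepted.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def scan_faces(faces_dir):
    """Return ({person_name: [image paths]}, [problems]) for a faces folder."""
    faces_dir = Path(faces_dir)
    people, problems = {}, []
    if not faces_dir.is_dir():
        return people, [f"{faces_dir} is not a directory"]
    for entry in sorted(faces_dir.iterdir()):
        if not entry.is_dir():
            # Images must live inside a folder named after the person.
            problems.append(f"{entry.name}: expected one folder per person")
            continue
        images = sorted(
            p for p in entry.iterdir()
            if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
        )
        if images:
            people[entry.name] = images
        else:
            problems.append(f"{entry.name}: no images found")
    return people, problems
```

Calling scan_faces("/config/face_recognition/faces") returns the people that were found along with a list of problems, such as an image placed directly in the faces folder or a person folder with no images.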