DeepStack

DeepStack is a self-hosted, free and open source AI server that provides object detection and face recognition, among other functions.
It is highly optimized and runs on a plethora of devices and platforms.
The following is quoted from DeepStack's documentation:

DeepStack is an AI server that empowers every developer in the world to easily build state-of-the-art AI systems both on premise and in the cloud. The promises of Artificial Intelligence are huge but becoming a machine learning engineer is hard. DeepStack is device and language agnostic. You can run it on Windows, Mac OS, Linux, Raspberry PI and use it with any programming language.
DeepStack's source code is available on GitHub via https://github.com/johnolafenwa/DeepStack
DeepStack is developed and maintained by DeepQuest AI

Configuration

Configuration example
deepstack:
  host: deepstack
  port: 5000
  object_detector:
    cameras:
      camera_one:
        scan_on_motion_only: false
        fps: 1
        labels:
          - label: person
            confidence: 0.8

  face_recognition:
    save_unknown_faces: false
    cameras:
      camera_one:
        labels:
          - person
deepstack (map, required): DeepStack configuration.

Object detector

An object detector scans an image to identify multiple objects and their position.

tip

Object detectors can be taxing on the system, so it is wise to combine them with a motion detector.

Labels

Labels are used to tell Viseron what objects to look for and keep recordings of. The available labels depend on what detection model you are using.

The maximum and minimum width and height are used to filter out unreasonably large or small objects, which reduces false positives.
Objects can also be filtered out with the use of an optional mask.
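
The label filtering described above can be sketched in Python. This is a minimal illustration, not Viseron's implementation: the `Detection` and `LabelFilter` classes, their field names, and the assumption that widths and heights are expressed as fractions of the frame size are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """A detected object (hypothetical structure for illustration)."""
    label: str
    confidence: float
    rel_width: float   # object width as a fraction of frame width (assumed)
    rel_height: float  # object height as a fraction of frame height (assumed)


@dataclass
class LabelFilter:
    """Per-label limits (hypothetical field names)."""
    label: str
    confidence: float = 0.8
    width_min: float = 0.0
    width_max: float = 1.0
    height_min: float = 0.0
    height_max: float = 1.0


def passes_filter(det: Detection, flt: LabelFilter) -> bool:
    """Return True if the detection matches the label and all limits."""
    return (
        det.label == flt.label
        and det.confidence >= flt.confidence
        and flt.width_min <= det.rel_width <= flt.width_max
        and flt.height_min <= det.rel_height <= flt.height_max
    )


person = LabelFilter(label="person", confidence=0.8, width_max=0.5, height_max=0.9)
detections = [
    Detection("person", 0.92, 0.10, 0.40),  # within all limits
    Detection("person", 0.95, 0.80, 0.95),  # unreasonably large
    Detection("person", 0.50, 0.10, 0.40),  # confidence too low
    Detection("car", 0.99, 0.30, 0.20),     # label not configured
]
kept = [d for d in detections if passes_filter(d, person)]
```

Only the first detection survives here; the others are dropped for size, confidence, and label respectively.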

tip
These are the labels available in the default DeepStack model:
person, bicycle, car, motorcycle, airplane, bus, train, truck, boat, traffic light, fire hydrant, stop_sign, parking meter, bench, bird, cat, dog, horse, sheep, cow, elephant, bear, zebra, giraffe, backpack, umbrella, handbag, tie, suitcase, frisbee, skis, snowboard, sports ball, kite, baseball bat, baseball glove, skateboard, surfboard, tennis racket, bottle, wine glass, cup, fork, knife, spoon, bowl, banana, apple, sandwich, orange, broccoli, carrot, hot dog, pizza, donut, cake, chair, couch, potted plant, bed, dining table, toilet, tv, laptop, mouse, remote, keyboard, cell phone, microwave, oven, toaster, sink, refrigerator, book, clock, vase, scissors, teddy bear, hair dryer, toothbrush.

Zones

Zones are used to define areas in the camera's field of view where you want to look for certain objects (labels).
Say you have a camera facing the sidewalk and have labels set up to record the label person.
This would cause Viseron to start recording people who are walking past the camera on the sidewalk. Not ideal.
To remedy this you define a zone which covers only the area that you are actually interested in, excluding the sidewalk.

Mask

Masks are used to exclude certain areas in the image from object detection. If a detected object has its lower portion inside the mask, it will be discarded.
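
One way to picture the mask check is as a point-in-polygon test. The sketch below assumes that "lower portion" means the bottom-center point of the object's bounding box, which the text above does not confirm. The function names and the `(x1, y1, x2, y2)` box format are hypothetical, and ray casting is a standard technique rather than necessarily what Viseron uses.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def point_in_polygon(x: float, y: float, polygon: List[Point]) -> bool:
    """Ray-casting test: count how many polygon edges a horizontal ray
    starting at (x, y) and going right crosses. Odd means inside."""
    inside = False
    j = len(polygon) - 1
    for i in range(len(polygon)):
        xi, yi = polygon[i]
        xj, yj = polygon[j]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (yi > y) != (yj > y):
            x_cross = (xj - xi) * (y - yi) / (yj - yi) + xi
            if x < x_cross:
                inside = not inside
        j = i
    return inside


def is_masked(bbox: Tuple[float, float, float, float], mask: List[Point]) -> bool:
    """Assumed rule: an object is masked if the bottom-center point of
    its bounding box (x1, y1, x2, y2) lies inside the mask polygon."""
    x1, _y1, x2, y2 = bbox
    return point_in_polygon((x1 + x2) / 2, y2, mask)


# A square mask covering the top-left 10x10 pixels (illustrative values).
mask = [(0, 0), (10, 0), (10, 10), (0, 10)]
feet_inside = is_masked((2, 2, 6, 8), mask)    # bottom-center (4, 8)
feet_outside = is_masked((2, 2, 6, 20), mask)  # bottom-center (4, 20)
```

With this rule, an object whose box overlaps the mask is still kept as long as its bottom edge falls outside the polygon.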

The coordinates form a polygon around the masked area.
To easily generate coordinates you can use a tool like image-map.net.
Just upload an image from your camera and start drawing your zone.
Then click Show me the code! and adapt it to the config format.
Coordinates coords="522,11,729,275,333,603,171,97" should be turned into this:

coordinates:
  - x: 522
    y: 11
  - x: 729
    y: 275
  - x: 333
    y: 603
  - x: 171
    y: 97
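
The conversion above is mechanical, so it can be scripted. The following is a small sketch, not part of Viseron: the helper names `coords_to_points` and `points_to_yaml` are made up for illustration, and the input is assumed to be the comma-separated `coords` value from the generated image map.

```python
from typing import Dict, List


def coords_to_points(coords: str) -> List[Dict[str, int]]:
    """Split an image-map coords string into a list of x/y pairs."""
    values = [int(v) for v in coords.split(",")]
    if len(values) % 2 != 0:
        raise ValueError("coords must contain an even number of values")
    # Even positions are x values, odd positions are y values.
    return [{"x": x, "y": y} for x, y in zip(values[0::2], values[1::2])]


def points_to_yaml(points: List[Dict[str, int]]) -> str:
    """Render the points as a YAML 'coordinates' list."""
    lines = ["coordinates:"]
    for point in points:
        lines.append(f"  - x: {point['x']}")
        lines.append(f"    y: {point['y']}")
    return "\n".join(lines)


points = coords_to_points("522,11,729,275,333,603,171,97")
yaml_text = points_to_yaml(points)
print(yaml_text)
```

The printed text can be pasted under the relevant `mask` or zone entry, adjusting the indentation to match the surrounding config.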

Face recognition

Face recognition runs as a post processor when a specific object is detected.

Labels

Labels are used to tell Viseron when to run a post processor.

Any label configured under the object_detector for your camera can be added to the post processor's labels section.

note

Only objects that are tracked by an object_detector can be sent to a post_processor. The object also has to pass all of its filters (confidence, height, width, etc.).

Train

On startup, images are read from face_recognition_path and a model is trained to recognize these faces.
The folder structure of the faces folder is very strict. Here is an example of the default one:

/config
├── face_recognition
|   └── faces
|       ├── person1
|       |   ├── image_of_person1_1.jpg
|       |   ├── image_of_person1_2.png
|       |   └── image_of_person1_3.jpg
|       └── person2
|           ├── image_of_person2_1.jpeg
|           └── image_of_person2_2.jpg

danger

You need to follow this folder structure, otherwise training will not be possible.
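
Because the layout is strict, it can help to check the folder before starting Viseron. The script below is an unofficial sketch: the function name `check_faces_dir` is hypothetical, and the accepted extensions are an assumption taken from the example tree (`.jpg`, `.jpeg`, `.png`), not a confirmed list.

```python
from pathlib import Path
from typing import Dict, List, Tuple

# Assumed from the example tree above; the real list may differ.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def check_faces_dir(faces_dir: str) -> Tuple[Dict[str, List[str]], List[str]]:
    """Return (images per person, list of problems) for a faces folder.

    Expected layout: one sub-folder per person, containing image files.
    """
    people: Dict[str, List[str]] = {}
    problems: List[str] = []
    for entry in sorted(Path(faces_dir).iterdir()):
        if not entry.is_dir():
            problems.append(f"unexpected file at top level: {entry.name}")
            continue
        images = sorted(
            p.name
            for p in entry.iterdir()
            if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
        )
        if not images:
            problems.append(f"no images found for: {entry.name}")
        people[entry.name] = images
    return people, problems
```

Calling `check_faces_dir("/config/face_recognition/faces")` would return the images found for each person together with any structural problems, such as loose files at the top level or empty person folders.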