How to do real time people tracking and recognition using DL



Abstract

ss="hljs-params">self, frame: np.ndarray,face_frame=False) -> np.ndarray:
    names = None
    if not face_frame:
        # Full frame: detect every face, then match each crop against the anchors
        face_crops = {index: {"name": "", "tlbr": tlbr} for index, tlbr in enumerate(self.detector(frame, return_tlbr=True))}
        for key, value in face_crops.items():
            t, l, b, r = value["tlbr"]
            face_encoding = self.encode(frame[t:b, l:r])
            # the score is treated as a similarity: the highest value is the best match
            distances = self.cosine_distance(face_encoding, list(self.anchors.values()))
            if np.max(distances) > self.threshold:
                face_crops[key]["name"] = list(self.anchors.keys())[np.argmax(distances)]
                names = face_crops[key]["name"]
                # rsplit() needs a non-empty separator; '' raises ValueError at runtime
                names = names.rsplit('')[0]
                print(names,np.max(distances))
    else:
        # The frame is already a face crop: encode it directly
        face_encoding = self.encode(frame)
        distances = self.cosine_distance(face_encoding, list(self.anchors.values()))
        if np.max(distances) > self.threshold:
            names = list(self.anchors.keys())[np.argmax(distances)].rsplit('')[0]
            print(names,np.max(distances))

    <span class="hljs-keyword">return</span> names</pre></div><p id="0f13"><b>Tracking algorithms</b></p>

People tracking means following individuals as they move through a scene, keeping the same identity for each person from frame to frame. SORT (Simple Online and Realtime Tracking) associates detections across frames with a Kalman filter and the Hungarian algorithm; DeepSORT extends SORT with a deep appearance descriptor, which makes identities more robust when people are briefly occluded.
<h1 id="192f">Methodology</h1>
The system works as follows:
<ol><li>The application grabs a new frame from the camera.</li><li>The object detection system processes the frame and extracts the people in the scene. For each person, a sub-region of the frame is cropped for detailed processing.</li><li>Each person’s region is passed to a face detector, which extracts the face from the body.</li><li>Each face is passed to a face recognition system that compares it with the faces stored in the database. If the face is recognized, the person’s name is returned; otherwise “undefined” is returned.</li></ol>
<figure id="701c"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*x9KkLsHRvPeX-M5_T5fC4g.png"><figcaption></figcaption></figure>
Associating each recognition result with a unique track ID is a common and effective approach in people recognition and tracking systems, especially when individuals move in and out of view or their faces are temporarily hidden. Even if a person’s face is obscured or not visible in a given frame, the system can still identify them through their assigned track ID. Here’s how it works:
<ol><li>Assignment of a Track ID: The system assigns a unique track ID to each detected person. Once the face is recognized, the name is associated with this ID.</li><li>Continued Tracking: As the video progresses, the tracking algorithm continuously follows each individual’s movement and appearance.
Even if a person’s face becomes temporarily obscured or is no longer visible, the system keeps following them through their unique track ID.</li><li>Re-Recognition: While the person’s face is not visible, the system still identifies them by looking up the name already stored for their track ID. This allows individuals to be followed seamlessly across frames, even in challenging scenarios.</li></ol>
<div id="3888"><pre># name is the output of the face recognition call
if bool(name):
    # a name may belong to only one track: collect stale IDs that hold this name
    to_remove = []
    for key, value in id_face_dictionary.items():
        if value == name:
            if id != key:
                to_remove.append(key)
            loggers["recognition"].info(f"{name} already in dict. ID: {id}")
    for k in to_remove:
        id_face_dictionary.pop(k)

    # once deleted, we add new key
    id_face_dictionary[id] = name
    loggers["recognition"].info(f"Added {name} to key {id}")</pre></div>
By using track IDs, the system maintains a consistent identity for each individual throughout the video, so recognition persists even when the face is not visible in every frame. This is valuable in applications such as video surveillance, where continuous tracking and recognition are essential for security and analysis.
<figure id="4ef2"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*kCbMvCzasr6Iqelumpc9VQ.gif"><figcaption></figcaption></figure>
<h1 id="fa64">Optimizing Real-Time Object Detection and Tracking</h1>
Real-time object detection and tracking are critical components in applications such as surveillance, autonomous driving, and interactive systems. However, running detection and tracking within roughly 30 milliseconds (ms) per frame, the budget imposed by a 30 frames per second (fps) video stream (1/30 s ≈ 33 ms), presents significant computational challenges. To overcome this, we propose a multi-threaded architecture that divides processing into three independent threads: Core, Detector, and Recognizer. They run concurrently, so a slow stage does not block the others and overall latency stays low.
<p><b>Core Thread: The Application Manager</b></p>
The Core thread acts as the central coordinator. Its primary functions are to:
<ol><li>Acquire video frames directly from the camera input.</li><li>Dispatch these frames to the Detector thread without delay.</li><li>Collect processed data from the Detector and Recognizer threads.</li><li>Display the resulting frames with detected objects and recognized entities.</li></ol>
This thread ensures that the most recent frame is always the one being processed.
If the Detector thread takes longer than 30ms to process a frame, the Core thread does not queue the frames that arrive in the meantime: it skips them and hands over the newest frame once the Detector is free, which avoids backlogs and keeps the display in real time.
<div id="f23d"><pre>while vid.isOpened():
    ret, frame = vid.read()
    # out = None
    # NOTE: `out` must already exist before the first iteration (e.g. out = None above the loop)
    if ret:
        if queuepulls == 1:
            timer2 = time.time()
        # Capture frame-by-frame
        # if the input queue is empty, give the current frame to
        # classify
        if inputQueue.empty():
            inputQueue.put(frame)
        else:
            loggers["general"].debug("Skipping frame from face detection")

        # if the output queue *is not* empty, grab the detections
        if not outputQueue.empty():
            out = outputQueue.get()
        if out is not None:
            queuepulls += 1
            for output in out:
                bbox_left = int(output[0])
                bbox_top = int(output[1])
                bbox_w = int(output[2])
                bbox_h = int(output[3])
                # rows with 7 values carry a track ID in column 4; otherwise reuse the last ID
                if output.shape[0] == 7:
                    id = int(output[4])
                    prev_id = id
                else:
                    id = prev_id
                # show the recognized name for this track, if there is one
                if id in id_face_dictionary:
                    name = id_face_dictionary[id]
                else:
                    name = "undefined"
                color = (255,0,0) # Use your custom color
                drawPerson(frame,bbox_left,bbox_top,bbox_w,bbox_h,name,color)
        cv2.imshow('frame', frame)
        if cv2.waitKey(1) & 0xFF == ord('q'):
            vid.release()
            cv2.destroyAllWindows()
            p.kill()
            pRec.kill()
            break</pre></div>
<p><b>Detector Thread: The Object Detection Engine</b></p>
Running in an infinite loop, the Detector thread is tasked with:
<ol><li>Executing object detection algorithms on the current frame.</li><li>Sending the detection results back to the Core thread.</li><li>Forwarding the detections, together with the frame, to the Recognizer thread.</li></ol>
The Detector is designed for speed and accuracy: it uses a YOLOv8 model with tracking enabled and runs on the GPU when one is available, with the goal of processing each frame within the 30ms budget.
<div id="83d2"><pre>def object_detection_(model_path,confidence,inputQueue,outputQueue,recognitionQueue):
    global id_face_dictionary
    yolov8_detector = YOLO(model_path)
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    yolov8_detector.to(device)
    loggers['tracking'].info("Detection initialized")
    while True:
        if not inputQueue.empty():
            frame = inputQueue.get()
            # verbose=False avoids yolov8 messages; persist=True keeps track IDs between calls
            result = yolov8_detector.track(frame,verbose=False,conf=confidence,persist=True)[0]
            data = result.cpu().numpy().boxes.data
            outputQueue.put(data)
            # only feed the Recognizer when it has consumed the previous item
            if recognitionQueue.empty():
                recognitionQueue.put((data,frame))</pre></div>
<p><b>Recognizer Thread: The Identification Specialist</b></p>
Running in parallel with the Detector, the Recognizer thread is responsible for:
<ol><li>Performing face recognition on the detected people.</li><li>Relaying recognition results back to the Core thread.</li></ol>
It also runs in an infinite loop, checking for new data from the Detector and processing it immediately to identify the individuals in the frame.
<div id="9a5a"><pre>def recognize_algorithm(model_path,recognitionQueue,id_face_dictionary,confidence):
    detector = face_detector.FaceDetection()
    recog = face_recognition.FaceNet(
        detector=detector,
        threshold=confidence,
        onnx_model_path = model_path)
    loggers['recognition'].info("Recognition initialized")
    while True:
        if not recognitionQueue.empty():
            out = recognitionQueue.get()
            frame = out[1]
            boxes = out[0]
            for output in boxes:
                # output[0:4] are used as corner coordinates in the crop below,
                # so bbox_w / bbox_h are really the right / bottom edges
                bbox_left = int(output[0])
                bbox_top = int(output[1])
                bbox_w = int(output[2])
                bbox_h = int(output[3])
                id = int(output[4])
                # skip degenerate boxes
                if not (bbox_w > 0 and bbox_h > 0):
                    continue
                person_frame = frame[bbox_top:bbox_h,bbox_left:bbox_w,:]
                start_time = time.time()

                name = recog(frame=person_frame,face_frame=<span class="hljs-literal">True</span>)
                loggers[<span class="hljs-string">'recognition'</span>].debug(<span class="hljs-string">f"RECOGNITION - Inference time: <span class="hljs-subst">{<span class="hljs-built_in">round</span>(time.time()-start_time,<span class="hljs-number">2</span>)}</span>"</span>)

                <span class="hljs-keyword">if</span> <span class="hljs-built_in">bool</span>(name):
                    to_remove = []
                    <span class="hljs-keyword">for</span> key, value <span class="hljs-keyword">in</span> id_face_dictionary.items():
                        <span class="hljs-keyword">if</span> value == name:
                            <span class="hljs-keyword">if</span> <span class="hljs-built_in">id</span> != key:
                                to_remove.append(key)
                            loggers[<span class="hljs-string">"recognition"</span>].info(<span class="hljs-string">f"<span class="hljs-subst">{name}</span> already in dict. ID: <span class="hljs-subst">{<span class="hljs-built_in">id</span>}</span>"</span>)
                    <span class="hljs-keyword">for</span> k <span class="hljs-keyword">in</span> to_remove:
                        id_face_dictionary.pop(k)

                    <span class="hljs-comment">#once deleted, we add new key</span>
                    id_face_dictionary[<span class="hljs-built_in">id</span>] = name
                    loggers[<span class="hljs-string">"recognition"</span>].info(<span class="hljs-string">f"Added <span class="hljs-subst">{name}</span> to key <span class="hljs-subst">{<span class="hljs-built_in">id</span>}</span>"</span>)</pre></div><p id="b154"><b>Inter-Thread Communication</b></p><p id="e8c9">Inter-thread communication is the cornerstone of this architecture. Each thread checks its input queue independently and processes whatever it finds there, so frames are handled asynchronously. The input and output queues hold at most one item, and the Detector only feeds the Recognizer when the recognition queue is empty, so the system is always working on the latest available frame and maintains real-time performance without lag. The threads communicate via <a href="https://docs.python.org/es/3/library/queue.html">Python Queues</a>, which handle the locking needed to prevent race conditions and data corruption.</p><figure id="3ee0"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*scTx75EB3M_hpihIRhrlYg.png"><figcaption></figcaption></figure><div id="770c"><pre>inputQueue = Queue(maxsize=1)
outputQueue = Queue(maxsize=1)
recognitionQueue = Queue()
p = Process(target=object_detection_, args=(model_path,detection_confidence,inputQueue, outputQueue,recognitionQueue,))
p.daemon = True
p.start()

# if Process is multiprocessing.Process, id_face_dictionary has to be a
# process-shared mapping (for example multiprocessing.Manager().dict());
# updates made to a plain dict in the child are not visible to the Core loop
pRec = Process(target=recognize_algorithm, args=(recognition_model_path,recognitionQueue,id_face_dictionary,recognition_confidence,))
pRec.daemon = True
pRec.start()</pre></div><h1 id="e6ed">System specifications</h1>The application is written in Python and runs on Windows, macOS, and Linux distributions. For real-time performance, a GPU (Graphics Processing Unit) is highly recommended: it significantly accelerates inference for the deep learning models, which is what allows the pipeline to keep up with the camera.<figure id="3a6e"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*1hX7TqaSHSNuD8BFChpwvw.png"><figcaption>LatinX in AI (LXAI) logo</figcaption></figure>Do you identify as Latinx and work in artificial intelligence, or know someone who does?<ul><li>Get listed on our directory and become a member of our members’ forum: <a href="https://forum.latinxinai.org/">https://forum.latinxinai.org/</a></li><li>Become a writer for the LatinX in AI Publication by emailing us at <a href="mailto:[email protected]">[email protected]</a></li><li>Learn more on our website: <a href="http://www.latinxinai.org/">http://www.latinxinai.org/</a></li></ul></article></body>
