YOLOv8 — Detection from Webcam — Step by Step [CPU]

Ekkachai
4 min read · Nov 9, 2023

This article shows how to use YOLOv8 for object detection with a web camera.

Follow these steps…

Part 1 : Installation

  1. Install Python: https://www.python.org/downloads/
  2. Install Anaconda: https://conda.io/projects/conda/en/latest/user-guide/install/index.html
  3. Create and activate a virtual environment: https://docs.ultralytics.com/guides/conda-quickstart/#prerequisites

conda create --name ultralytics-env python=3.8 -y
conda activate ultralytics-env

4. Install Ultralytics: https://docs.ultralytics.com/guides/conda-quickstart/#setting-up-a-conda-environment

conda install -c conda-forge ultralytics

Additionally:

a) If you have a problem with torch, run this → https://pytorch.org/get-started/locally/#windows-python

pip3 install torch torchvision torchaudio

b) If you have a problem with the ultralytics version, run this → Issue #2573

pip3 install --upgrade ultralytics
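Before moving on, it can help to confirm the environment is in order. This is a minimal sanity check using only the Python standard library; `is_installed` is an illustrative helper (not part of Ultralytics) that looks a package up without importing it:

```python
import sys
from importlib.util import find_spec

def is_installed(package: str) -> bool:
    """Return True if the package can be found, without importing it."""
    return find_spec(package) is not None

# Ultralytics needs a recent Python; the conda env above pins 3.8
print("Python OK:", sys.version_info >= (3, 8))

# cv2 is the import name of opencv-python, pulled in by ultralytics
for pkg in ("ultralytics", "torch", "cv2"):
    print(f"{pkg} installed:", is_installed(pkg))
```

If any line prints `False`, revisit the corresponding install step above.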

Part 2 : Download a model

We will use an existing model → YOLOv8n. You can download it from https://docs.ultralytics.com/models/yolov8/#supported-modes and save it on a local drive.

Part 3: Create a project

  1. Open the Anaconda Prompt (with ultralytics-env activated); you can find it in the Start menu. Then create a folder “yolov8_webcam”:

mkdir yolov8_webcam

2. Download the file yolov8n.pt into this folder

3. Open VS Code from this folder:

code .

Workshop 1 : detect everything from image

  1. Put an image in the folder “/yolov8_webcam”
  2. Add the following code:
from ultralytics import YOLO

# Load a model
model = YOLO('yolov8n.pt') # pretrained YOLOv8n model

# Run batched inference on a list of images
results = model(['image1.jpg', 'image2.jpg'], stream=True) # return a generator of Results objects

# Process results generator
for result in results:
    boxes = result.boxes  # Boxes object for bbox outputs
    masks = result.masks  # Masks object for segmentation masks outputs
    keypoints = result.keypoints  # Keypoints object for pose outputs
    probs = result.probs  # Probs object for classification outputs
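Note that with stream=True, model() returns a generator, so each image is only processed when the loop asks for the next result, which keeps memory usage flat for long image lists or video. Here is a toy, ultralytics-free sketch of the same pattern, where predict_stream is an illustrative stand-in for the model:

```python
def predict_stream(sources):
    """Yield one 'result' at a time instead of building a full list,
    mimicking what stream=True does for model inference."""
    for src in sources:
        # stand-in for running inference on one image
        yield f"result for {src}"

results = predict_stream(['image1.jpg', 'image2.jpg'])
print(type(results).__name__)  # 'generator': nothing has run yet

for result in results:
    print(result)  # work happens lazily, one image per iteration
```

This is why the original code iterates over `results` instead of indexing into it: the generator produces each Results object on demand.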

Workshop 2 : detect everything from YouTube

  1. at Anaconda prompt (with ultralytics-env)
  2. use this command
yolo predict model=yolov8n.pt source='https://youtu.be/LNwODJXcvt4' imgsz=32
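A note on imgsz: Ultralytics resizes input to a multiple of the model's maximum stride (32 for YOLOv8), so imgsz=32 is the smallest valid size; it runs fast on CPU but loses a lot of detail, and larger values such as 320 or 640 detect better. The rounding rule can be sketched as follows (round_to_stride is an illustrative helper, not the library's own function):

```python
import math

def round_to_stride(imgsz: int, stride: int = 32) -> int:
    """Round an image size up to the nearest multiple of the model stride,
    mirroring how YOLOv8 adjusts imgsz before inference (illustrative only)."""
    return math.ceil(imgsz / stride) * stride

print(round_to_stride(32))   # already a multiple: stays 32
print(round_to_stride(600))  # rounded up to 608
print(round_to_stride(640))  # stays 640
```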

Workshop 3 : test a web camera

import cv2

cap = cv2.VideoCapture(0)
cap.set(3, 640)
cap.set(4, 480)

while True:
    ret, img = cap.read()
    cv2.imshow('Webcam', img)
    if cv2.waitKey(1) == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()

Workshop 4 : detect everything from web camera

  1. connect a web camera
  2. run this command
yolo predict model=yolov8n.pt source=0 imgsz=640

Workshop 5 : detect everything from web camera + add annotations

# source from https://dipankarmedh1.medium.com/real-time-object-detection-with-yolo-and-webcam-enhancing-your-computer-vision-skills-861b97c78993

from ultralytics import YOLO
import cv2
import math
# start webcam
cap = cv2.VideoCapture(0)
cap.set(3, 640)
cap.set(4, 480)

# model
model = YOLO("yolov8n.pt")

# object classes
classNames = ["person", "bicycle", "car", "motorbike", "aeroplane", "bus", "train", "truck", "boat",
"traffic light", "fire hydrant", "stop sign", "parking meter", "bench", "bird", "cat",
"dog", "horse", "sheep", "cow", "elephant", "bear", "zebra", "giraffe", "backpack", "umbrella",
"handbag", "tie", "suitcase", "frisbee", "skis", "snowboard", "sports ball", "kite", "baseball bat",
"baseball glove", "skateboard", "surfboard", "tennis racket", "bottle", "wine glass", "cup",
"fork", "knife", "spoon", "bowl", "banana", "apple", "sandwich", "orange", "broccoli",
"carrot", "hot dog", "pizza", "donut", "cake", "chair", "sofa", "pottedplant", "bed",
"diningtable", "toilet", "tvmonitor", "laptop", "mouse", "remote", "keyboard", "cell phone",
"microwave", "oven", "toaster", "sink", "refrigerator", "book", "clock", "vase", "scissors",
"teddy bear", "hair drier", "toothbrush"
]


while True:
    success, img = cap.read()
    results = model(img, stream=True)

    # coordinates
    for r in results:
        boxes = r.boxes

        for box in boxes:
            # bounding box
            x1, y1, x2, y2 = box.xyxy[0]
            x1, y1, x2, y2 = int(x1), int(y1), int(x2), int(y2)  # convert to int values

            # put box in cam
            cv2.rectangle(img, (x1, y1), (x2, y2), (255, 0, 255), 3)

            # confidence
            confidence = math.ceil((box.conf[0]*100))/100
            print("Confidence --->", confidence)

            # class name
            cls = int(box.cls[0])
            print("Class name -->", classNames[cls])

            # object details
            org = [x1, y1]
            font = cv2.FONT_HERSHEY_SIMPLEX
            fontScale = 1
            color = (255, 0, 0)
            thickness = 2

            cv2.putText(img, classNames[cls], org, font, fontScale, color, thickness)

    cv2.imshow('Webcam', img)
    if cv2.waitKey(1) == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()
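One detail worth knowing about the loop above: math.ceil(conf*100)/100 rounds the confidence up to two decimal places (it does not truncate). A standalone check of that expression, using only the standard library:

```python
import math

def conf_two_decimals(conf: float) -> float:
    """Round a confidence score up to two decimal places,
    the same expression used in the webcam loop above."""
    return math.ceil(conf * 100) / 100

print(conf_two_decimals(0.874))  # 0.88 (rounded up, not down)
print(conf_two_decimals(0.5))    # 0.5
```

If you prefer conventional rounding, `round(conf, 2)` would round 0.874 down to 0.87 instead.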
