Friday, September 6, 2024

Learn Monocular SLAM with OpenCV

Introduction

To understand how AR works with a monocular camera, I spent some free time researching SLAM methodologies through popular repositories such as ORB_SLAM2/3, VINS-Mono, Twitchslam, and others. The purpose of SLAM is to estimate the camera's position and orientation in the world frame while simultaneously building a point cloud map of the scene. To gain a deeper understanding, I decided to implement it myself using the available building blocks in OpenCV.

For the full implementation, please visit the following repository: https://github.com/longpth/RoLabsSlam.

Part 1: Initialization

Step 1: Camera Pose Estimation by Feature Matching Between 2 Consecutive Images

To estimate the camera pose in the world frame, we first need the relative transformation between two consecutive camera frames. This relation is established by matching keypoints between the two consecutive images.

C++ Code:

void Frame::detectAndCompute2(const cv::Mat& image) {
    // Parameters for goodFeaturesToTrack
    int maxCorners = 2000;
    double qualityLevel = 0.01;
    double minDistance = 10.0;
    int blockSize = 3;
    bool useHarrisDetector = false;
    double k = 0.04;

    // Detect good features to track
    std::vector<cv::Point2f> goodFeatures;
    cv::goodFeaturesToTrack(image, goodFeatures, maxCorners, qualityLevel, minDistance,
        cv::Mat(), blockSize, useHarrisDetector, k);

    // Convert good features to keypoints
    std::vector<cv::KeyPoint> goodKeypoints;
    cv::KeyPoint::convert(goodFeatures, goodKeypoints);

    // Compute ORB descriptors for the good keypoints
    cv::Mat goodFeaturesDescriptors;
    _orb->compute(image, goodKeypoints, goodFeaturesDescriptors);

    // Store the results in the class members
    _keypoints = goodKeypoints;
    _descriptors = goodFeaturesDescriptors;
}

Images Taken from KITTI Dataset

The images used in this implementation are taken from the KITTI Vision Benchmark Suite (cvlibs.net).

Essential Matrix Calculation

Once we have matched points between the two images, we can compute the Essential matrix, which relates corresponding points in two views of the same scene. It is defined as:

$$ \mathbf{p_2}^T \cdot E \cdot \mathbf{p_1} = 0 $$

Where:

  • p1 is the point in the first image (in normalized coordinates).
  • p2 is the point in the second image (in normalized coordinates).
  • E is the essential matrix (a 3x3 matrix).

We can use OpenCV to calculate the Essential matrix as follows:

C++ Code:

cv::Mat EstimateEssentialMatrix(const Frame& frame1, const Frame& frame2,
    const std::vector<cv::DMatch>& good_matches, const cv::Mat& cameraMatrix) {
    // Extract the matched keypoints
    std::vector<cv::Point2f> points1, points2;
    for (const auto& match : good_matches) {
        points1.push_back(frame1.KeyPoints()[match.queryIdx].pt);
        points2.push_back(frame2.KeyPoints()[match.trainIdx].pt);
    }

    // Compute the Essential matrix using RANSAC to handle outliers
    cv::Mat essential_matrix = cv::findEssentialMat(points1, points2, cameraMatrix,
        cv::RANSAC, 0.999, 1.0);
    return essential_matrix;
}

After having the Essential matrix, we can extract the relative rotation (R) and translation (t) between the two camera views with cv::recoverPose().
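For example, a minimal sketch reusing points1, points2, and cameraMatrix from EstimateEssentialMatrix above:

C++ Code:

// Decompose the Essential matrix into its four (R, t) candidates and keep the
// one for which triangulated points lie in front of both cameras (cheirality).
cv::Mat R, t, inlierMask;
int inliers = cv::recoverPose(essential_matrix, points1, points2, cameraMatrix, R, t, inlierMask);
// R: 3x3 rotation, t: unit-norm 3x1 translation direction from view 1 to view 2;
// inlierMask marks the correspondences that pass the check.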

Note that the translation t is only determined up to scale; its magnitude is arbitrary. For real robotics applications, you will need to determine the actual scale of the scene to get the correct translation value. Without the correct scale, the translation vector only provides the direction of movement, not the real-world distance.

For monocular setups, one way to determine the scale is to combine the visual data with additional sensors like an IMU (Inertial Measurement Unit) or wheel odometry. The IMU can provide acceleration and orientation information, while wheel odometry can measure the actual distance traveled by the robot. Using these sources of information together, you can estimate the true scale and calculate the correct translation.

Step 2: Reconstruct 3D Point Cloud After Estimating Camera Pose

With the transformation between the two camera views from step 1 (rotation plus translation), we triangulate the two sets of matched 2D points into 3D space. Triangulation involves solving a system of linear equations to find the 3D point that projects to the corresponding 2D points in both images.

The goal is to find a point X in 3D space that, when projected into both camera views using the projection matrices P1 and P2, gives the corresponding 2D points p1 and p2.

The projection matrix P can be written as:

P = K [ R | t ]

Where:

  • K is the intrinsic camera matrix, which includes the camera's internal parameters like focal lengths and principal point.
  • R is the rotation matrix, which describes the orientation of the camera.
  • t is the translation vector, which describes the position of the camera relative to the world coordinate system.
  • [ R | t ] is the extrinsic matrix, which describes how the camera is positioned in the world.
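For example, a minimal sketch of building the two projection matrices, assuming the first camera sits at the world origin and (R, t) come from cv::recoverPose above (all matrices CV_64F):

C++ Code:

// First view: P1 = K [I | 0]; second view: P2 = K [R | t].
cv::Mat Rt1 = cv::Mat::eye(3, 4, CV_64F);
cv::Mat Rt2(3, 4, CV_64F);
R.copyTo(Rt2(cv::Rect(0, 0, 3, 3)));   // top-left 3x3 block
t.copyTo(Rt2(cv::Rect(3, 0, 1, 3)));   // last column
cv::Mat P1 = cameraMatrix * Rt1;
cv::Mat P2 = cameraMatrix * Rt2;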

We can use the cv::sfm::triangulatePoints(InputArrayOfArrays points2d, InputArrayOfArrays projection_matrices, OutputArray points3d) API from OpenCV, or we can re-implement it ourselves as follows.
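Each correspondence contributes a homogeneous linear system (the direct linear transform, DLT):

$$ A \mathbf{X} = 0, \quad A = \begin{bmatrix} x_1 P_1^{(3)} - P_1^{(1)} \\ y_1 P_1^{(3)} - P_1^{(2)} \\ x_2 P_2^{(3)} - P_2^{(1)} \\ y_2 P_2^{(3)} - P_2^{(2)} \end{bmatrix} $$

where $P^{(i)}$ denotes the i-th row of a projection matrix and $(x_1, y_1)$, $(x_2, y_2)$ are the matched 2D points. The solution $\mathbf{X}$ is the right singular vector of $A$ associated with the smallest singular value, which is exactly what the SVD in the code below extracts.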

C++ Code:

std::vector<cv::Point3d> MyTriangulatePoints(
    const cv::Mat& P1, const cv::Mat& P2,
    const std::vector<cv::Point2f>& points1,
    const std::vector<cv::Point2f>& points2,
    std::vector<bool>& mask,
    int& goodParallaxCnt)
{
    assert(points1.size() == points2.size()); // Both point vectors must have the same size

    std::vector<cv::Point3d> points3D;

    for (size_t i = 0; i < points1.size(); i++)
    {
        // Construct matrix A of the DLT system
        cv::Mat A(4, 4, CV_64F);
        A.row(0) = points1[i].x * P1.row(2) - P1.row(0);
        A.row(1) = points1[i].y * P1.row(2) - P1.row(1);
        A.row(2) = points2[i].x * P2.row(2) - P2.row(0);
        A.row(3) = points2[i].y * P2.row(2) - P2.row(1);

        // Perform SVD decomposition
        cv::Mat u, w, vt;
        cv::SVD::compute(A, w, u, vt);

        // The 3D point in homogeneous coordinates is the last row of V^T
        cv::Mat X = vt.row(3).t();

        // Convert homogeneous coordinates to 3D
        X /= X.at<double>(3); // Normalize by W to get (X, Y, Z, 1)

        cv::Point3d pt3d(X.at<double>(0), X.at<double>(1), X.at<double>(2));
        points3D.push_back(pt3d);

        // Keep only points with sufficient parallax between the two viewing rays
        if (CalculateCosParallax(P1, P2, pt3d) < 0.9998)
        {
            mask.push_back(true);
            goodParallaxCnt++;
        }
        else
        {
            mask.push_back(false);
        }
    }

    return points3D;
}

In MyTriangulatePoints above, I followed OpenCV and ORB-SLAM2 in removing points with small parallax, since they are unreliable for depth estimation.
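The repository defines a CalculateCosParallax helper; one plausible implementation (an assumption for illustration, not necessarily the repository's exact code) computes the cosine of the angle between the two viewing rays:

C++ Code:

// The camera center C of P = K [R | t] is the right null vector of the 3x4 P.
static cv::Point3d CameraCenter(const cv::Mat& P) {
    cv::Mat w, u, vt;
    cv::SVD::compute(P, w, u, vt, cv::SVD::FULL_UV);
    cv::Mat C = vt.row(3).t();
    C /= C.at<double>(3);
    return cv::Point3d(C.at<double>(0), C.at<double>(1), C.at<double>(2));
}

// Cosine of the parallax angle between the rays from the two camera centers to
// the triangulated point; values close to 1.0 mean nearly parallel rays.
static double CalculateCosParallax(const cv::Mat& P1, const cv::Mat& P2, const cv::Point3d& X) {
    cv::Point3d r1 = X - CameraCenter(P1);
    cv::Point3d r2 = X - CameraCenter(P2);
    return r1.dot(r2) / std::sqrt(r1.dot(r1) * r2.dot(r2));
}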

Part 2: Tracking

After the initialization process, we already have the initial point cloud and the camera transformation from the first camera view to the second one.

In the tracking process:

Step 1: Apply Motion Model

We apply a constant-velocity motion model, currentPose = velocity * previousPose, to obtain an initial pose estimate for the current frame.
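A minimal sketch, assuming poses are stored as 4x4 CV_64F homogeneous world-to-camera transforms and velocity is the inter-frame motion computed in Step 6 of the previous iteration:

C++ Code:

// Constant-velocity prediction: assume the camera moves between the previous
// and the current frame the same way it moved one frame earlier.
cv::Mat predictedPose = velocity * previousPose;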

Step 2: Track 3D Points

We track the existing 3D map points in the current frame by projecting them into the image and searching for keypoint matches near the projections.
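A minimal sketch of the projection search, assuming the current pose is a 4x4 CV_64F world-to-camera transform, mapPoints holds the 3D map points, and fx, fy, cx, cy are the camera intrinsics:

C++ Code:

// Project each 3D map point into the current image; points landing inside the
// image become candidates for matching against nearby keypoints.
cv::Mat Rcw = currentPose(cv::Rect(0, 0, 3, 3));
cv::Mat tcw = currentPose(cv::Rect(3, 0, 1, 3));
for (const cv::Point3d& Xw : mapPoints) {
    cv::Mat Xw_h = (cv::Mat_<double>(3, 1) << Xw.x, Xw.y, Xw.z);
    cv::Mat Xc = Rcw * Xw_h + tcw;        // transform into the camera frame
    double z = Xc.at<double>(2);
    if (z <= 0) continue;                 // point is behind the camera
    double u = fx * Xc.at<double>(0) / z + cx;
    double v = fy * Xc.at<double>(1) / z + cy;
    // search keypoints in a small window around (u, v) and compare
    // descriptors to establish the 2D-3D match
}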

Step 3: Pose Optimization

With the set of 2D projected points and their corresponding 3D points, we perform pose optimization of the current pose estimate using the g2o library. The current camera pose is then updated with the optimized result.
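The exact optimizer setup depends on the g2o version; the following is a rough sketch in the style of ORB-SLAM2's pose-only optimization, using the g2o types that ship with ORB-SLAM2 (VertexSE3Expmap for the pose, one unary EdgeSE3ProjectXYZOnlyPose per 2D-3D correspondence). The intrinsics fx, fy, cx, cy, the Eigen-format initial pose (Rcw, tcw), and the matched observations/mapPoints vectors are assumptions for illustration:

C++ Code:

// Pose-only optimization sketch (older g2o API with raw pointers; newer g2o
// versions expect std::unique_ptr in the solver constructors).
g2o::SparseOptimizer optimizer;
auto* linearSolver = new g2o::LinearSolverDense<g2o::BlockSolver_6_3::PoseMatrixType>();
auto* blockSolver = new g2o::BlockSolver_6_3(linearSolver);
optimizer.setAlgorithm(new g2o::OptimizationAlgorithmLevenberg(blockSolver));

// Single vertex: the current camera pose as an SE(3) element.
auto* vSE3 = new g2o::VertexSE3Expmap();
vSE3->setEstimate(g2o::SE3Quat(Rcw, tcw));
vSE3->setId(0);
optimizer.addVertex(vSE3);

// One unary edge per correspondence: reprojection error of a fixed 3D map
// point observed at a 2D pixel location in the current frame.
for (size_t i = 0; i < mapPoints.size(); ++i) {
    auto* e = new g2o::EdgeSE3ProjectXYZOnlyPose();
    e->setVertex(0, vSE3);
    e->setMeasurement(Eigen::Vector2d(observations[i].x, observations[i].y));
    e->setInformation(Eigen::Matrix2d::Identity());
    e->fx = fx; e->fy = fy; e->cx = cx; e->cy = cy;
    e->Xw = Eigen::Vector3d(mapPoints[i].x, mapPoints[i].y, mapPoints[i].z);
    optimizer.addEdge(e);
}

optimizer.initializeOptimization();
optimizer.optimize(10);
g2o::SE3Quat optimizedPose = vSE3->estimate();  // the updated camera pose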

Step 4: Create New 3D Map Points

Check whether new 3D map points need to be created. If so, create them from the current frame's keypoints that were not matched in the projection search: we use optical flow to find the corresponding 2D points in the previous frame and then triangulate them. If optical flow fails to find a match, we fall back to matching the points with BFMatcher.
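A minimal sketch of the optical-flow step, assuming unmatchedPoints holds the 2D locations of the current frame's unmatched keypoints and currImage/prevImage are the grayscale frames:

C++ Code:

// Track unmatched keypoints from the current frame back into the previous
// frame with pyramidal Lucas-Kanade optical flow.
std::vector<cv::Point2f> prevPts;
std::vector<uchar> status;
std::vector<float> err;
cv::calcOpticalFlowPyrLK(currImage, prevImage, unmatchedPoints, prevPts, status, err);

for (size_t i = 0; i < unmatchedPoints.size(); ++i) {
    if (status[i]) {
        // (prevPts[i], unmatchedPoints[i]) is a 2D-2D correspondence;
        // triangulate it with the two camera poses as in MyTriangulatePoints.
    }
    // else: fall back to descriptor matching with cv::BFMatcher
}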

Step 5: Request Bundle Adjustment

Perform Bundle Adjustment using the g2o library.

Step 6: Calculate Camera Velocity

Calculate the camera velocity as velocity = currentPose * inverse(previousPose), to be used for the next frame's pose prediction in Step 1.
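A minimal sketch, again assuming 4x4 CV_64F homogeneous transforms:

C++ Code:

// Inter-frame motion; multiplying it onto the new previous pose in Step 1 of
// the next iteration gives the constant-velocity prediction.
cv::Mat velocity = currentPose * previousPose.inv();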

Wednesday, August 7, 2024

Privacy Policy - Gemini Test Your Knowledge

Privacy Policy

Effective Date: 08/08/2024

Welcome to Gemini Test Your Knowledge! This privacy policy explains our practices regarding any information that we may or may not collect from users of our mobile application (the "App").

1. Information We Collect

We do not collect any personal information from users of our App. Your privacy is of utmost importance to us, and we ensure that no data is gathered, stored, or transmitted by using this App.

2. Use of Information

Since we do not collect any information, we do not use any personal data for any purposes.

3. Sharing of Information

We do not share any information with third parties because we do not collect any personal information from our users.

4. Data Security

We prioritize your privacy and security. Since we do not collect any data, there are no security concerns related to data breaches or unauthorized access to personal information.

5. Changes to This Privacy Policy

We may update our Privacy Policy from time to time. We will notify you of any changes by posting the new Privacy Policy on this page. You are advised to review this Privacy Policy periodically for any changes.

This Privacy Policy is effective as of the date stated at the top of this policy.

6. Contact Us

If you have any questions or suggestions about our Privacy Policy, do not hesitate to contact us at: am.projects.forfun@gmail.com or longpth.vn@gmail.com

Privacy Policy - Gemini Quiz

Privacy Policy

Effective Date: 08/08/2024

Welcome to Gemini Quiz! This privacy policy explains our practices regarding any information that we may or may not collect from users of our mobile application (the "App").

1. Information We Collect

We do not collect any personal information from users of our App. Your privacy is of utmost importance to us, and we ensure that no data is gathered, stored, or transmitted by using this App.

2. Use of Information

Since we do not collect any information, we do not use any personal data for any purposes.

3. Sharing of Information

We do not share any information with third parties because we do not collect any personal information from our users.

4. Data Security

We prioritize your privacy and security. Since we do not collect any data, there are no security concerns related to data breaches or unauthorized access to personal information.

5. Changes to This Privacy Policy

We may update our Privacy Policy from time to time. We will notify you of any changes by posting the new Privacy Policy on this page. You are advised to review this Privacy Policy periodically for any changes.

This Privacy Policy is effective as of the date stated at the top of this policy.

6. Contact Us

If you have any questions or suggestions about our Privacy Policy, do not hesitate to contact us at: am.projects.forfun@gmail.com or longpth.vn@gmail.com

Saturday, July 27, 2024

Simple Object Tracking Camera with Android Phone

In this project, I would like to show you how to build a simple object tracking camera with an Android phone.

Things used in this project

Hardware components

  1. Arduino Uno
  2. Bluetooth 4.0 UART CC2541 HM-10
  3. RC Servo x 2
  4. Battery
  5. Android Phone

Software apps and online services

  • Arduino IDE
  • Android Studio

Story

The software component is an Android application written in Java and developed using Android Studio. The app utilizes the OpenCV library to implement an object tracking algorithm, allowing the camera to follow a designated object in real time. The trackers used in the project are TrackerMedianFlow, TrackerCSRT, TrackerKCF, TrackerMOSSE, TrackerTLD, and TrackerMIL.
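The app itself is written in Java, but the underlying OpenCV tracking API is similar across bindings. Below is a minimal C++ sketch of that API (an illustration only: tracker availability and the exact init/update signatures vary across OpenCV versions, and CSRT lives in the opencv_contrib tracking module):

C++ Code:

#include <opencv2/opencv.hpp>
#include <opencv2/tracking.hpp>

int main() {
    cv::VideoCapture cap(0);                      // open the default camera
    cv::Mat frame;
    cap >> frame;

    // Let the user draw a box around the object to follow.
    cv::Rect bbox = cv::selectROI("select object", frame);
    cv::Ptr<cv::Tracker> tracker = cv::TrackerCSRT::create();
    tracker->init(frame, bbox);

    while (cap.read(frame)) {
        if (tracker->update(frame, bbox))         // false when the target is lost
            cv::rectangle(frame, bbox, cv::Scalar(0, 255, 0), 2);
        cv::imshow("tracking", frame);
        if (cv::waitKey(1) == 27) break;          // Esc quits
    }
    return 0;
}

In the Android app, the same init/update loop runs on frames from the phone camera, and the tracked box position can be used to drive the pan/tilt servos over Bluetooth so the camera keeps the object centered.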

Source code

https://github.com/longpth/Android-Object-Tracking-OpenCV/tree/master

Demonstration

Monday, July 15, 2024

About Robotics Learner

Hello and welcome to my page!

I'm a seasoned software engineer with over 10 years of experience spanning various exciting and dynamic fields, including automotive, robotics, and computer vision. My journey in the tech industry has been a rewarding adventure, filled with numerous projects that have challenged and honed my skills.

Professional Background

Automotive: My work in the automotive sector has involved developing cutting-edge software solutions that power the next generation of vehicles. From enhancing vehicle safety systems to creating advanced driver-assistance technologies, I've been at the forefront of innovation in this rapidly evolving industry.

Robotics: Robotics has always fascinated me, and I've been fortunate enough to work on projects that bring robots to life. Whether it's developing algorithms for autonomous navigation or designing systems that enable robots to interact with their environment, I've contributed to making robots more intelligent and capable.

Computer Vision: In the realm of computer vision, I've worked on projects that enable machines to see and understand the world around them. This includes everything from object recognition and tracking to advanced image processing techniques that are essential for applications like augmented reality and medical imaging.

Passion Projects

Beyond my professional work, I have a deep passion for education and engineering. In my free time, I love to embark on various educational projects that aim to inspire and educate others. Whether it's creating tutorials, building educational tools, or working on DIY engineering projects, I find joy in sharing knowledge and fostering a love for technology.

Why This Page?

I started this page as a platform to share my experiences, insights, and projects with a broader audience. Here, you'll find articles on the latest trends in software engineering, tutorials on various tech topics, and updates on my personal projects. My goal is to provide valuable content that can help both budding and experienced engineers navigate the ever-changing landscape of technology.

Thank you for visiting my page! I hope you find the content here both informative and inspiring. Feel free to reach out if you have any questions, suggestions, or just want to connect.

Email me at: longpth.vn@gmail.com or am.projects.forfun@gmail.com

Happy reading!

Thursday, July 4, 2024

Privacy Policy - GeminiChessAnalysis

Privacy Policy

Effective Date: 28/07/2024

Welcome to GeminiChessAnalysis! This privacy policy explains our practices regarding any information that we may or may not collect from users of our mobile application (the "App").

1. Information We Collect

We do not collect any personal information from users of our App. Your privacy is of utmost importance to us, and we ensure that no data is gathered, stored, or transmitted by using this App.

2. Use of Information

Since we do not collect any information, we do not use any personal data for any purposes.

3. Sharing of Information

We do not share any information with third parties because we do not collect any personal information from our users.

4. Data Security

We prioritize your privacy and security. Since we do not collect any data, there are no security concerns related to data breaches or unauthorized access to personal information.

5. Changes to This Privacy Policy

We may update our Privacy Policy from time to time. We will notify you of any changes by posting the new Privacy Policy on this page. You are advised to review this Privacy Policy periodically for any changes.

This Privacy Policy is effective as of the date stated at the top of this policy.

6. Contact Us

If you have any questions or suggestions about our Privacy Policy, do not hesitate to contact us at: am.projects.forfun@gmail.com or longpth.vn@gmail.com