ARKit & LiDAR: Building Point Clouds in Swift (part 2)

In the second part of our exploration into creating a point cloud, we will build upon the foundation established in the first part. Having captured and processed individual colored points, our next objective is to merge these points into a unified point cloud. Afterward, we will visualize the point cloud in the AR view and finally export it to a .PLY file. Storing vertices We now have both the color and 3D position of each point from the depth map. The next step is to store these points in the point cloud, but simply appending every point we capture would be inefficient and could lead to an unnecessarily large dataset. To handle this effectively, we need to filter the points coming from different depth maps. Instead of processing every single point, we’ll use a grid-based algorithm. This approach involves dividing the 3D space into uniform grid cells or "boxes" of a predefined size. For each grid cell, we store only a single representative point, effectively downsampling the data. By adjusting the size of these boxes, we can control the density of the point cloud. This method not only reduces the amount of data stored but also gives us flexibility in fine-tuning the point cloud density depending on the requirements of the application—whether we need higher precision for detailed captures or a lighter dataset for faster processing. Let’s define a Vertex, GridKey and a corresponding dictionary as part of our PointCloud actor. actor PointCloud { //-- struct GridKey: Hashable { static let density: Float = 100 private let id: Int init(_ position: SCNVector3) { var hasher = Hasher() for component in [position.x, position.y, position.z] { hasher.combine(Int(round(component * Self.density))) } id = hasher.finalize() } } struct Vertex { let position: SCNVector3 let color: simd_float4 } private(set) var vertices: [GridKey: Vertex] = [:] //-- } GridKey is designed to round point coordinates based on a specified density, allowing us to assign to the same key the points that are positioned near each other. We can complete the process function now. func process(frame: ARFrame) async { guard let depth = (frame.smoothedSceneDepth ?? frame.sceneDepth), let depthBuffer = PixelBuffer (pixelBuffer: depth.depthMap), let confidenceMap = depth.confidenceMap, let confidenceBuffer = PixelBuffer (pixelBuffer: confidenceMap), let imageBuffer = YCBCRBuffer(pixelBuffer: frame.capturedImage) else { return } let rotateToARCamera = makeRotateToARCameraMatrix(orientation: .portrait) let cameraTransform = frame.camera.viewMatrix(for: .portrait).inverse * rotateToARCamera // iterate through pixels in depth buffer for row in 0..<depthBuffer.size.height { for col in 0..<depthBuffer.size.width { // get confidence value let confidenceRawValue = Int(confidenceBuffer.value(x: col, y: row)) guard let confidence = ARConfidenceLevel(rawValue: confidenceRawValue) else { continue } // filter by confidence if confidence != .high { continue } // get distance value from let depth = depthBuffer.value(x: col, y: row) // filter points by distance if depth > 2 { return } let normalizedCoord = simd_float2(Float(col) / Float(depthBuffer.size.width), Float(row) / Float(depthBuffer.size.height)) let imageSize = imageBuffer.size.asFloat let screenPoint = simd_float3(normalizedCoord * imageSize, 1) // Transform the 2D screen point into local 3D camera space let localPoint = simd_inverse(frame.camera.intrinsics) * screenPoint * depth // Converts the local camera space 3D point into world space. let worldPoint = cameraTransform * simd_float4(localPoint, 1) // Normalizes the result. let resulPosition = (worldPoint / worldPoint.w) let pointPosition = SCNVector3(x: resulPosition.x, y: resulPosition.y, z: resulPosition.z) let key = PointCloud.GridKey(pointPosition) if vertices[key] == nil { let pixelRow = Int(round(normalizedCoord.y * imageSize.y)) let pixelColumn = Int(round(normalizedCoord.x * imageSize.x)) let color = imageBuffer.color(x: pixelColumn, y: pixelRow) vertices[key] = PointCloud.Vertex(position: pointPosition, color: color) } } } } Visualization of the point cloud To visualize the point cloud in the AR view, we will create an SCNGeometry each time a frame is processed, which will be displayed using an SCNNode. This approach allows us to dynamically render the point cloud in real-time. However, rendering a high-density point cloud can significantly strain the device's resources, potentially leading to performance issues. To address this and maintain smooth rendering, we will implement a tiny optimization which is drawing only every 10th point from the cloud. This selective rendering will help balance visual fidelity and performance, ensuring that the app remains responsive while the point cloud still conveys essential details of the scanned environment. Let’s create a geometryNode in our ARManager and attach it to the rootNode in the initializer. Next, we will add an async updateGeometry function that will convert vertices from point cloud to a SCNGeometry and replace the geometry in geometryNode. Finally, we will integrate this updateGeometry function into our processing pipeline. actor ARManager: NSObject, ARSessionDelegate, ObservableObject { //-- @MainActor let geometryNode = SCNNode() //-- @MainActor override init() { //-- sceneView.scene.rootNode.addChildNode(geometryNode) } @MainActor private func process(frame: ARFrame) async { guard !isProcessing && isCapturing else { return } isProcessing = true await pointCloud.process(frame: frame) await updateGeometry() // <- add here the geometry update isProcessing = false } func updateGeometry() async { // make an array of every 10th point let vertices = await pointCloud.vertices.values.enumerated().filter { index, _ in index % 10 == 9 }.map { $0.element } // create a vertex source for geometry let vertexSource = SCNGeometrySource(vertices: vertices.map { $0.position } ) // create a color source let colorData = Data(bytes: vertices.map { $0.color }, count: MemoryLayout .size * vertices.count) let colorSource = SCNGeometrySource(data: colorData, semantic: .color, vectorCount: vertices.count, usesFloatComponents: true, componentsPerVector: 4, bytesPerComponent: MemoryLayout .size, dataOffset: 0, dataStride: MemoryLayout >.size) // as we don't use proper geometry, we can pass just an array of // indices to our geometry element let pointIndices: [UInt32] = Array(0..<UInt32(vertices.count)) let element = SCNGeometryElement(indices: pointIndices, primitiveType: .point) // here we can customize the size of the point, rendered in ARView element.maximumPointScreenSpaceRadius = 15 let geometry = SCNGeometry(sources: [vertexSource, colorSource], elements: [element]) geometry.firstMaterial?.isDoubleSided = true geometry.firstMaterial?.lightingModel = .constant Task { @MainActor in geometryNode.geometry = geometry } } Exporting point cloud to a .PLY file Finally, we’ll export the captured point cloud to a .PLY file, utilizing the Transferable protocol for seamless data handling in SwiftUI ShareLink. The .PLY file format is relatively straightforward, consisting of a text file that includes a header specifying the content and structure of the data, followed by a list of vertices along with their corresponding color components. struct PLYFile: Transferable { let pointCloud: PointCloud enum Error: LocalizedError { case cannotExport } func export() async throws -> Data { let vertices = await pointCloud.vertices var plyContent = """ ply format ascii 1.0 element vertex \(vertices.count) property float x property float y property float z property uchar red property uchar green property uchar blue property uchar alpha end_header """ for vertex in vertices.values { // Convert position and color let x = vertex.position.x let y = vertex.position.y let z = vertex.position.z let r = UInt8(vertex.color.x * 255) let g = UInt8(vertex.color.y * 255) let b = UInt8(vertex.color.z * 255) let a = UInt8(vertex.color.w * 255) // Append the vertex data plyContent += "\n\(x) \(y) \(z) \(r) \(g) \(b) \(a)" } guard let data = plyContent.data(using: .ascii) else { throw Error.cannotExport } return data } static var transferRepresentation: some TransferRepresentation { DataRepresentation(exportedContentType: .data) { try await $0.export() }.suggestedFileName("exported.ply") } } Completing the UI Now that we’ve built the core functionality of our app, we can complete our UI. The UI will include a button to start and stop the capture of the point cloud, as well as an option to export and share the generated point cloud file. In our main view, we’ll create a ZStack that overlays the AR view with buttons for controlling the capture process and sharing the result. @main struct PointCloudExampleApp: App { @StateObject var arManager = ARManager() var body: some Scene { WindowGroup { ZStack(alignment: .bottom) { UIViewWrapper(view: arManager.sceneView).ignoresSafeArea() HStack(spacing: 30) { Button { arManager.isCapturing.toggle() } label: { Image(systemName: arManager.isCapturing ? "stop.circle.fill" : "play.circle.fill") } ShareLink(item: PLYFile(pointCloud: arManager.pointCloud), preview: SharePreview("exported.ply")) { Image(systemName: "square.and.arrow.up.circle.fill") } }.foregroundStyle(.black, .white) .font(.system(size: 50)) .padding(25) } } } } Final Thoughts In this two-part article, we’ve built a basic AR application capable of generating and presenting 3D point clouds using ARKit and LiDAR in Swift. We discovered how to extract LiDAR data, convert it to points in 3D space, and unite it into a single point cloud, along with the ability to export and share it as a .PLY file. This app is just the beginning. You could enhance it further by adding features like more advanced filtering, allowing the user to adjust point cloud density, or improving the cloud quality by replacing the points in the grid dictionary depending on distance or other factors. The compact and cost-effective nature of the iPhone's LiDAR makes advanced depth sensing accessible to developers and opens up a world of possibilities for innovative applications. Happy coding! In the second part of our exploration into creating a point cloud, we will build upon the foundation established in the first part. second first first Having captured and processed individual colored points, our next objective is to merge these points into a unified point cloud. Afterward, we will visualize the point cloud in the AR view and finally export it to a . PLY file. PLY Storing vertices Storing vertices We now have both the color and 3D position of each point from the depth map . The next step is to store these points in the point cloud, but simply appending every point we capture would be inefficient and could lead to an unnecessarily large dataset. To handle this effectively , we need to filter the points coming from different depth maps. color 3D position depth map effectively Instead of processing every single point, we’ll use a grid-based algorithm . This approach involves dividing the 3D space into uniform grid cells or " boxes " of a predefined size. For each grid cell, we store only a single representative point, effectively downsampling the data. By adjusting the size of these boxes, we can control the density of the point cloud. grid-based algorithm boxes single downsampling This method not only reduces the amount of data stored but also gives us flexibility in fine-tuning the point cloud density depending on the requirements of the application—whether we need higher precision for detailed captures or a lighter dataset for faster processing. Let’s define a Vertex , GridKey and a corresponding dictionary as part of our PointCloud actor. Vertex GridKey PointCloud actor PointCloud { //-- struct GridKey: Hashable { static let density: Float = 100 private let id: Int init(_ position: SCNVector3) { var hasher = Hasher() for component in [position.x, position.y, position.z] { hasher.combine(Int(round(component * Self.density))) } id = hasher.finalize() } } struct Vertex { let position: SCNVector3 let color: simd_float4 } private(set) var vertices: [GridKey: Vertex] = [:] //-- } actor PointCloud { //-- struct GridKey: Hashable { static let density: Float = 100 private let id: Int init(_ position: SCNVector3) { var hasher = Hasher() for component in [position.x, position.y, position.z] { hasher.combine(Int(round(component * Self.density))) } id = hasher.finalize() } } struct Vertex { let position: SCNVector3 let color: simd_float4 } private(set) var vertices: [GridKey: Vertex] = [:] //-- } GridKey is designed to round point coordinates based on a specified density , allowing us to assign to the same key the points that are positioned near each other. GridKey density same We can complete the process function now. process func process(frame: ARFrame) async { guard let depth = (frame.smoothedSceneDepth ?? frame.sceneDepth), let depthBuffer = PixelBuffer (pixelBuffer: depth.depthMap), let confidenceMap = depth.confidenceMap, let confidenceBuffer = PixelBuffer (pixelBuffer: confidenceMap), let imageBuffer = YCBCRBuffer(pixelBuffer: frame.capturedImage) else { return } let rotateToARCamera = makeRotateToARCameraMatrix(orientation: .portrait) let cameraTransform = frame.camera.viewMatrix(for: .portrait).inverse * rotateToARCamera // iterate through pixels in depth buffer for row in 0..<depthBuffer.size.height { for col in 0..<depthBuffer.size.width { // get confidence value let confidenceRawValue = Int(confidenceBuffer.value(x: col, y: row)) guard let confidence = ARConfidenceLevel(rawValue: confidenceRawValue) else { continue } // filter by confidence if confidence != .high { continue } // get distance value from let depth = depthBuffer.value(x: col, y: row) // filter points by distance if depth > 2 { return } let normalizedCoord = simd_float2(Float(col) / Float(depthBuffer.size.width), Float(row) / Float(depthBuffer.size.height)) let imageSize = imageBuffer.size.asFloat let screenPoint = simd_float3(normalizedCoord * imageSize, 1) // Transform the 2D screen point into local 3D camera space let localPoint = simd_inverse(frame.camera.intrinsics) * screenPoint * depth // Converts the local camera space 3D point into world space. let worldPoint = cameraTransform * simd_float4(localPoint, 1) // Normalizes the result. let resulPosition = (worldPoint / worldPoint.w) let pointPosition = SCNVector3(x: resulPosition.x, y: resulPosition.y, z: resulPosition.z) let key = PointCloud.GridKey(pointPosition) if vertices[key] == nil { let pixelRow = Int(round(normalizedCoord.y * imageSize.y)) let pixelColumn = Int(round(normalizedCoord.x * imageSize.x)) let color = imageBuffer.color(x: pixelColumn, y: pixelRow) vertices[key] = PointCloud.Vertex(position: pointPosition, color: color) } } } } func process(frame: ARFrame) async { guard let depth = (frame.smoothedSceneDepth ?? frame.sceneDepth), let depthBuffer = PixelBuffer (pixelBuffer: depth.depthMap), let confidenceMap = depth.confidenceMap, let confidenceBuffer = PixelBuffer (pixelBuffer: confidenceMap), let imageBuffer = YCBCRBuffer(pixelBuffer: frame.capturedImage) else { return } let rotateToARCamera = makeRotateToARCameraMatrix(orientation: .portrait) let cameraTransform = frame.camera.viewMatrix(for: .portrait).inverse * rotateToARCamera // iterate through pixels in depth buffer for row in 0..<depthBuffer.size.height { for col in 0..<depthBuffer.size.width { // get confidence value let confidenceRawValue = Int(confidenceBuffer.value(x: col, y: row)) guard let confidence = ARConfidenceLevel(rawValue: confidenceRawValue) else { continue } // filter by confidence if confidence != .high { continue } // get distance value from let depth = depthBuffer.value(x: col, y: row) // filter points by distance if depth > 2 { return } let normalizedCoord = simd_float2(Float(col) / Float(depthBuffer.size.width), Float(row) / Float(depthBuffer.size.height)) let imageSize = imageBuffer.size.asFloat let screenPoint = simd_float3(normalizedCoord * imageSize, 1) // Transform the 2D screen point into local 3D camera space let localPoint = simd_inverse(frame.camera.intrinsics) * screenPoint * depth // Converts the local camera space 3D point into world space. let worldPoint = cameraTransform * simd_float4(localPoint, 1) // Normalizes the result. let resulPosition = (worldPoint / worldPoint.w) let pointPosition = SCNVector3(x: resulPosition.x, y: resulPosition.y, z: resulPosition.z) let key = PointCloud.GridKey(pointPosition) if vertices[key] == nil { let pixelRow = Int(round(normalizedCoord.y * imageSize.y)) let pixelColumn = Int(round(normalizedCoord.x * imageSize.x)) let color = imageBuffer.color(x: pixelColumn, y: pixelRow) vertices[key] = PointCloud.Vertex(position: pointPosition, color: color) } } } } Visualization of the point cloud Visualization of the point cloud To visualize the point cloud in the AR view, we will create an SCNGeometry each time a frame is processed, which will be displayed using an SCNNode . This approach allows us to dynamically render the point cloud in real-time . visualize SCNGeometry SCNNode real-time However, rendering a high-density point cloud can significantly strain the device's resources, potentially leading to performance issues. To address this and maintain smooth rendering, we will implement a tiny optimization which is drawing only every 10th point from the cloud. This selective rendering will help balance visual fidelity and performance, ensuring that the app remains responsive while the point cloud still conveys essential details of the scanned environment. balance Let’s create a geometryNode in our ARManager and attach it to the rootNode in the initializer. geometryNode ARManager rootNode Next, we will add an async updateGeometry function that will convert vertices from point cloud to a SCNGeometry and replace the geometry in geometryNode . updateGeometry SCNGeometry geometryNode Finally, we will integrate this updateGeometry function into our processing pipeline. updateGeometry processing actor ARManager: NSObject, ARSessionDelegate, ObservableObject { //-- @MainActor let geometryNode = SCNNode() //-- @MainActor override init() { //-- sceneView.scene.rootNode.addChildNode(geometryNode) } @MainActor private func process(frame: ARFrame) async { guard !isProcessing && isCapturing else { return } isProcessing = true await pointCloud.process(frame: frame) await updateGeometry() // <- add here the geometry update isProcessing = false } func updateGeometry() async { // make an array of every 10th point let vertices = await pointCloud.vertices.values.enumerated().filter { index, _ in index % 10 == 9 }.map { $0.element } // create a vertex source for geometry let vertexSource = SCNGeometrySource(vertices: vertices.map { $0.position } ) // create a color source let colorData = Data(bytes: vertices.map { $0.color }, count: MemoryLayout .size * vertices.count) let colorSource = SCNGeometrySource(data: colorData, semantic: .color, vectorCount: vertices.count, usesFloatComponents: true, componentsPerVector: 4, bytesPerComponent: MemoryLayout .size, dataOffset: 0, dataStride: MemoryLayout >.size) // as we don't use proper geometry, we can pass just an array of // indices to our geometry element let pointIndices: [UInt32] = Array(0..<UInt32(vertices.count)) let element = SCNGeometryElement(indices: pointIndices, primitiveType: .point) // here we can customize the size of the point, rendered in ARView element.maximumPointScreenSpaceRadius = 15 let geometry = SCNGeometry(sources: [vertexSource, colorSource], elements: [element]) geometry.firstMaterial?.isDoubleSided = true geometry.firstMaterial?.lightingModel = .constant Task { @MainActor in geometryNode.geometry = geometry } } actor ARManager: NSObject, ARSessionDelegate, ObservableObject { //-- @MainActor let geometryNode = SCNNode() //-- @MainActor override init() { //-- sceneView.scene.rootNode.addChildNode(geometryNode) } @MainActor private func process(frame: ARFrame) async { guard !isProcessing && isCapturing else { return } isProcessing = true await pointCloud.process(frame: frame) await updateGeometry() // <- add here the geometry update isProcessing = false } func updateGeometry() async { // make an array of every 10th point let vertices = await pointCloud.vertices.values.enumerated().filter { index, _ in index % 10 == 9 }.map { $0.element } // create a vertex source for geometry let vertexSource = SCNGeometrySource(vertices: vertices.map { $0.position } ) // create a color source let colorData = Data(bytes: vertices.map { $0.color }, count: MemoryLayout .size * vertices.count) let colorSource = SCNGeometrySource(data: colorData, semantic: .color, vectorCount: vertices.count, usesFloatComponents: true, componentsPerVector: 4, bytesPerComponent: MemoryLayout .size, dataOffset: 0, dataStride: MemoryLayout >.size) // as we don't use proper geometry, we can pass just an array of // indices to our geometry element let pointIndices: [UInt32] = Array(0..<UInt32(vertices.count)) let element = SCNGeometryElement(indices: pointIndices, primitiveType: .point) // here we can customize the size of the point, rendered in ARView element.maximumPointScreenSpaceRadius = 15 let geometry = SCNGeometry(sources: [vertexSource, colorSource], elements: [element]) geometry.firstMaterial?.isDoubleSided = true geometry.firstMaterial?.lightingModel = .constant Task { @MainActor in geometryNode.geometry = geometry } } Exporting point cloud to a .PLY file Exporting point cloud to a .PLY file Finally, we’ll export the captured point cloud to a . PLY file, utilizing the Transferable protocol for seamless data handling in SwiftUI ShareLink . PLY Transferable SwiftUI ShareLink The . PLY file format is relatively straightforward, consisting of a text file that includes a header specifying the content and structure of the data, followed by a list of vertices along with their corresponding color components. PLY header list struct PLYFile: Transferable { let pointCloud: PointCloud enum Error: LocalizedError { case cannotExport } func export() async throws -> Data { let vertices = await pointCloud.vertices var plyContent = """ ply format ascii 1.0 element vertex \(vertices.count) property float x property float y property float z property uchar red property uchar green property uchar blue property uchar alpha end_header """ for vertex in vertices.values { // Convert position and color let x = vertex.position.x let y = vertex.position.y let z = vertex.position.z let r = UInt8(vertex.color.x * 255) let g = UInt8(vertex.color.y * 255) let b = UInt8(vertex.color.z * 255) let a = UInt8(vertex.color.w * 255) // Append the vertex data plyContent += "\n\(x) \(y) \(z) \(r) \(g) \(b) \(a)" } guard let data = plyContent.data(using: .ascii) else { throw Error.cannotExport } return data } static var transferRepresentation: some TransferRepresentation { DataRepresentation(exportedContentType: .data) { try await $0.export() }.suggestedFileName("exported.ply") } } struct PLYFile: Transferable { let pointCloud: PointCloud enum Error: LocalizedError { case cannotExport } func export() async throws -> Data { let vertices = await pointCloud.vertices var plyContent = """ ply format ascii 1.0 element vertex \(vertices.count) property float x property float y property float z property uchar red property uchar green property uchar blue property uchar alpha end_header """ for vertex in vertices.values { // Convert position and color let x = vertex.position.x let y = vertex.position.y let z = vertex.position.z let r = UInt8(vertex.color.x * 255) let g = UInt8(vertex.color.y * 255) let b = UInt8(vertex.color.z * 255) let a = UInt8(vertex.color.w * 255) // Append the vertex data plyContent += "\n\(x) \(y) \(z) \(r) \(g) \(b) \(a)" } guard let data = plyContent.data(using: .ascii) else { throw Error.cannotExport } return data } static var transferRepresentation: some TransferRepresentation { DataRepresentation(exportedContentType: .data) { try await $0.export() }.suggestedFileName("exported.ply") } } Completing the UI Completing the UI Now that we’ve built the core functionality of our app, we can complete our UI. The UI will include a button to start and stop the capture of the point cloud, as well as an option to export and share the generated point cloud file. start stop export In our main view, we’ll create a ZStack that overlays the AR view with buttons for controlling the capture process and sharing the result. ZStack @main struct PointCloudExampleApp: App { @StateObject var arManager = ARManager() var body: some Scene { WindowGroup { ZStack(alignment: .bottom) { UIViewWrapper(view: arManager.sceneView).ignoresSafeArea() HStack(spacing: 30) { Button { arManager.isCapturing.toggle() } label: { Image(systemName: arManager.isCapturing ? "stop.circle.fill" : "play.circle.fill") } ShareLink(item: PLYFile(pointCloud: arManager.pointCloud), preview: SharePreview("exported.ply")) { Image(systemName: "square.and.arrow.up.circle.fill") } }.foregroundStyle(.black, .white) .font(.system(size: 50)) .padding(25) } } } } @main struct PointCloudExampleApp: App { @StateObject var arManager = ARManager() var body: some Scene { WindowGroup { ZStack(alignment: .bottom) { UIViewWrapper(view: arManager.sceneView).ignoresSafeArea() HStack(spacing: 30) { Button { arManager.isCapturing.toggle() } label: { Image(systemName: arManager.isCapturing ? "stop.circle.fill" : "play.circle.fill") } ShareLink(item: PLYFile(pointCloud: arManager.pointCloud), preview: SharePreview("exported.ply")) { Image(systemName: "square.and.arrow.up.circle.fill") } }.foregroundStyle(.black, .white) .font(.system(size: 50)) .padding(25) } } } } Final Thoughts Final Thoughts In this two-part article, we’ve built a basic AR application capable of generating and presenting 3D point clouds using ARKit and LiDAR in Swift . We discovered how to extract LiDAR data, convert it to points in 3D space, and unite it into a single point cloud, along with the ability to export and share it as a . PLY file. point clouds ARKit LiDAR Swift PLY This app is just the beginning. You could enhance it further by adding features like more advanced filtering, allowing the user to adjust point cloud density, or improving the cloud quality by replacing the points in the grid dictionary depending on distance or other factors. The compact and cost-effective nature of the iPhone's LiDAR makes advanced depth sensing accessible to developers and opens up a world of possibilities for innovative applications. Happy coding!