ARKit 101: How to Build Augmented Reality (AR) based resume using Face Recognition by@gsanjeev7

June 23rd 2022

Sanjeev Ghimire

Apple announced ARKit for iOS 11 at its WWDC event in June 2017, and ARKit shipped as part of iOS 11 on September 19, 2017. Developers can download Xcode 9.0.1, which includes the iOS 11 SDK, and start creating augmented reality projects.

What is ARKit?

ARKit simplifies the task of building an AR experience by combining device motion tracking, scene processing, camera scene capture, and display conveniences. Augmented reality (AR) adds 2D or 3D objects to the live camera view so that those objects seem to be part of the real world. You can use ARKit features to produce AR experiences in your app or game. AR games such as Pokémon Go, Zombies, Run!, and Ingress have become very popular.
ARKit uses world and camera coordinate systems that follow a right-handed convention: the x-axis points to the right, the y-axis points up, and the z-axis points toward the viewer. To track the world coordinates, ARKit uses a technique called visual-inertial odometry, which combines information from the iOS device's motion-sensing hardware with vision analysis of the scene visible to the phone's camera. World tracking also analyzes and understands the contents of the scene. With hit testing you can find real-world surfaces that correspond to a point in the camera image, and with plane detection enabled ARKit identifies horizontal or vertical planes and tracks their position and size.
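To make the right-handed convention concrete, here is a minimal sketch. It uses a small stand-in type, Vec3, which is my own helper and not an ARKit or SceneKit type, so that it runs without any Apple framework. In a right-handed system the cross product of the x-axis and the y-axis gives the z-axis, which is why content placed in front of the camera ends up with a negative z value.

```swift
// Stand-in vector type (hypothetical, not part of ARKit or SceneKit).
struct Vec3: Equatable {
    var x: Double, y: Double, z: Double

    /// Cross product of this vector with another one.
    func cross(_ o: Vec3) -> Vec3 {
        return Vec3(x: y * o.z - z * o.y,
                    y: z * o.x - x * o.z,
                    z: x * o.y - y * o.x)
    }
}

let right = Vec3(x: 1, y: 0, z: 0)    // +x points to the right
let up = Vec3(x: 0, y: 1, z: 0)       // +y points up
let towardViewer = right.cross(up)    // +z in a right-handed system

// Since +z points toward the viewer, a point half a meter in front
// of the camera has a negative z coordinate.
let inFrontOfCamera = Vec3(x: 0, y: 0, z: -0.5)
```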
World tracking will not always give you exact metrics because it relies on the device's physical environment, which is not always consistent and can be difficult to measure. There will always be a certain degree of error when mapping the real world in the camera view for AR experiences. To build high-quality AR experiences, we need to take the following into consideration:
  • Design AR experiences for predictable lighting conditions: World tracking requires clear image analysis, so to increase the quality of the AR experience, design the app for well-lit environments where details can be analyzed.
  • Use tracking quality information to provide user feedback: ARKit tracks best when device motion is combined with clear images. Use the tracking quality information it reports to instruct the user on how to resolve low-quality tracking situations.
  • Allow time for plane detection to produce clear results, and disable plane detection when you have the results you need: ARKit refines its estimate of a plane's position and extent over time. When a plane is first detected, its position and extent might not be accurate, but the estimate improves the longer the plane remains in the scene.
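As a rough sketch of the second point, the snippet below maps a tracking state to a hint for the user. TrackingQuality is a hypothetical stand-in that mirrors the shape of ARKit's ARCamera.TrackingState so the example runs without ARKit, and the hint texts are illustrative only. In a real app the state would arrive through the session(_:cameraDidChangeTrackingState:) callback.

```swift
// Stand-in for ARCamera.TrackingState (hypothetical mirror type).
enum TrackingQuality {
    enum Reason { case initializing, excessiveMotion, insufficientFeatures }
    case notAvailable
    case limited(Reason)
    case normal
}

/// Returns a hint to show the user, or nil when tracking is good.
func userHint(for quality: TrackingQuality) -> String? {
    switch quality {
    case .normal:
        return nil
    case .notAvailable:
        return "Tracking is unavailable."
    case .limited(.initializing):
        return "Hold still while the session starts."
    case .limited(.excessiveMotion):
        return "Move the device more slowly."
    case .limited(.insufficientFeatures):
        return "Point the camera at a well-lit, detailed surface."
    }
}

let hint = userHint(for: .limited(.excessiveMotion))
```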

ARKit Terminology

  • SceneKit View: A component in Interface Builder's object library, used mostly for 3D graphics rendering. It is similar to a GLKit view and lets you quickly set up many defining attributes of a scene.
  • ARSCNView: A SceneKit view class that includes an ARSession object, which manages the motion tracking and image processing required to create an augmented reality (AR) experience.
  • ARSession: The object that manages the motion tracking and image processing for an AR experience.
  • ARWorldTrackingConfiguration: The ARWorldTrackingConfiguration class provides high-precision motion tracking and enables features to help you place virtual content in relation to real-world surfaces.
  • SCNNode: A structural element of a scene graph, representing a position and transform in a 3D coordinate space, to which you can attach geometry, lights, cameras, or other displayable content.
  • SCNPlane: A rectangular, one-sided plane geometry of specified width and height.


In this blog post, I explain how to quickly get started with an augmented reality (AR) app and build an AR experience using facial recognition. The AR app recognizes your face and displays a 3D mock version of you, along with your professional information, in the camera view. The components used in the app are:
  • iOS Vision module
  • IBM Watson Visual Recognition API
  • iOS 11 ARKit
  • IBM Cloudant database to store information.
  • Xcode 9.0.1 or later


How To

Step 1: Creating a project in Xcode

In Xcode, create a new project using the Augmented Reality App template.

Step 2: Configure and Run AR Session

Once the project is set up, we need to configure and run the AR session. The template already provides an ARSCNView, which includes an ARSession object. The ARSession object manages motion tracking and image processing. To run the session, we pass it an ARWorldTrackingConfiguration. The following code sets up the configuration and runs the session:
    @IBOutlet var sceneView: ARSCNView!

    override func viewWillAppear(_ animated: Bool) {
        super.viewWillAppear(animated)
        // Create a session configuration
        let configuration = ARWorldTrackingConfiguration()
        configuration.planeDetection = .horizontal
        // Run the view's session
        sceneView.session.run(configuration)
    }
The above code sets plane detection to horizontal and runs the session.
Important: If your app requires ARKit for its core functionality, use the arkit key in the UIRequiredDeviceCapabilities section of your app's Info.plist file to make your app available only on devices that support ARKit. If AR is a secondary feature of your app, use the isSupported property of the configuration class to determine whether to offer AR-based features.
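A minimal sketch of the second case could look like this. The function name worldTrackingAvailable and the two messages are my own illustrative choices; the only ARKit call is ARWorldTrackingConfiguration.isSupported. The code is wrapped in platform checks so that it also compiles where ARKit is not available.

```swift
#if os(iOS)
import ARKit
#endif

/// True when the current device can run a world-tracking AR session.
/// (Hypothetical helper; only `isSupported` comes from ARKit.)
func worldTrackingAvailable() -> Bool {
    #if os(iOS)
    return ARWorldTrackingConfiguration.isSupported
    #else
    return false   // ARKit is not available on this platform
    #endif
}

// Decide which experience to offer (messages are illustrative only).
let modeDescription = worldTrackingAvailable()
    ? "Showing the AR resume"
    : "Falling back to a plain profile view"
print(modeDescription)
```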

Step 3: Add 3D content to the detected plane

Once the ARSession is set up, you can use SceneKit to place virtual content in the view. The project template includes a sample file called ship.scn in the art.scnassets directory, which you can place in the view. The following code adds the 3D object to the scene view:

    // Create a new scene
    let scene = SCNScene(named: "art.scnassets/ship.scn")!
    // Set the scene to the view
    sceneView.scene = scene
The output of the above code is a 3D ship object placed in the real-world view.

Step 4: Use Vision API for face detection

Once you have verified that the 3D content works in the camera view, let's set up face detection using the Vision API. Vision detects the face, and the app then crops the face and sends the image file to the IBM Visual Recognition API to classify it.
    // MARK: - Face detections
    private func faceObservation() -> Observable<[(observation: VNFaceObservation, image: CIImage, frame: ARFrame)]> {
        return Observable<[(observation: VNFaceObservation, image: CIImage, frame: ARFrame)]>.create{ observer in
            guard let frame = self.sceneView.session.currentFrame else {
                print("No frame available")
                observer.onCompleted()
                return Disposables.create()
            }
            // Create and rotate image
            let image = CIImage.init(cvPixelBuffer: frame.capturedImage).rotate
            let facesRequest = VNDetectFaceRectanglesRequest { request, error in
                guard error == nil else {
                    print("Face request error: \(error!.localizedDescription)")
                    observer.onCompleted()
                    return
                }
                guard let observations = request.results as? [VNFaceObservation] else {
                    print("No face observations")
                    observer.onCompleted()
                    return
                }
                // Map response
                let response = observations.map({ (face) -> (observation: VNFaceObservation, image: CIImage, frame: ARFrame) in
                    return (observation: face, image: image, frame: frame)
                })
                // Emit the detected faces to the subscriber
                observer.onNext(response)
                observer.onCompleted()
            }
            try? VNImageRequestHandler(ciImage: image).perform([facesRequest])
            return Disposables.create()
        }
    }
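One detail worth knowing before cropping: Vision reports a face's boundingBox in normalized coordinates (0 to 1) with the origin at the lower-left corner of the image. The sketch below converts such a box into a pixel rectangle with a top-left origin, as UIKit-style drawing code would expect. Rect and pixelRect are hypothetical names of mine and use plain Doubles so the example runs without Vision or Core Graphics. It is not the cropImage(toFace:) helper used in this project, which works on a CIImage.

```swift
// Plain rectangle type (hypothetical stand-in for CGRect).
struct Rect: Equatable {
    var x: Double, y: Double, width: Double, height: Double
}

/// Converts a normalized, lower-left-origin box into pixel coordinates
/// with a top-left origin.
func pixelRect(fromNormalized box: Rect, imageWidth: Double, imageHeight: Double) -> Rect {
    let width = box.width * imageWidth
    let height = box.height * imageHeight
    let x = box.x * imageWidth
    // Flip vertically: the normalized y is measured from the bottom edge.
    let y = (1 - box.y - box.height) * imageHeight
    return Rect(x: x, y: y, width: width, height: height)
}

// A face covering the middle half of the width, in the third quarter
// of the height counted from the bottom.
let normalizedFace = Rect(x: 0.25, y: 0.5, width: 0.5, height: 0.25)
let crop = pixelRect(fromNormalized: normalizedFace, imageWidth: 640, imageHeight: 480)
```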

Step 5: Use IBM Visual Recognition API to classify the face

With the IBM Visual Recognition API, you can upload the cropped face from above, and the API classifies it and returns a JSON response. To use the IBM Watson Visual Recognition API, register on the IBM Bluemix console and create a Visual Recognition service. You can then create credentials to use when calling the API. You can use the Watson SDK in your app to access the VisualRecognitionV3 API. To do that, follow the instructions here.
    private func faceClassification(face: VNFaceObservation, image: CIImage, frame: ARFrame) -> Observable<(classes: [ClassifiedImage], position: SCNVector3, frame: ARFrame)> {
        return Observable<(classes: [ClassifiedImage], position: SCNVector3, frame: ARFrame)>.create{ observer in
            // Determine position of the face
            let boundingBox = self.transformBoundingBox(face.boundingBox)
            guard let worldCoord = self.normalizeWorldCoord(boundingBox) else {
                print("No feature point found")
                observer.onCompleted()
                return Disposables.create()
            }
            // Create classification request
            let fileName = self.randomString(length: 20) + ".png"
            let pixel = image.cropImage(toFace: face)
            // Convert the cropped image to a UIImage and write it to a temporary file
            let imagePath = FileManager.default.temporaryDirectory.appendingPathComponent(fileName)
            let uiImage: UIImage = self.convert(cmage: pixel)
            if let data = UIImagePNGRepresentation(uiImage) {
                try? data.write(to: imagePath)
            }
            let visualRecognition = VisualRecognition.init(apiKey: Credentials.VR_API_KEY, version: Credentials.VERSION)
            let failure = { (error: Error) in print(error) }
            let owners = ["me"]
            visualRecognition.classify(imageFile: imagePath, owners: owners, threshold: 0, failure: failure){ classifiedImages in
                observer.onNext((classes: classifiedImages.images, position: worldCoord, frame: frame))
                observer.onCompleted()
            }
            return Disposables.create()
        }
    }
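The code above calls a randomString(length:) helper that is not shown in this post. A minimal sketch of what such a helper could look like follows. This is an assumption on my part and not necessarily the project's implementation; the alphabet in particular is my own choice.

```swift
/// Builds a random alphanumeric string of the given length.
/// (Assumed implementation of the helper referenced above.)
func randomString(length: Int) -> String {
    let alphabet = Array("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789")
    // Treat negative lengths as zero instead of crashing on an invalid range.
    let count = max(0, length)
    return String((0..<count).map { _ in alphabet.randomElement()! })
}

// A unique-enough temporary file name for the cropped face image.
let fileName = randomString(length: 20) + ".png"
```

A random name keeps consecutive uploads from overwriting each other in the temporary directory. UUID().uuidString from Foundation would serve the same purpose.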

Step 6: Update the node to place 3D content and Text

Once the face is classified, the Visual Recognition API returns a JSON response. The response contains a classification ID, which is then used to look up more information about the person in the IBM Cloudant database. The retrieved JSON document looks like this:
    {
      "_id": "c2554847ec99e05ffa8122994f1f1cb4",
      "_rev": "3-d69a8b26c103a048b5e366c4a6dbeed7",
      "classificationId": "SanjeevGhimire_334732802",
      "fullname": "Sanjeev Ghimire",
      "linkedin": "",
      "twitter": "",
      "facebook": "",
      "phone": "1-859-684-7931",
      "location": "Austin, TX"
    }
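The project reads this document through a dynamic JSON wrapper (profile["fullname"].stringValue). As an alternative sketch, a document of this shape can also be decoded with Foundation's Codable. The struct name ResumeProfile and the sample values below are illustrative assumptions; keys that are not declared in the struct, such as _id and _rev, are simply ignored by the decoder.

```swift
import Foundation

// Hypothetical model for the profile document stored in Cloudant.
struct ResumeProfile: Codable {
    let classificationId: String
    let fullname: String
    let linkedin: String
    let twitter: String
    let facebook: String
    let phone: String
    let location: String
}

// Sample document with placeholder values.
let json = """
{
  "_id": "example-id",
  "_rev": "1-example",
  "classificationId": "SanjeevGhimire_334732802",
  "fullname": "Sanjeev Ghimire",
  "linkedin": "",
  "twitter": "",
  "facebook": "",
  "phone": "1-555-0100",
  "location": "Austin, TX"
}
"""

// Decoding returns nil if a declared key is missing or has the wrong type.
let profile = try? JSONDecoder().decode(ResumeProfile.self, from: Data(json.utf8))
```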
Then we can update the SCNNode with these details as child nodes. An SCNNode is a structural element of a scene graph, representing a position and transform in a 3D coordinate space, to which you can attach geometry, lights, cameras, or other displayable content. For each child node, we need to define its font, alignment, and material. The material includes properties for 3D content such as the diffuse color, the specular color, and whether the surface is double-sided. For example, the full name from the above JSON can be added to the SCNNode as follows:
    let fullName = profile["fullname"].stringValue
    let fullNameBubble = SCNText(string: fullName, extrusionDepth: CGFloat(bubbleDepth))
    fullNameBubble.font = UIFont(name: "Times New Roman", size: 0.10)?.withTraits(traits: .traitBold)
    fullNameBubble.alignmentMode = kCAAlignmentCenter
    fullNameBubble.firstMaterial?.diffuse.contents = /* color value missing in the original post */
    fullNameBubble.firstMaterial?.specular.contents = UIColor.white
    fullNameBubble.firstMaterial?.isDoubleSided = true
    fullNameBubble.chamferRadius = CGFloat(bubbleDepth)
    // Full name bubble node
    let (minBound, maxBound) = fullNameBubble.boundingBox
    let fullNameNode = SCNNode(geometry: fullNameBubble)
    // Move the node's pivot to the center-bottom point of the text
    fullNameNode.pivot = SCNMatrix4MakeTranslation((maxBound.x - minBound.x)/2, minBound.y, bubbleDepth/2)
    // Reduce default text size
    fullNameNode.scale = SCNVector3Make(0.1, 0.1, 0.1)
    fullNameNode.simdPosition = simd_float3.init(x: 0.1, y: 0.06, z: 0)
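The pivot line is the least obvious part of the code above, so here is the same arithmetic in isolation. Point3 and centerBottomPivot are hypothetical names, and plain Doubles are used so the sketch runs without SceneKit. The pivot is moved to half the text width on x, to the bottom edge on y, and to half the extrusion depth on z, so that the node is positioned by the bottom center of the text.

```swift
// Plain 3D point (hypothetical stand-in for SCNVector3).
struct Point3: Equatable {
    var x: Double, y: Double, z: Double
}

/// Pivot translation that anchors a text geometry at its center-bottom point,
/// using the same formula as the SCNMatrix4MakeTranslation call above.
func centerBottomPivot(minBound: Point3, maxBound: Point3, depth: Double) -> Point3 {
    return Point3(x: (maxBound.x - minBound.x) / 2,
                  y: minBound.y,
                  z: depth / 2)
}

// Example bounding box for a text 4 units wide whose baseline dips to -0.5.
let pivot = centerBottomPivot(minBound: Point3(x: 0, y: -0.5, z: 0),
                              maxBound: Point3(x: 4, y: 1, z: 0.25),
                              depth: 0.25)
```

Note that half the width equals the horizontal center only when minBound.x is 0. For a bounding box that starts elsewhere, (minBound.x + maxBound.x) / 2 would be the general form.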
And to update the SCNNode:
    private func updateNode(classes: [ClassifiedImage], position: SCNVector3, frame: ARFrame) {
        guard let person = classes.first else {
            print("No classification found")
            return
        }
        let classifier = person.classifiers.first
        let name = classifier?.name
        let classifierId = classifier?.classifierID
        // Filter for existent face
        let results = self.faces.filter{ $0.name == name && $0.timestamp != frame.timestamp }
            .sorted{ $0.node.position.distance(toVector: position) < $1.node.position.distance(toVector: position) }
        // Create new face
        guard let existentFace = results.first else {
            CloudantRESTCall().getResumeInfo(classificationId: classifierId!) { (resultJSON) in
                let node = SCNNode.init(withJSON: resultJSON["docs"][0], position: position)
                DispatchQueue.main.async {
                    // Assumed: the body of this block is missing in the post; add the new node to the scene
                    self.sceneView.scene.rootNode.addChildNode(node)
                }
                let face = Face.init(name: name!, node: node, timestamp: frame.timestamp)
                // Assumed: remember the face so later frames can match it
                self.faces.append(face)
            }
            return
        }
        // Update existent face
        DispatchQueue.main.async {
            // Filter for face that's already displayed
            if let displayFace = results.filter({ !$0.hidden }).first {
                let distance = displayFace.node.position.distance(toVector: position)
                if(distance >= 0.03 ) {
                    // Assumed: the body of this branch is missing in the post; move the node to the new position
                    displayFace.node.position = position
                }
                displayFace.timestamp = frame.timestamp
            } else {
                existentFace.node.position = position
                existentFace.timestamp = frame.timestamp
            }
        }
    }
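The matching logic in updateNode can be hard to follow inside the UI code, so here is a stripped-down sketch of the same idea: among the known faces with the same name that were not already updated in this frame, pick the one closest to the new position, and only move it if it drifted by at least 0.03. Position, TrackedFace, and nearestFace are hypothetical stand-ins for the project's SCNVector3, Face, and inline filter, and the distance unit is assumed to be meters.

```swift
// Plain 3D position (hypothetical stand-in for SCNVector3).
struct Position {
    var x: Double, y: Double, z: Double

    /// Euclidean distance to another position.
    func distance(to other: Position) -> Double {
        let dx = x - other.x, dy = y - other.y, dz = z - other.z
        return (dx * dx + dy * dy + dz * dz).squareRoot()
    }
}

// Minimal record of a face that is already in the scene.
struct TrackedFace {
    let name: String
    var position: Position
    var timestamp: Double
}

/// Closest known face with the given name that was not updated at `timestamp`.
func nearestFace(named name: String, to position: Position,
                 at timestamp: Double, in faces: [TrackedFace]) -> TrackedFace? {
    return faces
        .filter { $0.name == name && $0.timestamp != timestamp }
        .min { $0.position.distance(to: position) < $1.position.distance(to: position) }
}

let knownFaces = [
    TrackedFace(name: "A", position: Position(x: 0, y: 0, z: -1.0), timestamp: 1),
    TrackedFace(name: "A", position: Position(x: 0, y: 0, z: -0.5), timestamp: 1),
    TrackedFace(name: "B", position: Position(x: 0, y: 0, z: -0.5), timestamp: 1),
]
let observed = Position(x: 0, y: 0, z: -0.52)
let match = nearestFace(named: "A", to: observed, at: 2, in: knownFaces)

// Same threshold as in updateNode: ignore jitter smaller than 0.03.
let moveThreshold = 0.03
let shouldMove = (match?.position.distance(to: observed) ?? 0) >= moveThreshold
```

The threshold keeps the displayed resume from jittering when the detected face position changes only slightly between frames.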


You can find the GitHub link here.


The output is a mock 3D version of the face together with the person's professional details.


With the release of ARKit on iOS 11, there are endless opportunities to build solutions that map virtual data onto real-world scenarios. Personally, I think augmented reality is an emerging technology in the market, and developers from various industries are experimenting with it in applications such as games, construction, and aviation. Augmented reality will mature over time, and I expect it to become another major force in the tech industry in the foreseeable future.


