Rajeev Dixit

@dixiterk

Using object detection in home pictures for improving price estimation

Estimating the price of a home is both science and art. Home characteristics, such as square footage, location or the number of bathrooms, are given different weights according to their influence on home sale prices in each specific geography over a specific period of time. The result is a set of valuation rules, or models, that are applied to generate an estimate for each home, such as Zillow’s Zestimate. Specifically, the structured data used in the estimation includes:

  • Physical attributes: location, lot size, square footage, number of bedrooms and bathrooms, and many other details.
  • Tax assessments: property tax information, actual property taxes paid, exceptions to tax assessments and other information provided in the tax assessors’ records.
  • Prior and current transactions: actual sale prices over time of the home itself and comparable recent sales of nearby homes.

Besides the structured data, many other factors make up a home’s so-called curb appeal, or the buyer’s impression of the house. It is important to consider those factors in the estimation so that it reflects human emotions and social norms. When agents assess a property, the first thing they typically do is study the home from an overhead, satellite view on Google. They note whether it backs up to a busy street, its proximity to commercial property or freeways, the size of other homes nearby, the vegetation and landscaping, and its orientation to the sun. If available, they also view any photos of the exterior plus a street scene. A home with granite countertops and stainless steel appliances definitely commands a higher value than a similar home with a Formica kitchen. Similarly, a home that has been renovated should be valued higher than a similar next-door neighbor that is a decade or two out of date.

Companies like Zillow have been using such data, extracted from home photos and satellite images of the neighborhood, for some time. Armed with a large amount of data collected over time, Zillow claims that its Zestimate is 15 percent more accurate when it incorporates machine learning to simulate human judgment. Until recently, companies like Zillow had the advantage of a pool of data collected over many years from homes all over the US, on which they have fine-tuned their machine learning models.

With the availability of computer vision APIs for object detection from Microsoft, Google, Amazon and Foxy AI, it is now possible for any company providing real-estate services, even one starting from scratch, to make use of unstructured data in its estimation. We tried Amazon’s, Google’s and Microsoft’s object detection APIs on a suite of home photos. We wanted to find out how accurate these APIs are with their built-in generic object detection models and whether they can be used to augment home price estimation. Please see the end of the article for details of the objects detected by each of these APIs with its built-in model on a variety of home photos.

Observations

  • Amazon provides the most accurate results, followed by Google and then Microsoft.
  • Microsoft returns at most one category with a meaningful score per photo and has poor accuracy compared to the other two.
  • Amazon returns multiple labels related to specific objects as well as their ancestors (parent labels).
  • For some photos, Google returned specific labels such as “hacienda” for “Spanish Estate”, suggesting that it may have been trained on a wider variety of photos from different countries.
  • Foxy AI classifies objects into known real-estate categories such as Closet and Laundry. It is not clear whether its model is any more accurate than the generic object models of the other three companies.
  • With the default models, none of the results include descriptive labels such as “Wooden floor” or “Granite countertop”, which may limit their use in estimation.
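The Amazon observation is visible in the results at the end of the article: a label and its ancestors tend to share exactly the same confidence (for example Mansion, House and Housing). As a rough sketch, assuming that identical scores indicate one label-plus-ancestors chain (a heuristic inferred from these outputs, not documented API behavior), the flat list could be collapsed into groups. The function name `group_by_confidence` is hypothetical.

```python
def group_by_confidence(lines):
    """Group 'Label : score' lines that share exactly the same score.

    Assumption: labels with identical confidence belong to one
    label-plus-ancestors chain. This is a heuristic, not an API guarantee.
    """
    groups = {}  # insertion-ordered, so groups keep the original ranking
    for line in lines:
        name, _, score = line.rpartition(" : ")
        groups.setdefault(float(score), []).append(name.strip())
    return list(groups.items())


# Amazon labels copied from the first photo in the results section.
amazon_lines = [
    "Building : 97.64356231689453",
    "Mansion : 93.88848876953125",
    "House : 93.88848876953125",
    "Housing : 93.88848876953125",
    "Path : 93.61183166503906",
    "Walkway : 93.61183166503906",
    "Water : 69.26892852783203",
    "Pool : 69.26892852783203",
]

for score, names in group_by_confidence(amazon_lines):
    print(f"{score:6.2f}  {' / '.join(names)}")
```

Grouping this way shortens the list a reader has to scan, but it cannot tell which label in a group is the most specific one; that would need the hierarchy information from the API response itself.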

Improving detection

  • Images may not always be clear and sharp. It will be interesting to see how accurately objects in cluttered, out-of-focus or low-light images are detected.
  • Amazon’s API can return bounding boxes for common object labels such as people, cars, furniture, apparel or pets. Bounding boxes can be used to find the exact locations of objects in an image, to count instances of detected objects, or to measure an object’s size from the bounding box dimensions. It would be interesting to see whether additional characteristics of an object can be detected by feeding in a zoomed-in crop of the individual object. For real-estate photos, the useful descriptors to detect include: bright and spacious, marble, granite countertop, teak furniture, new refrigerator, wooden floor, flaming birch cabinets, antique furniture, leather sofa, high ceiling, long, steep driveway, manicured garden, walk-in closet, long hallway, clean carpet.
  • Amazon’s API exposes parent labels, which can be used to build groups of related labels and then query for similar labels across one or more images. This could be used to compare bedroom photos of multiple homes and to compare the objects detected within those rooms.
  • Microsoft offers scoped analysis, where you specify a model’s name to get information relevant to that model. Currently, only the Celebrities and Landmarks models are supported. But all of these companies may eventually add real estate as a category or domain, which could significantly improve detection accuracy.
  • Home styles and interiors vary by location and culture. Considering such aspects while detecting objects can improve detection accuracy. A targeted machine learning model trained only on photos of house interiors and exteriors may work better than a generic model built to detect any object. Companies like Foxy AI take this approach. Foxy AI provides API endpoints specifically to detect objects in home pictures and also provides price predictions based on the input parameters.
  • Google’s AutoML Vision makes it possible for developers with limited machine learning expertise to train high-quality custom models. After uploading and labeling images, AutoML Vision trains a model that can scale as needed to adapt to demands. Such custom, production-ready models are better suited for our use case.
  • Making a generic AI product genuinely useful for a custom scenario is very hard. Developing a custom model based on data collected from real users is always preferable to trying to fit a square peg into a round hole.
  • Teaching a computer to appreciate curb appeal and the buyer’s feel of the home is truly artificial intelligence. Using generic object detection APIs with built-in models can be a good start for companies that do not have any data. As they gather data over time, a custom model built using local home photos can work well to extract additional home features. These, along with structured variables like square footage, can help home in on more accurate estimates. In any case, we are far from eliminating an expert real-estate agent’s visit for a true valuation of real estate.
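The zoom-in idea from the bounding box point above needs pixel coordinates for the crop. Amazon Rekognition reports each bounding box as ratios of the overall image width and height (`Left`, `Top`, `Width`, `Height`), so a conversion along these lines would be needed first. This is only a minimal sketch: the function names and the sample box values are made up for illustration, and the image size is taken from the first photo’s metadata in the results section.

```python
def to_pixel_box(box, image_width, image_height):
    """Convert a ratio-based bounding box to integer pixel coordinates.

    Returns (left, top, right, bottom), clamped to the image bounds.
    This is the tuple order that Pillow's Image.crop expects.
    """
    left = max(0, round(box["Left"] * image_width))
    top = max(0, round(box["Top"] * image_height))
    right = min(image_width, round((box["Left"] + box["Width"]) * image_width))
    bottom = min(image_height, round((box["Top"] + box["Height"]) * image_height))
    return left, top, right, bottom


def relative_area(box):
    """Fraction of the image covered by the box, a rough proxy for object size."""
    return box["Width"] * box["Height"]


# Hypothetical bounding box for one detected object (values are illustrative).
sample_box = {"Left": 0.25, "Top": 0.4, "Width": 0.5, "Height": 0.25}

# Image size from the first photo's metadata: 2048 x 1365.
crop = to_pixel_box(sample_box, 2048, 1365)
print(crop, relative_area(sample_box))
```

The cropped region could then be sent back to the same API, or to a custom model, to look for finer descriptors such as material or condition.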

Object Detection Results on home photos

Reference: https://i.imgur.com/7Jk4R8h.jpg

Microsoft

"categories":[
{"name":"outdoor_pool","score":0.95703125}
],
"requestId":"7462e05d-5cc3-4273-a6d6-e78edc916d34",
"metadata":{"width":2048,"height":1365,"format":"Jpeg"}

Google

estate : 0.9384693503379822 
property : 0.9383976459503174
mansion : 0.9083089828491211
villa : 0.8863233923912048
real estate : 0.8205443620681763
home : 0.8012939095497131
hacienda : 0.7106115221977234
building : 0.7076689004898071
sky : 0.6754202842712402
swimming pool : 0.6674705147743225

Amazon

Building : 97.64356231689453 
Mansion : 93.88848876953125
House : 93.88848876953125
Housing : 93.88848876953125
Path : 93.61183166503906
Walkway : 93.61183166503906
Resort : 93.21076202392578
Hotel : 93.21076202392578
Flagstone : 91.89410400390625
Villa : 86.66985321044922
Water : 69.26892852783203
Pool : 69.26892852783203
Architecture : 61.82177734375
Sidewalk : 57.70322799682617
Pavement : 57.70322799682617
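The three outputs above differ in shape and scale: Microsoft returns a JSON list of categories scored from 0 to 1, Google’s scores also run from 0 to 1, and Amazon’s confidences run from 0 to 100. To compare them side by side, they could be brought onto one scale first. The sketch below does that for text in the form printed above; the helper names are hypothetical, and real API responses would be parsed from each provider’s JSON rather than from these printed lines.

```python
import json


def parse_label_lines(text, scale=1.0):
    """Parse 'label : score' lines into {lowercase label: score in 0..1}."""
    labels = {}
    for line in text.strip().splitlines():
        name, _, score = line.rpartition(" : ")
        labels[name.strip().lower()] = float(score) / scale
    return labels


def parse_microsoft_categories(fragment):
    """Parse a '"categories":[...]' fragment into the same dictionary form."""
    data = json.loads("{" + fragment.strip().rstrip(",") + "}")
    return {c["name"].lower(): c["score"] for c in data["categories"]}


# A few lines copied from the Google and Amazon outputs above.
google = parse_label_lines("""
mansion : 0.9083089828491211
villa : 0.8863233923912048
swimming pool : 0.6674705147743225
""")

amazon = parse_label_lines("""
Mansion : 93.88848876953125
Villa : 86.66985321044922
Pool : 69.26892852783203
""", scale=100.0)

microsoft = parse_microsoft_categories(
    '"categories":[{"name":"outdoor_pool","score":0.95703125}],'
)

# Labels reported by both Google and Amazon, now on the same 0..1 scale.
for label in sorted(google.keys() & amazon.keys()):
    print(f"{label}: google={google[label]:.2f} amazon={amazon[label]:.2f}")
print(microsoft)
```

Exact string matching misses near-synonyms such as “swimming pool” and “pool”, so a real comparison would also need some mapping between the providers’ vocabularies.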

Analysis

  • Microsoft correctly identified the pool, with the qualifier “outdoor”, but it did not return any other objects.
  • Google identified the specific architecture of the building by labeling it “hacienda”, meaning “Spanish estate”. It also explicitly identified “sky”, which was not detected by the other two.
  • Amazon detected details like “Flagstone”. Its confidence scores are also fairly high, starting at about 97% for the top label.

Reference: https://i.imgur.com/Eic3Xam.jpg

Microsoft

"categories":[
{"name":"abstract_","score":0.01171875},{"name":"others_","score":0.01171875}
],
"requestId":"a7add036-54d5-4421-ab4c-889535161445",
"metadata":{"width":1905,"height":2000,"format":"Jpeg"}

Google

interior design : 0.7810790538787842 
home : 0.6999193429946899
window : 0.6992332339286804
outdoor structure : 0.5407284498214722
furniture : 0.5291412472724915

Amazon

Flooring : 99.99893951416016 
Floor : 99.8294448852539
Indoors : 94.28851318359375
Interior Design : 94.28851318359375
Plant : 89.50586700439453
Living Room : 85.67030334472656
Room : 85.67030334472656
Furniture : 83.99723052978516
Couch : 83.99723052978516
Hardwood : 82.8748550415039
Wood : 82.8748550415039
Jar : 78.75053405761719
Pottery : 78.75053405761719
Vase : 78.75053405761719
Blossom : 77.35748291015625
Flower Arrangement : 77.35748291015625
Flower : 77.35748291015625
Potted Plant : 76.62191009521484
Flower Bouquet : 72.43888854980469
Flagstone : 58.75558090209961
Door : 58.61599349975586
Corridor : 55.91652297973633

Analysis

  • Microsoft could not detect any objects with default settings. It returned only generic categories like “abstract” (maybe it took the picture for a painting), with very low confidence scores.
  • Google detected some generic terms like “Window” and “Furniture”.
  • Amazon detected a lot of details, including “Potted Plant”, “Flower Bouquet” etc.

Reference: https://i.imgur.com/6aGAOOp.jpg

Microsoft

"categories":[
{"name":"abstract_","score":0.01953125},{"name":"others_","score":0.00390625},{"name":"outdoor_","score":0.00390625}
],
"requestId":"0f4b81a5-b38f-47b0-93fa-61d93186a5bb",
"metadata":{"width":2048,"height":912,"format":"Jpeg"}

Google

property : 0.8942150473594666 
countertop : 0.8445141911506653
kitchen : 0.8333971500396729
estate : 0.7475240230560303
interior design : 0.7038251161575317
real estate : 0.7007927894592285
cuisine classique : 0.6698155999183655
ceiling : 0.5234758257865906

Amazon

Indoors : 99.1664047241211 
Room : 99.1664047241211
Kitchen : 94.69178009033203
Interior Design : 93.49835205078125
Flooring : 89.98771667480469
Wood : 85.61946868896484
Lamp : 83.11481475830078
Chandelier : 83.11481475830078
Kitchen Island : 81.20708465576172
Hardwood : 77.21861267089844
Furniture : 59.81884002685547
Floor : 58.4579963684082

Analysis

  • Microsoft could not detect any objects with default settings.
  • Google detected that it is a kitchen but did not identify many details within the kitchen.
  • Amazon detected a lot of details, including “Chandelier”, “Hardwood” etc.

Reference: https://i.imgur.com/2CQ2EyM.jpg

Microsoft

"categories":[
{"name":"abstract_","score":0.02734375},{"name":"building_pillar","score":0.3515625}
],
"requestId":"5e1faa3e-ac73-4a65-aaaf-84ec3d8547cd",
"metadata":{"width":2048,"height":944,"format":"Jpeg"}

Google

property : 0.8988840579986572 
room : 0.8776835799217224
interior design : 0.7127363085746765
real estate : 0.6541785597801208
living room : 0.6493942141532898
estate : 0.6164873838424683
floor : 0.5910876989364624
ceiling : 0.5784943699836731
flooring : 0.5770671367645264
house : 0.5629818439483643

Amazon

Flooring : 99.9969482421875 
Floor : 99.98954010009766
Wood : 98.96363830566406
Interior Design : 97.80683135986328
Indoors : 97.80683135986328
Hardwood : 92.8141098022461
Room : 82.762939453125
Living Room : 80.249267578125
Plywood : 72.39012145996094
Furniture : 69.56649780273438
Bedroom : 68.93217468261719
Bed : 61.190338134765625
Rug : 60.942543029785156

Analysis

  • Microsoft could not detect any objects with default settings.
  • Google incorrectly detected the room as a “Living room”.
  • Amazon detected the room as both “Living Room” and “Bedroom”. It also correctly detected “Bed” and “Plywood”.

Reference: https://i.imgur.com/LGKVfzx.jpg

Microsoft

"categories":[
{"name":"indoor_room","score":0.93359375}
],
"requestId":"59e8b484-652a-4d78-90f6-d9fb4274c33f",
"metadata":{"width":2000,"height":2000,"format":"Jpeg"}

Google

room : 0.916010856628418 
bathroom : 0.8526955842971802
interior design : 0.8033964037895203
floor : 0.8008365035057068
wall : 0.7690424919128418
flooring : 0.7259812951087952
home : 0.725638210773468
ceiling : 0.701020359992981
estate : 0.6925469040870667
real estate : 0.657813549041748

Amazon

Flooring : 99.98371887207031 
Floor : 99.88095092773438
Furniture : 98.47281646728516
Couch : 88.17687225341797
Indoors : 87.85083770751953
Interior Design : 87.85083770751953
Living Room : 78.22207641601562
Room : 78.22207641601562
Wood : 71.65221405029297
Tub : 68.77200317382812
Hardwood : 62.696842193603516
Flagstone : 58.15713119506836

Analysis

  • Microsoft returned only the generic category “indoor_room” and did not detect any specific objects with default settings.
  • Google detected it correctly as a “Bathroom”.
  • Amazon detected “Tub” and “Couch” but did not label the room as a “Bathroom”.

Reference: https://i.imgur.com/6PnuY0Q.jpg

Microsoft

"categories":[
{"name":"outdoor_","score":0.01171875},{"name":"outdoor_street","score":0.55078125}
],
"requestId":"88a759da-2b1c-43f9-9fec-c8cb07cb9fde",
"metadata":{"width":1769,"height":2000,"format":"Jpeg"}

Google

property : 0.9011626243591309 
walkway : 0.7674194574356079
home : 0.7527650594711304
real estate : 0.7386894822120667
arecales : 0.7260463237762451
palm tree : 0.6820899248123169
estate : 0.6767517924308777
outdoor structure : 0.6750420928001404
villa : 0.6429499387741089
house : 0.6065300107002258

Amazon

Flagstone : 99.65138244628906 
Tree : 95.78475952148438
Plant : 95.78475952148438
Patio : 91.34025573730469
Walkway : 90.66641235351562
Path : 90.66641235351562
Arecaceae : 88.72791290283203
Palm Tree : 88.72791290283203
Outdoors : 79.1570816040039
Sidewalk : 78.15443420410156
Pavement : 78.15443420410156
Human : 77.41170501708984
Person : 77.41170501708984
Building : 76.29102325439453
Banister : 74.00752258300781
Handrail : 74.00752258300781
Summer : 70.18952178955078
Hotel : 63.83850860595703
Arbour : 59.08328628540039
Garden : 59.08328628540039
Home Decor : 57.40121078491211
Pot : 56.944217681884766

Analysis

  • Microsoft could detect it only as “Outdoor”.
  • Google detected some generic terms like “Palm Tree” and “Villa”.
  • Amazon detected a lot of details, including “Palm Tree”, “Patio” etc. It also detected “Human” and “Person”, although there is no person in the picture.
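Label lists like these could feed a price model as extra features next to the structured variables. A minimal sketch, assuming a fixed vocabulary of labels of interest and a confidence threshold (both hypothetical choices, as are the structured values used here): each label becomes a 0/1 feature that is appended to the structured ones.

```python
# Hypothetical vocabulary of labels worth tracking; a real one would be
# chosen from labels observed across many listings.
VOCABULARY = ["hardwood", "kitchen island", "chandelier", "pool", "patio"]


def photo_features(labels, min_confidence=0.75):
    """Turn {label: confidence in 0..1} into 0/1 features over VOCABULARY."""
    return [
        1 if labels.get(name, 0.0) >= min_confidence else 0
        for name in VOCABULARY
    ]


def listing_features(structured, photos):
    """Append photo-derived features to the structured ones.

    A label counts as present if any photo of the listing shows it.
    """
    per_photo = [photo_features(p) for p in photos]
    if per_photo:
        combined = [max(column) for column in zip(*per_photo)]
    else:
        combined = [0] * len(VOCABULARY)
    return list(structured) + combined


# Amazon confidences for the kitchen photo above, rounded and rescaled to 0..1.
kitchen = {"kitchen island": 0.8121, "chandelier": 0.8311, "hardwood": 0.7722}

# Structured inputs are illustrative: square footage, bedrooms, bathrooms.
row = listing_features([2400, 4, 3], [kitchen])
print(row)
```

Whether such features actually improve an estimate would have to be checked against real sale prices; the sketch only shows how the two kinds of input could be combined.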

Thanks to Pablo Thiel at Haverford College for help with extracting data with all the APIs.

Originally published at 47billion.com on February 4, 2019.
