If you're comfortable with traditional development platforms such as Heroku but now need to integrate with Salesforce data, you may run into a number of unexpected differences.
Salesforce is not the kind of traditional software platform that many developers are used to working with; it's a managed cloud platform. This means:
You don't have to worry about managing your compute resources, such as the number of CPU cores, or deciding whether to pay for more RAM or hard disk storage.
The majority of the routine coding tasks are done for you. (It's a low-code platform.)
There are strict rules and constraints you must follow when performing database transactions, running server-side compute instructions, and managing asynchronous work.
In this article, I'll show how to get started with building a Python-based integration with the Salesforce platform, and how to navigate its differences and intricacies. We'll create a simple Python app, deploy it to Heroku, query Salesforce records from that app using SOQL, and see how to insert or update records.
Here are the prerequisites you will need:
Basic syntax of the Python programming language and the normal CPython runtime. The runtime and language version used at the time of this writing is Python 3.8.6.
Working with version control using git.
Deploying to Heroku using git and the Heroku CLI.
A Salesforce developer org. If you do not have a Salesforce org to work with yet, you can sign up for free at https://developer.salesforce.com/signup.
As a bonus: some experience writing Structured Query Language (SQL), although we will move through a simple example that does not strictly require prior experience.
In addition, all screenshots and examples will show instructions using Visual Studio Code. The caveat is that we are still using mostly command-line terminal commands. You may use any command-line tool of your choice (as long as Python commands work with your local environment pointing to the correct PATH, or other operating system equivalent).
As one might guess, it is generally best not to write your own integrations with the Salesforce APIs from scratch; rather, we will turn to packages available from the Python Package Index (PyPI) and installed with pip. Among the most popular packages for Salesforce at the time of this article are simple_salesforce and aiosfstream, which cover the REST/Bulk APIs and the Streaming API for Salesforce respectively.
The Salesforce REST API is probably the first choice for most integrations because of its simplicity and its versatility. Typically, record updates, inserts, or transactions in general occur in small sizes of a single record at a time. In these cases, the REST API is ideal: Simply send a data payload to an endpoint (or a web uniform resource identifier [URI]) and receive a response, all over HTTP.
If you haven't written a REST API integration before, then this section is for you. It is only a surface-level introduction, so I recommend reviewing other references on REST for a deeper understanding. (If you're comfortable with REST, feel free to skip this section.)
REST is short for "representational state transfer". It is a large, complex phrase that simply means sending and receiving messages over HTTP. When you visit a web page in your browser, your browser is performing a GET HTTP request, and it receives back a response.
There are different HTTP status codes, such as 200 for success and 404 for not found. In addition to GET requests, there are other request methods, including POST, PUT, PATCH, and DELETE.
When it comes to these operations, HTTP interacts with "resources". For this reason, a uniform resource locator (URL) is also referred to as a uniform resource identifier (URI). A URL is a specific kind of URI, or a subset of URIs.
The overall message here is: Do not let all of this lingo about resources or "data payloads" distract you from the main idea. All that is really done in REST integrations is sending messages and receiving responses. There is plenty of technical jargon layered on top, but that is the underlying concept.
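To make "sending a message" concrete, here is a minimal sketch of what a single-record REST call looks like when it is built by hand with the Python standard library. The instance URL and access token are placeholders I made up for illustration, and the request is only constructed, never sent. In practice, simple_salesforce handles this plumbing for you.

```python
import json
import urllib.request

# Hypothetical values for illustration only: a real instance URL and access
# token come from authenticating against your own Salesforce org.
INSTANCE_URL = "https://example.my.salesforce.com"
ACCESS_TOKEN = "<access_token>"


def build_create_request(object_name, fields, api_version="50.0"):
    """Build (but do not send) a POST request that would create one record."""
    url = "{}/services/data/v{}/sobjects/{}/".format(
        INSTANCE_URL, api_version, object_name
    )
    # The "data payload" is simply the record's fields encoded as JSON.
    body = json.dumps(fields).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": "Bearer " + ACCESS_TOKEN,
            "Content-Type": "application/json",
        },
    )


request = build_create_request("Account", {"Name": "Test Account 1"})
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

The method (POST), the resource (the Account endpoint), and the payload (the JSON body) are the three parts of the message; the response would carry a status code and a JSON body in return.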
Having said that, the examples in this piece will primarily focus on the versatility of REST integrations.
Let me briefly point out some key concepts for when it may be undesirable to use REST when integrating with Salesforce: bulk operations and near real-time data streams.
For high-volume data processing, usually from thousands to millions of records, the Bulk API makes more sense. The truth is, the REST, Bulk, and Streaming APIs all use HTTP principles, but they are optimized for different purposes on Salesforce. The Bulk API follows a different set of governor limits, runs batch sizes of up to 2,000 records for database transactions, and has a significantly different response pattern from that of the REST API. With REST, you receive responses for individual records. With Bulk, you receive responses after the completion of bulk loading jobs.
Finally, the Streaming API matters because of the growing popularity of event-driven systems, in which integrations do not act on requests and responses but instead on change events broadcast to listening clients. Events are broadcast on individual channels, usually considered to be on an "event bus", and then received (as opposed to requested) by listeners.
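The idea of batching can be sketched in plain Python. The helper below is a hypothetical illustration, not part of any Salesforce library; it splits a list of records into batches no larger than the 2,000-record figure mentioned above.

```python
def chunk_records(records, batch_size=2000):
    """Yield successive slices of `records`, each at most `batch_size` long."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]


# 4,500 made-up Account records, defined as dictionaries of field values.
accounts = [{"Name": "Test Account {}".format(n)} for n in range(1, 4501)]
batches = list(chunk_records(accounts))
print([len(batch) for batch in batches])  # prints [2000, 2000, 500]
```

Each batch would then be handed to a bulk load job, and the results would come back per job rather than per record.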
Streaming events are important because they overcome many issues of latency or delay if implemented well. A good example is the notion of change data capture: If a field updates on a given record in a database, it is ideal to know immediately when that happens, having the database tell clients instead of clients requesting that information from the database.
Such a strategy avoids checking last modified date stamps, collecting old and new values somewhere, and similar chores that come with periodic requests and responses. Therefore, if you have the resources for processing a data stream, it is preferable to a bulk job because it delivers information in near real-time rather than on some scheduled daily process run (in the case of bulk).
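The difference between requesting and receiving can be shown with a toy, in-process event bus. This is only an illustration of the publish/subscribe pattern under my own assumptions; it is not the Streaming API, and the event fields are made up.

```python
from collections import defaultdict


class EventBus:
    """A toy event bus: listeners subscribe to a channel and receive events."""

    def __init__(self):
        self._listeners = defaultdict(list)

    def subscribe(self, channel, listener):
        self._listeners[channel].append(listener)

    def publish(self, channel, event):
        # Every listener on the channel is told about the event immediately;
        # no listener has to ask whether something changed.
        for listener in self._listeners[channel]:
            listener(event)


received = []
bus = EventBus()
bus.subscribe("/data/AccountChangeEvent", received.append)

# A made-up change event: the Name field changed on one record.
bus.publish(
    "/data/AccountChangeEvent",
    {"record_id": "001xx0000000001", "field": "Name",
     "old": "Test Account 1", "new": "Renamed Account"},
)
print(received)
```

A polling client would instead have to query on a schedule and compare values itself, which is exactly the bookkeeping that event streams avoid.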
All of this said, simple_salesforce will be the way to start using REST or bulk. It provides wrappers in Python code for forming a secure, authenticated session with Salesforce, performing database operations, and choosing whether to use REST endpoints for individual records or utilizing bulk load jobs.
If it is difficult to understand the difference between APIs, feel free to return to this section. This simply provides a background that should prove useful for designing solutions over the long term. Knowledge of Salesforce APIs will prove relevant in the future because not every solution uses REST or bulk.
Although there are many articles that suggest starting with some kind of a Python web framework like Django or Flask, the suggestion here is to start simple. Begin with a "hello world" style program, which is capable of executing on the Heroku container, and then build upward. The purpose of this approach is to reduce bloat or fluff at the start, eliminating unnecessary dependencies and keeping the scope of the example manageable.
To start, create a new directory called _sample_app. This folder will contain all the relevant program files and can serve as a reference when you build for another use case. Using a terminal, navigate to the newly created directory. If you are following these instructions entirely through the command line, it should look like this:
mkdir _sample_app
cd _sample_app
Inside the folder, initialize a new git repository using git init.
Inside Heroku, create a new app with a name of your choice. Then, with the Heroku CLI, make sure to log in using heroku login.
In the terminal window that is open in the _sample_app directory, set the Heroku remote repository using the Heroku CLI command: heroku git:remote -a [app name].
These actions accomplish the main setup for deploying to the Heroku app using git. Next, create the following files:
test_start.py
print("This is working!")
requirements.txt - will fill in shortly, just create an empty text file for later.
runtime.txt
python-3.8.6
Procfile
web: python test_start.py
.gitignore - In this case, mine was auto-generated from using Bitbucket but copying these extensions is usually good for any Python application repository.
# These are some examples of commonly ignored file patterns.
# You should customize this list as applicable to your project.
# Learn more about .gitignore:
# https://www.atlassian.com/git/tutorials/saving-changes/gitignore
# Node artifact files
node_modules/
dist/
# Compiled Java class files
*.class
# Compiled Python bytecode
*.py[cod]
# Log files
*.log
# Package files
*.jar
# Maven
target/
dist/
# JetBrains IDE
.idea/
# Unit test reports
TEST*.xml
# Generated by MacOS
.DS_Store
# Generated by Windows
Thumbs.db
If you want to keep this in your own version control system outside Heroku for later, then you can deploy to a separate repository.
Otherwise, deploy!
git push heroku main
Then, scale the web dyno specified in the Procfile, which simply commands python to run test_start.py.
heroku ps:scale web=1
After a successful deployment and scaling to a single dyno, use heroku logs. The message from the print statement in the test_start.py file should be displayed: "This is working!"
At this point, this is a bare-bones Heroku application that does little other than print a statement one time upon deployment.
Now, we're ready for wiring up to a Salesforce org. A Salesforce org is an instance of the Salesforce platform that encompasses all the data and logic of any company, group, or organization. The Salesforce platform itself is its own product and serves as the main hub by which most other Salesforce products are added on, or customized. Most of the business work done in Salesforce is done within a production org, while most development work is done in a sandbox or scratch org. For this example, you'll use a Developer Edition org, which is treated as a production org. If you're curious about the intricacies of Salesforce development environments versus their 'production' counterparts, see their documentation.
Make sure a Salesforce org is ready. Go through the signup process at https://developer.salesforce.com/signup. Fill out the form with the relevant information. After some time, you will receive an email from Salesforce with login details, notifying that the org is created as requested. Then, using the username and password credentials for the org, sign in at https://login.salesforce.com.
The username and password for logging in manually are the same credentials used to connect to the org in Python. Every connection to Salesforce happens under a Salesforce user context. As a connection is established with the API, the user context defines a user's access to specific objects, fields, and other features. This controls which data is available for reading or editing under the API connection.
In addition to the username and password, the security token is necessary. Find the security token by going to the user settings within Salesforce. These are reached from the top right icon, which produces a dropdown. Then, go to 'Settings', and use 'Reset My Security Token'.
Salesforce will send an email with the security token. Keep this in a secure place as you will need it to authenticate using Python later.
Update requirements.txt to install simple_salesforce as a part of this app's dependencies. Note that this example uses version 1.10.1 of simple_salesforce. Find more information about simple_salesforce and the source code here: https://github.com/simple-salesforce/simple-salesforce.
requirements.txt
simple_salesforce==1.10.1
If you've worked with Python a lot, you know the drill:
python -m pip install -r requirements.txt
Integrating Python with Salesforce: Considering the Connection Object
Now the fun part—let's integrate Python with our Salesforce data.
Update test_start.py to form a connection object for Salesforce, which is very similar to the example from the simple_salesforce documentation. (You can now remove the print statement as it mainly served as a placeholder.)
test_start.py
from simple_salesforce import Salesforce
# A general connection object variable can be initialized,
# which will then be populated with the Salesforce connection
# object as it is returned from simple_salesforce.
production_connection = None
# The content of the string variables below should be replaced
# with your own connection credentials. The API version, 50.0,
# happens to be the latest Salesforce API version from this writing,
# but there might be changes that happen later from future Salesforce API
# versions.
#
# The domain specifies the login domain known from:
# https://login.salesforce.com
#
# Using 'test' for the login domain would point to:
# https://test.salesforce.com
# That domain is used for sandbox environments. A free
# developer org in this example, though, is considered a
# 'production' Salesforce org and therefore would go to
# the 'login' domain, not 'test' or something else.
#
# While this example has direct statements of these string
# variables, it is ideal in a production grade solution to
# reference these variables via either an environment
# configuration or an encrypted vault. The direct statement
# of the username and password here is done for simplicity.
if production_connection is None:
    production_connection = Salesforce(username="<username>",
                                       password="<password>",
                                       security_token="<security_token>",
                                       version=50.0,  # the API version of Salesforce, for its metadata
                                       domain='login')  # login means: a production org
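As the comments above note, credentials are better kept out of source code. One possible approach, sketched here under my own assumptions, is to read them from environment variables. The variable names SF_USERNAME, SF_PASSWORD, and SF_SECURITY_TOKEN are hypothetical choices, not names that Salesforce or simple_salesforce require.

```python
import os

# Hypothetical environment variable names chosen for this sketch.
REQUIRED_VARIABLES = ("SF_USERNAME", "SF_PASSWORD", "SF_SECURITY_TOKEN")


def load_credentials(environ=None):
    """Return connection keyword arguments read from environment variables."""
    if environ is None:
        environ = os.environ
    missing = [name for name in REQUIRED_VARIABLES if not environ.get(name)]
    if missing:
        raise RuntimeError(
            "Missing environment variables: {}".format(", ".join(missing))
        )
    return {
        "username": environ["SF_USERNAME"],
        "password": environ["SF_PASSWORD"],
        "security_token": environ["SF_SECURITY_TOKEN"],
    }


# Demonstration with a plain dictionary standing in for the real environment.
example_environment = {
    "SF_USERNAME": "<username>",
    "SF_PASSWORD": "<password>",
    "SF_SECURITY_TOKEN": "<security_token>",
}
print(sorted(load_credentials(example_environment)))
```

The returned dictionary could then be unpacked into the connection call, for example Salesforce(**load_credentials(), version=50.0, domain='login'). On Heroku, the variables themselves can be set with heroku config:set.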
To retrieve Salesforce records, you will need to use the Salesforce Object Query Language (SOQL). SOQL looks deceptively similar to SQL. Beware: SOQL is not SQL! A number of features are distinctly missing, like joins, although it is possible to run aggregate queries. For simplicity's sake, this example uses syntax close to what the SQL would be, but SOQL is its own body of knowledge that you can learn more about after moving through this article. Trailhead, the official Salesforce learning and tutorial platform, contains modules on SOQL.
In general, a safe bet is to do direct SELECT style queries. Here is an example SOQL query that retrieves a number of standard fields from the Salesforce Account object:
SELECT Id, Name, CreatedDate
FROM Account
Again, this looks very similar to SQL but it is technically running under an engine that is not a traditional SQL engine at all. SOQL, then, is designed to work in the shared, multi-tenant environment of Salesforce, where multiple org instances may exist on a single server. Therefore, it has very tight constraints around the number of rows returned at any time, the amount of time any given query is allowed to run, and other limitations.
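Given those constraints, it usually helps to narrow a query with a filter and a row limit. The helper functions below are hypothetical and written only for this sketch; they build the query string and escape quotes and backslashes in the filter value, on the assumption that the value may come from user input.

```python
def quote_soql_string(value):
    """Wrap a value in single quotes, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return "'" + escaped + "'"


def build_account_query(name, limit=10):
    """Build a SOQL query for Accounts with a given name, capped at `limit` rows."""
    return (
        "SELECT Id, Name, CreatedDate FROM Account "
        "WHERE Name = {} LIMIT {}"
    ).format(quote_soql_string(name), int(limit))


print(build_account_query("Test Account 1"))
print(build_account_query("O'Brien Holdings", limit=5))
```

The resulting string can be passed to the connection object's query method in the same way as the unfiltered query.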
Add some code to the test_start.py program that inserts an Account record.
# Records can be defined through Python dictionary objects,
# with field values specified as key-value pairs. The field name
# points to its value. In this case, the Name field on the Account
# can be any string value.
# The CreatedDate and Id are assigned by the Salesforce database
# upon record insert, so do not assign those any values.
account = {"Name":"Test Account 1"}
account_insert_result = production_connection.Account.create(account)
print("account_insert_result: {}".format(account_insert_result))
When the program runs to insert the Account, the Account can be retrieved with its automatically assigned database ID and created date by running the SOQL query from above.
# With the record insert done, the record can be queried using
# SOQL, with this query as an example.
account_soql = "SELECT Id, Name, CreatedDate FROM Account"
account_query_results = production_connection.query(account_soql).get('records')
print("account_query_results: {}".format(len(account_query_results)))
The .get('records') call is necessary because the payload returned from the query contains more than just the records; it is a data structure that includes some meta information regarding the query result. The results are returned in an OrderedDict data structure (https://docs.python.org/3.8/library/collections.html#collections.OrderedDict).
After putting all these code blocks together and executing test_start.py, the print statements included in these blocks will print out the insert result and the number of records returned by the query. If following along on your own machine, make sure to compare with the full example code.
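To show what that meta information looks like, here is a sketch that works on a made-up payload shaped like a REST query response. The IDs, names, and dates are invented for illustration, and the helper function is hypothetical.

```python
from collections import OrderedDict

# A made-up payload in the shape of a query response: metadata at the top
# level, and an "attributes" entry on every record.
sample_result = OrderedDict([
    ("totalSize", 2),
    ("done", True),
    ("records", [
        OrderedDict([
            ("attributes", OrderedDict([("type", "Account")])),
            ("Id", "001xx0000000001"),
            ("Name", "Test Account 1"),
            ("CreatedDate", "2020-11-20T15:21:09.000+0000"),
        ]),
        OrderedDict([
            ("attributes", OrderedDict([("type", "Account")])),
            ("Id", "001xx0000000002"),
            ("Name", "Test Account 2"),
            ("CreatedDate", "2020-11-21T09:02:44.000+0000"),
        ]),
    ]),
])


def plain_records(query_result):
    """Return the records as plain dictionaries without the metadata entry."""
    return [
        {key: value for key, value in record.items() if key != "attributes"}
        for record in query_result.get("records", [])
    ]


rows = plain_records(sample_result)
print(sample_result["totalSize"], sample_result["done"])
for row in rows:
    print(row["Id"], row["Name"], row["CreatedDate"])
```

When "done" is False, more rows are waiting on the server; simple_salesforce offers query_all for retrieving every page in one call.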
From here, all the building blocks needed are essentially in the Python syntax. For the different kinds of database operations, refer to the simple_salesforce documentation. The same basic methodology can be used with the REST API for updates, deletes, and upserts (which use a key to update where a record exists or insert when a record key does not exist).
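As a rough sketch of those other operations, the functions below call update, upsert, and delete on a connection's Account attribute, following the call style in the simple_salesforce documentation. So that the sketch runs without an org, it uses a recording stand-in of my own in place of a real connection. The external ID field name External_Key__c is hypothetical.

```python
class RecordingObject:
    """Stand-in for an object handle; records calls instead of sending them."""

    def __init__(self):
        self.calls = []

    def update(self, record_id, data):
        self.calls.append(("update", record_id, data))

    def upsert(self, record_id, data):
        self.calls.append(("upsert", record_id, data))

    def delete(self, record_id):
        self.calls.append(("delete", record_id))


class RecordingConnection:
    """Stand-in for the connection object used in test_start.py."""

    def __init__(self):
        self.Account = RecordingObject()


def rename_account(connection, record_id, new_name):
    # Update only the fields that changed, addressed by the record's Id.
    return connection.Account.update(record_id, {"Name": new_name})


def upsert_account_by_key(connection, key_field, key_value, fields):
    # An upsert is addressed as "<external id field>/<value>".
    return connection.Account.upsert(
        "{}/{}".format(key_field, key_value), fields
    )


def remove_account(connection, record_id):
    return connection.Account.delete(record_id)


connection = RecordingConnection()
rename_account(connection, "001xx0000000001", "Renamed Account")
upsert_account_by_key(connection, "External_Key__c", "A-100", {"Name": "Keyed Account"})
remove_account(connection, "001xx0000000001")
for call in connection.Account.calls:
    print(call)
```

Swapping the stand-in for the real connection object would send the same three operations to Salesforce, so try it against a developer org rather than real data.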
Remember to always consider Salesforce API limits, also available from their governor limits documentation. In addition, a number of database transaction related limits exist.
Fortunately, Salesforce has summarized their limits in documentation here:
https://developer.salesforce.com/docs/atlas.en-us.apexcode.meta/apexcode/apex_gov_limits.htm#!
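One way to keep API limits in view is to check how much of the daily request allowance remains before starting a large run. The sketch below works on a made-up dictionary modeled on the REST API's limits resource. I believe simple_salesforce exposes that resource through a limits() method on the connection object, but treat that as an assumption to verify. The numbers and the 10% threshold are invented for illustration.

```python
def remaining_fraction(limits, name="DailyApiRequests"):
    """Return the share of a limit that is still available (0.0 to 1.0)."""
    entry = limits[name]
    return entry["Remaining"] / entry["Max"]


def should_pause(limits, threshold=0.10, name="DailyApiRequests"):
    """Return True when less than `threshold` of the limit remains."""
    return remaining_fraction(limits, name) < threshold


# Made-up numbers in the assumed shape of a limits response.
sample_limits = {"DailyApiRequests": {"Max": 15000, "Remaining": 14250}}
nearly_spent = {"DailyApiRequests": {"Max": 15000, "Remaining": 300}}

print(remaining_fraction(sample_limits))
print(should_pause(sample_limits), should_pause(nearly_spent))
```

A scheduled job could call a check like this first and skip or shrink its work when the allowance is nearly used up.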
For the full example code, take a look at the git repository here:
https://github.com/elegacorp/your-first-python-to-salesforce-app
This example does not consider a number of other important facets of a scalable application design. Storing the credentials within the source code is not a good idea from a security standpoint, but it is done here for simplicity's sake. In all likelihood, this program would need to run on a repeated cycle to handle events, check criteria, poll for different conditions, or something similar. The style of this code is also procedural and not at all object-oriented. How to fill in these missing pieces for real-world situations is left as a design choice; there are various ways to restructure the program for real-world use cases with these building blocks.
Thanks to Python's robust community ecosystem of packages and general support, Salesforce is easy enough to work with from Python, which makes Python a viable integration solution.
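As one possible starting point for the "repeated cycle" piece, here is a minimal polling loop. All names are hypothetical. The fetch and handle callables stand in for a SOQL query and whatever processing the records need, and the sleep function can be swapped out so the loop can be tried without waiting.

```python
import time


def run_polling_loop(fetch, handle, interval_seconds=60, max_cycles=None,
                     sleep=time.sleep):
    """Call fetch() on a cycle and pass each returned item to handle()."""
    cycles = 0
    while max_cycles is None or cycles < max_cycles:
        for item in fetch():
            handle(item)
        cycles += 1
        # Wait between cycles, but not after the final one.
        if max_cycles is None or cycles < max_cycles:
            sleep(interval_seconds)
    return cycles


# Dry run: three cycles of made-up results, with pauses recorded, not slept.
seen = []
pauses = []
fake_results = iter([["a"], [], ["b", "c"]])
completed = run_polling_loop(
    fetch=lambda: next(fake_results),
    handle=seen.append,
    interval_seconds=5,
    max_cycles=3,
    sleep=pauses.append,
)
print(completed, seen, pauses)
```

Each poll costs at least one API request, so the interval should be chosen with the daily API limits in mind.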