The ability to quickly develop and deploy interactive applications is invaluable. Streamlit is a powerful tool that enables data scientists and developers to create intuitive web apps with minimal code. Coupled with the Python Data Science Notebook Docker Image, which comes pre-loaded with essential data science libraries, setting up a robust environment for building Streamlit apps has never been easier.
Streamlit is an open-source Python library that simplifies the process of creating interactive web applications for data science and machine learning projects. With Streamlit, you can transform your data scripts into shareable web apps in just a few minutes, all using pure Python. There's no need for front-end development skills or knowledge of web frameworks like Flask or Django.
Key features of Streamlit include:
Streamlit offers several advantages that make it an ideal choice for developing data applications:
By using Streamlit, data scientists can focus on data analysis and model building while providing stakeholders with interactive tools to visualize and understand the results.
The Python Data Science Notebook Docker Image is a Docker container designed to streamline your data science workflow. Built from the minimal python:3.9-slim
base image, it includes a comprehensive suite of pre-installed libraries that cater to various aspects of data science, including data manipulation, machine learning, visualization, and database connectivity.
pandas
, numpy
, polars
, dask
, ibis
, pyiceberg
, datafusion
, sqlframe
scikit-learn
, tensorflow
, torch
, xgboost
, lightgbm
matplotlib
, seaborn
, plotly
psycopg2-binary
, mysqlclient
, sqlalchemy
, duckdb
, pyarrow
boto3
, s3fs
, minio
openpyxl
, requests
, beautifulsoup4
, lxml
, pyspark
, dremio-simple-query
pydata
with the home directory set to /home/pydata
. The working directory is /home/pydata/work
.8888
to allow access to the Jupyter Notebook server.By utilizing this Docker image, you can save time on setup and focus on developing your Streamlit applications, knowing that you have all the necessary tools and libraries at your disposal.
To get started with building Streamlit applications using the Python Data Science Notebook Docker Image, you'll need to set up your environment. This involves installing Docker, pulling the Docker image, running the container, and verifying that Streamlit is installed and functioning correctly.
If Docker is not already installed on your machine, follow these steps:
Download Docker Desktop:
Install Docker:
Verify the Installation:
Open a terminal or command prompt and run:
docker --version
You should see the Docker version information displayed, confirming that Docker is installed.
alexmerced/datanotebook
Docker ImageThe alexmerced/datanotebook
Docker image includes a comprehensive Python environment with pre-installed data science libraries.
In your terminal, execute:
docker pull alexmerced/datanotebook
This command downloads the image from Docker Hub to your local machine.
List all Docker images on your system:
docker images
You should see alexmerced/datanotebook listed among the images.
Now, run a Docker container from the image and access the Jupyter Notebook server.
cd /path/to/your/project
Execute the following command:
docker run -p 8888:8888 -p 8501:8501 -v $(pwd):/home/pydata/work alexmerced/datanotebook
-p 8888:8888
maps the container's port 8888 to your local machine, allowing access to Jupyter Notebook. -p 8501:8501
maps the container's port 8501 to your local machine, allowing access to Streamlit apps.v $(pwd):/home/pydata/work
mounts your current directory into the container, enabling file sharing between your host and the container.Open your web browser and navigate to http://localhost:8888
.
You should see the Jupyter Notebook interface without needing a password or token.
Ensure that Streamlit is installed and functioning properly inside the Docker container.
streamlit --version
If Streamlit is installed, the version number will be displayed.
If not installed, install it using:
pip install streamlit
Create a Test Streamlit App:
In the Jupyter interface, click on New and select Text File.
Save the file as app.py in your working directory.
Add the following code to app.py:
import streamlit as st
st.title("Streamlit Test App")
st.write("Congratulations! Streamlit is working inside the Docker container.")
Save the file.
In the Jupyter terminal, execute:
streamlit run app.py --server.enableCORS false --server.enableXsrfProtection false --server.port 8501 --server.address 0.0.0.0
--server.enableCORS false:
Disables Cross-Origin Resource Sharing protection.--server.enableXsrfProtection false:
Disables Cross-Site Request Forgery protection.--server.port 8501:
Runs the app on port 8501.--server.address 0.0.0.0:
Makes the server accessible externally.Open a new tab in your web browser and navigate to http://localhost:8501.
You should see the Streamlit app displaying the title and message.
To keep the Streamlit app running without occupying the terminal, you can run it in the background using nohup:
nohup streamlit run app.py --server.enableCORS false --server.enableXsrfProtection false --server.port 8501 --server.address 0.0.0.0 &
In the Terminal Running the Container:
Press Ctrl + C
to stop the container.
Alternatively, Use Docker Commands:
docker ps
Stop the container using its Container ID:
docker stop <container_id>
You've successfully set up your environment:
With this setup, you're ready to develop and run Streamlit applications in a consistent and isolated environment, leveraging the powerful tools provided by the Docker image.
With your environment set up, it's time to dive into Streamlit and start building interactive applications. This section will guide you through creating your first Streamlit app, understanding the basic structure of a Streamlit script, and running Streamlit apps from within the Jupyter Notebook provided by the Docker container.
Let's begin by creating a simple Streamlit application that displays text and a chart.
Create a New Python Script:
New
and select Text File
.app.py
in your working directory (/home/pydata/work
).Write the Streamlit Code:
Open app.py
and add the following code:
import streamlit as st
import pandas as pd
import numpy as np
st.title("My First Streamlit App")
st.write("Welcome to my first Streamlit application!")
# Create a random dataframe
df = pd.DataFrame(
np.random.randn(20, 3),
columns=['Column A', 'Column B', 'Column C']
)
st.write("Here is a random dataframe:")
st.dataframe(df)
st.write("Line chart of the data:")
st.line_chart(df)
A Streamlit script is a standard Python script with the streamlit library functions to create interactive elements.
import streamlit as st
Set the Title and Headers:
st.title("App Title")
st.header("This is a header")
st.subheader("This is a subheader")
Write Text:
st.text("This is a simple text.")
st.markdown("This is a text with **markdown** formatting.")
Display Data:
st.dataframe(df) # Displays an interactive table
st.table(df) # Displays a static table
st.line_chart(data)
st.bar_chart(data)
st.area_chart(data)
name = st.text_input("Enter your name:")
st.write(f"Hello, {name}!")
age = st.slider("Select your age:", 0, 100)
st.write(f"You are {age} years old.")
with st.sidebar:
st.write("This is the sidebar.")
col1, col2 = st.columns(2)
col1.write("Content in column 1")
col2.write("Content in column 2")
To run your Streamlit app within the Docker container and access it from your host machine:
Open a Terminal in Jupyter Notebook:
In the Jupyter interface, click on New and select Terminal.
Navigate to the Working Directory:
cd /home/pydata/work
Run the Streamlit App:
Execute the following command:
streamlit run app.py --server.enableCORS false --server.enableXsrfProtection false --server.port 8501 --server.address 0.0.0.0
Explanation of Flags:
--server.enableCORS false:
Disables Cross-Origin Resource Sharing protection.--server.enableXsrfProtection false:
Disables Cross-Site Request Forgery protection.--server.port 8501:
Sets the port to 8501.--server.address 0.0.0.0:
Makes the app accessible externally.http://localhost:8501
.You should see your Streamlit app running. Interact with the App:
app.py
to add more features or interactive elements.Expose the Correct Port:
When running the Docker container, ensure you expose the port used by Streamlit. If you use port 8501, run the container with:
docker run -p 8888:8888 -p 8501:8501 -v $(pwd):/home/pydata/work alexmerced/datanotebook
Use different ports for each app and expose them accordingly.
To run the Streamlit app without tying up the terminal, use:
nohup streamlit run app.py --server.enableCORS false --server.enableXsrfProtection false --server.port 8501 --server.address 0.0.0.0 &
This runs the app in the background and outputs logs to nohup.out.
In this section, you:
With these foundational skills, you're ready to explore more advanced features of Streamlit to build sophisticated data applications.
Data visualization is a crucial aspect of data analysis and communication. Streamlit simplifies the process of creating interactive and dynamic visualizations that can help users explore and understand data more effectively. In this section, we'll explore how to use Streamlit's built-in functions and integrate popular visualization libraries to build interactive data visualizations.
Streamlit provides easy-to-use functions for creating basic charts directly from data structures like Pandas DataFrames and NumPy arrays.
import streamlit as st
import pandas as pd
import numpy as np
# Generate random data
data = np.random.randn(100, 3)
columns = ['Feature A', 'Feature B', 'Feature C']
df = pd.DataFrame(data, columns=columns)
# Display line chart
st.line_chart(df)
Explanation: The st.line_chart() function takes a DataFrame or array-like object and renders an interactive line chart.
# Display bar chart
st.bar_chart(df)
Explanation: st.bar_chart() displays a bar chart. It's useful for categorical data or comparing different groups.
# Display area chart
st.area_chart(df)
Explanation: st.area_chart() creates an area chart, which is similar to a line chart but with the area below the line filled.
For more advanced visualizations, Streamlit supports libraries like Altair, which provides a declarative statistical visualization library for Python.
Creating an Altair Chart
import altair as alt
# Create an Altair chart
chart = alt.Chart(df.reset_index()).mark_circle(size=60).encode(
x='index',
y='Feature A',
color='Feature B',
tooltip=['Feature A', 'Feature B', 'Feature C']
).interactive()
st.altair_chart(chart, use_container_width=True)
Explanation: This code creates an interactive scatter plot using Altair, where you can hover over points to see tooltips.
Streamlit allows you to add widgets that enable users to interact with your visualizations.
# Slider to select number of data points
num_points = st.slider('Select number of data points', min_value=10, max_value=100, value=50)
# Generate data based on slider
data = np.random.randn(num_points, 3)
df = pd.DataFrame(data, columns=columns)
# Display updated chart
st.line_chart(df)
Explanation: The slider widget allows users to select the number of data points, and the chart updates accordingly.
# Selectbox to choose a feature
feature = st.selectbox('Select a feature to display', columns)
# Display the selected feature
st.line_chart(df[feature])
Explanation: The selectbox lets users choose which feature to visualize.
Plotly is another powerful library for creating interactive graphs.
import plotly.express as px
# Create a Plotly figure
fig = px.scatter(df, x='Feature A', y='Feature B', size='Feature C', color='Feature C', hover_name='Feature C')
# Display the Plotly figure in Streamlit
st.plotly_chart(fig, use_container_width=True)
Explanation: This code creates an interactive scatter plot with Plotly, which includes zooming, panning, and tooltips.
You can combine multiple widgets and charts to create a rich interactive experience.
# Multiselect to choose features
selected_features = st.multiselect('Select features to visualize', columns, default=columns)
# Checkbox to toggle data normalization
normalize = st.checkbox('Normalize data')
# Process data based on user input
if normalize:
df_normalized = (df - df.mean()) / df.std()
data_to_plot = df_normalized[selected_features]
else:
data_to_plot = df[selected_features]
# Display line chart of selected features
st.line_chart(data_to_plot)
Explanation: Users can select which features to visualize and whether to normalize the data, and the chart updates accordingly.
# Create columns
col1, col2 = st.columns(2)
with col1:
st.header('User Inputs')
# Add widgets here
num_points = st.slider('Number of points', 10, 100, 50)
feature = st.selectbox('Feature', columns)
with col2:
st.header('Visualization')
# Generate and display chart
data = np.random.randn(num_points, len(columns))
df = pd.DataFrame(data, columns=columns)
st.line_chart(df[feature])
Explanation: This layout separates user inputs and visualizations into two columns, making the app more organized.
In this section, you've learned how to:
By leveraging these features, you can create powerful interactive applications that make data exploration and analysis more accessible to your audience.
As you become more familiar with Streamlit, you'll discover a wealth of advanced features that allow you to build more sophisticated and powerful applications. In this section, we'll delve into some of these capabilities, including state management, dynamic content creation, file handling, and performance optimization through caching.
st.session_state
Streamlit runs your script from top to bottom every time a user interacts with a widget. To maintain state across these reruns, you can use st.session_state
, which is a dictionary-like object that persists throughout the user's session.
import streamlit as st
# Initialize counter in session state
if 'counter' not in st.session_state:
st.session_state.counter = 0
# Increment counter on button click
if st.button('Increment'):
st.session_state.counter += 1
st.write(f"Counter value: {st.session_state.counter}")
Explanation: The counter value is stored in st.session_state.counter, ensuring it persists across interactions.
Streamlit provides layout elements to organize content and improve user experience.
import streamlit as st
st.write("This is visible content")
with st.expander("Click to expand"):
st.write("This content is hidden by default")
Explanation: st.expander creates a collapsible section that users can expand or collapse.
import streamlit as st
tab1, tab2 = st.tabs(["Tab 1", "Tab 2"])
with tab1:
st.write("Content in Tab 1")
with tab2:
st.write("Content in Tab 2")
Explanation: st.tabs allows you to organize content into tabs for better navigation.
Allow users to upload files directly into your app for processing.
import streamlit as st
import pandas as pd
uploaded_file = st.file_uploader("Choose a CSV file", type="csv")
if uploaded_file is not None:
df = pd.read_csv(uploaded_file)
st.write("Uploaded Data:")
st.dataframe(df)
Explanation: Users can upload a CSV file, which the app reads and displays as a DataFrame.
Heavy computations or data loading can slow down your app. Use caching to store results and avoid redundant processing.
import streamlit as st
import pandas as pd
@st.cache_data
def load_data(url):
return pd.read_csv(url)
data_url = 'https://path-to-large-dataset.csv'
df = load_data(data_url)
st.write("Data loaded successfully")
st.dataframe(df.head())
Explanation: The @st.cache_data decorator caches the load_data function's output, improving performance on subsequent runs.
Enhance user experience by customizing your app's layout and appearance.
Setting Page Configuration
import streamlit as st
st.set_page_config(
page_title="Advanced Streamlit Features",
page_icon="🚀",
layout="wide",
initial_sidebar_state="expanded",
)
Explanation: st.set_page_config sets global configurations like the page title, icon, layout, and sidebar state.
import streamlit as st
col1, col2 = st.columns(2)
with col1:
st.header("Column 1")
st.write("Content for the first column")
with col2:
st.header("Column 2")
st.write("Content for the second column")
Explanation: Columns help organize content side by side.
Apply custom themes to match your app's branding or preferred aesthetics.
Create a .streamlit/config.toml file in your app directory with the following content:
[theme]
primaryColor="#d33682"
backgroundColor="#002b36"
secondaryBackgroundColor="#586e75"
textColor="#ffffff"
font="sans serif"
Explanation: The theme settings adjust the app's color scheme and font.
Streamlit offers a variety of widgets for complex user interactions.
import streamlit as st
date = st.date_input("Select a date")
time = st.time_input("Select a time")
st.write(f"You selected {date} at {time}")
Explanation: Allows users to input dates and times.
import streamlit as st
color = st.color_picker('Pick A Color', '#00f900')
st.write('The current color is', color)
Explanation: Users can select a color, which can be used in visualizations or styling. Advanced Callbacks and Event Handling Respond to user interactions with callbacks.
import streamlit as st
def on_button_click():
st.write("Button was clicked!")
st.button("Click Me", on_click=on_button_click)
Explanation: The on_click parameter specifies a function to execute when the button is clicked.
Fetch and display data from external sources.
import streamlit as st
import requests
st.write("Fetch data from an API")
response = requests.get('https://api.example.com/data')
if response.status_code == 200:
data = response.json()
st.write(data)
else:
st.error("Failed to fetch data")
Explanation: Uses the requests library to fetch data from an API and display it.
Streamlit supports bi-directional communication for real-time updates.
import streamlit as st
params = st.experimental_get_query_params()
st.write("Query parameters:", params)
Explanation: Access query parameters from the URL to control app behavior dynamically.
Break down your app into reusable components.
# components.py
import streamlit as st
def display_header():
st.title("Advanced Streamlit Features")
st.write("This is a custom component")
# main app
import streamlit as st
from components import display_header
display_header()
st.write("Main app content")
Explanation: Organize code by splitting it into modules for better maintainability.
Make your app accessible to a global audience.
import streamlit as st
st.write("Hello, World!")
# Use gettext or other localization libraries for translations
Explanation: While Streamlit doesn't provide built-in localization, you can use Python's localization libraries.
Ensure your app is usable by people with disabilities.
st.image('image.png', caption='Descriptive text')
In this section, we've explored several advanced features of Streamlit that empower you to build more interactive and efficient applications:
By mastering these advanced features, you can create sophisticated Streamlit applications that provide a seamless and engaging user experience.
Streamlit excels at making machine learning models accessible through interactive web applications. In this section, we'll explore how to integrate machine learning models into your Streamlit apps using the pre-installed libraries in the Python Data Science Notebook Docker Image, such as TensorFlow, PyTorch, and scikit-learn.
The Docker image comes with TensorFlow and PyTorch installed, allowing you to work with complex neural network models.
import streamlit as st
import tensorflow as tf
# Load a pre-trained model, e.g., MobileNetV2
model = tf.keras.applications.MobileNetV2(weights='imagenet')
st.write("Model loaded successfully.")
from tensorflow.keras.applications.mobilenet_v2 import preprocess_input, decode_predictions
from PIL import Image
import numpy as np
uploaded_file = st.file_uploader("Upload an image", type=["jpg", "jpeg", "png"])
if uploaded_file is not None:
# Load and preprocess the image
image = Image.open(uploaded_file)
st.image(image, caption='Uploaded Image', use_column_width=True)
img = image.resize((224, 224))
img_array = np.array(img)
img_array = preprocess_input(img_array)
img_array = np.expand_dims(img_array, axis=0)
# Make prediction
predictions = model.predict(img_array)
results = decode_predictions(predictions, top=3)[0]
# Display predictions
st.write("Top Predictions:")
for i, res in enumerate(results):
st.write(f"{i+1}. {res[1]}: {round(res[2]*100, 2)}%")
Explanation: Users can upload an image, and the app displays the top predictions from the pre-trained MobileNetV2 model.
import streamlit as st
import torch
from torchvision import models, transforms
# Load a pre-trained ResNet model
model = models.resnet18(pretrained=True)
model.eval()
st.write("PyTorch model loaded successfully.")
from PIL import Image
import torchvision.transforms as T
uploaded_file = st.file_uploader("Upload an image", type=["jpg", "jpeg", "png"])
if uploaded_file is not None:
# Load and preprocess the image
image = Image.open(uploaded_file)
st.image(image, caption='Uploaded Image', use_column_width=True)
preprocess = T.Compose([
T.Resize(256),
T.CenterCrop(224),
T.ToTensor(),
T.Normalize(
mean=[0.485, 0.456, 0.406],
std=[0.229, 0.224, 0.225]
)
])
img_t = preprocess(image)
batch_t = torch.unsqueeze(img_t, 0)
# Make prediction
with torch.no_grad():
out = model(batch_t)
probabilities = torch.nn.functional.softmax(out[0], dim=0)
# Load labels
with open("imagenet_classes.txt") as f:
labels = [line.strip() for line in f.readlines()]
# Show top 3 predictions
top3_prob, top3_catid = torch.topk(probabilities, 3)
st.write("Top Predictions:")
for i in range(top3_prob.size(0)):
st.write(f"{i+1}. {labels[top3_catid[i]]}: {round(top3_prob[i].item()*100, 2)}%")
Note: Ensure that the imagenet_classes.txt file is available in your working directory.
Let's build a simple regression app using scikit-learn.
import streamlit as st
from sklearn.datasets import load_boston
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
# Load dataset
data = load_boston()
X = data.data
y = data.target
# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
# Train model
model = RandomForestRegressor()
model.fit(X_train, y_train)
st.write("Model trained successfully.")
import numpy as np
st.header("Boston Housing Price Prediction")
# Feature input sliders
CRIM = st.number_input('Per capita crime rate by town', min_value=0.0, value=0.1)
ZN = st.number_input('Proportion of residential land zoned for lots over 25,000 sq.ft.', min_value=0.0, value=0.0)
# ... add inputs for other features
# For brevity, we'll use default values for the rest of the features
input_features = np.array([[CRIM, ZN] + [0]*(X.shape[1]-2)])
# Predict
prediction = model.predict(input_features)
st.write(f"Predicted median value of owner-occupied homes: ${prediction[0]*1000:.2f}")
Explanation: Users can input values for features, and the app predicts housing prices.
Visualizations help in understanding model performance.
from sklearn.metrics import mean_squared_error, r2_score
# Predict on test set
y_pred = model.predict(X_test)
# Calculate metrics
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
# Display metrics
st.write("Model Performance on Test Set:")
st.write(f"Mean Squared Error: {mse:.2f}")
st.write(f"R² Score: {r2:.2f}")
import pandas as pd
df = pd.DataFrame({'Actual': y_test, 'Predicted': y_pred})
st.write("Actual vs. Predicted Values")
st.line_chart(df)
Explanation: The line chart shows how closely the model's predictions match the actual values.
Model Serialization: For complex models, consider saving and loading models using joblib or pickle to avoid retraining every time.
import joblib
# Save model
joblib.dump(model, 'model.joblib')
# Load model
model = joblib.load('model.joblib')
Use Caching for Models: Cache the model loading or training functions to improve performance.
python
@st.cache_resource
def load_model():
# Load or train model
return model
Handle Large Models: Be mindful of resource limitations. Use efficient data structures and consider offloading heavy computations.
Provide Clear Instructions: Guide users on how to interact with the app, especially when expecting specific input formats.
In this section, you've learned how to:
By integrating machine learning models into your Streamlit applications, you can create powerful tools that make complex models accessible to end-users in an interactive and user-friendly manner.
In many data science projects, interacting with databases is essential for retrieving, processing, and storing data. Streamlit, combined with the powerful libraries included in the Python Data Science Notebook Docker Image, makes it straightforward to connect to various databases and integrate them into your applications. In this section, we'll explore how to connect to databases using sqlalchemy
, psycopg2
, and specifically how to interact with Dremio using the dremio-simple-query
library.
dremio-simple-query
Dremio is a data lakehouse platform that enables you to govern, join, and accelerate queries across various data sources such as Iceberg, Delta Lake, S3, JSON, CSV, RDBMS, and more. The dremio-simple-query
library simplifies querying a Dremio source using Apache Arrow Flight, providing performant data retrieval for analytics.
dremio-simple-query
LibraryFirst, ensure that the dremio-simple-query
library is installed in your environment:
pip install dremio-simple-query
To connect to Dremio, you'll need to obtain your Dremio Arrow Flight endpoint and an authentication token.
grpc+tls://data.dremio.cloud:443
grpc+tls://data.eu.dremio.cloud:443
grpc+tls://<ip-address>:32010
grpc://<ip-address>:32010
You can use the get_token
function from the dremio-simple-query
library to retrieve the token programmatically.
import streamlit as st
from dremio_simple_query.connect import get_token, DremioConnection
from os import getenv
from dotenv import load_dotenv
# Load environment variables from a .env file (optional)
load_dotenv()
# Retrieve Dremio credentials and endpoints
username = st.secrets["dremio_username"]
password = st.secrets["dremio_password"]
arrow_endpoint = st.secrets["dremio_arrow_endpoint"] # e.g., "grpc+tls://data.dremio.cloud:443"
login_endpoint = st.secrets["dremio_login_endpoint"] # e.g., "https://your-dremio-server:9047/apiv2/login"
# Get authentication token
payload = {
"userName": username,
"password": password
}
token = get_token(uri=login_endpoint, payload=payload)
# Establish connection to Dremio
dremio = DremioConnection(token, arrow_endpoint)
# Test the connection
try:
st.success("Successfully connected to Dremio.")
except Exception as e:
st.error(f"Failed to connect to Dremio: {e}")
Note: Ensure that you securely manage your credentials using Streamlit's secrets management or environment variables.
You can now query data from Dremio and retrieve it in various formats.
# Query data and get a FlightStreamReader object
stream = dremio.toArrow("SELECT * FROM your_space.your_table LIMIT 100")
# Convert the stream to an Arrow Table
arrow_table = stream.read_all()
# Optionally, display the data in Streamlit
df = arrow_table.to_pandas()
st.write("Data from Dremio:")
st.dataframe(df)
# Directly get a Pandas DataFrame
df = dremio.toPandas("SELECT * FROM your_space.your_table LIMIT 100")
st.write("Data from Dremio:")
st.dataframe(df)
# Get a Polars DataFrame
df_polars = dremio.toPolars("SELECT * FROM your_space.your_table LIMIT 100")
st.write("Data from Dremio (Polars DataFrame):")
st.write(df_polars)
You can leverage DuckDB for in-memory analytics on the data retrieved from Dremio.
# Retrieve data as a DuckDB relation
duck_rel = dremio.toDuckDB("SELECT * FROM your_space.your_table LIMIT 100")
# Perform queries on the DuckDB relation
result = duck_rel.filter("column_name > 50").df()
# Display the result
st.write("Filtered Data using DuckDB:")
st.dataframe(result)
Alternatively, you can query Arrow Tables using DuckDB:
import duckdb
# Get data from Dremio as an Arrow Table
stream = dremio.toArrow("SELECT * FROM your_space.your_table LIMIT 100")
arrow_table = stream.read_all()
# Create a DuckDB connection
con = duckdb.connect()
# Register the Arrow Table with DuckDB
con.register("dremio_table", arrow_table)
# Perform SQL queries using DuckDB
query = "SELECT * FROM dremio_table WHERE column_name > 50"
result = con.execute(query).fetch_df()
# Display the result
st.write("Filtered Data using DuckDB on Arrow Table:")
st.dataframe(result)
In addition to Dremio, you might need to connect to other databases like PostgreSQL or MySQL. The Docker image comes with sqlalchemy, psycopg2-binary, and other database drivers pre-installed.
from sqlalchemy import create_engine
import pandas as pd
# Database connection parameters
DB_USER = st.secrets["db_user"]
DB_PASSWORD = st.secrets["db_password"]
DB_HOST = st.secrets["db_host"]
DB_PORT = st.secrets["db_port"]
DB_NAME = st.secrets["db_name"]
# Create a database engine
engine = create_engine(f'postgresql+psycopg2://{DB_USER}:{DB_PASSWORD}@{DB_HOST}:{DB_PORT}/{DB_NAME}')
# Test the connection
try:
with engine.connect() as connection:
st.success("Successfully connected to the PostgreSQL database.")
except Exception as e:
st.error(f"Failed to connect to the database: {e}")
# Sample query
query = "SELECT * FROM your_table LIMIT 10"
# Execute the query and load data into a DataFrame
df = pd.read_sql(query, engine)
# Display the data
st.write("Data from PostgreSQL:")
st.dataframe(df)
When dealing with large datasets, performance can become an issue. Dask is a parallel computing library that integrates with Pandas to handle larger-than-memory datasets efficiently.
Copy
import dask.dataframe as dd
# Read data from SQL using Dask
df = dd.read_sql_table(
table='large_table',
uri=f'postgresql+psycopg2://{DB_USER}:{DB_PASSWORD}@{DB_HOST}:{DB_PORT}/{DB_NAME}',
index_col='id'
)
# Perform computations with Dask DataFrame
filtered_df = df[df['value'] > 100]
# Compute the result and convert to Pandas DataFrame
result = filtered_df.compute()
# Display the result
st.write("Filtered Data:")
st.dataframe(result)
In this section, you've learned how to:
By integrating Dremio and other data systems into your Streamlit applications, you can create powerful data-driven apps that interact with live data sources, enabling real-time analysis and insights.
With your Streamlit app developed and tested within the Docker environment, the next step is to deploy it so that others can access and use it. Deploying Streamlit apps can be done in several ways, including running the app locally, containerizing it with Docker, and deploying it to cloud platforms like Streamlit Community Cloud, Heroku, AWS, or other hosting services.
In this section, we'll explore how to:
While developing within Jupyter Notebook is convenient, deploying your app typically involves running it as a standalone script.
Ensure Streamlit is Installed
If you followed the previous sections, Streamlit should already be installed in your Docker container. If not, install it using:
pip install streamlit
cd /home/pydata/work
Execute the following command:
streamlit run app.py
This command starts the Streamlit server and serves your app at http://localhost:8501 by default.
Open your web browser and navigate to http://localhost:8501 to interact with your app.
Containerizing your app ensures consistency across different environments and simplifies deployment.
# Use the official Python image as base
FROM python:3.9-slim
# Set the working directory
WORKDIR /app
# Copy the requirements file
COPY requirements.txt .
# Install dependencies
RUN pip install --no-cache-dir -r requirements.txt
# Copy the rest of the application code
COPY . .
# Expose the port Streamlit uses
EXPOSE 8501
# Run the Streamlit app
CMD ["streamlit", "run", "app.py", "--server.port=8501", "--server.address=0.0.0.0"]
List all your Python dependencies in a file named requirements.txt:
streamlit
pandas
numpy
# Add any other dependencies your app requires
In your terminal, run:
docker build -t my-streamlit-app .
This builds the Docker image and tags it as my-streamlit-app.
docker run -p 8501:8501 my-streamlit-app
Maps port 8501 in the container to port 8501 on your host machine.
Open your web browser and navigate to http://localhost:8501.
If you plan to deploy your app using Docker images, you may need to push it to a Docker registry like Docker Hub or a private registry.
# Tag the image for Docker Hub
docker tag my-streamlit-app your-dockerhub-username/my-streamlit-app
# Log in to Docker Hub
docker login
# Push the image
docker push your-dockerhub-username/my-streamlit-app
There are several cloud platforms that support deploying Streamlit apps. Below, we'll cover deploying to Streamlit Community Cloud, Heroku, and AWS Elastic Beanstalk.
Streamlit offers a free hosting service for public GitHub repositories.
Once deployed, you'll receive a URL where your app is hosted.
Heroku is a cloud platform that supports deploying applications using Docker.
In your app directory, create a file named Procfile with the following content:
web: streamlit run app.py --server.port=$PORT --server.address=0.0.0.0
If you haven't already:
git init
git add .
git commit -m "Initial commit"
Install the Heroku CLI and log in:
heroku login
Create a new app:
heroku create your-app-name
Deploy Your App
Push your code to Heroku:
git push heroku master
Scale the Web Process
heroku ps:scale web=1
Access Your App
heroku open
AWS Elastic Beanstalk supports deploying applications in Docker containers.
eb init -p docker my-streamlit-app
eb create my-streamlit-env
eb deploy
eb open
You can deploy your Streamlit app using other platforms like:
Build and Push the Docker Image to Google Container Registry
# Build the Docker image
docker build -t gcr.io/your-project-id/my-streamlit-app .
# Push the image
docker push gcr.io/your-project-id/my-streamlit-app
gcloud run deploy my-streamlit-app \
--image gcr.io/your-project-id/my-streamlit-app \
--platform managed \
--region us-central1 \
--allow-unauthenticated
Use environment variables to store sensitive information:
import os
API_KEY = os.getenv("API_KEY")
Set the environment variable in your deployment platform's settings or configuration.
Use Python's built-in logging library:
import logging
logging.basicConfig(level=logging.INFO)
logging.info("This is an info message")
Validate and sanitize all user inputs to prevent security vulnerabilities like injection attacks.
In this section, you've learned how to:
By deploying your Streamlit app, you make it accessible to a wider audience, allowing others to benefit from your interactive data applications.
Developing Streamlit applications involves not just coding but also adhering to best practices that ensure your app is efficient, maintainable, and user-friendly. In this section, we'll cover some essential tips and best practices to help you optimize your Streamlit apps.
A well-organized codebase enhances readability and maintainability, especially as your application grows in complexity.
components
Module: Encapsulate reusable UI components in a separate module to avoid code duplication. # components.py
import streamlit as st
def sidebar_filters():
st.sidebar.header("Filters")
# Add filter widgets
# main app.py
import streamlit as st
from components import sidebar_filters
sidebar_filters()
# Rest of your app code
__pycache__/
.DS_Store
venv/
.env
A polished UI enhances the user experience and makes your app more engaging.
.streamlit/config.toml
file to define your theme settings.[theme]
primaryColor="#6eb52f"
backgroundColor="#f0f0f5"
secondaryBackgroundColor="#e0e0ef"
textColor="#262730"
font="sans serif"
Use Columns and Containers: Organize content using st.columns()
, st.container()
, and st.expander()
for a clean layout.
col1, col2 = st.columns(2)
with col1:
st.header("Section 1")
# Content for section 1
with col2:
st.header("Section 2")
# Content for section 2
Feedback: Use st.progress()
, st.spinner()
, and st.toast()
to provide feedback during long computations.
with st.spinner('Loading data...'):
df = load_data()
st.success('Data loaded successfully!')
Tooltips and Help Text: Add tooltips or help text to widgets to guide users.
st.text_input("Username", help="Enter your user ID assigned by the administrator")
Alt Text for Images: Use the caption parameter in st.image()
to provide descriptions.
st.image('chart.png', caption='Sales over time')
Keyboard Navigation: Ensure that all interactive elements can be navigated using the keyboard.
Being able to identify and fix issues quickly is crucial for smooth app development.
if 'counter' not in st.session_state:
st.session_state.counter = 0
increment = st.button('Increment')
if increment:
st.session_state.counter += 1
st.write(f"Counter: {st.session_state.counter}")
Dependency Mismatches: Ensure that all dependencies are listed in requirements.txt and versions are compatible.
Environment Variables: Check that all required environment variables are set in the deployment environment.
API Changes: If you encounter deprecated functions, update your code to match the latest Streamlit API.
Version Pinning: Specify the Streamlit version in your requirements.txt to maintain consistency.
streamlit==1.25.0
import logging
logging.basicConfig(level=logging.INFO)
logging.info("This is an info message")
Use st.error()
to display error messages to the user.
try:
result = perform_calculation()
st.write(result)
except Exception as e:
st.error(f"An error occurred: {e}")
Unit Tests: Write unit tests for your functions using unittest or pytest. Test Scripts: Create test scripts to simulate user interactions and verify app behavior.
Optimizing your app's performance ensures a better user experience.
Lazy Loading: Load data only when necessary, perhaps in response to user input. Data Sampling: For large datasets, consider using a sample for initial display and provide options to load more data.
@st.cache_data
def load_data():
# Load data from source
return data
@st.cache_data
def compute_expensive_operation(params):
# Perform computation
return result
Avoid Redundant Computations: Structure code to prevent unnecessary re-execution of functions.
Clear Session State When Needed: Manage st.session_state to free up memory if variables are no longer needed.
Ensure your app is secure, especially when handling sensitive data.
Input Validation: Always validate and sanitize user inputs.
Secrets Management: Use Streamlit's secrets management to handle API keys and passwords.
import os
API_KEY = st.secrets["api_key"]
HTTPS: Deploy your app using HTTPS to encrypt data in transit.
Provide documentation to help users understand and navigate your app.
Inline Documentation: Use st.markdown() or st.write() to include instructions and explanations within the app.
User Manuals: Provide a downloadable or linked user guide for complex applications.
Tooltips: Utilize the help parameter in widgets to give users quick hints.
Streamlit is actively developed, and staying updated can help you leverage new features.
Changelog: Regularly check the Streamlit changelog for updates.
Community Forums: Participate in the Streamlit community forums to learn from others and share your experiences.
Update Dependencies: Periodically update your dependencies to benefit from performance improvements and security patches.
By following these best practices and tips, you can:
Implementing these strategies will help you develop professional, robust, and efficient Streamlit applications that meet the needs of your users and stakeholders.
In this comprehensive guide, we've embarked on a journey to master Streamlit using the Python Data Science Notebook Docker Image. Throughout the chapters, we've explored how to set up a robust environment, harness the power of Streamlit for building interactive data applications, and leverage advanced features to enhance functionality and user experience.
st.session_state
, dynamic content creation with layout elements, and performance optimization through caching mechanisms.While we've covered a significant amount of ground, there's always more to learn and explore:
Streamlit has revolutionized the way we create and share data applications, making it accessible for data scientists and developers to build interactive web apps with ease. By combining Streamlit with the Python Data Science Notebook Docker Image, we've established a powerful workflow that simplifies environment setup and accelerates application development.
As you continue your journey, remember that the key to mastery is consistent practice and exploration. Don't hesitate to experiment with new ideas, seek feedback, and iterate on your applications. The world of data science is ever-evolving, and tools like Streamlit are at the forefront of making data more accessible and engaging for everyone.