Tabular data science is the most widely used domain of machine learning, with problems ranging from customer segmentation to churn prediction. Throughout the various stages of a tabular data science workflow, communicating your work to stakeholders or clients can be cumbersome, which keeps data scientists from focusing on what matters, such as data analysis and model building. Data scientists can end up spending hours building a dashboard that takes in a dataframe and returns plots, a prediction, or a plot of clusters in a dataset. In this guide, we'll go through how to use gradio to improve your data science workflows. We will also talk about how to use gradio and skops to build interfaces with only one line of code!
Make sure you have the gradio Python package already installed.
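As a quick sanity check before running the examples, the sketch below tests whether a package can be found on the import path. The helper name is_installed is hypothetical and only serves this illustration; it uses nothing beyond the Python standard library.

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if a top-level package can be found on the import path."""
    return importlib.util.find_spec(package_name) is not None


# Print a hint instead of failing later with an ImportError
if not is_installed("gradio"):
    print("gradio is missing; install it with: pip install gradio")
```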
We will take a look at how we can create a simple UI that predicts failures based on product information.
import gradio as gr
import pandas as pd
import joblib
import datasets

# Input: an editable dataframe that starts at 2 rows x 4 columns and can grow
inputs = [gr.Dataframe(row_count=(2, "dynamic"), col_count=(4, "dynamic"), label="Input Data", interactive=True)]

# Output: a single fixed "Failures" column with one row per input sample
outputs = [gr.Dataframe(row_count=(2, "dynamic"), col_count=(1, "fixed"), label="Predictions", headers=["Failures"])]

# Load the trained model from disk
model = joblib.load("model.pkl")

# we will give our dataframe as example
df = datasets.load_dataset("merve/supersoaker-failures")
df = df["train"].to_pandas()

def infer(input_dataframe):
    # Wrap the model's predictions in a dataframe so the output component can display them
    return pd.DataFrame(model.predict(input_dataframe))

gr.Interface(fn=infer, inputs=inputs, outputs=outputs, examples=[[df.head(2)]]).launch()
Let's break down the code above.
fn: The inference function that takes the input dataframe and returns predictions.
inputs: The component we take our input with. We define our input as a dataframe with 2 rows and 4 columns, which initially looks like an empty dataframe of that shape. When row_count is set to dynamic, the data you pass in does not have to match the pre-defined shape of the component.
outputs: The dataframe component that stores the outputs. This UI can take a single sample or multiple samples to infer, and returns 0 or 1 for each sample in one column, so we set row_count to 2 and col_count to 1 above.
headers is a list of header names for the dataframe.
examples: You can pass the input either by dragging and dropping a CSV file or by supplying a pandas DataFrame through examples, in which case the interface picks up the headers automatically.
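To make the shape of the inference function concrete, here is a minimal sketch that runs without a trained model or a UI. ThresholdModel and its arbitrary rule are hypothetical stand-ins for whatever model.pkl contains, the sample values are made up, and the column is named inside the function here only so the result can be read on its own.

```python
import pandas as pd


class ThresholdModel:
    """Hypothetical stand-in for the object loaded from model.pkl."""

    def predict(self, frame):
        # Arbitrary rule for illustration: flag a row as a failure (1)
        # when the value in its first column exceeds 100
        return (frame.iloc[:, 0] > 100).astype(int).to_numpy()


model = ThresholdModel()


def infer(input_dataframe: pd.DataFrame) -> pd.DataFrame:
    # One prediction per input row, returned as a single "Failures" column
    return pd.DataFrame(model.predict(input_dataframe), columns=["Failures"])


# A single sample and a batch of samples (made-up values)
single = pd.DataFrame({"loading": [80.0], "measurement_13": [15.2]})
batch = pd.DataFrame({"loading": [80.0, 150.0, 120.0], "measurement_13": [15.2, 16.1, 14.8]})

print(infer(single).shape)
print(infer(batch)["Failures"].tolist())
```

Whatever the number of input rows, the result keeps exactly one column, which is why the output component fixes col_count at 1 while leaving row_count dynamic.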
We will now create an example for a minimal data visualization dashboard. You can find a more comprehensive version in the related Spaces.
import gradio as gr
import pandas as pd
import datasets
import seaborn as sns
import matplotlib.pyplot as plt

df = datasets.load_dataset("merve/supersoaker-failures")
df = df["train"].to_pandas()
df.dropna(axis=0, inplace=True)

def plot(df):
    # Start a fresh figure for each plot so they do not draw over one another
    plt.figure()
    plt.scatter(df.measurement_13, df.measurement_15, c=df.loading, alpha=0.5)
    plt.savefig("scatter.png")

    plt.figure()
    df["failure"].value_counts().plot(kind="bar")
    plt.savefig("bar.png")

    plt.figure()
    sns.heatmap(df.select_dtypes(include="number").corr())
    plt.savefig("corr.png")

    plt.close("all")  # release the figures once they are saved
    plots = ["corr.png", "scatter.png", "bar.png"]
    return plots

inputs = [gr.Dataframe(label="Supersoaker Production Data")]
# .style(grid=...) matches the Gradio release this guide was written for;
# newer releases take a columns argument on gr.Gallery instead
outputs = [gr.Gallery(label="Profiling Dashboard").style(grid=(1, 3))]

gr.Interface(plot, inputs=inputs, outputs=outputs, examples=[df.head(100)], title="Supersoaker Failures Analysis Dashboard").launch()
We will use the same dataset we used to train our model, but we will make a dashboard to visualize it this time.
fn: The function that will create plots based on data.
inputs: We use the same Dataframe component we used above.
outputs: The Gallery component is used to display our visualizations.
examples: We will have the dataset itself as the example.
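The heatmap and bar chart in the dashboard are drawn from two small summaries of the data. The sketch below computes those summaries on a made-up frame so you can see what the plots are based on. The rows and the product_code column are hypothetical; only loading, measurement_13 and failure are taken from the dashboard code.

```python
import pandas as pd

# Made-up rows for illustration only
sample = pd.DataFrame({
    "product_code": ["A", "A", "B", "B"],
    "loading": [80.0, 100.0, 120.0, 140.0],
    "measurement_13": [15.0, 16.0, 17.0, 18.0],
    "failure": [0, 0, 1, 1],
})

# The heatmap shows pairwise correlations between numeric columns only,
# so the text column is dropped before calling corr()
numeric = sample.select_dtypes(include="number")
corr = numeric.corr()

# The bar chart shows how many rows fall into each failure class
failure_counts = sample["failure"].value_counts()

print(corr.round(2))
print(failure_counts.to_dict())
```

Filtering to numeric columns first keeps corr() from tripping over text columns such as the hypothetical product_code.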
skops is a library built on top of sklearn. With the recent gradio integration of skops, you can build tabular data interfaces with one line of code!
import gradio as gr

# title and description are optional
title = "Supersoaker Defective Product Prediction"
description = "This model predicts Supersoaker production line failures. Drag and drop any slice from dataset or edit values as you wish in below dataframe component."

gr.Interface.load("huggingface/scikit-learn/tabular-playground", title=title, description=description).launch()
sklearn models pushed to the Hugging Face Hub using skops include a config.json file that contains an example input with column names and the task being solved (either tabular-classification or tabular-regression). From the task type, gradio constructs the Interface and consumes the column names and the example input to build it. You can refer to the skops documentation on hosting models on the Hub to learn how to push your models to the Hub using