By Jose R. Zapata
Last updated: 14/Nov/2023
PLOTLY: Interactive Visualization Library
Plotly is an open-source interactive graphing library that supports more than 40 unique chart types covering a wide range of statistical, financial, geographic, scientific, and three-dimensional use cases.
Besides being interactive and showing the values at each point of the chart, it can combine numeric and categorical data in the same figure.
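Installing Plotly
`pip install plotly`
Importing Plotly Express
Plotly Express is a module that provides a quick, concise way to build Plotly's interactive visualizations.
Note: the examples below always pass the data as a pandas DataFrame, which is the most convenient input for Plotly Express (it also accepts dictionaries and array-like objects).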
import plotly.express as px
Datasets built into Plotly
Plotly ships with several classic datasets for testing:
- carshare
- election
- gapminder
- iris
- tips
- wind
Other classic demonstration datasets in .csv format can be found at: https://github.com/mwaskom/seaborn-data
tips = px.data.tips()  # Load the tips dataset
type(tips)
pandas.core.frame.DataFrame
print(px.data.tips.__doc__)
    Each row represents a restaurant bill.
    https://vincentarelbundock.github.io/Rdatasets/doc/reshape2/tips.html
    Returns:
        A `pandas.DataFrame` with 244 rows and the following columns: `['total_bill', 'tip', 'sex', 'smoker', 'day', 'time', 'size']`.
tips.head()  # show the first 5 rows
| | total_bill | tip | sex | smoker | day | time | size |
|---|---|---|---|---|---|---|---|
| 0 | 16.99 | 1.01 | Female | No | Sun | Dinner | 2 | 
| 1 | 10.34 | 1.66 | Male | No | Sun | Dinner | 3 | 
| 2 | 21.01 | 3.50 | Male | No | Sun | Dinner | 3 | 
| 3 | 23.68 | 3.31 | Male | No | Sun | Dinner | 2 | 
| 4 | 24.59 | 3.61 | Female | No | Sun | Dinner | 4 | 
tips.dtypes  # data types of the DataFrame columns
total_bill    float64
tip           float64
sex            object
smoker         object
day            object
time           object
size            int64
dtype: object
tips.describe()  # per-column statistical summary of the numeric columns
| | total_bill | tip | size |
|---|---|---|---|
| count | 244.000000 | 244.000000 | 244.000000 | 
| mean | 19.785943 | 2.998279 | 2.569672 | 
| std | 8.902412 | 1.383638 | 0.951100 | 
| min | 3.070000 | 1.000000 | 1.000000 | 
| 25% | 13.347500 | 2.000000 | 2.000000 | 
| 50% | 17.795000 | 2.900000 | 2.000000 | 
| 75% | 24.127500 | 3.562500 | 3.000000 | 
| max | 50.810000 | 10.000000 | 6.000000 | 
Chart Types with Plotly
Line charts
px.line(tips, y='total_bill', title='Total Bill Amount')
Bar charts
px.bar(tips, x="sex", y="total_bill")
px.bar(tips, x="sex", y="total_bill", color='sex')
Histogram
px.histogram(tips, 'total_bill', title='Histogram of Total Bill Amount')
px.histogram(tips, 'sex', title='Histogram of Sex')
px.histogram(tips, 'day',
             category_orders={'day': ["Thur", "Fri", "Sat", "Sun"]},
             title='Histogram of Days')
Boxplot
px.box(tips, y='total_bill', title='Boxplot of Total Bill Amount')
px.box(tips, x='day', y='total_bill', color='day',
       title='Boxplots of Total Bill Amount by Day')
px.box(tips, x='day', y='total_bill', title='Boxplot by Day with Days in Order',
       category_orders={'day': ["Thur", "Fri", "Sat", "Sun"]})
px.box(tips, x='day', y='total_bill', color='smoker', category_orders={'day': ["Thur", "Fri", "Sat", "Sun"]})
px.box(tips, x='day', y='total_bill', color='smoker',
       boxmode='overlay',
       title='Overlaid Boxplots of Total Bill by Day and Smoker Status',
       category_orders={'day': ["Thur", "Fri", "Sat", "Sun"]})
Violin Plot
px.violin(tips, y='total_bill', title='Violin Plot of Total Bill Amount')
px.violin(tips, x='day', y='total_bill', title='Violin Plot of Total Bill Amount by Day')
px.violin(tips, x='day', y='total_bill', color='day',
          title='Violin Plot of Total Bill Amount by Day')
px.violin(tips, x='day', y='total_bill', color='sex',
          title='Violin Plot of Total Bill Amount by Day')
px.violin(tips, x='day', y='total_bill', color='sex', violinmode='overlay',
          title='Violin Plot of Total Bill Amount by Day, Men and Women')
StripPlot
px.strip(tips, x="day", y="total_bill")
px.strip(tips, x="total_bill", y="time",
         orientation="h", color="smoker")
px.strip(tips, x="day", y="total_bill",
         color="sex", stripmode='overlay')
Scatterplot
gapminder = px.data.gapminder()
gapminder2007 = gapminder.query("year==2007")
px.scatter(gapminder2007, x="gdpPercap", y="lifeExp")
px.scatter(gapminder2007, x="gdpPercap", y="lifeExp", color="continent")
px.scatter(gapminder2007, x="gdpPercap", y="lifeExp", size="pop", color="continent", size_max=60)
px.scatter(gapminder2007, x="gdpPercap", y="lifeExp", size="pop", color="continent",
           hover_name="country", log_x=True, size_max=60)
Linear Regression
# trendline='ols' requires the statsmodels package to be installed
px.scatter(tips, x='total_bill', y='tip', trendline='ols')
Matrix Plot
px.scatter_matrix(tips)
px.scatter_matrix(tips, dimensions=['total_bill','tip','size'])
px.scatter_matrix(tips, dimensions=['total_bill','tip','size'], color='sex')
HeatMap
tips.head()
| | total_bill | tip | sex | smoker | day | time | size |
|---|---|---|---|---|---|---|---|
| 0 | 16.99 | 1.01 | Female | No | Sun | Dinner | 2 | 
| 1 | 10.34 | 1.66 | Male | No | Sun | Dinner | 3 | 
| 2 | 21.01 | 3.50 | Male | No | Sun | Dinner | 3 | 
| 3 | 23.68 | 3.31 | Male | No | Sun | Dinner | 2 | 
| 4 | 24.59 | 3.61 | Female | No | Sun | Dinner | 4 | 
# Correlation matrix of the numeric columns
tips_corr = tips.corr(numeric_only=True)
tips_corr
| | total_bill | tip | size |
|---|---|---|---|
| total_bill | 1.000000 | 0.675734 | 0.598315 | 
| tip | 0.675734 | 1.000000 | 0.489299 | 
| size | 0.598315 | 0.489299 | 1.000000 | 
px.imshow(tips_corr, text_auto=True,
          color_continuous_scale='Viridis')
Animations with Plotly
px.scatter(gapminder, x="gdpPercap", y="lifeExp",
           animation_frame="year", animation_group="country",
           size="pop", color="continent", hover_name="country",
           log_x=True, size_max=45, range_x=[100,100000], range_y=[25,90])
Splitting into Columns and Rows by Category (Facets)
px.scatter(gapminder2007, x="gdpPercap", y="lifeExp", size="pop",
           color="continent",
           hover_name="country",
           size_max=60, facet_col='continent',
           log_x=True)
px.scatter(gapminder, x="gdpPercap", y="lifeExp",
           animation_frame="year", animation_group="country",
           size="pop", color="continent", hover_name="country",
           facet_col="continent",
           log_x=True, size_max=45, range_x=[100, 100000], range_y=[25, 90])
px.histogram(tips,'total_bill', facet_col="time", facet_row="smoker")
px.scatter(tips, x="total_bill", y="tip",
           facet_row="smoker", facet_col="time", color="sex")
px.scatter(tips, x="total_bill", y="tip", facet_row="time", facet_col="day", color="smoker",
          category_orders={"day": ["Thur", "Fri", "Sat", "Sun"], "time": ["Lunch", "Dinner"]})
Marginal Plots
px.scatter(tips,x='total_bill',y='tip',
          marginal_x='histogram',
          marginal_y='histogram')
px.scatter(tips,x='total_bill',y='tip',
          marginal_x='violin',
          marginal_y ='box')
px.scatter(tips,x='total_bill',y='tip',
          marginal_x='violin',
          marginal_y ='box',
          color='sex')
References
- https://matplotlib.org/stable/gallery/index.html - A large gallery showing many kinds of matplotlib charts. Highly recommended!
- http://www.loria.fr/~rougier/teaching/matplotlib - A good matplotlib tutorial.
- https://medium.com/plotly/introducing-plotly-express-808df010143d
- https://plotly.com/python/plotly-express/
- http://seaborn.pydata.org/ - Seaborn documentation, another statistical plotting library
- https://matplotlib.org/stable/api/markers_api.html - Documentation for markers
Phd. Jose R. Zapata