By Jose R. Zapata
Last updated: 14/Nov/2023
PLOTLY: Interactive Visualization Library
Plotly is an open-source interactive graphing library that supports more than 40 unique chart types covering a wide range of statistical, financial, geographic, scientific, and three-dimensional use cases.
Besides being interactive and showing the values at each point of the chart on hover, it can combine numerical and categorical data in the same figure.
Installing Plotly
`pip install plotly`
Importing Plotly Express
Plotly Express is a module that provides a quick, concise way to use Plotly's interactive visualizations.
Note: In the examples below, the data is always in a DataFrame.
import plotly.express as px
Built-in Datasets in Plotly
Plotly ships with some classic built-in datasets for testing:
- carshare
- election
- gapminder
- iris
- tips
- wind
Other classic demonstration datasets in .csv format can also be found at: https://github.com/mwaskom/seaborn-data
tips = px.data.tips() # Load the tips dataset
type(tips)
pandas.core.frame.DataFrame
print(px.data.tips.__doc__)
Each row represents a restaurant bill.
https://vincentarelbundock.github.io/Rdatasets/doc/reshape2/tips.html
Returns:
A `pandas.DataFrame` with 244 rows and the following columns: `['total_bill', 'tip', 'sex', 'smoker', 'day', 'time', 'size']`.
tips.head() # show the first 5 records
 | total_bill | tip | sex | smoker | day | time | size |
---|---|---|---|---|---|---|---|
0 | 16.99 | 1.01 | Female | No | Sun | Dinner | 2 |
1 | 10.34 | 1.66 | Male | No | Sun | Dinner | 3 |
2 | 21.01 | 3.50 | Male | No | Sun | Dinner | 3 |
3 | 23.68 | 3.31 | Male | No | Sun | Dinner | 2 |
4 | 24.59 | 3.61 | Female | No | Sun | Dinner | 4 |
tips.dtypes # data types of the DataFrame columns
total_bill float64
tip float64
sex object
smoker object
day object
time object
size int64
dtype: object
tips.describe() # statistical summary of each numeric column in the DataFrame
 | total_bill | tip | size |
---|---|---|---|
count | 244.000000 | 244.000000 | 244.000000 |
mean | 19.785943 | 2.998279 | 2.569672 |
std | 8.902412 | 1.383638 | 0.951100 |
min | 3.070000 | 1.000000 | 1.000000 |
25% | 13.347500 | 2.000000 | 2.000000 |
50% | 17.795000 | 2.900000 | 2.000000 |
75% | 24.127500 | 3.562500 | 3.000000 |
max | 50.810000 | 10.000000 | 6.000000 |
Chart Types with Plotly
Line
px.line(tips, y='total_bill', title='Total Bill Amount')
Bar
px.bar(tips, x="sex", y="total_bill")
px.bar(tips, x="sex", y="total_bill", color='sex')
Histogram
px.histogram(tips, 'total_bill', title='Histogram of Total Bill Amount')
px.histogram(tips, 'sex', title='Histogram by Sex')
px.histogram(tips, 'day',
             category_orders={'day': ["Thur", "Fri", "Sat", "Sun"]},
             title='Histogram by Day')
Boxplot
px.box(tips, y='total_bill', title='Boxplot of Total Bill Amount')
px.box(tips, x='day', y='total_bill', color='day',
       title='Boxplots of Total Bill Amount by Day')
px.box(tips, x='day', y='total_bill', title='Boxplot by day with days in order',
       category_orders={'day': ["Thur", "Fri", "Sat", "Sun"]})
px.box(tips, x='day', y='total_bill', color='smoker', category_orders={'day': ["Thur", "Fri", "Sat", "Sun"]})
px.box(tips, x='day', y='total_bill', color='smoker',
       boxmode='overlay',
       title='Boxplots of total bill by day, smoker vs. non-smoker, overlaid',
       category_orders={'day': ["Thur", "Fri", "Sat", "Sun"]})
Violin Plot
px.violin(tips, y='total_bill', title='Violin Plot of Total Bill Amount')
px.violin(tips, x='day', y='total_bill', title='Violin Plot of Total Bill Amount by Day')
px.violin(tips, x='day', y='total_bill', color='day',
          title='Violin Plot of Total Bill Amount by Day')
px.violin(tips, x='day', y='total_bill', color='sex',
          title='Violin Plot of Total Bill Amount by Day')
px.violin(tips, x='day', y='total_bill', color='sex', violinmode='overlay',
          title='Violin Plot of Total Bill Amount by Day, Men and Women')
StripPlot
px.strip(tips, x="day", y="total_bill")
px.strip(tips, x="total_bill", y="time",
         orientation="h", color="smoker")
px.strip(tips, x="day", y="total_bill",
         color="sex", stripmode='overlay')
Scatterplot
gapminder = px.data.gapminder()
gapminder2007 = gapminder.query("year==2007")
px.scatter(gapminder2007, x="gdpPercap", y="lifeExp")
px.scatter(gapminder2007, x="gdpPercap", y="lifeExp", color="continent")
px.scatter(gapminder2007, x="gdpPercap", y="lifeExp", size="pop", color="continent", size_max=60)
px.scatter(gapminder2007, x="gdpPercap", y="lifeExp", size="pop", color="continent",
           hover_name="country", log_x=True, size_max=60)
Linear Regression
# trendline='ols' requires the statsmodels package to be installed
px.scatter(tips, x='total_bill', y='tip', trendline='ols')
Matrix Plot
px.scatter_matrix(tips)
px.scatter_matrix(tips, dimensions=['total_bill','tip','size'])
px.scatter_matrix(tips, dimensions=['total_bill','tip','size'], color='sex')
HeatMap
tips.head()
 | total_bill | tip | sex | smoker | day | time | size |
---|---|---|---|---|---|---|---|
0 | 16.99 | 1.01 | Female | No | Sun | Dinner | 2 |
1 | 10.34 | 1.66 | Male | No | Sun | Dinner | 3 |
2 | 21.01 | 3.50 | Male | No | Sun | Dinner | 3 |
3 | 23.68 | 3.31 | Male | No | Sun | Dinner | 2 |
4 | 24.59 | 3.61 | Female | No | Sun | Dinner | 4 |
# Correlation matrix of the numeric columns
tips_corr = tips.corr(numeric_only=True)
tips_corr
 | total_bill | tip | size |
---|---|---|---|
total_bill | 1.000000 | 0.675734 | 0.598315 |
tip | 0.675734 | 1.000000 | 0.489299 |
size | 0.598315 | 0.489299 | 1.000000 |
px.imshow(tips_corr, text_auto=True,
          color_continuous_scale='Viridis')
Animations with Plotly
px.scatter(gapminder, x="gdpPercap", y="lifeExp",
           animation_frame="year", animation_group="country",
           size="pop", color="continent", hover_name="country",
           log_x=True, size_max=45, range_x=[100, 100000], range_y=[25, 90])
Splitting into Columns and Rows by Category (Facets)
px.scatter(gapminder2007, x="gdpPercap", y="lifeExp", size="pop",
           color="continent",
           hover_name="country",
           size_max=60, facet_col='continent',
           log_x=True)
px.scatter(gapminder, x="gdpPercap", y="lifeExp",
           animation_frame="year", animation_group="country",
           size="pop", color="continent", hover_name="country",
           facet_col="continent",
           log_x=True, size_max=45, range_x=[100, 100000], range_y=[25, 90])
px.histogram(tips,'total_bill', facet_col="time", facet_row="smoker")
px.scatter(tips, x="total_bill", y="tip",
           facet_row="smoker", facet_col="time", color="sex")
px.scatter(tips, x="total_bill", y="tip", facet_row="time", facet_col="day", color="smoker",
           category_orders={"day": ["Thur", "Fri", "Sat", "Sun"], "time": ["Lunch", "Dinner"]})
Marginal Plots
px.scatter(tips, x='total_bill', y='tip',
           marginal_x='histogram',
           marginal_y='histogram')
px.scatter(tips, x='total_bill', y='tip',
           marginal_x='violin',
           marginal_y='box')
px.scatter(tips, x='total_bill', y='tip',
           marginal_x='violin',
           marginal_y='box',
           color='sex')