Interactive Visualization with Plotly

By Jose R. Zapata

Last updated: 14/Nov/2023

PLOTLY: Interactive Visualization Library

Plotly is an open-source interactive charting library that supports more than 40 unique chart types, covering a wide range of statistical, financial, geographic, scientific, and three-dimensional use cases.

Besides being interactive and showing the values at each point of the chart, its charts can mix numerical and categorical data.

Installing Plotly

`pip install plotly`

Importing Plotly Express

Plotly Express is a module that provides a quick, concise way to create Plotly's interactive visualizations.

Note: Every example below passes its data as a pandas DataFrame, the input format Plotly Express handles most directly.

import plotly.express as px

Datasets Built into Plotly

Plotly ships with some classic datasets for testing:

  • carshare
  • election
  • gapminder
  • iris
  • tips
  • wind

Other classic demo datasets in .csv format are available at: https://github.com/mwaskom/seaborn-data

tips = px.data.tips()  # Load the tips dataset
type(tips)
pandas.core.frame.DataFrame
print(px.data.tips.__doc__)
    Each row represents a restaurant bill.

    https://vincentarelbundock.github.io/Rdatasets/doc/reshape2/tips.html

    Returns:
        A `pandas.DataFrame` with 244 rows and the following columns: `['total_bill', 'tip', 'sex', 'smoker', 'day', 'time', 'size']`.
tips.head()  # Show the first 5 records
   total_bill   tip     sex smoker  day    time  size
0       16.99  1.01  Female     No  Sun  Dinner     2
1       10.34  1.66    Male     No  Sun  Dinner     3
2       21.01  3.50    Male     No  Sun  Dinner     3
3       23.68  3.31    Male     No  Sun  Dinner     2
4       24.59  3.61  Female     No  Sun  Dinner     4
tips.dtypes  # Data types in the DataFrame
total_bill    float64
tip           float64
sex            object
smoker         object
day            object
time           object
size            int64
dtype: object
tips.describe()  # Statistical summary of the DataFrame, per column
       total_bill         tip        size
count  244.000000  244.000000  244.000000
mean    19.785943    2.998279    2.569672
std      8.902412    1.383638    0.951100
min      3.070000    1.000000    1.000000
25%     13.347500    2.000000    2.000000
50%     17.795000    2.900000    2.000000
75%     24.127500    3.562500    3.000000
max     50.810000   10.000000    6.000000

Chart Types with Plotly

Line Charts

px.line(tips, y='total_bill', title='Total Bill Amount')
[Figure: line chart "Total Bill Amount"; x axis: index; y axis: total_bill]

Bar Charts

px.bar(tips, x="sex", y="total_bill")
px.bar(tips, x="sex", y="total_bill", color='sex')
[Figure: bar chart of total_bill by sex (Female, Male), colored by sex; x axis: sex; y axis: total_bill]

Histogram

px.histogram(tips, x='total_bill', title='Histogram of Total Bill Amount')
[Figure: histogram "Histogram of Total Bill Amount"; x axis: total_bill; y axis: count]
px.histogram(tips, x='sex', title='Histogram by Sex')
[Figure: histogram "Histogram by Sex"; x axis: sex (Female, Male); y axis: count]
px.histogram(tips, x='day',
             category_orders={'day': ["Thur", "Fri", "Sat", "Sun"]},
             title='Histogram by Day')

Boxplot

px.box(tips, y='total_bill', title='Boxplot of Total Bill Amount')
px.box(tips, x='day', y='total_bill', color='day',
       title='Boxplots of Total Bill Amount by Day')
[Figure: boxplots "Boxplots of Total Bill Amount by Day"; x axis: day (Sun, Sat, Thur, Fri); y axis: total_bill; legend: day]
px.box(tips, x='day', y='total_bill', title='Boxplot by Day with Days in Order',
       category_orders={'day': ["Thur", "Fri", "Sat", "Sun"]})
[Figure: boxplots "Boxplot by Day with Days in Order"; x axis: day (Thur, Fri, Sat, Sun); y axis: total_bill]
px.box(tips, x='day', y='total_bill', color='smoker',
       category_orders={'day': ["Thur", "Fri", "Sat", "Sun"]})
px.box(tips, x='day', y='total_bill', color='smoker',
       boxmode='overlay',
       title='Boxplots of Total Bill by Day and Smoker Status, Overlaid',
       category_orders={'day': ["Thur", "Fri", "Sat", "Sun"]})
[Figure: overlaid boxplots "Boxplots of Total Bill by Day and Smoker Status, Overlaid"; x axis: day (Thur, Fri, Sat, Sun); y axis: total_bill; legend: smoker (No, Yes)]

Violin Plot

px.violin(tips, y='total_bill', title='Violin Plot of Total Bill Amount')
px.violin(tips, x='day', y='total_bill', title='Violin Plot of Total Bill Amount by Day')
[Figure: violin plots "Violin Plot of Total Bill Amount by Day"; x axis: day (Sun, Sat, Thur, Fri); y axis: total_bill]
px.violin(tips, x='day', y='total_bill', color='day',
          title='Violin Plot of Total Bill Amount by Day')
[Figure: violin plots "Violin Plot of Total Bill Amount by Day"; x axis: day (Sun, Sat, Thur, Fri); y axis: total_bill; legend: day]
px.violin(tips, x='day', y='total_bill', color='sex',
          title='Violin Plot of Total Bill Amount by Day')
[Figure: violin plots "Violin Plot of Total Bill Amount by Day"; x axis: day (Sun, Sat, Thur, Fri); y axis: total_bill; legend: sex (Female, Male)]
px.violin(tips, x='day', y='total_bill', color='sex', violinmode='overlay',
          title='Violin Plot of Total Bill Amount by Day, Male and Female')
[Figure: overlaid violin plots "Violin Plot of Total Bill Amount by Day, Male and Female"; x axis: day (Sun, Sat, Thur, Fri); y axis: total_bill; legend: sex (Female, Male)]

StripPlot

px.strip(tips, x="day", y="total_bill")
px.strip(tips, x="total_bill", y="time",
         orientation="h", color="smoker")
px.strip(tips, x="day", y="total_bill",
         color="sex", stripmode='overlay')

Scatterplot

gapminder = px.data.gapminder()
gapminder2007 = gapminder.query("year==2007")
px.scatter(gapminder2007, x="gdpPercap", y="lifeExp")
px.scatter(gapminder2007, x="gdpPercap", y="lifeExp", color="continent")
[Figure: scatter plot; x axis: gdpPercap; y axis: lifeExp; legend: continent (Asia, Europe, Africa, Americas, Oceania)]
px.scatter(gapminder2007, x="gdpPercap", y="lifeExp", size="pop", color="continent", size_max=60)
[Figure: scatter plot with marker size given by pop; x axis: gdpPercap; y axis: lifeExp; legend: continent (Asia, Europe, Africa, Americas, Oceania)]
px.scatter(gapminder2007, x="gdpPercap", y="lifeExp", size="pop", color="continent",
           hover_name="country", log_x=True, size_max=60)
[Figure: scatter plot with marker size given by pop; x axis: gdpPercap (log scale); y axis: lifeExp; legend: continent (Asia, Europe, Africa, Americas, Oceania)]

Linear Regression

px.scatter(tips,x='total_bill',y='tip',trendline='ols')

Matrix Plot

px.scatter_matrix(tips)
[Figure: scatter matrix over total_bill, tip, sex, smoker, day, time, size]
px.scatter_matrix(tips, dimensions=['total_bill','tip','size'])
[Figure: scatter matrix over total_bill, tip, size]
px.scatter_matrix(tips, dimensions=['total_bill','tip','size'], color='sex')

HeatMap

tips.head()
   total_bill   tip     sex smoker  day    time  size
0       16.99  1.01  Female     No  Sun  Dinner     2
1       10.34  1.66    Male     No  Sun  Dinner     3
2       21.01  3.50    Male     No  Sun  Dinner     3
3       23.68  3.31    Male     No  Sun  Dinner     2
4       24.59  3.61  Female     No  Sun  Dinner     4
# Correlation matrix of the numeric columns
tips_corr = tips.corr(numeric_only=True)
tips_corr
            total_bill       tip      size
total_bill    1.000000  0.675734  0.598315
tip           0.675734  1.000000  0.489299
size          0.598315  0.489299  1.000000
px.imshow(tips_corr, text_auto=True,
          color_continuous_scale='Viridis')
[Figure: heatmap of the correlation matrix with the value shown in each cell; both axes: total_bill, tip, size]

Animations with Plotly

px.scatter(gapminder, x="gdpPercap", y="lifeExp",
           animation_frame="year", animation_group="country",
           size="pop", color="continent", hover_name="country",
           log_x=True, size_max=45, range_x=[100,100000], range_y=[25,90])

Splitting into Columns and Rows by Category (Facet)

px.scatter(gapminder2007, x="gdpPercap", y="lifeExp", size="pop",
           color="continent",
           hover_name="country",
           size_max=60, facet_col='continent',
           log_x=True)
[Figure: scatter plots in five columns, one per continent (Asia, Europe, Africa, Americas, Oceania); x axis: gdpPercap (log scale); y axis: lifeExp]
px.scatter(gapminder, x="gdpPercap", y="lifeExp",
           animation_frame="year", animation_group="country",
           size="pop", color="continent", hover_name="country",
           facet_col="continent",
           log_x=True, size_max=45, range_x=[100, 100000], range_y=[25, 90])
px.histogram(tips,'total_bill', facet_col="time", facet_row="smoker")
[Figure: grid of histograms; columns: time=Dinner, time=Lunch; rows: smoker=Yes, smoker=No; x axis: total_bill; y axis: count]
px.scatter(tips, x="total_bill", y="tip",
           facet_row="smoker", facet_col="time", color="sex")
[Figure: grid of scatter plots; columns: time=Dinner, time=Lunch; rows: smoker=Yes, smoker=No; x axis: total_bill; y axis: tip; legend: sex (Female, Male)]
px.scatter(tips, x="total_bill", y="tip", facet_row="time", facet_col="day", color="smoker",
          category_orders={"day": ["Thur", "Fri", "Sat", "Sun"], "time": ["Lunch", "Dinner"]})
[Figure: grid of scatter plots; columns: day=Thur, day=Fri, day=Sat, day=Sun; rows: time=Dinner, time=Lunch; x axis: total_bill; y axis: tip; legend: smoker (No, Yes)]

Marginal Plots

px.scatter(tips,x='total_bill',y='tip',
          marginal_x='histogram',
          marginal_y='histogram')
px.scatter(tips,x='total_bill',y='tip',
          marginal_x='violin',
          marginal_y ='box')
px.scatter(tips,x='total_bill',y='tip',
          marginal_x='violin',
          marginal_y ='box',
          color='sex')

