F# and Data 101
Utilizing FsLab to Quickly Understand your Data
Jonathan Dexter, Technology Manager of .NET, The Nerdery
Agenda
* What is this talk about
* Get the data!
* Transform!
* Science!
* ???
* Profit!
If you came here to hear about:
* "Monads"
* "Functors"
* Tail-call optimization
* Immutable design
* Pattern matching
What we WILL talk about: FsLab
With Paket
    paket init
    paket add nuget FsLab
With NuGet
    nuget install FsLab -OutputDirectory packages
The process (not limited to FsLab)
    acquire data
    |> transform
    |> science
    |> visualize
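The pipeline above is pseudocode. A minimal runnable sketch of the same shape, using only the F# core library, might look like this. The stage functions `acquire`, `transform`, `science`, and `visualize` and the inline sample rows are hypothetical stand-ins, not FsLab APIs.

```fsharp
// Hypothetical stand-ins for each stage; none of these are FsLab functions.
let acquire () =
    // Pretend these lines came from a CSV file: "year,value".
    [ "2001,10.0"; "2002,12.5"; "2003,15.0" ]

let transform (lines: string list) =
    // Parse each line into a typed (year, value) pair.
    lines
    |> List.map (fun line ->
        let parts = line.Split(',')
        int parts.[0], float parts.[1])

let science (rows: (int * float) list) =
    // A trivial "analysis": the mean of the values.
    rows |> List.averageBy snd

let visualize (mean: float) =
    // A text "chart" keeps the sketch dependency-free.
    sprintf "mean = %.2f" mean

let report = acquire () |> transform |> science |> visualize
printfn "%s" report
```

Each stage takes the previous stage's output as its only input, which is what lets the `|>` operator chain them without intermediate names.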
FsLab: Scratching the Surface
5 Libraries
Sorting the libraries above into categories:
Library            Acquire  Transform  Science   Display
F# Data            Yep!     -          -         -
Deedle             -        Yep        -         -
Math.NET Numerics  -        Supports   Supports  -
R Type Provider    Partial  Yep        Yep       Partial
XPlot              -        -          -         Yep
Step one: Acquire
Classic scenario: CSV
Using CSV type provider
    type csv = FSharp.Data.CsvProvider<...>
    let complaints = csv.Load(complaintsCsv)
Using a data frame
    let data = Deedle.Frame.ReadCsv(__SOURCE_DIRECTORY__ + "/data.csv")
Type Provider Scenario: World Bank provider
The World Bank provider is bundled with F# Data
    let dataContext = FSharp.Data.WorldBankData.GetDataContext()
    let highTechExports =
        dataContext
            .Countries.``United States``
            .Indicators.``High-technology exports (current US$)``
Type Provider Scenario: JSON provider
    type JsonContext = FSharp.Data.JsonProvider<...>
Type Provider Scenario: JSON Provider (cont.)
    let missedVotes =
        JsonContext.Load(sprintf "%s?apikey=%s" missedVotesUrl apikey)

    let congressmen =
        missedVotes.Results
        |> Seq.collect (fun r -> r.Members)

    let topMissingCongressman =
        congressmen
        |> Seq.sortBy (fun m -> try m.MissedVotesPct with | ex -> 0.0m)
        |> Seq.rev
        |> Seq.head
Type Provider Scenario: JSON Provider (cont.)
    {
      "id": "M000309",
      "name": "Carolyn McCarthy",
      "party": "D",
      "state": "NY",
      "district": "4",
      "total_votes": "1192",
      "missed_votes": "687",
      "missed_votes_pct": "57.63",
      "rank": "1",
      "notes": "Will retire at the end of 113th Congress."
    }
Batteries not included: SQL Provider
* SQL Data Connection: https://msdn.microsoft.com/en-us/library/hh362320.aspx
* SQL Entity Connection: https://msdn.microsoft.com/en-us/library/hh362320.aspx
* SQL Client: https://github.com/fsprojects/FSharp.Data.SqlClient
* SQL Provider: https://github.com/fsprojects/SQLProvider
SQL Data Connection
Type provider for an entire database, MS SQL focused.
    type dbSchema = SqlDataConnection<...>
    let db = dbSchema.GetDataContext()
SQL Entity Connection
Type provider for an entire database, through the ADO.NET Entity model.
    type dbSchema = SqlEntityConnection<...>
    let db = dbSchema.GetDataContext()
SQL Client
Type provider for commands, sprocs, and queries
    use cmd = new SqlCommandProvider<...>()
    let results = cmd.Execute(region = "USA")
SQL Provider
Type provider for the DB as a whole
Supports MS SQL, Postgres, SQLite, MySQL, Oracle, MS Access
    type sql = SqlDataProvider<...>
    let ctx = sql.GetDataContext()
Step two: Transform
Deedle: Convert to data frame
    open Deedle
    open FSharp.Data
    open FSharp.Data.Runtime.BaseTypes
Expander code omitted, but can be found here: https://github.com/fslaborg/FsLab/issues/14
    let dataFrame =
        [ for l in congressmen -> series [ "It" => l ] ]
        |> Frame.ofRowsOrdinal
        |> Frame.expandAllCols 10
Deedle: Normal syntax
    let highTechExportData =
        WorldBankData.GetDataContext()
            .Countries.``United States``
            .Indicators.``High-technology exports (current US$)``

    let highTechFrame =
        highTechExportData
        |> Frame.ofRecords
        |> Frame.indexRowsInt "Item1"
        |> Frame.mapColKeys (fun _ -> "HighTechExports")
Deedle: Quick manipulations
Simple statistics
    let stats =
        [ "Min" => Stats.min highTechFrame
          "Max" => Stats.max highTechFrame
          "Average" => Stats.mean highTechFrame
          "StandardDeviation" => Stats.stdDev highTechFrame ]

    let observations =
        highTechFrame?``HighTechExports``
        |> Series.observations
        |> Seq.map (fun (k, v) -> float k, float v)

    let regression =
        observations
        |> MathNet.Numerics.LinearRegression.SimpleRegression.Fit
Results
Stats
    [ ("Min", series [ HighTechExports => 76767867475 ])
      ("Max", series [ HighTechExports => 220884471208 ])
      ("Average", series [ HighTechExports => 152642394565.462 ])
      ("StandardDeviation", series [ HighTechExports => 39746534154.5001 ]) ]
Regression fit (intercept, slope)
    (-5.799301011e+12, 2973741397.0)
R Squared
R squared value
    let rsquared =
        GoodnessOfFit.RSquared(
            regressedValues |> Seq.map snd,
            observations |> Seq.map snd)
Result
    0.3274641292
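To make the (intercept, slope) pair and the R squared value concrete, here is a minimal sketch of an ordinary least-squares line fit and its coefficient of determination, written against the F# core library only. The helper names `fitLine` and `rSquared` and the sample points are hypothetical; this illustrates the arithmetic and is not a description of how Math.NET Numerics implements it.

```fsharp
// Ordinary least squares for y = intercept + slope * x (core library only).
let fitLine (points: (float * float) list) =
    let meanX = points |> List.averageBy fst
    let meanY = points |> List.averageBy snd
    let sxy = points |> List.sumBy (fun (x, y) -> (x - meanX) * (y - meanY))
    let sxx = points |> List.sumBy (fun (x, _) -> (x - meanX) ** 2.0)
    let slope = sxy / sxx
    let intercept = meanY - slope * meanX
    intercept, slope

// Coefficient of determination: 1 - SS_residual / SS_total.
let rSquared (points: (float * float) list) (intercept, slope) =
    let meanY = points |> List.averageBy snd
    let ssRes =
        points |> List.sumBy (fun (x, y) -> (y - (intercept + slope * x)) ** 2.0)
    let ssTot = points |> List.sumBy (fun (_, y) -> (y - meanY) ** 2.0)
    1.0 - ssRes / ssTot

// Made-up (year, value) points for illustration; not the World Bank data.
let sample = [ 2000.0, 1.0; 2001.0, 3.0; 2002.0, 2.0; 2003.0, 5.0 ]
let intercept, slope = fitLine sample
let r2 = rSquared sample (intercept, slope)
printfn "intercept = %g, slope = %g, R^2 = %g" intercept slope r2
```

Because x is a calendar year in this sample, the intercept is the fitted line extrapolated back to year 0, so it sits far from the observed values even when the fit is reasonable.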
Deedle: Combining data and additional feature creation
    let exportFrame =
        WorldBankData.GetDataContext()
            .Countries.``United States``
            .Indicators.``Exports of goods and services (current US$)``
        |> Frame.ofRecords
        |> Frame.indexRowsInt "Item1"
        |> Frame.mapColKeys (fun _ -> "TotalExports")

    exportFrame?``HighTechExports``
Deedle: Straight to R
Arrrr
    open RProvider.``base``
    open RProvider

    let rFrame = R.as_data_frame(exportFrame)
    let rFrameSummary = (R.summary rFrame)
Result
    [ "TotalExports Min.: 2.700e+10"
      "TotalExports 1st Qu.: 1.110e+11"
      "TotalExports Median: 3.639e+11"
      "TotalExports Mean: 6.684e+11"
      "TotalExports 3rd Qu.: 1.015e+12"
      "TotalExports Max.: 2.342e+12"
      "HighTechExports Min.: 7.677e+10"
      "HighTechExports 1st Qu.: 1.282e+11"
      "HighTechExports Median: 1.521e+11"
      "HighTechExports Mean: 1.526e+11"
      "HighTechExports 3rd Qu.: 1.763e+11"
      "HighTechExports Max.: 2.209e+11"
      "HighTechExports NA's: 29"
      "PercentageofHighTechExports Min.: 0.06563"
      "PercentageofHighTechExports 1st Qu.: 0.12270"
      "PercentageofHighTechExports Median: 0.15502"
      "PercentageofHighTechExports Mean: 0.13875"
      "PercentageofHighTechExports 3rd Qu.: 0.16385"
      "PercentageofHighTechExports Max.: 0.18290"
      "PercentageofHighTechExports NA's: 29" ]
Step three: Science
"Normal" data analysis languages
F# is slowly catching up
* Machine Learning Algorithms (suite)
* Natural Language Processing
* Cloud computing
Step four: Visualize
Visualizing our previous information
With Google Charts
    let pieChart =
        congressmen
        |> Seq.filter (fun c -> try c.MissedVotes >= 0 with _ -> false)
        |> Seq.groupBy (fun c -> c.Party)
        |> Seq.map (fun g -> fst g, (snd g) |> Seq.sumBy (fun c -> c.MissedVotes))
        |> XPlot.GoogleCharts.Chart.Pie
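The group-then-sum step that feeds the pie chart can be tried without the JSON provider or XPlot. The sketch below uses a hypothetical `Member` record and made-up rows in place of the provided types, and stops at the (party, total) pairs that a chart function would receive.

```fsharp
// Hypothetical record standing in for the JSON provider's member type.
type Member = { Name: string; Party: string; MissedVotes: int }

// Made-up rows for illustration only.
let members =
    [ { Name = "A"; Party = "D"; MissedVotes = 10 }
      { Name = "B"; Party = "R"; MissedVotes = 7 }
      { Name = "C"; Party = "D"; MissedVotes = 5 } ]

// Group by party, then total the missed votes within each group.
let missedByParty =
    members
    |> Seq.groupBy (fun m -> m.Party)
    |> Seq.map (fun (party, group) ->
        party, group |> Seq.sumBy (fun m -> m.MissedVotes))
    |> Seq.toList

printfn "%A" missedByParty
```

Destructuring the group as `(party, group)` is equivalent to the `fst g` / `snd g` form used on the slide and is only a readability choice.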
    let scatter =
        exportFrame?``TotalExports``
        |> Series.observations
        |> XPlot.GoogleCharts.Chart.Scatter
Summary
Resources
Presentation code: https://github.com/mandest/FSharpAndDataTalk
Presentation share: https://github.com/mandest/FSharpAndDataTalk
More F# Resources:
* F# Guides on fsharp.org: http://fsharp.org/
* Functional Programming Slack: http://fpchat.com/
* F# Weekly: https://sergeytihon.wordpress.com/category/f-weekly/