GraphLab
Text features
These feature transformations are useful when you have text data.
TF-IDF
Tokenizer
BM25
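
To give a feel for what transformations of this kind do to text, here is a minimal plain-Python sketch of tokenizing followed by TF-IDF weighting. It does not use the GraphLab Create API: the helper names `tokenize` and `tf_idf` are hypothetical, the tokenizer is a simple whitespace split, and the weighting `tf * log(N / df)` is one common textbook variant that may differ from what the library's transformers compute.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def tf_idf(documents):
    """Return one {word: score} dict per document, with score = tf * log(N / df).

    tf is the raw count of the word in the document, N is the number of
    documents, and df is the number of documents that contain the word.
    This is an illustrative variant, not the library's exact formula.
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: count each word once per document it appears in.
    doc_freq = Counter(word for tokens in tokenized for word in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        scores.append(
            {word: count * math.log(n_docs / doc_freq[word])
             for word, count in counts.items()}
        )
    return scores


docs = ["the cat sat", "the dog sat down", "the bird flew"]
weights = tf_idf(docs)
```

In this sketch a word that appears in every document (such as "the") gets a weight of zero, while a word unique to one document (such as "cat") gets the largest weight, which is the intuition behind using TF-IDF to highlight distinctive terms.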