DISTRIBUTED MACHINE LEARNING EXAMPLES
STANLEY WANG
SOLUTION ARCHITECT, TECH LEAD
@SWANG68
http://www.linkedin.com/in/stanley-wang-a2b143b
Topic Modeling
• Topical categorization of blogs, documents, or other objects that can be tagged with text improves the experience for end users;
• Discover sets of topics from large unstructured collections of documents;
• Annotate documents with topics;
• Use the annotations to index, search, and classify documents;
The Intuitions behind LDA
• Latent Dirichlet Allocation (LDA) is an unsupervised, probabilistic text clustering algorithm. LDA defines a generative model of how documents are produced: each document is a mixture of topics, and each topic is a probability distribution over words;
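The generative story can be sketched in a few lines of Python. This is a minimal illustration, not a full LDA implementation: the vocabulary, the two hand-written topics, and the function names are hypothetical, and in full LDA the topic-word distributions are themselves drawn from a Dirichlet prior rather than fixed by hand.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet sample by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def generate_document(topics, vocab, alpha, length, rng):
    """Generate one document following the LDA generative story."""
    # 1. Draw the document's topic proportions: theta ~ Dirichlet(alpha).
    theta = sample_dirichlet(alpha, rng)
    words = []
    for _ in range(length):
        # 2. Choose a topic for this word position according to theta.
        z = rng.choices(range(len(topics)), weights=theta)[0]
        # 3. Choose a word from that topic's word distribution.
        words.append(rng.choices(vocab, weights=topics[z])[0])
    return theta, words

# Hypothetical toy setup: two topics over a six-word vocabulary.
vocab = ["gene", "dna", "cell", "data", "model", "network"]
topics = [
    [0.4, 0.3, 0.3, 0.0, 0.0, 0.0],  # a "biology" topic
    [0.0, 0.0, 0.0, 0.4, 0.3, 0.3],  # a "computing" topic
]
rng = random.Random(0)
theta, doc = generate_document(topics, vocab, alpha=[0.5, 0.5], length=8, rng=rng)
print(theta, doc)
```

Fitting LDA runs this story in reverse: given only the words, it estimates the topic proportions and the topic-word distributions that most plausibly produced them.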
Graphical Model for LDA
• Topic-based text classification;
• Topic modeling can be seen as a pre-processing step before applying supervised learning methods, such as Collaborative Filtering;
• Finding patterns in genetic data, images, and social networks;
Real Inference with LDA
• A 100-topic LDA model was fitted to 17,000 articles from the journal Science;
• At right are the 15 most frequent words from the most frequent topics;
• At left are the inferred topic proportions for the example article from the previous slide;
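To make "fitting" and "inferred topic proportions" concrete, here is a minimal sketch of one common inference method, collapsed Gibbs sampling, on a tiny made-up corpus. The slides do not say which inference method was used for the Science model, so treat the sampler, the hyperparameters `alpha` and `beta`, and the function names as assumptions chosen for illustration.

```python
import random

def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit LDA with collapsed Gibbs sampling; return per-document topic proportions."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    doc_topic = [[0] * num_topics for _ in docs]       # topic counts per document
    topic_word = [[0] * V for _ in range(num_topics)]  # word counts per topic
    topic_total = [0] * num_topics                     # tokens assigned per topic
    assignments = []
    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = [rng.randrange(num_topics) for _ in doc]
        assignments.append(zs)
        for w, z in zip(doc, zs):
            doc_topic[d][z] += 1
            topic_word[z][word_id[w]] += 1
            topic_total[z] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid, z = word_id[w], assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][z] -= 1
                topic_word[z][wid] -= 1
                topic_total[z] -= 1
                # Resample its topic from the collapsed conditional distribution.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][wid] + beta) / (topic_total[k] + V * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][wid] += 1
                topic_total[z] += 1
    # Smoothed topic proportions for each document.
    theta = [
        [(c + alpha) / (len(doc) + num_topics * alpha) for c in counts]
        for counts, doc in zip(doc_topic, docs)
    ]
    return theta, topic_word, vocab

def top_words(topic_word, vocab, n=3):
    """Return the n most frequently assigned words for each topic."""
    return [
        [vocab[i] for i in sorted(range(len(vocab)), key=lambda i: -row[i])[:n]]
        for row in topic_word
    ]

# Hypothetical toy corpus: two biology-like and two computing-like documents.
docs = [
    ["gene", "dna", "cell", "gene", "dna"],
    ["dna", "cell", "gene", "cell"],
    ["data", "model", "network", "data"],
    ["model", "network", "data", "model"],
]
theta, topic_word, vocab = lda_gibbs(docs, num_topics=2)
for row in theta:
    print([round(p, 2) for p in row])
print(top_words(topic_word, vocab))
```

Each row of `theta` plays the role of the topic-proportion chart for one article, and `top_words` plays the role of the per-topic word lists.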
Topic Modeling and Analysis
What is Community Intuition?
In a social network, a community is a collection of users who are more closely
related to each other than to the rest of the network. The relation between users
can be based on the amount of interaction, similar interests, geographical factors, etc.
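One standard way to quantify "more closely related to each other than to the rest of the network" is graph modularity. For an undirected, unweighted graph with m edges, Q sums, over communities c, the term L_c/m - (d_c/2m)^2, where L_c is the number of edges inside c and d_c is the total degree of its nodes. The sketch below assumes that formulation. The function name and the toy graph are hypothetical.

```python
from collections import defaultdict

def modularity(edges, community):
    """Modularity of a partition of an undirected, unweighted graph.

    edges: list of (u, v) pairs; community: dict mapping node -> community label.
    """
    m = len(edges)
    internal = defaultdict(int)    # L_c: edges with both ends in community c
    degree_sum = defaultdict(int)  # d_c: total degree of nodes in community c
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)

# Hypothetical toy graph: two triangles joined by a single bridge edge (c, d).
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]
split = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
merged = {node: 0 for node in split}

print(round(modularity(edges, split), 3))   # 0.357
print(round(modularity(edges, merged), 3))  # 0.0
```

Splitting the graph at the bridge scores higher than lumping every user into one community, which matches the intuition in the definition above.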
Why Detect Social Communities?
• Behavior Analysis
• Location-based Interaction Analysis
• Recommender Systems Development
• Link Prediction
• Customer Interaction and Analysis
• Media & Content Analysis
• Security
• Social Studies
Community And Applications
Structure Metrics
Centrality Metrics
Metrics of Graph Analysis
Graph Modularity
Graph Modularity Computation
Graph Modularity Examples
Diversity of Centrality
Social Tag Clustering
Social Tag Clustering - Examples
