Amruta Purandare Resume/CV
Research Interests: Artificial Intelligence, Natural Language
Processing, Media and Entertainment Technology
Related Areas: Information Retrieval, Data Mining,
Human-Computer Interaction, Multimedia Processing
=============
Work Experience:
=============
Research Staff at Singapore Management University (April 2009 -
Present)
Division: School of Information Systems Research Center
(Singapore)
Keywords: Social Network Analysis, Data Mining, Web 2.0
Research Intern at SONY Corporation (July – September 2008)
Division: Intelligent Systems Research Laboratory (Tokyo, Japan)
Project Title: Web Mining for Artist Relation Extraction
Research Assistant at University of Pittsburgh (September 2005 –
April 2008)
Department: Intelligent Systems Program (Pittsburgh,
Pennsylvania)
Keywords: Spoken Dialogs, Humor Analysis, Discourse Coherence,
Conversational Interfaces
Summer Intern at University of Southern California (May – August
2007)
Division: Information Sciences Institute (Los Angeles,
California)
Keywords: Lexical Semantics, Information Extraction, Text Mining
Software Development Engineering Intern at Amazon.com (May -
August 2005)
Division: Item Metadata Analysis Group (Seattle, Washington)
Keywords: Data Mining, Text Clustering, Ontology Building
Research Assistant at University of Minnesota, Duluth (August
2002 – August 2004)
Department: Computer Science (Duluth, Minnesota)
Keywords: Word Sense Disambiguation, Latent Semantic Analysis,
Text Similarity, N-gram Modeling
========
Education:
========
Master’s in Intelligent Systems (Aug 2006)
University of Pittsburgh
Master’s in Computer Science (Aug 2004)
University of Minnesota, Duluth
Bachelor’s in Computer Engineering (July 2002)
University of Pune (India)
Best Outgoing Student Award
===============
Featured Publication:
===============
Humor: Prosody Analysis and
Automatic Recognition for F.R.I.E.N.D.S. (A. Purandare
and D. Litman) – Proceedings of the Conference on
Empirical Methods in Natural Language Processing (EMNLP), July 22-23, 2006, Sydney,
Australia.
-----------------------
Other Publications:
-----------------------
Analyzing Dialog Coherence using
Transition Patterns in Lexical and Semantic Features (A. Purandare
and D. Litman) – Proceedings of the 21st Florida
Artificial Intelligence Research Society (FLAIRS) special track on Applied
Natural Language Processing, May 15-17, 2008, Coconut Grove, Florida.
Content-Learning Correlations in
Spoken Tutoring Dialogs at Word, Turn and Discourse Levels (A. Purandare and D. Litman) -
Proceedings of the 21st Florida Artificial Intelligence Research Society
(FLAIRS) special track on Intelligent Tutoring Systems, May 15-17, 2008,
Coconut Grove, Florida.
Uncertainty Corpus: Resource to
Study User Affect in Complex Spoken Dialogue Systems (K. Forbes-Riley, D. Litman, S. Silliman and A. Purandare)
- Proceedings of the 6th Language Resources and Evaluation Conference (LREC), May
28-30, 2008, Marrakech, Morocco.
Comparing Linguistic Features for
Modeling Learning in Computer Dialogue Tutoring (K. Forbes-Riley, D. Litman, A. Purandare, M. Rotaru and J. Tetreault) -
Proceedings of the 13th International Conference on Artificial Intelligence in
Education (AIED) Los Angeles, CA, July, 2007.
Using System and User Performance
Features to Improve Emotion Detection in Spoken Tutoring Dialogs (H. Ai, D. Litman, K. Forbes-Riley, M. Rotaru,
J. Tetreault, A. Purandare)
- Proceedings of Interspeech ICSLP, September, 2006,
Pittsburgh, PA.
Resolving Ambiguities in
Biomedical Text with Unsupervised Clustering Approaches (G. Savova,
T. Pedersen, A. Purandare and A. Kulkarni)
- University of Minnesota Supercomputing Institute Research Report UMSI 2005/80
and CB Number 2005/21, May.
Name Discrimination by Clustering
Similar Contexts (T. Pedersen, A. Purandare, and A. Kulkarni) - Proceedings of the Sixth International
Conference on Intelligent Text Processing and Computational Linguistics
(CICLING), February 13-19, 2005, Mexico City.
Unsupervised Word Sense
Discrimination by Clustering Similar Contexts (A. Purandare
and T. Pedersen) - University of Minnesota Supercomputing Institute
Research Report UMSI 2004/146, August 2004 (Master’s Thesis).
The Senseval-3 Multilingual
English-Hindi lexical sample task (T. Chklovski, R. Mihalcea, T. Pedersen, and A. Purandare)
- Proceedings of the Third International Workshop on the Evaluation of Systems
for the Semantic Analysis of Text (SENSEVAL-3), July 25-26, 2004, Barcelona,
Spain.
Word Sense Discrimination by
Clustering Contexts in Vector and Similarity Spaces (A. Purandare
and T. Pedersen) - Proceedings of the Conference on Computational Natural
Language Learning (CoNLL), May 6-7, 2004, Boston, MA.
SenseClusters - Finding
Clusters that Represent Word Senses (A. Purandare and
T. Pedersen) – Intelligent Systems Demonstration in Proceedings of the
Nineteenth National Conference on Artificial Intelligence (AAAI-04), July
25-29, 2004, San Jose, CA. [Also presented in NAACL-2004 Demonstration Session]
Improving Word Sense
Discrimination with Gloss Augmented Feature Vectors (A. Purandare
and T. Pedersen) - Proceedings of the Workshop on Lexical Resources for the Web
and Word Sense Disambiguation, November 22, 2004, Puebla, Mexico.
Discriminating Among Word Senses
Using McQuitty's Similarity Analysis (A. Purandare) - Proceedings of the Student Research Workshop
at Human Language Technology and North American Chapter of the Association for
Computational Linguistics (HLT-NAACL), May 30-31, 2003, Edmonton, Canada
===========
Achievements:
===========
Internship offer from SONY Research Lab, Japan (Summer 2008)
Internship offer from IBM T. J. Watson Research Center in
Hawthorne, New York (Summer 2007)
Internship offer from the Information Sciences Institute at USC
(Summer 2007)
People’s Choice Award (Feb 2006)
Graduate Student Poster Competition
Computer Science Day at University of Pittsburgh
Internship offer from Amazon.com (Summer 2005)
India Foundation Scholarship (2002)
Best Outgoing Student (2002)
Computer Engineering Department
Cummins College of Engineering, Pune (India)
2nd Prize in Artificial Intelligence & Fuzzy Logic (2001)
Concepts-2001, Technical Paper Presentation Competition
Pune Institute of Computer Technology (India)
Merit-Based Scholarships for Academic Excellence (1998-2002)
Cummins College of Engineering, Pune (India)
10 Awards by Garware High-School, Pune (India)
For Academic Performance (89.2%) in the SSC Board Exam (1996)
Includes the 1st Prize in English (86/100)
===============
Conference Activities:
===============
Program Committee Member for the FLAIRS Conference (2009-2010)
Reviewer for the Association for Computational Linguistics (ACL)
Conference (2008)
Reviewer for the Journal of Biomedical Informatics (January
2008)
Student Member on the SEMEVAL (former SENSEVAL) Program
Committee (2005 - 2007)
Program Committee Member for the Workshop on Building and Using
Parallel Texts: Data Driven Machine Translation and Beyond (held at ACL 2005)
Co-organizer for the Senseval-3 English-Hindi Lexical Sample
Task (2004)
National and International Conferences attended: ACL 2009
(Singapore), unConference 2009 (Singapore), Ad-Tech
2009 (Singapore), FLAIRS 2008 (Miami, FL), NAACL 2007 (Rochester, NY), NAACL
2006 (New York City, NY), ACL 2006 (Sydney, Australia), EMNLP 2006 (Sydney,
Australia), InterSpeech/ICSLP 2006 (Pittsburgh, PA),
NAACL 2004 (Boston, MA), CoNLL 2004 (Boston, MA),
AAAI 2004 (San Jose, CA), Conference on Email and Anti-Spam
(CEAS) 2004 (Palo Alto, CA), NAACL 2003 (Edmonton, Canada)
Conference Volunteer at ACL 2009, InterSpeech/ICSLP
2006, AAAI 2004, NAACL 2003
=====
Skills:
=====
Human Languages: English, Hindi, Marathi (fluent in each)
Computer Languages: Perl, C, C++
Software Tools used in Projects: MySQL (database), CGI, JavaScript
(web programming), Apache (web server), Weka (machine
learning), Cluto (clustering), Graphviz,
TouchGraph, GCluto
(visualization), Wavesurfer (audio processing), Avid
(digital video editing), SVDPackC (dimensionality
reduction), VHDL (hardware programming), Prolog (logic programming), Corel
Draw, OpenGL (graphics), LaTeX (paper writing)
NLP Tools used in Research: WordNet (lexicon), Charniak
Parser (tree-bank style parsing), Minipar (dependency
style parsing), Brill Tagger (part-of-speech tagging), Mxterminator
(sentence boundary detection), SenseClusters
(founding developer), N-gram Statistics Package (co-developer)
Datasets used in Research: English
Gigaword, Wikipedia, Google N-gram, TREC,
Switchboard, Enron Emails, FRIENDS TV show, Product Inventory
from Amazon.com
==============
Research Projects:
==============
Summer Project at SONY (July –
September 2008)
Project Title: Web Mining for Artist Relation Extraction
Summary:
In this project, we automatically
collect information about famous artists from the Web using Information
Extraction and Text Mining algorithms from Natural Language Processing. The key
milestones for this project were to (1) design and implement relation
extraction algorithms, (2) build a large-scale relation database for several
thousand artists, (3) evaluate the methods on precision and recall, and (4) study
metrics for computing relational similarity (e.g. “X collaborated with Y”
is similar to “X worked with Y”, “X co-starred with Y”, “X
teamed up with Y”, etc.). Our database of artist relations stores
entity-relation triples of the form (X, R, Y), where R represents a relation
between an artist X and another entity Y (e.g. “X co-starred with Y”, “X is
influenced by Y”, “X married Y”, “X was born in Y”, “X signed contract with Y”,
“X was nominated for Y”, “X auditioned for Y”, etc.). Our relation extraction
algorithms place no restriction on the type of entity
Y or on the set of relations R to be discovered. For example, entity Y could be a
person (e.g. another artist), a location, an organization, an award, a movie title, etc.
Such a database can be used to answer a broad range of queries, such as:
[1] Given two artists X and Y,
describe their relation
[2] Find all entities Y related to
a given artist X by relation R
[3] Find all artists X related to
a given entity Y by relation R
[4] Given two artists X and Y,
find all entities Z that have the same relation R with both X and Y
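The four query types above can be illustrated with a minimal Python sketch. It assumes a toy in-memory set of (X, R, Y) triples; the class name, method names and placeholder entities ("Artist A", "Award Z") are hypothetical and are not taken from the actual project database.

```python
class RelationStore:
    """Toy in-memory store of (X, R, Y) triples (illustrative only)."""

    def __init__(self):
        self.triples = set()

    def add(self, x, r, y):
        self.triples.add((x, r, y))

    def relations_between(self, x, y):
        # Query [1]: describe how X and Y are related.
        return sorted(r for (a, r, b) in self.triples if a == x and b == y)

    def entities_for(self, x, r):
        # Query [2]: all entities Y related to artist X by relation R.
        return sorted(b for (a, rel, b) in self.triples if a == x and rel == r)

    def artists_for(self, y, r):
        # Query [3]: all artists X related to entity Y by relation R.
        return sorted(a for (a, rel, b) in self.triples if b == y and rel == r)

    def shared_entities(self, x, y, r):
        # Query [4]: all entities Z that stand in relation R to both X and Y.
        return sorted(set(self.entities_for(x, r)) & set(self.entities_for(y, r)))


# Placeholder data, invented for the example.
store = RelationStore()
store.add("Artist A", "co-starred with", "Artist B")
store.add("Artist A", "was nominated for", "Award Z")
store.add("Artist B", "was nominated for", "Award Z")
store.add("Artist A", "was born in", "City C")

print(store.relations_between("Artist A", "Artist B"))
print(store.shared_entities("Artist A", "Artist B", "was nominated for"))
```

A linear scan over a set is enough to show the query semantics; a database covering several thousand artists would presumably index the triples by entity and by relation.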
Research at University of
Pittsburgh (September 2005 – April 2008)
Keywords: Spoken Dialogs, Humor Analysis, Dialog Coherence,
Conversational Interfaces
Summary:
Summer Project at USC/ISI (May –
August 2007)
Keywords: Lexical Semantics, Information Extraction, Text
Mining, Semantic Knowledge Acquisition
Summary:
1. Relation Extraction – Given a pair of words such as library :: book, hospital
:: doctor, camera :: picture, Honda :: car, Shakespeare :: Hamlet, India :: “Taj Mahal”, the program automatically
describes their semantic relation, e.g. “library collects books”, “doctor
works in a hospital”, “Honda manufactures cars”, “Shakespeare is the author of
Hamlet”, “Taj Mahal is
located in India” etc.
4. Set Expansion – This
functionality is similar to Google Sets and retrieves other words that are
similar to a given set of words, e.g. if the given set is (Honda, Toyota,
Mercedes, BMW), the program extracts other car companies such as (Nissan,
Chrysler, Volvo, Mazda, Ford, Fiat, Chevrolet, …).
5. Matching Pairs – Given
two sets of words, the program re-orders the second set such that there is a
one-to-one alignment between the two sets. For example, if the given two sets
are: (Honda, IBM, IKEA, Sony, DSW) and (furniture, shoes, computer, car,
camera), the output of the program will look like - (Honda :: car, IBM ::
computer, IKEA :: furniture, Sony :: camera, DSW :: shoes).
Such knowledge about similarities
and relations between lexical entities is useful for applications like
Information Retrieval and Question Answering, so that user queries like
“Japanese car companies”, “beaches in Hawaii”, “books written by Shakespeare”,
“green leafy vegetables”, “nominees of Oscar 2007”, “actress who played
the role of Rachel Green in FRIENDS”, etc. can be answered with specific entity
names, rather than with a list of documents or text snippets that match such
phrases.
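The Matching Pairs task described above can be sketched as a small assignment problem. The sketch below assumes that some association score between a word from the first set and a word from the second set is already available; the `score` table is made up for illustration, and the brute-force search is a stand-in rather than the method used in the project.

```python
from itertools import permutations


def match_pairs(left, right, score):
    """Reorder `right` so the total association with `left` is maximal."""
    best = max(
        permutations(right),
        key=lambda perm: sum(score.get((a, b), 0.0) for a, b in zip(left, perm)),
    )
    return list(zip(left, best))


# Hypothetical association scores (e.g. derived from co-occurrence counts);
# pairs that are not listed default to 0.0.
score = {
    ("Honda", "car"): 0.9,
    ("IBM", "computer"): 0.9,
    ("IKEA", "furniture"): 0.9,
    ("Sony", "camera"): 0.6,
    ("Sony", "computer"): 0.4,
    ("DSW", "shoes"): 0.9,
}

left = ["Honda", "IBM", "IKEA", "Sony", "DSW"]
right = ["furniture", "shoes", "computer", "car", "camera"]
pairs = match_pairs(left, right, score)
for company, product in pairs:
    print(company, "::", product)
```

Trying every permutation is fine for five items but grows factorially; for larger sets an assignment solver such as `scipy.optimize.linear_sum_assignment` would be a more practical choice.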
Summer Project at Amazon.com (May
– August 2005)
Keywords: Data Mining, Text Clustering, Ontology Learning
Summary:
During this internship, I designed
and implemented a prototype web service for clustering and analyzing similar
consumer products on Amazon.com. Manually categorizing a large collection of
product items is not only time-consuming but also leads to inconsistent
categories. For example, one merchant may group golf balls in the same category
as tennis balls, cricket balls and baseballs, whereas another merchant may
group them with other golf equipment: golf drivers, golf gloves, golf shoes,
etc. We therefore developed an automatic method for organizing a large
collection of items based on the similarity of the products’ features and
descriptions. Apart from organizing the items, the method is also useful for
retrieving similar products for recommendation and for building a product
ontology.
Example Ontology:
(Consumer Products (Electronics) (Furniture) (Apparel)
(Cosmetics) (Jewelry) …)
(Electronics (Computers) (Cameras) (Cell Phones) (Home
Appliances) (Music Instruments) ...)
(Home Appliances (Televisions) (Telephones) (Kitchen Appliances)
(Laundry Machines) …)
(Kitchen Appliances (refrigerator) (oven) (microwave) (toaster)
(coffee-maker) …)
(Music Instruments (violin) (guitar) (piano) (drums) (clarinet)
…)
...
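As an illustration of how such an ontology might be used, the following Python sketch encodes the example above as a nested dictionary and looks up the category path of an item. The dictionary representation and the `find_path` helper are assumptions made for this example, not the format used in the prototype; entries elided with "…" in the example are simply left out.

```python
# The example ontology as nested dicts; an empty dict marks a leaf.
ontology = {
    "Consumer Products": {
        "Electronics": {
            "Computers": {},
            "Cameras": {},
            "Cell Phones": {},
            "Home Appliances": {
                "Televisions": {},
                "Telephones": {},
                "Kitchen Appliances": {
                    "refrigerator": {}, "oven": {}, "microwave": {},
                    "toaster": {}, "coffee-maker": {},
                },
                "Laundry Machines": {},
            },
            "Music Instruments": {
                "violin": {}, "guitar": {}, "piano": {},
                "drums": {}, "clarinet": {},
            },
        },
        "Furniture": {},
        "Apparel": {},
        "Cosmetics": {},
        "Jewelry": {},
    }
}


def find_path(tree, target, prefix=()):
    """Depth-first search for `target`; returns its category path or None."""
    for name, children in tree.items():
        path = prefix + (name,)
        if name == target:
            return list(path)
        found = find_path(children, target, path)
        if found is not None:
            return found
    return None


print(" > ".join(find_path(ontology, "toaster")))
```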
Research at University of
Minnesota, Duluth (August 2002 – August 2004)
Keywords: Word Sense Disambiguation, Latent Semantic Analysis,
Text Similarity, N-gram Modeling
Summary:
1. As a part of my Master’s thesis
and research work, I developed the open source software SenseClusters
(http://senseclusters.sourceforge.net)
for clustering contextually similar text units (words, sentences, paragraphs,
documents etc), including the support for data pre-processing, feature selection,
dimensionality reduction and evaluation of the output clusters.
2. My Master’s thesis on “Unsupervised
Word Sense Discrimination” compares the effect of using first versus second
order feature representations, vector versus similarity space clustering, using
local versus global training data and augmenting corpus derived feature vectors
with dictionary glosses on the performance of word sense discrimination.
3. I was a co-organizer for the
Senseval-3 English-Hindi lexical sample task and helped in collecting data on
Hindi word senses.
4. I was also a co-developer on
the N-gram Statistics Package (http://ngram.sourceforge.net)
and added programs for computing statistics on higher order n-grams (N > 2),
large-scale n-gram counting and for extracting k-th
order word co-occurrences.
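For readers unfamiliar with higher-order n-gram counting, here is a minimal Python sketch of the idea. It is not the interface of the N-gram Statistics Package (which is written in Perl); the function name and the sample sentence are invented for illustration.

```python
from collections import Counter


def count_ngrams(tokens, n):
    """Count contiguous n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


tokens = "the cat sat on the mat and the cat sat down".split()
trigrams = count_ngrams(tokens, 3)  # higher-order n-grams: N = 3

for ngram, freq in trigrams.most_common(3):
    print(" ".join(ngram), freq)
```

Counting everything in one in-memory `Counter` only works for small inputs; large-scale counting generally needs the counts to be split across several passes or files and merged afterwards.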
Undergraduate Project at Cummins
College of Engineering, Pune (2001)
Keywords: Natural Language Understanding, Knowledge
Representation, Logical Inference
Summary:
The project proposed a prototype
model for a language understanding system that stores natural language text as
predicate-forms in first-order logic, and applies inference rules (like
resolution) to test for entailments, and to detect logical errors and contradictions
in the text. The model was tested on sample text paragraphs collected from the
GRE analytical section, using the PROLOG inference engine. This was the first
time I came across many issues in processing natural language texts using
computers, such as synonymy, ambiguity, reference resolution, dependency
parsing (identifying the subject, object, verb in the sentence) and some
common-sense reasoning such as “x can donate y only if x first owns y”
etc. The project was presented at Concepts-2001, the undergraduate-level paper
presentation competition held by the Pune Institute of Computer Technology
(India), and won the 2nd prize in Artificial Intelligence &
Fuzzy Logic.
Example Text Paragraph:
To obtain a government post, you
must donate campaign gold bullion and make a television speech. You can
purchase gold bullion only if you are not expelled and you have donated
campaign service of 300 hrs. To make a television speech one must be
politically sound and donate campaign 300 hrs of service.
PROLOG Rules:
obtain(x, “government post”) => donate(x, campaign, “gold bullion”) ^ make(x, “television speech”)
purchase(x, “gold bullion”) => !expelled(x) ^ donate(x, campaign, “service of 300 hrs”)
make(x, “television speech”) => is_politically_sound(x) ^ donate(x, campaign, “300 hrs of service”)
Entailment Test:
Given that A obtains a government post, can we infer that A is
politically sound?
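The entailment test above can be traced with a small forward-chaining sketch in Python. This is a simplification: the original project used the PROLOG inference engine with resolution, whereas the code below flattens each predicate into a plain string about a single person and just follows the "=>" rules. The string encoding and the `forward_chain` helper are assumptions made for this example.

```python
# Each rule reads "if the premise holds, then every conclusion holds",
# mirroring the "=>" rules listed above for one person A.
RULES = {
    "obtain(government post)": [
        "donate(campaign, gold bullion)",
        "make(television speech)",
    ],
    "purchase(gold bullion)": [
        "not expelled",
        "donate(campaign, service of 300 hrs)",
    ],
    "make(television speech)": [
        "politically sound",
        "donate(campaign, 300 hrs of service)",
    ],
}


def forward_chain(facts, rules):
    """Apply the rules repeatedly until no new fact can be derived."""
    known = set(facts)
    agenda = list(facts)
    while agenda:
        fact = agenda.pop()
        for conclusion in rules.get(fact, []):
            if conclusion not in known:
                known.add(conclusion)
                agenda.append(conclusion)
    return known


derived = forward_chain({"obtain(government post)"}, RULES)
print("politically sound" in derived)  # True: obtain => make speech => sound
```

In this encoding "not expelled" is not derived, because nothing links donating gold bullion to purchasing it; that link would need a common-sense rule of the kind mentioned above ("x can donate y only if x first owns y").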
===========
Miscellaneous:
===========
1. Music Analysis and
Classification: The project analyzed vocal and instrumental content of Hindi
film songs for automatic music classification.
2. Automatic Poetry Generation:
Using a large corpus of text collected from the English literature, the program
automatically generates poetry by identifying patterns of rhyming n-grams.
3. Code-Breaker: In this project,
I implemented simple algorithms for encoding and decoding cipher texts.
4. Chat with F.R.I.E.N.D.S: A
chat-agent that simulates characters from the FRIENDS TV-show.
5. Astrological Predictions: I
have been collecting Horoscope readings from MSN Astrology for a period of over
2 years. The data contains daily horoscopes for all zodiac signs for male and
female users. I am hoping to use this data at some point to build learning
models for astrological predictions.
6. Song Translation: As time
permits, I translate my favorite Hindi songs into English. Here is a
collection of my poetry.
Author: Amruta Purandare
All Rights Reserved