Amruta Purandare Resume/CV

 

Research Interests: Artificial Intelligence, Natural Language Processing, Media and Entertainment Technology

 

Related Areas: Information Retrieval, Data Mining, Human-Computer Interaction, Multimedia Processing

 

=============

Work Experience:

=============

 

Research Staff at Singapore Management University (April 2009 - Present)

Division: School of Information Systems Research Center (Singapore)

Keywords: Social Network Analysis, Data Mining, Web 2.0

 

Research Intern at SONY Corporation (July – September 2008)

Division: Intelligent Systems Research Laboratory (Tokyo, Japan)

Project Title: Web Mining for Artist Relation Extraction

 

Research Assistant at University of Pittsburgh (September 2005 – April 2008)

Department: Intelligent Systems Program (Pittsburgh, Pennsylvania)

Keywords: Spoken Dialogs, Humor Analysis, Discourse Coherence, Conversational Interfaces

 

Summer Intern at University of Southern California (May – August 2007)

Division: Information Sciences Institute (Los Angeles, California)

Keywords: Lexical Semantics, Information Extraction, Text Mining

 

Software Development Engineering Intern at Amazon.com (May - August 2005)

Division: Item Metadata Analysis Group (Seattle, Washington)

Keywords: Data Mining, Text Clustering, Ontology Building

 

Research Assistant at University of Minnesota, Duluth (August 2002 – August 2004)

Department: Computer Science (Duluth, Minnesota)

Keywords: Word Sense Disambiguation, Latent Semantic Analysis, Text Similarity, N-gram Modeling

 

========

Education:

========

 

Master’s in Intelligent Systems (Aug 2006)

University of Pittsburgh

 

Master’s in Computer Science (Aug 2004)

University of Minnesota, Duluth

 

Bachelor’s in Computer Engineering (July 2002)

University of Pune (India)

Best Outgoing Student Award

 

===============

Featured Publication:

===============

 

Humor: Prosody Analysis and Automatic Recognition for F.R.I.E.N.D.S. (A. Purandare and D. Litman) – Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), July 22-23, 2006, Sydney, Australia.

 

-----------------------

Other Publications:

-----------------------

 

Analyzing Dialog Coherence using Transition Patterns in Lexical and Semantic Features (A. Purandare and D. Litman) – Proceedings of the 21st Florida Artificial Intelligence Research Society (FLAIRS) special track on Applied Natural Language Processing, May 15-17, 2008, Coconut Grove, Florida.

 

Content-Learning Correlations in Spoken Tutoring Dialogs at Word, Turn and Discourse Levels (A. Purandare and D. Litman) - Proceedings of the 21st Florida Artificial Intelligence Research Society (FLAIRS) special track on Intelligent Tutoring Systems, May 15-17, 2008, Coconut Grove, Florida.

 

Uncertainty Corpus: Resource to Study User Affect in Complex Spoken Dialogue Systems (K. Forbes-Riley, D. Litman, S. Silliman and A. Purandare) - Proceedings of the 6th Language Resources and Evaluation Conference (LREC), May 28-30, 2008, Marrakech, Morocco.

 

Comparing Linguistic Features for Modeling Learning in Computer Dialogue Tutoring (K. Forbes-Riley, D. Litman, A. Purandare, M. Rotaru and J. Tetreault) - Proceedings of the 13th International Conference on Artificial Intelligence in Education (AIED), July 2007, Los Angeles, CA.

 

Using System and User Performance Features to Improve Emotion Detection in Spoken Tutoring Dialogs (H. Ai, D. Litman, K. Forbes-Riley, M. Rotaru, J. Tetreault, A. Purandare) - Proceedings of Interspeech ICSLP, September, 2006, Pittsburgh, PA.

 

Resolving Ambiguities in Biomedical Text with Unsupervised Clustering Approaches (G. Savova, T. Pedersen, A. Purandare and A. Kulkarni) - University of Minnesota Supercomputing Institute Research Report UMSI 2005/80 and CB Number 2005/21, May 2005.

 

Name Discrimination by Clustering Similar Contexts (T. Pedersen, A. Purandare, and A. Kulkarni) - Proceedings of the Sixth International Conference on Intelligent Text Processing and Computational Linguistics (CICLING), February 13-19, 2005, Mexico City.

 

Unsupervised Word Sense Discrimination by Clustering Similar Contexts (A. Purandare and T. Pedersen) - University of Minnesota Supercomputing Institute Research Report UMSI 2004/146, August 2004 (Master’s Thesis).

 

The Senseval-3 Multilingual English-Hindi lexical sample task (T. Chklovski, R. Mihalcea, T. Pedersen, and A. Purandare) - Proceedings of the Third International Workshop on the Evaluation of Systems for the Semantic Analysis of Text (SENSEVAL-3), July 25-26, 2004, Barcelona, Spain.

 

Word Sense Discrimination by Clustering Contexts in Vector and Similarity Spaces (A. Purandare and T. Pedersen) - Proceedings of the Conference on Computational Natural Language Learning (CoNLL), May 6-7, 2004, Boston, MA.

 

SenseClusters - Finding Clusters that Represent Word Senses (A. Purandare and T. Pedersen) – Intelligent Systems Demonstration in Proceedings of the Nineteenth National Conference on Artificial Intelligence (AAAI-04), July 25-29, 2004, San Jose, CA. [Also presented in NAACL-2004 Demonstration Session]

 

Improving Word Sense Discrimination with Gloss Augmented Feature Vectors (A. Purandare and T. Pedersen) - Proceedings of the Workshop on Lexical Resources for the Web and Word Sense Disambiguation, November 22, 2004, Puebla, Mexico.

 

Discriminating Among Word Senses Using McQuitty's Similarity Analysis (A. Purandare) - Proceedings of the Student Research Workshop at Human Language Technology and North American Chapter of the Association for Computational Linguistics (HLT-NAACL), May 30-31, 2003, Edmonton, Canada.

 

===========

Achievements:

===========

 

Internship offer from SONY Research Lab, Japan (Summer 2008)

 

Internship offer from IBM T. J. Watson Research Center in Hawthorne, New York (Summer 2007)

 

Internship offer from the Information Sciences Institute at USC (Summer 2007)

 

People’s Choice Award (Feb 2006)

Graduate Student Poster Competition

Computer Science Day at University of Pittsburgh

 

Internship offer from Amazon.com (Summer 2005)

 

India Foundation Scholarship (2002)

 

Best Outgoing Student (2002)

Computer Engineering Department

Cummins College of Engineering, Pune (India)

 

2nd Prize in Artificial Intelligence & Fuzzy Logic (2001)

Concepts-2001, Technical Paper Presentation Competition

Pune Institute of Computer Technology (India)

 

Merit-Based Scholarships for Academic Excellence (1998-2002)

Cummins College of Engineering, Pune (India)

 

10 Awards by Garware High-School, Pune (India)

For Academic Performance (89.2%) in the SSC Board Exam (1996)

Includes the 1st Prize in English (86/100)

 

===============

Conference Activities:

===============

 

Program Committee Member for the FLAIRS Conference (2009-2010)

 

Reviewer for the Association for Computational Linguistics (ACL) Conference (2008)

 

Reviewer for the Journal of Biomedical Informatics (January 2008)

 

Student Member on the SEMEVAL (formerly SENSEVAL) Program Committee (2005 - 2007)

 

Program Committee Member for the Workshop on Building and Using Parallel Texts: Data Driven Machine Translation and Beyond (held at ACL 2005)

 

Co-organizer for the Senseval-3 English-Hindi Lexical Sample Task (2004)

 

National and International Conferences attended: ACL 2009 (Singapore), unConference 2009 (Singapore), Ad-Tech 2009 (Singapore), FLAIRS 2008 (Miami, FL), NAACL 2007 (Rochester, NY), NAACL 2006 (New York City, NY), ACL 2006 (Sydney, Australia), EMNLP 2006 (Sydney, Australia), InterSpeech/ICSLP 2006 (Pittsburgh, PA), NAACL 2004 (Boston, MA), CoNLL 2004 (Boston, MA), AAAI 2004 (San Jose, CA), Conference on Email and Anti-Spam (CEAS) 2004 (Palo Alto, CA), NAACL 2003 (Edmonton, Canada)

 

Conference Volunteer at ACL 2009, InterSpeech/ICSLP 2006, AAAI 2004, NAACL 2003

 

=====

Skills:

=====

 

Human Languages: English, Hindi, Marathi (fluent in each)

 

Computer Languages: Perl, C, C++

 

Software Tools used in Projects: MySQL (database), CGI, JavaScript (web programming), Apache (web server), Weka (machine learning), CLUTO (clustering), Graphviz, TouchGraph, gCLUTO (visualization), Wavesurfer (audio processing), Avid (digital video editing), SVDPACKC (dimensionality reduction), VHDL (hardware programming), Prolog (logic programming), Corel Draw, OpenGL (graphics), LaTeX (paper writing)

 

NLP Tools used in Research: WordNet (lexicon), Charniak Parser (tree-bank style parsing), Minipar (dependency style parsing), Brill Tagger (part-of-speech tagging), Mxterminator (sentence boundary detection), SenseClusters (founding developer), N-gram Statistics Package (co-developer)

 

Datasets used in Research: English Gigaword, Wikipedia, Google N-gram, TREC, Switchboard, Enron Emails, FRIENDS TV show, Product Inventory from Amazon.com

 

==============

Research Projects:

==============

 

Summer Project at SONY (July – September 2008)

Project Title: Web Mining for Artist Relation Extraction

 

Summary:

 

In this project, we automatically collect information about famous artists from the Web using Information Extraction and Text Mining algorithms from Natural Language Processing. The key milestones were to: (1) design and implement relation extraction algorithms; (2) build a large-scale relation database for several thousand artists; (3) evaluate the methods based on precision and recall; and (4) study metrics for computing relational similarity (e.g. “X collaborated with Y” is similar to “X worked with Y”, “X co-starred with Y”, “X teamed up with Y”, etc.). Our database of artist relations stores entity-relation triples of the form (X, R, Y), where R represents a relation between an artist X and another entity Y (e.g. “X co-starred with Y”, “X is influenced by Y”, “X married Y”, “X was born in Y”, “X signed contract with Y”, “X was nominated for Y”, “X auditioned for Y”, etc.). Our relation extraction algorithms are designed so that there is no restriction on the type of entity Y or on the set of relations R to be discovered. For example, entity Y could be a person (e.g. another artist), a location, an organization, an award, a movie title, etc. Such a database can be used for answering a broad range of queries such as:

 

[1] Given two artists X and Y, describe their relation

 

[2] Find all entities Y related to a given artist X by relation R

 

[3] Find all artists X related to a given entity Y by relation R

 

[4] Given two artists X and Y, find all entities Z that have the same relation R with both X and Y
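
The four query types above can be illustrated with a minimal in-memory sketch. This is not the project's actual database or schema: the `Triple` type, the function names, and the sample triples (with placeholder names such as "Artist A") are all hypothetical and exist only to show how (X, R, Y) triples support each query.

```python
from collections import namedtuple

# Hypothetical triple type mirroring the (X, R, Y) form described above.
Triple = namedtuple("Triple", ["x", "r", "y"])

# Made-up sample data; names are placeholders, not real extraction results.
TRIPLES = [
    Triple("Artist A", "co-starred with", "Artist B"),
    Triple("Artist A", "was born in", "City C"),
    Triple("Artist B", "was born in", "City C"),
    Triple("Artist A", "was nominated for", "Award D"),
]

def relations_between(triples, x, y):
    """Query [1]: relations R such that (x, R, y) is stored."""
    return [t.r for t in triples if t.x == x and t.y == y]

def entities_related_to(triples, x, r):
    """Query [2]: entities Y such that (x, r, Y) is stored."""
    return [t.y for t in triples if t.x == x and t.r == r]

def artists_related_to(triples, y, r):
    """Query [3]: artists X such that (X, r, y) is stored."""
    return [t.x for t in triples if t.y == y and t.r == r]

def shared_entities(triples, x, y, r):
    """Query [4]: entities Z related by r to both x and y."""
    return set(entities_related_to(triples, x, r)) & set(entities_related_to(triples, y, r))

print(relations_between(TRIPLES, "Artist A", "Artist B"))
print(shared_entities(TRIPLES, "Artist A", "Artist B", "was born in"))
```

A linear scan like this is only suitable for a toy example; a database of several thousand artists would normally index the triples by X, R, and Y.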

 

 

Research at University of Pittsburgh (September 2005 – April 2008)

Keywords: Spoken Dialogs, Humor Analysis, Dialog Coherence, Conversational Interfaces

 

Summary:

 

1. Humor Analysis – This was a class project for the Affective Dialog Systems class, and was published at EMNLP 2006 under the title “Humor: Prosody Analysis and Automatic Recognition for F.R.I.E.N.D.S.”. For this project, I used a novel source of data: dialog episodes from the classic comedy television show FRIENDS. Using lexical features extracted from the closed captions (text transcripts) and speech features (pitch, energy, tempo, etc.) derived from the audio, we developed a machine learning algorithm that automatically detects humor in FRIENDS dialogs. The input to the program thus consists of what a speaker says (language content) as well as how the speaker expresses this content (prosody/speech). The program then automatically decides which speaker turns are funny. As the original dialogs already include laughs after humorous turns, this data provides ready-made labeled examples for training and evaluating the classifier. Our experiments showed that the FRIENDS characters generally tend to speak faster, louder and more energetically while expressing humor.

 

2. Dialog Coherence – In this project, we attempt to automatically distinguish between coherent and incoherent conversations. To do this, we build a machine learning classifier using local transition patterns that span adjacent dialog turns and encode lexical as well as semantic information in dialogs. The algorithm is evaluated on the Switchboard dialog corpus by treating the original Switchboard dialogs as the coherent (positive) examples. Incoherent (negative) examples are created by randomly shuffling turns from these Switchboard dialogs. The classifier achieves an accuracy of 89% (against a 50% baseline) when incoherent dialogs have random order as well as random content (topics), and 68% when incoherent dialogs are randomly ordered but on-topic. We also conduct experiments on a newspaper text corpus and compare our findings on the two datasets. For further details, please see our FLAIRS 2008 paper on “Analyzing Dialog Coherence using Transition Patterns in Lexical and Semantic Features”.
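
The construction of positive and negative examples for the coherence classifier can be sketched as follows. This is an illustrative sketch only, assuming each dialog is a list of turn strings and that turns are shuffled within a single dialog (which corresponds to the "randomly ordered but on-topic" condition); the function names `make_incoherent` and `build_dataset` are hypothetical and are not taken from the paper.

```python
import random

def make_incoherent(turns, rng):
    """Return a shuffled copy of `turns` whose order differs from the original."""
    original = list(turns)
    if len(set(original)) < 2:
        # No reordering can differ from the original, so return it unchanged.
        return original
    shuffled = list(original)
    while shuffled == original:
        rng.shuffle(shuffled)
    return shuffled

def build_dataset(dialogs, seed=0):
    """Pair every original dialog (label 1) with a shuffled version (label 0)."""
    rng = random.Random(seed)
    data = []
    for turns in dialogs:
        data.append((list(turns), 1))
        data.append((make_incoherent(turns, rng), 0))
    return data

# Made-up toy dialog, not Switchboard data.
dialogs = [["hi", "how are you", "fine thanks", "bye"]]
for turns, label in build_dataset(dialogs):
    print(label, turns)
```

Because every original dialog yields exactly one shuffled counterpart, the two classes are balanced, which is consistent with the 50% baseline mentioned above.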

 

 

Summer Project at USC/ISI (May – August 2007)

Keywords: Lexical Semantics, Information Extraction, Text Mining, Semantic Knowledge Acquisition

 

Summary:

 

1. Relation Extraction – Given a pair of words such as library :: book, hospital :: doctor, camera :: picture, Honda :: car, Shakespeare :: Hamlet, India :: “Taj Mahal”, the program automatically describes their semantic relation, e.g. “library collects books”, “doctor works in a hospital”, “Honda manufactures cars”, “Shakespeare is the author of Hamlet”, “Taj Mahal is located in India”, etc.

 

2. Explaining Similarity – Given a set of words like (Honda, Toyota, Mercedes, BMW, Fiat), (apple, orange, mango, banana), (doctor, lawyer, engineer, professor), the program describes why the given entities are similar or what properties they share. The output of the program looks like “they are all car companies” (or “they all manufacture cars”), “they are all fruits”, “they are all occupations”, etc.

 

3. Odd One Out – Given a set of words such as (Honda, Toyota, Sony, Mercedes, BMW), the program automatically picks Sony as the odd one out, and explains the features that distinguish Sony from the rest of the group, e.g. “Sony is a consumer electronics company”, “Honda, Toyota, Mercedes, BMW are car companies”, etc.

 

4. Set Expansion – This functionality is similar to Google Sets and retrieves other words that are similar to a given set of words, e.g. if the given set is (Honda, Toyota, Mercedes, BMW), the program extracts other car companies like (Nissan, Chrysler, Volvo, Mazda, Ford, Fiat, Chevrolet, …).

 

5. Matching Pairs – Given two sets of words, the program re-orders the second set so that there is a one-to-one alignment between the two sets. For example, if the two given sets are (Honda, IBM, IKEA, Sony, DSW) and (furniture, shoes, computer, car, camera), the output of the program will look like (Honda :: car, IBM :: computer, IKEA :: furniture, Sony :: camera, DSW :: shoes).
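
As a rough illustration of the Matching Pairs task, the sketch below searches for the one-to-one alignment with the highest total association score. The scoring table `ASSOC` is invented for this example, and the brute-force search over permutations is an assumption made for clarity; the source does not say how the original program measured association or found the alignment.

```python
from itertools import permutations

# Hypothetical association scores; unlisted pairs default to 0.0.
ASSOC = {
    ("Honda", "car"): 0.9,
    ("IBM", "computer"): 0.9,
    ("IKEA", "furniture"): 0.9,
    ("Sony", "camera"): 0.7,
    ("Sony", "computer"): 0.5,
    ("DSW", "shoes"): 0.9,
}

def assoc(a, b):
    """Look up the (made-up) association strength between two words."""
    return ASSOC.get((a, b), 0.0)

def match_pairs(left, right, score):
    """Re-order `right` to maximize the summed score of aligned pairs."""
    best = max(
        permutations(right),
        key=lambda perm: sum(score(a, b) for a, b in zip(left, perm)),
    )
    return list(zip(left, best))

left = ["Honda", "IBM", "IKEA", "Sony", "DSW"]
right = ["furniture", "shoes", "computer", "car", "camera"]
for a, b in match_pairs(left, right, assoc):
    print(a, "::", b)
```

Trying every permutation is only practical for small sets, since the number of orderings grows factorially; larger inputs would call for a dedicated assignment algorithm.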

 

Such knowledge about similarities and relations between lexical entities is useful for applications like Information Retrieval and Question Answering, so that user queries like “Japanese car companies”, “beaches in Hawaii”, “books written by Shakespeare”, “green leafy vegetables”, “nominees of Oscar 2007”, “female actress who played the role of Rachel Green in FRIENDS”, etc. can be answered with specific entity names, rather than with a list of documents or text snippets that match such phrases.

 

 

Summer Project at Amazon.com (May – August 2005)

Keywords: Data Mining, Text Clustering, Ontology Learning

 

Summary:

 

During this internship, I designed and implemented a prototype web service for clustering and analyzing similar consumer products on Amazon.com. Manually categorizing a large collection of product items is not only time-consuming but also leads to inconsistent categories. For example, one merchant may group golf balls in the same category as tennis balls, cricket balls and baseballs, whereas another merchant may group them with other golf equipment: golf drivers, golf gloves, golf shoes, etc. We therefore developed an automatic method for organizing a large collection of items based on the similarity of the products’ features and descriptions. Apart from organizing the items, the method is also useful for retrieving other similar products for recommendation and for building a product ontology.

 

Example Ontology:

 

(Consumer Products (Electronics) (Furniture) (Apparel) (Cosmetics) (Jewelry) …)

 

(Electronics (Computers) (Cameras) (Cell Phones) (Home Appliances) (Music Instruments) ...)

 

(Home Appliances (Televisions) (Telephones) (Kitchen Appliances) (Laundry Machines) …)

 

(Kitchen Appliances (refrigerator) (oven) (microwave) (toaster) (coffee-maker) …)

 

(Music Instruments (violin) (guitar) (piano) (drums) (clarinet) …)

 

...

 

 

Research at University of Minnesota, Duluth (August 2002 – August 2004)

Keywords: Word Sense Disambiguation, Latent Semantic Analysis, Text Similarity, N-gram Modeling

 

Summary:

 

1. As part of my Master’s thesis and research work, I developed the open source software SenseClusters (http://senseclusters.sourceforge.net) for clustering contextually similar text units (words, sentences, paragraphs, documents, etc.), with support for data pre-processing, feature selection, dimensionality reduction and evaluation of the output clusters.

 

2. My Master’s thesis on “Unsupervised Word Sense Discrimination” compares how the performance of word sense discrimination is affected by: (1) first- versus second-order feature representations; (2) vector- versus similarity-space clustering; (3) local versus global training data; and (4) augmenting corpus-derived feature vectors with dictionary glosses.

 

3. I was a co-organizer for the Senseval-3 English-Hindi lexical sample task and helped in collecting data on Hindi word senses.

 

4. I was also a co-developer on the N-gram Statistics Package (http://ngram.sourceforge.net) and added programs for computing statistics on higher order n-grams (N > 2), large-scale n-gram counting and for extracting k-th order word co-occurrences.
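
To make the idea of higher-order n-gram counting concrete, here is a minimal sketch. It is not code from the N-gram Statistics Package (which is written in Perl); the function name `count_ngrams` and the whitespace tokenization are assumptions made for illustration.

```python
from collections import Counter

def count_ngrams(tokens, n):
    """Count every contiguous sequence of n tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

# Simple whitespace tokenization, assumed here for brevity.
tokens = "to be or not to be".split()

bigrams = count_ngrams(tokens, 2)
trigrams = count_ngrams(tokens, 3)

print(bigrams[("to", "be")])
print(sum(trigrams.values()))
```

This keeps all counts in memory, so it only suits small inputs; counting n-grams over large corpora needs a more economical strategy.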

 

 

Undergraduate Project at Cummins College of Engineering, Pune (2001)

Keywords: Natural Language Understanding, Knowledge Representation, Logical Inference

 

Summary:

 

The project proposed a prototype model for a language understanding system that stores natural language text as predicate forms in first-order logic, and applies inference rules (such as resolution) to test for entailments and to detect logical errors and contradictions in the text. The model was tested on sample text paragraphs collected from the GRE analytical section, using the PROLOG inference engine. This was my first encounter with many of the issues involved in processing natural language text by computer, such as synonymy, ambiguity, reference resolution, dependency parsing (identifying the subject, object and verb in a sentence) and common-sense reasoning such as “x can donate y only if x first owns y”. The project was presented at Concepts-2001, the undergraduate-level paper presentation competition held by the Pune Institute of Computer Technology (India), and won the 2nd prize in Artificial Intelligence & Fuzzy Logic.

 

Example Text Paragraph:

 

To obtain a government post, you must donate campaign gold bullion and make a television speech. You can purchase gold bullion only if you are not expelled and you have donated campaign service of 300 hrs. To make a television speech one must be politically sound and donate campaign 300 hrs of service.

 

PROLOG Rules:

 

obtain(x, “government post”) => donate(x, campaign, “gold bullion”) ^ make(x, “television speech”)

 

purchase(x, “gold bullion”) => !expelled(x) ^ donate(x, campaign, “service of 300 hrs”)

 

make(x, “television speech”) => is_politically_sound(x) ^ donate(x, campaign, “300 hrs of service”)

 

Entailment Test:

 

Given that A obtains a government post, can we infer that A is politically sound?
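
The entailment test can be worked through with a small forward-chaining sketch. This is a propositional simplification for a single individual A, not the original PROLOG program: the atom names (such as `obtain_government_post`) are hypothetical flattenings of the predicates in the rules above, and each rule is read as "if the left-hand side holds, every condition on the right-hand side must hold".

```python
# Each key implies all of the atoms in its list (necessary conditions).
# Atom names are illustrative stand-ins for the predicates in the rules.
RULES = {
    "obtain_government_post": ["donate_gold_bullion", "make_television_speech"],
    "purchase_gold_bullion": ["not_expelled", "donate_300_hrs_service"],
    "make_television_speech": ["politically_sound", "donate_300_hrs_service"],
}

def closure(facts, rules):
    """Return every atom derivable from `facts` by repeatedly applying `rules`."""
    known = set(facts)
    frontier = list(facts)
    while frontier:
        fact = frontier.pop()
        for consequence in rules.get(fact, []):
            if consequence not in known:
                known.add(consequence)
                frontier.append(consequence)
    return known

def entails(facts, goal, rules=RULES):
    """True if `goal` follows from `facts` under `rules`."""
    return goal in closure(facts, rules)

print(entails({"obtain_government_post"}, "politically_sound"))
```

Under this reading the answer is yes: obtaining a government post requires making a television speech, which in turn requires being politically sound.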

 

 

===========

Miscellaneous:

===========

 

1. Music Analysis and Classification: The project analyzed vocal and instrumental content of Hindi film songs for automatic music classification.

 

2. Automatic Poetry Generation: Using a large corpus of text collected from the English literature, the program automatically generates poetry by identifying patterns of rhyming n-grams.

 

3. Code-Breaker: In this project, I implemented simple algorithms for encoding and decoding cipher texts.

 

4. Chat with F.R.I.E.N.D.S: A chat-agent that simulates characters from the FRIENDS TV-show.

 

5. Astrological Predictions: I have been collecting horoscope readings from MSN Astrology for over two years. The data contains daily horoscopes for all zodiac signs for male and female users. I hope to use this data at some point to build learning models for astrological predictions.

 

6. Song Translation: As time permits, I translate my favorite Hindi songs into English. Here is a collection of my poetry.

 

 

Author: Amruta Purandare

All Rights Reserved