Position Paper for the
CHI 97 Basic Research Symposium
(March 22-23, 1997, Atlanta, GA)

Supporting Human-Human Interaction beyond Cultural Backgrounds

Toshikazu KATO, Tadayuki SOTA, Nadia BIANCHI(+), and Kaori YOSHIDA(++)
Electrotechnical Laboratory (ETL), AIST, MITI
(+) Mirano University, Italy
(++) Kyushu Institute of Technology
1-1-4, Umezono, Tsukuba Science City 305, Japan
E-mail: {kato, sota, bianchi, kyoshida}@etl.go.jp
URL: http://www.etl.go.jp:8080/People/kato

Last updated: March 20, 1997





Abstract

In human-human interaction, we have to bridge differences in personal background by building a model of the other person's background and interpreting messages based on that estimated model. From this viewpoint, we have to consider how we express our ideas and how we interpret messages according to our personal backgrounds. This sort of communication technology, which we call "kansei-oriented communication," is important for expressing ourselves and for understanding other persons well.


Keywords

Human centered communication, kansei model, subjective background


Supporting Human-Human Interaction beyond Cultural Backgrounds

1. Data Centered Multimedia Communication

In human communication, we exchange messages in multimedia presentations to express our ideas and to interpret those of others.

The history of communication technology began with the effort to overcome the distance between two persons and to provide something close to face-to-face interaction. Many devices have been invented, such as the telex, telephone, facsimile, television and, more recently, force displays, to send and receive multimedia data such as text, voice and sound, still images and video, and force. On the theoretical side, coding theory has been adopted to achieve low error rates over noisy transmission channels, and data compression theory has been adopted to fit large amounts of data, such as multimedia data, into transmission channels. Teleconferencing over the Internet is one of the current solutions for overcoming the distance gap: users can interact face to face as if they were together in the same room.

Along this line, communication technology is now moving toward virtual reality, which gives users an information environment as a natural human interface in which they can interact naturally with others. A multimedia message in such an environment consists of text, voice and sound, facial images, gestures and motion, and other sorts of visual representation.

2. Human Centered Multimedia Communication

The important point is that, in human-human interaction, a message carries much more than its literal content. The technical issue that we stress in this paper is how we express our ideas and how we interpret messages. In human-human communication, we need "mediation mechanisms" that adapt to the individual person we are communicating with. We can summarize three aspects of such mediation mechanisms.

(1) Knowledge mediation: We need knowledge mediation mechanisms that support describing knowledge of the real world, common sense, the context of a task, and each person's knowledge background in relation to multimedia data; that is, an ontology base and ontology management.

Many database servers, for instance a chemistry database and a plant engineering database, are now connected to the Internet, and we expect to be able to share and re-use them. We have to note, however, that each database and its indexes are created for a particular application domain. For example, the technical term "water" is categorized as a "solvent" in the chemistry database but as a "cooling material" in the plant engineering database, as shown in Figure 1. A message, composed of words, cannot be independent of the context of each domain.
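The "water" example can be illustrated with a minimal sketch. The dictionary layout and the function names `interpret` and `mediate` are our own assumptions for illustration; they are not part of any existing ontology system, and a real ontology base would be far richer.

```python
# Hypothetical per-domain ontologies: each maps a term to its category
# within that application domain (values taken from the "water" example).
ONTOLOGIES = {
    "chemistry": {"water": "solvent"},
    "plant_engineering": {"water": "cooling material"},
}


def interpret(term, domain):
    """Return the category of `term` in `domain`, or None if it is unknown."""
    return ONTOLOGIES.get(domain, {}).get(term)


def mediate(term, source_domain, target_domain):
    """Show how the same term is categorized in two different domains."""
    return {
        "term": term,
        source_domain: interpret(term, source_domain),
        target_domain: interpret(term, target_domain),
    }


if __name__ == "__main__":
    print(mediate("water", "chemistry", "plant_engineering"))
```

The point of the sketch is only that a term must always be looked up together with its domain; the term alone does not determine its meaning.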

We expect knowledge mediation mechanisms to support seamless communication amongst users in cyberspace, so that they can access, extend, share and re-use knowledge across application domains.

Figure 1: Knowledge mediation

(2) Modality mediation: We need modality mediation mechanisms that support smooth human-computer interaction with multimedia representations, as a natural extension of our senses and behavior.

Virtual reality technologies now provide us with virtual collaboration spaces and interface agents that support our intellectual work (Figure 2). We expect that every person will be able to receive information services through the global information infrastructure (GII). We have to note that a wide variety of people, not only office workers but also school children and elderly people, are now using computers. This means that no single metaphor, whether a desktop, room or town metaphor, is sufficient for all users. We need personalized metaphors and mechanisms that mediate amongst the metaphors of different persons.

We expect modality mediation mechanisms to let us act on, or look at, the real world through cyberspace with super-reality and actuality. In this way, all participants can share a real space over long distances, in anything from one-to-one to many-to-many communication, in a multimedia manner.

Figure 2: Modality mediation

(3) Kansei mediation (subjectivity mediation): We need kansei mediation mechanisms to understand each user's cognitive process for multimedia information. Typical subjective information about a specific user includes personal knowledge, tastes, feelings, emotions, intentions, ideas, and so on. Such a mechanism can create a multimedia message in the representation most familiar to each specific person. It also makes it possible to share and re-use subjective information amongst people with different personal backgrounds.

For instance, as shown in Figure 3(a), if each member of a family, e.g., a computer scientist, a curator and a school child, wishes to see a TV program which is "something interesting," they might actually be expecting different programs. Similarly, as shown in Figure 3(b), even when viewing the same flower, the three persons have different impressions and interpretations, which might be expressed in different words. The reason is clear: each person has a different personal background in emotion, taste, hobbies, vocabulary, education, and so on.

@ @@

Figure 3(a), (b): Kansei mediation for representations based on subjective backgrounds

In human-human interaction, we have to bridge differences in personal background by building a model of the other person's background and interpreting messages based on that estimated model. Transmitting and/or interpreting the multimedia message itself is not sufficient for communication. We need more support in this process, especially for communication amongst people belonging to different cultures.

Over the next five decades, research on human-computer interaction should advance to the stage of human-human interaction through cyberspace. From this viewpoint, we have to consider how we express our ideas and how we interpret messages according to our personal backgrounds. This sort of communication technology, which we call "kansei-oriented communication," is important for expressing ourselves and for understanding other persons well.

The general research topics on human centered multimedia communication are;

3. Kansei Model for Multimedia Perception

As a working hypothesis, we have been developing a kansei model of the multimedia perception process (Figure 4).

Figure 4: Hierarchical schema of multimedia perception process

  1. Physical level interaction: A visual cue may often remind us of similar images or related pictures. This process is a kind of similarity and associative retrieval of pictorial data by physical level interaction.
  2. Physiological level interaction: The early stage of the mammalian vision mechanism extracts graphical features such as intensity levels, edges, contrast, correlation, spatial frequency, and so on. Visual perception may depend on such parameterized graphical features.
  3. Psychological level interaction: We have to note that the criteria for similarity are a subjective human factor. Although human beings have anatomically the same organs, each person may classify pictures and measure similarity differently. This means that each person has his or her own weighting factors on graphical features. The computer should evaluate similarity according to each person's subjective criterion.
  4. Cognitive level interaction: We often have different impressions even when viewing the same painting, and each person may give a unique interpretation of the same picture. It seems that each person has his or her own correlation between concepts and graphical features and/or subjective features.
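The idea in item 3, that each person has his or her own weighting factors on graphical features, can be sketched as a per-user weighted distance. This is only an illustration under our own assumptions: the three features, the weight values and the function names are hypothetical, and a weighted Euclidean distance is one possible choice of similarity measure, not necessarily the one used in our system.

```python
import math


def weighted_distance(a, b, weights):
    """Weighted Euclidean distance between two graphical feature vectors."""
    if not (len(a) == len(b) == len(weights)):
        raise ValueError("vectors and weights must have the same length")
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))


def most_similar(query, candidates, weights):
    """Return the name of the candidate closest to `query` for this user."""
    return min(
        candidates,
        key=lambda name: weighted_distance(query, candidates[name], weights),
    )


# Hypothetical features per picture: (contrast, edge density, mean intensity).
query = (0.5, 0.5, 0.5)
candidates = {
    "picture_a": (0.5, 0.9, 0.5),  # differs from the query in edge density
    "picture_b": (0.9, 0.5, 0.5),  # differs from the query in contrast
}

# Hypothetical personal weighting factors for two users.
user_1 = (1.0, 0.1, 1.0)  # pays little attention to edges
user_2 = (0.1, 1.0, 1.0)  # pays little attention to contrast

if __name__ == "__main__":
    print(most_similar(query, candidates, user_1))
    print(most_similar(query, candidates, user_2))
```

With these weights, the two users judge different pictures to be most similar to the same visual cue, which is the behavior the psychological level is meant to capture.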

4. Basic Algorithms for Kansei Modeling

Our approach is to learn the correlation between multimedia information and its subjective interpretation by a specific person. We can expect a reasonable correlation if the person responds in a statistically consistent way to various kinds of multimedia information.

For instance, art critics view paintings from several aspects, such as motif, general composition and coloring. Impressionism reminds us that coloring is the dominant source of the impression a painting gives. This suggests that there is a reasonable correlation between the coloring and the words used in reviews.

Let us show our ideas for modeling kansei on artistic painting based on each viewer's personal background.

Figure 5: Overview of modeling subjective kansei on artistic painting

We can parameterize the coloring of a painting by the distribution of RGB intensity values in its subpictures. We also need a subjective criterion of artistic impression. A user reports his or her artistic impressions of sample paintings as a weight vector over adjectives. (Currently, the number of adjectives is limited to about 30.) We may expect that the set of words and the parameterized coloring features correlate with each other. We regard this correlation as the personal view model of the user. On this model, we can construct a unified feature (UF) space in which the subjective words and the coloring features can be compared.
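As a minimal sketch of such a parameterization, the following function divides an image into a grid of subpictures and records the mean R, G and B value of each. The grid layout, the use of the mean as the only statistic, and the function name `coloring_features` are assumptions made for illustration; the distribution could equally be summarized by histograms or higher-order statistics.

```python
def coloring_features(image, rows, cols):
    """Mean R, G and B of each subpicture on a rows x cols grid.

    `image` is a list of pixel rows; each pixel is an (r, g, b) tuple.
    The result is one flat vector, subpicture by subpicture.
    """
    height, width = len(image), len(image[0])
    if rows > height or cols > width:
        raise ValueError("grid is finer than the image")
    features = []
    for i in range(rows):
        y0, y1 = i * height // rows, (i + 1) * height // rows
        for j in range(cols):
            x0, x1 = j * width // cols, (j + 1) * width // cols
            block = [image[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            for channel in range(3):
                features.append(sum(p[channel] for p in block) / len(block))
    return features


# A tiny hypothetical 2 x 2 "painting": red, green, blue and white pixels.
image = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]

if __name__ == "__main__":
    print(coloring_features(image, 1, 1))  # whole image as one subpicture
    print(coloring_features(image, 2, 2))  # one subpicture per pixel
```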

We will refer to the UF space as the personal index of the user model. Note that we do not have to assign adjectives to every painting in the database. Once the system has learned the linear mappings F and G, it can automatically estimate the personal interpretation of a painting from its GF (graphical feature) vector alone.
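To make the learning step concrete, the sketch below fits a single linear map from coloring features to adjective weights by ordinary least squares, and then estimates adjective weights for an unlabeled painting. This is a simplified stand-in chosen for illustration, not the mappings F and G themselves: it maps features directly to adjective weights rather than into a shared UF space, and the function names, the optional `ridge` term and the sample data are all assumptions.

```python
def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting.

    `a` is n x n and `b` is n x m (lists of lists); returns x as n x m.
    """
    n = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system; add samples or a ridge term")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0.0:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def fit_linear_map(features, impressions, ridge=0.0):
    """Least-squares W (d x k) such that impressions ~= features @ W."""
    d, k = len(features[0]), len(impressions[0])
    xtx = [[sum(row[i] * row[j] for row in features) + (ridge if i == j else 0.0)
            for j in range(d)] for i in range(d)]
    xty = [[sum(f[i] * y[j] for f, y in zip(features, impressions))
            for j in range(k)] for i in range(d)]
    return solve(xtx, xty)


def estimate(feature, w):
    """Estimate adjective weights for one painting from its feature vector."""
    return [sum(feature[i] * w[i][j] for i in range(len(feature)))
            for j in range(len(w[0]))]


# Hypothetical training data: 3 sample paintings, 2 coloring features each,
# answered with weights for 2 adjectives (say, "warm" and "cool").
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
impressions = [[0.8, 0.1], [0.2, 0.9], [1.0, 1.0]]

if __name__ == "__main__":
    w = fit_linear_map(features, impressions)
    print(estimate([0.5, 0.5], w))  # a painting the user never labeled
```

The last line corresponds to the remark above: once the map is learned from the sample paintings, adjectives need not be assigned by hand to the rest of the database.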

As further applications of the UF space, we can retrieve paintings that give a similar impression by presenting a painting as a visual example. We can also infer suitable keywords that simulate the user's personal view, using the inverse mappings.
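Both kinds of retrieval reduce to ranking paintings by closeness to a query vector in a common space. The sketch below assumes that each painting already has a vector in that space and uses cosine similarity as the closeness measure; the adjective list, the painting identifiers, the vectors and the function names are hypothetical, and cosine similarity is our choice for illustration only.

```python
import math

# Hypothetical adjective axes of the common space.
ADJECTIVES = ["romantic", "soft", "warm", "hard"]


def adjectives_to_vector(words):
    """Turn a set of query adjectives into a 0/1 vector over ADJECTIVES."""
    return [1.0 if a in words else 0.0 for a in ADJECTIVES]


def cosine(u, v):
    """Cosine similarity of two vectors (0.0 if either vector is zero)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def retrieve(query, index, top_k=8):
    """Return the ids of the `top_k` paintings most similar to `query`."""
    ranked = sorted(index, key=lambda pid: cosine(query, index[pid]), reverse=True)
    return ranked[:top_k]


# Hypothetical personal index: painting id -> estimated impression vector.
index = {
    "p1": [0.9, 0.8, 0.7, 0.1],
    "p2": [0.1, 0.2, 0.1, 0.9],
    "p3": [0.5, 0.5, 0.5, 0.5],
}

if __name__ == "__main__":
    # Retrieval by words: the query is built from adjectives.
    print(retrieve(adjectives_to_vector({"romantic", "soft", "warm"}), index))
    # Retrieval by visual example: the query is another painting's vector.
    print(retrieve(index["p2"], index))
```

The same `retrieve` function serves both cases; only the way the query vector is obtained differs.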

5. Experimental Results on Kansei Modeling

Each user, or each user group, has a specific tendency in describing impressions. Even the same word may be used with a different meaning depending on the user's own background and cultural context. Such differences can be observed in the relative distances amongst the vocabulary items given by specific users.

[Personal View]

In this experiment, 20 impressionist paintings were shown to each user. Each user answered a questionnaire in the form of a weighted adjective vector that shows his or her criteria for artistic impressions. Principal component analysis (PCA) visualizes the spatial relationship of the impression words in the user's SF (subjective feature) space. We applied PCA to each user's answers as well as to the mean values of several user groups.
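As a minimal sketch of this analysis, the following code finds the first principal axis of a set of questionnaire answers by power iteration on the covariance matrix, so that adjectives with loadings of opposite sign appear on opposite sides of that axis. The adjective subset, the answer values and the function name are hypothetical, and power iteration is simply one compact way to obtain the leading component; it is not necessarily the procedure we used.

```python
import math


def first_principal_axis(rows, iterations=200):
    """First principal axis of `rows` (samples x variables), unit length.

    Uses power iteration on the sample covariance matrix. The sign of the
    returned axis is arbitrary, as in any PCA.
    """
    n, d = len(rows), len(rows[0])
    if n < 2:
        raise ValueError("need at least two samples")
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    axis = [1.0 / (j + 1) for j in range(d)]  # arbitrary non-zero start
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * axis[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(v * v for v in nxt))
        if norm == 0.0:
            break  # no variance left in this direction
        axis = [v / norm for v in nxt]
    return axis


# Hypothetical answers: one row per painting, one column per adjective.
ADJECTIVES = ["sober", "placid", "gorgeous"]
answers = [
    [0.9, 0.8, 0.1],
    [0.8, 0.9, 0.2],
    [0.1, 0.2, 0.9],
    [0.2, 0.1, 0.8],
]

if __name__ == "__main__":
    axis = first_principal_axis(answers)
    for word, loading in zip(ADJECTIVES, axis):
        print(f"{word}: {loading:+.3f}")
```

In this toy data, "sober" and "placid" load with one sign and "gorgeous" with the other, which is the kind of opposition read off the first axis in the experiment.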

Figure 6 shows the results of PCA on the SF spaces of female students and of male students. In the left window, the SF space of female students, "sober" and "placid" are opposite to "gorgeous" and "folksy / ethnic" on the first axis, and "charming" and "elegant" are opposite to "hard," "wild" and "dynamic" on the second axis. In the right window, the SF space of male students, we see different relationships. We can conclude that a personal view mechanism is important for accepting and handling subjective descriptions in a user's query.

Figure 6: Results of PCA on the artistic impressions viewing 20 paintings

[Sense Retrieval]

We have experimented with the learning algorithm and its application to sense retrieval on our electronic art gallery, ART MUSEUM. In this experiment, we adjusted the UF space according to the personal view of female students. The learning algorithm was applied to the average answers of the female students, so the result is a user-group model of artistic impressions.

Figure 7 shows an example of sense retrieval. The figure shows the best eight candidates for the adjectives "romantic, soft and warm". These paintings roughly satisfied the personal view of the subjects. We may conclude that the personal index on the UF space reflects a personal sense of coloring.

Figure 7: Example of sense retrieval

[Similarity Retrieval]

Using the same mechanism, we can also perform associative retrieval. In this retrieval, a picture is regarded as a visual cue, and the system retrieves a set of pictures which may give the target users a similar impression.

Figure 8 contains three experiments on similarity retrieval. In each line, the leftmost picture is the visual cue and the rest are the candidate pictures which may give a similar impression to the female students.

Figure 8: Example of similarity retrieval

6. Beyond Cultures

We are now extending our ideas to accept free sets of vocabulary in different languages.

The audiovisual data themselves are objective entities, and we can refer to them as a common metric for measuring differences between cultural representations.




