There still exist many fields in which ways are to be explored to improve the human-system interaction. These systems must have the capability to take advantage of the environment in order to improve interaction. This extends the capabilities of system (machine or robot) to better reach natural language used by human beings. We propose a methodology to solve the multimodal interaction problem adapted to several contexts by defining and modelling a distributed architecture relying on W3C standards and web services (semantic agents and input/output services) working in ambient intelligence environment. This architecture is embedded in a multi-agent system modelling technique. In order to achieve this goal, we need to model the environment using a knowledge representation and communication language (EKRL, Ontology). The obtained semantic environment model is used in two main semantic inference processes: fusion and fission of events at different levels of abstraction. They are considered as two context-aware operations. The fusion operation interprets and understands the environment and detects the happening scenario. The multimodal fission operation interprets the scenario, divides it into elementary tasks, and executes these tasks which require the discovery, selection and composition of appropriate services in the environment to accomplish various aims. The adaptation to environmental context is based on multilevel reinforcement learning technique. The overall architecture of fusion and fission is validated under our framework (agents, services, EKRL concentrator), by developing different performance analysis on some use cases such as monitoring and assistance in daily activities at home and in the town.