Android AI Assistant is a graduation project that converts spoken commands into structured Android actions. The system combines Android SpeechRecognizer, a FastAPI backend, deterministic command rules, an XLM-RoBERTa intent classifier, validation and parameter enrichment, and Android Accessibility/System APIs.
The application supports English, Turkish, and Arabic, including an RTL-aware Arabic interface. It is designed primarily for hands-busy scenarios and users with limited physical mobility. Initial installation and system-permission setup still require manual interaction and may require assistance.
- 41 Android-executable intents covering app management, navigation, text input, search, calls, alarms, camera, media, and system settings.
- Hybrid command understanding: deterministic rules handle open-ended or safety-sensitive commands, while XLM-RoBERTa provides multilingual intent classification.
- Dynamic UI interaction: the Accessibility node tree is searched using exact, normalized, fuzzy, positional, and icon-alias matching.
- Ambiguity handling: matching UI elements can be numbered directly on screen for voice-based selection.
- Custom command flows: users can save and execute ordered, multi-step command sequences.
- Persistent data layer: PostgreSQL stores devices, app catalogs, command history, and custom flows; Redis accelerates app-catalog lookup.
- Localized Android UI: Home, Permissions, Custom Commands, Guide, and Settings screens are available in three languages.
A separate demo page is available for previewing the main app workflows. It includes three short demos that can be viewed interactively. GitHub README files do not run JavaScript, so the interactive preview is hosted on GitHub Pages.
Open the interactive demo gallery
![]() Home |
![]() Permissions |
![]() Custom Commands |
![]() Guide |
![]() Settings |
![]() Dynamic Selection |
flowchart LR
A[Android SpeechRecognizer] --> B[FastAPI Backend]
B --> C{Deterministic rule matched?}
C -->|Yes| D[Rule Engine]
C -->|No| E[XLM-RoBERTa Classifier]
D --> F[Validation and Enrichment]
E --> F
F --> G[Android Command Executor]
G --> H[Accessibility and System APIs]
B <--> I[(PostgreSQL)]
B <--> J[(Redis Cache)]
The multilingual classifier was fine-tuned from FacebookAI/xlm-roberta-base using 58 classification labels.
| Metric | Validation | Test |
|---|---|---|
| Accuracy | 95.87% | 93.45% |
| Macro F1 | 95.08% | 90.93% |
| Intent accuracy | 97.49% | 94.31% |
| Layer | Technologies |
|---|---|
| Android client | Java, Android SDK, AccessibilityService, SpeechRecognizer, Retrofit |
| Backend | Python, FastAPI, PyTorch, Hugging Face Transformers |
| NLP | XLM-RoBERTa, rule engine, parameter extraction, validation/enrichment |
| Data | PostgreSQL, Redis |
| Languages | English, Turkish, Arabic |
Android_AI_Assistant/
|-- project/
| |-- Android_App/ # Android client and command execution
| |-- Backend/V3/ # FastAPI, NLP pipeline, database, and cache
| |-- Machine Learning Model/ # Dataset and model training scripts
|-- assets/ # README screenshots and demo GIFs
|-- demo/ # GitHub Pages interactive demo gallery
|-- docs/ # Thesis and presentation files
|-- README.md
- Android Studio and Android SDK 35
- JDK 17 or newer (Java 11 source compatibility)
- Python 3.13 and Pipenv
- PostgreSQL
- Redis (recommended for catalog caching)
- Android 7.0+ device; a physical device is recommended for Accessibility testing
git clone https://github.com/SE-ABOSALIM/Android_AI_Assistant.git
cd Android_AI_AssistantThe trained model is not stored in Git because of its size. Download the complete model bundle and place it under:
project/Backend/V3/models/result_model/
The directory must contain model.safetensors, tokenizer files, and model configuration files.
Model download: Google Drive link
cd project/Backend
pip install pipenv
pipenv syncCopy the example environment file:
Copy-Item V3/.env.example V3/.envUpdate V3/.env with your local PostgreSQL credentials and Redis URL. Apply the SQL migrations in numerical order:
$env:PSQL_DATABASE_URL="postgresql://postgres:YOUR_PASSWORD@localhost:5432/android-ai-assistant"
Get-ChildItem V3/database/migrations/*.sql |
Sort-Object Name |
ForEach-Object { psql $env:PSQL_DATABASE_URL -f $_.FullName }pipenv run python -m uvicorn V3.main:app --host 0.0.0.0 --port 8001API documentation is available at http://localhost:8001/docs.
cd ../Android_App
Copy-Item local.properties.example local.propertiesEdit local.properties:
sdk.dir=C\:\\Users\\YOUR_USER\\AppData\\Local\\Android\\Sdk
backend.baseUrl=http://YOUR_COMPUTER_LAN_IP:8001/A physical phone cannot reach the computer backend through localhost; use the computer's LAN IP and keep both devices on the same network. Open project/Android_App in Android Studio, build the app, then grant the requested runtime and advanced Android permissions from the Permissions screen.
# Android unit tests
cd project/Android_App
.\gradlew.bat :app:testDebugUnitTest
# Backend tests
cd ../Backend
pipenv run python -m unittest discover V3.testsCurrent verification status: Android unit tests pass, and the backend unittest suite passes with 120 tests. The backend suite covers rule parsing, validation, app-catalog matching, and intent-contract checks.
- Initial permission setup requires manual Android Settings interaction.
- UI automation quality depends on the accessibility metadata exposed by the active application.
- The Android client requires network access to the backend in the current architecture.
- Android speech-recognition session behavior can vary by device and installed recognition service.
- Continuous speech recognition: replace the session-based Android
SpeechRecognizerflow with anAudioRecord, voice activity detection, and continuous STT pipeline to reduce listening interruptions. - Background-media protection: reduce false command activation from videos and other media through wake-word detection, playback-state checks, and acoustic echo-cancellation techniques.
- Optional offline operation: evaluate on-device speech recognition and intent processing so that core commands can run without a backend connection where device resources permit.
- More robust UI interaction: supplement Accessibility metadata with OCR or lightweight computer-vision techniques when applications expose incomplete labels, descriptions, or resource identifiers.
- Broader Android compatibility testing: validate behavior across additional Android versions, device manufacturers, screen sizes, and vendor-specific Accessibility implementations.
- User and pronunciation personalization: support user-specific command aliases, pronunciation variants, and adaptive matching to improve recognition for names, applications, and frequently used commands.
- More resilient custom command flows: improve custom command execution to make multi-step workflows more flexible, scalable, and reliable across dynamic UI states.
Muhammed Chreiki
Software Engineering, Istanbul Topkapi University
GitHub





